Paper Title
Transferable Universal Adversarial Perturbations Using Generative Models
Paper Authors
Paper Abstract
Deep neural networks tend to be vulnerable to adversarial perturbations, which, when added to a natural image, can fool the respective model with high confidence. Recently, the existence of image-agnostic perturbations, also known as universal adversarial perturbations (UAPs), was discovered. However, existing UAPs still lack a sufficiently high fooling rate when applied to an unknown target model. In this paper, we propose a novel deep learning technique for generating more transferable UAPs. We utilize a perturbation generator and some given pretrained networks, so-called source models, to generate UAPs using the ImageNet dataset. Because various model architectures have similar feature representations in the first layer, we propose a loss formulation that focuses on the adversarial energy only in the respective first layer of the source models. This supports the transferability of our generated UAPs to any other target model. We further empirically analyze our generated UAPs and demonstrate that these perturbations generalize very well to different target models. We show the superiority of our proposed approach, which surpasses the current state of the art in both fooling rate and model transferability. Using our generated non-targeted UAPs, we obtain an average fooling rate of 93.36% on the source models (state of the art: 82.16%). Generating our UAPs on the deep ResNet-152, we obtain an absolute fooling rate advantage of about 12% over cutting-edge methods on the VGG-16 and VGG-19 target models.
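The two quantities the abstract relies on can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not define either quantity, so the code assumes the common reading of the non-targeted fooling rate (the fraction of images whose predicted label changes when one shared perturbation is added) and treats first-layer adversarial energy as the mean squared change in first-layer activations. All function names, the toy classifier, and the toy first layer are hypothetical stand-ins for real source models.

```python
from typing import Callable, List, Sequence

Image = List[float]  # toy stand-in for a flattened image


def add_perturbation(image: Sequence[float], delta: Sequence[float]) -> Image:
    """Apply one image-agnostic perturbation to an image (element-wise sum)."""
    return [x + d for x, d in zip(image, delta)]


def fooling_rate(
    predict: Callable[[Sequence[float]], int],
    images: Sequence[Sequence[float]],
    delta: Sequence[float],
) -> float:
    """Fraction of images whose predicted label changes under the same delta.

    Assumption: this is the usual non-targeted definition; the abstract
    itself does not spell it out.
    """
    fooled = sum(
        predict(add_perturbation(img, delta)) != predict(img) for img in images
    )
    return fooled / len(images)


def first_layer_energy(
    first_layer: Callable[[Sequence[float]], Sequence[float]],
    image: Sequence[float],
    delta: Sequence[float],
) -> float:
    """Mean squared change in first-layer activations caused by delta.

    Assumption: one plausible reading of "adversarial energy in the first
    layer"; the exact loss used in the paper is not given in the abstract.
    """
    clean = first_layer(image)
    perturbed = first_layer(add_perturbation(image, delta))
    return sum((p - c) ** 2 for p, c in zip(perturbed, clean)) / len(clean)


# Hypothetical toy model: class 1 if the pixel sum is positive, else class 0.
def toy_predict(x: Sequence[float]) -> int:
    return int(sum(x) > 0)


# Hypothetical toy first layer: scale by 2, then ReLU.
def toy_first_layer(x: Sequence[float]) -> List[float]:
    return [max(0.0, 2.0 * v) for v in x]


images = [[1.0, 1.0], [0.5, -0.2], [-1.0, -1.0], [3.0, 2.0]]
delta = [-1.0, -1.0]  # the single perturbation shared by every image

rate = fooling_rate(toy_predict, images, delta)
energy = first_layer_energy(toy_first_layer, images[0], delta)
print(f"fooling rate: {rate:.2f}, first-layer energy: {energy:.2f}")
```

In a real setting, `toy_predict` would be replaced by a pretrained target model's top-1 prediction and `toy_first_layer` by the first layer of a source model; a generator would then be trained to produce a bounded `delta` that increases the energy term.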
