Paper Title
CD-UAP: Class Discriminative Universal Adversarial Perturbation
Authors
Abstract
A single universal adversarial perturbation (UAP) can be added to all natural images to change most of their predicted class labels. Flexible control over which classes are attacked is of high practical relevance to an attacker; however, existing UAP methods attack samples from all classes. In this work, we propose a new universal attack method that generates a single perturbation which fools a target network into misclassifying only a chosen group of classes, while having limited influence on the remaining classes. Since the proposed attack generates a universal adversarial perturbation that discriminates between targeted and non-targeted classes, we term it class discriminative universal adversarial perturbation (CD-UAP). We propose a simple yet effective algorithmic framework, under which we design and compare various loss function configurations tailored to the class discriminative universal attack. The proposed approach has been evaluated with extensive experiments on various benchmark datasets. Additionally, our approach achieves state-of-the-art performance on the original UAP task of attacking all classes, which demonstrates its effectiveness.
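The abstract does not specify the loss configurations, so the following is only a minimal sketch of what a class discriminative objective could look like, not the authors' method. It assumes a toy linear softmax classifier, a cross-entropy term that is maximized on targeted classes and minimized on non-targeted ones, and a signed-gradient update with an L-infinity bound. All names (`cd_uap_loss_and_grad`, `craft_perturbation`, `alpha`, `beta`, `eps`, `step`) are hypothetical.

```python
import numpy as np


def softmax(z):
    """Row-wise softmax with the usual max-shift for numerical stability."""
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def cd_uap_loss_and_grad(W, b, x, y, delta, targeted_classes, alpha=1.0, beta=1.0):
    """Illustrative two-part loss and its gradient with respect to delta.

    Assumed toy model: logits = (x + delta) @ W + b.
    W: (d, k), b: (k,), x: (n, d), y: (n,) integer labels, delta: (d,).
    """
    n, k = x.shape[0], W.shape[1]
    p = softmax((x + delta) @ W + b)
    onehot = np.eye(k)[y]
    ce = -np.log(p[np.arange(n), y] + 1e-12)  # per-sample cross-entropy

    # Targeted samples get weight -alpha (minimizing the loss raises their CE);
    # non-targeted samples get weight +beta (their CE is kept low).
    is_targeted = np.isin(y, list(targeted_classes))
    weight = np.where(is_targeted, -alpha, beta)

    loss = float(np.mean(weight * ce))
    grad_per_sample = (p - onehot) @ W.T  # d(CE)/d(input) for a linear softmax model
    grad = (weight[:, None] * grad_per_sample).mean(axis=0)
    return loss, grad


def craft_perturbation(W, b, x, y, targeted_classes, eps=0.5, step=0.05, iters=200):
    """Signed-gradient descent on one shared delta, clipped to [-eps, eps]."""
    delta = np.zeros(x.shape[1])
    for _ in range(iters):
        _, grad = cd_uap_loss_and_grad(W, b, x, y, delta, targeted_classes)
        delta = np.clip(delta - step * np.sign(grad), -eps, eps)
    return delta


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    W, b = rng.normal(size=(4, 3)), rng.normal(size=3)
    x, y = rng.normal(size=(30, 4)), rng.integers(0, 3, size=30)
    delta = craft_perturbation(W, b, x, y, targeted_classes={0})
    print("perturbation:", delta)
```

In this sketch, the relative sizes of `alpha` and `beta` trade off attack strength on the targeted classes against preservation of the remaining classes; a real attack would replace the linear model with the target network and obtain gradients by automatic differentiation.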
