Paper Title
On InstaHide, Phase Retrieval, and Sparse Matrix Factorization
Authors
Abstract
In this work, we examine the security of InstaHide, a scheme recently proposed by [Huang, Song, Li and Arora, ICML'20] for preserving the security of private datasets in the context of distributed learning. To generate a synthetic training example to be shared among the distributed learners, InstaHide takes a convex combination of private feature vectors and randomly flips the sign of each entry of the resulting vector with probability 1/2. A salient question is whether this scheme is secure in any provable sense, perhaps under a plausible hardness assumption and assuming the distributions generating the public and private data satisfy certain properties. We show that the answer to this appears to be quite subtle and closely related to the average-case complexity of a new multi-task, missing-data version of the classic problem of phase retrieval. Motivated by this connection, we design a provable algorithm that can recover private vectors using only the public vectors and synthetic vectors generated by InstaHide, under the assumption that the private and public vectors are isotropic Gaussian.
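
To make the encoding step described in the abstract concrete, the following is a minimal sketch and not the authors' implementation. The function name `instahide_mix`, the number of mixed vectors `k`, and the way the convex weights are drawn (uniform random values, normalized) are all assumptions made for illustration; the abstract specifies only a convex combination followed by independent sign flips with probability 1/2.

```python
import random


def instahide_mix(vectors, k, rng):
    """Hypothetical helper: mix k randomly chosen vectors, then flip signs.

    The weight distribution below is an assumption; the abstract only says
    that the weights form a convex combination.
    """
    chosen = rng.sample(range(len(vectors)), k)
    raw = [rng.random() for _ in chosen]
    total = sum(raw)
    weights = [r / total for r in raw]  # nonnegative and summing to 1

    dim = len(vectors[0])
    # Convex combination of the chosen vectors, coordinate by coordinate.
    mixed = [
        sum(w * vectors[i][j] for w, i in zip(weights, chosen))
        for j in range(dim)
    ]
    # Flip the sign of each entry independently with probability 1/2.
    signs = [rng.choice((-1.0, 1.0)) for _ in range(dim)]
    synthetic = [s * m for s, m in zip(signs, mixed)]
    return synthetic, mixed, weights, chosen


rng = random.Random(0)
# Isotropic Gaussian vectors, matching the distributional assumption in the abstract.
private = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
synthetic, mixed, weights, chosen = instahide_mix(private, k=2, rng=rng)
```

Because each sign is flipped independently with probability 1/2, the released vector carries only the entrywise magnitudes of the mixture. This is why recovery resembles phase retrieval, where one observes the magnitudes of linear measurements but not their signs.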
