Paper Title
A Note on "Assessing Generalization of SGD via Disagreement"
Paper Authors
Paper Abstract
Several recent works find empirically that the average test error of deep neural networks can be estimated from the disagreement between models' predictions, which does not require labels. In particular, Jiang et al. (2022) show that this "Generalization Disagreement Equality" between the test error and the disagreement of two separately trained networks follows from the well-calibrated nature of deep ensembles under their proposed notion of "class-aggregated calibration." In this reproduction, we show that the suggested theory might be impractical for two reasons: a deep ensemble's calibration can deteriorate as prediction disagreement increases, which is precisely when the coupling of test error and disagreement is of interest, and labels are needed to estimate the calibration on new datasets. Further, we simplify the theoretical statements and proofs and show that they are straightforward in a probabilistic setting, unlike in the hypothesis-space view employed by Jiang et al. (2022).
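To make the label-free estimate concrete, here is a minimal Python sketch of the probabilistic view, under assumptions that are ours rather than the note's: the ensemble is treated as a uniform distribution over M trained members, two members are drawn independently with replacement, and predictions are hard class labels. Writing p_k(x) for the fraction of members that predict class k on input x, the expected test error of a randomly drawn member is the average of 1 - p_y(x), which needs the true label y, while the expected disagreement is the average of 1 - (sum over k of p_k(x)^2), which does not. The two averages coincide when the ensemble is class-aggregated calibrated. The function names and toy data are hypothetical and for illustration only.

```python
from collections import Counter
from typing import Dict, List, Sequence

# Hypothetical helpers for illustration only; not code from the note or from Jiang et al. (2022).
# preds[i][n] is the hard class prediction of ensemble member i on input n.


def vote_distribution(votes: Sequence[int]) -> Dict[int, float]:
    """Fraction of ensemble members that predict each class for a single input."""
    counts = Counter(votes)
    return {k: c / len(votes) for k, c in counts.items()}


def expected_error(preds: List[List[int]], labels: Sequence[int]) -> float:
    """Expected 0-1 test error of one member drawn uniformly from the ensemble.

    Requires labels: averages 1 - p_y(x) over the inputs.
    """
    total = 0.0
    for n, y in enumerate(labels):
        p = vote_distribution([member[n] for member in preds])
        total += 1.0 - p.get(y, 0.0)
    return total / len(labels)


def expected_disagreement(preds: List[List[int]]) -> float:
    """Probability that two members drawn independently (with replacement) disagree.

    Label-free: averages 1 - sum_k p_k(x)^2 over the inputs.
    """
    n_inputs = len(preds[0])
    total = 0.0
    for n in range(n_inputs):
        p = vote_distribution([member[n] for member in preds])
        total += 1.0 - sum(q * q for q in p.values())
    return total / n_inputs


# Toy ensemble of 4 members on 4 inputs (made-up numbers for illustration).
preds = [
    [0, 1, 2, 0],
    [0, 1, 2, 1],
    [0, 2, 2, 1],
    [0, 2, 2, 0],
]
calibrated_labels = [0, 1, 2, 0]     # vote fractions match how often each class is correct
miscalibrated_labels = [0, 0, 2, 2]  # same votes, no longer class-aggregated calibrated

print(expected_disagreement(preds))                 # 0.25, no labels needed
print(expected_error(preds, calibrated_labels))     # 0.25, equals the disagreement
print(expected_error(preds, miscalibrated_labels))  # 0.5, the equality breaks
```

The toy labels illustrate the caveat raised above: the disagreement estimate is identical in both cases, but it matches the test error only under the calibrated labeling, and checking calibration on a new dataset requires labels.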
