Paper title
Towards Better Accuracy-efficiency Trade-offs: Divide and Co-training
Paper authors
Abstract
The width of a neural network matters because increasing the width necessarily increases model capacity. However, a network's performance does not improve linearly with width and soon saturates. In this regime, we argue that increasing the number of networks (an ensemble) achieves better accuracy-efficiency trade-offs than purely increasing the width. To demonstrate this, we divide one large network into several small ones with respect to its parameters and regularization components, so that each small network has a fraction of the original network's parameters. We then train these small networks together and show them different views of the same data to increase their diversity. During this co-training process, the networks can also learn from each other. As a result, the small networks achieve better ensemble performance than the large one with few or no extra parameters or FLOPs, i.e., better accuracy-efficiency trade-offs. When run concurrently, the small networks can also achieve faster inference than the large one. Together, these results show that the number of networks is a new dimension of model scaling. We validate our argument with 8 different neural architectures on common benchmarks through extensive experiments. The code is available at https://github.com/FreeformRobotics/Divide-and-Co-training.
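As a rough illustration of the "divide" and "ensemble" steps described in the abstract, the following minimal sketch compares the parameter count of one wide layer with that of several narrower ones and averages their predictions. It is a sketch under stated assumptions, not the authors' implementation. The helper names (`hidden_layer_params`, `divided_width`, `ensemble_average`), the single dense layer, the width of 512 and the choice of 4 networks are all hypothetical. The rule of dividing the width by the square root of the number of networks is likewise an assumption; it follows from the fact that a dense layer's parameter count grows roughly quadratically with its width.

```python
import math


def hidden_layer_params(width: int) -> int:
    """Weights plus biases of one dense layer mapping `width` units to `width` units."""
    return width * width + width


def divided_width(width: int, num_networks: int) -> int:
    """Width of each small network that keeps the total parameter count roughly constant.

    Assumption (not stated in the abstract): parameters grow about
    quadratically with width, so the width is divided by sqrt(num_networks).
    """
    return max(1, round(width / math.sqrt(num_networks)))


def ensemble_average(predictions):
    """Average the per-class probabilities predicted by several networks."""
    num_networks = len(predictions)
    num_classes = len(predictions[0])
    return [
        sum(p[c] for p in predictions) / num_networks
        for c in range(num_classes)
    ]


# Hypothetical example: one layer of width 512 divided into 4 small networks.
large_width = 512
num_networks = 4
small_width = divided_width(large_width, num_networks)

large_params = hidden_layer_params(large_width)
small_params_total = num_networks * hidden_layer_params(small_width)

print(f"large network: width={large_width}, params={large_params}")
print(f"{num_networks} small networks: width={small_width}, total params={small_params_total}")

# Ensemble prediction from two hypothetical two-class probability vectors.
print(ensemble_average([[0.6, 0.4], [0.2, 0.8]]))
```

In this toy setting the four small layers together hold only slightly more parameters than the single wide layer, which matches the abstract's "few or no extra parameters" claim in spirit. The sketch omits the co-training loss and the different data views, because the abstract does not specify them.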
