论文标题
元组包装:图形神经网络中小图的有效批处理
Tuple Packing: Efficient Batching of Small Graphs in Graph Neural Networks
论文作者
论文摘要
在处理机器学习模型(例如图形神经网络(GNN))中的一批图表时,通常将几个小图组合为一个整体图,以加速处理并删除或减少填充的开销。例如,这是PYG库中支持的。但是,小图的尺寸在节点和边缘的数量方面可能有很大变化,因此组合图的大小仍然可能有很大差异,尤其是对于小批量尺寸而言。因此,在使用静态形状时仍会产生过多的填充和浪费计算的成本,这是最大加速的首选。本文提出了一种新的硬件不可知论方法 - 元组包装 - 用于生成导致最小开销的批次。该算法扩展了最近引入的序列包装方法,以在(| nodes |,| edges |)的2D元组上工作。单调启发式词被应用于元组值的2D直方图,以定义填充直方图箱的优先级,并具有达到节点数量和边缘数量的目标。实验验证了多个数据集上算法的有效性。
When processing a batch of graphs in machine learning models such as Graph Neural Networks (GNN), it is common to combine several small graphs into one overall graph to accelerate processing and remove or reduce the overhead of padding. This is for example supported in the PyG library. However, the sizes of small graphs can vary substantially with respect to the number of nodes and edges, and hence the size of the combined graph can still vary considerably, especially for small batch sizes. Therefore, the costs of excessive padding and wasted compute are still incurred when working with static shapes, which are preferred for maximum acceleration. This paper proposes a new hardware agnostic approach -- tuple packing -- for generating batches that cause minimal overhead. The algorithm extends recently introduced sequence packing approaches to work on the 2D tuples of (|nodes|, |edges|). A monotone heuristic is applied to the 2D histogram of tuple values to define a priority for packing histogram bins together with the objective to reach a limit on the number of nodes as well as the number of edges. Experiments verify the effectiveness of the algorithm on multiple datasets.
