论文标题
通过偏置值编码进行稳健有效的排序
Robust and Efficient Sorting with Offset-Value Coding
论文作者
论文摘要
排序和搜索是数据库查询处理的大部分,例如,以索引创建,索引维护和索引查找的形式;比较钥匙对是分类和搜索努力的重要组成部分。我们曾在数十年,被忽视的,有效的技术进行快速比较和快速分类(尤其是偏移值编码)方面进行了简单,有效的实施。在此过程中,我们发生了它与运行文件中前缀截断的互惠关系,以及在行和列形式存储结构中的压缩技术的二元性,即前缀截断和领先密钥列的运行长度编码。我们还发现了与分类流的消费者(例如合并平行流,流入集合和合并加入)的有益关系。我们在Google的NAPA和F1查询系统以及对性能和可扩展性的实验评估中报告了我们的实施。
Sorting and searching are large parts of database query processing, e.g., in the forms of index creation, index maintenance, and index lookup; and comparing pairs of keys is a substantial part of the effort in sorting and searching. We have worked on simple, efficient implementations of decades-old, neglected, effective techniques for fast comparisons and fast sorting, in particular offset-value coding. In the process, we happened upon its mutually beneficial relationship with prefix truncation in run files as well as the duality of compression techniques in row- and column-format storage structures, namely prefix truncation and run-length encoding of leading key columns. We also found a beneficial relationship with consumers of sorted streams, e.g., merging parallel streams, in-stream aggregation, and merge join. We report on our implementation in the context of Google's Napa and F1 Query systems as well as an experimental evaluation of performance and scalability.
