Paper Title
A Machine Learning Approach for Automated Filling of Categorical Fields in Data Entry Forms
Paper Authors
Paper Abstract
Users frequently interact with software systems through data entry forms. However, form filling is time-consuming and error-prone. Although several techniques have been proposed to auto-complete or pre-fill fields in the forms, they provide limited support to help users fill categorical fields, i.e., fields that require users to choose the right value among a large set of options. In this paper, we propose LAFF, a learning-based automated approach for filling categorical fields in data entry forms. LAFF first builds Bayesian Network models by learning field dependencies from a set of historical input instances, representing the values of the fields that have been filled in the past. To improve its learning ability, LAFF uses local modeling to effectively mine the local dependencies of fields in a cluster of input instances. During the form filling phase, LAFF uses such models to predict possible values of a target field, based on the values in the already-filled fields of the form and their dependencies; the predicted values (endorsed based on field dependencies and prediction confidence) are then provided to the end-user as a list of suggestions. We evaluated LAFF by assessing its effectiveness and efficiency in form filling on two datasets, one of them proprietary from the banking domain. Experimental results show that LAFF is able to provide accurate suggestions with a Mean Reciprocal Rank value above 0.73. Furthermore, LAFF is efficient, requiring at most 317 ms per suggestion.
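To make the suggestion pipeline described in the abstract more concrete, here is a minimal sketch under stated assumptions. It is not LAFF's implementation: the learned Bayesian Network is replaced by a much simpler naive Bayes scorer built from co-occurrence counts, the local modeling (clustering) step is omitted, and the endorsement step is reduced to a single confidence threshold. The function names, the `threshold` parameter, and the toy `history` data are all hypothetical. The last function computes Mean Reciprocal Rank, the metric the abstract reports.

```python
from collections import Counter, defaultdict


def train(history, target):
    """Count target values and their co-occurrences with other field values.

    Assumption: naive Bayes counting stands in for the Bayesian Network
    that the abstract describes.
    """
    prior = Counter()              # target value -> count
    cond = defaultdict(Counter)    # (field, value) -> Counter of target values
    values = defaultdict(set)      # field -> distinct values seen
    for row in history:
        t = row[target]
        prior[t] += 1
        for field, value in row.items():
            if field != target:
                cond[(field, value)][t] += 1
                values[field].add(value)
    return prior, cond, values


def suggest(prior, cond, values, filled, threshold=0.5):
    """Rank candidate target values; return [] when confidence is too low.

    The threshold is a hypothetical stand-in for the endorsement step.
    """
    total = sum(prior.values())
    scores = {}
    for t, n_t in prior.items():
        score = n_t / total
        for field, value in filled.items():
            # Laplace (add-one) smoothing over the field's distinct values
            score *= (cond[(field, value)][t] + 1) / (n_t + len(values[field]))
        scores[t] = score
    ranked = sorted(scores, key=scores.get, reverse=True)
    top_confidence = scores[ranked[0]] / sum(scores.values())
    return ranked if top_confidence >= threshold else []


def mean_reciprocal_rank(suggestion_lists, truths):
    """Average of 1/rank of the true value; 0 when it is not suggested."""
    reciprocal_ranks = [
        1 / (ranked.index(truth) + 1) if truth in ranked else 0.0
        for ranked, truth in zip(suggestion_lists, truths)
    ]
    return sum(reciprocal_ranks) / len(reciprocal_ranks)


# Toy historical input instances (hypothetical data).
history = [
    {"country": "LU", "currency": "EUR"},
    {"country": "LU", "currency": "EUR"},
    {"country": "US", "currency": "USD"},
    {"country": "US", "currency": "USD"},
    {"country": "FR", "currency": "EUR"},
]
prior, cond, values = train(history, target="currency")
suggestions = suggest(prior, cond, values, filled={"country": "US"})
```

With the toy data, `suggestions` ranks `"USD"` ahead of `"EUR"` for a form whose `country` field is already set to `"US"`. Raising `threshold` makes the sketch withhold the list entirely, which mirrors the idea of showing suggestions only when the prediction is confident enough.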
