Paper title
Automating Text Naturalness Evaluation of NLG Systems
Paper authors
Paper abstract
Automatic methods and metrics that assess various quality criteria of automatically generated texts are important for developing NLG systems because they produce repeatable results and allow a fast development cycle. We present an attempt to automate the evaluation of text naturalness, a very important characteristic of natural language generation methods. Instead of relying on human participants to score or label text samples, we propose to automate the process using a human likeliness metric that we define and a discrimination procedure based on large pretrained language models and their probability distributions. We analyze the text probability fractions and observe how they are influenced by the sizes of the generative and discriminative models involved in the process. Our results indicate that bigger generators and larger pretrained discriminators are more appropriate for evaluating text naturalness. A comprehensive validation procedure with human participants is required as a follow-up to check how well this automatic evaluation scheme correlates with human judgments.
