Title
Machine learning with incomplete datasets using multi-objective optimization models
Authors
Abstract
Machine learning techniques have been developed to learn from complete data. When a dataset contains missing values, the incomplete data should be preprocessed separately, either by removing the data points with missing values or by imputing them. In this paper, we propose an online approach that handles missing values while the classification model is being learned. To this end, we develop a multi-objective optimization model with two objective functions, one for imputation and one for model selection. We also propose three formulations of the imputation objective function. We use an evolutionary algorithm based on NSGA-II to find the optimal solutions as Pareto solutions. We investigate the reliability and robustness of the proposed model through experiments that define several scenarios for handling missing values and classification. We also describe how the proposed model can contribute to medical informatics. We compare the performance of the three formulations using experimental results. The results of the proposed model are validated by comparison with the comparable literature.
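To make the notion of Pareto solutions concrete, the following is a minimal sketch of the non-dominance test that NSGA-II-style algorithms rely on, assuming both objectives are minimized. It is not the paper's algorithm: the function names, the pairing of objectives as (imputation loss, classification loss), and the candidate values are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

# Hypothetical objective pair: (imputation_loss, classification_loss).
# Both values are assumed to be minimized.
Point = Tuple[float, float]


def dominates(a: Point, b: Point) -> bool:
    """Return True if a is no worse than b in every objective
    and strictly better in at least one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points: List[Point]) -> List[Point]:
    """Return the non-dominated points, preserving input order."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


# Made-up candidate solutions, for illustration only.
candidates = [(0.10, 0.30), (0.20, 0.20), (0.30, 0.10), (0.25, 0.25), (0.40, 0.40)]
front = pareto_front(candidates)
print(front)
```

In this toy set, the last two candidates are dominated by (0.20, 0.20), so only the first three remain on the front. A full NSGA-II implementation would additionally sort the population into successive fronts and use crowding distance to keep the solutions spread out.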
