Paper title
Improved Inference of Gaussian Mixture Copula Model for Clustering and Reproducibility Analysis using Automatic Differentiation
Paper authors
Paper abstract
Copulas provide a modular parameterization of multivariate distributions that decouples the modeling of marginals from the modeling of the dependencies between them. The Gaussian Mixture Copula Model (GMCM) is a highly flexible copula that can model many kinds of multi-modal dependencies, as well as asymmetric and tail dependencies. GMCMs have been used effectively for clustering non-Gaussian data and in Reproducibility Analysis, a meta-analysis method designed to verify the reliability and consistency of multiple high-throughput experiments. Parameter estimation for GMCM is challenging because its likelihood is intractable. The best previous methods maximize a proxy likelihood through a Pseudo Expectation Maximization (PEM) algorithm, which offers no guarantee of convergence, let alone of convergence to the correct parameters. In this paper, we use Automatic Differentiation (AD) tools to develop a method, called AD-GMCM, that can maximize the exact GMCM likelihood. In our simulation studies and experiments with real data, AD-GMCM finds more accurate parameter estimates than PEM and yields better performance in clustering and Reproducibility Analysis. We discuss how an AD-based approach helps address problems related to the monotonic increase of the likelihood and to parameter identifiability in GMCM. We also analyze, for GMCM, two well-known cases of degeneracy of maximum likelihood in GMM that can lead to spurious clustering solutions. Our analysis shows that, unlike GMM, GMCM is not affected in one of the cases.
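The abstract does not say which two GMM degeneracies the paper analyzes. As an illustration only, the sketch below shows one commonly cited case: a mixture component that centers on a single data point and shrinks its variance, which drives the GMM likelihood upward without bound. The data values, the helper name gmm_log_likelihood, and the parameter choices are all assumptions made for this example and are not taken from the paper.

```python
import math

def gmm_log_likelihood(data, weights, means, sigmas):
    """Log-likelihood of 1-D data under a Gaussian mixture (illustrative helper)."""
    total = 0.0
    for x in data:
        density = 0.0
        for w, mu, s in zip(weights, means, sigmas):
            z = (x - mu) / s
            density += w * math.exp(-0.5 * z * z) / (s * math.sqrt(2.0 * math.pi))
        total += math.log(density)
    return total

# Made-up sample; one point lies exactly at 0.0.
data = [-1.2, -0.4, 0.0, 0.3, 0.9, 1.5]

# Component 1 stays broad (sigma = 1) and covers every point.
# Component 2 is centered on the data point 0.0 and its sigma shrinks toward zero.
shrinking_sigmas = (1e-1, 1e-2, 1e-4, 1e-8)
log_liks = [
    gmm_log_likelihood(data, [0.5, 0.5], [0.0, 0.0], [1.0, s])
    for s in shrinking_sigmas
]

for s, ll in zip(shrinking_sigmas, log_liks):
    print(f"sigma_2 = {s:g}: log-likelihood = {ll:.3f}")
```

In this sketch the log-likelihood keeps rising as the second component collapses onto the point, even though the resulting "cluster" contains a single observation. That is the kind of spurious solution the abstract refers to. Whether GMCM is immune to this particular case is a question the paper itself answers, not this example.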
