Paper title
Causal Effect Identification in Cluster DAGs
Paper authors
Paper abstract
Reasoning about the effect of interventions and counterfactuals is a fundamental task found throughout the data sciences. A collection of principles, algorithms, and tools has been developed for performing such tasks in the last decades (Pearl, 2000). One of the pervasive requirements found throughout this literature is the articulation of assumptions, which commonly appear in the form of causal diagrams. Despite the power of this approach, there are significant settings where the knowledge necessary to specify a causal diagram over all variables is not available, particularly in complex, high-dimensional domains. In this paper, we introduce a new graphical modeling tool called cluster DAGs (for short, C-DAGs) that allows for the partial specification of relationships among variables based on limited prior knowledge, alleviating the stringent requirement of specifying a full causal diagram. A C-DAG specifies relationships between clusters of variables, while the relationships between the variables within a cluster are left unspecified, and can be seen as a graphical representation of an equivalence class of causal diagrams that share the relationships among the clusters. We develop the foundations and machinery for valid inferences over C-DAGs about the clusters of variables at each layer of Pearl's Causal Hierarchy (Pearl and Mackenzie 2018; Bareinboim et al. 2020) - L1 (probabilistic), L2 (interventional), and L3 (counterfactual). In particular, we prove the soundness and completeness of d-separation for probabilistic inference in C-DAGs. Further, we demonstrate the validity of Pearl's do-calculus rules over C-DAGs and show that the standard ID identification algorithm is sound and complete to systematically compute causal effects from observational data given a C-DAG. Finally, we show that C-DAGs are valid for performing counterfactual inferences about clusters of variables.
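The idea that a C-DAG stands for a whole class of causal diagrams sharing the same between-cluster relationships can be illustrated with a minimal sketch. The abstract does not give the construction, so everything below is an assumption made for illustration: the function names `cluster_dag` and `is_acyclic`, the variable names, and the rule that a cluster-level edge exists whenever some variable in one cluster is a parent of some variable in another. Only directed edges are handled; any treatment of unobserved confounding is left out.

```python
from collections import defaultdict, deque


def cluster_dag(edges, clusters):
    """Collapse variable-level directed edges into cluster-level edges.

    edges: iterable of (parent, child) pairs over variables.
    clusters: dict mapping a cluster name to the set of variables in it
              (assumed to partition the variables).
    """
    owner = {v: name for name, members in clusters.items() for v in members}
    cluster_edges = set()
    for parent, child in edges:
        a, b = owner[parent], owner[child]
        if a != b:  # edges inside a cluster are left unspecified
            cluster_edges.add((a, b))
    return cluster_edges


def is_acyclic(nodes, edges):
    """Check acyclicity with Kahn's topological-sort algorithm."""
    indegree = {n: 0 for n in nodes}
    children = defaultdict(list)
    for a, b in edges:
        children[a].append(b)
        indegree[b] += 1
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for c in children[n]:
            indegree[c] -= 1
            if indegree[c] == 0:
                queue.append(c)
    return visited == len(indegree)


# Hypothetical variables: two covariates Z1, Z2, a treatment X, an outcome Y.
clusters = {"Z": {"Z1", "Z2"}, "X": {"X"}, "Y": {"Y"}}

# Two variable-level diagrams that differ only inside cluster Z.
dag_a = [("Z1", "Z2"), ("Z1", "X"), ("Z2", "X"), ("X", "Y"), ("Z2", "Y")]
dag_b = [("Z2", "Z1"), ("Z1", "X"), ("Z2", "X"), ("X", "Y"), ("Z2", "Y")]

c_edges_a = cluster_dag(dag_a, clusters)
c_edges_b = cluster_dag(dag_b, clusters)

print(sorted(c_edges_a))                # [('X', 'Y'), ('Z', 'X'), ('Z', 'Y')]
print(c_edges_a == c_edges_b)           # True: same cluster-level graph
print(is_acyclic(clusters, c_edges_a))  # True
```

Both variable-level diagrams collapse to the same cluster-level graph, which is the sense in which the analyst needs to commit only to the relationships among Z, X, and Y and not to the orientation of the edge between Z1 and Z2.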
