Paper title
Adversarial Bipartite Graph Learning for Video Domain Adaptation
Authors
Abstract
Domain adaptation techniques, which focus on adapting models between distributionally different domains, are rarely explored in video recognition because of the significant spatial and temporal shifts across the source (i.e., training) and target (i.e., test) domains. As a result, recent work on visual domain adaptation, which leverages adversarial learning to unify the source and target video representations and strengthen feature transferability, is not highly effective on videos. To overcome this limitation, we learn a domain-agnostic video classifier instead of learning domain-invariant representations, and propose an Adversarial Bipartite Graph (ABG) learning framework that directly models source-target interactions with the network topology of a bipartite graph. Specifically, source and target frames are sampled as heterogeneous vertices, while the edges connecting the two types of nodes measure the affinity between them. Through message passing, each vertex aggregates features from its heterogeneous neighbors, forcing features from the same class to be mixed evenly. Explicitly exposing the video classifier to such cross-domain representations at both the training and test stages makes the model less biased toward the labeled source data, which in turn yields better generalization on the target domain. To further enhance model capacity and test the robustness of the proposed architecture on difficult transfer tasks, we extend the model to a semi-supervised setting using an additional video-level bipartite graph. Extensive experiments on four benchmarks demonstrate the effectiveness of the proposed approach over state-of-the-art (SOTA) methods on video recognition.
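The frame-level message passing described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify how affinities are computed or how messages are combined, so the sketch assumes a complete bipartite graph, edge weights given by a softmax over dot-product affinities, and a hypothetical blending coefficient `mix`. All function and variable names are illustrative.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length feature vectors."""
    return sum(a * b for a, b in zip(u, v))


def aggregate(own, other, mix=0.5):
    """Update each vertex in `own` using only its neighbours in `other`.

    Assumption: edge weights are a softmax over dot-product affinities.
    Assumption: `mix` (hypothetical) sets how much neighbour signal is blended in.
    """
    updated = []
    for x in own:
        weights = softmax([dot(x, y) for y in other])
        # Affinity-weighted average of the heterogeneous neighbours' features.
        message = [sum(w * y[d] for w, y in zip(weights, other))
                   for d in range(len(x))]
        updated.append([(1 - mix) * xd + mix * md
                        for xd, md in zip(x, message)])
    return updated


def bipartite_message_passing(source, target, mix=0.5):
    """One round of cross-domain message passing.

    Source vertices aggregate only from target vertices and vice versa,
    which is what makes the graph bipartite.
    """
    new_source = aggregate(source, target, mix)
    new_target = aggregate(target, source, mix)
    return new_source, new_target


# Toy frame features: two source frames and three target frames.
source_frames = [[1.0, 0.0], [0.0, 1.0]]
target_frames = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
new_source, new_target = bipartite_message_passing(source_frames, target_frames)
```

After one round, every source feature carries an affinity-weighted mixture of target features and vice versa, so a classifier applied to the updated features sees cross-domain representations rather than purely source-domain ones. In the paper's setting the features would come from a learned video backbone and the graph would be trained adversarially; neither is modeled here.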
