Paper Title
Spatio-Temporal Relation Learning for Video Anomaly Detection
Authors
Abstract
Anomaly identification depends heavily on the relationship between the object and the scene: the same object action in different scenes, or different object actions in the same scene, may lead to varying degrees of normality and anomaly. The object-scene relation therefore plays a crucial role in anomaly detection, yet it has been inadequately explored in previous work. In this paper, we propose a Spatial-Temporal Relation Learning (STRL) framework to tackle the video anomaly detection task. First, considering the dynamic characteristics of objects as well as scene areas, we construct a Spatio-Temporal Auto-Encoder (STAE) that jointly exploits spatial and temporal evolution patterns for representation learning. For better pattern extraction, two decoding branches are designed in the STAE module: an appearance branch that captures spatial cues by directly predicting the next frame, and a motion branch that models the dynamics via optical flow prediction. Then, to concretize the object-scene relation, a Relation Learning (RL) module is devised to analyze and summarize normal relations by introducing the Knowledge Graph Embedding methodology. Specifically, the plausibility of an object-scene relation is measured by jointly modeling object/scene features and optimizable object-scene relation maps. Extensive experiments are conducted on three public datasets, and the superior performance over state-of-the-art methods demonstrates the effectiveness of our method.
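As a rough illustration of how a Knowledge Graph Embedding score can measure the plausibility of an object-scene relation, the sketch below uses a TransE-style distance. This is an assumption: the abstract does not say which embedding model the RL module uses, and the function name `transe_plausibility` and the toy vectors are hypothetical stand-ins for learned object features, relation maps, and scene features.

```python
import math


def transe_plausibility(obj_vec, rel_vec, scene_vec):
    """TransE-style plausibility score (assumed model, not taken from the paper).

    A triple (object, relation, scene) is considered plausible when
    obj_vec + rel_vec lies close to scene_vec. The score is the negative
    Euclidean distance, so a higher value means a more plausible relation.
    """
    if not (len(obj_vec) == len(rel_vec) == len(scene_vec)):
        raise ValueError("all vectors must have the same length")
    squared = sum((o + r - s) ** 2 for o, r, s in zip(obj_vec, rel_vec, scene_vec))
    return -math.sqrt(squared)


# Toy 2-D embeddings (hypothetical values, for illustration only).
obj = [1.0, 0.0]
rel = [0.0, 1.0]
scene_match = [1.0, 1.0]     # obj + rel lands exactly on this scene
scene_mismatch = [4.0, 5.0]  # obj + rel is far from this scene

normal_score = transe_plausibility(obj, rel, scene_match)
anomalous_score = transe_plausibility(obj, rel, scene_mismatch)
```

Under this assumed scoring rule, a relation that fits what was seen during training scores near zero, while an unusual object-scene pairing receives a strongly negative score that could be thresholded as an anomaly.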
