Transformer-based deep learning approach to real-time violence detection

Abstract

Violence detection has long been a challenging task in computer vision and machine learning because of the complexity of real-world environments, imbalanced data, and the need for real-time performance. Automated violence detection in surveillance systems is also essential for enhancing public safety and enabling advanced security applications. Over the years, several models, such as CNN+LSTM, MSBT, and SlowFast, along with many other machine learning techniques, have been adopted to classify violence. While existing models have achieved strong results in binary classification, their performance often falters when applied to large, imbalanced multiclass datasets like UCF-Crime. In this work, we propose a lightweight transformer-based model, the Dynamic Memory Bank Fused Attention Network (DMFA-Net), designed to overcome these limitations. Our model uses a cross-attention mechanism to selectively retrieve relevant information from a memory bank, allowing it to achieve significantly higher accuracy in both binary and multiclass violence detection tasks. Experimental results demonstrate that DMFA-Net outperforms existing state-of-the-art models in the field. We also discuss the practical integration of our approach into real-time, autonomous surveillance systems for reliable violence detection.
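
The abstract describes retrieving information from a memory bank through cross-attention but gives no implementation detail. The following is a minimal sketch of the general idea only, not the DMFA-Net architecture: it assumes single-head scaled dot-product attention, a single query vector standing in for the features of the current video clip, and a fixed set of memory slots. The names `cross_attention_read`, `memory_keys`, and `memory_values` are hypothetical and chosen for illustration.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention_read(query, memory_keys, memory_values):
    """Read from a memory bank with single-head scaled dot-product attention.

    query:         feature vector for the current input (length d); assumed here
                   to stand in for clip-level video features
    memory_keys:   one key vector (length d) per memory slot
    memory_values: one value vector per memory slot
    Returns (retrieved_vector, attention_weights).
    """
    d = len(query)
    # Similarity between the query and every memory slot, scaled by sqrt(d).
    scores = [
        sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
        for key in memory_keys
    ]
    weights = softmax(scores)
    # The retrieved vector is the attention-weighted sum of the slot values.
    value_dim = len(memory_values[0])
    retrieved = [
        sum(w * value[j] for w, value in zip(weights, memory_values))
        for j in range(value_dim)
    ]
    return retrieved, weights


# Toy data (made up for illustration): the query aligns with the first slot.
query = [1.0, 0.0]
memory_keys = [[10.0, 0.0], [0.0, 10.0], [-10.0, 0.0]]
memory_values = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
retrieved, weights = cross_attention_read(query, memory_keys, memory_values)
```

In this toy setup the attention weights concentrate on the slot whose key is most similar to the query, so the retrieved vector is dominated by that slot's value. This is the sense in which cross-attention reads "selectively" from a memory bank; how the thesis builds, updates, and fuses its memory bank is not specified in this record.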

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 46-48).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2024.

Type

Thesis