Article
CrimeNet: Neural Structured Learning using Vision Transformer for violence detection
Author/s | Rendón Segador, Fernando José; Álvarez García, Juan Antonio; Salazar González, Jose Luis; Tommasi, Tatiana |
Department | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos |
Publication Date | 2023-02-02 |
Deposit Date | 2023-02-09 |
Published in | Neural Networks |
Abstract | The state of the art in violence detection in videos has improved in recent years thanks to deep learning models, but it remains below 90% average precision on the most complex datasets. This may cause frequent false alarms in video surveillance environments and lead security guards to disable the artificial intelligence system. In this study, we propose a new neural network based on Vision Transformer (ViT) and Neural Structured Learning (NSL) with adversarial training. This network, called CrimeNet, outperforms previous works by a large margin and reduces false positives to practically zero. Our tests on the four most challenging violence-related datasets (binary and multi-class) show the effectiveness of CrimeNet, which improves the state of the art by between 9.4 and 22.17 percentage points in ROC AUC, depending on the dataset. In addition, we present a generalisation study of our model, training and testing it on different datasets. The results show that CrimeNet improves over competing methods by between 12.39 and 25.22 percentage points, showing remarkable robustness. |
Funding agencies | Ministerio de Ciencia e Innovación (MICIN). España; European Union (UE) |
Project ID. | DISARM project - Grant n. PDC2021-121197
HORUS project - Grant n. PID2021-126359OB-I00 |
Citation | Rendón Segador, F.J., Álvarez García, J.A., Salazar González, J.L. and Tommasi, T. (2023). CrimeNet: Neural Structured Learning using Vision Transformer for violence detection. Neural Networks, February 2023, 1-24. https://doi.org/10.1016/j.neunet.2023.01.048. |
Files | Size | Format | View | Description |
---|---|---|---|---|
1-s2.0-S0893608023000606-main.pdf | 1.528Mb | [PDF] | View/ | |
This document is protected by intellectual and industrial property rights. Without prejudice to existing legal exemptions, its reproduction, distribution, public communication or transformation is prohibited without the authorization of the rights holder, unless otherwise indicated.