dc.creator | Rendón Segador, Fernando José | es |
dc.creator | Álvarez García, Juan Antonio | es |
dc.creator | Enríquez de Salamanca Ros, Fernando | es |
dc.creator | Deniz, Oscar | es |
dc.date.accessioned | 2021-09-08T09:41:22Z | |
dc.date.available | 2021-09-08T09:41:22Z | |
dc.date.issued | 2021 | |
dc.identifier.citation | Rendón Segador, F.J., Álvarez García, J.A., Enríquez de Salamanca Ros, F. y Deniz, O. (2021). ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence. Electronics, 10 (13), 1-16. | |
dc.identifier.issn | 2079-9292 | es |
dc.identifier.uri | https://hdl.handle.net/11441/125566 | |
dc.description.abstract | Introducing efficient automatic violence detection in video surveillance or audiovisual content monitoring systems would greatly facilitate the work of closed-circuit television (CCTV) operators, rating agencies, or those in charge of monitoring social network content. In this paper we present a new deep learning architecture that combines an adapted three-dimensional version of DenseNet, a multi-head self-attention layer, and a bidirectional convolutional long short-term memory (LSTM) module to encode relevant spatio-temporal features and determine whether a video is violent or not. Furthermore, an ablation study of the input frames, comparing dense optical flow with adjacent-frame subtraction, and of the influence of the attention layer is carried out, showing that the combination of optical flow and the attention mechanism improves results by up to 4.4%. Experiments conducted on four of the most widely used datasets for this problem match or in some cases exceed state-of-the-art results, while reducing the number of network parameters (4.5 million) and improving efficiency in test accuracy (from 95.6% on the most complex dataset to 100% on the simplest one) and inference time (less than 0.3 s for the longest clips). Finally, to check whether the generated model is able to generalize violence, a cross-dataset analysis is performed, which shows the complexity of this approach: training on three datasets and testing on the remaining one, accuracy drops to 70.08% in the worst case and 81.51% in the best case, which points to future work oriented towards anomaly detection in new datasets. | es |
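The abstract's ablation compares two motion representations as network input: dense optical flow and adjacent-frame subtraction. The latter can be sketched as follows; this is a minimal illustration of the general technique, not the authors' preprocessing code, and the clip shape and scaling are assumptions.

```python
import numpy as np

def adjacent_frame_subtraction(frames):
    """Build a simple motion representation by differencing consecutive frames.

    `frames` is assumed to be a (T, H, W, C) float array holding a video clip;
    the result is a (T-1, H, W, C) array of absolute inter-frame differences,
    which highlights moving regions while suppressing static background.
    """
    frames = np.asarray(frames, dtype=np.float32)
    return np.abs(frames[1:] - frames[:-1])

# Hypothetical 16-frame RGB clip of 64x64 pixels.
clip = np.random.rand(16, 64, 64, 3).astype(np.float32)
diff = adjacent_frame_subtraction(clip)
print(diff.shape)  # (15, 64, 64, 3)
```

Unlike dense optical flow, this representation is essentially free to compute, which is one reason it is a common baseline in ablations of this kind.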
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2017-82113-C2-1-R | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2017-82113-C2-2-R | es |
dc.format | application/pdf | es |
dc.format.extent | 16 | es |
dc.language.iso | eng | es |
dc.publisher | MDPI | es |
dc.relation.ispartof | Electronics, 10 (13), 1-16. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | violence detection | es |
dc.subject | fight detection | es |
dc.subject | deep learning | es |
dc.subject | DenseNet | es |
dc.subject | bidirectional ConvLSTM | es |
dc.title | ViolenceNet: Dense Multi-Head Self-Attention with Bidirectional Convolutional LSTM for Detecting Violence | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/publishedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2017-82113-C2-1-R | es |
dc.relation.projectID | TIN2017-82113-C2-2-R | es |
dc.relation.publisherversion | https://www.mdpi.com/2079-9292/10/13/1601/htm | es |
dc.identifier.doi | 10.3390/electronics10131601 | es |
dc.contributor.group | Universidad de Sevilla. TIC-134: Sistemas Informáticos | es |
dc.journaltitle | Electronics | es |
dc.publication.volumen | 10 | es |
dc.publication.issue | 13 | es |
dc.publication.initialPage | 1 | es |
dc.publication.endPage | 16 | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |