Improving the Performance of a Tagger Generator in an Information Extraction Application

Troyano Jiménez, José Antonio; Enríquez de Salamanca Ros, Fernando; Cruz Mata, Fermín; Cañete Valdeón, José Miguel; Ortega Rodríguez, Francisco Javier

Article

dc.creator	Troyano Jiménez, José Antonio	es
dc.creator	Enríquez de Salamanca Ros, Fernando	es
dc.creator	Cruz Mata, Fermín	es
dc.creator	Cañete Valdeón, José Miguel	es
dc.creator	Ortega Rodríguez, Francisco Javier	es
dc.date.accessioned	2020-08-03T08:06:38Z
dc.date.available	2020-08-03T08:06:38Z
dc.date.issued	2007
dc.identifier.citation	Troyano Jiménez, J.A., Enríquez de Salamanca Ros, F., Cruz Mata, F., Cañete Valdeón, J.M. and Ortega Rodríguez, F.J. (2007). Improving the Performance of a Tagger Generator in an Information Extraction Application. Journal of Universal Computer Science, 13 (9), 1287-1299.
dc.identifier.issn	0948-695X	es
dc.identifier.uri	https://hdl.handle.net/11441/100068
dc.description.abstract	In this paper we present our experience extracting named entities from Spanish texts using stacking. Named Entity Extraction (NEE) is a subtask of Information Extraction that involves identifying the groups of words that make up the name of an entity and classifying these names into a set of predefined categories. Our approach is corpus-based: we use a re-trainable tagger generator to obtain a named entity extractor from a set of tagged examples. The main contribution of our work is that we obtain the systems needed in a stacking scheme without using any additional training material or tagger generators. Instead, we generate the variability that stacking requires by applying corpus transformations to the original training corpus. Once we have several versions of the training corpus, we generate several extractors and combine them by means of a machine learning algorithm. Experiments show that the combination of corpus transformation and stacking improves the performance of the tagger generator in this kind of natural language processing application. The best of our experiments achieves an improvement of more than six percentage points over the predefined baseline.	es
dc.format	application/pdf	es
dc.format.extent	13	es
dc.language.iso	eng	es
dc.publisher	Graz University of Technology, Institut für Informationssysteme und Computer Medien (IICM)	es
dc.relation.ispartof	Journal of Universal Computer Science, 13 (9), 1287-1299.
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.subject	Named Entity Extraction	es
dc.subject	Corpus Transformation	es
dc.subject	System Combination	es
dc.subject	Stacking	es
dc.title	Improving the Performance of a Tagger Generator in an Information Extraction Application	es
dc.type	info:eu-repo/semantics/article	es
dcterms.identifier	https://ror.org/03yxnpp24
dc.type.version	info:eu-repo/semantics/publishedVersion	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.contributor.affiliation	Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos	es
dc.relation.publisherversion	http://www.jucs.org/jucs_13_9/improving_the_performance_of	es
dc.journaltitle	Journal of Universal Computer Science	es
dc.publication.volumen	13	es
dc.publication.issue	9	es
dc.publication.initialPage	1287	es
dc.publication.endPage	1299	es
dc.identifier.sisius	6634177	es
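
The stacking scheme summarized in the abstract can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration and is not taken from the article: the toy "tagger generator" (a most-frequent-tag lookup), the three corpus transformations, the lookup-table combiner standing in for the machine learning algorithm, the tiny Spanish examples, and the split of the single training corpus into a part for the base taggers and a part for the combiner.

```python
from collections import Counter, defaultdict

def train_tagger(corpus):
    """Toy stand-in for a tagger generator: most frequent tag per word."""
    counts = defaultdict(Counter)
    for word, tag in corpus:
        counts[word][tag] += 1
    table = {w: c.most_common(1)[0][0] for w, c in counts.items()}
    return lambda word: table.get(word, "O")  # unknown words get "O"

# Hypothetical corpus transformations; each maps (word, tag) -> (word, tag).
def identity(word, tag):
    return word, tag

def drop_category(word, tag):
    # Keep only the boundary part of the tag, e.g. B-PER -> B.
    return word, tag.split("-")[0]

def lowercase(word, tag):
    return word.lower(), tag

TRANSFORMS = [identity, drop_category, lowercase]

def train_base_taggers(corpus):
    """One tagger per transformed version of the same training corpus."""
    taggers = []
    for transform in TRANSFORMS:
        tagger = train_tagger([transform(w, g) for w, g in corpus])
        # Apply the same word transformation at prediction time.
        taggers.append(
            lambda w, t=transform, tg=tagger: tg(t(w, "O")[0])
        )
    return taggers

def base_outputs(taggers, word):
    return tuple(tagger(word) for tagger in taggers)

def train_stacker(taggers, held_out):
    """Second-level learner: maps the tuple of base outputs to a gold tag."""
    counts = defaultdict(Counter)
    for word, gold in held_out:
        counts[base_outputs(taggers, word)][gold] += 1
    table = {f: c.most_common(1)[0][0] for f, c in counts.items()}

    def predict(word):
        feats = base_outputs(taggers, word)
        # Fall back to the untransformed tagger for unseen combinations.
        return table.get(feats, feats[0])

    return predict

# Toy data: a single corpus split in two, with no extra training material.
BASE_TRAIN = [("Juan", "B-PER"), ("vive", "O"), ("en", "O"),
              ("Sevilla", "B-LOC"), ("Ana", "B-PER"),
              ("visita", "O"), ("Madrid", "B-LOC")]
META_TRAIN = [("Ana", "B-PER"), ("vive", "O"), ("en", "O"),
              ("MADRID", "B-LOC")]

taggers = train_base_taggers(BASE_TRAIN)
stacker = train_stacker(taggers, META_TRAIN)
print(taggers[0]("SEVILLA"), stacker("SEVILLA"))
```

In this toy setting the tagger trained on the untransformed corpus misses the upper-cased form, while the combiner learns to trust the lower-casing variant when the other two answer "O". The real system would replace each of these pieces with the tagger generator, transformations, and learner described in the article.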

Files	Size	Format	View	Description
Improving the Performance of a ...	150.3Kb	[PDF]	View/Open

This record appears in the following collections

Articles (Lenguajes y Sistemas Informáticos)


Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 International