Named Entity Recognition Through Corpus Transformation and System Combination
|Author/s||Troyano Jiménez, José Antonio
Carrillo Montero, Vicente
Enríquez de Salamanca Ros, Fernando
Galán Morillo, Francisco José|
|Department||Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos|
|Abstract||In this paper we investigate how to combine different taggers to improve their performance in the named entity recognition task. The main resources used in our experiments are the publicly available taggers TnT and TBL and a corpus of Spanish texts in which named entity occurrences are tagged with BIO tags. We have defined three transformations that give us three additional versions of the training corpus. The transformations change either the words or the tags, and all three improve on the results that TnT and TBL obtain when trained on the original version of the corpus. With the four versions of the corpus and the two taggers, we have eight different models that can be combined with several techniques. Our experiments show that combining them with machine learning techniques improves performance considerably. We improve on the baselines for TnT (Fβ=1 value of 85.25) and TBL (Fβ=1 value of 87.45), reaching a value of 90.90 in the best of our experiments.|
|Citation||Troyano Jiménez, J.A., Carrillo Montero, V., Enríquez de Salamanca Ros, F. and Galán Morillo, F.J. (2004). Named Entity Recognition Through Corpus Transformation and System Combination. In EsTAL 2004: 4th International Conference on Natural Language Processing (pp. 255-266), Alicante, Spain: Springer.|
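
The abstract describes combining the BIO-tagged output of several models. As a rough illustration only, the sketch below combines tag sequences by simple per-token voting and then reads entity spans off the result. Voting is a stand-in chosen here for brevity; the abstract reports its best results with machine learning combiners, whose details are not given in this record. The function names `majority_vote` and `bio_to_spans`, the tie-breaking rule, and the sample tag sequences are all assumptions made for this example.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine BIO tag sequences (one per model) by per-token plurality vote.

    Assumption for this sketch: ties are broken in favour of the model
    that appears earliest in `predictions`.
    """
    if not predictions:
        return []
    length = len(predictions[0])
    if any(len(p) != length for p in predictions):
        raise ValueError("all tag sequences must have the same length")
    combined = []
    for token_tags in zip(*predictions):
        counts = Counter(token_tags)
        best = max(counts.values())
        # The first tag, in model order, that reaches the top count wins.
        combined.append(next(t for t in token_tags if counts[t] == best))
    return combined


def bio_to_spans(tags):
    """Return (start, end_exclusive) token spans for a BIO tag sequence."""
    spans = []
    start = None
    for i, tag in enumerate(tags):
        if tag == "B":
            if start is not None:
                spans.append((start, i))
            start = i
        elif tag == "I":
            # Tolerate an I without a preceding B, which voting can produce.
            if start is None:
                start = i
        else:
            if start is not None:
                spans.append((start, i))
                start = None
    if start is not None:
        spans.append((start, len(tags)))
    return spans


# Hypothetical outputs of three models for a six-token sentence.
model_a = ["O", "O", "B", "I", "O", "B"]
model_b = ["O", "O", "B", "O", "O", "B"]
model_c = ["O", "B", "I", "I", "O", "B"]

combined = majority_vote([model_a, model_b, model_c])
entities = bio_to_spans(combined)
```

With eight models, as in the abstract, the same call would take eight sequences; a learned combiner would replace `majority_vote` with a classifier that takes the eight per-token tags as features.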