dc.creator | Cotelo Moya, Juan Manuel | es |
dc.creator | Cruz Mata, Fermín | es |
dc.creator | Troyano Jiménez, José Antonio | es |
dc.creator | Ortega Rodríguez, Francisco Javier | es |
dc.date.accessioned | 2020-07-09T07:33:27Z | |
dc.date.available | 2020-07-09T07:33:27Z | |
dc.date.issued | 2015 | |
dc.identifier.citation | Cotelo Moya, J.M., Cruz Mata, F., Troyano Jiménez, J.A. and Ortega Rodríguez, F.J. (2015). A modular approach for lexical normalization applied to Spanish tweets. Expert Systems with Applications, 42 (10), 4743-4754. | |
dc.identifier.issn | 0957-4174 | es |
dc.identifier.uri | https://hdl.handle.net/11441/99108 | |
dc.description.abstract | Twitter is a social media platform with widespread success where millions of people continuously
express ideas and opinions about a myriad of topics. It is a huge and interesting source of data, but most
of these texts are written hastily and heavily abbreviated, rendering them unsuitable for traditional
Natural Language Processing (NLP). The two main contributions of this work are: the characterization of
the textual error phenomena in Twitter and the proposal of a modular normalization system that
improves the textual quality of tweets. Instead of focusing on a single technique, we propose an extensible
normalization system that relies on the combination of several independent “expert modules”, each
one addressing a very specific error phenomenon in its own way, thus increasing module accuracy and
lowering module building costs. Broadly speaking, the system resembles an “expert board”: modules
independently propose correction candidates for each out-of-vocabulary (OOV) word, the candidates are ranked,
and the best one is selected. To evaluate our proposal, we perform several experiments
using texts from Twitter written in Spanish about a specific topic. The flexibility of defining resources at
different language levels (core language, domain, genre), combined with the modular architecture, leads to
lower costs and good performance: building the resources requires minimal effort, and the system achieves
more than 82% accuracy, compared to the 31% yielded by the baseline. | es |
dc.description.sponsorship | Ministerio de Economía y Competitividad TIN2012-38536-C03-02 | es |
dc.description.sponsorship | Junta de Andalucía P11-TIC-7684 MO | es |
dc.format | application/pdf | es |
dc.format.extent | 12 | es |
dc.language.iso | eng | es |
dc.publisher | Elsevier | es |
dc.relation.ispartof | Expert Systems with Applications, 42 (10), 4743-4754. | |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | Twitter | es |
dc.subject | Text normalization | es |
dc.subject | Domain adaptation | es |
dc.title | A modular approach for lexical normalization applied to Spanish tweets | es |
dc.type | info:eu-repo/semantics/article | es |
dcterms.identifier | https://ror.org/03yxnpp24 | |
dc.type.version | info:eu-repo/semantics/submittedVersion | es |
dc.rights.accessRights | info:eu-repo/semantics/openAccess | es |
dc.contributor.affiliation | Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos | es |
dc.relation.projectID | TIN2012-38536-C03-02 | es |
dc.relation.projectID | P11-TIC-7684 MO | es |
dc.relation.publisherversion | https://www.sciencedirect.com/science/article/pii/S0957417415000962 | es |
dc.identifier.doi | 10.1016/j.eswa.2015.02.003 | es |
dc.journaltitle | Expert Systems with Applications | es |
dc.publication.volumen | 42 | es |
dc.publication.issue | 10 | es |
dc.publication.initialPage | 4743 | es |
dc.publication.endPage | 4754 | es |
dc.identifier.sisius | 20947060 | es |
dc.contributor.funder | Ministerio de Economía y Competitividad (MINECO). España | es |
dc.contributor.funder | Junta de Andalucía | es |