Mostrar el registro sencillo del ítem

Tesis Doctoral

dc.contributor.advisorMurillo Fuentes, Juan Josées
dc.contributor.advisorMartínez Olmos, Pabloes
dc.creatorAradillas Jaramillo, Jose Carloses
dc.date.accessioned2022-03-10T10:20:03Z
dc.date.available2022-03-10T10:20:03Z
dc.date.issued2021-12-17
dc.identifier.citationAradillas Jaramillo, J.C. (2021). Machine Learning for handwriting text recognition in historical documents. (Tesis Doctoral Inédita). Universidad de Sevilla, Sevilla.
dc.identifier.urihttps://hdl.handle.net/11441/130648
dc.description.abstractOlmos ABSTRACT In this thesis, we focus on the handwriting text recognition task over historical documents that are difficult to read for any person that is not an expert in ancient languages and writing style. We aim to take advantage and improve the neural networks architectures and techniques that other authors are proposing for handwriting text recognition in modern handwritten documents. These models perform this task very precisely when a large amount of data is available. However, the low availability of labeled data is a widespread problem in historical documents. The type of writing is singular, and it is pretty expensive to hire an expert to transcribe a large number of pages. After investigating and analyzing the state-of-the-art, we propose the efficient application of methods such as transfer learning and data augmentation. We also contribute an algorithm for purging mislabeled samples that affect the learning of models. Finally, we develop a variational auto encoder method for generating synthetic samples of handwritten text images for data augmentation. Experiments are performed on various historical handwritten text databases to validate the performance of the proposed algorithms. The various included analyses focus on the evolution of the character and word error rate (CER and WER) as we increase the training dataset. One of the most important results is the participation in a contest for transcription of historical handwritten text. The organizers provided us with a dataset of documents to train the model, then just a few labeled pages of 5 new documents were handled to adjust the solution further. Finally, the transcription of nonlabeled images was requested to evaluate the algorithm. Our method raked second in this contest.es
dc.formatapplication/pdfes
dc.format.extent139 p.es
dc.language.isoenges
dc.rightsAttribution-NonCommercial-NoDerivatives 4.0 Internacional*
dc.rights.urihttp://creativecommons.org/licenses/by-nc-nd/4.0/*
dc.titleMachine Learning for handwriting text recognition in historical documentses
dc.typeinfo:eu-repo/semantics/doctoralThesises
dcterms.identifierhttps://ror.org/03yxnpp24
dc.type.versioninfo:eu-repo/semantics/publishedVersiones
dc.rights.accessRightsinfo:eu-repo/semantics/openAccesses
dc.contributor.affiliationUniversidad de Sevilla. Departamento de Teoría de la Señal y Comunicacioneses
dc.publication.endPage121es

FicherosTamañoFormatoVerDescripción
Aradillas Jaramillo, José Carlos ...34.81MbIcon   [PDF] Ver/Abrir  

Este registro aparece en las siguientes colecciones

Mostrar el registro sencillo del ítem

Attribution-NonCommercial-NoDerivatives 4.0 Internacional
Excepto si se señala otra cosa, la licencia del ítem se describe como: Attribution-NonCommercial-NoDerivatives 4.0 Internacional