Modelado en Pygame del péndulo invertido y pautas para un control mediante técnicas de Reinforcement Learning

Gómez Urbano, Sergio

Bachelor's Thesis (Trabajo Fin de Grado)

dc.contributor.advisor	Real Torres, Alejandro del	es
dc.creator	Gómez Urbano, Sergio	es
dc.date.accessioned	2020-07-23T15:07:53Z
dc.date.available	2020-07-23T15:07:53Z
dc.date.issued	2020
dc.identifier.citation	Gómez Urbano, S. (2020). Modelado en Pygame del péndulo invertido y pautas para un control mediante técnicas de Reinforcement Learning. (Trabajo Fin de Grado Inédito). Universidad de Sevilla, Sevilla.
dc.identifier.uri	https://hdl.handle.net/11441/99789
dc.description.abstract	El Reinforcement Learning es un campo del aprendizaje automático que cuenta con unas características distintivas, las cuales le dotan de utilidades específicas respecto al resto de áreas de este ámbito. A grandes rasgos, el Reinforcement Learning consiste en un agente software cuyo cometido será maximizar una determinada recompensa o premio; para ello realizará unas determinadas acciones que, en un correcto funcionamiento, lograrán optimizar el resultado perseguido por el programador. De lo citado anteriormente se puede extraer que el Reinforcement Learning no es un método para encontrar una solución óptima, sino más bien un aprendizaje progresivo o aproximación. Dicho esto, es fácilmente deducible que el método en cuestión será una opción a tener en cuenta en entornos donde la perfección es difícil de obtener, incluso de definir. El resultado de lo expuesto es la elección de este método para buscar una solución en el difícil cometido de controlar el péndulo invertido. Consideraremos éste como un sistema 2D, donde el agente ya descrito se encargará de dar unos impulsos al carro hacia delante o atrás, elecciones que será capaz de tomar tras su aprendizaje. De la ambición por mostrar los resultados del trabajo realizado en tiempo real surge la tarea de elegir un entorno de programación donde sea fácilmente implementable el método de aprendizaje por refuerzo, así como posible la representación de gráficos en directo para escenificar el péndulo invertido. El que se tuvo en cuenta desde el primer momento, y finalmente el elegido, fue Python. Su elección se entiende a raíz de que existe una librería muy conocida para este lenguaje, que permite la implementación y actualización en tiempo real de dinámicas y gráficos en consonancia con ellas. 
Esta librería es Pygame, originalmente concebida para realización de videojuegos, pero cuya versatilidad le permite ser utilizada en todo tipo de aplicaciones que conlleven actualización de gráficos basados en ecuaciones. Así, se tienen las herramientas con las que se perseguirá el objetivo de mantener erguido el poste de péndulo invertido. Si bien este es el objetivo final de un proyecto muy ambicioso, en el presente trabajo de fin de grado se estrecha el cerco hasta llegar únicamente a la consecución de un modelo funcional del péndulo invertido, así como a dar las pautas de un futuro control.	es
dc.description.abstract	Reinforcement Learning is a field of machine learning with distinctive features that give it specific uses compared with the other areas of the discipline. Broadly, Reinforcement Learning consists of a software agent whose task is to maximize a certain reward; to do so, it takes actions that, when it works correctly, optimize the result pursued by the programmer. It follows that Reinforcement Learning is not a method for finding a unique, optimal solution but rather a progressive learning process, or approximation. It is therefore an option worth considering in environments where perfection is hard to obtain, or even to define. For this reason, Reinforcement Learning was chosen for the difficult challenge of controlling the inverted pendulum. The pendulum is treated as a 2D system in which the agent described above is responsible for applying impulses to the cart, forward or backward, decisions it is able to make once it has learned. The ambition to show the agent's performance in real time raises the task of choosing a programming environment. It must be able to display live graphics representing the inverted pendulum easily, and the learning method must also be easy to implement in it. The environment considered from the outset, and the one finally chosen, was Python. The choice is explained by the existence of a well-known library for this language, Pygame, which can implement and update dynamics and graphics in real time according to equations supplied by the programmer. These are the tools that will be used to pursue the objective of keeping the pole of the inverted pendulum upright. 
While this is the final objective of a very ambitious project, the scope of the present final degree project is limited to obtaining a functional model of the cart-pole and setting out guidelines for a future controller.	es
dc.format	application/pdf	es
dc.format.extent	61	es
dc.language.iso	spa	es
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 Internacional	*
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/	*
dc.title	Modelado en Pygame del péndulo invertido y pautas para un control mediante técnicas de Reinforcement Learning	es
dc.type	info:eu-repo/semantics/bachelorThesis	es
dc.type.version	info:eu-repo/semantics/publishedVersion	es
dc.rights.accessRights	info:eu-repo/semantics/openAccess	es
dc.contributor.affiliation	Universidad de Sevilla. Departamento de Ingeniería de Sistemas y Automática	es
dc.description.degree	Universidad de Sevilla. Grado en Ingeniería Electrónica, Robótica y Mecatrónica	es
dc.publication.endPage	52 p.	es

Files	Size	Format
TFG-2840-GOMEZ URBANO.pdf	2.386Mb	PDF

This record appears in the following collections

Grado en Ingeniería Electrónica, Robótica y Mecatrónica (UMA/USE)

Except where otherwise noted, this item's license is described as: Attribution-NonCommercial-NoDerivatives 4.0 Internacional