Modelos de datos de conteo para el estudio de datos de RNA-SEQ
|Author||Córdoba Chamizo, Yolanda|
|Director||Barranco Chamorro, Inmaculada; Luque Calvo, Pedro Luis|
|Department||Universidad de Sevilla. Departamento de Estadística e Investigación Operativa|
|Document type||Final Degree Work|
|Academic Title||Universidad de Sevilla. Grado en Matemáticas|
|Abstract||In this work we explain the knowledge and techniques necessary to work with RNA-Seq data, a technology used to detect and quantify the RNA transcribed from a genome. First, in Chapter 1, we explain this technique and its current importance, emphasizing its application in medicine, and compare it with another technology, microarrays. Our study requires statistical models specific to count data. For this reason, Chapter 2 explains Generalized Linear Models (GLM), along with the algorithms needed to estimate their parameters. We then focus on the most useful models for count data: for each we obtain the probability distribution, estimate the parameters, and interpret them. Chapter 3 covers the Poisson regression model. Chapter 4 covers the negative binomial regression model, which is the more appropriate choice for RNA-Seq data because the Poisson regression model often suffers from overdispersion. Finally, in Chapter 5, we apply these methods to a real RNA-Seq dataset from a biological study whose aim is to find differentially expressed Drosophila genes. We carry out the statistical analysis in R, specifically with the Bioconductor packages DESeq and DESeq2.|