Articles (Computer Science and Artificial Intelligence)

Permanent URI for this collection: https://hdl.handle.net/11441/11302


Recent submissions

Showing 1 - 20 of 392
  • Open Access · Article
    D3A-TS: Denoising-Driven Data Augmentation in Time Series
    (2024) Solís Martín, David; Galán Páez, Juan; Borrego Díaz, Joaquín; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    It has been demonstrated that the amount of data is crucial in data-driven machine learning methods. Data is always valuable, but in some tasks, it is almost like gold. This occurs in engineering areas where data is scarce or very expensive to obtain, such as predictive maintenance, where faults are rare. In this context, a mechanism to generate synthetic data can be very useful. While in fields such as Computer Vision or Natural Language Processing synthetic data generation has been extensively explored with promising results, in other domains such as time series it has received less attention. Previous works have proposed techniques like geometric transformations, interpolation, and generative models like GANs and VAEs for time series data augmentation. However, the use of denoising models, particularly Diffusion Probabilistic Models (DPMs), remains largely unexplored in this context. This work specifically focuses on studying and analyzing the use of different techniques for data augmentation in time series for classification and regression problems. The proposed D3A-TS methodology involves the use of DPMs, which have recently achieved successful results in the field of Image Processing, for data augmentation in time series. Additionally, the use of meta-attributes to condition the data augmentation process is investigated. The results highlight the high utility of this methodology in creating synthetic data to train classification and regression models. To assess the results, six different datasets from diverse domains were employed, showcasing versatility in terms of input size and output types. Finally, an extensive ablation study is conducted to further support the obtained outcomes.
  • Open Access · Article
    A Logical–Algebraic Approach to Revising Formal Ontologies: Application in Mereotopology
    (MDPI, 2024-04-29) Aranda-Corral, Gonzalo A.; Borrego Díaz, Joaquín; Chávez González, Antonia María; Gulayeva, Nataliya M.; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    In ontology engineering, reusing (or extending) ontologies poses a significant challenge, requiring revising their ontological commitments and ensuring accurate representation and coherent reasoning. This study aims to address two main objectives. Firstly, it seeks to develop a methodological approach supporting ontology extension practices. Secondly, it aims to demonstrate its feasibility by applying the approach to the case of extending qualitative spatial reasoning (QSR) theories. Key questions involve effectively interpreting spatial extensions while maintaining consistency. The framework systematically analyzes extensions of formal ontologies, providing a reconstruction of a qualitative calculus. The reconstructed qualitative calculus demonstrates improved interpretative capabilities and reasoning accuracy. The research underscores the importance of methodological approaches when extending formal ontologies, with spatial interpretation serving as a valuable case study.
  • Open Access · Article
    PHMD: An easy data access tool for prognosis and health management datasets
    (Elsevier, 2025) Solís Martín, David; Galán Páez, Juan; Borrego Díaz, Joaquín; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; MICIU/AEI/10.13039/501100011033 (Agencia Estatal de Investigación)
    This work introduces a comprehensive open-source Python library designed for seamless access and handling of Prognostics and Health Management (PHM) datasets. The library currently supports 59 datasets from diverse domains, and has been developed to simplify dataset search, retrieval, loading, and preprocessing while standardizing data formats for easy integration in machine learning workflows. With built-in metadata handling and task-specific experiment settings for diagnosis, prognosis, and detection, users can efficiently prepare and analyze data without needing to manage raw file formats or directories. Available through GitHub and PyPI, the library provides a robust foundation for PHM research and application, offering useful resources to boost the projects of practitioners and researchers alike.
  • Open Access · Article
    CONELPABO: composite networks learning via parallel Bayesian optimization to predict remaining useful life in predictive maintenance
    (Springer Nature, 2025) Solís Martín, David; Galán Páez, Juan; Borrego Díaz, Joaquín; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla/ CBUA
    Maintaining equipment and machinery in industries is imperative for maximizing operational efficiency and prolonging their lifespan. The adoption of predictive maintenance enhances resource allocation, productivity, and product quality by proactively identifying and addressing potential equipment anomalies through rigorous data analysis before they escalate into critical issues. Consequently, these measures strengthen market competitiveness and generate favorable economic outcomes. In many applications, sensors operate at high frequencies or capture data over extended periods. This work introduces CONELPABO (Composite Networks Learning via Parallel Bayesian Optimization), a framework for analyzing long time series data, particularly for predicting the remaining useful life of a system or component. It uses a divide-and-conquer strategy to manage the exponential growth in the hyperparameter search space during Bayesian Optimization and to accelerate model training by 50%. Additionally, this strategy enables the training of deeper networks with limited resources. The usefulness of the framework is demonstrated through two case studies, in which it achieves state-of-the-art results, showing that CNN-CNN and RNN-RNN architectures are highly effective for long time-series data. These architectures outperform many existing approaches and challenge the common academic focus on CNN-RNN hybrids.
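The exponential growth mentioned in the abstract above can be made concrete with a rough count. This is only an illustrative sketch, not the CONELPABO procedure: it assumes a discretized search space in which every hyperparameter takes one of a fixed number of values and in which the parts of a composite network can be tuned in separate searches. The function names and the numbers are hypothetical.

```python
# Illustrative count only: assumes each hyperparameter takes one of
# `values_per_param` discrete values, which the abstract does not state.


def joint_space(params_per_block: int, values_per_param: int, blocks: int) -> int:
    """Configurations when all blocks are tuned together in one search."""
    return values_per_param ** (params_per_block * blocks)


def divided_space(params_per_block: int, values_per_param: int, blocks: int) -> int:
    """Configurations when each block is tuned in its own, separate search."""
    return blocks * values_per_param ** params_per_block


if __name__ == "__main__":
    # Compare both counts as the number of blocks grows.
    for blocks in (1, 2, 3):
        print(blocks, joint_space(4, 5, blocks), divided_space(4, 5, blocks))
```

Under these assumptions the joint count grows exponentially with the number of blocks, while the divided count grows only linearly.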
  • Open Access · Article
    A Model for Learning-Curve Estimation in Efficient Neural Architecture Search and Its Application in Predictive Health Maintenance
    (MDPI, 2025-02-07) Solís Martín, David; Galán Páez, Juan; Borrego Díaz, Joaquín; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; MICIU/AEI/10.13039/501100011033 (Agencia Estatal de Investigación)
    A persistent challenge in machine learning is the computational inefficiency of neural architecture search (NAS), particularly in resource-constrained domains like predictive maintenance. This work introduces a novel learning-curve estimation framework that reduces NAS computational costs by over 50% while maintaining model performance, addressing a critical bottleneck in automated machine learning design. By developing a data-driven estimator trained on 62 different predictive maintenance datasets, we demonstrate a generalized approach to early-stopping trials during neural network optimization. Our methodology not only reduces computational resources but also provides a transferable technique for efficient neural network architecture exploration across complex industrial monitoring tasks. The proposed approach achieves a remarkable balance between computational efficiency and model performance, with only a 2% performance degradation, showcasing a significant advancement in automated neural architecture optimization strategies.
  • Open Access · Article
    SpaceRL — A reinforcement learning-based knowledge graph driver
    (Elsevier, 2025) Bermudo Bayo, Miguel; Ayala Hernández, Daniel; Hernández Salmerón, Inmaculada Concepción; Ruiz Cortés, David; Toro Bonilla, Miguel; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos; Spanish Ministry of Science, Innovation and Universities
    Knowledge Graphs are powerful data structures used by large IT companies and the scientific community alike. They aid in the representation of related information by means of nodes connected through links indicating types of relations. These graphs are used as the basis for several smart applications, such as question answering or product recommendation. However, they are built in an automated unsupervised way, which leads to gaps in information, usually in the form of missing links between related entities in the original data source, which have to be added later by completion techniques. SpaceRL is an end-to-end Python framework designed for the generation of reinforcement learning (RL) agents, which can be used to complete knowledge graphs through link discovery. The purpose of the generated agents is to help identify missing links in a knowledge graph by finding paths that implicitly connect two nodes, incidentally providing a reasoned explanation for the inferred new link. The generation of such agents is a complex task, even more so for a non-expert user. SpaceRL is meant to overcome these limitations by providing a flexible set of tools designed with a wide variety of customization options, in order to adapt to different users’ needs, while also including a variety of state-of-the-art RL algorithms and several embedding models that can be combined to optimize the agents’ performance. Furthermore, SpaceRL offers different interfaces to make it available either locally (programmatically or via a GUI), or through an OpenAPI-compliant REST API.
  • Open Access · Article
    PharaohFUN: PHylogenomic Analysis foR plAnt prOtein History and FUNction elucidation
    (2023) Ramos González, Marcos; Ramos González, Víctor; Arvanitidou, Christina; Hernández García, Jorge; García González, Mercedes; Romero Campero, Francisco José; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla. Departamento de Bioquímica Vegetal y Biología Molecular
    Motivation: Since DNA sequencing has turned commonplace, the development of efficient methods and tools to explore gene sequences has become indispensable. In particular, despite photosynthetic eukaryotes constituting the largest percentage of terrestrial biomass, computational functional characterization of gene sequences in these organisms still predominantly relies on comparisons with Arabidopsis thaliana and other angiosperms. This paper introduces PharaohFUN, a web application designed for the evolutionary and functional analysis of protein sequences in photosynthetic eukaryotes, leveraging orthology relationships between them. Results: PharaohFUN incorporates a homogeneous representative sampling of key species in this group, bridging clades that have traditionally been studied separately, thus establishing a comprehensive evolutionary framework to draw conclusions about sequence evolution and function. For this purpose, it incorporates modules for exploring gene tree evolutionary history, domain identification, multiple sequence alignments, and functional annotation. The study of the CCA1 protein exemplifies how PharaohFUN unifies results for both land plants and chlorophyte microalgae, accurately tracing the evolutionary history of this protein.
  • Open Access · Article
    Multiomics responses to seasonal variations in diel cycles in the marine phytoplanktonic picoeukaryote Ostreococcus tauri
    (2023-08) Romero Losada, Ana Belén; Arvanitidou, Christina; García Gómez, María Elena; Morales Pineda, María; Castro Pérez, M. José; García González, Mercedes; Romero Campero, Francisco José; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla. Departamento de Bioquímica Vegetal y Biología Molecular
    Earth's tilted rotation and translation around the Sun produce one of the most pervasive periodic environmental signals on our planet, giving rise to seasonal variations in diel cycles. Although marine phytoplankton plays a key role in ecosystems and presents promising biotechnological applications, multiomics integrative analysis of its response to these rhythms remains largely unexplored. We have chosen the marine picoeukaryote Ostreococcus tauri as a model organism, grown under summer long days, winter short days, constant light and constant dark conditions, to characterize these responses in marine phytoplankton. Although 80% of the transcriptome presents diel rhythmicity under both seasonal conditions, less than 5% maintained oscillations under all constant conditions. A drastic reduction in protein abundance rhythmicity was observed, with 55% of the proteome oscillating. Seasonally specific rhythms were found in key physiological processes such as cell cycle progression, photosynthetic efficiency, carotenoid content, starch accumulation and nitrogen assimilation. A global orchestration between transcriptome, proteome and physiological dynamics was observed, with specific seasonal temporal offsets between transcript, protein and physiological peaks.
  • Open Access · Article
    Multiomics integration unveils photoperiodic plasticity in the molecular rhythms of marine phytoplankton
    (Oxford University Press, 2025-02-11) Romero Losada, Ana Belén; Arvanitidou, Christina; García-Gómez, M.E.; Morales-Pineda, M.; Castro-Pérez, M.J.; Chew, Y.P.; van Ooijen, G.; García González, Mercedes; Romero Campero, Francisco José; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla. Departamento de Bioquímica Vegetal y Biología Molecular; Ministerio de Ciencia e Innovación (MICIN). España
    Earth’s tilted rotation and translation around the Sun produce pervasive rhythms on our planet, giving rise to photoperiodic changes in diel cycles. Although marine phytoplankton plays a key role in ecosystems, multiomics analysis of its responses to these periodic environmental signals remains largely unexplored. The marine picoalga Ostreococcus tauri was chosen as a model organism due to its cellular and genomic simplicity. Ostreococcus was subjected to different light regimes to investigate its responses to periodic environmental signals: long summer days, short winter days, constant light, and constant dark conditions. Although <5% of the transcriptome maintained oscillations under both constant conditions, 80% presented diel rhythmicity. A drastic reduction in diel rhythmicity was observed at the proteome level, with 39% of the detected proteins oscillating. Photoperiod-specific rhythms were identified for key physiological processes such as the cell cycle, photosynthesis, carotenoid biosynthesis, starch accumulation, and nitrate assimilation. In this study, a photoperiodic plastic global orchestration among transcriptome, proteome, and physiological dynamics was characterized to identify photoperiod-specific temporal offsets between the timing of transcripts, proteins, and physiological responses.
  • Open Access · Article
    Complexity Assessment in Projects Using Small-World Networks for Risk Factor Reduction
    (MDPI, 2024-12-21) Álvarez Espada, Juan Manuel; Fuentes-Bargues, José Luis; Sánchez-Lite, Alberto; González-Gaya, Cristina; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    Despite following standard practices of well-known project management methodologies, some projects fail to achieve expected results, incurring unexplained cost overruns or delays. These problems occur regardless of the type of project, the environment, or the project manager’s experience and are characteristic of complex projects. Such projects require special control using a multidimensional network approach that includes contractual aspects, supply and resource considerations, and information exchange between stakeholders. By modelling project elements as nodes and their interrelations as links within a network, we can analyze how components evolve and influence each other, a phenomenon known as coevolution. This network analysis allows us to observe not only the evolution of individual nodes but also the impact of their interrelations on the overall dynamics of the project. Two metrics are proposed to address the inherent complexity of these projects: one to assess Structural Complexity (SC) and the other to measure Dynamic Complexity (DC). These metrics are based on Boonstra and Reezigt’s studies on the dimensions and domains of complex projects. These two metrics have been combined to create a Global Complexity Index (GCI) for measuring project complexity under uncertainty using fuzzy logic. These concepts are applied to a case study, the construction of a wastewater treatment plant, a complex project due to the intense interrelations, the integration of new technologies that require R&D, and its location next to a natural park. The application of the GCI allows constant monitoring of dynamic complexity, thus providing a tool for risk anticipation and decision support. Also, the integration of fuzzy logic in the model facilitates the incorporation of imprecise or partially defined information. It makes it possible to deal efficiently with the dynamic variation of complexity parameters in the project, adapting to the inherent uncertainties of the environment.
  • Open Access · Article
    Approach and Success in the Management of Peacekeeping Operations (PKOs): Application to Two Case Studies, the UNMISS and MONUSCO Missions of the UN
    (MDPI, 2022-05-17) Álvarez Espada, Juan Manuel; Fuentes-Bargues, J.L.; González-Gaya, C.; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    A Peacekeeping Operation (PKO) of the United Nations (UN) is a complex project whose objective is determined by the mandate, and which seeks to eliminate violence, achieve peace, and consolidate the future of society in conflict zones. For a PKO, it is important to assess the success or failure of the mission because it might have implications for the outcomes of future missions. In this paper, a methodology is proposed that combines two available tools: the PMI tool, to determine the most appropriate approach to manage a PKO, and the NUPI tool, to measure the success of a PKO. The methodology is applied to two case studies of fourth-generation PKOs, the UNMISS PKO in South Sudan and the MONUSCO mission in the Democratic Republic of Congo. The results obtained indicate that an adaptive approach enjoys a greater guarantee of success than a predefined approach.
  • Open Access · Article
    Generador de Grafos Multi-relacionales a partir de redes sociales [Multi-relational graph generator from social networks]
    (2014) Almagro Blanco, Pedro; Ordoñez Salinas, Sonia; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    The tool presented in this article, CorpuRed, collects data from online social platforms for use in research projects that require information about social behaviour on the internet. The way the data are obtained varies slightly from platform to platform (the particular case of Facebook is shown); the data are then stored in a graph database that will be accessible through an API under an academic license.
  • Open Access · Article
    Small worlds and clustering in spatial networks
    (American Physical Society, 2020-04-14) Boguñá, M.; Krioukov, D.; Almagro Blanco, Pedro; Serrano, M. Á.; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    Networks with underlying metric spaces attract increasing research attention in network science, statistical physics, applied mathematics, computer science, sociology, and other fields. This attention is further amplified by the current surge of activity in graph embedding. In the vast realm of spatial network models, only a few reproduce even the most basic properties of real-world networks. Here, we focus on three such properties—sparsity, small worldness, and clustering—and identify the general subclass of spatial homogeneous and heterogeneous network models that are sparse small worlds and that have nonzero clustering in the thermodynamic limit. We rely on the maximum entropy approach in which network links correspond to noninteracting fermions whose energy depends on spatial distances between nodes.
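As a reminder of what the fermionic picture in the abstract above implies, and only as a sketch: if each potential link between nodes i and j is treated as a fermionic state with energy ε_ij, the standard Fermi-Dirac occupation gives the connection probability below. The symbols β (inverse temperature), μ (chemical potential), f and x_ij are notation assumed here for illustration; the abstract does not give the formula.

```latex
p_{ij} = \frac{1}{e^{\beta(\varepsilon_{ij} - \mu)} + 1},
\qquad \varepsilon_{ij} = f(x_{ij})
```

Here x_ij stands for the distance between nodes i and j, μ fixes the expected number of links, and β controls how sharply the connection probability falls off as the energy grows.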
  • Open Access · Article
    Characterizing the Temperature of SAT Formulas
    (Springer Nature, 2022-08-24) Almagro Blanco, Pedro; Giraldez Cru, Jesus; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    The remarkable advances in SAT solving achieved in the last years have made it possible to use this technology to solve many real-world applications, such as planning, formal verification and cryptography, among others. Interestingly, these industrial SAT problems are commonly believed to be easier than classical random SAT formulas, but estimating their actual hardness is still a very challenging question, which in some cases even requires solving them. In this context, realistic pseudo-industrial random SAT generators have emerged with the aim of reproducing the main features of these application problems to better understand the success of those SAT solving techniques on them. In this work, we present a model to estimate the temperature of real-world SAT instances. This temperature represents the degree of distortion into the expected structure of the formula, from highly structured benchmarks (more similar to real-world SAT instances) to the complete absence of structure (observed in the classical random SAT model). Our solution is based on the popularity–similarity random model for SAT, which has been recently presented to reproduce two crucial features of application SAT benchmarks: scale-free and community structures. This model is able to control the hardness of the generated formula by introducing some randomizations in the expected structure. Using our regression model, we observe that the estimated temperature of the application benchmarks used in the last SAT Competitions correlates to their hardness in most of the cases.
  • Open Access · Article
    Logical–Mathematical Foundations of a Graph Query Framework for Relational Learning
    (MDPI, 2023-11-16) Almagro Blanco, Pedro; Sancho Caparrini, Fernando; Borrego Díaz, Joaquín; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; MCIN/AEI/10.13039/501100011033
    Relational learning has attracted much attention from the machine learning community in recent years, and many real-world applications have been successfully formulated as relational learning problems. In recent years, several relational learning algorithms have been introduced that follow a pattern-based approach. However, this type of learning model suffers from two fundamental problems: the computational complexity arising from relational queries and the lack of a robust and general framework to serve as the basis for relational learning methods. In this paper, we propose an efficient graph query framework that allows for cyclic queries in polynomial time and is ready to be used in pattern-based learning methods. This solution uses logical predicates instead of graph isomorphisms for query evaluation, reducing complexity and allowing for query refinement through atomic operations. The main differences between our method and other previous pattern-based graph query approaches are the ability to evaluate arbitrary subgraphs instead of nodes or complete graphs, the fact that it is based on mathematical formalization that allows the study of refinements and their complementarity, and the ability to detect cyclic patterns in polynomial time. Application examples show that the proposed framework allows learning relational classifiers to be efficient in generating data with high expressiveness capacities. Specifically, relational decision trees are learned from sets of tagged subnetworks that provide both classifiers and characteristic patterns for the identified classes.
  • Open Access · Article
    Handling Non-determinism in Spiking Neural P Systems: Algorithms and Simulations
    (IOS Press, 2019) Carandang, Jym Paul; Cabarle, Francis George C.; Adorna, Henry Natividad; Hernandez, Nestine Hope S.; Martínez del Amor, Miguel Ángel; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos
    The Spiking Neural P system is a computing model inspired by how the neurons in a living being are interconnected and exchange information. As a model in membrane computing, it is a non-deterministic and massively parallel system. The latter makes the GPU a good candidate for accelerating the simulation of these models. A matrix representation for systems with and without delay has been previously designed, and algorithms for simulating them with deterministic systems were also developed. So far, non-determinism has been problematic for the design of parallel simulators. In this work, an algorithm for simulating non-deterministic spiking neural P systems with delays is presented. In order to study how the simulations get accelerated on a GPU, this algorithm was implemented in CUDA and used to simulate non-uniform and uniform solutions to the Subset Sum problem as a case study. The analysis is completed with a comparison of time and space resources in the GPU of such simulations.
  • Open Access · Article
    Análisis de la tasa de abandono en un Centro con varios Grados en Ingeniería Informática [Analysis of the dropout rate at a School with several Bachelor's degrees in Computer Engineering]
    (2017) Ruiz Cortés, David; Gómez Rodríguez, Francisco de Asís; Ruiz Reina, José Luis; Universidad de Sevilla. Departamento de Lenguajes y Sistemas Informáticos; Universidad de Sevilla. Departamento de Arquitectura y Tecnología de Computadores; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    This work presents an analysis of the impact on the dropout rate of transfers between the three Bachelor's degrees in Computer Engineering taught at a particular School. The analysis was carried out by the School's management team in response to the reports issued after the visits for the renewal of the accreditation of these degrees. The main conclusions we have reached are: i) transfers between Computer Engineering degrees always have a negative effect on the dropout rate, ranging between 3% and 20%; ii) such transfers may respond to academic reasons in some cases, but economic reasons are also suggested, given the savings they can entail; iii) approximately one third of our students drop out of Computer Engineering studies; iv) the dropout rate over the last 5 years has remained in line with what is established in the verification reports and with the national average in the Engineering and Architecture branch of knowledge; v) the indicator systems defined by the different quality assurance systems of the degrees are sometimes not homogeneous, which makes any kind of analysis difficult.
  • Open Access · Article
    Review of ensembles of multi-label classifiers: Models, experimental study and prospects
    (Elsevier, 2018-11) Moyano Murillo, José María; Gibaja, E.L.; Cios, K.J.; Ventura, S.; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Ministerio de Economía y Competitividad. España; Ministerio de Educación. España
    The great attention given by the scientific community to multi-label learning in recent years has led to the development of a large number of methods, many of them based on ensembles. A comparison of the state-of-the-art in ensembles of multi-label classifiers over a wide set of 20 datasets has been carried out in this paper, evaluating their performance based on the characteristics of the datasets such as imbalance, dependence among labels and dimensionality. In each case, suggestions are given to choose the algorithm that fits best. Further, given the absence of taxonomies of ensembles of multi-label classifiers, a novel taxonomy for these methods is proposed.
  • Open Access · Article
    MLDA: A tool for analyzing multi-label datasets
    (Elsevier, 2017) Moyano Murillo, José María; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial
    The objective of this paper is to present MLDA, a tool for the exploration and analysis of multi-label datasets with both simple and multiple views. MLDA comprises a GUI and a Java API, providing the user with a wide set of charts, metrics, methods for transforming and preprocessing data, as well as comparison of several datasets. The paper introduces the main features of the framework and demonstrates its use through some illustrative examples.
  • Open Access · Article
    KEEL 3.0: An Open Source Software for Multi-Stage Analysis in Data Mining
    (Atlantis Press, 2017) Triguero, Isabel; González, Sergio; Moyano Murillo, José María; García, Salvador; Alcalá Fernández, Jesús; Luengo, Julián; Fernández, Alberto; Jesús, María José del; Sánchez, Luciano; Herrera, Francisco; Universidad de Sevilla. Departamento de Ciencias de la Computación e Inteligencia Artificial; Ministerio de Educación y Ciencia (MEC). España; Ministerio de Educación. España
    This paper introduces the 3rd major release of the KEEL Software. KEEL is an open source Java framework (GPLv3 license) that provides a number of modules to perform a wide variety of data mining tasks. It includes tools to perform data management, design of multiple kinds of experiments, statistical analyses, etc. This framework also contains KEEL-dataset, a data repository for multiple learning tasks featuring data partitions and algorithms’ results over these problems. In this work, we describe the most recent components added to KEEL 3.0, including new modules for semi-supervised learning, multi-instance learning, imbalanced classification and subgroup discovery. In addition, a new interface in R has been incorporated to execute algorithms included in KEEL. These new features greatly improve the versatility of KEEL to deal with more modern data mining problems.