Feature Engineering for Data-Based Predictive Maintenance

Carrasco Muñoz, Alejandro
Luque Sendra, Amalia
Campos Olivares, Daniel
2025-07-28
2025-07-28
2025-05-28
Campos Olivares, D. (2025). Feature Engineering for Data-Based Predictive Maintenance. (Unpublished doctoral thesis). Universidad de Sevilla, Sevilla.
https://hdl.handle.net/11441/175665

This thesis explores a systematic methodology for enhancing Predictive Maintenance (PdM) through careful feature engineering and selection, focusing on a real-world bottling system where sensor data from key components (notably bearings and springs) serve as the foundation for fault detection and classification. The work is motivated by the growing industry need to move beyond reactive or condition-based approaches, toward proactive maintenance regimes that rely on detailed data acquisition and machine learning (ML) algorithms to detect and predict machinery failures in advance. Early chapters introduce the challenges of PdM: acquiring reliable, high-frequency sensor data, ensuring data quality, and dealing with class imbalance, since real-life datasets often contain many more “healthy” recordings than fault-state samples. The author describes the bottling system’s mechanical structure, its sensors, and the manual fault injection experiments, which yield vibration signals at various operating speeds. Segmenting these signals into windows increases the overall sample count for each health state, though the window length must be chosen carefully to balance having enough data per sample against over-segmentation. Subsequent chapters delve into feature engineering.
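The windowing step mentioned in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's implementation: the function name `segment_signal`, the `step` parameter (which allows overlapping windows when it is smaller than the window length), and the toy signal are all hypothetical.

```python
from typing import List, Sequence


def segment_signal(
    signal: Sequence[float], window_length: int, step: int
) -> List[List[float]]:
    """Split a 1-D signal into fixed-length windows.

    Hypothetical helper: a trailing partial window is dropped so that
    every sample passed to a classifier has the same length.
    """
    if window_length <= 0 or step <= 0:
        raise ValueError("window_length and step must be positive")
    last_start = len(signal) - window_length
    # range() is empty when the signal is shorter than one window.
    return [
        list(signal[start:start + window_length])
        for start in range(0, last_start + 1, step)
    ]


# Toy stand-in for one vibration recording (assumption: real recordings
# would contain thousands of samples per operating speed).
recording = [float(i) for i in range(10)]
windows = segment_signal(recording, window_length=4, step=2)
print(len(windows), "windows of length", len(windows[0]))
```

A shorter window or a smaller step yields more samples per health state, but each window then carries less information or overlaps more with its neighbours, which is the balance the abstract refers to.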
Three main feature-engineering approaches emerge: (1) specialized attributes, such as kurtosis and RMS, which target known vibration signatures of mechanical defects; (2) a broader set of “basic” time- and frequency-domain descriptors (for instance, mean amplitude, signal energy, or spectral entropy); and (3) dimension-reduced projections, typically via Principal Component Analysis (PCA), to handle high-dimensional feature vectors and maintain interpretability. Comparisons across approaches illustrate how computational cost trades off against classification accuracy: specialized features can yield rapid, high-accuracy results if the domain knowledge is sufficient, while large sets of basic features, or raw data used directly, may require more computational power and risk overfitting. The author then investigates various selection techniques: wrappers, embedded approaches (such as Lasso or impurity-based importance measures in Random Forest), and filter methods based on correlation, mutual information, ANOVA, or divergence metrics. These selection methods are compared systematically on both real and synthetic datasets, revealing how each can prune redundant or non-informative features while retaining enough discriminative power to distinguish between normal and faulty states. Finally, the conclusions emphasize the delicate balance between accuracy, processing time, and domain knowledge in designing feature-engineering pipelines for PdM. The work demonstrates that combining specialized domain insights with systematic feature-selection algorithms can improve classification outcomes, reduce overfitting, and expedite model training.
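As an illustration of the descriptors named above, the sketch below computes RMS, kurtosis, and spectral entropy for a single window. It is not taken from the thesis: the function names are hypothetical, kurtosis uses the non-excess (Pearson) definition, spectral entropy is normalized to the range 0 to 1 (one common convention among several), and a naive DFT is used only to keep the example dependency-free. A real pipeline would more likely use an FFT routine.

```python
import cmath
import math
from typing import Dict, Sequence


def rms(x: Sequence[float]) -> float:
    """Root mean square amplitude of the window."""
    return math.sqrt(sum(v * v for v in x) / len(x))


def kurtosis(x: Sequence[float]) -> float:
    """Non-excess (Pearson) kurtosis: fourth central moment / variance^2."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    return m4 / (m2 * m2) if m2 > 0 else 0.0


def spectral_entropy(x: Sequence[float]) -> float:
    """Shannon entropy of the one-sided power spectrum, scaled to [0, 1]."""
    n = len(x)
    power = []
    for k in range(n // 2 + 1):  # naive O(n^2) DFT, fine for short windows
        coeff = sum(
            v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x)
        )
        power.append(abs(coeff) ** 2)
    total = sum(power)
    if total == 0 or len(power) < 2:
        return 0.0
    probs = [p / total for p in power if p > 0]
    entropy = -sum(p * math.log(p) for p in probs)
    return entropy / math.log(len(power))


def extract_features(window: Sequence[float]) -> Dict[str, float]:
    """Hypothetical per-window feature vector for a classifier."""
    if not window:
        raise ValueError("window must not be empty")
    return {
        "rms": rms(window),
        "kurtosis": kurtosis(window),
        "spectral_entropy": spectral_entropy(window),
    }


# Two synthetic windows (assumptions, not thesis data): a smooth tone and
# a single sharp impulse, loosely mimicking an impact-like event.
n = 64
sine = [math.sin(2 * math.pi * 4 * i / n) for i in range(n)]
impulse = [1.0] + [0.0] * (n - 1)
for name, window in (("sine", sine), ("impulse", impulse)):
    print(name, extract_features(window))
```

The impulsive window has much higher kurtosis and a flatter spectrum than the tone, which is the kind of contrast that makes such attributes useful for impact-like defects. Feature vectors of this form are what a filter, wrapper, or embedded selection method would then rank or prune.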
The thesis offers a robust roadmap for industrial environments aiming to implement or refine their predictive maintenance strategies, showing how machine learning and targeted data preprocessing deliver tangible gains in reliability and operational efficiency.

application/pdf
166 p.
eng
Attribution-NonCommercial 4.0 International
http://creativecommons.org/licenses/by-nc/4.0/
info:eu-repo/semantics/doctoralThesis
info:eu-repo/semantics/openAccess