Biclustering in bioinformatics using big data and High Performance Computing applications: challenges and perspectives, a review

Dates: 2025-07-15; 2025-07-15; 2025-07
ISSN: 0920-8542; 1573-0484
Handle: https://hdl.handle.net/11441/175311

Abstract: Biclustering is a powerful machine learning technique that simultaneously groups rows and columns in matrix-based datasets. Applied to gene expression data in bioinformatics, its use has expanded alongside the rapid growth of high-throughput sequencing technologies, which produce massive and complex biological datasets. This review examines how biclustering methods and their validation strategies are evolving to meet the demands of High Performance Computing (HPC) and Big Data environments. We present a structured classification of existing approaches based on the computational paradigms they employ, including MPI/OpenMP, Apache Hadoop/Spark, and GPU/CUDA. By synthesising these developments, we highlight current trends and outline key research challenges. The knowledge gathered in this work may support researchers in adapting and scaling biclustering algorithms to analyse large-scale biomedical data more efficiently. Our contribution is intended to bridge the gap between algorithmic innovation and computational scalability in the context of bioinformatics and data-intensive applications.

Format: application/pdf
Extent: 52 p.
Language: eng
License: Attribution 4.0 International (http://creativecommons.org/licenses/by/4.0/)
Keywords: Biclustering; Big data; High Performance Computing; Bioinformatics
Type: info:eu-repo/semantics/article
Access: info:eu-repo/semantics/openAccess
DOI: https://doi.org/10.1007/s11227-025-07563-6