COVID-19 Detection Using Machine Learning: A Dataset-Centric Review
Abstract
The COVID-19 pandemic has posed unprecedented challenges to global healthcare systems, creating a need for rapid and accurate diagnostic strategies. Machine learning and deep learning approaches have emerged as powerful tools for analyzing diverse datasets, including clinical records, imaging data, audio signals, and multimodal sources, to detect COVID-19 infection and predict its severity. This review systematically examines studies that use these datasets and compares the predictive performance of the models applied to them, including convolutional neural networks, support vector machines, recurrent neural networks, and ensemble methods. Clinical datasets support risk stratification and mortality prediction; imaging datasets enable precise assessment of pulmonary involvement; and audio datasets offer non-invasive, rapid screening opportunities. Multimodal approaches that integrate these sources are reported to achieve higher predictive accuracy and robustness than models built on a single data type. Despite significant advances, challenges persist: dataset heterogeneity, class imbalance, limited sample sizes, and limited model interpretability. Future research directions include attention-based fusion, self-supervised and contrastive learning, and federated learning to improve generalization and facilitate real-world deployment. This review emphasizes the importance of dataset-driven machine learning strategies in advancing COVID-19 diagnostics and provides a framework for future AI-based approaches to infectious disease detection.
License
Copyright (c) 2026 Ravneet Kaur, Vipul Sharma

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.