COVID-19 Detection Using Machine Learning: A Dataset-Centric Review

Authors

  • Ravneet Kaur, IK Gujral Punjab Technical University, Kapurthala
  • Vipul Sharma, IK Gujral Punjab Technical University, Kapurthala

Abstract

The COVID-19 pandemic has posed unprecedented challenges to global healthcare systems, necessitating rapid and accurate diagnostic strategies. Machine learning and deep learning approaches have emerged as powerful tools for analyzing diverse datasets, including clinical records, imaging data, audio signals, and multimodal sources, to detect COVID-19 infection and predict its severity. This review systematically examines studies that use these datasets and highlights the predictive performance of various models, including convolutional neural networks, support vector machines, recurrent neural networks, and ensemble methods. Clinical datasets provide critical insights for risk stratification and mortality prediction, imaging datasets enable precise assessment of pulmonary involvement, and audio datasets offer non-invasive, rapid screening opportunities. Multimodal approaches that integrate these sources demonstrate higher predictive accuracy and robustness than single-modality models in the reviewed studies. Despite significant advances, challenges such as dataset heterogeneity, class imbalance, limited sample sizes, and limited model interpretability persist. Future research directions include attention-based fusion, self-supervised and contrastive learning, and federated learning to improve generalization and facilitate real-world deployment. This review emphasizes the importance of dataset-driven machine learning strategies in advancing COVID-19 diagnostics and provides a framework for future AI-based approaches to infectious disease detection.

Published

2026-01-22