The Potential of Machine Learning Techniques in Predicting Malaria Outbreaks in Nigeria
Abstract
Malaria, an infectious disease transmitted by mosquitoes and caused by protists of the Plasmodium genus, poses a significant global health threat and contributes substantially to morbidity and mortality. The World Health Organization (WHO) estimated approximately 229 million cases worldwide, with children under five years old, the most vulnerable demographic group, accounting for 67% (274,000) of malaria deaths. Despite the prevalence of malaria, existing research has not extensively explored the use of machine learning techniques to predict malaria outbreaks in Nigeria. This study aims to fill this gap by applying five supervised machine learning methods: Naive Bayes, Support Vector Machines (SVM), Linear Regression, Logistic Regression, and K-Nearest Neighbor. Using meteorological and malaria incidence data spanning 2010 to 2020, the models were implemented in Python with the Scikit-learn library in the Anaconda distribution. Results indicate that Naive Bayes achieved the highest accuracy, averaging 79.1% across the training and testing datasets, making it the best-performing model for predicting malaria outbreaks on the dataset used. Support Vector Machine (SVM) followed with an average accuracy of 75.45% across training and testing data, then K-Nearest Neighbor with 70.8% and Logistic Regression with 68%. Linear Regression, with an average accuracy of 26.05%, is not recommended for predicting malaria outbreaks based on the findings of this research.
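The comparison described in the abstract can be illustrated with a minimal Scikit-learn sketch. It is not the study's code: the data are synthetic, and the feature names (rainfall, temperature, humidity), the labelling rule, the 70/30 split, the feature scaling, and the helper names `make_synthetic_data` and `compare_models` are all assumptions made for illustration. Linear Regression is left out because the abstract does not say how its continuous output was scored as accuracy. The "average" reported here is the mean of training and testing accuracy, which is one plausible reading of the abstract's "average accuracy".

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC


def make_synthetic_data(n_samples=400, seed=0):
    """Hypothetical stand-in for weather features and a binary outbreak label."""
    rng = np.random.default_rng(seed)
    rainfall = rng.gamma(shape=2.0, scale=60.0, size=n_samples)  # mm (assumed)
    temperature = rng.normal(27.0, 3.0, size=n_samples)          # deg C (assumed)
    humidity = rng.uniform(30.0, 95.0, size=n_samples)           # percent (assumed)
    X = np.column_stack([rainfall, temperature, humidity])
    # Invented labelling rule plus noise; not derived from real epidemiology.
    score = 0.01 * rainfall + 0.05 * humidity + rng.normal(0.0, 1.0, n_samples)
    y = (score > np.median(score)).astype(int)
    return X, y


def compare_models(X, y, seed=0):
    """Fit four classifiers and report train, test, and mean accuracy for each."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.3, random_state=seed, stratify=y
    )
    # SVM, Logistic Regression, and KNN are wrapped with a scaler because they
    # are sensitive to feature scale; this is a common choice, not the study's.
    models = {
        "Naive Bayes": GaussianNB(),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Logistic Regression": make_pipeline(
            StandardScaler(), LogisticRegression(max_iter=1000)
        ),
        "K-Nearest Neighbor": make_pipeline(
            StandardScaler(), KNeighborsClassifier(n_neighbors=5)
        ),
    }
    results = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        train_acc = model.score(X_train, y_train)  # accuracy on training data
        test_acc = model.score(X_test, y_test)     # accuracy on held-out data
        results[name] = {
            "train": train_acc,
            "test": test_acc,
            "average": (train_acc + test_acc) / 2,
        }
    return results


if __name__ == "__main__":
    X, y = make_synthetic_data()
    for name, acc in compare_models(X, y).items():
        print(
            f"{name}: train={acc['train']:.3f} "
            f"test={acc['test']:.3f} average={acc['average']:.3f}"
        )
```

Because the data are synthetic, the printed accuracies say nothing about the study's results. Averaging training and testing accuracy also rewards models that overfit the training set, so the test accuracy on its own is usually the more informative number when ranking models.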