Comparative analysis between machine learning algorithms in efficiency of Coronary Heart Disease (CHD) prediction

Abstract

Machine learning is increasingly applied in modern healthcare. Researchers have proposed many ways to implement machine learning algorithms and have studied how to make them work as efficiently as possible. Because healthcare will always be needed, we believe there will always be a need to compare machine learning algorithms in terms of their performance and relevance, so that healthcare based on machine learning becomes more reliable. For this study, we selected four of the most commonly used machine learning algorithms, Logistic Regression, Support Vector Machine, Decision Tree, and Random Forest, and produced a comparative analysis on a dataset from the Framingham Heart Study, which is dedicated to predicting the risk of Coronary Heart Disease (CHD). We used a combination of data preprocessing and feature selection methods, namely row elimination and Recursive Feature Elimination, respectively. To understand the impact of each feature in the dataset on the target feature, we applied the chi-squared test, which is commonly recommended for classification problems. To compare and analyze the performance of the algorithms, we used the confusion matrix, precision, recall, and F1 score; we also plotted ROC curves from sensitivity and specificity scores to characterize the algorithms' behavior. The highest average accuracy in our study was achieved by Logistic Regression (83.9%), while the other algorithms came fairly close.
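
The evaluation metrics named in the abstract all derive from the four cells of a binary confusion matrix. The following is a minimal sketch of those definitions, not the thesis's own code; the function name classification_metrics and the example counts are hypothetical and are not results from the study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute standard binary-classification metrics from confusion-matrix counts.

    tp, fp, fn, tn are the counts of true positives, false positives,
    false negatives, and true negatives.
    """
    def ratio(numerator, denominator):
        # Return 0.0 instead of raising when a denominator is zero.
        return numerator / denominator if denominator else 0.0

    precision = ratio(tp, tp + fp)        # share of predicted positives that are correct
    recall = ratio(tp, tp + fn)           # sensitivity: share of actual positives found
    specificity = ratio(tn, tn + fp)      # share of actual negatives found
    accuracy = ratio(tp + tn, tp + fp + fn + tn)
    f1 = ratio(2 * precision * recall, precision + recall)  # harmonic mean

    return {
        "precision": precision,
        "recall": recall,
        "specificity": specificity,
        "accuracy": accuracy,
        "f1": f1,
    }


# Hypothetical counts, chosen only to illustrate the arithmetic.
metrics = classification_metrics(tp=30, fp=10, fn=20, tn=140)
```

An ROC curve plots sensitivity (recall) against 1 - specificity as the classifier's decision threshold varies, so each threshold contributes one point computed from a confusion matrix in this way.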

Description

Cataloged from PDF version of thesis.
Includes bibliographical references (pages 33-34).
This thesis is submitted in partial fulfillment of the requirements for the degree of Bachelor of Science in Computer Science, 2018.

Type

Thesis