1 Department of Business Analytics, International American University, Los Angeles, CA 90010, USA.
2 Department of Business Administration, International American University, Los Angeles, CA 90010, USA.
3 Department of Engineering Management, Westcliff University, Irvine, CA 92614, USA.
4 Department of Engineering Project Management, Westcliff University, Irvine, CA 92614, USA.
5 Department of Computer Science, Westcliff University, Irvine, CA 92614, USA.
World Journal of Advanced Engineering Technology and Sciences, 2025, 17(01), 200–217
Article DOI: 10.30574/wjaets.2025.17.1.1390
Received on 30 August 2025; revised on 04 October 2025; accepted on 07 October 2025
Lung cancer is the uncontrolled growth of malignant cells in the body. Rising death rates among both men and women have prompted medical researchers to look for ways to detect it at an early stage, which enables earlier intervention and increases patient survival. Although many studies have applied machine learning, especially ensemble models, a comparative analysis of ensemble methods such as Hybrid Majority Voting and Ensemble Stacking on tabular data is still missing. The objective of this study is to apply machine learning, specifically ensemble models, to different datasets and compare the results. The aim is to identify general patterns in how these algorithms behave in this field and to determine whether any particular ensemble method predicts lung cancer better than the others. Two datasets were collected from public online sources and analysed to check their distributions and ensure there were no outliers. A pool of 9 machine learning algorithms with 50 hyperparameter settings was evaluated to select the 3 best-performing models. Several ensemble techniques were then applied, namely Majority Hard Voting, Weighted Hard Voting, Soft Voting, and Ensemble Stacking, and their performance was analysed. Evaluation metrics including Accuracy, F1-Score, ROC-AUC Score, Average Precision, and confusion matrices highlighted the superior performance of the ensemble models. In particular, the Weighted Ensemble Learning Model achieved 89.04% Accuracy and F1-Score on Dataset 1, and Ensemble Stacking achieved 87.95% Accuracy and F1-Score on Dataset 2, which indicates the effectiveness and generalizability of the ensemble models.
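Majority hard voting, weighted hard voting, and soft voting differ only in how the base models' outputs are combined into a final label. The sketch below is a hedged illustration, not the authors' implementation. It assumes three already-trained binary classifiers that each output a probability for the positive class (lung cancer = 1). The function names, the toy probabilities, the 0.5 threshold, and the weights (which might, for example, reflect each model's validation accuracy) are all hypothetical. Ensemble Stacking, which instead trains a meta-learner on the base models' outputs, is not shown.

```python
from typing import Optional, Sequence

# probs[m][i] = model m's predicted probability that sample i is positive
# (lung cancer = 1). All values used below are hypothetical toy numbers.
Probs = Sequence[Sequence[float]]


def hard_vote(probs: Probs, threshold: float = 0.5) -> list[int]:
    """Majority hard voting: each model casts one equal vote (its 0/1 label)."""
    preds = []
    for i in range(len(probs[0])):
        positive_votes = sum(1 for p in probs if p[i] >= threshold)
        # A strict majority of models must vote positive; ties fall to 0.
        preds.append(int(2 * positive_votes > len(probs)))
    return preds


def weighted_hard_vote(probs: Probs, weights: Sequence[float],
                       threshold: float = 0.5) -> list[int]:
    """Weighted hard voting: each model's 0/1 vote counts with its weight."""
    preds = []
    for i in range(len(probs[0])):
        pos = sum(w for p, w in zip(probs, weights) if p[i] >= threshold)
        neg = sum(w for p, w in zip(probs, weights) if p[i] < threshold)
        preds.append(int(pos > neg))  # ties fall to the negative class
    return preds


def soft_vote(probs: Probs, weights: Optional[Sequence[float]] = None,
              threshold: float = 0.5) -> list[int]:
    """Soft voting: threshold the (optionally weighted) mean probability."""
    if weights is None:
        weights = [1.0] * len(probs)
    total = sum(weights)
    return [
        int(sum(w * p[i] for p, w in zip(probs, weights)) / total >= threshold)
        for i in range(len(probs[0]))
    ]


if __name__ == "__main__":
    model_a = [0.90, 0.40, 0.60, 0.20]
    model_b = [0.80, 0.45, 0.30, 0.55]
    model_c = [0.30, 0.95, 0.40, 0.10]
    probs = [model_a, model_b, model_c]

    print(hard_vote(probs))                               # [1, 0, 0, 0]
    print(weighted_hard_vote(probs, [0.60, 0.25, 0.15]))  # [1, 0, 1, 0]
    print(soft_vote(probs))                               # [1, 1, 0, 0]
```

On this toy data the third sample is labelled positive only under weighted hard voting, because the heavily weighted model A outvotes the other two. The second sample is labelled positive only under soft voting, because model C's high confidence (0.95) lifts the mean probability above 0.5 even though two of the three models vote negative.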
Lung Cancer; Tabular Data; Machine Learning; Classification; Confusion Matrix; Heat Map
Shakil Khan, Mehedi Hasan, Mohammed Imam Hossain Tarek, Mostafizur Rahman Shakil, Md Fakrul islam Polash and Istiak Kabir. Generalizable Ensemble Learning Models for Early Lung Cancer Detection. World Journal of Advanced Engineering Technology and Sciences, 2025, 17(01), 200-217. Article DOI: https://doi.org/10.30574/wjaets.2025.17.1.1390.