Comparison of the Performances of Machine Learning Algorithms Using WEKA Feature Selection Methods
DOI: https://doi.org/10.5281/zenodo.14568594

Keywords: machine learning algorithms, software defect prediction, feature selection, accuracy score, WEKA

Abstract
This study investigates feature selection, a topic that appears in many publications on software fault prediction. Feature selection is generally used to increase classifier accuracy by removing irrelevant and redundant features from datasets. Various feature selection methods were tested on NASA datasets and an experimental dataset, and the two most suitable methods, the CfsSubsetEval algorithm and Principal Component feature selection, were applied. On this basis, an attempt was made to determine which classification algorithms achieve higher success rates.
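CfsSubsetEval ranks feature subsets by Hall's correlation-based merit: a subset scores well when its features correlate strongly with the class but weakly with each other. As an illustration only (not the WEKA implementation, and using Pearson correlation on numeric columns as a simplifying assumption), the merit can be sketched in plain Python:

```python
from math import sqrt

def pearson(x, y):
    # Pearson correlation between two equal-length numeric sequences.
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sqrt(sum((a - mx) ** 2 for a in x))
    vy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (vx * vy)

def cfs_merit(features, target):
    # features: list of feature columns; target: class column.
    # Merit = k * r_cf / sqrt(k + k*(k-1) * r_ff), where r_cf is the mean
    # feature-class correlation and r_ff the mean feature-feature correlation.
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
    if pairs:
        r_ff = sum(abs(pearson(features[i], features[j]))
                   for i, j in pairs) / len(pairs)
    else:
        r_ff = 0.0
    return k * r_cf / sqrt(k + k * (k - 1) * r_ff)
```

A subset search (e.g. best-first, as WEKA uses by default) would then keep the subset with the highest merit; a single feature identical to the class attains the maximum merit of 1.0.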
When the results were examined, an improvement in accuracy was generally observed, although some algorithms showed only a minimal difference. After different feature selection methods were tested on the JM1, KC1, CM1, and PC1 datasets, the 22 features present in all datasets were reduced to 8 by the two most suitable methods, the CfsSubsetEval algorithm and Principal Component feature selection. The accuracy of 46 classification algorithms was then calculated on the WEKA platform. The largest improvements in accuracy across all datasets were observed with the Bayes Net, Voted Perceptron, K*, and Random Forest algorithms.
It was observed that the loc, n, v, and defect attributes should be retained by every feature selection method applied to the NASA and experimental datasets. The loc (lines of code), n (total number of distinct operators and distinct operands), v (program volume), and defect (whether a fault is present) attributes are clearly important among the software metrics that make up each dataset.
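The n and v attributes above are Halstead complexity measures derived from a program's operator and operand tokens: the vocabulary n is the count of distinct operators plus distinct operands, and the volume v is the program length N times log2(n). A minimal stdlib Python sketch (the token lists are hypothetical inputs, assumed to come from an upstream parser):

```python
from math import log2

def halstead(operators, operands):
    # operators/operands: flat token lists extracted from a parsed program.
    n1, n2 = len(set(operators)), len(set(operands))  # distinct counts
    N1, N2 = len(operators), len(operands)            # total occurrences
    n = n1 + n2        # vocabulary (the "n" metric in the datasets)
    N = N1 + N2        # program length
    v = N * log2(n)    # program volume (the "v" metric), in bits
    return {"n": n, "N": N, "v": v}
```

For example, for a fragment with operator tokens `["=", "+", "=", "print"]` and operand tokens `["a", "b", "c", "a", "2"]`, the vocabulary is 3 + 4 = 7, the length is 9, and the volume is 9 * log2(7).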
Copyright (c) 2024 Euroasia Journal of Mathematics, Engineering, Natural & Medical Sciences
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.