Towards benchmarking feature subset selection methods for software fault prediction
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. Bahria University, Islamabad, Pakistan. ORCID iD: 0000-0003-0611-2655
Blekinge Institute of Technology, Karlskrona, Sweden; Chalmers University of Technology, Sweden.
2016 (English). In: Computational Intelligence and Quantitative Software Engineering / [ed] Witold Pedrycz, Giancarlo Succi and Alberto Sillitti, Springer-Verlag, 2016, 33-58 p. Chapter in book (Other academic)
Abstract [en]

Despite the general acceptance that software engineering datasets often contain noisy, irrelevant or redundant variables, very few benchmark studies of feature subset selection (FSS) methods on real-life data from software projects have been conducted. This paper provides an empirical comparison of state-of-the-art FSS methods: information gain attribute ranking (IG); Relief (RLF); principal component analysis (PCA); correlation-based feature selection (CFS); consistency-based subset evaluation (CNS); wrapper subset evaluation (WRP); and an evolutionary computation method, genetic programming (GP), on five fault prediction datasets from the PROMISE data repository. For all the datasets, the area under the receiver operating characteristic curve (the AUC value, averaged over 10-fold cross-validation runs) was calculated for each FSS method-dataset combination before and after FSS. Two diverse learning algorithms, C4.5 and naïve Bayes (NB), are used to test the attribute sets given by each FSS method. The results show that although there are no statistically significant differences between the AUC values for the different FSS methods for either C4.5 or NB, a smaller set of FSS methods (IG, RLF, GP) consistently selects fewer attributes without degrading classification accuracy. We conclude that, in general, FSS is beneficial as it helps improve the classification accuracy of NB and C4.5. There is no single best FSS method for all datasets, but IG, RLF and GP consistently select fewer attributes without degrading classification accuracy within statistically significant boundaries.
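
As an illustration of the simplest method named in the abstract, information gain attribute ranking (IG), the following is a minimal sketch and not the chapter's implementation. It assumes discrete attributes (numeric software metrics would need discretising first), and the attribute names and data are hypothetical, hand-made values that do not come from the PROMISE repository. Each attribute is scored by how much it reduces the entropy of the fault label, and the attributes are then ranked by that score.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a sequence of discrete labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def information_gain(feature, labels):
    """IG = H(labels) - H(labels | feature) for one discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the label within each feature value.
    conditional = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


def rank_by_information_gain(columns, labels):
    """Return (name, gain) pairs sorted from most to least informative."""
    gains = {name: information_gain(values, labels) for name, values in columns.items()}
    return sorted(gains.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy data: 8 modules, label 1 = faulty, 0 = not faulty.
faulty = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "high_complexity": [1, 1, 1, 1, 0, 0, 0, 0],  # matches the label exactly
    "many_authors":    [1, 1, 1, 0, 1, 0, 0, 0],  # partially related to the label
    "even_file_id":    [1, 0, 1, 0, 1, 0, 1, 0],  # unrelated to the label
}

ranking = rank_by_information_gain(columns, faulty)
for name, gain in ranking:
    print(f"{name}: {gain:.4f}")
```

A ranking method such as this only orders the attributes; choosing how many of the top-ranked attributes to keep is a separate decision that this sketch does not address.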

Place, publisher, year, edition, pages
Springer-Verlag, 2016. 33-58 p.
Series
Studies in Computational Intelligence, ISSN 1860-949X ; 617
Keyword [en]
software fault prediction, Feature subset selection, Empirical
National Category
Computer Systems; Software Engineering
Identifiers
URN: urn:nbn:se:mdh:diva-28129
DOI: 10.1007/978-3-319-25964-2_3
ISI: 000372751300004
Scopus ID: 2-s2.0-84955278082
ISBN: 978-3-319-25962-8 (print)
OAI: oai:DiVA.org:mdh-28129
DiVA: diva2:818796
Available from: 2015-06-09 Created: 2015-06-08 Last updated: 2016-06-09 Bibliographically approved

Authority records BETA

Afzal, Wasif
