BELIEF: A distance-based redundancy-proof feature selection method for Big Data
2021 (English). In: Information Sciences, ISSN 0020-0255, E-ISSN 1872-6291, Vol. 558, p. 124-139
Article in journal (Refereed). Published
Abstract [en]
With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge data and ease complex learning processes. Concretely, algorithms able to select relevant dimensions from a set of millions are of huge importance. Although effective, these techniques also suffer from the “scalability” curse when they are brought in to tackle large-scale problems. In this paper, we propose a distributed feature weighting algorithm that precisely estimates feature importance in large datasets, as the well-known RELIEF algorithm does in small problems. Our solution, called BELIEF, incorporates a novel redundancy elimination measure that generates schemes similar to those based on entropy, but at a much lower time cost. Furthermore, BELIEF provides a smooth scale-up when more instances are required to increase the precision of its estimations. Empirical tests performed on our method illustrate the estimation ability of BELIEF on many huge datasets, large in both number of features and number of instances, as well as its reduced runtime cost compared to other state-of-the-art methods.
Place, publisher, year, edition, pages
Elsevier Inc., 2021. Vol. 558, p. 124-139
Keywords [en]
Apache Spark, Big Data, Feature selection (FS), High-dimensional, Redundancy elimination
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-53524
DOI: 10.1016/j.ins.2020.12.082
ISI: 000634824100008
Scopus ID: 2-s2.0-85100519874
OAI: oai:DiVA.org:mdh-53524
DiVA, id: diva2:1541622
2021-04-01 2021-04-01 2022-08-29 Bibliographically approved