https://www.mdu.se/

BELIEF: A distance-based redundancy-proof feature selection method for Big Data
Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, Granada, Spain.
Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, Granada, Spain.
Department of Computer Science and Artificial Intelligence, CITIC-UGR, University of Granada, Granada, Spain.
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems. ORCID iD: 0000-0001-9857-4317
2021 (English). In: Information Sciences, ISSN 0020-0255, E-ISSN 1872-6291, Vol. 558, p. 124-139. Article in journal (Refereed), Published
Abstract [en]

With the advent of the Big Data era, data reduction methods are in high demand given their ability to simplify huge data and ease complex learning processes. Concretely, algorithms able to select relevant dimensions from a set of millions are of huge importance. Although effective, these techniques also suffer from the “scalability” curse when they are applied to large-scale problems. In this paper, we propose a distributed feature weighting algorithm which precisely estimates feature importance in large datasets using RELIEF, an algorithm well known in small problems. Our solution, called BELIEF, incorporates a novel redundancy elimination measure that generates schemes similar to those based on entropy, but at a much lower time cost. Furthermore, BELIEF scales up smoothly when more instances are required to increase the precision of its estimations. Empirical tests illustrate the estimation ability of BELIEF on a variety of huge datasets, large in both number of features and number of instances, as well as its reduced runtime cost compared to other state-of-the-art methods.
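
For readers unfamiliar with RELIEF, the sketch below illustrates the classic two-class Relief weight update that the abstract refers to. It is not the BELIEF algorithm: the distributed estimation and the redundancy elimination measure are not described in this record and are not reproduced here. The function name `relief_weights`, its parameters, and the toy data are hypothetical and chosen only for illustration.

```python
import random


def relief_weights(X, y, n_samples=None, seed=0):
    """Classic two-class Relief feature weights (illustrative, not BELIEF).

    Assumes numeric features and at least two instances per class.
    """
    n, d = len(X), len(X[0])

    # Per-feature value ranges, used to scale differences into [0, 1].
    spans = []
    for f in range(d):
        col = [row[f] for row in X]
        spans.append((max(col) - min(col)) or 1.0)

    def diff(f, a, b):
        return abs(a[f] - b[f]) / spans[f]

    def dist(a, b):
        return sum(diff(f, a, b) for f in range(d))

    rng = random.Random(seed)
    m = n_samples or n
    # Visit every instance once by default, otherwise sample with replacement.
    picks = list(range(n)) if m == n else [rng.randrange(n) for _ in range(m)]

    w = [0.0] * d
    for i in picks:
        hits = [j for j in range(n) if j != i and y[j] == y[i]]
        misses = [j for j in range(n) if y[j] != y[i]]
        hit = X[min(hits, key=lambda j: dist(X[i], X[j]))]    # nearest same-class neighbour
        miss = X[min(misses, key=lambda j: dist(X[i], X[j]))]  # nearest other-class neighbour
        for f in range(d):
            # Reward features that differ on the miss, penalise those that differ on the hit.
            w[f] += (diff(f, X[i], miss) - diff(f, X[i], hit)) / m
    return w


# Toy data: feature 0 separates the classes, feature 1 does not.
X = [[0.0, 0.3], [0.1, 0.9], [0.9, 0.2], [1.0, 0.8]]
y = [0, 0, 1, 1]
w = relief_weights(X, y)
print(w)
```

On this toy data the class-separating feature receives the larger weight. Each update needs a nearest-neighbour search over the instances, which is the kind of cost that motivates a distributed formulation for large datasets.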

Place, publisher, year, edition, pages
Elsevier Inc., 2021. Vol. 558, p. 124-139
Keywords [en]
Apache Spark, Big Data, Feature selection (FS), High-dimensional, Redundancy elimination
National Category
Computer Sciences
Identifiers
URN: urn:nbn:se:mdh:diva-53524
DOI: 10.1016/j.ins.2020.12.082
ISI: 000634824100008
Scopus ID: 2-s2.0-85100519874
OAI: oai:DiVA.org:mdh-53524
DiVA, id: diva2:1541622
Available from: 2021-04-01 Created: 2021-04-01 Last updated: 2022-08-29. Bibliographically approved

Open Access in DiVA

No full text in DiVA


Authority records

Xiong, Ning
