Quality versus efficiency in document scoring with learning-to-rank models
Mälardalen University, School of Innovation, Design and Engineering, Embedded Systems.
“Istituto di Scienza e Tecnologie dell'Informazione” (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy.
“Istituto di Scienza e Tecnologie dell'Informazione” (ISTI) of the National Research Council of Italy (CNR), Pisa, Italy.
University Ca’ Foscari of Venice, Italy.
2016 (English). In: Information Processing & Management, ISSN 0306-4573, E-ISSN 1873-5371, Vol. 52, no. 6, pp. 1161-1177. Article in journal (Refereed), Published.
Abstract [en]

Learning-to-Rank (LtR) techniques leverage machine learning algorithms and large amounts of training data to induce high-quality ranking functions. Given a set of documents and a user query, these functions predict a score for each document, and the scores are then used to rank the documents. Although the scoring efficiency of LtR models is critical in several applications (e.g., it directly affects the response time and throughput of Web query processing), it has received relatively little attention so far. The goal of this work is to experimentally investigate the scoring efficiency of LtR models along with their ranking quality. Specifically, we show that machine-learned ranking models exhibit a quality versus efficiency trade-off. For example, each family of LtR algorithms has tuning parameters that influence both effectiveness and efficiency, and higher ranking quality is generally obtained with more complex and more expensive models. Moreover, LtR algorithms that learn complex models, such as those based on forests of regression trees, are generally more expensive and more effective than algorithms that induce simpler models, such as linear combinations of features. We extensively analyze the quality versus efficiency trade-off of a wide spectrum of state-of-the-art LtR algorithms, and we propose a sound methodology to devise the most effective ranker given a time budget. To guarantee reproducibility, we use publicly available datasets and contribute an open source C++ framework providing optimized, multi-threaded implementations of the most effective tree-based learners: Gradient Boosted Regression Trees (GBRT), Lambda-Mart (λ-MART), and the first public-domain implementation of Oblivious Lambda-Mart (Ωλ-MART), an algorithm that induces forests of oblivious regression trees.
We investigate how the different training parameters affect the quality versus efficiency trade-off, and provide a thorough comparison of several algorithms in the quality-cost space. The experiments show that there is no overall best algorithm; the optimal choice depends on the time budget.
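To make the notion of an oblivious regression tree concrete, the following is a minimal Python sketch, not the authors' C++ implementation. It assumes the usual definition: every node at a given depth tests the same feature against the same threshold, so a tree of depth d is fully described by d (feature, threshold) pairs and 2^d leaf values. The function names and the tuple layout are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

# Hypothetical layout: (feature ids per level, thresholds per level, leaf values).
ObliviousTree = Tuple[Sequence[int], Sequence[float], Sequence[float]]


def score_oblivious_tree(features: Sequence[float], tree: ObliviousTree) -> float:
    """Score one document with one oblivious tree.

    Each level contributes one bit to the leaf index, because all nodes
    on a level share the same test (assumed definition of "oblivious").
    """
    feature_ids, thresholds, leaf_values = tree
    leaf = 0
    for fid, thr in zip(feature_ids, thresholds):
        bit = 1 if features[fid] > thr else 0
        leaf = (leaf << 1) | bit
    return leaf_values[leaf]


def score_forest(features: Sequence[float], forest: Sequence[ObliviousTree]) -> float:
    """Additive ensemble: the document score is the sum of the tree outputs."""
    return sum(score_oblivious_tree(features, tree) for tree in forest)
```

Because the leaf index is built from a fixed sequence of comparisons, scoring needs no data-dependent tree traversal, which is one plausible reason such forests are attractive when scoring time matters.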
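The idea of choosing the most effective ranker for a given time budget can be sketched as a simple selection over candidate models. This Python sketch is an illustration under stated assumptions and is not the methodology of the paper: the `Candidate` type, the field names, and the idea of measuring quality and per-document scoring cost on a validation set are all hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """A trained ranker summarized by two measurements (hypothetical fields)."""
    name: str
    quality: float  # ranking quality on validation data, higher is better
    cost_us: float  # scoring time per document, in microseconds


def best_within_budget(candidates: Sequence[Candidate],
                       budget_us: float) -> Optional[Candidate]:
    """Return the highest-quality candidate whose cost fits the budget.

    Returns None when no candidate is affordable.
    """
    affordable = [c for c in candidates if c.cost_us <= budget_us]
    return max(affordable, key=lambda c: c.quality, default=None)
```

With such a selection rule, a cheap linear model can win under a tight budget while a large tree forest wins under a loose one, which matches the abstract's conclusion that the optimal choice depends on the time budget.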

Place, publisher, year, edition, pages
2016. Vol. 52, no 6, 1161-1177 p.
Keyword [en]
Document scoring, Efficiency, Learning-to-rank, Artificial intelligence, Budget control, C++ (programming language), Economic and social effects, Forestry, Learning algorithms, Learning systems, Parameter estimation, Regression analysis, Boosted regression trees, Effectiveness and efficiencies, Learning to rank, Linear combinations, Multi-threaded implementation, Training parameters, Tree-based learners, Algorithms
National Category
Computer and Information Science
Identifiers
URN: urn:nbn:se:mdh:diva-33482
DOI: 10.1016/j.ipm.2016.05.004
ISI: 000385605300010
ScopusID: 2-s2.0-84991228123
OAI: oai:DiVA.org:mdh-33482
DiVA: diva2:1040771
Available from: 2016-10-28. Created: 2016-10-28. Last updated: 2016-12-22. Bibliographically approved.

By author/editor
Capannini, Gabriele