Term Ranking Adaptation to the Domain: Genetic Algorithm-Based Optimisation of the C-Value
LIMSI-CNRS, Orsay, France.
Mälardalen University, School of Education, Culture and Communication, Educational Sciences and Mathematics. (Mathematics and Applied Mathematics) ORCID iD: 0000-0002-1624-5147
Mälardalen University, School of Education, Culture and Communication, Educational Sciences and Mathematics. (Mathematics and Applied Mathematics) ORCID iD: 0000-0003-4554-6528
2014 (English). In: Advances in Natural Language Processing: 9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014. Proceedings, Springer International Publishing, 2014, Vol. 8686, 71-83 p. Conference paper, Published paper (Refereed)
Abstract [en]

Term extraction methods based on linguistic rules have been proposed to help build terminologies from corpora. Because these methods struggle to identify the relevant terms among the extracted noun phrases, statistical measures have been proposed. However, the term selection results may depend on the corpus and on strong assumptions reflecting specific terminological practice. We tackle this problem by proposing a parametrised C-Value which, thanks to a genetic algorithm, optimally considers the length and the syntactic roles of the nested terms. We compare its impact on the ranking of terms extracted from three corpora. Results show average precision increased by 9% above the frequency-based ranking and by 12% above the C-Value-based ranking.
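
For orientation, the baseline that the abstract refers to as the C-Value is usually stated as below. This is the standard definition from the term-extraction literature, not the parametrised variant proposed in the paper, whose exact form is not given in this record. The symbols are the conventional ones and are assumptions here: $a$ is a candidate term, $|a|$ its length in words, $f(a)$ its corpus frequency, $T_a$ the set of longer candidate terms that contain $a$, and $P(T_a)$ the number of such terms.

```latex
\[
\text{C-value}(a) =
\begin{cases}
\log_2 |a| \cdot f(a), & \text{if } a \text{ is not nested in a longer candidate term},\\[6pt]
\log_2 |a| \cdot \left( f(a) - \dfrac{1}{P(T_a)} \displaystyle\sum_{b \in T_a} f(b) \right), & \text{otherwise.}
\end{cases}
\]
```

The length factor $\log_2 |a|$ and the discount applied for nested occurrences are the two components the abstract describes as being adapted to the domain.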

Place, publisher, year, edition, pages
Springer International Publishing, 2014. Vol. 8686, 71-83 p.
Series
Lecture Notes in Computer Science, ISSN 0302-9743 ; 8686
Keyword [en]
Terminology; term extraction; term ranking; genetic algorithm
National Category
Mathematics; Computational Mathematics; Other Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
URN: urn:nbn:se:mdh:diva-27266
DOI: 10.1007/978-3-319-10888-9_8
Scopus ID: 2-s2.0-84921774624
ISBN: 978-3-319-10888-9 (print)
OAI: oai:DiVA.org:mdh-27266
DiVA: diva2:775379
Conference
9th International Conference on NLP, PolTAL 2014, Warsaw, Poland, September 17-19, 2014.
Available from: 2015-01-02 Created: 2015-01-02 Last updated: 2016-10-24 Bibliographically approved
In thesis
1. PageRank in Evolving Networks and Applications of Graphs in Natural Language Processing and Biology
2016 (English). Doctoral thesis, comprehensive summary (Other academic)
Abstract [en]

This thesis is dedicated to graph-based methods applied to ranking problems on the Web graph and to applications in natural language processing and biology.

Chapters 2-4 of this thesis are about PageRank and its use in ranking home pages on the Internet for search engines. PageRank is based on the assumption that a web page should be ranked highly if it is linked to by many other pages and/or by other important pages. This is modelled as the stationary distribution of a random walk on the Web graph.
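
As an illustration of the random-walk model described above, the following is a minimal power-iteration sketch. It is not the method developed in the thesis: the function name `pagerank`, the damping factor 0.85, the uniform treatment of pages without outgoing links, and the toy graph are all assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Approximate PageRank by power iteration.

    `links` maps each page to the list of pages it links to.
    All parameter names and defaults are illustrative assumptions.
    """
    pages = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from the uniform distribution
    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread uniformly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            for t in targets:
                # The walker follows one of p's links chosen uniformly at random.
                new[t] += damping * rank[p] / len(targets)
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical four-page web: "c" is linked to by every other page.
web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(web)
```

In this toy graph, page "c" receives the highest rank because every other page links to it, while "d", which has no incoming links, keeps only the baseline share contributed by the random jump.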

Due to the large size and quick growth of the Internet, it is important to be able to calculate this ranking very efficiently. One of the main topics of this thesis is how this can be done more efficiently, mainly by considering specific types of subgraphs and how PageRank can be calculated or updated for those types of graph structures. In particular, we consider the graph partitioned into strongly connected components and how this partitioning can be utilized.

Chapters 5-7 are dedicated to graph-based methods and their application to problems in natural language processing. Specifically, given a collection of texts (a corpus), we compare different clustering methods applied to pharmacovigilance terms (5), graph-based models for the identification of semantic relations between biomedical words (6), and modifications of the C-Value for the annotation of terms in a corpus (7).

In Chapters 8-9 we look at biological networks and the application of graph centrality measures to the identification of cancer genes. Specifically, in (8) we review different centrality measures and their application to finding cancer genes in biological networks, and in (9) we look at how well the centrality of vertices in the true network is preserved in networks generated from experimental data.

Place, publisher, year, edition, pages
Västerås: Mälardalen University, 2016
Series
Mälardalen University Press Dissertations, ISSN 1651-4238 ; 217
National Category
Mathematics
Research subject
Mathematics/Applied Mathematics
Identifiers
urn:nbn:se:mdh:diva-33459 (URN)
978-91-7485-298-1 (ISBN)
Public defence
2016-12-08, Kappa, Mälardalens högskola, Västerås, 13:15 (English)
Available from: 2016-10-24 Created: 2016-10-24 Last updated: 2016-11-23 Bibliographically approved

By author/editor
Engström, Christopher; Silvestrov, Sergei