Creating Maps of Science Using Topic Models : A Reproducibility Study
dc.contributor |
Helsingin yliopisto, Humanistinen tiedekunta |
fi |
dc.contributor |
University of Helsinki, Faculty of Arts |
en |
dc.contributor |
Helsingfors universitet, Humanistiska fakulteten |
sv |
dc.contributor.author |
An, Yu |
|
dc.date.issued |
2020 |
|
dc.identifier.uri |
URN:NBN:fi:hulib-202012155145 |
|
dc.identifier.uri |
http://hdl.handle.net/10138/322948 |
|
dc.description.abstract |
Maps of science, or cartography of scientific fields, provide insights into the state of scientific knowledge. Analogous to geographical maps, maps of science present fields as positions and show the paths connecting them, which can serve as an intuitive illustration of the history of science or as a hint for spotting potential opportunities for collaboration. In this work, I investigate the reproducibility of a method for generating such maps. The method derives representations for the given scientific fields with topic models and then performs hierarchical clustering on these representations, which yields a tree of scientific fields as the map. The result is found to be unreproducible: the map I obtain on the arXiv data set (~130k articles from arXiv Computer Science) shows a structure inconsistent with the one in the reference study. To investigate the cause of the inconsistency, I derive a second set of maps using the same method and an adjusted data set, constructed by re-sampling the arXiv data set to a more balanced distribution. The findings show that confounding factors in the data cannot account for the inconsistency; instead, it is more likely due to the stochastic nature of the unsupervised algorithm. I also improve the approach by using ensemble topic models to derive the representations. The method for deriving maps of science is found to be reproducible when it uses an ensemble topic model fused from a sufficient number of base models. |
en |
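The abstract describes the pipeline only at a high level. The sketch below illustrates its last two steps under stated assumptions: each field is represented by the mean of its documents' topic distributions, fields are compared with Jensen-Shannon divergence, and the tree is built by average-linkage agglomerative clustering. These choices, the function names and the toy topic proportions are hypothetical; the thesis's actual representation, distance and linkage may differ, and training the (ensemble) topic model itself is not shown.

```python
import math
from itertools import combinations

def mean_distribution(doc_topics):
    """Field representation (assumption): mean of its documents' topic distributions."""
    n = len(doc_topics)
    return [sum(column) / n for column in zip(*doc_topics)]

def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]
    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i = m_i > 0 whenever x_i > 0.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def average_linkage_tree(reps):
    """Agglomerative clustering with average linkage; returns a nested-tuple tree."""
    names = list(reps)
    dist = {}
    for a, b in combinations(names, 2):
        dist[(a, b)] = dist[(b, a)] = js_divergence(reps[a], reps[b])
    # Each cluster is (subtree, member field names); start from singletons.
    clusters = [(name, [name]) for name in names]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            mi, mj = clusters[i][1], clusters[j][1]
            # Average linkage: mean pairwise divergence between the two clusters.
            d = sum(dist[(x, y)] for x in mi for y in mj) / (len(mi) * len(mj))
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]

# Hypothetical toy input: per-document topic proportions (3 topics) for four arXiv CS categories.
fields = {
    "cs.LG": [[0.7, 0.2, 0.1], [0.6, 0.3, 0.1]],
    "cs.AI": [[0.55, 0.35, 0.1], [0.65, 0.25, 0.1]],
    "cs.CR": [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7]],
    "cs.NI": [[0.1, 0.2, 0.7], [0.1, 0.1, 0.8]],
}
reps = {name: mean_distribution(docs) for name, docs in fields.items()}
tree = average_linkage_tree(reps)
print(tree)  # → (('cs.LG', 'cs.AI'), ('cs.CR', 'cs.NI'))
```

Because the tree is computed deterministically from the field representations, any run-to-run variation in the topic model's output propagates directly into the map, which is consistent with the abstract's attribution of the inconsistency to the stochastic nature of the unsupervised algorithm.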
dc.language.iso |
eng |
|
dc.publisher |
Helsingin yliopisto |
fi |
dc.publisher |
University of Helsinki |
en |
dc.publisher |
Helsingfors universitet |
sv |
dc.subject |
machine learning |
|
dc.subject |
text mining |
|
dc.subject |
topic models |
|
dc.subject |
scientometrics |
|
dc.title |
Creating Maps of Science Using Topic Models : A Reproducibility Study |
en |
dc.type.ontasot |
pro gradu -tutkielmat |
fi |
dc.type.ontasot |
master's thesis |
en |
dc.type.ontasot |
pro gradu-avhandlingar |
sv |
dct.identifier.urn |
URN:NBN:fi:hulib-202012155145 |
|
dc.subject.specialization |
Kieliteknologia |
fi |
dc.subject.specialization |
Language Technology |
en |
dc.subject.specialization |
Språkteknologi |
sv |
dc.subject.degreeprogram |
Kielellisen diversiteetin ja digitaalisten menetelmien maisteriohjelma |
fi |
dc.subject.degreeprogram |
Master's Programme Linguistic Diversity in the Digital Age |
en |
dc.subject.degreeprogram |
Magisterprogrammet i språklig diversitet och digitala metoder |
sv |