Using Metadata to Analyze Trajectories of Finnish Newspapers

Näytä kaikki kuvailutiedot



Pysyväisosoite

http://urn.fi/URN:NBN:fi:hulib-202006243439
Julkaisun nimi: Using Metadata to Analyze Trajectories of Finnish Newspapers
Tekijä: Hussain, Zafar
Muu tekijä: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Julkaisija: Helsingin yliopisto
Päiväys: 2020
Kieli: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202006243439
http://hdl.handle.net/10138/316951
Opinnäytteen taso: pro gradu -tutkielmat
Koulutusohjelma: Datatieteen maisteriohjelma
Master's Programme in Data Science
Magisterprogrammet i data science
Opintosuunta: ei opintosuuntaa
no specialization
ingen studieinriktning
Oppiaine: none
Tiivistelmä: The National Library of Finland has digitized newspapers starting from late eighteenth century. Digitized data of Finnish newspapers is a heterogeneous data set, which contains the content and metadata of historical newspapers. This research work is focused to study this rich materiality data to find the data-driven categorization of newspapers. Since the data is not known beforehand, the objective is to understand the development of newspapers and use statistical methods to analyze the fluctuations in the attributes of this metadata. An important aspect of this research work is to study the computational and statistical methods which can better express the complexity of Finnish historical newspaper metadata. Exploratory analyses are performed to get an understanding of the attributes and extract the patterns among them. To explicate the attributes’ dependencies on each other, Ordinary Least Squares and Linear Regression methods are applied. The results of these regression methods confirm the significant correlation between the attributes. To categorize the data, spectral and hierarchical clustering methods are studied for grouping the newspapers with similar attributes. The clustered data further helps in dividing and understanding the data over time and place. Decision trees are constructed to split the newspapers after attributes’ logical divisions. The results of Random Forest decision trees show the paths of development of the attributes. The goal of applying various methods is to get a comprehensive interpretation of the attributes’ development based on language, time, and place and evaluate the usefulness of these methods on the newspaper data. From the features’ perspective, area appears as the most imperative feature and from language based comparison Swedish newspapers are ahead of Finnish newspapers in adapting popular trends of the time. Dividing the newspaper publishing places into regions, small towns show more fluctuations in publishing trends, while from the perspective of time the second half of twentieth century has seen a large increase in newspapers and publishing trends. This research work coordinates information on regions, language, page size, density, and area of newspapers and offers robust statistical analysis of newspapers published in Finland.
Avainsanat: Heterogeneous Data
Metadata Analyses
Exploratory Analyses
Newspaper Data
Finnish Newspapers


Tiedostot

Latausmäärä yhteensä: Ladataan...

Tiedosto(t) Koko Formaatti Näytä
Hussain_Zafar_Thesis_2020.pdf 5.761MB PDF Avaa tiedosto

Viite kuuluu kokoelmiin:

Näytä kaikki kuvailutiedot