Using Metadata to Analyze Trajectories of Finnish Newspapers

Permalink

http://urn.fi/URN:NBN:fi:hulib-202006243439
Title: Using Metadata to Analyze Trajectories of Finnish Newspapers
Author: Hussain, Zafar
Contributor: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
Permanent link (URI): http://urn.fi/URN:NBN:fi:hulib-202006243439
http://hdl.handle.net/10138/316951
Level: Master's thesis (pro gradu)
Degree programme: Datatieteen maisteriohjelma
Master's Programme in Data Science
Magisterprogrammet i data science
Study track: no specialization
Discipline: none
Abstract: The National Library of Finland has digitized newspapers dating back to the late eighteenth century. The digitized Finnish newspaper data form a heterogeneous data set that contains both the content and the metadata of historical newspapers. This research studies this rich materiality data in order to find a data-driven categorization of the newspapers. Since the data are not known beforehand, the objective is to understand the development of newspapers and to use statistical methods to analyze fluctuations in the metadata attributes. An important aspect of this work is to study which computational and statistical methods can best express the complexity of Finnish historical newspaper metadata. Exploratory analyses are performed to understand the attributes and extract patterns among them. To explicate how the attributes depend on each other, Ordinary Least Squares and linear regression methods are applied, and their results confirm significant correlations between the attributes. To categorize the data, spectral and hierarchical clustering methods are studied for grouping newspapers with similar attributes; the resulting clusters further help divide and interpret the data over time and place. Decision trees are constructed to split the newspapers along logical divisions of the attributes, and the results of the Random Forest decision trees show the development paths of the attributes. The goal of applying these various methods is to obtain a comprehensive interpretation of how the attributes developed by language, time, and place, and to evaluate the usefulness of these methods on the newspaper data. From the perspective of features, area appears to be the most important one, and in a language-based comparison, Swedish-language newspapers are ahead of Finnish-language newspapers in adopting the popular trends of the time.
When the publishing places are divided into regions, small towns show more fluctuation in publishing trends, while from the perspective of time, the second half of the twentieth century saw a large increase in newspapers and publishing trends. This research brings together information on the regions, language, page size, density, and area of newspapers and offers a robust statistical analysis of newspapers published in Finland.
Subject: Heterogeneous Data
Metadata Analyses
Exploratory Analyses
Newspaper Data
Finnish Newspapers
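The abstract names Ordinary Least Squares regression and hierarchical clustering among the methods applied to the newspaper metadata. As a rough, non-authoritative illustration of what these two techniques compute, the sketch below fits a one-predictor OLS line and runs average-linkage agglomerative clustering in plain Python. The function names `ols_fit` and `average_linkage`, the choice of average linkage, and all sample values are hypothetical and are not taken from the thesis; spectral clustering and Random Forests are not covered here.

```python
from itertools import combinations
from statistics import mean


def ols_fit(x, y):
    """Fit y = intercept + slope * x by ordinary least squares.

    Returns (slope, intercept, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Coefficient of determination: share of variance in y explained by the line.
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    return slope, intercept, 1.0 - ss_res / ss_tot


def average_linkage(points, n_clusters):
    """Agglomerative (hierarchical) clustering with average linkage.

    Starts with one cluster per point and repeatedly merges the pair of
    clusters whose mean pairwise Euclidean distance is smallest.
    Returns clusters as sorted lists of point indices.
    """
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b)) ** 0.5

    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for (i, ci), (j, cj) in combinations(enumerate(clusters), 2):
            d = mean(dist(points[p], points[q]) for p in ci for q in cj)
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # i < j, so index i is unaffected
    return sorted(sorted(c) for c in clusters)


if __name__ == "__main__":
    # Hypothetical attribute values, for illustration only.
    pages = [4, 4, 8, 8, 12, 16]
    area = [0.9, 1.1, 1.9, 2.1, 3.0, 4.0]
    slope, intercept, r2 = ols_fit(pages, area)
    print(f"slope={slope:.3f} intercept={intercept:.3f} R^2={r2:.3f}")

    points = [(4, 0.9), (4, 1.1), (16, 4.0), (16, 3.8)]
    print(average_linkage(points, n_clusters=2))  # [[0, 1], [2, 3]]
```

Because the distance is Euclidean on raw values, attributes measured in different units would normally be standardized before clustering; the sketch omits this for brevity.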


Files in this item


Hussain_Zafar_Thesis_2020.pdf (PDF, 5.761 MB)

