Using Metadata to Analyze Trajectories of Finnish Newspapers




Permalink

http://urn.fi/URN:NBN:fi:hulib-202006243439
Title: Using Metadata to Analyze Trajectories of Finnish Newspapers
Author: Hussain, Zafar
Other contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202006243439
http://hdl.handle.net/10138/316951
Thesis level: master's thesis
Degree program: Master's Programme in Data Science
Specialisation: no specialization
Discipline: none
Abstract: The National Library of Finland has digitized newspapers dating from the late eighteenth century onward. The digitized Finnish newspaper data is a heterogeneous data set that contains both the content and the metadata of historical newspapers. This research studies this rich data on the materiality of the newspapers to find a data-driven categorization of newspapers. Since the data is not known beforehand, the objective is to understand the development of newspapers and to use statistical methods to analyze fluctuations in the metadata attributes. An important aspect of this work is to study computational and statistical methods that can better express the complexity of Finnish historical newspaper metadata. Exploratory analyses are performed to understand the attributes and to extract patterns among them. To explicate the attributes’ dependencies on each other, Ordinary Least Squares and Linear Regression methods are applied. The results of these regression methods confirm significant correlation between the attributes. To categorize the data, spectral and hierarchical clustering methods are studied for grouping newspapers with similar attributes. The clustered data further helps in dividing and understanding the data over time and place. Decision trees are constructed to split the newspapers according to logical divisions of the attributes. The results of Random Forest decision trees show the paths of development of the attributes. The goal of applying these methods is to obtain a comprehensive interpretation of the attributes’ development based on language, time, and place, and to evaluate the usefulness of the methods on the newspaper data. From the features’ perspective, area appears to be the most important feature, and in the language-based comparison Swedish newspapers are ahead of Finnish newspapers in adopting the popular trends of the time.
When the newspaper publishing places are divided into regions, small towns show more fluctuation in publishing trends, while from the perspective of time the second half of the twentieth century saw a large increase in newspapers and publishing trends. This research coordinates information on the regions, language, page size, density, and area of newspapers and offers a robust statistical analysis of newspapers published in Finland.
Subject: Heterogeneous Data
Metadata Analyses
Exploratory Analyses
Newspaper Data
Finnish Newspapers


Files in this item


Hussain_Zafar_Thesis_2020.pdf (PDF, 5.761 Mb)
