Using Metadata to Analyze Trajectories of Finnish Newspapers


dc.contributor Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta fi
dc.contributor University of Helsinki, Faculty of Science en
dc.contributor Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten sv
dc.contributor.author Hussain, Zafar
dc.date.issued 2020
dc.identifier.uri URN:NBN:fi:hulib-202006243439
dc.identifier.uri http://hdl.handle.net/10138/316951
dc.description.abstract The National Library of Finland has digitized newspapers dating from the late eighteenth century onward. The digitized Finnish newspaper data is a heterogeneous data set that contains both the content and the metadata of historical newspapers. This research focuses on this rich materiality data, with the aim of finding a data-driven categorization of newspapers. Since the data is not known in advance, the objective is to understand the development of newspapers and to use statistical methods to analyze fluctuations in the attributes of this metadata. An important aspect of this work is studying which computational and statistical methods can best express the complexity of Finnish historical newspaper metadata. Exploratory analyses are performed to understand the attributes and extract patterns among them. To explain how the attributes depend on each other, Ordinary Least Squares and Linear Regression methods are applied. The results of these regression methods confirm a significant correlation between the attributes. To categorize the data, spectral and hierarchical clustering methods are studied for grouping newspapers with similar attributes. The clustered data further helps in dividing and understanding the data over time and place. Decision trees are constructed to split the newspapers according to logical divisions of the attributes. The results of the Random Forest decision trees show the paths along which the attributes developed. The goal of applying various methods is to obtain a comprehensive interpretation of the attributes' development based on language, time, and place, and to evaluate the usefulness of these methods on the newspaper data. From the features' perspective, area appears as the most important feature, and in the language-based comparison, Swedish-language newspapers are ahead of Finnish-language newspapers in adopting the popular trends of the time.
Dividing the newspaper publishing places into regions shows that small towns exhibit more fluctuation in publishing trends, while from the perspective of time, the second half of the twentieth century saw a large increase in newspapers and publishing trends. This research combines information on regions, language, page size, density, and area of newspapers and offers a robust statistical analysis of newspapers published in Finland. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.subject Heterogeneous Data
dc.subject Metadata Analyses
dc.subject Exploratory Analyses
dc.subject Newspaper Data
dc.subject Finnish Newspapers
dc.title Using Metadata to Analyze Trajectories of Finnish Newspapers en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline none und
dct.identifier.urn URN:NBN:fi:hulib-202006243439
dc.subject.specialization ei opintosuuntaa fi
dc.subject.specialization no specialization en
dc.subject.specialization ingen studieinriktning sv
dc.subject.degreeprogram Datatieteen maisteriohjelma fi
dc.subject.degreeprogram Master's Programme in Data Science en
dc.subject.degreeprogram Magisterprogrammet i data science sv
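The abstract names Ordinary Least Squares regression as the tool for testing how the metadata attributes depend on each other, but it does not describe the model setup. As a minimal, hypothetical illustration (not the thesis's own code), the sketch below fits a single-predictor OLS line in plain Python and reports R². The function name `ols_fit` and the toy input values are assumptions for illustration only; they are not data or results from the study.

```python
from statistics import mean


def ols_fit(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares.

    Hypothetical helper for illustration. Returns (intercept, slope, r_squared).
    Assumes xs and ys are equal-length numeric sequences with at least two
    distinct x values.
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    x_bar, y_bar = mean(xs), mean(ys)
    # Centred cross-products and squares give the closed-form OLS slope
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # R^2: share of the variance in y explained by the fitted line
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    # R^2 is undefined when y is constant, so return NaN in that case
    r_squared = 1.0 - ss_res / ss_tot if ss_tot else float("nan")
    return intercept, slope, r_squared


# Toy values, NOT thesis data: a hypothetical attribute x against another attribute y
intercept, slope, r2 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(intercept, slope, r2)  # 1.0 2.0 1.0
```

With several attributes at once, the same idea extends to multiple regression by solving the normal equations; a statistics library would normally be used for that, together with significance tests on the coefficients.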

Files in this item


Hussain_Zafar_Thesis_2020.pdf (PDF, 5.761 MB)