Transfer learning methods for palaeoecology : comparing local models and global models


Title: Transfer learning methods for palaeoecology : comparing local models and global models
Author: Lin, Han
Contributor: University of Helsinki, Faculty of Science, Department of Computer Science
Publisher: Helsingin yliopisto
Date: 2018
Language: eng
Thesis level: master's thesis
Abstract: To understand the relationship between organisms and their environment, and to reconstruct past environments in which the occurrence of animal species is known from fossils but the climate is unknown, we build predictive models using machine learning algorithms. The response variable is terrestrial net primary productivity (NPP), which represents the energy fixed and stored in vegetation. NPP is one of the main climate determinants, and previous research has shown that NPP can be robustly predicted from the dental traits of plant-eating mammals. The inputs are the global occurrences of large plant-eating mammals and their dental traits. Because species occurrences, their traits and climate data are not uniformly distributed over time and geographical space, models trained on all available data may have low prediction accuracy. To improve accuracy, we propose three types of local models in which the training data are chosen to be similar to the testing data: baseline models, hierarchical clustering based models (HCM) and advanced hierarchical clustering based models (AHCM). HCM and AHCM use hierarchical clustering to group data points, in order to find the training data that best match the testing data. Because the input data are not independently distributed over geographical space, model evaluation is not trivial; we therefore also propose vertical spatial cross validation (VSCV) for evaluating the performance of predictive models and tuning their parameters. In the experiments, ordinary least squares regression (OLS), decision trees, random forests, rotation forests and gradient boosting regressors are used in both global and local models. Model performance is measured by root mean squared error (RMSE) and mean absolute error (MAE). In one experiment, we apply VSCV to tune the parameters of all models. The baseline is the global OLS model, and Africa is the testing continent.
The experimental results show that no single model performs best in every small geographic region. We therefore develop a scheme for recommending which model to use in each region. For the Lake Turkana area, we recommend modified hierarchical clustering based models (MHCMs) and global models. MHCM is proposed as a new strategy for optimizing HCMs. In addition, we find that predictions for data points in the equatorial climate zone are the most reliable, and that prediction error over the African continent is symmetric about the equator. Finally, we demonstrate the applicability of our models with a case study of fossil data from the Turkana Basin in Africa between 0.01 and 7 million years ago (Ma). Going forward in time, NPP for the fossil data first decreases slowly, reaching its lowest value at around 2 to 3 Ma; it then increases and tends to stabilise. NPP between 4 and 7 Ma is higher than at present.
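The local-model idea in the abstract (cluster the data, then train only on the training points that fall in the same cluster as the test points) can be sketched as follows. This is a minimal illustration under stated assumptions, not the thesis's HCM or AHCM implementation: it assumes NumPy, uses a hand-written average-linkage agglomerative clustering, OLS as the only learner, a fallback to the global model for clusters with too few training points, and synthetic two-regime data in place of the dental-trait and NPP data. All function names are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred):
    """Root mean squared error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))


def mae(y_true, y_pred):
    """Mean absolute error."""
    diff = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(diff)))


def fit_ols(X, y):
    """Ordinary least squares with an intercept; returns [intercept, coefs...]."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_ols(coef, X):
    return coef[0] + X @ coef[1:]


def agglomerative_labels(X, n_clusters):
    """Average-linkage agglomerative (hierarchical) clustering.

    Repeatedly merges the two clusters with the smallest mean pairwise
    Euclidean distance until n_clusters remain; returns one label per row.
    Slow and quadratic in memory, which is acceptable for a toy example.
    """
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = dist[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    labels = np.empty(len(X), dtype=int)
    for k, members in enumerate(clusters):
        labels[members] = k
    return labels


def local_cluster_predict(X_train, y_train, X_test, n_clusters):
    """Hypothetical HCM-style local model (not the thesis's exact method).

    Clusters training and test points together, then fits OLS only on the
    training points that share each test point's cluster. Clusters with too
    few training points fall back to the global OLS model.
    """
    labels = agglomerative_labels(np.vstack([X_train, X_test]), n_clusters)
    train_lab, test_lab = labels[:len(X_train)], labels[len(X_train):]
    global_coef = fit_ols(X_train, y_train)
    preds = np.empty(len(X_test))
    for k in np.unique(test_lab):
        in_train, in_test = train_lab == k, test_lab == k
        if in_train.sum() <= X_train.shape[1] + 1:  # too few points for OLS
            coef = global_coef
        else:
            coef = fit_ols(X_train[in_train], y_train[in_train])
        preds[in_test] = predict_ols(coef, X_test[in_test])
    return preds


def make_two_regime_data(rng, n):
    """Synthetic data: two separated input regions with different linear responses."""
    xa = rng.normal(0.0, 1.0, size=(n, 2))
    xb = rng.normal(10.0, 1.0, size=(n, 2))
    y = np.concatenate([2.0 * xa[:, 0] + 1.0, -3.0 * xb[:, 0] + 50.0])
    return np.vstack([xa, xb]), y + rng.normal(0.0, 0.1, size=2 * n)


rng = np.random.default_rng(0)
X_train, y_train = make_two_regime_data(rng, 20)
X_test, y_test = make_two_regime_data(rng, 5)
global_pred = predict_ols(fit_ols(X_train, y_train), X_test)
local_pred = local_cluster_predict(X_train, y_train, X_test, n_clusters=2)
print(f"global OLS: RMSE={rmse(y_test, global_pred):.3f} MAE={mae(y_test, global_pred):.3f}")
print(f"local OLS:  RMSE={rmse(y_test, local_pred):.3f} MAE={mae(y_test, local_pred):.3f}")
```

Clustering the training and test points jointly lets each test point draw on training data with similar inputs, which is the motivation the abstract gives for local models; the fallback keeps predictions defined when a cluster contains few or no training points.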

Files in this item


HanLinMasterThesis.pdf (6.288 MB, PDF)

