Comparing descriptors for molecular clusters in unsupervised learning




Title: Comparing descriptors for molecular clusters in unsupervised learning
Author: Jääskeläinen, Matias
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto (University of Helsinki)
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202006122791
http://hdl.handle.net/10138/316568
Thesis level: master's thesis
Degree program: Master's Programme in Theoretical and Computational Methods
Specialization: none
Discipline: none
Abstract: This thesis explores descriptors for atmospheric molecular clusters. Descriptors are needed to apply machine learning methods to molecular systems. The DScribe library, developed at Aalto University, provides a collection of ready-made descriptors for custom machine learning applications, but the choice of which descriptor to use is left to the user. This study takes the first steps in integrating machine learning into an existing configurational sampling procedure, which aims to find the optimal structure of any given molecular cluster of interest. The structure selection step is a bottleneck in this procedure. The new structure selection method presented here uses k-means clustering to group structures that are similar to each other. The clustering results can be used to discard redundant structures more effectively than before, leaving fewer structures for the more expensive computations and thereby speeding up the configurational sampling procedure as a whole. To guide the choice of a suitable descriptor for this application, four descriptors available in DScribe are compared. A structure selection procedure was implemented that represents atmospheric clusters with descriptors and labels them into groups with k-means. The descriptors' performance was compared using a custom score designed for this application, and MBTR (many-body tensor representation) was found to outperform the other descriptors. This structure selection method will be used in the existing configurational sampling procedure for atmospheric molecular clusters, but it is not restricted to that application.
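The abstract describes the selection idea only in words: turn each cluster structure into a descriptor vector, group the vectors with k-means, and send fewer, non-redundant structures on to expensive calculations. The following is a minimal, self-contained sketch of that idea under stated assumptions, not the thesis's implementation. Assumptions: it does not use DScribe or MBTR; a sorted Coulomb-matrix eigenvalue spectrum stands in as a simple descriptor; k-means is a plain NumPy Lloyd iteration with deterministic farthest-point initialization; and keeping the structure nearest each cluster center is a hypothetical redundancy rule. All function names and the toy water "conformers" are illustrative.

```python
import numpy as np


def coulomb_eigen_descriptor(Z, R, size):
    """Stand-in descriptor: sorted Coulomb-matrix eigenvalues, zero-padded to `size`.

    Z: atomic numbers, shape (n,); R: Cartesian coordinates (angstrom here), shape (n, 3).
    The spectrum does not change under atom reordering, rotation or translation.
    """
    Z = np.asarray(Z, dtype=float)
    R = np.asarray(R, dtype=float)
    dist = np.linalg.norm(R[:, None, :] - R[None, :, :], axis=-1)
    with np.errstate(divide="ignore"):
        M = np.outer(Z, Z) / dist        # off-diagonal: Z_i * Z_j / |R_i - R_j|
    np.fill_diagonal(M, 0.5 * Z ** 2.4)  # diagonal (inf above) set to 0.5 * Z_i^2.4
    eig = np.sort(np.linalg.eigvalsh(M))[::-1]  # largest eigenvalue first
    out = np.zeros(size)
    out[: len(eig)] = eig
    return out


def kmeans(X, k, n_iter=100):
    """Plain Lloyd's k-means with deterministic farthest-point initialization."""
    chosen = [0]
    d = np.linalg.norm(X - X[0], axis=1)
    for _ in range(1, k):
        far = int(d.argmax())  # point farthest from every center chosen so far
        chosen.append(far)
        d = np.minimum(d, np.linalg.norm(X - X[far], axis=1))
    centers = X[chosen].copy()
    for _ in range(n_iter):
        labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
                        for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    # Final assignment so labels always match the returned centers.
    labels = np.linalg.norm(X[:, None] - centers[None], axis=-1).argmin(axis=1)
    return labels, centers


def select_representatives(X, labels, centers):
    """Hypothetical rule: keep only the structure closest to each cluster center."""
    keep = []
    for j, c in enumerate(centers):
        members = np.flatnonzero(labels == j)
        if members.size:
            keep.append(int(members[np.linalg.norm(X[members] - c, axis=1).argmin()]))
    return sorted(keep)


def toy_water(scale, bent=True):
    """Hypothetical toy 'conformers': bent (104.5 deg) or linear H-O-H."""
    r = 0.96 * scale
    theta = np.radians(104.5 if bent else 180.0)
    return [8, 1, 1], [[0, 0, 0], [r, 0, 0], [r * np.cos(theta), r * np.sin(theta), 0]]


structures = (
    [toy_water(s, bent=True) for s in (1.0, 1.001, 1.002)]
    + [toy_water(s, bent=False) for s in (1.0, 1.001, 1.002)]
)
X = np.array([coulomb_eigen_descriptor(Z, R, size=3) for Z, R in structures])
labels, centers = kmeans(X, k=2)
keep = select_representatives(X, labels, centers)
print(labels, keep)
```

Here the three nearly identical bent structures and the three nearly identical linear ones fall into separate clusters, and only one structure per cluster is kept. Keeping the member nearest each centroid is just one plausible way to "discard redundant structures"; in this kind of scheme the number of clusters k sets how many structures go on to the more expensive calculations, and swapping in a different descriptor only changes how X is built.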
Subject: configurational sampling
machine learning
k-means
molecular descriptors


Files in this item


MastersThesis_MatiasJaaskelainen.pdf (PDF, 8.583 MB)
