Review of popular word embedding models for event log anomaly detection purposes

Permalink

http://urn.fi/URN:NBN:fi:hulib-202003251659
Title: Review of popular word embedding models for event log anomaly detection purposes
Author: Tuulio, Ville
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-202003251659
http://hdl.handle.net/10138/313630
Thesis level: master's thesis
Discipline: Mathematics
Abstract: System logs are a diagnostic window into the health of a server. Logs are collected into files from which system administrators can monitor the server's status and events. The logs are usually unstructured textual messages that are difficult to go through manually because the volume of data keeps growing. Natural language processing offers a range of techniques for a computer to interpret textual data. Word2vec and fastText are popular word embedding methods that project words to vectors of real numbers. Doc2vec is an extension of Word2vec that does the same for paragraphs. With these embedding models I will attempt to create an anomaly detector to assist the log monitoring task. For the actual anomaly detection, I will use independent component analysis (ICA), a hidden Markov model (HMM) and long short-term memory (LSTM) to dig deeper into the vectorized event log messages. The embedding models are then reviewed for their performance in this task. The results of this study show no clear difference between the performance of Word2vec and fastText, whereas Doc2vec does not seem to work well with the short messages that event logs contain. The anomaly detector would still need some tuning in order to work reliably in production, but it is a decent attempt at a useful tool for event log analysis.


Files in this item

File: Tuulio_Ville_Pro_gradu_2020.pdf
Size: 2.195Mb
Format: PDF
