Review of popular word embedding models for event log anomaly detection purposes

Show simple item record

dc.contributor Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta fi
dc.contributor University of Helsinki, Faculty of Science en
dc.contributor Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten sv
dc.contributor.author Tuulio, Ville
dc.date.issued 2020
dc.identifier.uri URN:NBN:fi:hulib-202003251659
dc.identifier.uri http://hdl.handle.net/10138/313630
dc.description.abstract System logs are the diagnostic window to the state of health of the server. Logs are collected to files from which system administrators can monitor the status and events in the server. The logs are usually unstructured textual messages which are difficult to go through manually, because of the ever-growing data. Natural language processing contains different styles and techniques for a computer to interpret textual data. Word2vec and fastText are popular word embedding methods which project words to vectors of real numbers. Doc2vec is the equivalent for paragraphs and it is an extension to Word2vec. With these embedding models I will attempt to create an anomaly detector to assist the log monitoring task. For the actual anomaly detection, I will utilize Independent component analysis (ICA), Hidden Markov Model (HMM) and Long short-term memory to dig deeper in to the vectorized event log messages. The embedding models are then reviewed for their performance in this task. The results of this study show that there is no clear difference between the success of Word2vec and fastText, but it seems that Doc2vec does not work well with the short messages the event logs contain. The anomaly detector would still need some tuning in order to work reliably in production, but it is a decent attempt to achieve useful tool for event log analysing. en
dc.language.iso eng
dc.publisher Helsingin yliopisto fi
dc.publisher University of Helsinki en
dc.publisher Helsingfors universitet sv
dc.title Review of popular word embedding models for event log anomaly detection purposes en
dc.type.ontasot pro gradu -tutkielmat fi
dc.type.ontasot master's thesis en
dc.type.ontasot pro gradu-avhandlingar sv
dc.subject.discipline Matematiikka und
dct.identifier.urn URN:NBN:fi:hulib-202003251659

Files in this item

Total number of downloads: Loading...

Files Size Format View
Tuulio_Ville_Pro_gradu_2020.pdf 2.195Mb PDF View/Open

This item appears in the following Collection(s)

Show simple item record