Review of popular word embedding models for event log anomaly detection purposes


Title: Review of popular word embedding models for event log anomaly detection purposes
Author: Tuulio, Ville
Contributor: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
Thesis level: Master's thesis (pro gradu)
Subject: Mathematics
Abstract: System logs are a diagnostic window into the health of a server. Logs are collected into files from which system administrators can monitor the server's status and events. The logs are usually unstructured textual messages that are difficult to go through manually because the volume of data keeps growing. Natural language processing offers a range of techniques for a computer to interpret textual data. Word2vec and fastText are popular word embedding methods that project words to vectors of real numbers. Doc2vec is an extension of Word2vec that does the same for paragraphs. With these embedding models I will attempt to create an anomaly detector to assist the log monitoring task. For the actual anomaly detection, I will use independent component analysis (ICA), a hidden Markov model (HMM) and long short-term memory (LSTM) to dig deeper into the vectorized event log messages. The embedding models are then reviewed for their performance in this task. The results of this study show no clear difference between the success of Word2vec and fastText, but Doc2vec does not seem to work well with the short messages that event logs contain. The anomaly detector would still need some tuning to work reliably in production, but it is a decent attempt at a useful tool for event log analysis.



File(s) Size Format
Tuulio_Ville_Pro_gradu_2020.pdf 2.195MB PDF

