Review of popular word embedding models for event log anomaly detection purposes


Title: Review of popular word embedding models for event log anomaly detection purposes
Author: Tuulio, Ville
Contributor: Helsingin yliopisto, Matemaattis-luonnontieteellinen tiedekunta
University of Helsinki, Faculty of Science
Helsingfors universitet, Matematisk-naturvetenskapliga fakulteten
Publisher: Helsingin yliopisto
Date: 2020
Language: eng
Permanent link (URI):
Level: Master's thesis (pro gradu)
Subject: Mathematics
Abstract: System logs are the diagnostic window into a server's state of health. Logs are collected into files from which system administrators can monitor the server's status and events. The logs are usually unstructured textual messages that are difficult to go through manually because the volume of data keeps growing. Natural language processing offers various techniques for a computer to interpret textual data. Word2vec and fastText are popular word embedding methods that map words to vectors of real numbers. Doc2vec is an extension of Word2vec that does the same for paragraphs. With these embedding models I will attempt to create an anomaly detector to assist the log monitoring task. For the actual anomaly detection, I will use independent component analysis (ICA), a hidden Markov model (HMM) and long short-term memory (LSTM) to dig deeper into the vectorized event log messages. The embedding models are then reviewed for their performance in this task. The results of this study show no clear difference between the performance of Word2vec and fastText, but Doc2vec does not seem to work well with the short messages that event logs contain. The anomaly detector would still need some tuning to work reliably in production, but it is a decent attempt at a useful tool for event log analysis.

Files in this item


File Size Format
Tuulio_Ville_Pro_gradu_2020.pdf 2.195Mb PDF
