SMS Spam Detection in a Real-World Platform using Machine Learning

Permalink

http://urn.fi/URN:NBN:fi:hulib-201908133203
Title: SMS Spam Detection in a Real-World Platform using Machine Learning
Author: Rodriguez Villanueva, Cesar Adolfo
Contributor: University of Helsinki, Faculty of Science
Publisher: Helsingin yliopisto
Date: 2019
Language: eng
URI: http://urn.fi/URN:NBN:fi:hulib-201908133203
http://hdl.handle.net/10138/304706
Thesis level: master's thesis
Discipline: Algorithms and Machine Learning
Abstract: Spam detection techniques have made our lives easier by unclogging our inboxes and keeping unsafe messages from being opened. With the automation of text messaging solutions and the growing number of telecommunication companies and message providers, the volume of text messages has been on the rise. This growth has brought with it malicious traffic over which users have little control. In this thesis, we present an implementation of a spam detection system in a real-world text messaging platform. Using well-established machine learning algorithms, we analyze the performance of the models in depth on two datasets: one publicly available (N=5,574) and the other gathered from actual traffic on the platform (N=1,477). Based on the empirical results, we outline the models and hyperparameters that can be used in the platform and the scenarios in which they perform best. The results indicate that our dataset poses a considerable challenge for accurate classification, most likely due to its small sample size and class imbalance, along with nuances in the data. Nevertheless, some models were found to perform well all around, and they can be trained and used in the platform.


Files in this item

File: cesar_rodriguez_thesis.pdf
Size: 3.781Mb
Format: PDF