dc.contributor 
Helsingin yliopisto, Matemaattisluonnontieteellinen tiedekunta, Tietojenkäsittelytieteen laitos 
fi 
dc.contributor 
University of Helsinki, Faculty of Science, Department of Computer Science 
en 
dc.contributor 
Helsingfors universitet, Matematisknaturvetenskapliga fakulteten, Institutionen för datavetenskap 
sv 
dc.contributor.author 
Jääsaari, Elias 

dc.date.issued 
2017 

dc.identifier.uri 
URN:NBN:fi:hulib201711135689 

dc.identifier.uri 
http://hdl.handle.net/10138/228836 

dc.description.abstract 
Tasks such as data compression and prediction commonly require choosing a probability distribution over all possible sequences. To achieve an efficient prediction strategy, the chosen distribution should be a good approximation of the true distribution underlying the data. Similarly, an efficient compression strategy should assign shorter codes for more probable sequences. In particular, a compression strategy that minimizes the codelength can be shown to minimize the oftenused logarithmic prediction loss. However, the optimal strategy requires knowing the true distribution which is not available in most applications.
In universal compression or prediction we assume that the true probability distribution is not known but belongs to a known class of distributions. A universal code is a code that can compress the data essentially as well as the best distribution in the class in hindsight. Similarly, a universal predictor achieves low prediction loss regardless of the distribution. We call a universal code minimax optimal if it minimizes the worstcase regret, i.e. excess codelength or prediction loss compared to the best distribution in the class.
In this thesis we assume the known class to be discrete memoryless sources. The minimax optimal code for this class is given by the normalized maximum likelihood (NML) distribution. However, in practice computationally more efficient distributions such as Bayes mixtures have to be used. A Bayes mixture is a mixture of the probability distributions in the class weighted by a prior distribution. The conjugate prior to the multinomial distribution is the Dirichlet distribution, using which asymptotically minimax codes have been developed. The Dirichlet distribution requires a hyperparameter that dictates the amount of prior mass given to the outcomes. The distribution given by the symmetric hyperparameter 1/2 has been widely studied and has been shown to minimize the worstcase expected regret asymptotically.
Previous work on minimax optimal Bayes mixtures has mainly been concerned with large sample sizes in comparison to the alphabet size. In this thesis we investigate the minimax optimal Dirichlet prior in the large alphabet setting. In particular, we find that when the alphabet size is large compared to the sample size, the optimal hyperparameter for the Dirichlet distribution is 1/3. The worstcase regret of this mixture turns out to approach the NML regret when the alphabet size grows and the distribution provides an efficient approximation of the NML distribution. Furthermore, we develop an efficient algorithm for finding the optimal hyperparameter for any sample size or alphabet size. 
en 
dc.language.iso 
en 

dc.publisher 
Helsingfors universitet 
sv 
dc.publisher 
University of Helsinki 
en 
dc.publisher 
Helsingin yliopisto 
fi 
dc.title 
Minimax Optimal Bayes Mixtures for Memoryless Sources 
en 
dc.type.ontasot 
pro graduavhandlingar 
sv 
dc.type.ontasot 
pro gradu tutkielmat 
fi 
dc.type.ontasot 
master's thesis 
en 
dc.subject.discipline 
Datavetenskap 
sv 
dc.subject.discipline 
Computer science 
en 
dc.subject.discipline 
Tietojenkäsittelytiede 
fi 
dct.identifier.urn 
URN:NBN:fi:hulib201711135689 
