Browsing by Title

  • Ilves, Airi (Helsingin yliopisto, 2016)
    The study analyses the widening scope of competition law in the area of intellectual property rights and the risk that the compulsory licensing remedy poses for intellectual property owners in the European Union market. Despite the great amount of legal literature discussing the topic, it remains a controversial and developing area of European Union competition law. An intellectual property owner operating in Europe should draw on the case law of the Court of Justice of the European Union on compulsory licensing to protect its commercial interests and to assess the risk that the European Commission or a Member State court will find a compulsory licence to be the appropriate remedy if the parties do not reach a licensing agreement through their own negotiations. A refusal to license has been considered an abuse of a dominant position under Article 102 of the Treaty on the Functioning of the European Union (TFEU). Through their decisions, the EU authorities have developed a list of "exceptional circumstances" under which a refusal to license constitutes an abuse under Article 102. The Court of Justice of the European Union develops EU law by dynamic interpretation, so the primary source for addressing the research topic is its case law. The scope of this work is limited to an analysis of the most noteworthy cases in EU jurisprudence concerning Article 102 TFEU and refusal to license. In some situations where IP law fails to guarantee a sufficient level of innovation in the market, intervention by competition law may be justified, as happened, for example, on the facts of Magill. The landmark decision of the Court of Justice is IMS Health, which sets forth the legal standard applicable in the European Union today. European policy is also assessed in the light of recent European Commission decisions and General Court case law. The most recent compulsory licensing case, Microsoft, is examined to analyse policy developments and to consider what test might be applied under European competition law in future cases. The thesis examines whether competition law in Europe has shifted towards a more economic, effects-based approach, and how the relationship between intellectual property and competition law may be seen as complementary rather than antagonistic. The characteristics that distinguish intellectual property rights from "normal" property rights are discussed in the light of the development of the case law, and an analysis is conducted of the rationale of the new-product criterion of the exceptional circumstances test. When considering the effectiveness of the jurisprudence, it is necessary to balance effective competition on the market against the encouragement of further innovation. Intellectual property protection plays an important role in promoting technological development and thus also in providing more choice for consumers.
The exceptional circumstances test created by the Court of Justice is formalistic and does not fully take into account situations where an intellectual property owner may block innovation. It must be stressed, however, that courts are generally not well equipped to conduct the effects-based cost-benefit analysis needed to balance the incentives of the dominant undertaking and of its competitors to innovate, and such an evaluation may prove a difficult task for the judiciary. The standards developed in the case law are fact-specific and ultimately a source of uncertainty for undertakings in the EU market. The study gathers together the most significant snapshots of the law and assesses where EU jurisprudence on compulsory licensing is heading. The author concludes that the law on compulsory licensing in Europe will continue to evolve towards weaker intellectual property protection in order to advance competition, innovation and the free movement of goods; nevertheless, despite the widening scope of European competition law, the conditions for issuing compulsory licences remain highly restrictive.
  • Lahesmaa-Korpinen, Anna-Maria (Helsingin yliopisto, 2012)
    Proteins are key components in biological systems, as they mediate the signaling responsible for information processing in cells and organisms. In biomedical research, one goal is to elucidate the mechanisms of cellular signal transduction pathways to identify possible defects that cause disease. Advances in technologies such as mass spectrometry and flow cytometry enable the measurement of multiple proteins from a system. Proteomics, the large-scale study of the proteins of a system, thus plays an important role in biomedical research. The analysis of high-throughput proteomics data requires advanced computational methods, and the combination of bioinformatics and proteomics has therefore become an important part of research into signal transduction pathways. The main objective of this study was to develop and apply computational methods for the preprocessing, analysis and interpretation of high-throughput proteomics data. The methods focused on data from tandem mass spectrometry and single-cell flow cytometry, and on the integration of proteomics data with gene expression microarray data and information from various biological databases. Overall, the methods developed and applied in this study have led to new ways of managing and preprocessing proteomics data. Additionally, the available tools have successfully been used to help interpret biomedical data and to facilitate analyses that would have been cumbersome without computational methods.
  • Ta, Hung (Helsingin yliopisto, 2012)
    Living systems, which are composed of biological components such as molecules, cells, organisms or entire species, are dynamic and complex. Their behaviors are difficult to study in terms of the properties of individual elements. To study them, we use quantitative techniques in the "omic" fields such as genomics, bioinformatics and proteomics to measure the behavior of groups of interacting components, and we use mathematical and computational modeling to describe and predict their dynamical behavior. The first step in understanding a biological system is to investigate how its individual elements interact with each other; this step consists of drawing a static wiring diagram that connects the individual parts. Experimental techniques are designed to observe interactions among the biological components in the laboratory, while computational approaches are designed to predict interactions among the individual elements based on their properties. In the first part of this thesis, we present techniques for network inference that are particularly targeted at protein-protein interaction networks. These techniques include comparative genomics, structure-based and biological-context methods, and integrated frameworks. We evaluate and compare the prediction methods that have been most often used for domain-domain interactions, and we discuss the limitations of the methods and data resources. We introduce the concept of the Enhanced Phylogenetic Tree, a new graphical presentation of the evolutionary history of protein families; we then propose a novel method for assigning functional linkages to proteins. This method was applied to predicting both human and yeast protein functional linkages. The next step is to obtain insight into the dynamical aspects of biological systems. One of the overarching goals of systems biology is to understand the emergent properties of living systems, i.e., to understand how the individual components of a system come together to form distinct, collective and interactive properties and functions. The emergent properties of a system are neither found in nor directly deducible from the lower-level properties of that system. An example of an emergent property is synchronization, a dynamical state of complex network systems in which the individual components of the systems behave coherently, almost in unison. In the second part of the thesis, we apply computational modeling to mimic and simplify real-life complex systems. We focus on clarifying how the network topology determines the initiation and propagation of synchronization. A simple but efficient method is proposed to reconstruct network structures from the functional behavior of oscillatory systems such as the brain. We study the feasibility of network reconstruction systematically for different coupling regimes and different network topologies. We utilize the Kuramoto model, an interacting system of oscillators that is simple but relevant enough to address our questions; a minimal sketch of the model is given below.
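    A minimal sketch of the Kuramoto dynamics referred to above, assuming simple Euler integration and an all-to-all topology. The network size, coupling strength K and adjacency matrix A are illustrative choices, and A is exactly where a different network topology would be plugged in to study how structure shapes synchronization.

```python
# Kuramoto model: N coupled phase oscillators with natural frequencies
# omega_i, coupled through an adjacency matrix A (here: all-to-all).
import numpy as np

def kuramoto_step(theta, omega, K, A, dt=0.01):
    """One Euler step of d(theta_i)/dt = omega_i + (K/N) sum_j A_ij sin(theta_j - theta_i)."""
    n = len(theta)
    coupling = (A * np.sin(theta[None, :] - theta[:, None])).sum(axis=1)
    return theta + dt * (omega + K / n * coupling)

def order_parameter(theta):
    """|r| = 1 means full synchrony, |r| ~ 0 means incoherence."""
    return abs(np.exp(1j * theta).mean())

rng = np.random.default_rng(0)
n = 100
theta = rng.uniform(0, 2 * np.pi, n)   # random initial phases
omega = rng.normal(0, 1, n)            # natural frequencies
A = np.ones((n, n)) - np.eye(n)        # all-to-all topology; swap in any graph here
for _ in range(5000):
    theta = kuramoto_step(theta, omega, K=4.0, A=A)
print(f"order parameter r = {order_parameter(theta):.2f}")  # high r: synchronized
```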
  • Floréen, Patrik (Helsingin yliopisto, 1992)
  • Herrmann, Erik (Helsingin yliopisto, 2010)
    Nucleation is the first step in the formation of a new phase inside a mother phase. Two main forms of nucleation can be distinguished. In homogeneous nucleation, the new phase is formed in a uniform substance. In heterogeneous nucleation, on the other hand, the new phase emerges on a pre-existing surface (nucleation site). Nucleation is the source of about 30% of all atmospheric aerosol, which in turn has noticeable health effects and a significant impact on climate. Nucleation can be observed in the atmosphere, studied experimentally in the laboratory, and is the subject of ongoing theoretical research. This thesis attempts to be a link between experiment and theory. By comparing simulation results to experimental data, the aim is to (i) better understand the experiments and (ii) determine where the theory needs improvement. Computational fluid dynamics (CFD) tools were used to simulate homogeneous one-component nucleation of n-alcohols in argon and helium as carrier gases, homogeneous nucleation in the water-sulfuric acid system, and heterogeneous nucleation of water vapor on silver particles. In the nucleation of n-alcohols, vapor depletion, the carrier gas effect and the carrier gas pressure effect were evaluated, with a special focus on the pressure effect, whose dependence on vapor and carrier gas properties could be specified. The investigation of nucleation in the water-sulfuric acid system included a thorough analysis of the experimental setup, determining the flow conditions, vapor losses and nucleation zone. Experimental nucleation rates were compared to various theoretical approaches; none of the considered theoretical descriptions of nucleation captured the role of water in the process at all relative humidities. Heterogeneous nucleation was studied in the activation of silver particles in a TSI 3785 particle counter, which uses water as its working fluid. The role of the contact angle was investigated, and the influence of incoming particle concentrations and homogeneous nucleation on counting efficiency was determined.
  • Cervera Taboada, Alejandra (2012)
    High-throughput technologies have had a profound impact on transcriptomics. Prior to microarrays, measuring gene expression was not possible in a massively parallel way. More recently, deep RNA sequencing has been steadily gaining ground on microarrays in transcriptomics analysis. RNA-Seq promises several advantages over microarray technologies, but it also comes with its own set of challenges. Different approaches exist for each of the required processing steps of RNA-Seq data, and the proposed solutions need to be carefully evaluated to find the best methods for the particularities of each dataset and the specific research questions being addressed. In this thesis I propose a computational framework that allows the efficient analysis of RNA-Seq datasets. The parallelization of tasks and the organization of data files were handled by the Anduril framework, on which the workflow was implemented. Particular emphasis was placed on the quality control of the RNA-Seq files: several measures were taken to prune the data of low-quality bases and reads that hamper the alignment step. Furthermore, various existing processing algorithms for transcript assembly and abundance estimation were tested. The best methods have been coupled together into an automated pipeline that takes the raw reads and delivers expression matrices at isoform and gene level. Additionally, a module is included for obtaining sets of differentially expressed genes between conditions or across a time course.
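    A minimal sketch of one such quality-control step, assuming Sanger-encoded FASTQ qualities (ASCII offset 33) and an illustrative threshold of Q20: low-quality 3' tails are removed so they do not hamper the alignment step. The thesis pipeline's actual pruning measures and parameters are not reproduced here.

```python
# Trim low-quality bases from the 3' end of a sequencing read.
def trim_3prime(seq, quals, min_q=20):
    """Cut the read at the start of its low-quality 3' tail."""
    end = len(seq)
    while end > 0 and quals[end - 1] < min_q:
        end -= 1
    return seq[:end], quals[:end]

def phred(qual_string, offset=33):
    """Decode an ASCII-encoded FASTQ quality string (Sanger offset 33)."""
    return [ord(c) - offset for c in qual_string]

seq, quals = trim_3prime("ACGTACGTTT", phred("IIIIIII###"))
print(seq)  # ACGTACG -- the three '#' (Q2) tail bases are trimmed
```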
  • Kankainen, Matti (Helsingin yliopisto, 2015)
    Lactobacilli are generally harmless gram-positive lactic acid bacteria, well known for their broad spectrum of beneficial effects on human health and for their use in food production. However, relatively little is known at the molecular level about the relationships between lactobacilli and humans, or about their food-processing abilities. The aim of this thesis was to establish bioinformatics approaches for classifying proteins involved in the health effects and food-production abilities of lactobacilli, and to elucidate the functional potential of two biomedically important Lactobacillus species using whole-genome sequencing. To facilitate the genome-based analysis of lactobacilli, two new bioinformatics approaches were developed for the systematic analysis of protein function. The first, called LOCP, fulfilled the need for accurate genome-wide annotation of putative pilus operons in gram-positive bacteria, whereas the second, BLANNOTATOR, represented an improved homology-based solution for the general function annotation of bacterial proteins. Importantly, both approaches showed superior accuracy in evaluation tests and proved useful in finding information missed by other homology-search methods, illustrating their added value to the current repertoire of function classification systems. Their application also led to the discovery of several putative pilus operons and new potential effector molecules in lactobacilli, including many of the key findings of this thesis. Lactobacillus rhamnosus GG is one of the clinically best-studied Lactobacillus strains and has a long history of safe use in the food industry. Whole-genome sequencing of strain GG and of the closely related dairy strain L. rhamnosus LC705 revealed two almost identical genomes, despite the physiological differences between the strains. Notwithstanding this extensive genomic similarity, a genomic region containing genes for three pilin subunits and a pilin-dedicated sortase was present only in GG. The presence of these pili on the cell surface of L. rhamnosus GG was confirmed, and one of the GG-specific pilins was shown to be central to the mucus interaction of strain GG. These discoveries established the presence of gram-positive pilus structures in non-pathogenic bacteria as well, and provided a long-awaited explanation for the highly efficient adhesion of strain GG to the intestinal mucosa. The other Lactobacillus species investigated in this thesis was Lactobacillus crispatus. To gain insight into its physiology and to identify components by which this important constituent of the healthy human vagina may promote urogenital health, the genome of a representative L. crispatus strain was sequenced and compared to those of nine others. These analyses provided an accurate account of features associated with vaginal health and revealed a set of 1,224 gene families that were universally conserved across all ten strains and, most likely, across the entire L. crispatus species. Importantly, this gene set was shown to contain adhesion genes involved in the displacement of the bacterial vaginosis-associated Gardnerella vaginalis from vaginal cells, providing a molecular explanation for the inverse association between L. crispatus and G. vaginalis colonisation of the vagina.
Taken together, the present study demonstrates the power of whole-genome sequencing and computer-assisted genome annotation in identifying genes that are involved in host interactions and have industrial value. The discovery of gram-positive pili in L. rhamnosus GG and of the mechanism by which L. crispatus excludes G. vaginalis from vaginal cells are both major steps forward in understanding the interaction between lactobacilli and their host. We envisage that these findings, together with the developed bioinformatics methods, will aid the improvement of probiotic products and human health in the future.
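    A minimal sketch of baseline homology-based annotation transfer, assuming precomputed similarity hits and an illustrative bitscore cutoff: each protein inherits the description of its best-scoring characterized hit. BLANNOTATOR improves on exactly this kind of naive best-hit transfer, so the sketch shows the baseline rather than the method itself; all identifiers and scores below are made up.

```python
# Naive best-hit annotation transfer over a list of homology search hits.
def annotate(hits, min_bitscore=50.0):
    """hits: list of (query, subject_description, bitscore) tuples."""
    best = {}
    for query, desc, score in hits:
        if score >= min_bitscore and score > best.get(query, ("", 0.0))[1]:
            best[query] = (desc, score)
    return {q: d for q, (d, _) in best.items()}

hits = [
    ("geneA", "pilin subunit", 210.0),          # strongest hit wins
    ("geneA", "hypothetical protein", 55.0),
    ("geneB", "sortase family protein", 48.0),  # below cutoff: left unannotated
]
print(annotate(hits))  # {'geneA': 'pilin subunit'}
```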
  • Laakso, Marko (Helsingin yliopisto, 2007)
    This thesis presents a highly sensitive genome-wide search method for recessive mutations. The method is suitable for distantly related samples that are divided into phenotype positives and negatives. High-throughput genotyping arrays are used to identify and compare homozygous regions between the cohorts. The method is demonstrated by comparing colorectal cancer patients against unaffected references; the objective is to find homozygous regions and alleles that are more common in cancer patients. We have designed and implemented software tools to automate the data analysis from genotypes to lists of candidate genes and their properties. The programs follow a pipeline architecture that allows their integration with other programs, such as biological databases and copy number analysis tools. This integration is crucial, as the genome-wide analysis of cohort differences produces many candidate regions unrelated to the studied phenotype. CohortComparator is a genotype comparison tool that detects homozygous regions and compares their loci and allele constitutions between two sets of samples. The data are visualised in chromosome-specific graphs illustrating the homozygous regions and alleles of each sample. Genomic regions that may harbour recessive mutations are emphasised with different colours, and a scoring scheme is given for these regions. The detection of homozygous regions, the cohort comparisons and the result annotations all rest on assumptions, many of which have been parameterized in our programs. The effect of these parameters and the suitable scope of the methods have been evaluated. Samples of different resolutions can be balanced with genotype estimates of their haplotypes, allowing them to be used within the same study.
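    A minimal sketch of the first ingredient of such a comparison, detecting runs of homozygosity in one sample's ordered genotype calls; the minimum run length is an illustrative threshold, not a parameter value from CohortComparator.

```python
# Scan ordered biallelic genotype calls (e.g. "AA", "AG") for long
# homozygous stretches, the raw material for the cohort comparison.
def homozygous_runs(genotypes, min_len=25):
    """Yield (start, end) index pairs of homozygous stretches >= min_len markers."""
    start = None
    for i, g in enumerate(genotypes):
        if g[0] == g[1]:                       # homozygous call
            start = i if start is None else start
        else:                                  # heterozygous call ends the run
            if start is not None and i - start >= min_len:
                yield (start, i)
            start = None
    if start is not None and len(genotypes) - start >= min_len:
        yield (start, len(genotypes))

sample = ["AA"] * 30 + ["AG"] + ["TT"] * 10
print(list(homozygous_runs(sample)))  # [(0, 30)] -- the short TT run is ignored
```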
  • Sharma, Vivek (Helsingin yliopisto, 2012)
    Heme-copper oxidases terminate the respiratory chain in many eukaryotes and prokaryotes as the final electron acceptors. They catalyze the reduction of molecular oxygen to water, and conserve the liberated free energy by pumping protons across the inner mitochondrial membrane or the plasma membrane of bacteria. This generates an electrochemical gradient across the membrane, which is utilized in the synthesis of ATP. The catalytic mechanism of the oxidase is a complex coupling of electrons and protons, which has been studied with numerous biophysical and biochemical methods. The superfamily of oxidases is classified into three subfamilies: A-, B- and C-type. The A- and B-type oxidases have been studied in great depth, whereas relatively little is known about the molecular mechanism of the distinct C-type (or cbb3-type) oxidases. The latter enzymes, which possess unusually high oxygen affinity relative to the former classes, also share little sequence or structural similarity with the A- and B-type oxidases. In the work presented in this thesis, C-type oxidases have been studied using a variety of computational procedures, such as homology modeling, molecular dynamics simulations, density functional theory (DFT) calculations and continuum electrostatics. Homology models of the C-type oxidase correctly predict the side-chain orientation of the cross-linked tyrosine and a proton channel. The active-site region is also modelled with high accuracy, and the models are subsequently used in the DFT calculations. On the basis of these calculations it is proposed that the different orientation of the cross-linked tyrosine, together with a strong hydrogen bond on the proximal side of the high-spin heme, is responsible for the higher apparent oxygen affinity and the more rhombic EPR signal of the C-type oxidases. Furthermore, the pKa profiles of two amino acid residues located close to the active site suggest strong electron-proton coupling and a unique proton pumping route. Molecular dynamics simulations of the two-subunit C-type oxidase allowed, for the first time, the observation of redox-state-dependent water-chain formation in the protein interior, which can be utilized for redox-coupled proton transfer.
  • Ikäläinen, Suvi (Helsingin yliopisto, 2012)
    Theoretical examination of traditional nuclear magnetic resonance (NMR) parameters as well as novel quantities related to magneto-optic phenomena is carried out in this thesis for a collection of organic molecules. Electronic structure methods are employed, and reliable calculations involving large molecules and computationally demanding properties are made feasible through the use of completeness-optimized basis sets. In addition to introducing the foundations of NMR, a theory for the nuclear spin-induced optical rotation (NSOR) is formulated. In the NSOR, the plane of polarization of linearly polarized light is rotated by spin-polarized nuclei in an NMR sample as predicted by the Faraday effect. It has been hypothesized that this could be an advantageous alternative to traditional NMR detection. The opposite phenomenon, i.e., the laser-induced NMR splitting, is also investigated. Computational methods are discussed, including the method of completeness optimization. Nuclear shielding and spin-spin coupling are evaluated for hydrocarbon systems that simulate graphene nanoflakes, while the laser-induced NMR splitting is studied for hydrocarbons of increasing size in order to find molecules that may potentially interest the experimentalist. The NSOR is calculated for small organic systems with inequivalent nuclei to prove the existence of an optical chemical shift. The existence of the optical shift is verified in a combined experimental and computational study. Finally, relativistic effects on the size of the optical rotation are evaluated for xenon, and they are found to be significant. Completeness-optimized basis sets are used in all cases, and extensive analysis regarding the accuracy of results is made.
  • Kontkanen, Petri (Helsingin yliopisto, 2009)
    The Minimum Description Length (MDL) principle is a general, well-founded theoretical formalization of statistical modeling. The most important notion in MDL is the stochastic complexity, which can be interpreted as the shortest description length of a given sample of data relative to a model class. The exact definition of the stochastic complexity has gone through several evolutionary steps. The latest instantiation is based on the so-called Normalized Maximum Likelihood (NML) distribution, which has been shown to possess several important theoretical properties. However, applications of this modern version of MDL have been quite rare because of computational complexity problems: for discrete data, the definition of NML involves an exponential sum, and in the case of continuous data, a multi-dimensional integral that is usually infeasible to evaluate or even approximate accurately. In this doctoral dissertation, we present mathematical techniques for computing NML efficiently for some model families involving discrete data. We also show how these techniques can be used to apply MDL in two practical applications: histogram density estimation and clustering of multi-dimensional data.
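    A worked sketch of the exponential sum mentioned above, for the simplest discrete case: the Bernoulli model class, where the sum over all 2^n binary datasets collapses into n + 1 terms because the maximized likelihood depends only on the number of ones. The thesis develops far more efficient techniques for richer model families; this toy version only makes the definition concrete.

```python
# NML stochastic complexity for binary data under the Bernoulli model class.
from math import comb, log2

def bernoulli_nml_complexity(n):
    """Parametric complexity log2 C(n): C(n) sums the maximized likelihood
    over all 2^n datasets, grouped by the count k of ones."""
    total = sum(
        comb(n, k) * (k / n) ** k * ((n - k) / n) ** (n - k)  # 0**0 == 1
        for k in range(n + 1)
    )
    return log2(total)

def stochastic_complexity(data):
    """Code length (bits) of the data under the NML distribution:
    -log2 max-likelihood + parametric complexity."""
    n, k = len(data), sum(data)
    max_log_lik = 0.0 if k in (0, n) else k * log2(k / n) + (n - k) * log2((n - k) / n)
    return -max_log_lik + bernoulli_nml_complexity(n)

print(stochastic_complexity([1, 1, 0, 1, 0, 1, 1, 1]))
```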
  • Vasko, Kari (Helsingin yliopisto, 2004)
  • Ovaska, Kristian (Helsingin yliopisto, 2014)
    Cancers are a heterogeneous group of diseases that cause 7.6 million deaths yearly worldwide. At the cellular level, cancer is characterized by increased proliferation and invasion of tissue. These phenotypes are caused by environmental or inherited factors that increase the mutability of the genome, leading to dysregulation of a number of cellular processes. Identifying the genotypic changes and their phenotypic consequences is key to accurate diagnosis and prognosis, as well as to improved treatment regimens. Cancer cells can be investigated at a genome-wide scale using high-throughput measurement techniques such as DNA sequencing and microarrays. These rapidly evolving technologies provide experimental data with two challenging characteristics: the volume of data is large, and the data are structurally complex. These data need to be analyzed in an accurate and scalable manner to arrive at biomedically relevant conclusions. I have developed three computational methods for analyzing high-throughput genomic data and applied them to experimental data from three cancers. The first method is an extensible workflow framework, Anduril, for organizing the overall software structure of an analysis in a scalable manner. The second, SPINLONG, is a flexible algorithm for analyzing chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) data from complex experimental designs, such as time series measurements of multiple markers. The third, GROK, is used for preprocessing deep sequencing data; its design is based on a mathematical formalism that provides a succinct language for these operations. The experimental part studies gene regulation and expression in glioblastoma multiforme and in breast and prostate cancer. The results demonstrate the applicability of the developed methods to cancer research and provide insights into the dysregulation of gene expression in cancer. All three studies use both cell line and clinical material to connect the molecular and disease-outcome aspects of cancer. These experiments yield results at two conceptual levels. At the holistic level, lists of significant genes or genomic regions provide a genome-wide view of genomic alterations in cancer. At the specific level, we focus on one or a few central genes, which are experimentally validated, to provide an accessible starting point for understanding the results. Overall, the thesis focuses on understanding the complexity of cancer and managing the complexity of genome-wide data.
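    A minimal sketch of the idea behind a workflow framework such as Anduril, assuming a toy component model in which each step declares its dependencies and the engine executes them in topological order. Anduril's real component and scripting interfaces are considerably richer; the three-step pipeline below is purely illustrative.

```python
# Dependency-driven execution of a tiny analysis workflow.
from graphlib import TopologicalSorter

def run_workflow(components):
    """components: {name: (dependencies, callable producing a result)}."""
    order = TopologicalSorter({n: deps for n, (deps, _) in components.items()})
    results = {}
    for name in order.static_order():          # dependencies always come first
        deps, func = components[name]
        results[name] = func(*(results[d] for d in deps))
    return results

pipeline = {
    "load":      ((), lambda: [4, 1, 3]),
    "normalize": (("load",), lambda xs: [x / max(xs) for x in xs]),
    "report":    (("normalize",), lambda xs: f"{len(xs)} values, max {max(xs)}"),
}
print(run_workflow(pipeline)["report"])  # 3 values, max 1.0
```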
  • Eronen, Lauri (Helsingin yliopisto, 2013)
    The context and motivation for this thesis is gene mapping, the discovery of genetic variants that affect susceptibility to disease. The goals of gene mapping research include understanding disease mechanisms, evaluating individual disease risks and ultimately developing new medicines and treatments. Traditional genetic association mapping methods test each measured genetic variant independently for association with the disease. One way to improve the power of detecting disease-affecting variants is to base the tests on haplotypes, strings of adjacent variants that are inherited together, instead of individual variants. To enable haplotype analyses in large-scale association studies, this thesis introduces two novel statistical models and an efficient algorithm for haplotype reconstruction, jointly called HaploRec. HaploRec is based on modeling local regularities of variable length in the haplotypes of the studied population and using the obtained model to statistically reconstruct the most probable haplotypes for each studied individual. Our experiments demonstrate that HaploRec is especially well suited to data sets with a large number of markers and subjects, such as those typically used in currently popular genome-wide association studies. Public biological databases contain large amounts of data that can help in determining the relevance of putative associations. In this thesis, we introduce Biomine, a database and search engine that integrates data from several such databases under a uniform graph representation. The graph database is used to derive a general proximity measure for biological entities represented as graph nodes, based on a novel scheme for weighting individual graph edges by their informativeness and type. The resulting proximity measure can serve as a basis for various data analysis tasks, such as ranking putative disease genes and visualizing gene relationships. Our experiments show that relevant disease genes can be identified from among the putative ones with reasonable accuracy using Biomine. The best accuracy is obtained when a pre-known reference set of disease genes is available, but experiments using a novel clustering-based method demonstrate that putative disease genes can also be ranked without a reference set under suitable conditions. An important complementary use of Biomine is the search and visualization of indirect relationships between graph nodes, which can be used, e.g., to characterize the relationship of putative disease genes to already known disease genes. We provide two methods for selecting the subgraphs to be visualized: one based on the weights of the edges on the paths connecting query nodes, and one based on context-free grammars that define the types of paths to be displayed. Both of these query interfaces to Biomine are available online.
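    One way to make the graph-proximity idea concrete is a best-path probability: a sketch, assuming each edge carries a reliability-like weight in (0, 1] and that a path's weight is the product of its edge weights, so Dijkstra's algorithm on -log(weight) finds the best path. Biomine's actual weighting scheme additionally accounts for edge informativeness and type, and the toy graph below is invented.

```python
# Best-path proximity between two nodes in an edge-weighted graph.
import heapq
from math import log, exp

def best_path_proximity(edges, source, target):
    """edges: {node: [(neighbor, weight in (0, 1]), ...]}."""
    dist = {source: 0.0}                  # -log of the best probability so far
    heap = [(0.0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if node == target:
            return exp(-d)                # convert back to a probability
        if d > dist.get(node, float("inf")):
            continue
        for nbr, w in edges.get(node, []):
            nd = d - log(w)
            if nd < dist.get(nbr, float("inf")):
                dist[nbr] = nd
                heapq.heappush(heap, (nd, nbr))
    return 0.0

graph = {
    "geneA": [("pathway1", 0.8), ("paper42", 0.3)],
    "pathway1": [("geneB", 0.5)],
    "paper42": [("geneB", 0.9)],
}
print(best_path_proximity(graph, "geneA", "geneB"))  # ~0.4, via pathway1
```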
  • Kollin, Jussi (Helsingin yliopisto, 2010)
    Large-scale chromosome rearrangements such as copy number variants (CNVs) and inversions encompass a considerable proportion of the genetic variation between human individuals, and in a number of cases they have been closely linked with various inheritable diseases. Single-nucleotide polymorphisms (SNPs) constitute another large part of the genetic variation between individuals; they are typically abundant, and measuring them is straightforward and cheap. This thesis presents computational means of using SNPs to detect the presence of inversions and deletions, a particular variety of CNVs. Technically, the inversion-detection algorithm detects the suppressed recombination rate between inverted and non-inverted haplotype populations, whereas the deletion-detection algorithm uses the EM algorithm to estimate the haplotype frequencies of a window with and without a deletion haplotype. As a contribution to population biology, a coalescent simulator for simulating inversion polymorphisms has been developed. Coalescent simulation is a backward-in-time method of modelling population ancestry. Technically, the simulator also models multiple crossovers by using the Counting model as the chiasma interference model. Finally, this thesis includes an experimental section. The aforementioned methods were tested on synthetic data to evaluate their power and specificity. They were also applied to the HapMap Phase II and Phase III data sets, yielding a number of candidates for previously unknown inversions and deletions, and correctly detecting known rearrangements.
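    A minimal sketch of the EM idea that the deletion-detection algorithm builds on, reduced to the textbook two-SNP case without a deletion haplotype: only the double heterozygote is phase-ambiguous, and EM alternates between assigning it fractionally to the two possible haplotype pairs and re-estimating frequencies. The counts below are made up.

```python
# EM for two-locus haplotype frequencies from unphased genotype counts.
def em_haplotype_freqs(n_AB, n_Ab, n_aB, n_ab, n_dh, iters=100):
    """n_*: haplotypes resolved unambiguously from single-het/homozygous
    genotypes; n_dh: phase-ambiguous double heterozygotes (AB|ab or Ab|aB)."""
    total = n_AB + n_Ab + n_aB + n_ab + 2 * n_dh
    f = {"AB": 0.25, "Ab": 0.25, "aB": 0.25, "ab": 0.25}
    for _ in range(iters):
        # E-step: probability that a double heterozygote carries the AB|ab pair
        p = f["AB"] * f["ab"] / (f["AB"] * f["ab"] + f["Ab"] * f["aB"])
        # M-step: expected haplotype counts -> relative frequencies
        f = {"AB": (n_AB + p * n_dh) / total,
             "ab": (n_ab + p * n_dh) / total,
             "Ab": (n_Ab + (1 - p) * n_dh) / total,
             "aB": (n_aB + (1 - p) * n_dh) / total}
    return f

print(em_haplotype_freqs(n_AB=40, n_Ab=10, n_aB=10, n_ab=40, n_dh=20))
```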
  • Palin, Kimmo (Helsingin yliopisto, 2007)
    This thesis presents methods for locating and analyzing cis-regulatory DNA elements involved in the regulation of gene expression in multicellular organisms. The regulation of gene expression is carried out by the combined effort of several transcription factor proteins collectively binding the DNA at the cis-regulatory elements. Only sparse knowledge of the 'genetic code' of these elements exists today. An automatic tool for the discovery of putative cis-regulatory elements could aid their experimental analysis, which would in turn yield a more detailed view of cis-regulatory element structure and function. We have developed a computational model for the evolutionary conservation of cis-regulatory elements. The elements are modeled as evolutionarily conserved clusters of sequence-specific transcription factor binding sites. We give an efficient dynamic programming algorithm that locates the putative cis-regulatory elements and scores them according to the conservation model; a simplified sketch of the clustering idea follows below. A notable proportion of the high-scoring DNA sequences show transcriptional enhancer activity in transgenic mouse embryos. The conservation model includes four parameters whose optimal values are estimated with simulated annealing. With good parameter values the model discriminates well between DNA sequences with evolutionarily conserved cis-regulatory elements and DNA sequences that have evolved neutrally. On further inquiry, the set of highest-scoring putative cis-regulatory elements was found to be sensitive to small variations in the parameter values. The statistical significance of the putative cis-regulatory elements is estimated with the Two-Component Extreme Value Distribution; the p-values grade the conservation of the cis-regulatory elements above the neutral expectation, and the parameter values for the distribution are estimated by simulating neutral DNA evolution. The conservation of transcription factor binding sites can be used in the upstream analysis of regulatory interactions, and this approach may provide mechanistic insight into transcription-level data from, e.g., microarray experiments. Here we give a method to predict shared transcriptional regulators for a set of co-expressed genes. The EEL (Enhancer Element Locator) software implements the method for locating putative cis-regulatory elements and facilitates both interactive use and distributed batch processing. We have used it to analyze the non-coding regions around all human genes with respect to the orthologous regions in various other species, including mouse. The data from these genome-wide analyses are stored in a relational database which is used in the publicly available web services for upstream analysis and visualization of the putative cis-regulatory elements in the human genome.
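    A heavily simplified sketch in the spirit of scoring site clusters, as promised above. EEL's actual dynamic program aligns binding-site sequences across two genomes with distance and gap penalties, so the fixed-window maximum below should be read only as an illustration of the 'conserved cluster of sites = high score' idea; positions and scores are invented.

```python
# Find the maximum-scoring cluster of binding sites within a fixed window.
def best_site_cluster(sites, window=500):
    """sites: position-sorted list of (position, score); returns (start_pos, total)."""
    best, total, lo = (None, 0.0), 0.0, 0
    for hi, (pos, score) in enumerate(sites):
        total += score
        while pos - sites[lo][0] > window:     # shrink window from the left
            total -= sites[lo][1]
            lo += 1
        if total > best[1]:
            best = (sites[lo][0], total)
    return best

sites = [(100, 2.0), (180, 1.5), (260, 3.0), (2000, 2.5)]
print(best_site_cluster(sites))  # (100, 6.5): three sites cluster within 500 bp
```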
  • Pitkänen, Esa (Helsingin yliopisto, 2010)
    Metabolism is the cellular subsystem responsible for the generation of energy from nutrients and the production of building blocks for larger macromolecules. Computational and statistical modeling of metabolism is vital to many disciplines, including bioengineering, the study of diseases, drug target identification, and understanding the evolution of metabolism. In this thesis, we propose efficient computational methods for metabolic modeling, targeted particularly at the analysis of large metabolic models encompassing the whole metabolism of one or several organisms. We concentrate on three major themes of metabolic modeling: metabolic pathway analysis, metabolic reconstruction and the study of the evolution of metabolism. In the first part of this thesis, we study metabolic pathway analysis. We propose a novel modeling framework called gapless modeling to study biochemically viable metabolic networks and pathways. In addition, we investigate the utilization of atom-level information on metabolism to improve the quality of pathway analyses. We describe efficient algorithms for discovering both gapless and atom-level metabolic pathways, and conduct experiments with large-scale metabolic networks. The presented gapless approach offers a compromise, in terms of complexity and feasibility, between the previous graph-theoretic and stoichiometric approaches to metabolic modeling. Gapless pathway analysis shows that microbial metabolic networks are not as robust to random damage as suggested by previous studies. Furthermore, the amino acid biosynthesis pathways of the fungal species Trichoderma reesei discovered from atom-level data are shown to correspond closely to those of Saccharomyces cerevisiae. In the second part, we propose computational methods for metabolic reconstruction in the gapless modeling framework. We study the task of reconstructing a metabolic network that does not suffer from connectivity problems; such problems often limit the usability of reconstructed models and typically require a significant amount of manual postprocessing. We formulate gapless metabolic reconstruction as an optimization problem and propose an efficient divide-and-conquer strategy to solve it with real-world instances. We also describe computational techniques for resolving ambiguities in metabolite naming. These techniques have been implemented in ReMatch, a web-based software intended for the reconstruction of models for 13C metabolic flux analysis. In the third part, we extend our scope from single to multiple metabolic networks and propose an algorithm for inferring gapless metabolic networks of ancestral species from phylogenetic data. Experimenting with 16 fungal species, we show that the method generates results that are easily interpretable and that provide hypotheses about the evolution of metabolism.
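    A minimal sketch of the connectivity criterion behind gapless modeling: starting from seed nutrients, a reaction fires only once all of its substrates are producible, and metabolites that never become producible expose gaps. Reaction and metabolite names below are invented; the thesis formulates the actual reconstruction as an optimization problem over real networks.

```python
# Fixed-point computation of producible metabolites in a reaction network.
def reachable_metabolites(reactions, seeds):
    """reactions: list of (substrates, products) sets; returns producible metabolites."""
    produced = set(seeds)
    changed = True
    while changed:
        changed = False
        for substrates, products in reactions:
            if substrates <= produced and not products <= produced:
                produced |= products           # reaction fires: add its products
                changed = True
    return produced

reactions = [
    ({"glc", "atp"}, {"g6p", "adp"}),
    ({"g6p"}, {"f6p"}),
    ({"f6p", "missing_cofactor"}, {"fbp"}),    # gap: this reaction never fires
]
print(reachable_metabolites(reactions, {"glc", "atp"}))
# {'glc', 'atp', 'g6p', 'adp', 'f6p'} -- 'fbp' stays unreachable, exposing the gap
```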
  • Heinonen, Markus (Helsingin yliopisto, 2012)
    Metabolism is the system of chemical reactions sustaining life in the cells of living organisms. It is responsible for cellular processes that break down nutrients for energy and produce building blocks for necessary molecules. The study of metabolism is vital to many disciplines in medicine and pharmacy. Chemical reactions operate on small molecules called metabolites, which form the core of metabolism. In this thesis we propose efficient computational methods for small molecules in metabolic applications, presented as four distinct studies covering two major themes: the atom-level description of biochemical reactions, and the analysis of tandem mass spectrometric measurements of metabolites. In the first part we study atom-level descriptions of organic reactions. We begin by proposing an optimal algorithm for determining the atom-to-atom correspondences between the reactant and product metabolites of organic reactions. In addition, we introduce a graph edit distance based cost as the mathematical formalism for determining the optimality of atom mappings. We continue by proposing a compact single-graph representation of reactions using the atom mappings. We investigate the utility of the new representation in a reaction function classification task, where a descriptive category of the reaction's function is predicted. To facilitate the prediction, we introduce the first feasible path-based graph kernel, which describes reactions as path sequences and achieves high classification accuracy. In the second part we turn our focus to analysing tandem mass spectrometric measurements of metabolites. In a tandem mass spectrometer, an input molecule is fragmented into substructures, or fragments, whose masses are observed. We begin by studying the fragment identification problem: a combinatorial algorithm is presented to enumerate candidate substructures based on the given masses, and we demonstrate the usefulness of approximated bond energies as a cost function for ranking the candidate structures according to their chemical feasibility. We propose fragmentation tree models to describe the dependencies between fragments for higher identification accuracy. We continue by studying a closely related problem in which an unknown metabolite is elucidated based on its tandem mass spectrometric fragment signals. This metabolite identification task is an important problem in metabolomics, underpinning subsequent modelling and analysis efforts. We propose an automatic machine learning framework to predict a set of structural properties of the unknown metabolite; the properties are turned into candidate structures by a novel statistical model. We introduce the first mass spectral kernels and explore three feature classes to facilitate the prediction. The kernels introduce support for high-accuracy mass spectrometric measurements for enhanced predictive accuracy.
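    A brute-force sketch of the atom-mapping objective, assuming a toy all-carbon skeleton so that every bijection is element-valid: the cost of a mapping is the number of bonds broken plus bonds formed, a simple graph edit style cost. The thesis gives an optimal algorithm for this kind of objective; exhaustive permutation search is only feasible for a handful of atoms.

```python
# Exhaustive search for a minimum-edit atom mapping between two bond graphs.
from itertools import permutations

def best_atom_mapping(atoms, reactant_bonds, product_bonds):
    """atoms: atom labels (all 'C' here, so any bijection is element-valid);
    bonds: sets of frozenset({i, j}) atom-index pairs."""
    best_map, best_cost = None, float("inf")
    for perm in permutations(range(len(atoms))):
        mapped = {frozenset({perm[i], perm[j]}) for i, j in map(tuple, reactant_bonds)}
        cost = len(mapped ^ product_bonds)     # bonds broken + bonds formed
        if cost < best_cost:
            best_map, best_cost = perm, cost
    return best_map, best_cost

atoms = ["C", "C", "C", "C"]
chain = {frozenset(p) for p in [(0, 1), (1, 2), (2, 3)]}       # 0-1-2-3
branched = {frozenset(p) for p in [(0, 1), (1, 2), (1, 3)]}    # 1 bonded to 0, 2, 3
mapping, cost = best_atom_mapping(atoms, chain, branched)
print(cost)  # 2: one bond broken, one bond formed
```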
  • Vänskä, Tommy (Helsingin yliopisto, 2011)
    This thesis presents ab initio studies of two kinds of physical systems, quantum dots and bosons, using two program packages, of which the bosonic one has mainly been developed by the author. The implemented models, i.e., configuration interaction (CI) and coupled cluster (CC), take the correlated motion of the particles into account and provide a hierarchy of computational schemes, on top of which the exact solution, within the limit of the single-particle basis set, is obtained. The theory underlying the models is presented in some detail, in order to provide insight into the approximations made and the circumstances under which they hold. Some of the computational methods are also highlighted. In the final sections the results are summarized. The CI and CC calculations on multiexciton complexes in self-assembled semiconductor quantum dots are presented and compared, along with radiative and non-radiative transition rates. Full CI calculations on quantum rings and double quantum rings are also presented. In the latter case, experimental and theoretical results from the literature are re-examined and an alternative explanation for the reported photoluminescence spectra is found. The boson program is first applied to a fictitious model system consisting of bosonic electrons in a central Coulomb field, for which CI at the singles and doubles level is found to account for almost all of the correlation energy. Finally, the boson program is employed to study Bose-Einstein condensates confined in different anisotropic trap potentials. The effect of the anisotropy on the relative correlation energy is examined, as well as the effect of varying the interaction potential.
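    For context, a standard textbook form of the CI hierarchy mentioned above (not a formula quoted from the thesis): the wavefunction is expanded in determinants excited from a reference, and truncating the excitation level yields the hierarchy whose untruncated limit is the exact (full CI) solution within the given single-particle basis.

```latex
% CI expansion: singly and doubly excited determinants |\Phi_i^a>, |\Phi_{ij}^{ab}>
% are built by promoting particles out of the reference |\Phi_0>.
\begin{equation}
  |\Psi_{\mathrm{CI}}\rangle
    = c_0 |\Phi_0\rangle
    + \sum_{i,a} c_i^a |\Phi_i^a\rangle
    + \sum_{i<j,\,a<b} c_{ij}^{ab} |\Phi_{ij}^{ab}\rangle
    + \cdots
\end{equation}
```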
  • Karinen, Sirkku (Helsingin yliopisto, 2013)
    A phenotype is a collection of an organism's observable features that can be characterized both at the individual level and at the single-cell level. Phenotypes are largely determined by underlying molecular processes, which also explain their inheritance and plasticity. Some of the molecular background of phenotypes can be characterized through inherited genetic variations and alterations in gene expression. High-throughput measurement technologies enable the measurement of these molecular determinants in cells. However, the technologies produce remarkably large data sets, and the research questions have become increasingly complex; thus computational methods are needed to discover the molecular mechanisms behind phenotypes. In many cases, the analysis of molecular determinants that contribute to a phenotype proceeds by first identifying putative candidates using a priori information and high-throughput measurements, so that further analysis can focus on the most promising molecules. Often the aim is to identify relevant markers or targets from a set of candidate molecules. Biomedical studies frequently result in a long list of candidate genes, and interpreting these candidates requires information on their context in cell functions. This context information can give insight into synergistic effects of the molecular machinery in cells when the functions of individual molecules do not explain the observed phenotype; it can also be used to generate candidates. One of the methods in this thesis is a computational data integration method that links candidate genes from molecular pathways to genetic variants. It uses publicly available biological knowledge bases to systematically create a functional context for candidate genes. This approach is especially important when studying cancer, which depends on complex molecular signaling. Genotypes associated with inherited disease predispositions have been studied successfully in the past, but traditional methods are not applicable in a wide variety of analysis conditions. This thesis therefore introduces a method that uses haplotype sharing to identify genetic loci inherited by multiple distantly related individuals. It is flexible and can be used in various settings, even with a very limited number of samples. Increasing the number of biological replicates in gene expression analysis increases the reliability of the results, but in many cases the number of samples is limited. Pooling gene expression data from multiple published studies can therefore increase the understanding of the molecular background of cell types; this is shown in the thesis by an analysis that identifies gene expression differences between two cell types using publicly available gene expression samples from previous studies. Finally, when candidate molecules for characterizing a phenotype are available, they can be compiled into biomarkers. In many cases, a combination of multiple molecules serves as a better biomarker than a single molecule. The thesis also includes a machine learning approach for discovering a classifier that predicts the phenotype.