Browsing by Subject "NoSQL"

  • Heinonen, Jyrki (Helsingin yliopisto, 2020)
    The main theme of a conventional data warehouse is ‘single version of truth’, using either dimensional modeling or normalized 3NF modeling. Both techniques have issues, because on its way to the data warehouse the data is cleansed and transformed and ends up changed, hence losing information. Data Vault modeling, developed as a response to these issues, is detail oriented and tracks history, keeping the audit trail intact. This means we have ‘single version of facts’, or ‘all the data, all of the time’. Data Vault methodology and architecture can handle Big Data and NoSQL, which this work also covers in the Data Lake section. Data Lake tools have evolved rapidly during the last decade, responding to ever-expanding data volumes with distributed computing. A Data Lake can also ingest different types of structured, semi-structured and unstructured data. Data warehouse (and Data Lake) processing is moving from on-premises server rooms to cloud data centers. In particular, the Apache Software Foundation and Google have developed and inspired many new tools that can process data warehouse data at petabyte scale. The challenge now is that data warehouses receive data not only from operational systems: huge amounts of machine-generated data also have to be processed and analyzed on these practically infinitely scalable platforms. A data warehouse solution also has to cover machine-learning requirements. The modernization of the data warehouse is therefore not over, yet all of these methodologies, architectures and tools are still in use. The trick is to choose the right tool for the right job.
  • Lindström, Olli-Pekka (Helsingin yliopisto, 2021)
    Until recently, database management systems focused on the relational model, in which data are organized into tables with columns and rows. Relational databases are known for the widely standardized Structured Query Language (SQL), transaction processing, and strict data schemas. However, with the introduction of Big Data, relational databases became too heavy for some use cases. In response, NoSQL databases were developed. The four best-known categories of NoSQL databases are key-value, document, column family, and graph databases. NoSQL databases impose fewer data consistency controls in order to make processing more efficient. NoSQL databases haven’t replaced SQL databases in the industry. Many legacy applications still use SQL databases, and newer applications also often require the stricter and more secure data processing of SQL databases. This is where the idea of SQL and NoSQL integration comes in. There are two mainstream approaches to combining the benefits of SQL and NoSQL databases: multi-model databases and polyglot persistence. Multi-model databases are database management systems that store and process data in multiple different data models under the same engine. Polyglot persistence refers to the principle of building a system architecture that uses different kinds of database engines to store data. Systems implementing the polyglot persistence principle are called polystores. This thesis introduces SQL and NoSQL databases and their two main integration strategies: multi-model databases and polyglot persistence. Some representative multi-model databases and polystores are introduced. Finally, some challenges and future research directions for multi-model databases and polyglot persistence are discussed.
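The first abstract contrasts Data Vault with cleanse-and-overwrite loading: history is kept rather than changed. As a rough illustration (not taken from the thesis), the sketch below models a hub of business keys and an insert-only satellite of descriptive attributes using plain Python structures. The names `CustomerVault`, `load`, `as_of` and `history` are hypothetical, and the choice of MD5 hash keys, load timestamps and a record-source field is an assumption modeled on common Data Vault practice, not a specific product's API.

```python
import hashlib
from dataclasses import dataclass, field
from datetime import datetime


def hash_key(business_key: str) -> str:
    # Assumption: surrogate key = hash of the normalized business key (MD5 for illustration only).
    return hashlib.md5(business_key.strip().upper().encode("utf-8")).hexdigest()


@dataclass
class SatelliteRow:
    hub_key: str
    load_ts: datetime
    record_source: str
    attributes: dict


@dataclass
class CustomerVault:
    hub: dict = field(default_factory=dict)        # hub_key -> business key
    satellite: list = field(default_factory=list)  # insert-only history rows

    def load(self, business_key, attributes, load_ts, record_source):
        hk = hash_key(business_key)
        # Hub rows are inserted once per business key and never updated.
        self.hub.setdefault(hk, business_key)
        # Append a satellite row only when attributes changed; nothing is ever overwritten.
        if self.as_of(business_key, load_ts) != attributes:
            self.satellite.append(SatelliteRow(hk, load_ts, record_source, dict(attributes)))

    def as_of(self, business_key, ts):
        # Attributes valid at time ts: the latest row loaded at or before ts.
        hk = hash_key(business_key)
        rows = [r for r in self.satellite if r.hub_key == hk and r.load_ts <= ts]
        if not rows:
            return None
        return dict(max(rows, key=lambda r: r.load_ts).attributes)

    def history(self, business_key):
        # Full audit trail for one business key, oldest first.
        hk = hash_key(business_key)
        return sorted((r for r in self.satellite if r.hub_key == hk), key=lambda r: r.load_ts)


if __name__ == "__main__":
    vault = CustomerVault()
    vault.load("C-001", {"city": "Helsinki"}, datetime(2020, 1, 1), "crm")
    vault.load("C-001", {"city": "Helsinki"}, datetime(2020, 6, 1), "crm")  # unchanged: no new row
    vault.load("C-001", {"city": "Espoo"}, datetime(2021, 1, 1), "crm")
    print(len(vault.history("C-001")))                   # 2
    print(vault.as_of("C-001", datetime(2020, 12, 31)))  # {'city': 'Helsinki'}
```

Because rows are only ever appended, the earlier city is still queryable after the change, which is the ‘all the data, all of the time’ property the abstract describes; a conventional overwrite load would have kept only the latest value.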
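The second abstract describes polyglot persistence: one system, several database engines, each holding the data it suits best. A minimal sketch, assuming a hypothetical order service, pairs Python's built-in `sqlite3` (relational, schema-checked, transactional) for orders with an in-memory dict standing in for a key-value store used for sessions. `OrderService` and its methods are illustrative names only, not an API from the thesis or from any polystore product.

```python
import json
import sqlite3


class OrderService:
    """Toy polyglot-persistence service: a relational store for orders and a
    key-value store (a dict standing in for, e.g., a cache) for sessions."""

    def __init__(self):
        self.sql = sqlite3.connect(":memory:")
        self.sql.execute(
            "CREATE TABLE orders ("
            " id INTEGER PRIMARY KEY,"
            " customer TEXT NOT NULL,"
            " total_cents INTEGER NOT NULL CHECK (total_cents >= 0))"
        )
        self.kv = {}  # key -> opaque JSON string, mimicking a key-value store

    def place_order(self, customer, total_cents):
        # Orders need schema checks and transactions, so they go to SQL.
        with self.sql:  # commits on success, rolls back on exception
            cur = self.sql.execute(
                "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
                (customer, total_cents),
            )
        return cur.lastrowid

    def orders_total(self, customer):
        row = self.sql.execute(
            "SELECT COALESCE(SUM(total_cents), 0) FROM orders WHERE customer = ?",
            (customer,),
        ).fetchone()
        return row[0]

    def save_session(self, session_id, data):
        # Schemaless data looked up only by key goes to the key-value store.
        self.kv[session_id] = json.dumps(data)

    def load_session(self, session_id):
        raw = self.kv.get(session_id)
        return None if raw is None else json.loads(raw)


if __name__ == "__main__":
    svc = OrderService()
    svc.place_order("alice", 1250)
    svc.place_order("alice", 750)
    svc.save_session("s1", {"cart": ["book"], "user": "alice"})
    print(svc.orders_total("alice"))  # 2000
    print(svc.load_session("s1"))     # {'cart': ['book'], 'user': 'alice'}
    try:
        svc.place_order("bob", -5)
    except sqlite3.IntegrityError:
        print("rejected by SQL constraint")
```

The split mirrors the abstract's argument: the strict, consistent processing stays in the SQL engine, while the lighter key-value store handles data that needs no schema. The sketch deliberately ignores cross-store consistency, which is one of the harder problems a real polystore has to solve.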