Browsing by Subject "Visualization"

  • Kang, Bo; Puolamäki, Kai; Lijffijt, Jefrey; Bie, Tijl de (2020)
    Data visualization and iterative/interactive data mining are attracting rapidly growing attention in both research and industry. However, while there is a plethora of advanced data mining methods and much work in the field of visualization, integrated methods that combine advanced visualization and/or interaction with data mining techniques in a principled way are rare. We present a framework based on constrained randomization which lets users explore high-dimensional data via 'subjectively informative' two-dimensional data visualizations. The user is presented with 'interesting' projections and can express observations using visual interactions that update a background model representing the user's belief state. This background model is then considered by a projection-finding algorithm employing data randomization to compute a new 'interesting' projection. By providing users with information that contrasts with the background model, we maximize the chance that the user encounters striking new information present in the data. This process can be iterated until the user runs out of time or until the difference between the randomized and the real data is insignificant. We present two case studies, one controlled study on synthetic data and another on census data, using the proof-of-concept tool SIDE that demonstrates the presented framework.
  • He, Chen; Micallef, Luana; He, Liye; Peddinti, Gopal; Aittokallio, Tero; Jacucci, Giulio (2021)
    Understanding the quality of insight has become increasingly important with the trend of allowing users to post comments during visual exploration, yet approaches for qualifying insight are rare. This article presents a case study to investigate the possibility of characterizing the quality of insight via the interactions performed. To do this, we devised the interaction of a visualization tool, MediSyn, for insight generation. MediSyn supports five types of interactions: selecting, connecting, elaborating, exploring, and sharing. We evaluated MediSyn with 14 participants by allowing them to freely explore the data and generate insights. We then extracted seven interaction patterns from their interaction logs and correlated the patterns with four aspects of insight quality. The results show the possibility of qualifying insights via interactions. Among other findings, exploration actions can lead to unexpected insights, and the drill-down pattern tends to increase the domain value of insights. A qualitative analysis shows that using domain knowledge to guide exploration can positively affect the domain value of derived insights. We discuss the study’s implications, lessons learned, and future research opportunities.
  • Nissilä, Viivi (Helsingin yliopisto, 2020)
    Origin-Destination (OD) data is a crucial part of price estimation in the aviation industry, and an OD flight is any number of flights a passenger takes in a single journey. OD data is complex: it is both flow data and multidimensional data. In this work, the focus is to design interactive visualization techniques to support user exploration of OD data. The thesis work aims to find which of two menu designs suits OD data visualization better: a breadth-first or a depth-first menu design. The two menus follow Shneiderman’s Task by Data Type Taxonomy, a broader version of the Information Seeking Mantra. The first menu design is a parallel, breadth-first menu layout. The layout shows the variables in an open layout and is closer to the original data matrix. The second menu design is a hierarchical, depth-first layout. This layout is derived from the semantics of the data and is more compact in terms of screen space. The two menu designs are compared in an online survey study conducted with potential end users. The results of the online survey study are inconclusive and are therefore complemented with an expert review. Both the survey study and the expert review show that the Sankey graph is a good visualization type for this work, but that the interaction of the two menu designs requires further improvement. Both menu designs received positive and negative feedback in the expert review. For future work, a solution that combines the strengths of the two designs could be considered. ACM Computing Classification System (CCS): Human-centered computing → Visualization → Empirical studies in visualization; Human-centered computing → Interaction design → Interaction design process and methods → Interface design prototyping
  • Urpa, Lea M.; Anders, Simon (2019)
    Background: Visualization is an important tool for generating meaning from scientific data, but the visualization of structures in high-dimensional data (such as from high-throughput assays) presents unique challenges. Dimension reduction methods are key in solving this challenge, but these methods can be misleading, especially when apparent clustering in the dimension-reducing representation is used as the basis for reasoning about relationships within the data. Results: We present two interactive visualization tools, distnet and focusedMDS, that help in assessing the validity of a dimension-reducing plot and in interactively exploring relationships between objects in the data. The distnet tool is used to examine discrepancies between the placement of points in a two-dimensional visualization and the points' actual similarities in feature space. The focusedMDS tool is an intuitive, interactive multidimensional scaling tool that is useful for exploring the relationships of one particular data point to the others, which might be useful in a personalized medicine framework. Conclusions: We introduce here two freely available tools for visually exploring and verifying the validity of dimension-reducing visualizations and biological information gained from these. The use of such tools can confirm that conclusions drawn from dimension-reducing visualizations are not simply artifacts of the visualization method, but are real biological insights.
  • Urpa, Lea M; Anders, Simon (BioMed Central, 2019)
    Background: Visualization is an important tool for generating meaning from scientific data, but the visualization of structures in high-dimensional data (such as from high-throughput assays) presents unique challenges. Dimension reduction methods are key in solving this challenge, but these methods can be misleading, especially when apparent clustering in the dimension-reducing representation is used as the basis for reasoning about relationships within the data. Results: We present two interactive visualization tools, distnet and focusedMDS, that help in assessing the validity of a dimension-reducing plot and in interactively exploring relationships between objects in the data. The distnet tool is used to examine discrepancies between the placement of points in a two-dimensional visualization and the points’ actual similarities in feature space. The focusedMDS tool is an intuitive, interactive multidimensional scaling tool that is useful for exploring the relationships of one particular data point to the others, which might be useful in a personalized medicine framework. Conclusions: We introduce here two freely available tools for visually exploring and verifying the validity of dimension-reducing visualizations and biological information gained from these. The use of such tools can confirm that conclusions drawn from dimension-reducing visualizations are not simply artifacts of the visualization method, but are real biological insights.
  • Topa, Hande; Honkela, Antti (2018)
    Background: Genome-wide high-throughput sequencing (HTS) time series experiments are a powerful tool for monitoring various genomic elements over time. They can be used to monitor, for example, gene or transcript expression with RNA sequencing (RNA-seq), DNA methylation levels with bisulfite sequencing (BS-seq), or abundances of genetic variants in populations with pooled sequencing (Pool-seq). However, because of high experimental costs, the time series data sets often consist of a very limited number of time points with very few or no biological replicates, posing challenges in the data analysis. Results: Here we present the GPrank R package for modelling genome-wide time series by incorporating variance information obtained during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth. GPrank is well-suited for analysing both short and irregularly sampled time series. It is based on modelling each time series by two Gaussian process (GP) models, namely, time-dependent and time-independent GP models, and comparing the evidence provided by data under two models by computing their Bayes factor (BF). Genomic elements are then ranked by their BFs, and temporally most dynamic elements can be identified. Conclusions: Incorporating the variance information helps GPrank avoid false positives without compromising computational efficiency. Fitted models can be easily further explored in a browser. Detection and visualisation of temporally most active dynamic elements in the genome can provide a good starting point for further downstream analyses for increasing our understanding of the studied processes.
  • Topa, Hande; Honkela, Antti (BioMed Central, 2018)
    Background: Genome-wide high-throughput sequencing (HTS) time series experiments are a powerful tool for monitoring various genomic elements over time. They can be used to monitor, for example, gene or transcript expression with RNA sequencing (RNA-seq), DNA methylation levels with bisulfite sequencing (BS-seq), or abundances of genetic variants in populations with pooled sequencing (Pool-seq). However, because of high experimental costs, the time series data sets often consist of a very limited number of time points with very few or no biological replicates, posing challenges in the data analysis. Results: Here we present the GPrank R package for modelling genome-wide time series by incorporating variance information obtained during pre-processing of the HTS data using probabilistic quantification methods or from a beta-binomial model using sequencing depth. GPrank is well-suited for analysing both short and irregularly sampled time series. It is based on modelling each time series by two Gaussian process (GP) models, namely, time-dependent and time-independent GP models, and comparing the evidence provided by data under two models by computing their Bayes factor (BF). Genomic elements are then ranked by their BFs, and temporally most dynamic elements can be identified. Conclusions: Incorporating the variance information helps GPrank avoid false positives without compromising computational efficiency. Fitted models can be easily further explored in a browser. Detection and visualisation of temporally most active dynamic elements in the genome can provide a good starting point for further downstream analyses for increasing our understanding of the studied processes.
  • Lüders, C. M.; Raatikainen, M.; Motger, J.; Maalej, W. (IEEE, 2019)
    Proceedings of the ... IEEE International Symposium on Requirements Engineering
  • Walkowski, Slawomir; Lundin, Mikael; Szymas, Janusz; Lundin, Johan (2014)
  • Miyakita, Goki; Leskinen, Petri; Hyvönen, Eero Antero (Springer, 2018)
    Lecture notes in computer science
    This paper shows how biographical registries can be represented as Linked Data, enriched by data linking to related data sources, and used in Digital Humanities. As a use case, a database of 11 987 historical U.S. Congress legislators from 1789–2018 was transformed into a knowledge graph. The data was published as a Linked Data service, including a SPARQL endpoint, on top of which tools for biographical and prosopographical research are implemented. A faceted browser named U.S. Congress Prosopographer, with visualization tools for knowledge discovery, is presented to provide new insights into political history.