Browsing by Subject "COLLECTIONS"

Now showing items 1-2 of 2
  • Tupasela, Aaro (2021)
    The sharing, circulation, distribution, and use of human tissue samples and related data have become a major political and scientific preoccupation during the past two decades. In the age of big data, the political, scientific, and economic momentum around the need to increasingly collect and collate massive amounts of data has intensified. At the same time, the control and sharing of samples and data have become increasingly strategic in positioning biobanks within the global biomedical research market. Numerous commentators have identified several reasons why biobanks choose to share, and with whom. Despite intensified efforts to encourage sharing within networks, there are still actors who have not embraced the values of sharing. The term 'data hugging' is introduced as a form of data work through which value is generated but sharing as a practice is not exercised according to community expectations. Data hugging is a term used within the biobanking community to describe the practice of withholding samples or data from other network members. While some biobankers consider data hugging to be an impediment to efficient and responsible science, it can also be another way of generating value in an otherwise challenging value creation environment. European biobanking policies, as well as the biobanking community, need a better understanding of these value-generating practices in relation to the life cycle of the biobank.
  • Valenzuela, Daniel; Norri, Tuukka; Välimäki, Niko; Pitkänen, Esa; Mäkinen, Veli (2018)
    Background: A typical human genome differs from the reference genome at 4-5 million sites. This diversity is increasingly catalogued in repositories such as ExAC/gnomAD, consisting of >15,000 whole genomes and >126,000 exome sequences from different individuals. Despite this enormous diversity, resequencing data workflows are still based on a single human reference genome. Identification and genotyping of genetic variants are typically carried out on short-read data aligned to a single reference, disregarding the underlying variation. Results: We propose a new unified framework for variant calling with short-read data utilizing a representation of human genetic variation - a pan-genomic reference. We provide a modular pipeline that can be seamlessly incorporated into existing sequencing data analysis workflows. Our tool is open source and available online: https://gitlab.com/dvalenzu/PanVC. Conclusions: Our experiments show that by replacing a standard human reference with a pan-genomic one we achieve an improvement in single-nucleotide variant calling accuracy and in short indel calling accuracy over the widely adopted Genome Analysis Toolkit (GATK) in difficult genomic regions.