Browsing by Subject "SOFTWARE"

  • Izzo, Massimiliano; Mortola, Francesco; Arnulfo, Gabriele; Fato, Marco M.; Varesio, Luigi (2014)
    Motivation: Molecular biology laboratories require extensive metadata to improve data collection and analysis. The heterogeneity of the collected metadata grows as research evolves into international multi-disciplinary collaborations and data sharing among institutions increases. A single standardization is not feasible, and it becomes crucial to develop digital repositories with flexible and extensible data models, as in the case of modern integrated biobank management. Results: We developed a novel data model in JSON format to describe heterogeneous data in a generic biomedical science scenario. The model is built on two hierarchical entities: processes and events, roughly corresponding to research studies and analysis steps within a single study. A number of sequential events can be grouped in a process, building up a hierarchical structure to track patient and sample history. Each event can produce new data. Data is described by a set of user-defined metadata, and may have one or more associated files. We integrated the model in a web-based digital repository with a data grid storage to manage large data sets located in geographically distinct areas. We built a graphical interface that allows authorized users to define new data types dynamically, according to their requirements. Operators compose queries on metadata fields using a flexible search interface and run them on the database and on the grid. We applied the digital repository to the integrated management of samples, patients and medical history in the BIT-Gaslini biobank. The platform currently manages 1800 samples from over 900 patients. Microarray data from 150 analyses are stored on the grid storage and replicated on two physical resources for preservation. The system is equipped with data integration capabilities with other biobanks for worldwide information sharing.
Conclusions: Our data model enables users to continuously define flexible, ad hoc, and loosely structured metadata, for information sharing in specific research projects and purposes. This approach can significantly improve interdisciplinary research collaboration and allows tracking of patients' clinical records, sample management information, and genomic data. The web interface allows the operators to easily manage, query, and annotate the files, without dealing with the technicalities of the data grid.
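The process/event hierarchy described in the abstract above can be illustrated with a small sketch. The abstract does not give the actual schema, so every field name here (`type`, `name`, `metadata`, `files`, `events`) and both helper functions are hypothetical, chosen only to show how events with user-defined metadata might nest inside a process and be queried by metadata field.

```python
import json


def make_event(name, metadata, files=None):
    """Build one event: user-defined metadata plus zero or more associated files."""
    return {
        "type": "event",
        "name": name,
        "metadata": dict(metadata),
        "files": list(files or []),
    }


def make_process(name, events):
    """Group sequential events into a process (roughly, one research study)."""
    return {"type": "process", "name": name, "events": list(events)}


def find_events(process, field, value):
    """Return the events whose metadata contains field == value."""
    return [e for e in process["events"] if e["metadata"].get(field) == value]


# Illustrative content only; these values are not taken from the biobank.
study = make_process(
    "example-study",
    [
        make_event("sample-collection", {"tissue": "blood", "operator": "A"}),
        make_event("microarray-analysis", {"platform": "example-array"}, files=["run1.cel"]),
    ],
)

# The whole hierarchy serializes to JSON without a fixed metadata schema.
serialized = json.dumps(study, indent=2)
```

Because the metadata is a free-form mapping, a new data type needs no schema change in this sketch; the real repository presumably validates user-defined types, which is not modelled here.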
  • Pratas, Diogo; Toppinen, Mari; Pyöriä, Lari; Hedman, Klaus; Sajantila, Antti; Perdomo, Maria F. (2020)
    Background: Advances in sequencing technologies have enabled the characterization of multiple microbial and host genomes, opening new frontiers of knowledge while kindling novel applications and research perspectives. Among these is the investigation of the viral communities residing in the human body and their impact on health and disease. To this end, the study of samples from multiple tissues is critical, yet, the complexity of such analysis calls for a dedicated pipeline. We provide an automatic and efficient pipeline for identification, assembly, and analysis of viral genomes that combines the DNA sequence data from multiple organs. TRACESPipe relies on cooperation among 3 modalities: compression-based prediction, sequence alignment, and de novo assembly. The pipeline is ultra-fast and provides, additionally, secure transmission and storage of sensitive data. Findings: TRACESPipe performed outstandingly when tested on synthetic and ex vivo datasets, identifying and reconstructing all the viral genomes, including those with high levels of single-nucleotide polymorphisms. It also detected minimal levels of genomic variation between different organs. Conclusions: TRACESPipe's unique ability to simultaneously process and analyze samples from different sources enables the evaluation of within-host variability. This opens up the possibility to investigate viral tissue tropism, evolution, fitness, and disease associations. Moreover, additional features such as DNA damage estimation and mitochondrial DNA reconstruction and analysis, as well as exogenous-source controls, expand the utility of this pipeline to other fields such as forensics and ancient DNA studies. TRACESPipe is released under GPLv3 and is available for free download at https://github.com/viromelab/tracespipe.
  • Moilanen, Atte; Kujala, Heini; Mikkonen, Ninni (2020)
    Biodiversity offsetting is a tool to balance ecological damage caused by human activity with new benefits created elsewhere. Offsetting is implemented by protecting, restoring or managing sufficiently large areas of habitat. While there are concerns about the true feasibility of offsetting, offsets are becoming a common policy tool world-wide. Operationally uncomplicated, quantitative approaches to spatial analysis of offsets are rare and their use is often restricted by the availability of suitable spatial data. We describe a practical method for offsets that builds upon two layers of relatively easily sourced spatial data, a balanced spatial priority ranking and a weighted range size rarity map. Together with (a) spatial information about impact and offset areas, and (b) extra parameters for the effectiveness of avoided loss and the amount of leakage expected, we can evaluate whether the proposed offset exchange represents a credible no net loss or net positive impact with an upward trade. The priority ranking and range size rarity maps can be produced in various ways, most notably using existing conservation planning tools. Here we used the standard outputs of the Zonation spatial prioritization software. We illustrate the method and associated visualization in the context of offsetting of boreal forests in Finland, where forests experience high and increasing pressures from forestry and bioenergy sectors. The example is timely as there is political demand for the uptake of biodiversity offset policies in Finland, and boreal forests are the most common biotope. The methods described here are applicable to biomes around the world. The described tools are made available as R scripts that utilize standard Zonation outputs, thus providing direct linkage to any past or future Zonation applications. As a limitation, the present methods only apply to avoided loss offsets.
  • Barylski, Jacub; Enault, François; Dutilh, Bas E.; Schuller, Margo B.P.; Edwards, Robert A.; Gillis, Annika; Klumpp, Jochen; Knezevic, Petar; Krupovic, Mart; Kuhn, Jens H.; Lavigne, Rob; Oksanen, Hanna M; Sullivan, Matthew B.; Jang, Ho Bin; Simmonds, Peter; Aiewsakun, Pakorn; Wittmann, Johannes; Tolstoy, Igor; Brister, J. Rodney; Kropinki, Andrew; Adriaenssens, Evelien M. (2020)
    Tailed bacteriophages are the most abundant and diverse viruses in the world, with genome sizes ranging from 10 kbp to over 500 kbp. Yet, due to historical reasons, all this diversity is confined to a single virus order, Caudovirales, composed of just four families: Myoviridae, Siphoviridae, Podoviridae, and the newly created Ackermannviridae family. In recent years, this morphology-based classification scheme has started to crumble under the constant flood of phage sequences, revealing that tailed phages are even more genetically diverse than once thought. This prompted us, the Bacterial and Archaeal Viruses Subcommittee of the International Committee on Taxonomy of Viruses (ICTV), to consider overall reorganization of phage taxonomy. In this study, we used a wide range of complementary methods, including comparative genomics, core genome analysis, and marker gene phylogenetics, to show that the group of Bacillus phage SPO1-related viruses previously classified into the Spounavirinae subfamily is clearly distinct from other members of the family Myoviridae and its diversity deserves the rank of an autonomous family. Thus, we removed this group from the Myoviridae family and created the family Herelleviridae, a new taxon of the same rank. In the process of the taxon evaluation, we explored the feasibility of different demarcation criteria and critically evaluated the usefulness of our methods for phage classification. The convergence of results, drawing a consistent and comprehensive picture of a new family with associated subfamilies, regardless of method, demonstrates that the tools applied here are particularly useful in phage taxonomy. We are convinced that creation of this novel family is a crucial milestone toward much-needed reclassification in the Caudovirales order.
  • Cheng, Lu; Connor, Thomas R.; Aanensen, David M.; Spratt, Brian G.; Corander, Jukka (2011)
  • Rodriguez-Becerra, Jorge; Cáceres-Jensen, Lizethly; Díaz, Tatiana; Druker, Sofía; Bahamonde Padilla, Victor; Pernaa, Johannes; Aksela, Maija (2020)
    The purpose of this descriptive case study was to develop pre-service chemistry teachers’ Technological Pedagogical Science Knowledge (TPASK) through novel computational chemistry modules. The study consisted of two phases starting with designing a computational chemistry based learning environment followed by a case study where students’ perceptions towards educational computational chemistry were explored. First, we designed an authentic research-based chemistry learning module that supported problem-based learning through the utilization of computational chemistry methods suitable for pre-service chemistry education. The objective of the learning module was to promote learning of specific chemistry knowledge and development of scientific skills. Systematic design decisions were made through the TPASK framework. The learning module was designed for a third-year physical chemistry course taken by pre-service chemistry teachers in Chile. After the design phase, the learning module was implemented in a course and students’ perceptions were gathered using semi-structured group interviews. The sample consisted of 22 pre-service chemistry teachers. Data were analyzed through qualitative content analysis using the same TPASK framework employed in the learning module design. Based on our findings, pre-service chemistry teachers first acquired Technological Scientific Knowledge (TSK) and then developed some elements of their TPASK. In addition, they highly appreciated the combination of student-centred problem-based learning and the use of computational chemistry tools. Students felt the educational computational learning environment supported their own knowledge acquisition and expressed an interest in applying similar learning environments in their future teaching careers. 
This case study demonstrates that learning through authentic real-world problems using educational computational methods offers great potential in supporting pre-service teachers’ instruction in the science of chemistry and pedagogy. For further research in the TPASK framework, we propose there would be significant benefit from developing additional learning environments of this nature and evaluating their utility in pre-service and in-service chemistry teacher education.
  • Broman, Elias; Asmala, Eero; Carstensen, Jacob; Pinhassi, Jarone; Dopson, Mark (2019)
    Coastal zones are important transitional areas between the land and sea, where both terrestrial and phytoplankton supplied dissolved organic matter (DOM) are respired or transformed. As climate change is expected to increase river discharge and water temperatures, DOM from both allochthonous and autochthonous sources is projected to increase. As these transformations are largely regulated by bacteria, we analyzed microbial community structure data in relation to a 6-month long time-series dataset of DOM characteristics from Roskilde Fjord and adjacent streams, Denmark. The results showed that the microbial community composition in the outer estuary (closer to the sea) was largely associated with salinity and nutrients, while the inner estuary formed two clusters linked to either nutrients plus allochthonous DOM or autochthonous DOM characteristics. In contrast, the microbial community composition in the streams was found to be mainly associated with allochthonous DOM characteristics. A general pattern across the land-to-sea interface was that Betaproteobacteria were strongly associated with humic-like DOM [operational taxonomic units (OTUs) belonging to family Comamonadaceae], while distinct populations were instead associated with nutrients or abiotic variables such as temperature (Cyanobacteria genus Synechococcus) and salinity (Actinobacteria family Microbacteriaceae). Furthermore, there was a stark shift in the relative abundance of OTUs between stream and marine stations. This indicates that as DOM travels through the land-to-sea interface, different bacterial guilds continuously degrade it.
  • Toth, Timea; Balassa, Tamas; Bara, Norbert; Kovacs, Ferenc; Kriston, Andras; Molnar, Csaba; Haracska, Lajos; Sukosd, Farkas; Horvath, Peter (2018)
    To answer major questions of cell biology, it is often essential to understand the complex phenotypic composition of cellular systems precisely. Modern automated microscopes produce vast amounts of images routinely, making manual analysis nearly impossible. Due to its efficiency, machine learning-based analysis software has become an essential tool for performing single-cell-level phenotypic analysis of large imaging datasets. However, an important limitation of such methods is that they do not use the information gained from the cellular micro- and macroenvironment: the algorithmic decision is based solely on the local properties of the cell of interest. Here, we present how various features from the surrounding environment contribute to identifying a cell and how such additional information can improve single-cell-level phenotypic image analysis. The proposed methodology was tested for different sizes of Euclidean and nearest neighbour-based cellular environments both on tissue sections and cell cultures. Our experimental data verify that the surrounding area of a cell largely determines its entity. This effect was found to be especially strong for established tissues, while it was somewhat weaker in the case of cell cultures. Our analysis shows that combining local cellular features with the properties of the cell's neighbourhood significantly improves the accuracy of machine learning-based phenotyping.
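The idea of combining a cell's local features with properties of its neighbourhood can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes a nearest neighbour-based environment of a hypothetical size `k`, summarises the neighbourhood by the mean of the neighbours' feature vectors, and appends that mean to each cell's own features. The function name and the input layout are assumptions.

```python
import math


def neighbourhood_features(cells, k=2):
    """Append the mean feature vector of the k nearest cells to each cell's features.

    cells: list of (x, y, features) tuples, where features is a list of floats.
    Returns one augmented feature list per cell, in input order.
    """
    augmented = []
    for i, (x, y, feats) in enumerate(cells):
        # Euclidean distance from this cell to every other cell.
        others = [
            (math.hypot(x - ox, y - oy), ofeats)
            for j, (ox, oy, ofeats) in enumerate(cells)
            if j != i
        ]
        others.sort(key=lambda pair: pair[0])
        nearest = [ofeats for _, ofeats in others[:k]]
        if nearest:
            mean = [sum(column) / len(nearest) for column in zip(*nearest)]
        else:
            # A lone cell has no neighbourhood; pad with zeros (an arbitrary choice).
            mean = [0.0] * len(feats)
        augmented.append(list(feats) + mean)
    return augmented
```

The augmented vectors could then be passed to any classifier in place of the purely local features. A Euclidean environment would instead select all cells within a fixed radius rather than a fixed count.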
  • Tonkin-Hill, Gerry; Lees, John A.; Bentley, Stephen D.; Frost, Simon D. W.; Corander, Jukka (2019)
    We present fastbaps, a fast solution to the genetic clustering problem. Fastbaps rapidly identifies an approximate fit to a Dirichlet process mixture model (DPM) for clustering multilocus genotype data. Our efficient model-based clustering approach is able to cluster datasets 10-100 times larger than the existing model-based methods, which we demonstrate by analyzing an alignment of over 110 000 sequences of HIV-1 pol genes. We also provide a method for rapidly partitioning an existing hierarchy in order to maximize the DPM model marginal likelihood, allowing us to split phylogenetic trees into clades and sub-clades using a population genomic model. Extensive tests on simulated data as well as a diverse set of real bacterial and viral datasets show that fastbaps provides comparable or improved solutions to previous model-based methods, while being significantly faster. The method is made freely available under an open source MIT licence as an easy to use R package at https://github.com/gtonkinhill/fastbaps.
  • Milovanov, Alexander; Zvyagin, Andrey; Daniyarov, Asset; Kalendar, Ruslan; Troshin, Leonid (2019)
    Cultivated grapevine (Vitis vinifera L. ssp. sativa D.C.) is one of the oldest agricultural crops, each variety comprising an array of clones obtained by vegetative propagation from a selected vine grown from a single seedling. Most clones within a variety are identical, but some show a different form of accession, giving rise to new divergent phenotypes. Understanding the associations among the genotypes within a variety is crucial to efficient management and effective grapevine improvement. Inter-primer binding-site (iPBS) markers may aid in determining the new clones inside closely related genotypes. Following this idea, iPBS markers were used to assess the genetic variation of 33 grapevine genotypes collected from Russia. We used molecular markers to identify the differences among and within five grapevine clonal populations and analysed the variation, using clustering and statistical approaches. Four of a total of 30 PBS primers were selected, based on amplification efficiency. Polymerase chain reaction (PCR) with PBS primers resulted in a total of 1412 bands ranging from 300 to 6000 bp, with a polymorphism ratio of 44%, ranging from 58 to 75 bands per group. In total, seven private bands were identified in 33 genotypes. Results of molecular variance analysis showed that 40% of the total variation was observed within groups and only 60% between groups. Cluster analysis clearly showed that grapevine genotypes are highly divergent and possess abundant genetic diversity. The iPBS PCR-based genome fingerprinting technology used in this study effectively differentiated genotypes into five grapevine groups and indicated that iPBS markers are useful tools for clonal selection. The number of differences between clones was sufficient to identify them as separate clones of studied varieties containing unique mutations. Our previous phenotypic and phenological studies have confirmed that these genotypes differ from those of maternal plants.
This work emphasized the need for a better understanding of the genotypic differences among closely related varieties of grapevine and has implications for the management of its selection processes.
  • Qvist, Laura; Niskanen, Markku; Mannermaa, Kristiina; Wutke, Saskia; Aspi, Jouni (2019)
    Background: The Finnhorse was established as a breed more than 110 years ago by combining local Finnish landraces. Since its foundation, the breed has experienced both strong directional selection, especially for size and colour, and severe population bottlenecks that are connected with its initial foundation and subsequent changes in agricultural and forestry practices. Here, we used sequences of the mitochondrial control region and genomic single nucleotide polymorphisms (SNPs) to estimate the genetic diversity and differentiation of the four Finnhorse breeding sections: trotters, pony-sized horses, draught horses and riding horses. Furthermore, we estimated inbreeding and effective population sizes over time to infer the history of this breed. Results: We found a high level of mitochondrial genetic variation and identified 16 of the 18 haplogroups described in present-day horses. Interestingly, one of these detected haplogroups was previously reported only in the Przewalski’s horse. Female effective population sizes were in the thousands, but declines were evident at the times when the breed and its breeding sections were founded. By contrast, nuclear variation and effective population sizes were small (approximately 50). Nevertheless, inbreeding in Finnhorses was lower than in many other horse breeds. Based on nuclear SNP data, genetic differentiation among the four breeding sections was strongest between the draught horses and the three other sections (FST=0.007–0.018), whereas based on mitochondrial DNA data, it was strongest between the trotters and the pony-sized and riding horses (ΦST= 0.054–0.068). Conclusions: The existence of a Przewalski’s horse haplogroup in the Finnhorse provides new insights into the domestication of the horse, and this finding supports previous suggestions of a close relationship between the Finnhorse and eastern primitive breeds. 
The high level of mitochondrial DNA variation in the Finnhorse supports its domestication from a large number of mares but also reflects that its founding depended on many local landraces. Although inbreeding in Finnhorses was lower than in many other horse breeds, the small nuclear effective population sizes of each of its breeding sections can be considered as a warning sign, which warrants changes in breeding practices.
  • Almeida, Joao R.; Pinho, Armando J.; Oliveira, Jose L.; Fajarda, Olga; Pratas, Diogo (2020)
    Next-generation sequencing triggered the production of a massive volume of publicly available data and the development of new specialised tools. These tools are dispersed over different frameworks, making the management and analyses of the data a challenging task. Additionally, new targeted tools are needed, given the dynamics and specificities of the field. We present GTO, a comprehensive toolkit designed to unify pipelines in genomic and proteomic research, which combines specialised tools for analysis, simulation, compression, development, visualisation, and transformation of the data. This toolkit combines novel tools with a modular architecture, being an excellent platform for experimental scientists, as well as a useful resource for teaching bioinformatics enquiry to students in life sciences. GTO is implemented in C language and is available, under the MIT license, at https://bioinformatics.ua.pt/gto. (C) 2020 The Authors. Published by Elsevier B.V.
  • García-Fernández, Alfredo; Manzano, Pablo; Seoane, Javier; Azcárate, Francisco M.; Iriondo, Jose M.; Peco, Begoña (2019)
    Habitat fragmentation is one of the greatest threats to biodiversity conservation and ecosystem productivity mediated by direct human impact. Its consequences include genetic depauperation, comprising phenomena such as inbreeding depression or reduction in genetic diversity. While the capacity of wild and domestic herbivores to sustain long-distance seed dispersal has been proven, the impact of herbivore corridors in plant population genetics remains to be observed. We conducted this study in the Conquense Drove Road in Spain, where sustained use by livestock over centuries has involved transhumant herds passing twice a year en route to winter and summer pastures. We compared genetic diversity and inbreeding coefficients of Plantago lagopus populations along the drove road with populations in the surrounding agricultural matrix, at varying distances from human settlements. We observed significant differences in coefficients of inbreeding between the drove road and the agricultural matrix, as well as significant trends indicative of higher genetic diversity and population nestedness around human settlements. Trends for higher genetic diversity along drove roads may be present, although they were only marginally significant due to the available sample size. Our results illustrate a functional landscape with human settlements as dispersal hotspots, while the findings along the drove road confirm its role as a pollinator reservoir observed in other studies. Drove roads may possibly also function as linear structures that facilitate long-distance dispersal across the agricultural matrix, while local P. lagopus populations depend rather on short-distance seed dispersal. These results highlight the role of herbivore corridors for conserving the migration capacity of plants, and contribute towards understanding the role of seed dispersal and the spread of invasive species related to human activities.
  • Ghonaim, Marwa; Kalendar, Ruslan; Barakat, Hoda; Elsherif, Nahla; Ashry, Naglaa; Schulman, Alan (2020)
    Maize is one of the world’s most important crops and a model for grass genome research. Long terminal repeat (LTR) retrotransposons comprise most of the maize genome; their ability to produce new copies makes them efficient high-throughput genetic markers. Inter-Retrotransposon-Amplified Polymorphisms (IRAPs) were used to study the genetic diversity of maize germplasm. Five LTR retrotransposons (Huck, Tekay, Opie, Ji, and Grande) were chosen, based on their large number of copies in the maize genome, whereas polymerase chain reaction primers were designed based on consensus LTR sequences. The LTR primers showed high quality and reproducible DNA fingerprints, with a total of 677 bands including 392 polymorphic bands showing 58% polymorphism between maize hybrid lines. These markers were used to identify genetic similarities among all lines of maize. Analysis of genetic similarity was carried out based on polymorphic amplicon profiles and genetic similarity phylogeny analysis. This diversity was expected to display ecogeographical patterns of variation and local adaptation. The clustering method showed that the varieties were grouped into three clusters differing in ecogeographical origin. Each of these clusters comprised divergent hybrids with convergent characters. The clusters reflected the differences among maize hybrids and were in accordance with their pedigree. The IRAP technique is an efficient high-throughput genetic marker-generating method.
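The polymorphism figure quoted in the abstract above follows directly from the band counts it reports: 392 polymorphic bands out of 677 in total. A short check of that arithmetic, with a hypothetical helper name:

```python
def polymorphism_percent(polymorphic_bands, total_bands):
    """Percentage of scored bands that are polymorphic."""
    if total_bands <= 0:
        raise ValueError("total_bands must be positive")
    return 100.0 * polymorphic_bands / total_bands


# Band counts as reported in the abstract: 392 polymorphic of 677 total.
print(round(polymorphism_percent(392, 677)))  # prints 58
```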
  • de Haan, Caroline P A; Kivistö, Rauni I; Hakkinen, Marjaana; Corander, Jukka; Hänninen, Marja-Liisa (2010)
  • Liang, Zhi-Qiang; Chen, Wei-Tao; Wang, Deng-Qiang; Zhang, Shu-Huan; Wang, Chong-Rui; He, Shun-Ping; Wu, Yuan-An; He, Ping; Xie, Jiang; Li, Chuan-Wu; Merilä, Juha; Wei, Qi-Wei (2019)
    Understanding genetic diversity patterns of endangered species is an important premise for biodiversity conservation. The critically endangered salamander Andrias davidianus, endemic to central and southern mainland China, has suffered from sharp range and population size declines over the past three decades. However, the levels and patterns of genetic diversity of A. davidianus populations in the wild remain poorly understood. Herein, we explore the levels and phylogeographic patterns of genetic diversity of wild-caught A. davidianus larvae and adults, using sequence variation in (a) the mitochondrial DNA (mtDNA) fragments (n = 320 individuals; 33 localities), (b) 19 whole mtDNA genomes, and (c) nuclear recombinase activating gene 2 (RAG2; n = 88 individuals; 19 localities). Phylogenetic analyses based on mtDNA datasets uncovered seven divergent mitochondrial clades (A-G), which likely originated in association with the uplifting of mountains during the Late Miocene, specific habitat requirements, barriers including mountains and drainages, and lower dispersal ability. The distributions of clades were geographically partitioned and confined to neighboring regions. Furthermore, we discovered that some mountains, rivers, and provinces harbored more than one clade. RAG2 analyses revealed no obvious geographic patterns among the five alleles detected. Our study depicts a relatively intact distribution map of A. davidianus clades in the species' natural range and provides important knowledge that can be used to improve monitoring programs and develop a conservation strategy for this critically endangered organism.
  • TEDDY STUDY GRP; Stanfill, Bryan A.; Nakayasu, Ernesto S.; Bramer, Lisa M.; Knip, Mikael (2018)
    Liquid chromatography-mass spectrometry (LC-MS)-based proteomics studies of large sample cohorts can easily require from months to years to complete. Acquiring consistent, high-quality data in such large-scale studies is challenging because of normal variations in instrumentation performance over time, as well as artifacts introduced by the samples themselves, such as those because of collection, storage and processing. Existing quality control methods for proteomics data primarily focus on post-hoc analysis to remove low-quality data that would degrade downstream statistics; they are not designed to evaluate the data in near real-time, which would allow for interventions as soon as deviations in data quality are detected. In addition to flagging analyses that demonstrate outlier behavior, evaluating how the data structure changes over time can aid in understanding typical instrument performance or identifying issues such as a degradation in data quality because of the need for instrument cleaning and/or re-calibration. To address this gap for proteomics, we developed Quality Control Analysis in Real-Time (QC-ART), a tool for evaluating data as they are acquired to dynamically flag potential issues with instrument performance or sample quality. QC-ART has accuracy similar to that of standard post-hoc analysis methods with the additional benefit of real-time analysis. We demonstrate the utility and performance of QC-ART in identifying deviations in data quality because of both instrument and sample issues in near real-time for LC-MS-based plasma proteomics analyses of a sample subset of The Environmental Determinants of Diabetes in the Young cohort. We also present a case where QC-ART facilitated the identification of oxidative modifications, which are often underappreciated in proteomic experiments.
  • Szkalisity, Abel; Piccinini, Filippo; Beleon, Attila; Balassa, Tamas; Varga, Istvan Gergely; Migh, Ede; Molnar, Csaba; Paavolainen, Lassi; Timonen, Sanna; Banerjee, Indranil; Ikonen, Elina; Yamauchi, Yohei; Ando, Istvan; Peltonen, Jaakko; Pietiäinen, Vilja; Honti, Viktor; Horvath, Peter (2021)
    Biological processes are inherently continuous, and the chance of phenotypic discovery is significantly restricted by discretising them. Using multi-parametric active regression we introduce the Regression Plane (RP), a user-friendly discovery tool enabling class-free phenotypic supervised machine learning, to describe and explore biological data in a continuous manner. First, we compare traditional classification with regression in a simulated experimental setup. Second, we use our framework to identify genes involved in regulating triglyceride levels in human cells. Subsequently, we analyse a time-lapse dataset on mitosis to demonstrate that the proposed methodology is capable of modelling complex processes at infinite resolution. Finally, we show that hemocyte differentiation in Drosophila melanogaster has continuous characteristics. High-content screening prompted the development of software enabling discrete phenotypic analysis of single cells. Here, the authors show that supervised continuous machine learning can drive novel discoveries in diverse imaging experiments and present the Regression Plane module of Advanced Cell Classifier.
  • Vuorinen, Anssi L.; Kalendar, Ruslan; Fahima, Tzion; Korpelainen, Helena; Nevo, Eviatar; Schulman, Alan H. (2018)
    Wild emmer wheat (Triticum turgidum ssp. dicoccoides) is the wild ancestor of all cultivated tetraploid and hexaploid wheats and harbors a large amount of genetic diversity. This diversity is expected to display eco-geographical patterns of variation, conflating gene flow, and local adaptation. As self-replicating entities comprising the bulk of genomic DNA in wheat, retrotransposons are expected to create predominantly neutral variation via their propagation. Here, we have examined the genetic diversity of 1 Turkish and 14 Israeli populations of wild emmer wheat, based on the retrotransposon marker methods IRAP and REMAP. The level of genetic diversity we detected was in agreement with previous studies that were performed with a variety of marker systems assaying genes and other genomic components. The genetic distances failed to correlate with the geographical distances, suggesting local selection on geographically widespread haplotypes (‘weak selection’). However, the proportion of polymorphic loci correlated with the population latitude, which may reflect the temperature and water availability cline. Genetic diversity correlated with longitude, the east being more montane. Principal component analyses on the marker data separated most of the populations.
  • Wagner, Stefan; Méndez Fernández, Daniel; Felderer, Michael; Vetrò, Antonio; Kalinowski, Marco; Wieringa, Roel; Pfahl, Dietmar; Conte, Tayana; Christiansson, Marie-Therese; Greer, Desmond; Lassenius, Casper; Männistö, Tomi; Nayebi, Maleknaz; Oivo, Markku; Penzenstadler, Birgit; Prikladnicki, Rafael; Ruhe, Guenter; Schekelmann, André; Sen, Sagar; Spínola, Rodrigo; Tuzcu, Ahmed; Luis De La Vara, Jose; Winkler, Dietmar (2019)
    Requirements Engineering (RE) has established itself as a software engineering discipline over the past decades. While researchers have been investigating the RE discipline with a plethora of empirical studies, attempts to systematically derive an empirical theory in the context of the RE discipline have only recently begun. However, such a theory is needed if we are to define and motivate guidance in performing high-quality RE research and practice. We aim at providing an empirical and externally valid foundation for a theory of RE practice, which helps software engineers establish effective and efficient RE processes in a problem-driven manner. We designed a survey instrument and an engineer-focused theory that was first piloted in Germany and, after making substantial modifications, has now been replicated in 10 countries worldwide. We have a theory in the form of a set of propositions inferred from our experiences and available studies, as well as the results from our pilot study in Germany. We evaluate the propositions with bootstrapped confidence intervals and derive potential explanations for the propositions. In this article, we report on the design of the family of surveys, its underlying theory, and the full results obtained from the replication studies conducted in 10 countries with participants from 228 organisations. Our results represent a substantial step forward towards developing an empirical theory of RE practice. The results reveal, for example, that there are no strong differences between organisations in different countries and regions, that interviews, facilitated meetings and prototyping are the most used elicitation techniques, that requirements are often documented textually, that traces between requirements and code or design documents are common, that requirements specifications themselves are rarely changed and that requirements engineering (process) improvement endeavours are mostly internally driven.
Our study establishes a theory that can be used as a starting point for further, more detailed investigations. Practitioners can use the results as theory-supported guidance on selecting suitable RE methods and techniques.
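The abstract above mentions evaluating propositions with bootstrapped confidence intervals. As a rough illustration of that general technique, the sketch below computes a percentile bootstrap interval for the share of respondents agreeing with a proposition. It is not the authors' analysis: the 0/1 response encoding, the percentile method, the resample count and the function name are all assumptions made for this example.

```python
import random


def bootstrap_ci(responses, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of 0/1 responses.

    responses: list of 0/1 values (1 = respondent agrees with the proposition).
    Returns (lower, upper) bounds of the (1 - alpha) interval.
    """
    if not responses:
        raise ValueError("responses must not be empty")
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    n = len(responses)
    means = []
    for _ in range(n_resamples):
        # Resample n answers with replacement and record the agreement share.
        resample = [responses[rng.randrange(n)] for _ in range(n)]
        means.append(sum(resample) / n)
    means.sort()
    lower = means[int((alpha / 2) * n_resamples)]
    upper = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Made-up data for illustration: 70 of 100 respondents agree.
example_responses = [1] * 70 + [0] * 30
interval = bootstrap_ci(example_responses)
```

A proposition would then be judged by where the interval lies, for example whether its lower bound stays above 0.5. The decision rule used in the study itself is not stated in the abstract.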