Browsing by Subject "data quality"

Now showing items 1-5 of 5
  • Stolze, Markus (Helsingin yliopisto, 2019)
    The purpose of this master’s thesis is to evaluate the reliability of forest products forecast information produced by member States of the United Nations Economic Commission for Europe. The study also aims to determine which dimensions of data quality are the most important when producing these predictions. The study is carried out as quantitative research and focuses on the predictions made by the 27 member States between 2002 and 2017. It aims to find out what methods different member States use and which methods produce the most reliable results. It also aims to find out whether reliability differs between the product flows (removals, production, exports or imports) of the various products analyzed. The results showed clear differences between products. For some products, almost all member States produced reliable predictions, while for others the majority of member States did not. There were also differences between member States, and some were clearly more reliable than others. The biggest factor affecting reliability was volume: for the most part, larger volumes meant more reliable predictions. Production and removals were more reliable product flows than imports or exports. This is due to the nature of imports and exports, which are more easily affected by outside impacts. Although all member States could be sorted into four groups based on what their product flows looked like, no clear patterns were visible in how different member States produce predictions. Almost all of the interviewed representatives of member States reported that they were using almost or exactly the same methods to produce predictions.
  • Kujala, Heini; Lahoz-Monfort, José Joaquín; Elith, Jane; Moilanen, Atte (2018)
    Decisions about land use significantly influence biodiversity globally. The field of spatial conservation prioritisation explores allocation of conservation effort, including for reserve network expansion, targeting habitat restoration, or minimising ecological impacts of development. Inevitably, the utility of such planning depends on the quantity and quality of input data, including spatial information on biodiversity, threats, and cost of action. In this work we systematically develop understanding about the significance of these different data types in spatial conservation prioritisation. We clarify the common ways different data types enter an analysis, develop mathematical models to understand the effects of data in spatial prioritisation, and survey literature to establish typical quantities of different types of data used. We use Jackknife analysis to derive the expected change in site values when a single new data layer is added to a prioritisation. We validate mathematical formulae for expected impacts using simulations. A survey of scientific literature reveals that typical spatial prioritisation analyses include hundreds of biodiversity feature layers (species, habitat types, ecosystem services), but the count of cost, threat or habitat condition layers is typically 0-5. Due to these differences, and the mathematical formulations commonly used to combine data types, the influence of a single cost, threat, or habitat condition data layer can be an order of magnitude or two higher than the influence of a single biodiversity feature layer. In a classical cost-effectiveness formulation (benefits divided by costs, B/C) the influence of a single cost layer can even be as large as the joint influence of thousands of species distributions. We also clarify how changes in data impact site values and spatial priority rankings differently, with the latter being further influenced by data correlations, the spread of numeric values inside data layers and other data characteristics. For example, costs influence priorities significantly if cost is positively correlated with biodiversity, but the correlation is the other way around for biodiversity and habitat condition. This work helps conservation practitioners to direct efforts when collating data for spatial conservation planning. It also helps decision makers understand where to focus attention when interpreting conservation plans and their uncertainties.
  • Pang, Sean E. H.; Zeng, Yiwen; De Alban, Jose Don T.; Webb, Edward L. (2022)
    Aims: Human-induced pressures such as deforestation cause anthropogenic range contractions (ARCs). Such contractions present dynamic distributions that may engender data misrepresentations within species distribution models. The temporal bias of occurrence data, where occurrences represent distributions before (past bias) or after (recent bias) ARCs, underpins these data misrepresentations. Occurrence-habitat mismatching results when occurrences sampled before contractions are modelled with contemporary anthropogenic variables; niche truncation results when occurrences sampled after contractions are modelled without anthropogenic variables. Our understanding of their independent and interactive effects on model performance remains incomplete but is vital for developing good modelling protocols. Through a virtual ecologist approach, we demonstrate how these data misrepresentations manifest and investigate their effects on model performance. Location: Virtual Southeast Asia. Methods: Using 100 virtual species, we simulated ARCs with 100-year land-use data and generated temporally biased (past and recent) occurrence datasets. We modelled datasets with and without a contemporary land-use variable (conventional modelling protocols) and with a temporally dynamic land-use variable. We evaluated each model's ability to predict historical and contemporary distributions. Results: Greater ARC resulted in greater occurrence-habitat mismatching for datasets with past bias and greater niche truncation for datasets with recent bias. Occurrence-habitat mismatching prevented models with the contemporary land-use variable from predicting anthropogenic-related absences, causing overpredictions of contemporary distributions. Although niche truncation caused underpredictions of historical distributions (environmentally suitable habitats), incorporating the contemporary land-use variable resolved these underpredictions, even when mismatching occurred. Models with the temporally dynamic land-use variable consistently outperformed models without. Main conclusions: We showed how these data misrepresentations can degrade model performance, undermining their use for empirical research and conservation science. Given the ubiquity of ARCs, these data misrepresentations are likely inherent to most datasets. Therefore, we present a three-step strategy for handling data misrepresentations: maximize the temporal range of anthropogenic predictors, exclude mismatched occurrences and test for residual data misrepresentations.
  • Lacagnina, Carlo; Doblas-Reyes, Francisco; Larnicol, Gilles; Buontempo, Carlo; Obregón, André; Costa-Surós, Montserrat; San-Martín, Daniel; Bretonnière, Pierre-Antoine; Polade, Suraj D.; Romanova, Vanya; Putero, Davide; Serva, Federico; Llabrés-Brustenga, Alba; Pérez, Antonio; Cavaliere, Davide; Membrive, Olivier; Steger, Christian; Pérez-Zanón, Núria; Cristofanelli, Paolo; Madonna, Fabio; Rosoldi, Marco; Riihelä, Aku; Díez, Markel García (Ubiquity Press, Ltd., 2022)
    Data Science Journal
    Data from a variety of research programmes are increasingly used by policy makers, researchers, and the private sector to make data-driven decisions related to climate change and variability. Climate services are emerging as the link to narrow the gap between climate science and downstream users. The Global Framework for Climate Services (GFCS) of the World Meteorological Organization (WMO) offers an umbrella for the development of climate services and has identified quality assessment, along with its use in user guidance, as a key aspect of service provision. This offers an extra stimulus for discussing what type of quality information to focus on and how to present it to downstream users. Quality has become an important keyword for those working on data in both the private and public sectors, and significant resources are now devoted to quality management of processes and products. Quality management guarantees reliability and usability of the product served; it is a key element in building trust between consumers and suppliers. Untrustworthy data could lead to a negative economic impact at best and a safety hazard at worst. In a progressive commitment to establish this relation of trust, as well as to provide sufficient guidance for users, the Copernicus Climate Change Service (C3S) has made significant investments in the development of an Evaluation and Quality Control (EQC) function. This function offers a homogeneous user-driven service for the quality of the C3S Climate Data Store (CDS). Here we focus on the EQC component targeting the assessment of the CDS datasets, which include satellite and in-situ observations, reanalysis, climate projections, and seasonal forecasts. The EQC function is characterised by a two-tier review system designed to guarantee the quality of the dataset information. While the need to assess the quality of climate data is well recognised, the methodologies, the metrics, the evaluation framework, and how to present all this information to the users have never been developed before in an operational service encompassing all the main climate dataset categories. Building the underlying technical solutions poses unprecedented challenges and makes the C3S EQC approach unique. This paper describes the development and the implementation of the operational EQC function, which provides an overarching quality management service for all CDS data.
  • Khapugin, Anatoliy A.; Soltys-Lelek, Anna; Fedoronchuk, Nikolay M.; Muldashev, Albert A.; Agafonov, Vladimir A.; Kazmina, Elena S.; Vasjukov, Vladimir M.; Baranova, Olga G.; Buzunova, Irina O.; Teteryuk, Lyudmila; Dubovik, Dmitriy; Gudzinskas, Zigmantas; Kukk, Toomas; Kravchenko, Alexey; Yena, Andrey; Kozhin, Mikhail N.; Sennikov, Alexander N. (2021)
    By the method of data re-collection and re-assessment, we here test the completeness of distribution areas of the species and species aggregates of Rosa in Eastern Europe as mapped in volume 13 of Atlas Florae Europaeae (AFE), and discuss insights into the issues connected with the data. We found many new occurrences which are additions to the published maps: 1068 records of species and 570 records of species aggregates. The new occurrences are listed with references to the sources, and the updated AFE maps are provided. The greatest increase in new native occurrences was found for the species that are widespread or taxonomically complicated, and in new alien occurrences for the species that are currently expanding their secondary distribution areas. The mapping work published in 2004 is considered good, with minor omissions caused by possible oversights and incomplete sampling. The majority of new additions originated in the period after the original data collection. Nearly the same amount of new data originated from larger and smaller herbarium collections, underlining the value of small collections for chorological studies. We found that only ca. 20% of new records based on herbarium specimens have been published, thus highlighting the need for data papers for publication of distributional data. The greatest increase in new records based on herbarium specimens was found for insufficiently studied territories (Belarus, central, northern and eastern parts of Russia), whereas the same level of increase for the territories with reasonably good coverage (Latvia) was achieved by observations. We conclude that the overall sparsity of published records in Eastern Europe is caused by a lower level of data collection rather than by poor data availability, and that floristic surveys based on herbarium specimens cannot compete in speed and density of records with observation-based surveys, which may become the main source of distributional information in the future.