Effects of Ignoring Survey Design Information for Data Reuse

Foster, S D, Vanhatalo, J, Trenkel, V M, Schulz, T, Lawrence, E, Przeslawski, R & Hosack, G 2021, 'Effects of Ignoring Survey Design Information for Data Reuse', Ecological Applications, vol. 31, no. 6, 02360. https://doi.org/10.1002/eap.2360

Title: Effects of Ignoring Survey Design Information for Data Reuse
Author: Foster, Scott D.; Vanhatalo, Jarno; Trenkel, Verena M.; Schulz, Torsti; Lawrence, Emma; Przeslawski, Rachel; Hosack, Geoffrey
Contributor organization: Department of Mathematics and Statistics; Organismal and Evolutionary Biology Research Programme; Environmental and Ecological Statistics Group; Biostatistics Helsinki; Research Centre for Ecological Change
Date: 2021-09
Language: eng
Number of pages: 8
Belongs to series: Ecological Applications
ISSN: 1051-0761
DOI: https://doi.org/10.1002/eap.2360
URI: http://hdl.handle.net/10138/334029
Abstract: Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask the question: "Are aggregated databases currently providing the right information to enable effective and unbiased reuse?" We investigate this question, with a focus on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). These designs are common; examples include designs that have uneven inclusion probabilities or are stratified. We perform a simulation experiment by creating data sets with progressively more uneven inclusion probabilities and examine the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, the estimation bias can be mitigated by using an appropriate estimator or an appropriate model that incorporates the design information. These remedies are available, however, only when essential information about the survey design is available: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in its specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
Subject: 1181 Ecology, evolutionary biology; 111 Mathematics; 112 Statistics and probability; reusable data; Horvitz-Thompson estimator; inclusion probability; population density estimate; survey design
Peer reviewed: Yes
Usage restriction: openAccess
Self-archived version: acceptedVersion
Funder: Suomen Akatemia Projektilaskutus
Grant number:
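
The abstract's central claim, that ignoring uneven inclusion probabilities biases density estimates while a design-aware estimator does not, can be illustrated with a small simulation. The sketch below is not the authors' experiment: the population model, the covariate `x`, the `unevenness` parameter, and the use of Poisson (independent Bernoulli) sampling are all assumptions made for illustration. It compares the naive sample mean with the Horvitz-Thompson estimator named in the subject keywords and with the closely related Hajek (weighted-mean) estimator.

```python
import random


def simulate(n_sites=20000, expected_n=2000, unevenness=9.0, seed=1):
    """Compare naive and design-based density estimates (illustrative only)."""
    rng = random.Random(seed)

    # Hypothetical population: a covariate x in [0, 1] at each site.
    # Density at a site has mean 1 + 9x (an assumed relationship).
    x = [rng.random() for _ in range(n_sites)]
    y = [rng.expovariate(1.0) * (1.0 + 9.0 * xi) for xi in x]
    true_mean = sum(y) / n_sites

    # Inclusion probabilities proportional to 1 + unevenness * x, scaled so
    # the expected sample size is expected_n. unevenness = 0 gives an
    # equal-probability design; larger values favor high-density sites.
    w = [1.0 + unevenness * xi for xi in x]
    scale = expected_n / sum(w)
    pi = [scale * wi for wi in w]
    assert max(pi) <= 1.0, "inclusion probabilities must not exceed 1"

    # Poisson sampling: each site is included independently with prob pi[i].
    sample = [i for i in range(n_sites) if rng.random() < pi[i]]

    # Naive estimator: ignores the design and averages the sampled values.
    naive = sum(y[i] for i in sample) / len(sample)

    # Horvitz-Thompson estimator of the mean: weight each value by 1 / pi.
    ht = sum(y[i] / pi[i] for i in sample) / n_sites

    # Hajek estimator: same weights, normalized by the sum of the weights.
    hajek = sum(y[i] / pi[i] for i in sample) / sum(1.0 / pi[i] for i in sample)

    return {"true_mean": true_mean, "naive": naive, "ht": ht, "hajek": hajek}


if __name__ == "__main__":
    for name, value in simulate().items():
        print(f"{name:>10}: {value:.3f}")
```

Both design-based estimators need `pi` for every sampled site, which is the abstract's point about storing inclusion probabilities (or the covariates that define them) with the data. Under these assumptions, increasing `expected_n` tightens the naive estimate around the wrong value rather than around the true mean.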

Files in this item

File: Effects_of_Igno ... rmation_for_Data_Reuse.pdf
Size: 8.676Mb
Format: PDF
