Effects of Ignoring Survey Design Information for Data Reuse

Permalink

http://hdl.handle.net/10138/334029

Citation

Foster, S D, Vanhatalo, J, Trenkel, V M, Schulz, T, Lawrence, E, Przeslawski, R & Hosack, G 2021, 'Effects of Ignoring Survey Design Information for Data Reuse', Ecological Applications, vol. 31, no. 6, 02360. https://doi.org/10.1002/eap.2360

Title: Effects of Ignoring Survey Design Information for Data Reuse
Author: Foster, Scott D.; Vanhatalo, Jarno; Trenkel, Verena M.; Schulz, Torsti; Lawrence, Emma; Przeslawski, Rachel; Hosack, Geoffrey
Contributor: University of Helsinki, Department of Mathematics and Statistics
University of Helsinki, Environmental and Ecological Statistics Group
Date: 2021-09
Language: eng
Number of pages: 8
Belongs to series: Ecological Applications
ISSN: 1051-0761
URI: http://hdl.handle.net/10138/334029
Abstract: Data are currently being used, and reused, in ecological research at an unprecedented rate. To ensure appropriate reuse, however, we need to ask the question: "Are aggregated databases currently providing the right information to enable effective and unbiased reuse?" We investigate this question, focusing on designs that purposefully favor the selection of some sampling locations (upweighting their probability of selection). Such designs are common; examples include designs with uneven inclusion probabilities and stratified designs. We perform a simulation experiment by creating data sets with progressively more uneven inclusion probabilities and examine the resulting estimates of the average number of individuals per unit area (density). The effect of ignoring the survey design can be profound, with biases of up to 250% in density estimates when naive analytical methods are used. This density estimation bias is not reduced by adding more data. Fortunately, the estimation bias can be mitigated by using an appropriate estimator or an appropriate model that incorporates the design information. These remedies can be applied, however, only when essential information about the survey design is available: the sample location selection process (e.g., inclusion probabilities) and/or the covariates used in their specification. The results suggest that such information must be stored and served with the data to support meaningful inference and data reuse.
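The contrast the abstract draws between a naive estimate and a design-based one can be illustrated with a small sketch. This is not the authors' simulation: the population, the covariate `x`, the exponential count model, the `unevenness` parameter, and the use of Poisson sampling are all assumptions made here for illustration. It compares the plain sample mean with a Horvitz-Thompson estimate of the mean per site when inclusion probabilities favor high-density sites.

```python
import math
import random


def simulate(n_sites=50_000, expected_n=2_000, unevenness=3.0, seed=1):
    """Return (true_mean, naive_estimate, ht_estimate) for one simulated survey.

    All settings are hypothetical and chosen only to illustrate the idea.
    """
    rng = random.Random(seed)

    # Hypothetical population: covariate x in [0, 1]; the expected value of
    # the response y at a site rises with x (mean 1 + 9x).
    x = [rng.random() for _ in range(n_sites)]
    y = [rng.expovariate(1.0 / (1.0 + 9.0 * xi)) for xi in x]
    true_mean = sum(y) / n_sites

    # Uneven inclusion probabilities: sites with larger x are favored.
    # unevenness = 0 gives equal probabilities for every site.
    w = [math.exp(unevenness * xi) for xi in x]
    total_w = sum(w)
    pi = [expected_n * wi / total_w for wi in w]
    if max(pi) > 1.0:
        raise ValueError("inclusion probabilities exceed 1; lower expected_n")

    # Poisson sampling: each site enters the sample independently
    # with its own inclusion probability.
    sample = [i for i in range(n_sites) if rng.random() < pi[i]]

    # Naive estimate: ignores the design and averages the sampled values.
    naive = sum(y[i] for i in sample) / len(sample)

    # Horvitz-Thompson estimate of the mean: weight each sampled value
    # by the inverse of its inclusion probability.
    ht = sum(y[i] / pi[i] for i in sample) / n_sites
    return true_mean, naive, ht


true_mean, naive, ht = simulate()
print(f"naive relative bias: {(naive - true_mean) / true_mean:+.1%}")
print(f"HT relative error:   {(ht - true_mean) / true_mean:+.1%}")
```

Note that the Horvitz-Thompson estimate needs `pi` for every sampled site, which is the kind of design information the abstract argues should be stored with the data. Raising `expected_n` in this sketch narrows the sampling noise of both estimates but does not move the naive estimate toward the true mean.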
Subject: 1181 Ecology, evolutionary biology
111 Mathematics
112 Statistics and probability
bias
data
database
findable
accessible
interoperable
reusable data
Horvitz-Thompson estimator
inclusion probability
model
population density estimate
reuse
survey design
INFERENCE

Files in this item

Files Size Format View
Effects_of_Igno ... rmation_for_Data_Reuse.pdf 8.676Mb PDF View/Open
