Browsing by Subject "Bias"

  • Alekseev, Alexander; Tomppo, Erkki; McRoberts, Ronald E; von Gadow, Klaus (Springer Singapore, 2019)
    The State Forest Inventory (SFI) in the Russian Federation is a relatively new project that is little known in the English-language scientific literature. Following the stipulations of the Forest Act of 2006, the first SFI sample plots in this vast territory were established in 2007. The 34 Russian forest regions were the basic geographical units for all statistical estimates and served as a first-level stratification, while a second level was based on old inventory data and remotely sensed data. The sampling design was to consist of a simple random sample of 84,700 circular 500 m² sample plots over forest land. Each sample plot consists of three nested concentric circular subplots with radii of 12.62, 5.64 and 2.82 m and additional subplots for assessing and describing undergrowth, regeneration and ground vegetation. In total, 117 variables were to be measured or assessed on each plot. Although field work has begun, the methodology has elicited some criticism. The simple random sampling design is less efficient than a systematic design featuring sample plot clusters and a mix of temporary and permanent plots. The second-level stratification is mostly ineffective for increasing precision. Qualitative variables, which are not always essential, are dominant, while important quantitative variables are under-represented. Because of very slow progress, in 2018 the original plan was adjusted by reducing the number of permanent sample plots from 84,700 to 68,287 so that the first SFI cycle could be completed by 2020.
  • Alekseev, Alexander; Tomppo, Erkki; McRoberts, Ronald E.; von Gadow, Klaus (2019)
    The State Forest Inventory (SFI) in the Russian Federation is a relatively new project that is little known in the English-language scientific literature. Following the stipulations of the Forest Act of 2006, the first SFI sample plots in this vast territory were established in 2007. The 34 Russian forest regions were the basic geographical units for all statistical estimates and served as a first-level stratification, while a second level was based on old inventory data and remotely sensed data. The sampling design was to consist of a simple random sample of 84,700 circular 500 m² sample plots over forest land. Each sample plot consists of three nested concentric circular subplots with radii of 12.62, 5.64 and 2.82 m and additional subplots for assessing and describing undergrowth, regeneration and ground vegetation. In total, 117 variables were to be measured or assessed on each plot. Although field work has begun, the methodology has elicited some criticism. The simple random sampling design is less efficient than a systematic design featuring sample plot clusters and a mix of temporary and permanent plots. The second-level stratification is mostly ineffective for increasing precision. Qualitative variables, which are not always essential, are dominant, while important quantitative variables are under-represented. Because of very slow progress, in 2018 the original plan was adjusted by reducing the number of permanent sample plots from 84,700 to 68,287 so that the first SFI cycle could be completed by 2020.
  • McMinn, Megan A.; Gray, Linsay; Harkanen, Tommi; Tolonen, Hanna; Pitkanen, Joonas; Molaodi, Oarabile R.; Leyland, Alastair H.; Martikainen, Pekka (2020)
    Background: In the context of declining levels of participation, understanding differences between participants and non-participants in health surveys is increasingly important for reliable measurement of health-related behaviors and their social differentials. This study compared participants and non-participants of the Finnish Health 2000 survey, and participants and a representative sample of the target population, in terms of alcohol-related harms (hospitalizations and deaths) and all-cause mortality. Methods: We individually linked 6,127 survey participants and 1,040 non-participants, aged 30-79, and a register-based population sample (n = 496,079) to 12 years of subsequent administrative hospital discharge and mortality data. We estimated age-standardized rates and rate ratios for each outcome for non-participants and the population sample relative to participants with and without sampling weights by sex and educational attainment. Results: Harms and mortality were higher in non-participants, relative to participants for both men (rate ratios = 1.5 [95% confidence interval = 1.2, 1.9] for harms; 1.6 [1.3, 2.0] for mortality) and women (2.7 [1.6, 4.4] harms; 1.7 [1.4, 2.0] mortality). Non-participation bias in harms estimates in women increased with education and in all-cause mortality overall. Age-adjusted comparisons between the population sample and sampling weighted participants were inconclusive for differences by sex; however, there were some large differences by educational attainment level. Conclusions: Rates of harms and mortality in non-participants exceed those in participants. Weighted participants' rates reflected those in the population well by age and sex, but insufficiently by educational attainment. Despite relatively high participation levels (85%), social differentiating factors and levels of harm and mortality were underestimated in the participants.
  • Järvinen, Teppo L. N.; Sihvonen, Raine; Bhandari, Mohit; Sprague, Sheila; Malmivaara, Antti; Paavola, Mika; Schuenemann, Holger J.; Guyatt, Gordon H. (2014)
  • Lange, Moritz Johannes (Helsingin yliopisto, 2020)
    In the context of data science and machine learning, feature selection is a widely used technique that focuses on reducing the dimensionality of a dataset. It is commonly used to improve model accuracy by preventing data redundancy and over-fitting, but can also be beneficial in applications such as data compression. The majority of feature selection techniques rely on labelled data. In many real-world scenarios, however, data is only partially labelled and thus requires so-called semi-supervised techniques, which can utilise both labelled and unlabelled data. While unlabelled data is often obtainable in abundance, labelled datasets are smaller and potentially biased. This thesis presents a method called distribution matching, which offers a way to do feature selection in a semi-supervised setup. Distribution matching is a wrapper method, which trains models to select features that best affect model accuracy. It addresses the problem of biased labelled data directly by incorporating unlabelled data into a cost function which approximates expected loss on unseen data. In experiments, the method is shown to successfully minimise the expected loss transparently on a synthetic dataset. Additionally, a comparison with related methods is performed on a more complex EMNIST dataset.
  • Lindbohm, Joni V.; Kaprio, Jaakko; Korja, Miikka (2019)
    OBJECTIVE: Two recent hospital-based studies have reported that both smoking and hypertension, the 2 most important risk factors for aneurysmal subarachnoid hemorrhage (aSAH), may improve survival after aSAH. We tested the hypothesis that a higher case fatality among smokers and hypertensive individuals after aSAH contributes to these paradoxical findings. METHODS: We followed 65,521 population-based FINRISK participants during 1.52 million person-years and identified 445 first-ever hospitalized aSAHs and 98 sudden-death aSAHs occurring between 1974 and 2014. We measured risk factors prior to disease onset in the cohort surveys, and confirmed, among all sudden-death aSAHs, 80% by extensive (including the brain) forensic autopsy; the remaining 20% were based on clinical examination (CT of the head, spinal tap, or both). The Cox proportional hazards model estimated survival curves. RESULTS: Analyses repeating the protocol of the 2 recent hospital-based studies again showed improved survival among smokers and those with hypertension. Conversely, in analyses including more accurate risk factor measurements and including patients with sudden-death aSAH who never reached a hospital, these paradoxical results were reversed. Smokers had reduced survival compared to that of never-smokers (p = 0.04), and those with high systolic blood pressure (SBP) (≥160 mm Hg) had reduced survival when compared to survival of those with SBP