Browsing by Subject "Validation"

  • DAGIS Consortium Grp (2018)
    Validated methodological aids for food quantification are needed for the accurate estimation of food consumption. Our objective was to assess the validity of an age-specific food picture book, which contains commonly eaten foods among Finnish children, for parents and early educators in estimating food portion sizes. The food picture book was developed to assist in portion size estimation when filling in food records in the Increased health and wellbeing in preschools (DAGIS) study. All ninety-five food pictures in the book, each containing three or four different portion sizes, were evaluated at real-time sessions. Altogether, seventy-three parents and 107 early educators or early education students participated. Each participant evaluated twenty-three or twenty-four portions by comparing presented pre-weighed food portions against the corresponding picture from the food picture book. Food portions were not consumed by participants. The total proportion of correct estimations varied from 36% (cottage cheese) to 100% (fish fingers). Among the food groups, nearly or over 90% of the estimations were correct for bread, pastries and main courses ('piece products' such as meatballs and chicken nuggets). Soups, porridges, salads and grated and fresh vegetables were least correctly estimated (
  • Kanerva, Noora; Harald, Kennet; Männistö, Satu; Kaartinen, Niina E.; Maukonen, Mirkka; Haukkala, Ari; Jousilahti, Pekka (2018)
    Studies indicate that the healthy Nordic diet may improve heart health, but its relation to weight change is less clear. We studied the association between the adherence to the healthy Nordic diet and long-term changes in weight, BMI and waist circumference. Furthermore, the agreement between self-reported and measured body anthropometrics was examined. The population-based DIetary, Lifestyle and Genetic Determinants of Obesity and Metabolic syndrome Study in 2007 included 5024 Finns aged 25-75 years. The follow-up was conducted in 2014 (n 3735). One-third of the participants were invited to a health examination. The rest were sent measuring tape and written instructions along with questionnaires. The Baltic Sea Diet Score (BSDS) was used to measure adherence to the healthy Nordic diet. Association of the baseline BSDS and changes in BSDS during the follow-up with changes in body anthropometrics were examined using linear regression analysis. The agreement between self-reported and nurse-measured anthropometrics was determined with Bland-Altman analysis. Intra-class correlation coefficients between self-reported and nurse-measured anthropometrics exceeded 0.95. The baseline BSDS was associated with lower weight (beta = -0.056, P = 0.043) and BMI (beta = -0.021, P = 0.031) over the follow-up. This association was especially evident among those who had increased their BSDS. In conclusion, both high initial and improved adherence to the healthy Nordic diet may promote long-term weight maintenance. The self-reported/measured anthropometrics were shown to have high agreement with nurse-measured values, which adds to the credibility of our results.
  • Lommi, Sohvi; Viljakainen, Heli T.; Weiderpass, Elisabete; de Oliveira Figueiredo, Rejane Augusta (2020)
    Purpose To validate the Children's Eating Attitudes Test (ChEAT) in the Finnish population. Materials and methods In total 339 children (age 10-15 years) from primary schools in Southern Finland were evaluated at two time points. They answered the ChEAT and SCOFF test questions, and had their weight, height and waist circumference measured. Retesting was performed 4-6 weeks later. Test-retest reliability was evaluated using intra-class correlation (ICC), and internal consistency was examined using Cronbach's alpha coefficient (C-alpha). ChEAT was cross-calibrated against SCOFF and background variables. Factor analysis was performed to examine the factor structure of ChEAT. Results The 26-item ChEAT showed high internal consistency (C-alpha 0.79), however, a 24-item ChEAT showed even better internal consistency (C-alpha 0.84) and test-retest reliability (ICC 0.794). ChEAT scores demonstrated agreement with SCOFF scores (p <0.01). The mean ChEAT score was higher in overweight children than normal weight (p <0.001). Exploratory factor analysis yielded four factors (concerns about weight, limiting food intake, pressure to eat, and concerns about food), explaining 57.8% of the variance. Conclusions ChEAT is a valid and reliable tool for measuring eating attitudes in Finnish children. The 24-item ChEAT showed higher reliability than the 26-item ChEAT.
  • Plikk, Anna; Engels, Stefan; Luoto, Tomi P.; Nazarova, Larisa; Salonen, J. Sakari; Helmens, Karin F. (2019)
  • Liu, Yang; Meric, Guillaume; Havulinna, Aki S.; Teo, Shu Mei; Åberg, Fredrik; Ruuskanen, Matti; Sanders, Jon; Zhu, Qiyun; Tripathi, Anupriya; Verspoor, Karin; Cheng, Susan; Jain, Mohit; Jousilahti, Pekka; Vazquez-Baeza, Yoshiki; Loomba, Rohit; Lahti, Leo; Niiranen, Teemu; Salomaa, Veikko; Knight, Rob; Inouye, Michael (2022)
    The gut microbiome has shown promise as a predictive biomarker for various diseases. However, the potential of gut microbiota for prospective risk prediction of liver disease has not been assessed. Here, we utilized shallow shotgun metagenomic sequencing of a large population-based cohort (N > 7,000) with ~15 years of follow-up in combination with machine learning to investigate the predictive capacity of gut microbial predictors individually and in conjunction with conventional risk factors for incident liver disease. Separately, conventional and microbial factors showed comparable predictive capacity. However, microbiome augmentation of conventional risk factors using machine learning significantly improved the performance. Similarly, disease free survival analysis showed significantly improved stratification using microbiome-augmented models. Investigation of predictive microbial signatures revealed previously unknown taxa for liver disease, as well as those previously associated with hepatic function and disease. This study supports the potential clinical validity of gut metagenomic sequencing to complement conventional risk factors for prediction of liver diseases.
  • Richter, Martinus; Agren, Per-Henrik; Besse, Jean-Luc; Coester, Maria; Kofoed, Hakon; Maffulli, Nicola; Steultjens, Martijn; Irgit, Kaan; Miettinen, Mikko; Repo, Jussi P.; Uygur, Esat (2020)
    Background The Score Committee of the European Foot and Ankle Society (EFAS) developed, validated, and published the EFAS Score in seven European languages (English, German, French, Italian, Polish, Dutch, Swedish). From other languages under validation, the Finnish and Turkish versions finished data acquisition and underwent further validation. Methods The EFAS Score was developed and validated in three stages: 1) item (question) identification (completed during initial validation study), 2) item reduction and scale exploration (completed during initial validation study), 3) confirmatory analyses and responsiveness of Finnish and Turkish version (completed during initial validation study in seven other languages). The data were collected pre-operatively and post-operatively at a minimum follow-up of 3 months and mean follow-up of 6 months. Item reduction, scale exploration, confirmatory analyses and responsiveness were executed using classical test theory and item response theory. Results The internal consistency of the scale was confirmed in the Finnish and Turkish versions (Cronbach's Alpha>0.8). Responsiveness was good, with moderate to large effect sizes in both languages, and evidence of a statistically significant positive association between the EFAS Score and patient-reported improvement. Conclusions The Finnish and Turkish EFAS Score versions were successfully validated in the orthopaedic ankle and foot surgery patients, including a wide variety of foot and ankle pathologies. All score versions are freely available at www.efas.co.
  • Kask, Gilber; Uimonen, Mikko M.; Barner-Rasmussen, Ian; Tukiainen, Erkki J.; Blomqvist, Carl; Repo, Jussi P. (2021)
    The most widely used patient-reported outcome (PRO) measure for soft tissue sarcoma (STS) patients is the Toronto Extremity Salvage Score (TESS). The aim of the study was to validate and test the reliability of the TESS for patients with lower extremity STS based on Finnish population data. Patients were assessed using the TESS, the QLQ-C30 Function and Quality of life (QoL) modules, the 15D and the Musculoskeletal tumour Society (MSTS) score. The TESS was completed twice with a 2- to 4-week interval. The intraclass correlation coefficient (ICC) was used for test-retest reliability. Construct validity was tested for structural validity and convergent validity. Altogether 136 patients completed the TESS. A ceiling effect was noted as 21% of the patients scored maximum points. The ICC between first and second administration of the TESS was 0.96. The results of exploratory factor analysis together with high Cronbach's alpha (0.98) supported a unidimensional structure. The TESS correlated moderately with the MSTS score (rho = 0.59, p < 0.001) and strongly with the mobility dimension in the 15D HRQL instrument (rho = 0.76, p < 0.001) and the physical function in QLQ-C30 (rho = 0.83, p < 0.001). The TESS instrument is a comprehensive and reliable PRO measure. The TESS may be used as a validated single index score, for lower extremity STS patients for the measurement of a functional outcome. The TESS seems to reflect patients' HRQoL well after the treatment of lower extremity soft tissue sarcomas. (C) 2020 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. All rights reserved.
  • Meinila, Jelena; Valkama, Anita; Koivusalo, Saila B.; Stach-Lempinen, Beata; Lindstrom, Jaana; Kautiainen, Hannu; Eriksson, Johan G.; Erkkola, Maijaliisa (2016)
    Background: The aim was to develop and validate a food-based diet quality index for measuring adherence to the Nordic Nutrition Recommendations (NNR) in a pregnant population with high risk of gestational diabetes (GDM). Methods: This study is a part of the Finnish Gestational Diabetes Prevention Study (RADIEL), a lifestyle intervention conducted between 2008 and 2014. The 443 pregnant participants (61 % of those invited), were either obese or had a history of GDM. Food frequency questionnaires collected at 1st trimester served for composing the HFII; a sum of 11 food groups (available score range 0-17) with higher scores reflecting higher adherence to the NNR. Results: The average HFII of the participants was 10.2 (SD 2.8, range 2-17). Factor analysis for the HFII component matrix revealed three factors that explained most of the distribution (59 %) of the HFII. As an evidence of the component relevance 9 out of 11 of the HFII components independently contributed to the total score (item-rest correlation coefficients Conclusions: The HFII components reflect the food guidelines of the NNR, intakes of relevant nutrients, and characteristics known to vary with diet quality. It largely ignores energy intake, its components have independent contribution to the HFII, and it exhibits reproducibility. The main shortcomings are absence of red and processed meat component, and the validation in a selected study population. It is suitable for ranking participants according to the adherence to the NNR in pregnant women at high risk of GDM.
  • Alaraudanjoki, Viivi; Saarela, Henna; Pesonen, Reetta; Laitala, Marja-Liisa; Kiviahde, Heikki; Tjaderhane, Leo; Lussi, Adrian; Pesonen, Paula; Anttonen, Vuokko (2017)
    Objectives: To assess the reliability of the BEWE index on 3D models and to compare 3D-assessed erosive tooth wear scores with clinically detected scores. Methods: In total, 1964 members of the Northern Finland Birth Cohort 1966 participated in a standardized clinical dental examination including the Basic Erosive Wear Examination (BEWE) and dental 3D modelling at the age of 45-46 years. Of those examined, 586 were randomly selected for this study. 3D models were assessed using the same BEWE criteria as in the clinical examination. Calculated kappa values as well as the prevalence and severity of erosive wear according to the clinical examination and 3D models were compared. Re-examinations were performed to calculate intra- and inter-method and -examiner agreements. Results: The BEWE index on 3D models was reproducible; the mean intra- and inter-examiner agreement were 0.89 and 0.87, respectively, for sextant level, and 0.64 and 1, respectively, for BEWE sum scores. Erosive tooth wear was recorded as more severe in 3D models than in the clinical examination, and inter-method agreement was 0.41 for severe erosive wear (BEWE sum > 8). The biggest inter-method differences were found in upper posterior sextants. Conclusions: The BEWE index is reliable for recording erosive tooth wear on 3D models. 3D models seem to be especially sensitive in detecting initial erosive wear. Additionally, it seems that erosive wear may be underscored in the upper posterior sextants when assessed clinically. Due to the nature of 3D models, the assessment of erosive wear clinically and on 3D models may not be entirely comparable. Clinical significance: 3D models can serve as an additional tool to detect and document erosive wear, especially during the early stages of the condition and in assessing the progression of wear. When scoring erosive wear clinically, care must be taken especially when assessing upper posterior sextants. (C) 2017 Elsevier Ltd. All rights reserved.
  • Ketola, Helena; Kask, Gilber; Barner-Rasmussen, Ian; Tukiainen, Erkki; Blomqvist, Carl; Laitinen, Minna K.; Kautiainen, Hannu; Kiiski, Juha; Repo, Jussi P. (2022)
    Interest in functional outcome (FO) and health-related quality of life (HRQL) in extremity soft-tissue sarcoma (STS) patients has increased. The aim of this study was to validate two FO questionnaires for upper extremity STS patients: the Toronto Extremity Salvage Score (TESS) and short version of the Disability of Arm, Shoulder and Hand (QuickDASH), based on Finnish population data. A multi-center study was conducted at two academic sarcoma centers. Surgically treated upper extremity STS patients were invited to participate. Patients completed the TESS and the QuickDASH with HRQL questionnaires the 15D and the QLQ-C30. The scores were analyzed and compared. Fifty-five patients with a mean follow-up period of 4.7 years were included. Mean age was 63 years (standard deviation [SD] 14.6). The mean score for TESS was 88.5 (SD 15.1) and for QuickDASH 17.8 (SD 19.6). The QuickDASH had a statistically significantly better score coverage. A ceiling effect was noted, 27% and 20% for TESS and QuickDASH, respectively. The TESS and QuickDASH scores were strongly correlated (r = -0.89). The TESS score strongly correlated with the QLQ-C30 (r = 0.79) and the 15D score (r = 0.70). The QuickDASH score correlated strongly with the QLQ-C30 score (r = -0.71) and moderately with the 15D score (r = -0.56). The TESS score had a statistically significantly stronger correlation with the 15D score than QuickDASH (p < 0.005). Both the TESS and the QuickDASH provide reliable scores for assessing FO in upper extremity STS patients. The QuickDASH has a better coverage, whereas TESS showed a stronger correlation to HRQL scores. (c) 2022 British Association of Plastic, Reconstructive and Aesthetic Surgeons. Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
  • Moser, Andre; Reinikainen, Matti; Jakob, Stephan M.; Selander, Tuomas; Pettilä, Ville; Kiiski, Olli; Varpula, Tero; Raj, Rahul; Takala, Jukka (2022)
    Objective: Prognostic models are key for benchmarking intensive care units (ICUs). They require up-to-date predictors and should report transportability properties for reliable predictions. We developed and validated an in-hospital mortality risk prediction model to facilitate benchmarking, quality assurance, and health economics evaluation. Study Design and Setting: We retrieved data from the database of an international (Finland, Estonia, Switzerland) multicenter ICU cohort study from 2015 to 2017. We used a hierarchical logistic regression model that included age, a modified Simplified Acute Physiology Score-II, admission type, premorbid functional status, and diagnosis as grouping variable. We used pooled and meta-analytic cross-validation approaches to assess temporal and geographical transportability. Results: We included 61,224 patients treated in the ICU (hospital mortality 10.6%). The developed prediction model had an area under the receiver operating characteristic curve 0.886, 95% confidence interval (CI) 0.882-0.890; a calibration slope 1.01, 95% CI (0.99-1.03); a mean calibration -0.004, 95% CI (-0.035 to 0.027). Although the model showed very good internal validity and geographic discrimination transportability, we found substantial heterogeneity of performance measures between ICUs (I-squared: 53.4-84.7%). Conclusion: A novel framework evaluating the performance of our prediction model provided key information to judge the validity of our model and its adaptation for future use. (c) 2021 The Authors. Published by Elsevier Inc. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/)
  • Simonsen, Nina; Koponen, Anne M.; Suominen, Sakari (2018)
    Background: To meet the challenges of the rising prevalence of chronic diseases, such as type 2 diabetes, new approaches to healthcare delivery have been initiated; among these the influential Chronic Care Model (CCM). Valid instruments are needed to evaluate the public health impact of these frameworks in different countries. The Patient Assessment of Chronic Illness Care (PACIC) is a 20-item quality of care measure that, from the perspective of the patient, measures the extent to which care is congruent with the CCM. The aim of this study was to evaluate the psychometric properties of the Finnish translation of the PACIC questionnaire, in terms of validity and reliability, in a large register-based sample of patients with type 2 diabetes. Method: The PACIC items were translated into Finnish in a standardized forward-backward procedure, followed by a cross-sectional survey among patients with type 2 diabetes (response rate 56%; n = 2866). We assessed the Finnish version of the PACIC scale for the following psychometric properties: content validity, internal consistency reliability, convergent and construct validity. We also present descriptive data on total scale as well as predetermined subscale levels. Results: The item-response on the PACIC scale was high with only small numbers of missing data (0.5-1.1%). Ceiling effects were low (0.3-5.3%) whereas floor effects were over 20% for two of the predetermined subscales (problem solving and follow-up/coordination). The total PACIC scale showed a reasonable distribution and excellent internal consistency (alpha 0.94) while the internal consistency of the subscales were at least acceptable (0.74-0.86). The principal component analysis identified a two- or three-factor solution instead of the proposed five-dimensional. In other respects, the PACIC scale showed the hypothesized relationships with quality of care and outcome measures, thus demonstrating convergent and construct validity.
Conclusion: A Finnish version of the PACIC scale is now validated in the primary care setting among patients with type 2 diabetes. The findings suggest comparable psychometric properties of the Finnish scale as of the original English instrument and earlier translations, and reasonable levels of validity and reliability.
  • Simonsen, Nina; Koponen, Anne M; Suominen, Sakari (BioMed Central, 2018)
    Abstract Background To meet the challenges of the rising prevalence of chronic diseases, such as type 2 diabetes, new approaches to healthcare delivery have been initiated; among these the influential Chronic Care Model (CCM). Valid instruments are needed to evaluate the public health impact of these frameworks in different countries. The Patient Assessment of Chronic Illness Care (PACIC) is a 20-item quality of care measure that, from the perspective of the patient, measures the extent to which care is congruent with the CCM. The aim of this study was to evaluate the psychometric properties of the Finnish translation of the PACIC questionnaire, in terms of validity and reliability, in a large register-based sample of patients with type 2 diabetes. Method The PACIC items were translated into Finnish in a standardized forward-backward procedure, followed by a cross-sectional survey among patients with type 2 diabetes (response rate 56%; n = 2866). We assessed the Finnish version of the PACIC scale for the following psychometric properties: content validity, internal consistency reliability, convergent and construct validity. We also present descriptive data on total scale as well as predetermined subscale levels. Results The item-response on the PACIC scale was high with only small numbers of missing data (0.5–1.1%). Ceiling effects were low (0.3–5.3%) whereas floor effects were over 20% for two of the predetermined subscales (problem solving and follow-up/coordination). The total PACIC scale showed a reasonable distribution and excellent internal consistency (alpha 0.94) while the internal consistency of the subscales were at least acceptable (0.74–0.86). The principal component analysis identified a two- or three-factor solution instead of the proposed five-dimensional. In other respects, the PACIC scale showed the hypothesized relationships with quality of care and outcome measures, thus demonstrating convergent and construct validity. 
Conclusion A Finnish version of the PACIC scale is now validated in the primary care setting among patients with type 2 diabetes. The findings suggest comparable psychometric properties of the Finnish scale as of the original English instrument and earlier translations, and reasonable levels of validity and reliability.
  • Laakasuo, Michael; Palomäki, Jussi; Abuhamdeh, Sami; Lappi, Otto; Cowley, Benjamin Ultan (2022)
    Flow is a well-known construct describing the experience of deep absorption in a task, typically demanding but intrinsically motivating, and conducted with high skill. Flow is operationalized by self-report, and various instruments have been developed for this, but none have been made available in the Finnish language in thoroughly validated form. We present a psychometric scale-validation study for the Finnish translation of the Flow Short Scale (FSS). We collected data from 201 Finnish speaking participants using the Prolific Academic platform. We assessed the scale’s factorial structure using Mokken scale analysis, Parallel Analysis, Very Simple Structures analysis and a standard Confirmatory Factor Analysis. We then evaluated how strongly the FSS correlated with the Flow State Scale and Flow Core Scale. Finally, we evaluated how well the FSS distinguished Flow-inducing experiences from boring (non-Flow-inducing) experiences. Taken together, our results show that an 8-item, two-factor version of the scale was a justified instrument with good psychometric properties.
  • Tiirikainen, Kati; Haravuori, Henna; Ranta, Klaus; Kaltiala-Heino, Riittakerttu; Marttunen, Mauri (2019)
    Symptoms of generalized anxiety disorder (GAD) are common among adolescents and can lead to severe psychosocial impairment, yet there is a lack of a good quality scale to measure symptoms of generalized anxiety in young people. The 7-item Generalized Anxiety Disorder Scale (GAD-7) is a self-report scale used to measure GAD symptoms and has been validated in adult populations, but the measure's psychometric properties regarding adolescents are unknown. The aim of this study was to investigate the reliability, factorial validity, and construct validity of the GAD-7 in adolescents in a nationally representative sample from a general population. Our study was based on Finnish survey data on 111,171 adolescents aged 14-18 years. Our results show that the GAD-7 demonstrates good psychometric properties in adolescents. The internal consistency of the GAD-7 was good (Cronbach's alpha = 0.91) and the instrument's unidimensional factor structure was supported. The associations of GAD-7 sum scores with self-report measures of depression and social anxiety supported construct validity. The psychometric properties of the GAD-7 in this sample of adolescents were similar to those reported among adults. However, studies in which diagnostic interviews are performed are needed to demonstrate the diagnostic efficacy of the measure in this age group.
  • Unemo, Magnus; Hansen, Marit; Hadad, Ronza; Puolakkainen, Mirja; Westh, Henrik; Rantakokko-Jalava, Kaisu; Thilesen, Carina; Cole, Michelle J; Boiko, Iryna; Lan, Pham T; Golparian, Daniel; Ito, Shin; Sundqvist, Martin (BioMed Central, 2020)
    Abstract Background Four new variants of Chlamydia trachomatis (nvCTs), detected in several countries, cause false-negative or equivocal results using the Aptima Combo 2 assay (AC2; Hologic). We evaluated the clinical sensitivity and specificity, as well as the analytical inclusivity and exclusivity of the updated AC2 for the detection of CT and Neisseria gonorrhoeae (NG) on the automated Panther system (Hologic). Methods We examined 1004 clinical AC2 samples and 225 analytical samples spiked with phenotypically and/or genetically diverse NG and CT strains, and other potentially cross-reacting microbial species. The clinical AC2 samples included CT wild type (WT)-positive (n = 488), all four described AC2 diagnostic-escape nvCTs (n = 170), NG-positive (n = 214), and CT/NG-negative (n = 202) specimens. Results All nvCT-positive samples (100%) and 486 (99.6%) of the CT WT-positive samples were positive in the updated AC2. All NG-positive, CT/NG-negative, Trichomonas vaginalis (TV)-positive, bacterial vaginosis-positive, and Candida-positive AC2 specimens gave correct results. The clinical sensitivity and specificity of the updated AC2 for CT detection was 99.7 and 100%, respectively, and for NG detection was 100% for both. Examining spiked samples, the analytical inclusivity and exclusivity were 100%, i.e., in clinically relevant concentrations of spiked microbe. Conclusions The updated AC2, including two CT targets and one NG target, showed a high sensitivity, specificity, inclusivity and exclusivity for the detection of CT WT, nvCTs, and NG. The updated AC2 on the fully automated Panther system offers a simple, rapid, high-throughput, sensitive, and specific diagnosis of CT and NG, which can easily be combined with detection of Mycoplasma genitalium and TV.
  • Unemo, Magnus; Hansen, Marit; Hadad, Ronza; Puolakkainen, Mirja; Westh, Henrik; Rantakokko-Jalava, Kaisu; Thilesen, Carina; Cole, Michelle J.; Boiko, Iryna; Lan, Pham T.; Golparian, Daniel; Ito, Shin; Sundqvist, Martin (2020)
    Background Four new variants of Chlamydia trachomatis (nvCTs), detected in several countries, cause false-negative or equivocal results using the Aptima Combo 2 assay (AC2; Hologic). We evaluated the clinical sensitivity and specificity, as well as the analytical inclusivity and exclusivity of the updated AC2 for the detection of CT and Neisseria gonorrhoeae (NG) on the automated Panther system (Hologic). Methods We examined 1004 clinical AC2 samples and 225 analytical samples spiked with phenotypically and/or genetically diverse NG and CT strains, and other potentially cross-reacting microbial species. The clinical AC2 samples included CT wild type (WT)-positive (n = 488), all four described AC2 diagnostic-escape nvCTs (n = 170), NG-positive (n = 214), and CT/NG-negative (n = 202) specimens. Results All nvCT-positive samples (100%) and 486 (99.6%) of the CT WT-positive samples were positive in the updated AC2. All NG-positive, CT/NG-negative, Trichomonas vaginalis (TV)-positive, bacterial vaginosis-positive, and Candida-positive AC2 specimens gave correct results. The clinical sensitivity and specificity of the updated AC2 for CT detection was 99.7 and 100%, respectively, and for NG detection was 100% for both. Examining spiked samples, the analytical inclusivity and exclusivity were 100%, i.e., in clinically relevant concentrations of spiked microbe. Conclusions The updated AC2, including two CT targets and one NG target, showed a high sensitivity, specificity, inclusivity and exclusivity for the detection of CT WT, nvCTs, and NG. The updated AC2 on the fully automated Panther system offers a simple, rapid, high-throughput, sensitive, and specific diagnosis of CT and NG, which can easily be combined with detection of Mycoplasma genitalium and TV.
  • Wu, Teddy Y.; Sobowale, Oluwaseun; Hurford, Robert; Sharma, Gagan; Christensen, Soren; Yassi, Nawaf; Tatlisumak, Turgut; Desmond, Patricia M.; Campbell, Bruce C. V.; Davis, Stephen M.; Parry-Jones, Adrian R.; Meretoja, Atte (2016)
    Haematoma and oedema size determines outcome after intracerebral haemorrhage (ICH), with each added 10 % volume increasing mortality by 5 %. We assessed the reliability of semi-automated computed tomography planimetry using the Analyze and Osirix software packages. We randomly selected 100 scans from 1329 ICH patients from two centres. We used Hounsfield Unit thresholds of 5-33 for oedema and 44-100 for ICH. Three raters segmented all scans using both packages and repeated 20 scans for intra-rater reliability and segmentation timing. Volumes reported by Analyze and Osirix were compared to volume estimates calculated using the best practice method, taking effective individual slice thickness, i.e. voxel depth, into account. There was excellent overall inter-rater, intra-rater and inter-software reliability, all intraclass correlation coefficients > 0.918. Analyze and Osirix produced similar haematoma (mean difference: Analyze - Osirix = 1.5 +/- 5.2 mL, 6 %, p ≤ 0.001) and oedema volumes (-0.6 +/- 12.6 mL, -3 %, p = 0.377). Compared to a best practice approach to volume calculation, the automated haematoma volume output was 2.6 mL (-11 %) too small with Analyze and 4.0 mL (-18 %) too small with Osirix, whilst the oedema volumes were 2.5 mL (-12 %) and 5.5 mL (-25 %) too small, correspondingly. In scans with variable slice thickness, the volume underestimations were larger, -29 %/-36 % for ICH and -29 %/-41 % for oedema. Mean segmentation times were 6:53 +/- 4:02 min with Analyze and 9:06 +/- 5:24 min with Osirix (p < 0.001). Our results demonstrate that the method used to determine voxel depth can influence the final volume output markedly. Results of clinical and collaborative studies need to be considered in the context of these methodological differences.
  • Ponkilainen, Ville T.; Miettinen, Mikko; Sandelin, Henrik; Lindahl, Jan; Häkkinen, Arja H.; Toom, Alar; Tillgren, Tomi; Ilves, Outi; Latvala, Antti O.; Ahonen, Katri; Sirola, Timo; Sampo, Mika; Väistö, Olli; Repo, Jussi P. (2021)
    Background: The 16-item patient-reported Manchester-Oxford Foot Questionnaire (MOXFQ) with subscales of pain, social interactions, and walking/standing has been claimed to have the strongest scientific evidence in measuring foot and ankle complaints. This study tests the validity of the Finnish MOXFQ for the orthopaedic foot and ankle population using Rasch analysis. Methods: We translated the MOXFQ into Finnish and used that translation in our study. MOXFQ scores were obtained from 183 patients. Response category distribution, item fit, coverage, targeting, item dependency, ability to measure latent trait (unidimensionality), internal consistency (Cronbach's alpha), and person separation index (PSI) were analyzed. Results: Fifteen of the items had ordered response categories and/or sufficient fit statistics. The subscales provided coverage and targeting. Some residual correlation was noted. Removing one item in the pain subscale led to a unidimensional structure. Alphas and PSIs ranged between 0.68-0.90 and 0.67-0.92, respectively. Conclusions: Despite some infractions of the Rasch model, the instrument functioned well. The subscales of the MOXFQ are meaningful for assessing patient-reported complaints and outcomes in the orthopaedic foot and ankle population. (C) 2020 European Foot and Ankle Society. Published by Elsevier Ltd. All rights reserved.
  • Myllyaho, Lalli Santeri; Raatikainen, Mikko; Männistö, Tomi; Mikkonen, Tommi; Nurminen, Jukka K (2021)
    Context: Artificial intelligence (AI) has made its way into everyday activities, particularly through new techniques such as machine learning (ML). These techniques are implementable with little domain knowledge. This, combined with the difficulty of testing AI systems with traditional methods, has made system trustworthiness a pressing issue. Objective: This paper studies the methods used to validate practical AI systems reported in the literature. Our goal is to classify and describe the methods that are used in realistic settings to ensure the dependability of AI systems. Method: A systematic literature review resulted in 90 papers. Systems presented in the papers were analysed based on their domain, task, complexity, and applied validation methods. Results: The validation methods were synthesized into a taxonomy consisting of trial, simulation, model-centred validation, and expert opinion. Failure monitors, safety channels, redundancy, voting, and input and output restrictions are methods used to continuously validate the systems after deployment. Conclusions: Our results clarify existing strategies applied to validation. They form a basis for the synthesization, assessment, and refinement of AI system validation in research and guidelines for validating individual systems in practice. While various validation strategies have all been relatively widely applied, only few studies report on continuous validation.
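
Worked example for the Aptima Combo 2 entry above (Unemo et al., 2020): the abstract reports a clinical sensitivity of 99.7% for CT detection. The short Python sketch below shows how that figure can be reproduced from the sample counts quoted in the abstract. It assumes that the wild-type and nvCT samples were pooled into a single CT-positive group, which the abstract does not state explicitly; the function name `sensitivity` is illustrative only.

```python
def sensitivity(detected: int, condition_positive: int) -> float:
    """Return the fraction of truly positive samples that the assay detected."""
    if condition_positive <= 0:
        raise ValueError("condition_positive must be a positive count")
    return detected / condition_positive


# Counts quoted in the abstract.
ct_wt_total, ct_wt_detected = 488, 486   # CT wild type: 486 of 488 positive
nvct_total, nvct_detected = 170, 170     # nvCT: all samples positive

# Assumption: overall CT sensitivity pools wild-type and nvCT samples.
ct_sensitivity = sensitivity(
    ct_wt_detected + nvct_detected,
    ct_wt_total + nvct_total,
)
wt_sensitivity = sensitivity(ct_wt_detected, ct_wt_total)

print(f"CT wild type only: {wt_sensitivity:.1%}")
print(f"CT pooled (WT + nvCT): {ct_sensitivity:.1%}")
```

Under this pooling assumption the result is 656/658, which rounds to the 99.7% stated in the abstract, and the wild-type-only figure rounds to the quoted 99.6%.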