Statistics

 

Recent Submissions

  • Jach, Agnieszka (2018)
    Preparation of Moodle quizzes which are data-based and contemporary tends to be tedious and time-consuming. By using innovative tools, this process can be simplified and automated, providing a substantial benefit to the teacher wishing to employ such quizzes, and ultimately improving the student learning experience. The purpose of this article is to show how to create data-driven, up-to-date quizzes for Moodle in an easy fashion. The methodology is based on several popular, open-source, free tools, and its implementation details are demonstrated with an example. This makes the methodology readily available to practitioners.
  • Fellman, Johan (2018-06-21)
    Scientists have analysed different methods for numerical estimation of Gini coefficients. Using Lorenz curves, various numerical integration attempts have been made to identify accurate estimates. Central alternative methods have been the trapezium, Simpson and Lagrange rules. They are all special cases of the Newton-Cotes methods. In this study, we approximate the Lorenz curve by polynomial regression models and integrate optimal regression models for numerical estimation of the Gini coefficient. The attempts are checked on theoretical Lorenz curves and on empirical Lorenz curves with known Gini indices. In all cases the proposed methods seem to be a good alternative to earlier methods presented in the literature.
  • Hartikainen, Saara M.; Jach, Agnieszka; Grané, Aurea; Robson, Thomas Matthew (2018-09-12)
    Forest canopies create dynamic light environments in their understorey, where spectral composition changes among patterns of shade and sunflecks, and through the seasons with canopy phenology and sun angle. Plants use spectral composition as a cue to adjust their growth strategy for optimal resource use. Quantifying the ever‐changing nature of the understorey light environment is technically challenging with respect to data collection. Thus, to capture the simultaneous variation occurring in multiple regions of the solar spectrum, we recorded spectral irradiance from forest understoreys over the wavelength range 300–800 nm using an array spectroradiometer. It is also methodologically challenging to analyze solar spectra because of their multi‐scale nature and multivariate lay‐out. To compare spectra, we therefore used a novel method termed thick pen transform (TPT), which is simple and visually interpretable. This enabled us to show that sunlight position in the forest understorey (i.e., shade, semi‐shade, or sunfleck) was the most important factor in determining shape similarity of spectral irradiance. Likewise, the contributions of stand identity and time of year could be distinguished. Spectra from sunflecks were consistently the most similar, irrespective of differences in global irradiance. On average, the degree of cross‐dependence increased with increasing scale, sometimes shifting from negative (dissimilar) to positive (similar) values. We conclude that the interplay of sunlight position, stand identity, and date cannot be ignored when quantifying and comparing spectral composition in forest understoreys. Technological advances mean that array spectroradiometers, which can record spectra contiguously over very short time intervals, are being widely adopted, not only to measure irradiance under pollution, clouds, atmospheric changes, and in biological systems, but also spectral changes at small scales in the photonics industry. We consider that TPT is an applicable method for spectral analysis in any field and can be a useful tool to analyze large datasets in general.
  • Ahlgren, Niklas; Catani, Paul (2017)
    Tests for error autocorrelation (AC) are derived under the assumption of independent and identically distributed errors. The tests are not asymptotically valid if the errors are conditionally heteroskedastic. In this article we propose wild bootstrap (WB) Lagrange multiplier tests for error AC in vector autoregressive (VAR) models. We show that the WB tests are asymptotically valid under conditional heteroskedasticity of unknown form. WB tests based on a version of the heteroskedasticity-consistent covariance matrix estimator are found to have the smallest error in rejection probability under the null and high power under the alternative. We apply the tests to VAR models for credit default swap (CDS) prices and Euribor interest rates. An important result that we find is that the WB tests lead to parsimonious models while the asymptotic tests suggest that a long lag length is required to get white noise residuals.
  • Catani, Paul; Teräsvirta, Timo; Yin, Meiqun (2017)
    A Lagrange multiplier test for testing the parametric structure of a constant conditional correlation-generalized autoregressive conditional heteroskedasticity (CCC-GARCH) model is proposed. The test is based on decomposing the CCC-GARCH model multiplicatively into two components, one of which represents the null model, whereas the other one describes the misspecification. A simulation study shows that the test has good finite sample properties. We compare the test with other tests for misspecification of multivariate GARCH models. The test has high power against alternatives where the misspecification is in the GARCH parameters and is superior to other tests. The test is not greatly affected by misspecification in the conditional correlations and is therefore well suited for considering misspecification of GARCH equations.
  • Fellman, Johan (2018-05-24)
    The sex ratio (SR) is usually defined as the number of males per 100 females within an area or, as in this study, the proportion of males among all births (PM). Among newborns there is typically a slight excess of boys over girls. Consequently, the SR exceeds 100, at around 106, and the probability that a newborn is male is around 0.515. Attempts have been made to identify the factors that influence the level of the PM. Previous research found that where prenatal losses are low, as in Western countries, the SR is high, around 105 to 106, whereas in areas where prenatal losses are relatively frequent the SR is low, around 102. Later studies have focused on temporal, regional and seasonal fluctuations of the SR. In general, factors that affect the SR within families remain poorly understood, and attempts to identify such factors in national birth registers have also been unsuccessful. Recently, SR studies have mainly concentrated on the identification of general but occasional factors. In this study, we tried to identify the effects of maternal age and type of delivery (live- and stillborn, singletons and multiples) on the sex ratio at birth. The results showed no significant difference between live- and stillborn, and maternal age had no significant effect on the sex ratio. The SR is higher among singletons than among multiples, but no significant difference in SR was found between twins and triplets. Among singletons the temporal differences are non-significant, but for twins and triplets significant temporal differences were obtained.
  • Fellman, Johan (2018-02-14)
    Income distributions are commonly unimodal and skew with a heavy right tail. Different skew models, such as the lognormal and the Pareto, have been proposed as suitable descriptions of income distribution and applied in specific empirical situations. More wide-ranging tools have been introduced as measures for general comparisons. In this study, we review the income analysis methods and apply them to specific Lorenz models.
  • Fellman, Johan (2017-12-20)
    In the 19th century, a series of international statistical congresses began that were important for population studies, including twin research. The introduction of common rules for the national demographic registers enabled scientists to contribute to the genesis of statistical research. The congress in St. Petersburg in 1872, in particular, focused on the movements of the population, and how they should be registered. Among the facts to be recorded for multiple births were the sex and number of children born alive or stillborn, whether legitimate or illegitimate, and the age of the mother at the date of the births. Throughout the history of twin research, Hellin's law (1895) has played a central role because it is an approximately correct association between the rates of multiple maternities. It has been mathematically proven that Hellin's law does not hold as a general rule. Analyses show divergences from the law that are difficult to explain and/or eliminate. Various improvements of this law have been proposed. The majority of all studies of Hellin's law are based on empirical rates of multiple maternities, ignoring random errors. Such studies can never confirm the law, but only identify errors with respect to Hellin's law that are too large to be characterised as random. It is of particular interest to note and explain why the rates of higher multiple maternities are sometimes too high or too low when Hellin's law is used as a benchmark. Studies have shown that there were investigators before Hellin who contributed substantially to Hellin's law. In this paper, we re-examine some old data sets and contributions in which Hellin's law has been evaluated and also analyse recent data.
  • Rosenqvist, Gunnar (University of Vaasa, 2014)
  • Björk, Bo-Christer; Catani, Paul (2016-01-20)
    A megajournal is an open access journal which publishes any manuscript that presents scientifically trustworthy empirical results, without asking about the potential scientific contribution prior to publication. Megajournals have rapidly increased their output and are currently publishing around 50,000 articles per year. We report on a small pilot study in which we looked at the citation distributions for articles in megajournals compared to journals with traditional peer review, which also evaluate the proposed "contribution". We found that elite journals with very low acceptance rates have far fewer articles with no or few citations, but that the long tail of articles with two or fewer citations was actually bigger in a sample of more selective traditional journals than in megajournals. This indicates the need for more systematic studies, since the results raise a lot of questions as to how efficiently the current peer review system in reality fulfills its filtering function.
  • Hjertstrand, Per; Rosenqvist, Gunnar (ISI - International Statistical Institute, 2015)
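
The Gini estimation idea in the Fellman (2018-06-21) abstract above can be illustrated with a minimal sketch. It is not the author's implementation: the function names, the test curve L(p) = p^2, the grid of 11 points and the polynomial degree are all assumptions chosen for illustration, and the fit is an unconstrained least-squares polynomial rather than the "optimal regression models" the abstract refers to. Both estimates use the standard relation G = 1 - 2 * (area under the Lorenz curve).

```python
import numpy as np
from numpy.polynomial import polynomial as P


def gini_trapezium(p, lorenz):
    """Gini estimate from Lorenz points using the trapezium rule."""
    p = np.asarray(p, dtype=float)
    lorenz = np.asarray(lorenz, dtype=float)
    # Area under the piecewise-linear curve through the points
    area = np.sum(np.diff(p) * (lorenz[:-1] + lorenz[1:]) / 2.0)
    return 1.0 - 2.0 * area


def gini_polynomial(p, lorenz, degree=3):
    """Gini estimate from a least-squares polynomial fit of the Lorenz curve.

    Assumption: a plain unconstrained fit, integrated exactly over [0, 1].
    """
    coefs = P.polyfit(p, lorenz, degree)  # coefficients, lowest degree first
    antideriv = P.polyint(coefs)          # antiderivative, constant term 0
    area = P.polyval(1.0, antideriv) - P.polyval(0.0, antideriv)
    return 1.0 - 2.0 * area


# Hypothetical theoretical Lorenz curve L(p) = p**2, whose exact Gini is 1/3
p = np.linspace(0.0, 1.0, 11)
lorenz = p ** 2

g_trap = gini_trapezium(p, lorenz)
g_poly = gini_polynomial(p, lorenz, degree=3)
print(g_trap, g_poly)
```

Because a Lorenz curve is convex, the trapezium rule overestimates the area under it and therefore underestimates the Gini coefficient on a coarse grid. A smooth fitted curve can reduce that bias, which is the motivation the abstract gives for regression-based alternatives.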