Browsing by Subject "Multimodal"

Now showing items 1-5 of 5
  • Sulubacak, Umut; Caglayan, Ozan; Grönroos, Stig-Arne; Rouhe, Aku; Elliott, Desmond; Specia, Lucia; Tiedemann, Jörg (2020)
    Multimodal machine translation involves drawing information from more than one modality, based on the assumption that the additional modalities will contain useful alternative views of the input data. The most prominent tasks in this area are spoken language translation, image-guided translation, and video-guided translation, which exploit audio and visual modalities, respectively. These tasks are distinguished from their monolingual counterparts of speech recognition, image captioning, and video captioning by the requirement of models to generate outputs in a different language. This survey reviews the major data resources for these tasks, the evaluation campaigns concentrated around them, the state of the art in end-to-end and pipeline approaches, and also the challenges in performance evaluation. The paper concludes with a discussion of directions for future research in these areas: the need for more expansive and challenging datasets, for targeted evaluations of model performance, and for multimodality in both the input and output space.
  • Ojarinta, Rami; Saarinen, Jukka; Strachan, Clare J.; Korhonen, Ossi; Laitinen, Riikka (2018)
    Co-amorphous mixtures have rarely been formulated as oral dosage forms, even though they have been shown to stabilize amorphous drugs in the solid state and enhance the dissolution properties of poorly soluble drugs. In the present study we formulated tablets consisting of either spray-dried co-amorphous ibuprofen-arginine or indomethacin-arginine, mannitol or xylitol and polyvinylpyrrolidone K30 (PVP). Experimental design was used for the selection of tablet compositions, and the effect of tablet composition on tablet characteristics was modelled. Multimodal non-linear imaging, including coherent anti-Stokes Raman scattering (CARS) and sum frequency/second harmonic generation (SFG/SHG) microscopies, as well as scanning electron microscopy, X-ray diffractometry and Fourier-transform infrared spectroscopy were utilized to characterize the tablets. The tablets possessed sufficient strength, but modelling produced no clear evidence about the compaction characteristics of co-amorphous salts. However, co-amorphous drug-arginine mixtures resulted in enhanced dissolution behaviour, and the PVP in the tableting mixture stabilized the supersaturation. The co-amorphous mixtures were physically stable during compaction, but the excipient selection affected the long-term stability of the ibuprofen-arginine mixture. CARS and SFG/SHG proved to be feasible techniques for imaging the component distribution on the tablet surfaces, but, possibly due to the limited imaging area, they did not detect the recrystallization that was detected with X-ray diffraction.
  • Kajamaa, Anu; Kumpulainen, Kristiina (2020)
    In this study, we aim to widen the understanding of how students' collaborative knowledge practices are mediated multimodally in a school's makerspace learning environment. Taking a sociocultural stance, we analyzed students' knowledge practices while carrying out STEAM learning challenges in small groups in the FUSE Studio, an elementary school's makerspace. Our findings show how discourse, digital and other "hands on" materials, embodied actions, such as gestures and postures, and the physical space with its arrangements mediated the students' knowledge practices. Our analysis of these mediational means led us to identify four types of multimodal knowledge practice, namely orienting, interpreting, concretizing, and expanding knowledge, which guided and facilitated the students' creation of shared epistemic objects, artifacts, and their collective learning. However, due to the multimodal nature of knowledge practices, carrying out learning challenges in a makerspace can be challenging for students. To enhance the educational potential of makerspaces in supporting students' knowledge creation and learning, further attention needs to be directed to the development of new pedagogical solutions, to better facilitate multimodal knowledge practices and their collective management.
  • von Zansen, Anna; Hilden, Raili; Laihanen, Emma (2022)
    In this study, we used Rasch measurement to investigate the fairness of the listening section of a national computerized high-stakes English test for differential item functioning (DIF) across gender subgroups. The computerized test format inspired us to investigate whether the items measure listening comprehension differently for females and males. Exploring the functioning of novel task types including multimodal materials such as videos and pictures was especially interesting. Firstly, the unidimensionality and local independence of the data were examined as preconditions for DIF analysis. Secondly, we explored the performance of female and male students through DIF analysis using Rasch measurement. The uniform DIF analysis showed that 25 items (out of 30 items) displayed DIF and favored different gender subgroups, although the effect size was not meaningful. The non-uniform DIF analysis revealed several items exhibiting DIF with a moderate to large effect size, favoring various gender and ability groups. Explanations for DIF are hypothesized. Finally, implications of the study regarding test development and fairness are discussed.
  • Vázquez, Raúl; Aulamo, Mikko; Sulubacak, Umut; Tiedemann, Jörg (The Association for Computational Linguistics, 2020)
    This paper describes the University of Helsinki Language Technology group’s participation in the IWSLT 2020 offline speech translation task, addressing the translation of English audio into German text. In line with this year’s task objective, we train both cascade and end-to-end systems for spoken language translation. We opt for an end-to-end multitasking architecture with shared internal representations and a cascade approach that follows a standard procedure consisting of ASR, correction, and MT stages. We also describe the experiments that served as a basis for the submitted systems. Our experiments reveal that multitask training with shared internal representations is not only possible but also allows for knowledge transfer across modalities.