Out of vocabulary guesser : Swahili

dc.contributor.author Hurskainen, Arvi
dc.date.accessioned 2020-07-08T08:39:01Z
dc.date.available 2020-07-08T08:39:01Z
dc.date.issued 2020
dc.identifier.citation Hurskainen, A 2020, 'Out of vocabulary guesser : Swahili', Technical reports on language technology, no. 53, University of Helsinki, Institute for Asian and African Studies, Helsinki. <http://www.njas.helsinki.fi/salama/out-of-vocabulary-guesser-swahili.pdf>
dc.identifier.other PURE: 140269753
dc.identifier.other PURE UUID: cdb9b10e-afb2-405b-a7b8-e439028736a6
dc.identifier.other ORCID: /0000-0002-6076-7460/work/77087412
dc.identifier.uri http://hdl.handle.net/10138/317509
dc.description.abstract Free text also contains words that are not listed in the analysis system. These words still need to be treated as part of the vocabulary, so that unknown elements in the text do not unnecessarily disturb the translation process. They cannot be treated fully like known lexical items, but if we know some basic properties of the words, we can determine the structure of the sentence more precisely. Traditionally, heuristic guessing of such unknown words was based only on the morphological form of the word. This report suggests treating unknown words in two phases. In the first phase, the word-level guesser gives the word a tentative assignment. In the second phase, the assignment is tested in context. The first phase may yield two or more candidate assignments, and the second phase tests which one is correct in the context. fi
dc.format.extent 13
dc.language.iso eng
dc.publisher University of Helsinki, Institute for Asian and African Studies
dc.relation.ispartofseries Technical reports on language technology
dc.rights cc_by_nc
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 6121 Languages
dc.title Out of vocabulary guesser : Swahili en
dc.type Working paper
dc.contributor.organization Department of Languages
dc.relation.issn 2670-1391
dc.rights.accesslevel openAccess
dc.type.version publishedVersion
dc.identifier.url http://www.njas.helsinki.fi/salama/out-of-vocabulary-guesser-swahili.pdf

Files in this item

Files Size Format View
out_of_vocabulary_guesser_swahili.pdf 440.8Kb PDF View/Open
