Two bracketing schemes for the Penn Treebank


dc.contributor University of Helsinki, Department of Modern Languages 2010-2017 en
dc.contributor.author Yli-Jyrä, Anssi
dc.date.accessioned 2011-02-22T12:46:17Z
dc.date.available 2011-02-22T12:46:17Z
dc.date.issued 2006
dc.identifier.citation Yli-Jyrä, A. 2006, "Two bracketing schemes for the Penn Treebank", in A man of measure. The Linguistic Association of Finland, Turku, pp. 472-479. <http://www.ling.helsinki.fi/sky/julkaisut/sky2006special.shtml> en
dc.identifier.other PURE: 661968
dc.identifier.other PURE UUID: 989956b9-dea4-4db9-ae9c-ffe72fae3754
dc.identifier.other dawa_publication: 159463
dc.identifier.other ORCID: /0000-0003-0731-2114/work/29531062
dc.identifier.uri http://hdl.handle.net/10138/24928
dc.description.abstract The trees in the Penn Treebank have a standard representation that uses complete, balanced bracketing. This article proposes an alternative to this standard representation of the treebank. The proposed representation is lossless, yet it reduces the total number of brackets by 28%. The reduction comes from omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of paired brackets, the maximum nesting depth in sentences decreases by 78%, and 99.9% coverage is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state methods for parsing and searching could be a feasible option for treebank processing. en
dc.format.extent 8
dc.language.iso eng
dc.publisher The Linguistic Association of Finland
dc.relation.ispartof A man of measure
dc.relation.uri http://www.ling.helsinki.fi/sky/julkaisut/sky2006special.shtml
dc.relation.uri http://www.ling.helsinki.fi/sky/julkaisut/SKY2006_1/2.6.9.%20YLI-JYRA.pdf
dc.rights en
dc.subject 612 Languages and Literature en
dc.subject kieliteknologia en
dc.subject language technology en
dc.subject treebanks en
dc.subject syntax en
dc.subject phrase markers en
dc.subject 113 Computer and information sciences en
dc.subject finite-state methods en
dc.title Two bracketing schemes for the Penn Treebank en
dc.title.alternative Kaksi tapaa Penn Treebank -puupankin suluttamiseksi (Finnish: "Two ways of bracketing the Penn Treebank") en
dc.type Chapter
dc.type.uri info:eu-repo/semantics/other
dc.contributor.pbl
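The abstract reports its results as bracket counts and maximum nesting depths of bracketed trees. As a hedged illustration of how such baseline statistics can be measured on the standard, fully balanced bracketing, here is a minimal Python sketch. The function names `bracket_stats` and `percent_reduction` and the sample tree are hypothetical. The sketch assumes Penn Treebank conventions in which literal parentheses inside tokens are escaped (as `-LRB-`/`-RRB-`), so every `(` and `)` is structural. It does not implement the reduced scheme or the Krauwer and des Tombe technique.

```python
def bracket_stats(tree: str) -> tuple[int, int]:
    """Return (paired brackets, maximum nesting depth) for a fully
    bracketed, Penn Treebank-style tree string.

    Hypothetical helper: assumes tokens never contain literal
    parentheses (PTB escapes them as -LRB- / -RRB-).
    """
    pairs = 0       # each "(" opens exactly one bracket pair
    depth = 0       # current nesting depth while scanning
    max_depth = 0   # deepest level reached so far
    for ch in tree:
        if ch == "(":
            pairs += 1
            depth += 1
            max_depth = max(max_depth, depth)
        elif ch == ")":
            depth -= 1
            if depth < 0:
                raise ValueError("unbalanced bracketing: unexpected ')'")
    if depth != 0:
        raise ValueError("unbalanced bracketing: unclosed '('")
    return pairs, max_depth


def percent_reduction(before: int, after: int) -> float:
    """Relative reduction in percent, e.g. of bracket counts or depths."""
    return 100.0 * (before - after) / before


if __name__ == "__main__":
    tree = "(S (NP (DT The) (NN cat)) (VP (VBD sat)))"
    print(bracket_stats(tree))  # (6, 3)
```

Comparing `bracket_stats` on a standard tree with the same tree in a reduced notation, and passing the two counts to `percent_reduction`, gives the kind of relative figure the abstract reports; the numbers in the abstract come from the article, not from this sketch.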

Files in this item


File Size Format
2.6.9._YLI_JYRA.pdf 181.6 KB PDF
