Yli-Jyrä, A 2006, Two bracketing schemes for the Penn Treebank. in A man of measure. The Linguistic Association of Finland, Turku, pp. 472-479. <http://www.ling.helsinki.fi/sky/julkaisut/sky2006special.shtml>
Title: | Two bracketing schemes for the Penn Treebank |
Alternative title: | Kaksi tapaa Penn Treebank -puupankin suluttamiseksi |
Author: | Yli-Jyrä, Anssi |
Contributor organization: | Department of Modern Languages 2010-2017 |
Publisher: | The Linguistic Association of Finland |
Date: | 2006 |
Language: | eng |
Number of pages: | 8 |
Belongs to series: | A man of measure |
URI: | http://hdl.handle.net/10138/24928 |
Abstract: | The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation for the trees is lossless, but it reduces the total number of brackets by 28%. This is possible by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of the paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state based methods for parsing and searching could be a feasible option for treebank processing. |
Subject: |
612 Languages and Literature
language technology; treebanks; syntax; phrase markers; 113 Computer and information sciences; finite-state methods |
Peer reviewed: | Yes |
Usage restriction: | openAccess |
Self-archived version: | publishedVersion |
Files | Size | Format | View |
---|---|---|---|
2.6.9._YLI_JYRA.pdf | 181.6Kb |