Two bracketing schemes for the Penn Treebank




Yli-Jyrä, A 2006, Two bracketing schemes for the Penn Treebank. in A man of measure. The Linguistic Association of Finland, Turku, pp. 472-479.

Title: Two bracketing schemes for the Penn Treebank
Alternative title: Kaksi tapaa Penn Treebank -puupankin suluttamiseksi (Finnish: "Two ways of bracketing the Penn Treebank")
Author: Yli-Jyrä, Anssi
Contributor organization: Department of Modern Languages 2010-2017
Publisher: The Linguistic Association of Finland
Date: 2006
Language: eng
Number of pages: 8
Belongs to series: A man of measure
Abstract: The trees in the Penn Treebank have a standard representation that involves complete balanced bracketing. In this article, an alternative to this standard representation of the treebank is proposed. The proposed representation is lossless, but it reduces the total number of brackets by 28%. This is possible by omitting the redundant pairs of special brackets that encode initial and final embedding, using a technique proposed by Krauwer and des Tombe (1981). In terms of paired brackets, the maximum nesting depth in sentences decreases by 78%. A coverage of 99.9% is achieved with only five non-top levels of paired brackets. The observed shallowness of the reduced bracketing suggests that finite-state methods for parsing and searching could be a feasible option for treebank processing.
Subject: 612 Languages and Literature
language technology
phrase markers
113 Computer and information sciences
finite-state methods
Peer reviewed: Yes
Usage restriction: openAccess
Self-archived version: publishedVersion
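The abstract only outlines how brackets can be dropped without losing information. As a rough illustration, the Python sketch below applies a simplified, hypothetical variant of the idea that covers final embedding only, reading "final embedding" loosely as a constituent that is the last child of its parent. Such a node is opened with an invented marker `(>` and gets no closing bracket of its own, since its parent's closing bracket already marks where it ends. This is not the article's scheme (which also handles initial embedding, following Krauwer and des Tombe (1981)) and does not reproduce its figures. The `(>` marker, the function names (`parse`, `standard`, `reduced`, `parse_reduced`, `count_brackets`, `paired_depth`), and the depth measure are assumptions made for the example, and labels are assumed never to begin with `>`.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional, Tuple, Union

@dataclass
class Node:
    label: str
    children: List["Tree"] = field(default_factory=list)

Tree = Union[Node, str]  # a subtree is either a labelled node or a word

def tokenize(text: str) -> List[str]:
    # "(" and ")" are separate tokens; any other run of non-space characters is a label or word
    return re.findall(r"\(|\)|[^\s()]+", text)

def parse(text: str) -> Node:
    """Parse a fully bracketed, Penn Treebank-style tree string."""
    tokens, pos = tokenize(text), 0

    def read_node() -> Node:
        nonlocal pos
        assert tokens[pos] == "("
        pos += 1
        label = ""
        if tokens[pos] not in ("(", ")"):  # the PTB root may be unlabelled: "( (S ...) )"
            label, pos = tokens[pos], pos + 1
        node = Node(label)
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read_node())
            else:
                node.children.append(tokens[pos])
                pos += 1
        pos += 1  # consume the node's own ")"
        return node

    return read_node()

def standard(t: Tree) -> str:
    """Complete balanced bracketing: every node gets its own "(" and ")"."""
    if isinstance(t, str):
        return t
    return " ".join([f"({t.label}"] + [standard(c) for c in t.children]) + ")"

def reduced(t: Tree, is_final: bool = False) -> str:
    """Hypothetical reduction for final embedding only: a node that is its
    parent's last child opens with "(>" and gets no ")" of its own."""
    if isinstance(t, str):
        return t
    opener = f"(>{t.label}" if is_final else f"({t.label}"
    last = len(t.children) - 1
    body = " ".join([opener] + [reduced(c, i == last) for i, c in enumerate(t.children)])
    return body if is_final else body + ")"

def parse_reduced(text: str) -> Optional[Node]:
    """Invert reduced(): one ")" closes the innermost open node and then
    every enclosing node that was opened with "(>"."""
    tokens, i = tokenize(text), 0
    stack: List[Tuple[Node, bool]] = []  # (node, was opened with "(>")
    root: Optional[Node] = None
    while i < len(tokens):
        tok = tokens[i]
        i += 1
        if tok == "(":
            label, final = "", False
            if tokens[i] not in ("(", ")"):
                label, i = tokens[i], i + 1
                if label.startswith(">"):  # assumes no real label starts with ">"
                    label, final = label[1:], True
            node = Node(label)
            if stack:
                stack[-1][0].children.append(node)
            else:
                root = node
            stack.append((node, final))
        elif tok == ")":
            while stack.pop()[1]:  # keep closing while the closed node was final
                pass
        else:
            stack[-1][0].children.append(tok)
    return root

def count_brackets(text: str) -> int:
    return sum(tok in ("(", ")") for tok in tokenize(text))

def paired_depth(t: Tree, reduce_final: bool, is_final: bool = False, depth: int = 0) -> int:
    """Deepest nesting of bracket pairs; with reduce_final=True a node opened
    with "(>" has no closing bracket and so adds no paired level."""
    if isinstance(t, str):
        return depth
    d = depth if (reduce_final and is_final) else depth + 1
    last = len(t.children) - 1
    return max([d] + [paired_depth(c, reduce_final, i == last, d)
                      for i, c in enumerate(t.children)])
```

On the toy sentence `(S (NP (DT the) (NN dog)) (VP (VBZ barks) (ADVP (RB loudly))))`, `reduced` yields `(S (NP (DT the) (>NN dog) (>VP (VBZ barks) (>ADVP (>RB loudly)`, which lowers the bracket count from 16 to 12 and the paired depth from 4 to 3, and `parse_reduced` recovers the original tree. These toy numbers are unrelated to the Treebank-wide figures in the abstract. Putting the marker on the opening bracket is what keeps the sketch lossless: the decoder knows, as soon as a node opens, that it will close together with its parent. Covering initial embedding as well would need a second marker and a matching rule on the opening side, which this sketch does not attempt.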

Files in this item


2.6.9._YLI_JYRA.pdf (PDF, 181.6 KB)
