Bounded-Depth High-Coverage Search Space for Noncrossing Parses

Permalink

http://hdl.handle.net/10138/232473

Citation

Yli-Jyrä, A M 2017, Bounded-Depth High-Coverage Search Space for Noncrossing Parses. in F Drewes (ed.), Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing: FSMNLP 2017. The Association for Computational Linguistics, Stroudsburg, pp. 30-40, International Conference on Finite State Methods and Natural Language Processing (FSMNLP), Umeå, Sweden, 05/09/2017. https://doi.org/10.18653/v1/w17-4004

Title: Bounded-Depth High-Coverage Search Space for Noncrossing Parses
Author: Yli-Jyrä, Anssi Mikael
Editor: Drewes, Frank
Contributor: University of Helsinki, Department of Modern Languages 2010-2017
Publisher: The Association for Computational Linguistics
Date: 2017-09-04
Language: eng
Number of pages: 11
Belongs to series: Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing FSMNLP 2017
ISBN: 978-1-5108-4746-0
URI: http://hdl.handle.net/10138/232473
Abstract: A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognizes an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes – the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically out in common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry.
Subject: 6121 Languages
dependency graphs
universal dependencies
embedding
finite-state methods
syntax
sentence length
dependency syntax
bracketing
finite-state automata
self-embedding
context-free grammar
regular expressions
treebanks
corpus linguistics
projectivity
syntactic complexity
limits on embedding
113 Computer and information sciences
transducers
encoding
context-free grammars
finite-state automata
state complexity
finite automata
context-free grammar
graphs
digraphs
superbrackets
finite-state approximation
truncated stack
111 Mathematics
state complexity
path-width
narrowness
regularity
112 Statistics and probability
sentence length
sentence types
syntactic complexity

Files in this item

Files Size Format
W17_4004.pdf 355.9Kb PDF