Yli-Jyrä, A M 2017, Bounded-Depth High-Coverage Search Space for Noncrossing Parses. in F Drewes (ed.), Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing: FSMNLP 2017. The Association for Computational Linguistics, Stroudsburg, pp. 30-40, International Conference on Finite State Methods and Natural Language Processing (FSMNLP), Umeå, Sweden, 05/09/2017. https://doi.org/10.18653/v1/w17-4004
Title: | Bounded-Depth High-Coverage Search Space for Noncrossing Parses |
Author: | Yli-Jyrä, Anssi Mikael |
Other contributor: | Drewes, Frank |
Contributor organization: | Department of Modern Languages 2010-2017; Anssi Mikael Yli-Jyrä / Principal Investigator; Language Technology |
Publisher: | The Association for Computational Linguistics |
Date: | 2017-09-04 |
Language: | eng |
Number of pages: | 11 |
Belongs to series: | Proceedings of the 13th International Conference on Finite State Methods and Natural Language Processing |
ISBN: | 978-1-5108-4746-0 |
DOI: | https://doi.org/10.18653/v1/w17-4004 |
URI: | http://hdl.handle.net/10138/232473 |
Abstract: | A recently proposed encoding for noncrossing digraphs can be used to implement generic inference over families of these digraphs and to carry out first-order factored dependency parsing. It is now shown that the recent proposal can be substantially streamlined without information loss. The improved encoding is less dependent on hierarchical processing and it gives rise to a high-coverage bounded-depth approximation of the space of noncrossing digraphs. This subset is presented elegantly by a finite-state machine that recognises an infinite set of encoded graphs. The set includes more than 99.99% of the 0.6 million noncrossing graphs obtained from the UDv2 treebanks through planarisation. Rather than taking the low probability of the residual as a flat rate, it can be modelled with a joint probability distribution that is factorised into two underlying stochastic processes – the sentence length distribution and the related conditional distribution for deep nesting. This model points out that deep nesting in the streamlined code requires extreme sentence lengths. High depth is categorically out in common sentence lengths but emerges slowly at infrequent lengths that prompt further inquiry. |
Description: | Proceeding volume: 13 |
Subject: |
6121 Languages
dependency graphs universal dependencies embedding finite-state methods syntax sentence length dependency syntax syntax bracketing finite-state automata self-embedding context-free grammar sentence length regular expressions universal dependencies treebanks corpus linguistics projectivity syntactic complexity limits on embedding
113 Computer and information sciences
transducers encoding context-free grammars finite-state automata state complexity finite automata context-free grammar graphs digraphs superbrackets finite-state approximation truncated stack
111 Mathematics
state complexity path-width narrowness regularity
112 Statistics and probability
sentence length sentence types sentence length syntactic complexity |
Peer reviewed: | Yes |
Rights: | cc_by |
Usage restriction: | openAccess |
Self-archived version: | publishedVersion |
Funder: | Suomen tietokirjailijat |
Grant number: |
Files | Size |
---|---|
W17_4004.pdf | 355.9Kb |