Transition-Based Coding and Formal Language Theory for Ordered Digraphs

Näytä kaikki kuvailutiedot



Pysyväisosoite

http://hdl.handle.net/10138/306880

Lähdeviite

Yli-Jyrä , A 2019 , Transition-Based Coding and Formal Language Theory for Ordered Digraphs . in H Vogler & A Maletti (eds) , The 14th International Conference on Finite-State Methods and Natural Language Processing : Proceedings of the Conference . Proceedings of the International Conference on Finite-State Methods and Natural Language Processing , The Association for Computational Linguistics , Stroudsburg , pp. 118–131 , International Conference on Finite State Methods and Natural Language Processing , Dresden , Germany , 23/09/2019 . https://doi.org/10.18653/v1/w19-3115

Julkaisun nimi: Transition-Based Coding and Formal Language Theory for Ordered Digraphs
Toissijainen nimi: Järjestettyjen verkkojen siirtymäpohjainen koodaus ja formaalien kielten teoria
Tekijä: Yli-Jyrä, Anssi
Muu tekijä: Vogler, Heiko
Maletti, Andreas
Tekijän organisaatio: Language Technology
Department of Digital Humanities
Julkaisija: The Association for Computational Linguistics
Päiväys: 2019-09-23
Kieli: eng
Sivumäärä: 14
Kuuluu julkaisusarjaan: The 14th International Conference on Finite-State Methods and Natural Language Processing
Kuuluu julkaisusarjaan: Proceedings of the International Conference on Finite-State Methods and Natural Language Processing
ISBN: 978-1-950737-96-3
DOI-tunniste: https://doi.org/10.18653/v1/w19-3115
URI: http://hdl.handle.net/10138/306880
Tiivistelmä: Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterize properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.Transition-based parsing of natural language uses transition systems to build directed annotation graphs (digraphs) for sentences. In this paper, we define, for an arbitrary ordered digraph, a unique decomposition and a corresponding linear encoding that are associated bijectively with each other via a new transition system. These results give us an efficient and succinct representation for digraphs and sets of digraphs. Based on the system and our analysis of its syntactic properties, we give structural bounds under which the set of encoded digraphs is restricted and becomes a context-free or a regular string language. The context-free restriction is essentially a superset of the encodings used previously to characterise properties of noncrossing digraphs and to solve maximal subgraphs problems. The regular restriction with a tight bound is shown to capture the Universal Dependencies v2.4 treebanks in linguistics.
Kuvaus: The ISBN of the host publication can be found on the web site of the conference (https://wwwtcs.inf.tu-dresden.de/fsmnlp2019/accepted_papers/).
Avainsanat: 113 Computer and information sciences
graph representation
encoding
siirtymäjärjestelmöt
6121 Languages
dependency syntax
Vertaisarvioitu: Kyllä
Tekijänoikeustiedot: cc_by
Pääsyrajoitteet: openAccess
Rinnakkaistallennettu versio: publishedVersion


Tiedostot

Latausmäärä yhteensä: Ladataan...

Tiedosto(t) Koko Formaatti Näytä
W19_3115.pdf 343.8KB PDF Avaa tiedosto

Viite kuuluu kokoelmiin:

Näytä kaikki kuvailutiedot