Testing the Generalization Power of Neural Network Models Across NLI Benchmarks

Show simple item record

dc.contributor.author Talman, Aarne Johannes
dc.contributor.author Chatzikyriakidis, Stergios
dc.contributor.editor Linzen, Tal
dc.contributor.editor Chrupała, Grzegorz
dc.contributor.editor Belinkov, Yonatan
dc.contributor.editor Hupkes, Dieuwke
dc.date.accessioned 2019-08-12T07:42:01Z
dc.date.available 2019-08-12T07:42:01Z
dc.date.issued 2019-08-01
dc.identifier.citation Talman, A. J. & Chatzikyriakidis, S. 2019, Testing the Generalization Power of Neural Network Models Across NLI Benchmarks. In T. Linzen, G. Chrupała, Y. Belinkov & D. Hupkes (eds), The Workshop BlackboxNLP on Analyzing and Interpreting Neural Networks for NLP at ACL 2019: Proceedings of the Second Workshop. The Association for Computational Linguistics, Stroudsburg, pp. 85-94, 2019 ACL Workshop BlackboxNLP, Florence, Italy, 01/08/2019.
dc.identifier.citation workshop
dc.identifier.other PURE: 126070329
dc.identifier.other PURE UUID: bc7912f9-a49a-4d1e-8a83-ab3b37164efa
dc.identifier.other ORCID: /0000-0002-3573-5993/work/60613545
dc.identifier.other WOS: 000538563900011
dc.identifier.uri http://hdl.handle.net/10138/304485
dc.description.abstract Neural network models have been very successful in natural language inference, with the best models reaching 90% accuracy on some benchmarks. However, the success of these models turns out to be largely benchmark-specific. We show that models trained on a natural language inference dataset drawn from one benchmark fail to perform well on others, even if the notion of inference assumed in these benchmarks is the same or similar. We train six high-performing neural network models on different datasets and show that each of them has problems generalizing when we replace the original test set with a test set taken from another corpus designed for the same task. In light of these results, we argue that most of the current neural network models are not able to generalize well in the task of natural language inference. We find that using large pre-trained language models helps with transfer learning when the datasets are similar enough. Our results also highlight that the current NLI datasets do not cover the different nuances of inference extensively enough. en
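The evaluation protocol described in the abstract can be sketched in miniature: train a classifier on one benchmark's training split, then score it on the original test set and on a test set drawn from a different corpus. The sketch below is a hypothetical illustration only; it uses a trivial majority-class baseline and toy label distributions as stand-ins for the actual models and NLI corpora used in the paper.

```python
# Hedged sketch of cross-benchmark NLI evaluation: a model "trained" on
# benchmark A is scored on A's own test set and on benchmark B's test set.
# The majority-class baseline and the toy data are illustrative assumptions,
# not the paper's models or datasets.
from collections import Counter

def train_majority_baseline(examples):
    """'Train' a classifier that always predicts the most frequent training
    label (a stand-in for a real NLI model)."""
    most_common = Counter(label for _, label in examples).most_common(1)[0][0]
    return lambda premise_hypothesis: most_common

def accuracy(model, test_set):
    """Fraction of (input, label) pairs the model classifies correctly."""
    return sum(model(x) == y for x, y in test_set) / len(test_set)

# Toy stand-ins for two benchmarks with different label distributions.
benchmark_a_train = [(("p", "h"), "entailment")] * 6 + [(("p", "h"), "neutral")] * 4
benchmark_a_test  = [(("p", "h"), "entailment")] * 7 + [(("p", "h"), "neutral")] * 3
benchmark_b_test  = [(("p", "h"), "neutral")] * 8 + [(("p", "h"), "entailment")] * 2

model = train_majority_baseline(benchmark_a_train)
in_domain    = accuracy(model, benchmark_a_test)  # 0.7 on the source benchmark
cross_domain = accuracy(model, benchmark_b_test)  # 0.2 on the other benchmark
```

The gap between `in_domain` and `cross_domain` is the quantity the paper measures for each model/dataset pair; in this toy case the drop comes purely from the shifted label distribution, whereas the paper's results concern genuine differences in how the corpora realize inference.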
dc.format.extent 10
dc.language.iso eng
dc.publisher The Association for Computational Linguistics
dc.relation.ispartof The Workshop BlackboxNLP on Analyzing and Interpreting Neural Networks for NLP at ACL 2019
dc.relation.isversionof 978-1-950737-30-7
dc.rights cc_by
dc.rights.uri info:eu-repo/semantics/openAccess
dc.subject 113 Computer and information sciences
dc.subject 6121 Languages
dc.title Testing the Generalization Power of Neural Network Models Across NLI Benchmarks en
dc.type Conference contribution
dc.contributor.organization Department of Digital Humanities
dc.contributor.organization Language Technology
dc.description.reviewstatus Peer reviewed
dc.rights.accesslevel openAccess
dc.type.version publishedVersion

Files in this item


Files Size Format View
W19_4810.pdf 335.9Kb PDF View/Open
