Deep Biaffine Attention for Neural Dependency Parsing

International Conference on Learning Representations, 2017.

Cited by: 363
Keywords:
neural dependency parser, Noah A. Smith, maximum spanning tree, neural graph, dependency parser

Abstract:

This paper builds off recent work from Kiperwasser & Goldberg (2016) using neural attention in a simple graph-based dependency parser. We use a larger but more thoroughly regularized parser than other recent BiLSTM-based approaches, with biaffine classifiers to predict arcs and labels. Our parser gets state of the art or near state of the art performance on standard treebanks for six different languages.
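
To make the arc-scoring step concrete, here is a minimal NumPy sketch of a biaffine arc scorer. The function and variable names are illustrative rather than taken from the authors' code, and it assumes the head-role and dependent-role vectors have already been produced by small MLPs applied to the BiLSTM states, as described in the paper.

    import numpy as np

    def biaffine_arc_scores(h_dep, h_head, U, u):
        # h_dep:  (n, d) dependent-role vectors, one row per word
        # h_head: (n, d) head-role vectors, one row per word
        # U:      (d, d) bilinear weight matrix
        # u:      (d,)   bias vector scoring each word's suitability as a head
        bilinear = h_dep @ U @ h_head.T        # [i, j] = h_dep[i]^T U h_head[j]
        head_bias = h_head @ u                 # [j]    = u^T h_head[j]
        return bilinear + head_bias[None, :]   # (n, n): row i scores every candidate head of word i

Softmaxing each row and training with cross-entropy against the gold head index gives the arc classifier; the label classifier described in the paper applies an analogous biaffine transformation with one weight matrix per dependency label.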

Introduction
  • Dependency parsers—which annotate sentences in a way designed to be easy for humans and computers alike to understand—have been found to be extremely useful for a sizable number of NLP tasks, especially those involving natural language understanding in some way (Bowman et al., 2016; Angeli et al., 2015; Levy & Goldberg, 2014; Toutanova et al., 2016; Parikh et al., 2015).
  • A number of other researchers have attempted to address some limitations of Chen & Manning's (2014) parser by augmenting it with additional complexity: Weiss et al. (2015) and Andor et al. (2016) augment it with a beam search and a conditional random field loss objective to allow the parser to “undo” previous actions once it finds evidence that they may have been incorrect; and Dyer et al. (2015) and Kuncoro et al. (2016) instead use LSTMs to represent the stack and buffer, getting state-of-the-art performance by building in a way of composing parsed phrases together.
Highlights
  • Dependency parsers—which annotate sentences in a way designed to be easy for humans and computers alike to understand—have been found to be extremely useful for a sizable number of NLP tasks, especially those involving natural language understanding in some way (Bowman et al., 2016; Angeli et al., 2015; Levy & Goldberg, 2014; Toutanova et al., 2016; Parikh et al., 2015)
  • The resulting parser maintains most of the simplicity of neural graph-based approaches while approaching the performance of the SOTA transition-based one
  • In this paper we proposed using a modified version of bilinear attention in a neural dependency parser that increases parsing speed without hurting performance (the modification is written out after this list)
  • We showed that our larger but more regularized network outperforms other neural graph-based parsers and gets comparable performance to the current SOTA transition-based parser
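
The biaffine modification referred to above can be written out explicitly. A plain bilinear attention score between two recurrent states, and the paper's biaffine variant for arcs (notation paraphrased here), are:

    s_{ij}^{\text{bilinear}} = \mathbf{h}_i^{\top} \mathbf{W} \, \mathbf{h}_j
    \qquad
    s_{ij}^{\text{biaffine}} = \big(\mathbf{h}_i^{(\text{arc-dep})}\big)^{\top} \mathbf{U} \, \mathbf{h}_j^{(\text{arc-head})} + \mathbf{u}^{\top} \mathbf{h}_j^{(\text{arc-head})}

where the arc-dep and arc-head vectors come from separate dimension-reducing MLPs applied to the BiLSTM output. The reduced dimensionality is what yields the speed gain, and the added bias term u lets the model learn a prior over how likely each word is to act as a head at all.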
Results
  • 4.1 DATASETS

    The authors show test results for the proposed model on the English Penn Treebank, converted into Stanford Dependencies using both version 3.3.0 and version 3.5.0 of the Stanford Dependency converter (PTB-SD 3.3.0 and PTB-SD 3.5.0); the Chinese Penn Treebank; and the CoNLL '09 shared task dataset, following standard practices for each dataset.
  • The SOTA model is designed to capture phrasal compositionality, so another possibility is that the proposed model doesn’t capture this compositionality as effectively, and that this results in a worse label score
  • It may be the result of a more general limitation of graph-based parsers, which have access to less explicit syntactic information than transition-based parsers when making decisions (a minimal decoding sketch follows this list).
  • Addressing these latter two limitations would require a more innovative architecture than the relatively simple one used in current neural graph-based parsers
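
As a concrete illustration of how a graph-based parser of this kind makes those decisions: at test time the (n, n) arc score matrix is decoded largely by independent per-word choices, without the incremental stack-and-buffer state a transition-based parser conditions on. Below is a minimal, hypothetical NumPy sketch; a full parser would additionally repair cycles (for example with a maximum spanning tree algorithm) so that the output is a well-formed tree.

    import numpy as np

    def greedy_heads(arc_scores: np.ndarray) -> np.ndarray:
        # arc_scores: (n, n) matrix where entry [i, j] scores word j as the
        # head of word i; index 0 is reserved for the artificial ROOT token.
        heads = arc_scores.argmax(axis=1)  # pick the best head for each word independently
        heads[0] = 0                       # ROOT conventionally points to itself
        return heads                       # may contain cycles; real decoders repair them

    # Example: 1 ROOT token plus 3 words, with random scores
    print(greedy_heads(np.random.randn(4, 4)))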
Conclusion
  • In this paper the authors proposed using a modified version of bilinear attention in a neural dependency parser that increases parsing speed without hurting performance.
  • The authors showed that the larger but more regularized network outperforms other neural graph-based parsers and gets comparable performance to the current SOTA transition-based parser.
  • The authors provided empirical motivation for the proposed architecture and configuration over similar ones in the existing literature.
  • Future work will involve exploring ways of bridging the gap between labeled and unlabeled accuracy and augmenting the parser with a smarter way of handling out-of-vocabulary tokens for morphologically richer languages.
Tables
  • Table 1: Model hyperparameters
  • Table 2: Test accuracy and speed on PTB-SD 3.5.0. Statistically significant differences are marked with an asterisk
  • Table 3: Test accuracy on PTB-SD 3.5.0. Statistically significant differences are marked with an asterisk
  • Table 4: Results on the English PTB and Chinese PTB parsing datasets
  • Table 5: Results on the CoNLL '09 shared task datasets
Reference
  • Daniel Andor, Chris Alberti, David Weiss, Aliaksei Severyn, Alessandro Presta, Kuzman Ganchev, Slav Petrov, and Michael Collins. Globally normalized transition-based neural networks. In Association for Computational Linguistics, 2016. URL https://arxiv.org/abs/1603.06042.
  • Gabor Angeli, Melvin Johnson Premkumar, and Christopher D Manning. Leveraging linguistic structure for open domain information extraction. In Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics (ACL 2015), 2015.
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. Neural machine translation by jointly learning to align and translate. International Conference on Learning Representations, 2014.
  • Miguel Ballesteros, Yoav Goldberg, Chris Dyer, and Noah A Smith. Training with exploration improves a greedy stack-LSTM parser. Proceedings of the conference on empirical methods in natural language processing, 2016.
  • Samuel R Bowman, Jon Gauthier, Abhinav Rastogi, Raghav Gupta, Christopher D Manning, and Christopher Potts. A fast unified model for parsing and sentence understanding. ACL 2016, 2016.
  • Danqi Chen and Christopher D Manning. A fast and accurate dependency parser using neural networks. In Proceedings of the conference on empirical methods in natural language processing, pp. 740–750, 2014.
  • Hao Cheng, Hao Fang, Xiaodong He, Jianfeng Gao, and Li Deng. Bi-directional attention with agreement for dependency parsing. arXiv preprint arXiv:1608.02076, 2016.
  • Chris Dyer, Miguel Ballesteros, Wang Ling, Austin Matthews, and Noah A Smith. Transition-based dependency parsing with stack long short-term memory. Proceedings of the conference on empirical methods in natural language processing, 2015.
  • Yarin Gal and Zoubin Ghahramani. Dropout as a Bayesian approximation: Representing model uncertainty in deep learning. International Conference on Machine Learning, 2015.
  • Klaus Greff, Rupesh Kumar Srivastava, Jan Koutník, Bas R Steunebrink, and Jürgen Schmidhuber. LSTM: A search space odyssey. IEEE Transactions on Neural Networks and Learning Systems, 2015.
  • Kazuma Hashimoto, Caiming Xiong, Yoshimasa Tsuruoka, and Richard Socher. A joint many-task model: Growing a neural network for multiple NLP tasks. arXiv preprint arXiv:1611.01587, 2016.
  • Diederik Kingma and Jimmy Ba. Adam: A method for stochastic optimization. International Conference on Learning Representations, 2014.
  • Eliyahu Kiperwasser and Yoav Goldberg. Simple and accurate dependency parsing using bidirectional LSTM feature representations. Transactions of the Association for Computational Linguistics, 4:313–327, 2016.
  • Adhiguna Kuncoro, Miguel Ballesteros, Lingpeng Kong, Chris Dyer, Graham Neubig, and Noah A. Smith. What do recurrent neural network grammars learn about syntax? CoRR, abs/1611.05774, 2016. URL http://arxiv.org/abs/1611.05774.
  • Omer Levy and Yoav Goldberg. Dependency-based word embeddings. In ACL 2014, pp. 302–308, 2014.
  • Minh-Thang Luong, Hieu Pham, and Christopher D Manning. Effective approaches to attention-based neural machine translation. Empirical Methods in Natural Language Processing, 2015.
  • Ankur P Parikh, Hoifung Poon, and Kristina Toutanova. Grounded semantic parsing for complex knowledge extraction. In Proceedings of North American Chapter of the Association for Computational Linguistics, pp. 756–766, 2015.
  • Kristina Toutanova, Dan Klein, Christopher D Manning, and Yoram Singer. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1, pp. 173–180. Association for Computational Linguistics, 2003.
  • Kristina Toutanova, Xi Victoria Lin, and Wen-tau Yih. Compositional learning of embeddings for relation paths in knowledge bases and text. In ACL, 2016.
  • David Weiss, Chris Alberti, Michael Collins, and Slav Petrov. Structured training for neural network transition-based parsing. Annual Meeting of the Association for Computational Linguistics, 2015.