Deep Semantic Role Labeling: What Works and What's Next

ACL, pp. 473-483, 2017.

Cited by: 249
Keywords:
deep learning, previous state of the art, prepositional phrase, learning model
Weibo:
We presented a new deep learning model for span-based semantic role labeling with a 10% relative error reduction over the previous state of the art

Abstract:

We introduce a new deep learning model for semantic role labeling (SRL) that significantly improves the state of the art, along with detailed analyses to reveal its strengths and limitations. We use a deep highway BiLSTM architecture with constrained decoding, while observing a number of recent best practices for initialization and regularization. Our ensemble model achieves roughly a 10% relative error reduction over the previous state of the art on both CoNLL 2005 and CoNLL 2012.

Introduction
  • Semantic role labeling (SRL) systems aim to recover the predicate-argument structure of a sentence, essentially determining “who did what to whom”, “when”, and “where.” Recent breakthroughs involving end-to-end deep models for SRL without syntactic input (Zhou and Xu, 2015; Marcheggiani et al., 2017) seem to overturn the long-held belief that syntactic parsing is a prerequisite for this task (Punyakanok et al., 2008).
  • The authors show that this result can be pushed further using deep highway bidirectional LSTMs with constrained decoding, again significantly moving the state of the art.
  • Following Zhou and Xu (2015), the authors treat SRL as a BIO tagging problem and use deep bidirectional LSTMs. They differ by (1) simplifying the input and output layers, (2) introducing highway connections (Srivastava et al., 2015; Zhang et al., 2016), (3) using recurrent dropout (Gal and Ghahramani, 2016), (4) decoding with BIO constraints, and (5) ensembling with a product of experts; a sketch of the resulting architecture follows this list.
  • The authors report performance with predicted predicates to encourage future exploration of end-to-end SRL systems
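To make the architecture concrete, here is a minimal PyTorch sketch of such a tagger (the authors' implementation differs in detail: it computes highway gates inside the LSTM cell and uses true recurrent dropout). The dimensions, the plain nn.Dropout stand-in, and the flip-based interleaving of directions are illustrative assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class HighwayLSTMLayer(nn.Module):
    """One LSTM layer whose output is gated with its input (highway connection)."""
    def __init__(self, dim):
        super().__init__()
        self.lstm = nn.LSTM(dim, dim, batch_first=True)
        self.gate = nn.Linear(dim, dim)  # transform gate (simplified placement)

    def forward(self, x):
        h, _ = self.lstm(x)
        r = torch.sigmoid(self.gate(x))   # how much of the LSTM output to keep
        return r * h + (1.0 - r) * x      # highway mix of output and input

class DeepSRLTagger(nn.Module):
    """Deep BiLSTM BIO tagger: word embedding plus a binary predicate-indicator
    feature in, per-token tag scores out. Directions alternate between layers."""
    def __init__(self, vocab_size, num_tags, dim=300, depth=8, dropout=0.1):
        super().__init__()
        self.word_embed = nn.Embedding(vocab_size, dim - 100)
        self.pred_embed = nn.Embedding(2, 100)   # is this token the predicate?
        self.layers = nn.ModuleList(HighwayLSTMLayer(dim) for _ in range(depth))
        self.dropout = nn.Dropout(dropout)       # stand-in for recurrent dropout
        self.proj = nn.Linear(dim, num_tags)     # scores over BIO tags

    def forward(self, words, is_predicate):
        x = torch.cat([self.word_embed(words), self.pred_embed(is_predicate)], -1)
        for layer in self.layers:
            x = self.dropout(layer(x))
            x = torch.flip(x, dims=[1])   # next layer reads the reverse direction
        if len(self.layers) % 2 == 1:
            x = torch.flip(x, dims=[1])   # restore left-to-right order
        return self.proj(x)
```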
Highlights
  • Semantic role labeling (SRL) systems aim to recover the predicate-argument structure of a sentence, essentially determining “who did what to whom”, “when”, and “where.” Recent breakthroughs involving end-to-end deep models for SRL without syntactic input (Zhou and Xu, 2015; Marcheggiani et al., 2017) seem to overturn the long-held belief that syntactic parsing is a prerequisite for this task (Punyakanok et al., 2008)
  • We present detailed error analyses to better understand the performance gains, including (1) design choices on architecture, initialization, and regularization that have a surprisingly large impact on model performance; (2) different types of prediction errors showing, e.g., that deep models excel at predicting long-distance dependencies but still struggle with known challenges such as PP-attachment errors and adjunct-argument distinctions; (3) the role of syntax, showing that there is significant room for improvement given oracle syntax, but errors from existing automatic parsers prevent effective use in SRL
  • While the continuing trend of improving SRL without syntax seems to suggest that neural end-to-end systems no longer need parsers, our analysis in Section 4.4 will show that accurate syntactic information can improve these deep models
  • We presented a new deep learning model for span-based semantic role labeling with a 10% relative error reduction over the previous state of the art
  • While our deep model is better at recovering long-distance predicate-argument relations, we still observe structural inconsistencies, which can be alleviated by constrained decoding
  • Despite recent success without syntactic input, we found that our best neural model can still benefit from accurate syntactic parser output via straightforward constrained decoding
Methods
  • The authors measure the performance of the SRL system on two PropBank-style, span-based SRL datasets: CoNLL-2005 (Carreras and Màrquez, 2005) and CoNLL-2012 (Pradhan et al., 2013).
  • Both datasets provide gold predicates as part of the input.
  • Initialization: all the weight matrices in the BiLSTMs are initialized with random orthonormal matrices, as described in Saxe et al. (2013); a minimal sketch follows this list
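As a small numpy illustration of that orthonormal initialization: a random Gaussian matrix is projected onto the nearest (semi-)orthogonal matrix via SVD. Applying one orthonormal block per LSTM gate is an assumption about usage, not a detail stated above.

```python
import numpy as np

def orthonormal(rows, cols, rng):
    """Random semi-orthogonal matrix in the style of Saxe et al. (2013)."""
    a = rng.standard_normal((rows, cols))
    u, _, vt = np.linalg.svd(a, full_matrices=False)
    return u @ vt  # rows (or columns, whichever is smaller) are orthonormal

# Hypothetical usage: one orthonormal block per LSTM gate (input, forget,
# cell, output), stacked into the usual (4 * hidden) x inputs weight matrix.
rng = np.random.default_rng(0)
hidden = inputs = 300
w_ih = np.concatenate([orthonormal(hidden, inputs, rng) for _ in range(4)], axis=0)
```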
Results
  • In Tables 1 and 2, the authors compare the best single and ensemble model with previous work.
  • The authors' ensemble (PoE, a product-of-experts combination sketched after this list) has an absolute improvement of 2.1 F1 on both CoNLL 2005 and CoNLL 2012 over the previous state of the art.
  • The authors' single model achieves more than a 0.4 F1 improvement on both datasets.
  • In comparison with the best reported results, the percentage of completely correct predicates improves by 5.9 points.
  • While the continuing trend of improving SRL without syntax seems to suggest that neural end-to-end systems no longer need parsers, the analysis in Section 4.4 will show that accurate syntactic information can improve these deep models.
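For the product-of-experts combination (Hinton, 2002), a minimal numpy sketch: each expert contributes a per-token tag distribution, and the ensemble renormalizes their product, i.e., sums log-probabilities and applies a per-token log-sum-exp. The array layout and function name are assumptions for illustration.

```python
import numpy as np

def poe_combine(model_logprobs):
    """model_logprobs: (num_models, seq_len, num_tags) per-token log P_k(tag).
    Returns the log of the renormalized product distribution."""
    summed = model_logprobs.sum(axis=0)                      # log of the product
    z = np.logaddexp.reduce(summed, axis=-1, keepdims=True)  # stable normalizer
    return summed - z                                        # log P_PoE(tag)
```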
Conclusion
  • Conclusion and Future Work: the authors presented a new deep learning model for span-based semantic role labeling with a 10% relative error reduction over the previous state of the art.
  • While the deep model is better at recovering long-distance predicate-argument relations, the authors still observe structural inconsistencies, which can be alleviated by constrained decoding (sketched after this list).
  • Despite recent success without syntactic input, the authors found that the best neural model can still benefit from accurate syntactic parser output via straightforward constrained decoding.
  • The authors observed a 3 F1 improvement by leveraging gold syntax, showing the potential for high-quality parsers to further improve deep SRL models
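The paper performs constrained decoding with A* search over tag prefixes; as an accessible stand-in, the sketch below enforces just the BIO constraint (an I-X tag must follow B-X or I-X) with a Viterbi-style dynamic program over per-token log-probabilities. The function and tag names are illustrative, not the authors' code.

```python
import numpy as np

def allowed(prev_tag, tag):
    """BIO constraint: I-X may only follow B-X or I-X."""
    if not tag.startswith("I-"):
        return True
    role = tag[2:]
    return prev_tag in (f"B-{role}", f"I-{role}")

def constrained_decode(log_probs, tags):
    """log_probs: (seq_len, num_tags) per-token log-probabilities;
    tags: tag strings, e.g. ["O", "B-ARG0", "I-ARG0", "B-V"].
    Returns the highest-scoring tag sequence with no BIO violations."""
    n, t = log_probs.shape
    trans = np.array([[0.0 if allowed(p, c) else -np.inf for c in tags]
                      for p in tags])              # (prev, cur) transition mask
    best = log_probs[0].copy()
    for k, tag in enumerate(tags):
        if tag.startswith("I-"):
            best[k] = -np.inf                      # cannot start inside a span
    back = np.zeros((n, t), dtype=int)
    for i in range(1, n):
        cand = best[:, None] + trans               # score of every transition
        back[i] = cand.argmax(axis=0)              # best previous tag per current
        best = cand.max(axis=0) + log_probs[i]
    path = [int(best.argmax())]
    for i in range(n - 1, 0, -1):
        path.append(int(back[i][path[-1]]))
    return [tags[k] for k in reversed(path)]
```

The paper's richer SRL constraints (unique core roles, continuation and reference roles) require more decoder state than this pairwise transition table, which is one reason the authors use A* search instead.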
Tables
  • Table1: Experimental results on CoNLL 2005, in terms of precision (P), recall (R), F1 and percentage of completely correct predicates (Comp.). We report results of our best single and ensemble (PoE) model. The comparison models are Zhou and Xu (2015), FitzGerald et al. (2015), Täckström et al. (2015), Toutanova et al. (2008) and Punyakanok et al. (2008)
  • Table2: Experimental results on CoNLL 2012 in the same metrics as above. We compare our best single and ensemble (PoE) models against Zhou and Xu (2015), FitzGerald et al. (2015), Täckström et al. (2015) and Pradhan et al. (2013)
  • Table3: Predicate detection performance and end-to-end SRL results using predicted predicates. ∆ F1 shows the absolute performance drop compared to our best ensemble model with gold predicates
  • Table4: Oracle transformations paired with the relative error reduction after each operation. All the operations are permitted only if they do not cause any overlapping arguments
  • Table5: Confusion matrix for labeling errors, showing the percentage of predicted labels for each gold label. We only count predicted arguments that match gold span boundaries
  • Table6: Comparison of BiLSTM models without BIO decoding. We compare F1, token-level accuracy (Token), average BIO violations per token (BIO), overall model entropy (All), and model entropy at tokens involved in BIO violations (BIO). Increasing the depth of the model beyond 4 does not produce more structurally consistent output, emphasizing the need for constrained decoding
  • Table7: Comparison of models with different depths and decoding constraints (in addition to BIO) as well as two previous systems. We compare F1, unlabeled agreement with gold constituency (Syn%) and each type of SRL-constraint violations (Unique core roles, Continuation roles and Reference roles). Our best model produces a similar number of constraint violations to the gold annotation, explaining why deterministically enforcing these constraints is not helpful
  • Table8: F1 on CoNLL 2005, and the development set of CoNLL 2012, broken down by genre. Syntax-constrained decoding (+AutoSyn) shows a larger improvement on in-domain data (CoNLL 05 and CoNLL 2012 NW); see the sketch below
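A hedged sketch of how such syntax-constrained decoding could be realized: candidate argument spans that cross a bracket of the (automatic or gold) constituency parse pay a fixed penalty during decoding. The helper names, span conventions (inclusive token indices), and penalty form are illustrative assumptions, not the paper's implementation.

```python
def crosses_bracket(span, constituents):
    """True if span (i, j) overlaps a constituent (a, b) without either
    containing the other, i.e. the span crosses a syntactic bracket."""
    i, j = span
    for a, b in constituents:
        if a < i <= b < j or i < a <= j < b:
            return True
    return False

def syntax_adjusted_score(span, base_score, constituents, penalty=1.0):
    """Penalize candidate argument spans inconsistent with the parse."""
    return base_score - (penalty if crosses_bracket(span, constituents) else 0.0)
```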
Related work
  • Traditional approaches to semantic role labeling have used syntactic parsers to identify constituents and model long-range dependencies, and enforced global consistency using integer linear programming (Punyakanok et al., 2008) or dynamic programs (Täckström et al., 2015). More recently, neural methods have been employed on top of syntactic features (FitzGerald et al., 2015; Roth and Lapata, 2016). Our experiments show that off-the-shelf neural methods have a remarkable ability to learn long-range dependencies, syntactic constituency structure, and global constraints without coding task-specific mechanisms for doing so.
  • Deep architectures have achieved state-of-the-art results in English span-based SRL (Zhou and Xu, 2015), Chinese SRL (Wang et al., 2015), and dependency-based SRL (Marcheggiani et al., 2017) with little to no syntactic input. Our techniques push results to more than 3 F1 over the best syntax-based models. However, we also show that there is potential for syntax to further improve performance.

    [Figure: comparison of syntax-constrained decoding with Gold, Choe, and Charniak parses across penalty values C originally appeared here.]
Funding
  • The research was supported in part by DARPA under the DEFT program (FA8750-13-2-0019), the ARO (W911NF-16-1-0121), the NSF (IIS-1252835, IIS-1562364), gifts from Google and Tencent, and an Allen Distinguished Investigator Award
References
  • Claire Bonial, Olga Babko-Malaya, Jinho D. Choi, Jena Hwang, and Martha Palmer. 2010. PropBank annotation guidelines. Center for Computational Language and Education Research, Institute of Cognitive Science, University of Colorado at Boulder.
  • Xavier Carreras and Lluís Màrquez. 2005. Introduction to the CoNLL-2005 shared task: Semantic role labeling. In Proc. of the Ninth Conference on Computational Natural Language Learning (CoNLL), pages 152–164.
  • Eugene Charniak. 2000. A maximum-entropy-inspired parser. In Proc. of the First North American Chapter of the Association for Computational Linguistics Conference (NAACL), pages 132–139.
  • Do Kook Choe and Eugene Charniak. 2016. Parsing as language modeling. In Proc. of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Ronan Collobert, Jason Weston, Léon Bottou, Michael Karlen, Koray Kavukcuoglu, and Pavel Kuksa. 2011. Natural language processing (almost) from scratch. Journal of Machine Learning Research 12:2493–2537.
  • Nicholas FitzGerald, Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Semantic role labeling with neural network factors. In Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 960–970.
  • Yarin Gal and Zoubin Ghahramani. 2016. A theoretically grounded application of dropout in recurrent neural networks. In Advances in Neural Information Processing Systems, pages 1019–1027.
  • James Henderson, Paola Merlo, Ivan Titov, and Gabriele Musillo. 2013. Multilingual joint parsing of syntactic and semantic dependencies with a latent variable model. Computational Linguistics 39(4):949–998.
  • Geoffrey E. Hinton. 2002. Training products of experts by minimizing contrastive divergence. Neural Computation 14(8):1771–1800.
  • Paul Kingsbury, Martha Palmer, and Mitch Marcus. 2002. Adding semantic annotation to the Penn Treebank. In Proceedings of the Human Language Technology Conference, pages 252–256.
  • Jonathan K. Kummerfeld, David Hall, James R. Curran, and Dan Klein. 2012. Parser showdown at the Wall Street corral: An empirical investigation of error types in parser output. In Proc. of the 2012 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1048–1059.
  • Kenton Lee, Mike Lewis, and Luke Zettlemoyer. 2016. Global neural CCG parsing with optimality guarantees. In Proc. of the 2016 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Mike Lewis and Mark Steedman. 2014. A* CCG parsing with a supertag-factored model. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 990–1000.
  • Diego Marcheggiani, Anton Frolov, and Ivan Titov. 2017. A simple and accurate syntax-agnostic neural model for dependency-based semantic role labeling. arXiv preprint arXiv:1701.02593.
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proc. of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Sameer Pradhan, Kadri Hacioglu, Wayne Ward, James H. Martin, and Daniel Jurafsky. 2005. Semantic role chunking combining complementary syntactic views. In Proc. of the 2005 Conference on Computational Natural Language Learning (CoNLL), pages 217–220.
  • Sameer Pradhan, Alessandro Moschitti, Nianwen Xue, Hwee Tou Ng, Anders Björkelund, Olga Uryupina, Yuchen Zhang, and Zhi Zhong. 2013. Towards robust linguistic analysis using OntoNotes. In Proc. of the 2013 Conference on Computational Natural Language Learning (CoNLL), pages 143–152.
  • Vasin Punyakanok, Peter Koomen, Dan Roth, and Wen-tau Yih. 2005. Generalized inference with multiple semantic role labeling systems. In Proc. of the 2005 Conference on Computational Natural Language Learning (CoNLL).
  • Vasin Punyakanok, Dan Roth, and Wen-tau Yih. 2008. The importance of syntactic parsing and inference in semantic role labeling. Computational Linguistics 34(2):257–287.
  • Michael Roth and Mirella Lapata. 2016. Neural semantic role labeling with dependency path embeddings. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).
  • Andrew M. Saxe, James L. McClelland, and Surya Ganguli. 2013. Exact solutions to the nonlinear dynamics of learning in deep linear neural networks. arXiv preprint arXiv:1312.6120.
  • Rupesh K. Srivastava, Klaus Greff, and Jürgen Schmidhuber. 2015. Training very deep networks. In Advances in Neural Information Processing Systems, pages 2377–2385.
  • Swabha Swayamdipta, Miguel Ballesteros, Chris Dyer, and Noah A. Smith. 2016. Greedy, joint syntactic-semantic parsing with stack LSTMs. In Proc. of the 2016 Conference on Computational Natural Language Learning (CoNLL), page 187.
  • Oscar Täckström, Kuzman Ganchev, and Dipanjan Das. 2015. Efficient inference and structured learning for semantic role labeling. Transactions of the Association for Computational Linguistics 3:29–41.
  • Kristina Toutanova, Aria Haghighi, and Christopher D. Manning. 2008. A global joint model for semantic role labeling. Computational Linguistics 34(2):161–191.
  • Zhen Wang, Tingsong Jiang, Baobao Chang, and Zhifang Sui. 2015. Chinese semantic role labeling with bidirectional recurrent neural networks. In Proc. of the 2015 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1626–1631.
  • Ralph Weischedel, Martha Palmer, Mitchell Marcus, Eduard Hovy, Sameer Pradhan, Lance Ramshaw, Nianwen Xue, Ann Taylor, Jeff Kaufman, Michelle Franchini, et al. 2013. OntoNotes Release 5.0 LDC2013T19. Linguistic Data Consortium, Philadelphia, PA.
  • Matthew D. Zeiler. 2012. ADADELTA: An adaptive learning rate method. arXiv preprint arXiv:1212.5701.
  • Yu Zhang, Guoguo Chen, Dong Yu, Kaisheng Yao, Sanjeev Khudanpur, and James Glass. 2016. Highway long short-term memory RNNs for distant speech recognition. In 2016 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 5755–5759.
  • Jie Zhou and Wei Xu. 2015. End-to-end learning of semantic role labeling using recurrent neural networks. In Proc. of the Annual Meeting of the Association for Computational Linguistics (ACL).