A Convolutional Neural Network for Modelling Sentences

ACL, pp. 655-665, 2014.

Cited by: 2168
Keywords:
word meaning, Time-Delay Neural Network, compositional distributional model, Recursive Neural Network, natural language

Abstract:

The ability to accurately represent sentences is central to language understanding. We describe a convolutional architecture dubbed the Dynamic Convolutional Neural Network (DCNN) that we adopt for the semantic modelling of sentences. The network uses Dynamic k-Max Pooling, a global pooling operation over linear sequences. The network handles input sentences of varying length and induces a feature graph over the sentence that is capable of explicitly capturing short and long-range relations. The network does not rely on a parse tree and is easily applicable to any language. We test the DCNN in four experiments: small scale binary and multi-class sentiment prediction, six-way question classification and Twitter sentiment prediction by distant supervision. The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.

Introduction
Highlights
  • The aim of a sentence model is to analyse and represent the semantic content of a sentence for purposes of classification or generation
  • The sentence modelling problem is at the core of many tasks involving a degree of natural language comprehension
  • Certain concepts used in prior neural sentence models are central to the Dynamic Convolutional Neural Network, and we describe them
  • We describe a pooling operation that is a generalisation of the max pooling over the time dimension used in the Max Time-Delay Neural Network (Max-TDNN) sentence model, and that differs from the local max pooling operations applied in convolutional networks for object recognition (LeCun et al., 1998); a minimal sketch of the operation follows this list
  • We describe some of the properties of the sentence model based on the Dynamic Convolutional Neural Network
  • On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. (2009)
  • We have described a dynamic convolutional neural network that uses the dynamic k-max pooling operator as a non-linear subsampling function
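The dynamic k-max pooling operator mentioned above can be sketched compactly. Below is a minimal NumPy illustration, not the authors' Matlab implementation; the function names, array shapes and toy values are chosen for this example, while the layer-dependent formula k_l = max(k_top, ceil((L - l)/L * s)) follows the paper's description.

```python
import math
import numpy as np

def k_max_pooling(feature_map, k):
    """k-max pooling over the time dimension: keep the k largest
    activations in each feature row, preserving their original order.
    feature_map has shape (d, s): d feature dimensions, s time steps."""
    d, s = feature_map.shape
    k = min(k, s)
    idx = np.argsort(feature_map, axis=1)[:, -k:]  # positions of the k largest values per row
    idx = np.sort(idx, axis=1)                     # re-sort so the kept values stay in sentence order
    return np.take_along_axis(feature_map, idx, axis=1)

def dynamic_k(layer, total_layers, sent_len, k_top):
    """Layer-dependent pooling parameter: k_l = max(k_top, ceil((L - l)/L * s))."""
    return max(k_top, math.ceil((total_layers - layer) / total_layers * sent_len))

# Toy usage: a 4-dimensional feature map over a 7-word sentence,
# pooled at the first of three convolutional layers with k_top = 3.
fmap = np.random.randn(4, 7)
k = dynamic_k(layer=1, total_layers=3, sent_len=7, k_top=3)  # k = 5 here
pooled = k_max_pooling(fmap, k)
print(pooled.shape)  # (4, 5)
```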
Methods
  • The authors test the network in four different experiments.
  • The authors begin by specifying aspects of the implementation and the training of the network.
  • In each of the experiments, the top layer of the network is a fully connected layer followed by a softmax non-linearity that predicts the probability distribution over classes given the input sentence (a minimal sketch of this classification head follows the list).
  • A Matlab implementation processes several million input sentences per hour on a single GPU, depending primarily on the number of layers in the network
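A minimal sketch of the classification head described above, assuming a fixed-size sentence representation produced by the network's top pooled layer; the dimensions and names below are illustrative rather than taken from the paper's Matlab code.

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract the max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(sentence_repr, W, b):
    """Fully connected layer followed by a softmax non-linearity,
    giving a probability distribution over the output classes."""
    return softmax(W @ sentence_repr + b)

# Toy usage: a 48-dimensional sentence representation mapped to 6 classes,
# as in six-way TREC question classification.
rng = np.random.default_rng(0)
repr_dim, n_classes = 48, 6
W = rng.normal(size=(n_classes, repr_dim))
b = np.zeros(n_classes)
probs = classify(rng.normal(size=repr_dim), W, b)
print(probs.sum())  # ~1.0
```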
Results
  • The network achieves excellent performance in the first three tasks and a greater than 25% error reduction in the last task with respect to the strongest baseline.
  • On the hand-labelled test set, the network achieves a greater than 25% reduction in the prediction error with respect to the strongest unigram and bigram baseline reported in Go et al. (2009); the arithmetic behind this relative figure is worked through below
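The figure above is a relative reduction in error rather than an absolute accuracy gap. A short worked illustration, with hypothetical error rates chosen only to show the arithmetic:

```latex
\text{relative error reduction} \;=\; \frac{e_{\text{baseline}} - e_{\text{DCNN}}}{e_{\text{baseline}}},
\qquad \text{e.g.}\quad \frac{0.170 - 0.126}{0.170} \approx 0.259 \;>\; 25\%.
```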
Conclusion
  • The authors have described a dynamic convolutional neural network that uses the dynamic k-max pooling operator as a non-linear subsampling function.
  • The feature graph induced by the network is able to capture word relations of varying size.
  • The network achieves high performance on question and sentiment classification without requiring external features as provided by parsers or other resources
Summary
  • Introduction:

    The aim of a sentence model is to analyse and represent the semantic content of a sentence for purposes of classification or generation.
  • In compositional distributional approaches, a composition function is learned and tied either to particular syntactic relations (Guevara, 2010; Zanzotto et al., 2010) or to particular word types (Baroni and Zamparelli, 2010; Coecke et al., 2010; Grefenstette and Sadrzadeh, 2011; Kartsaklis and Sadrzadeh, 2013; Grefenstette, 2013)
  • Another approach represents the meaning of sentences by way of automatically extracted logical forms (Zettlemoyer and Collins, 2005).
  • In neural sentence models, word vectors are combined and the resulting vector is classified through one or more fully connected layers
Tables
  • Table 1: Accuracy of sentiment prediction in the movie reviews dataset. The first four results are reported from Socher et al. (2013b). The baselines NB and BiNB are Naive Bayes classifiers with, respectively, unigram features and unigram and bigram features. SVM is a support vector machine with unigram and bigram features. RecNTN is a recursive neural network with a tensor-based feature function, which relies on external structural features given by a parse tree and performs best among the RecNNs.
  • Table 2: Accuracy of six-way question classification on the TREC questions dataset. The second column details the external features used in the various approaches. The first four results are respectively from Li and Roth (2002), Blunsom et al. (2006), Huang et al. (2008) and Silva et al. (2011).
  • Table 3: Accuracy on the Twitter sentiment dataset. The three non-neural classifiers are based on unigram and bigram features; the results are reported from Go et al. (2009).
Funding
  • This work was supported by a Xerox Foundation Award, EPSRC grant number EP/F042728/1, and EPSRC grant number EP/K036580/1
References
  • Marco Baroni and Roberto Zamparelli. 2010. Nouns are vectors, adjectives are matrices: Representing adjective-noun constructions in semantic space. In EMNLP, pages 1183–1193. ACL.
  • Phil Blunsom, Krystle Kocik, and James R. Curran. 2006. Question classification with log-linear models. In SIGIR '06: Proceedings of the 29th Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 615–616, New York, NY, USA. ACM.
  • Daoud Clarke. 2012. A context-theoretic framework for compositionality in distributional semantics. Computational Linguistics, 38(1):41–71.
  • Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. 2010. Mathematical foundations for a compositional distributional model of meaning. March.
  • Ronan Collobert and Jason Weston. 2008. A unified architecture for natural language processing: Deep neural networks with multitask learning. In International Conference on Machine Learning, ICML.
  • John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. Journal of Machine Learning Research, 12:2121–2159, July.
  • Katrin Erk and Sebastian Padó. 2008. A structured vector space model for word meaning in context. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP '08), page 897, October.
  • Katrin Erk. 2012. Vector space models of word meaning and phrase meaning: A survey. Language and Linguistics Compass, 6(10):635–653.
  • Felix A. Gers and Jürgen Schmidhuber. 2001. LSTM recurrent networks learn simple context-free and context-sensitive languages. IEEE Transactions on Neural Networks, 12(6):1333–1340.
  • Alec Go, Richa Bhayani, and Lei Huang. 2009. Twitter sentiment classification using distant supervision. Processing, pages 1–6.
  • Edward Grefenstette and Mehrnoosh Sadrzadeh. 2011. Experimental support for a categorical compositional distributional model of meaning. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pages 1394–1404. Association for Computational Linguistics.
  • Edward Grefenstette. 2013. Category-theoretic quantitative compositional distributional models of natural language semantics. arXiv preprint arXiv:1311.1539.
  • Emiliano Guevara. 2010. Modelling adjective-noun compositionality by regression. ESSLLI '10 Workshop on Compositionality and Distributional Semantic Models.
  • Karl Moritz Hermann and Phil Blunsom. 2013. The role of syntax in vector space models of compositional semantics. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Geoffrey E. Hinton, Nitish Srivastava, Alex Krizhevsky, Ilya Sutskever, and Ruslan Salakhutdinov. 2012. Improving neural networks by preventing co-adaptation of feature detectors. CoRR, abs/1207.0580.
  • Geoffrey E. Hinton. 1989. Connectionist learning procedures. Artificial Intelligence, 40(1-3):185–234.
  • Zhiheng Huang, Marcus Thint, and Zengchang Qin. 2008. Question classification using head words and their hypernyms. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP '08, pages 927–936, Stroudsburg, PA, USA. Association for Computational Linguistics.
  • Nal Kalchbrenner and Phil Blunsom. 2013a. Recurrent continuous translation models. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, Seattle, October. Association for Computational Linguistics.
  • Nal Kalchbrenner and Phil Blunsom. 2013b. Recurrent convolutional neural networks for discourse compositionality. In Proceedings of the Workshop on Continuous Vector Space Models and their Compositionality, Sofia, Bulgaria, August. Association for Computational Linguistics.
  • Dimitri Kartsaklis and Mehrnoosh Sadrzadeh. 2013. Prior disambiguation of word tensors for constructing sentence vectors. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing (EMNLP), Seattle, USA, October.
  • Andreas Küchler and Christoph Goller. 1996. Inductive learning in symbolic domains using structure-driven recurrent neural networks. In Günther Görz and Steffen Hölldobler, editors, KI, volume 1137 of Lecture Notes in Computer Science, pages 183–197. Springer.
  • Yann LeCun, Léon Bottou, Yoshua Bengio, and Patrick Haffner. 1998. Gradient-based learning applied to document recognition. Proceedings of the IEEE, 86(11):2278–2324, November.
  • Xin Li and Dan Roth. 2002. Learning question classifiers. In Proceedings of the 19th International Conference on Computational Linguistics, Volume 1, pages 1–7. Association for Computational Linguistics.
  • Tomas Mikolov and Geoffrey Zweig. 2012. Context dependent recurrent neural network language model. In SLT, pages 234–239.
  • Tomas Mikolov, Stefan Kombrink, Lukas Burget, Jan Cernocky, and Sanjeev Khudanpur. 2011. Extensions of recurrent neural network language model. In ICASSP, pages 5528–5531. IEEE.
  • Jeff Mitchell and Mirella Lapata. 2008. Vector-based models of semantic composition. In Proceedings of ACL, volume 8.
  • Jeff Mitchell and Mirella Lapata. 2010. Composition in distributional models of semantics. Cognitive Science, 34(8):1388–1429.
  • Jordan B. Pollack. 1990. Recursive distributed representations. Artificial Intelligence, 46:77–105.
  • Holger Schwenk. 2012. Continuous space translation models for phrase-based statistical machine translation. In COLING (Posters), pages 1071–1080.
  • João Silva, Luísa Coheur, Ana Cristina Mendes, and Andreas Wichert. 2011. From symbolic to sub-symbolic information in question classification. Artificial Intelligence Review, 35(2):137–154.
  • Richard Socher, Jeffrey Pennington, Eric H. Huang, Andrew Y. Ng, and Christopher D. Manning. 2011. Semi-supervised recursive autoencoders for predicting sentiment distributions. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP).
  • Richard Socher, Quoc V. Le, Christopher D. Manning, and Andrew Y. Ng. 2013a. Grounded compositional semantics for finding and describing images with sentences. In Transactions of the Association for Computational Linguistics (TACL).
  • Richard Socher, Alex Perelygin, Jean Wu, Jason Chuang, Christopher D. Manning, Andrew Y. Ng, and Christopher Potts. 2013b. Recursive deep models for semantic compositionality over a sentiment treebank. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1631–1642, Stroudsburg, PA, October. Association for Computational Linguistics.
  • Joseph Turian, Lev Ratinov, and Yoshua Bengio. 2010. Word representations: A simple and general method for semi-supervised learning. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, pages 384–394. Association for Computational Linguistics.
  • Peter Turney. 2012. Domain and function: A dual-space model of semantic relations and compositions. Journal of Artificial Intelligence Research (JAIR), 44:533–585.
  • Alexander Waibel, Toshiyuki Hanazawa, Geoffrey Hinton, Kiyohiro Shikano, and Kevin J. Lang. 1990. Phoneme recognition using time-delay neural networks. In Readings in Speech Recognition, pages 393–404. Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
  • Fabio Massimo Zanzotto, Ioannis Korkontzelos, Francesca Fallucchi, and Suresh Manandhar. 2010. Estimating linear models for compositional distributional semantics. In Proceedings of the 23rd International Conference on Computational Linguistics, pages 1263–1271. Association for Computational Linguistics.
  • Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI, pages 658–666. AUAI Press.