Conversational Graph Grounded Policy Learning for Open-Domain Conversation Generation

ACL, pp. 1835-1845, 2020.

Keywords:
message encoder, multi-turn, knowledge aware conversation generation, dialog transition, response generation
Weibo:
In this paper we present a novel graph grounded policy learning framework for open-domain multi-turn conversation, which can effectively leverage prior information about dialog transitions to foster a more coherent and controllable dialog.

Abstract:

To address the challenge of policy learning in open-domain multi-turn conversation, we propose to represent prior information about dialog transitions as a graph and learn a graph grounded dialog policy, aimed at fostering a more coherent and controllable dialog. To this end, we first construct a conversational graph (CG) from dialog corpora…

Introduction
  • Response Message

    今天晚上要 通宵 加班 I have to work overnight tonight.

    辛苦了,好辛苦,注意身体 Take care of yourself when working so hard.

    还不能 打盹,领导也在 I can't take a nap yet, as the leaders are here.
  • 犯困/sleepy + 我以为你会 犯困 的, 这么晚了 I thought you'd be sleepy, as it's late.
  • To address the “one-to-many” semantic mapping problem in conversation generation, Chen et al. (2019) proposed an end-to-end multi-mapping model in which each responding mechanism models how to express response content.
  • To ensure that the given keyword appears in the generated response, the authors introduce a Seq2BF-based decoder (Mou et al., 2016) to replace the original RNN decoder.
  • This generator is trained on pairs whose input is [the message, a keyword extracted from the response] and whose output is the response.
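The Seq2BF decoding scheme described above can be sketched as follows. This is a toy illustration under my own assumptions: `backward_step` and `forward_step` are hypothetical stand-ins for the two decoder networks, and the fixed "decoders" at the bottom are made up for the example.

```python
def seq2bf_decode(keyword, backward_step, forward_step, max_len=10):
    """Toy sketch of Seq2BF-style backward-and-forward decoding:
    grow the words *before* the keyword (nearest first), then the
    words *after* it, so the keyword appears by construction.
    `backward_step`/`forward_step` are hypothetical stand-ins for the
    two decoder networks; each returns the next token, or None to stop."""
    before = []                                   # tokens left of the keyword, nearest first
    while len(before) < max_len:
        tok = backward_step([keyword] + before)
        if tok is None:
            break
        before.append(tok)
    after = []                                    # tokens right of the keyword
    while len(after) < max_len:
        tok = forward_step(before[::-1] + [keyword] + after)
        if tok is None:
            break
        after.append(tok)
    return before[::-1] + [keyword] + after

# Toy "decoders" keyed on context length; a real model would score a vocabulary.
bwd = lambda ctx: {1: "be", 2: "you'd", 3: "thought", 4: "I"}.get(len(ctx))
fwd = lambda ctx: {5: "tonight"}.get(len(ctx))
print(" ".join(seq2bf_decode("sleepy", bwd, fwd)))  # I thought you'd be sleepy tonight
```

Whatever the decoders emit, the keyword is inserted by construction, which is the point of the backward-and-forward scheme.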
Highlights
  • Response Message

    今天晚上要 通宵 加班 I have to work overnight tonight.

    辛苦了,好辛苦,注意身体 Take care of yourself when working so hard.

    还不能 打盹,领导也在 I can't take a nap yet, as the leaders are here.
  • We propose to represent prior information about dialog transitions as a graph, and to optimize the dialog policy based on that graph, to foster a more coherent dialog.
  • We propose a novel conversational graph (CG) grounded policy learning framework.
  • Our study shows that: (1) one-hop what-vertex neighbors of hit what-vertices provide locally-appropriate and diverse response content; (2) the conversational-graph-based rewards can supervise the policy model to promote a globally-coherent dialog; (3) the use of how-vertices in the conversational graph can improve response diversity; (4) the conversational graph can help our system succeed in the task of target-guided conversation, indicating that it gives us more control over the dialog policy.
  • In this paper we present a novel graph grounded policy learning framework for open-domain multi-turn conversation, which can effectively leverage prior information about dialog transitions to foster a more coherent and controllable dialog.
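Point (1) above, pruning the policy's action space to one-hop what-vertex neighbors of the vertices hit by the current message, can be sketched with a plain adjacency dict. The graph and keywords below are invented for illustration, not the paper's data:

```python
# Toy conversational graph (CG): what-vertex -> one-hop what-vertex neighbors.
cg = {
    "overtime": ["sleepy", "salary", "boss"],
    "sleepy":   ["coffee", "nap"],
    "boss":     ["meeting"],
}

def candidate_actions(hit_vertices, graph):
    """Collect the one-hop neighbors of the what-vertices hit by the
    current message; the policy then scores only these candidates
    instead of the full keyword vocabulary."""
    candidates = set()
    for v in hit_vertices:
        candidates.update(graph.get(v, []))
    return sorted(candidates)

print(candidate_actions(["overtime", "boss"], cg))
# ['boss', 'meeting', 'salary', 'sleepy']
```

Restricting scoring to graph neighbors is what keeps the selected keyword both locally appropriate (connected to the message) and varied (any neighbor, not just the highest-frequency keyword).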
Methods
  • The authors carefully select three SOTA methods that focus on dialog policy learning as baselines.

    LaRL is a latent-variable-driven dialog policy model (Zhao et al., 2019).
  • To evaluate the contribution of CG, the authors remove the CG from CG-Policy, denoted as CG-Policy-noCG, where the authors do not use graph structure information for action space pruning and reward design.
  • As shown in Table 4, the performance of CG-Policy-noCGact drops significantly in terms of Dist-2 as it tends to select high-frequency keywords like ChatMore, indicating the importance of graph paths to provide both locally-appropriate and diverse response keywords.
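Dist-2, the diversity metric referred to here, is the standard distinct-n-gram ratio. A minimal sketch (the example sentences are made up):

```python
def distinct_n(responses, n=2):
    """Dist-n: number of distinct n-grams divided by the total number of
    n-grams across all generated responses (higher = more diverse)."""
    ngrams = []
    for tokens in responses:
        ngrams += [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return len(set(ngrams)) / len(ngrams) if ngrams else 0.0

# A model that repeats generic bigrams scores lower:
print(distinct_n([["take", "care", "of", "yourself"], ["take", "care"]]))  # 0.75
```

A policy that keeps selecting the same high-frequency keyword produces many repeated n-grams across responses, which is exactly why the ablation shows a Dist-2 drop.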
Results
  • 4.6.1 Setting

    The authors ask three annotators to judge the quality of each dialog or utterance pair for each model.
  • TGRM performs the best in terms of Dist-2 and informativeness, indicating that retrieval-based models can produce more diverse responses than generation based models.
  • It is consistent with the conclusions in previous work (Chen et al, 2017; Zhang et al, 2018a).
  • TGRM performs the worst in terms of coherence, since it does not use an RL framework.
  • This indicates the importance of the RL framework for multi-turn dialog modeling.
Conclusion
  • In this paper the authors present a novel graph grounded policy learning framework for open-domain multi-turn conversation, which can effectively leverage prior information about dialog transitions to foster a more coherent and controllable dialog.
  • Experimental results demonstrate the effectiveness of this framework in terms of local appropriateness, global coherence, and dialog-target success rate.
  • The authors will investigate how to extend the CG to support hierarchical topic management in conversational systems.
Tables
  • Table1: The training procedure of CG-Policy
  • Table2: Results for dialogs with simulator on Weibo
  • Table3: Results for dialogs with human on Weibo
  • Table4: Ablation study for CG-Policy on Weibo
  • Table5: Results for target-guided dialogs on Persona
  • Table6: Training details for models
Related work
  • Policy learning for chitchat generation To address the degeneration issue of word-level policy models (Li et al., 2016b; Zhang et al., 2018b), previous works decouple policy learning from response generation, and then use utterance-level latent variables (Zhao et al., 2019) or keywords (Yao et al., 2018) as RL actions to guide response generation. In this work, we investigate how to use prior dialog-transition information to facilitate dialog policy learning.

    Knowledge aware conversation generation There is growing interest in leveraging knowledge bases to generate more informative responses (Dinan et al., 2019; Ghazvininejad et al., 2018; Moghe et al., 2018; Zhou et al., 2018; Liu et al., 2019; Bao et al., 2019; Xu et al., 2020). In this work, we employ a dialog-modeling-oriented graph built from dialog corpora, instead of an external knowledge base, in order to facilitate multi-turn policy learning rather than to improve dialog informativeness.

    Specifically, we are motivated by Xu et al. (2020). Their method suffers from a cross-domain transfer issue, since it relies on labor-intensive knowledge-graph-grounded multi-turn dialog datasets for model training. In comparison, our conversational graph is built automatically from dialog datasets, which introduces very low cost for training data construction. Furthermore, we decouple conversation modeling into two parts: “what to say” modeling and “how to say” modeling.
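The "what to say" / "how to say" decoupling can be sketched as a two-stage response step. Every component below (the picker functions, the mechanism name, the toy graph) is an illustrative placeholder, not one of the paper's actual modules:

```python
# Minimal sketch of decoupling "what to say" from "how to say".
def dialog_step(message_keywords, cg, pick_what, pick_how, generate):
    what = pick_what(message_keywords, cg)  # policy: pick a response keyword on the CG
    how = pick_how(what)                    # pick a responding mechanism (how-vertex)
    return generate(what, how)              # generator realizes both choices

cg = {"overtime": ["sleepy", "salary"]}                 # toy conversational graph
pick_what = lambda kws, g: g[kws[0]][0]                 # first neighbor of first hit vertex
pick_how = lambda w: "empathetic"                       # fixed mechanism for the sketch
generate = lambda w, h: f"[{h}] response built around '{w}'"
print(dialog_step(["overtime"], cg, pick_what, pick_how, generate))
```

The point of the split is that the content choice (keyword) and the expression choice (mechanism) can be trained and controlled separately.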
Funding
  • This work is supported by the National Key Research and Development Project of China (No. 2018AAA0101900) and the National Natural Science Foundation of China (NSFC) via grant 61976072.
Reference
  • Siqi Bao, Huang He, Fan Wang, Rongzhong Lian, and Hua Wu. 2019. Know more about each other: Evolving dialogue strategy via compound assessment. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5382–5391.
  • Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, pages 2787–2795.
  • Chaotao Chen, Jinhua Peng, Fan Wang, Jun Xu, and Hua Wu. 2019. Generating multiple diverse responses…
  • Marjan Ghazvininejad, Chris Brockett, Ming-Wei Chang, Bill Dolan, Jianfeng Gao, Wen-tau Yih, and Michel Galley. 2018. A knowledge-grounded neural conversation model. In Proceedings of AAAI 2018, pages 5110–5117.
  • He He, Derek Chen, Anusha Balakrishnan, and Percy Liang. 2018. Decoupling strategy and generation in negotiation dialogues. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2333–2343.
  • Mike Lewis, Denis Yarats, Yann Dauphin, Devi Parikh, and Dhruv Batra. 2017. Deal or no deal? End-to-end learning of negotiation dialogues. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2443–2453.
  • Jiwei Li, Michel Galley, Chris Brockett, Jianfeng Gao, and Bill Dolan. 2016a. A diversity-promoting objective function for neural conversation models. In Proceedings of the 2016 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pages 110–119.
  • Jiwei Li, Will Monroe, Alan Ritter, Michel Galley, Jianfeng Gao, and Dan Jurafsky. 2016b. Deep reinforcement learning for dialogue generation. In Proceedings of EMNLP, pages 1192–1202.
  • Chia-Wei Liu, Ryan Lowe, Iulian Serban, Mike Noseworthy, Laurent Charlin, and Joelle Pineau. 2016. How not to evaluate your dialogue system: An empirical study of unsupervised evaluation metrics for dialogue response generation. In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, pages 2122–2132.
  • Zhibin Liu, Zheng-Yu Niu, Hua Wu, and Haifeng Wang. 2019. Knowledge aware conversation generation with explainable reasoning over augmented graphs. In EMNLP-IJCNLP.
  • Ryan Lowe, Nissan Pow, Iulian Serban, and Joelle Pineau. 2015. The Ubuntu Dialogue Corpus: A large dataset for research in unstructured multi-turn dialogue systems. In Proceedings of the 16th Annual Meeting of the Special Interest Group on Discourse and Dialogue, pages 285–294.
  • Nikita Moghe, Siddhartha Arora, Suman Banerjee, and Mitesh M. Khapra. 2018. Towards exploiting background knowledge for building conversation systems. In Proceedings of EMNLP, pages 2322–2332.
  • Lili Mou, Yiping Song, Rui Yan, Ge Li, Lu Zhang, and Zhi Jin. 2016. Sequence to backward and forward sequences: A content-introducing approach to generative short-text conversation. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 3349–3358.
  • Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1532–1543.
  • Iulian Vlad Serban, Alessandro Sordoni, Yoshua Bengio, Aaron C. Courville, and Joelle Pineau. 2016. Building end-to-end dialogue systems using generative hierarchical neural network models. In Proceedings of AAAI, pages 3776–3784.
  • Lifeng Shang, Zhengdong Lu, and Hang Li. 2015. Neural responding machine for short-text conversation. In Proceedings of ACL-IJCNLP, volume 1, pages 1577–1586.
  • Richard S. Sutton and Andrew G. Barto. 2018. Reinforcement learning: An introduction. MIT Press.
  • Jianheng Tang, Tiancheng Zhao, Chenyan Xiong, Xiaodan Liang, Eric P. Xing, and Zhiting Hu. 2019. Target-guided open-domain conversation. In Proceedings of ACL.
  • Jun Xu, Haifeng Wang, Zhengyu Niu, Hua Wu, and Wanxiang Che. 2020. Knowledge graph grounded goal planning for open-domain conversation generation. In Thirty-Fourth AAAI Conference on Artificial Intelligence.
  • Tiancheng Zhao, Kaige Xie, and Maxine Eskenazi. 2019. Rethinking action spaces for reinforcement learning in end-to-end dialog agents with latent variable models. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pages 1208–1218.
  • Hao Zhou, Tom Young, Minlie Huang, Haizhou Zhao, Jingfang Xu, and Xiaoyan Zhu. 2018. Commonsense knowledge aware conversation generation with graph attention. In Proceedings of IJCAI-ECAI.