A Study of Methods for the Generation of Domain-Aware Word Embeddings

SIGIR '20: The 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, China, July 2020

Citations: 4 | Views: 141
Abstract
Word embeddings are essential components of many text data applications. In most work, "out-of-the-box" embeddings trained on general text corpora are used, but they can be less effective when applied to domain-specific settings. How to create "domain-aware" word embeddings is therefore an interesting open research question. In this paper, we study three methods for creating domain-aware word embeddings from both general and domain-specific text corpora: concatenation of embedding vectors, weighted fusion of text data, and interpolation of aligned embedding vectors. Although the investigated strategies are tailored for domain-specific tasks, they are general enough to be applied to any domain and are not specific to a single task. Experimental results show that all three methods can work well; however, the interpolation method consistently works best.
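The abstract describes the three strategies only at a high level. As a rough illustration of two of them (not the paper's actual implementation), the following minimal numpy sketch assumes two hypothetical embedding matrices over a shared vocabulary; the weighted-fusion strategy operates on the training corpora themselves and is therefore only noted in a comment.

```python
import numpy as np

# Hypothetical toy embeddings: rows index a shared vocabulary,
# columns are embedding dimensions (general vs. domain-specific corpus).
general = np.random.rand(5, 4)   # assumed: trained on a general corpus
domain = np.random.rand(5, 4)    # assumed: trained on a domain corpus

# Strategy 1: concatenation of embedding vectors.
# Each word is represented by its general and domain vectors side by side.
concatenated = np.concatenate([general, domain], axis=1)  # shape (5, 8)

# Strategy 2 (weighted fusion of text data) would instead mix the two
# corpora, e.g. by duplicating or sampling domain text, before training
# a single embedding model, so it is not shown here.

# Strategy 3: interpolation of aligned embedding vectors.
# Assumes the two spaces have already been aligned (e.g., via an
# orthogonal Procrustes mapping); the aligned vectors are then mixed
# with a weight alpha.
alpha = 0.5
interpolated = alpha * general + (1 - alpha) * domain     # shape (5, 4)

print(concatenated.shape, interpolated.shape)
```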
Keywords
domain adaptation, text representation, empirical study