Short Text Clustering Enhanced by Semantic Matching Model

International Conference on Information Systems (2019)

Abstract
With the popularity of social networks, short text clustering has become an increasingly important and widely used task. It is challenging because short texts from social networks contain irregular words, substantial noise, and sparse features. We propose Short Text Clustering enhanced by Semantic Matching Model (STCSMM). STCSMM transfers knowledge from a labeled text similarity dataset to short text clustering through a semantic matching model, thereby improving clustering quality. First, we train a semantic matching network on the text similarity dataset; the network contains a feature extraction layer and a vector distance calculation layer. Then, we use the learned feature extraction layer to extract short text features, and use the vector distance calculation layer to replace the distance metrics commonly used in traditional K-means, such as cosine distance and Euclidean distance. Finally, K-means is run on the extracted text features with the vector distance calculation layer as its distance measure. This improved K-means clustering (STCSMM) performs better on a microblog text clustering dataset than some existing methods, such as K-means clustering with LDA, LSI, or average word embedding feature vectors.
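The clustering step described above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the feature matrix `X` is assumed to have already been produced by the feature extraction layer, `placeholder_distance` is a hypothetical weighted L1 distance standing in for the trained vector distance calculation layer, and the mean-based centroid update is an assumption carried over from standard K-means, since the abstract does not specify it.

```python
import numpy as np

def placeholder_distance(X, C, w):
    """Hypothetical stand-in for the trained distance layer.

    X: (n, d) text features, C: (k, d) centroids, w: (d,) non-negative weights.
    Returns an (n, k) matrix of weighted L1 distances.
    """
    diff = np.abs(X[:, None, :] - C[None, :, :])  # shape (n, k, d)
    return diff @ w                               # shape (n, k)

def kmeans_with_distance(X, k, distance, n_iter=50, seed=0):
    """K-means whose assignment step uses a pluggable distance function."""
    rng = np.random.default_rng(seed)
    # Initialise centroids with k distinct data points.
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    labels = np.full(len(X), -1)
    for _ in range(n_iter):
        # Assignment: nearest centroid under the supplied distance.
        new_labels = distance(X, centroids).argmin(axis=1)
        if np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Update: mean of the members (assumed, as in standard K-means).
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return labels, centroids

# Toy usage: two well-separated groups of synthetic "text features".
data_rng = np.random.default_rng(1)
X = np.vstack([data_rng.normal(0.0, 0.1, (20, 4)),
               data_rng.normal(5.0, 0.1, (20, 4))])
w = np.ones(4)  # placeholder weights; a trained layer would supply its own
labels, centroids = kmeans_with_distance(
    X, 2, lambda A, B: placeholder_distance(A, B, w))
```

Passing the distance as a callable keeps the K-means loop unchanged when the placeholder is swapped for a learned model that scores feature pairs.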
Keywords
short text clustering, semantic matching model, K-means, STCSMM