STEP: Generating Semantic Text Embeddings with Prompt

Wenqiang Cao, Qing Li, Siying Zhang, Rixin Xu, Youqi Li

International Conference on Advanced Cloud and Big Data (2023)

Abstract
In recent years, semantic text embeddings have played a growing role in natural language processing (NLP) and have shown great potential in real-world applications such as search and recommendation systems. Models for generating semantic text embeddings have therefore received extensive study. State-of-the-art solutions have evolved from traditional methods (e.g., Word2Vec and GloVe) to deep neural network based approaches (e.g., LSTMs, Transformers, and pre-trained models such as BERT and RoBERTa), and frameworks like Sentence Transformers have lowered the bar for training semantic text representation models with customized models and datasets. In this paper, we investigate several well-trained models drawn from the Massive Text Embedding Benchmark (MTEB) on the Hugging Face website. Inspired by the extensive use of prompt engineering with large language models such as Llama and GPT-3, we propose STEP, a novel method that uses prompts to improve the performance of text embeddings on downstream tasks, making it applicable to almost any pre-trained language model for text embeddings. Moreover, STEP does not require modifying the base model's structure. In our experiments, we applied STEP to five pre-trained models chosen from MTEB and trained and evaluated our approach on two separate datasets; the results indicate that our approach improves performance on tasks related to semantic textual similarity.
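To make the core idea concrete, the sketch below shows one plausible reading of prompt-augmented embeddings: a fixed task prompt is prepended to each input before encoding with an unmodified pre-trained sentence encoder. The prompt text, the model name, and the helper function are illustrative assumptions, not the authors' released code or their exact training procedure.

```python
# Minimal sketch of prompt-augmented text embeddings (assumed reading of STEP).
# NOTE: the prompt wording, model choice, and helper name are hypothetical.
from sentence_transformers import SentenceTransformer, util

# Any MTEB-listed pre-trained encoder could serve as the frozen base model.
model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative choice

TASK_PROMPT = "Represent this sentence for semantic similarity: "  # assumed prompt

def embed_with_prompt(texts):
    """Prepend a task prompt to each text before encoding;
    the base model's structure is left unmodified."""
    return model.encode([TASK_PROMPT + t for t in texts],
                        normalize_embeddings=True)

emb = embed_with_prompt(["A man is playing a guitar.",
                         "Someone performs music on a guitar."])
print(util.cos_sim(emb[0], emb[1]))  # cosine similarity of the two texts
```

Because the prompt is applied purely at the input level, the same wrapper can be dropped in front of any of the five MTEB models the paper evaluates without touching their weights or architecture.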
Keywords
embedding, prompt, semantic, NLP