How Well Do Text Embedding Models Understand Syntax?
CoRR (2023)
Abstract
Text embedding models have significantly contributed to advancements in
natural language processing by adeptly capturing semantic properties of textual
data. However, the ability of these models to generalize across a wide range of
syntactic contexts remains under-explored. In this paper, we first develop an
evaluation set, named SR, to scrutinize the syntactic
understanding of text embedding models along two crucial aspects:
Structural heuristics and Relational understanding among concepts, as revealed
by the performance gaps in previous studies. Our findings reveal that existing
text embedding models have not sufficiently addressed these syntactic
understanding challenges, and such ineffectiveness becomes even more apparent
when evaluated against existing benchmark datasets. Furthermore, we conduct a
rigorous analysis to uncover the factors that lead to such limitations and examine
why previous evaluations fail to detect such ineffectiveness. Lastly, we
propose strategies to augment the generalization ability of text embedding
models in diverse syntactic scenarios. This study serves to highlight the
hurdles associated with syntactic generalization and provides pragmatic
guidance for boosting model performance across varied syntactic contexts.