Towards Human-Like Machine Comprehension: Few-Shot Relational Learning in Visually-Rich Documents
arXiv (2024)
Abstract
Key-value relations are prevalent in Visually-Rich Documents (VRDs), often
depicted in distinct spatial regions accompanied by specific color and font
styles. These non-textual cues serve as important indicators that greatly
enhance human comprehension and acquisition of such relation triplets. However,
current document AI approaches often fail to consider this valuable prior
information related to visual and spatial features, resulting in suboptimal
performance, particularly when dealing with limited examples. To address this
limitation, our research focuses on few-shot relational learning, specifically
targeting the extraction of key-value relation triplets in VRDs. Given the
absence of a suitable dataset for this task, we introduce two new few-shot
benchmarks built upon existing supervised benchmark datasets. Furthermore, we
propose a variational approach that incorporates relational 2D-spatial priors
and prototypical rectification techniques. This approach aims to generate
relation representations that are more aware of spatial context and unseen
relations, in a manner similar to human perception. Experimental results
demonstrate the effectiveness of the proposed method, which outperforms
existing approaches. This study also opens up new possibilities for practical
applications.
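The abstract mentions prototypical rectification for few-shot relation extraction. As a rough illustration of the underlying prototypical idea only (the paper's actual model additionally uses variational inference and 2D-spatial priors, which this sketch omits), a minimal prototypical-network episode computes one prototype per relation class from the support set and assigns each query to its nearest prototype. All function names and shapes here are hypothetical, not taken from the paper:

```python
import numpy as np

def prototypes(support_emb, support_labels, n_classes):
    """Mean support embedding per relation class (the 'prototype').

    support_emb: (n_support, dim) array of relation-pair embeddings.
    support_labels: (n_support,) integer class labels in [0, n_classes).
    """
    return np.stack([support_emb[support_labels == c].mean(axis=0)
                     for c in range(n_classes)])

def classify(query_emb, protos):
    """Assign each query to the nearest prototype (Euclidean distance)."""
    d = np.linalg.norm(query_emb[:, None, :] - protos[None, :, :], axis=-1)
    return d.argmin(axis=1)

# Toy 2-way, 2-shot episode with 2-dimensional embeddings.
support_emb = np.array([[0.0, 0.0], [0.1, 0.0],   # class 0
                        [5.0, 5.0], [5.0, 5.1]])  # class 1
support_labels = np.array([0, 0, 1, 1])
protos = prototypes(support_emb, support_labels, n_classes=2)
preds = classify(np.array([[0.05, 0.05], [4.9, 5.0]]), protos)
```

In the paper's setting, the embeddings would come from a VRD encoder that also sees layout and style cues, and the prototypes would be rectified by the proposed variational spatial priors rather than used raw.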