Retrieval Fine-Tuning for In-Context Tabular Models
CoRR (2024)
Abstract
Tabular data is a pervasive modality spanning a wide range of domains, and
its inherent diversity poses a considerable challenge for deep learning. Recent
advancements using transformer-based in-context learning have shown promise on
smaller and less complex datasets, but have struggled to scale to larger and
more complex ones. To address this limitation, we propose a combination of
retrieval and fine-tuning: we can adapt the transformer to a local subset of
the data by collecting nearest neighbours, and then perform task-specific
fine-tuning with this retrieved set of neighbours in context. Using TabPFN as
the base model – currently the best tabular in-context learner – and applying
our retrieval and fine-tuning scheme on top results in what we call a
locally-calibrated PFN, or LoCalPFN. We conduct extensive evaluation on 95
datasets curated by TabZilla from OpenML, upon which we establish a new
state-of-the-art with LoCalPFN – even with respect to tuned tree-based models.
Notably, we show a significant boost in performance compared to the base
in-context model, demonstrating the efficacy of our approach and advancing the
frontier of deep learning in tabular data.
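The retrieval step described above can be sketched as follows. This is a minimal illustration, not the paper's implementation: it collects the k nearest training rows for a query point under Euclidean distance (an assumed, illustrative metric), yielding the local (features, labels) set that would be placed in context for a transformer such as TabPFN. The function name and toy data are hypothetical.

```python
import numpy as np

def retrieve_context(X_train, y_train, x_query, k=5):
    """Collect the k nearest training neighbours of a query point.

    Sketch of the retrieval step: the returned (features, labels)
    pairs would form the in-context set for the base model.
    Euclidean distance is an illustrative choice only.
    """
    dists = np.linalg.norm(X_train - x_query, axis=1)
    idx = np.argsort(dists)[:k]          # indices of the k closest rows
    return X_train[idx], y_train[idx]

# Toy usage: 1-D features, binary labels.
X = np.array([[0.0], [1.0], [2.0], [10.0]])
y = np.array([0, 0, 1, 1])
ctx_X, ctx_y = retrieve_context(X, y, np.array([1.2]), k=2)
```

In the full scheme, task-specific fine-tuning would then be performed with such retrieved neighbour sets in context, adapting the model locally around each query.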