InstructEval: Systematic Evaluation of Instruction Selection Methods

CoRR (2023)

Abstract
In-context learning (ICL) performs tasks by prompting a large language model (LLM) with an instruction and a small set of annotated examples called demonstrations. Recent work has shown that the precise details of the inputs used in the prompt significantly impact ICL performance, which has incentivized instruction selection algorithms. The effect of instruction choice, however, remains severely underexplored: existing analyses are restricted to shallow subsets of models and tasks, which limits the generalizability of their insights. We develop an ICL evaluation suite to conduct a thorough assessment of these techniques. The suite includes 13 open-source LLMs of varying scales from 4 distinct model families and covers 9 different tasks, representing a range of task types across 3 categories. Using this benchmark, we evaluate the relative performance of 7 popular instruction selection methods against five desiderata relevant to ICL. We discover that curated manually written instructions, as well as simple instructions without any task-specific descriptions, often elicit better ICL performance than automatic instruction-induction methods, pointing to a lack of generalizability among the latter. We release our evaluation suite for benchmarking instruction selection approaches and call for more rigorous and generalizable methods in this space.
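For readers unfamiliar with the setup, the sketch below illustrates how an ICL prompt combines an instruction with demonstrations before the test query. The `build_icl_prompt` helper, the instruction strings, and the Input/Output format are illustrative assumptions for exposition, not code or prompts from the paper's evaluation suite.

```python
# Minimal sketch of in-context learning (ICL) prompt construction:
# an instruction, followed by annotated demonstrations, followed by
# the test query whose label the model is asked to complete.
# All strings and formatting here are hypothetical examples.

def build_icl_prompt(instruction: str,
                     demonstrations: list[tuple[str, str]],
                     query: str) -> str:
    """Concatenate an instruction, labeled demonstrations, and the test query."""
    parts = [instruction.strip(), ""]
    for x, y in demonstrations:
        parts.append(f"Input: {x}\nOutput: {y}\n")
    parts.append(f"Input: {query}\nOutput:")  # the LLM completes the label
    return "\n".join(parts)

# Two candidate instructions for the same task: a curated manually written
# one and a simple task-agnostic one. The paper reports that instructions
# like these often outperform automatically induced instructions.
manual = "Classify the sentiment of each movie review as positive or negative."
simple = "Complete the following pattern."

demos = [("A gripping, beautifully shot film.", "positive"),
         ("Two hours of my life I will never get back.", "negative")]

print(build_icl_prompt(manual, demos, "An utterly forgettable plot."))
```

An instruction selection method, in this framing, chooses which instruction string to place at the top of the prompt; the paper's benchmark compares 7 such methods across models, tasks, and desiderata.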
Keywords
instruction, systematic evaluation, methods, selection