A Declarative System for Optimizing AI Workloads
CoRR (2024)
Abstract
Modern AI models provide the key to a long-standing dream: processing
analytical queries about almost any kind of data. Until recently, it was
difficult and expensive to extract facts from company documents, data from
scientific papers, or insights from image and video corpora. Today's models can
accomplish these tasks with high accuracy. However, a programmer who wants to
answer a substantive AI-powered query must orchestrate large numbers of models,
prompts, and data operations. For even a single query, the programmer has to
make a vast number of decisions such as the choice of model, the right
inference method, the most cost-effective inference hardware, the ideal prompt
design, and so on. The optimal set of decisions can change as the query changes
and as the rapidly evolving technical landscape shifts. In this paper, we
present Palimpzest, a system that enables anyone to process AI-powered
analytical queries simply by defining them in a declarative language. The
system uses its cost optimization framework – which explores the search space
of AI models, prompting techniques, and related foundation model optimizations
– to implement the query with the best trade-offs between runtime, financial
cost, and output data quality. We describe the workload of AI-powered analytics
tasks, the optimization methods that Palimpzest uses, and the prototype system
itself. We evaluate Palimpzest on tasks in Legal Discovery, Real Estate Search,
and Medical Schema Matching. We show that even our simple prototype offers a
range of appealing plans, including one that is 3.3x faster, 2.9x cheaper, and
offers better data quality than the baseline method. With parallelism enabled,
Palimpzest can produce plans with up to a 90.3x speedup at 9.1x lower cost
relative to a single-threaded GPT-4 baseline, while obtaining an F1-score
within 83.5