Approximate Caching for Efficiently Serving Diffusion Models
CoRR (2023)
Abstract
Text-to-image generation using diffusion models has seen explosive popularity
owing to their ability to produce high-quality images that adhere to text
prompts. However, production-grade diffusion model serving is a
resource-intensive task that not only requires expensive high-end GPUs but
also incurs considerable latency. In this paper, we introduce a technique
called approximate caching that reduces the number of iterative denoising
steps needed to generate an image from a prompt by reusing intermediate noise
states created during prior image generations for similar prompts. Based on
this idea, we present an end-to-end text-to-image system, Nirvana, that uses
approximate caching with a novel cache-management policy, Least Computationally
Beneficial and Frequently Used (LCBFU), to provide % GPU compute savings, 19.8%
end-to-end latency reduction, and 19% dollar savings, on average, on two real
production workloads. We further present an extensive characterization of real
production text-to-image prompts from the perspective of caching, popularity,
and reuse of intermediate states in a large production environment.
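The abstract's core idea — retrieving a cached intermediate noise state for a similar prompt and scoring cache entries by compute saved times reuse frequency — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the class and method names, the cosine-similarity lookup, the similarity threshold, and the exact LCBFU scoring formula (steps saved × hit count) are all assumptions made for clarity.

```python
import math


def cosine(a, b):
    """Cosine similarity between two prompt-embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class ApproximateCache:
    """Hypothetical sketch of approximate caching with an LCBFU-style
    eviction policy. The scoring formula below (steps_saved * hits) is an
    assumption that captures 'least computationally beneficial and
    frequently used', not the paper's exact algorithm."""

    def __init__(self, capacity):
        self.capacity = capacity
        # key -> (prompt_embedding, intermediate_state, steps_saved, hits)
        self.entries = {}

    def lookup(self, prompt_embedding, threshold=0.9):
        """Return (state, steps_saved) for the most similar cached prompt
        if its similarity clears the threshold; otherwise (None, 0)."""
        best_key, best_sim = None, -1.0
        for key, (emb, _state, _saved, _hits) in self.entries.items():
            sim = cosine(prompt_embedding, emb)
            if sim > best_sim:
                best_key, best_sim = key, sim
        if best_key is not None and best_sim >= threshold:
            emb, state, saved, hits = self.entries[best_key]
            self.entries[best_key] = (emb, state, saved, hits + 1)
            return state, saved  # resume denoising from this state
        return None, 0  # cache miss: generate from pure noise

    def insert(self, key, emb, state, steps_saved):
        """Insert a new intermediate state, evicting the entry with the
        lowest (compute saved per reuse) * (reuse count) score if full."""
        if len(self.entries) >= self.capacity:
            victim = min(
                self.entries,
                key=lambda k: self.entries[k][2] * self.entries[k][3],
            )
            del self.entries[victim]
        self.entries[key] = (emb, state, steps_saved, 1)
```

On a hit, the serving system would skip the first `steps_saved` denoising iterations by starting from the cached intermediate state, which is where the GPU-compute and latency savings come from.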