TrimCaching: Parameter-sharing Edge Caching for AI Model Downloading
CoRR (2024)
Abstract
Next-generation mobile networks are expected to facilitate fast AI model
downloading to end users. By caching models on edge servers, mobile networks
can deliver models to end users with low latency, resulting in a paradigm
called edge model caching. In this paper, we develop a novel model placement
scheme called parameter-sharing model caching (TrimCaching). TrimCaching
exploits the key observation that a wide range of AI models, such as
convolutional neural networks or large language models, can share a significant
proportion of parameter blocks containing reusable knowledge, thereby improving
storage efficiency. To this end, we formulate a parameter-sharing model
placement problem to maximize the cache hit ratio in multi-edge wireless
networks by balancing the fundamental tradeoff between storage efficiency and
service latency. We show that the formulated problem is a submodular
maximization problem with submodular constraints, for which no polynomial-time
approximation algorithm exists. To overcome this challenge, we study an
important special case, where a small fixed number of parameter blocks are
shared across models, which often holds in practice. In such a case, a
polynomial-time algorithm with a (1-ϵ)/2-approximation
guarantee is developed. Subsequently, we address the original problem for the
general case by developing a greedy algorithm. Simulation results demonstrate
that the proposed TrimCaching framework significantly improves the cache hit
ratio compared with state-of-the-art content caching schemes that do not
exploit shared parameters in AI models.