IISAN: Efficiently Adapting Multimodal Representation for Sequential Recommendation with Decoupled PEFT
arXiv (2024)
Abstract
Multimodal foundation models are transformative in sequential recommender
systems, leveraging powerful representation learning capabilities. While
Parameter-efficient Fine-tuning (PEFT) is commonly used to adapt foundation
models for recommendation tasks, most research prioritizes parameter
efficiency, often overlooking critical factors like GPU memory efficiency and
training speed. Addressing this gap, our paper introduces IISAN (Intra- and
Inter-modal Side Adapted Network for Multimodal Representation), a simple
plug-and-play architecture using a Decoupled PEFT structure and exploiting both
intra- and inter-modal adaptation.
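The sketch below illustrates the decoupled-PEFT idea in PyTorch: both pretrained towers stay frozen while a small trainable side network consumes their detached layer-wise hidden states, so no gradients (and no cached activations) flow through the backbones. The module names, bottleneck size, and gated fusion here are illustrative assumptions, not the paper's exact architecture.

```python
# Minimal sketch of a decoupled side-adapted network. Assumed details:
# bottleneck size, zero-initialized fusion gate, and the specific fusion rule.
import torch
import torch.nn as nn

class SideAdapterBlock(nn.Module):
    """Bottleneck block combining intra- and inter-modal adaptation."""
    def __init__(self, hidden: int, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(hidden, bottleneck)
        self.up = nn.Linear(bottleneck, hidden)
        self.gate = nn.Parameter(torch.zeros(1))  # learned inter-modal fusion gate

    def forward(self, side, text_h, img_h):
        g = torch.sigmoid(self.gate)
        # Inter-modal adaptation: gated mix of the two towers' hidden states.
        fused = g * text_h + (1 - g) * img_h
        # Intra-modal adaptation: bottleneck transform added to the side stream.
        return side + self.up(torch.relu(self.down(fused)))

class DecoupledSideNetwork(nn.Module):
    """Trainable side network; the text/image backbones are frozen outside it."""
    def __init__(self, num_layers: int, hidden: int):
        super().__init__()
        self.blocks = nn.ModuleList(
            SideAdapterBlock(hidden) for _ in range(num_layers))

    def forward(self, text_states, img_states):
        # text_states / img_states: lists of per-layer hidden states from the
        # frozen backbones. Detaching them decouples the side network, so
        # backpropagation never touches the backbone towers.
        side = torch.zeros_like(text_states[0])
        for block, t_h, i_h in zip(self.blocks, text_states, img_states):
            side = block(side, t_h.detach(), i_h.detach())
        return side  # adapted multimodal item representation

# Usage with hypothetical shapes: 12 backbone layers, hidden size 768.
net = DecoupledSideNetwork(num_layers=12, hidden=768)
text_states = [torch.randn(2, 10, 768) for _ in range(12)]
img_states = [torch.randn(2, 10, 768) for _ in range(12)]
out = net(text_states, img_states)  # -> (2, 10, 768)
```

Because the side network sits beside the backbones rather than inside them (as embedded adapters or LoRA do), the frozen towers need no activation caching for backpropagation, which is where the memory and training-time savings come from.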
IISAN matches the performance of full fine-tuning (FFT) and state-of-the-art
PEFT. More importantly, it significantly reduces GPU memory usage, from 47 GB
to just 3 GB, for multimodal sequential recommendation tasks. It also cuts
training time per epoch from 443 s to 22 s compared with FFT. This is a
notable improvement over Adapter and LoRA as well, which require 37-39 GB of
GPU memory and 350-380 seconds per epoch for training.
Furthermore, we propose a new composite efficiency metric, TPME
(Training-time, Parameter, and GPU Memory Efficiency), to dispel the
prevalent misconception that "parameter efficiency represents overall
efficiency". TPME provides more comprehensive insight for practical
efficiency comparisons between different methods.
In addition, we give an accessible efficiency analysis of all PEFT and FFT
approaches, which demonstrates the superiority of IISAN. We release our code
and other materials at https://github.com/GAIR-Lab/IISAN.