TIS-DPO: Token-level Importance Sampling for Direct Preference Optimization with Estimated Weights

Aiwei Liu, Haoping Bai, Zhiyun Lu, Yanchao Sun, Xiang Kong, Simon Wang, Jiulong Shan, Albin Madappally Jose, Xiaojiang Liu, Lijie Wen, Philip S. Yu, Meng Cao


