Catboost-based Framework with Additional User Information for Social Media Popularity Prediction

Proceedings of the 27th ACM International Conference on Multimedia (2019)

Abstract
In this paper, a Catboost-based framework is proposed to predict social media popularity. The framework consists of two components: feature representation and Catboost training. In the feature representation component, numerical features are used directly, while categorical features are converted into numerical features by the ordered target statistics method in Catboost. Furthermore, additional user information is tracked to enrich the feature space. In the training component, Catboost is adopted as the regression model and is trained on post-related, user-related, and additional user information. Moreover, to make full use of the dataset for model training, a dataset augmentation strategy based on pseudo labels is proposed. This strategy involves two-stage training. In the first stage, a first-stage model is trained and used to assign pseudo labels to the test set. In the second stage, a final model is trained on a new training set that includes the original validation set and the pseudo-labeled test set. The proposed method achieves 2nd place on the leaderboard of the Grand Challenge of Social Media Prediction.
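The conversion of categorical features via ordered target statistics can be illustrated with a minimal sketch. This is not the authors' code and does not call the Catboost library. It assumes the commonly described form of the statistic, in which each row is encoded from the targets of earlier rows in the same category, smoothed toward a prior. The function name `ordered_target_statistics`, the smoothing weight `a`, the use of the global target mean as the prior, and the toy data are all assumptions made for illustration. Catboost itself processes rows in random permutations, whereas this sketch uses the given row order.

```python
from collections import defaultdict


def ordered_target_statistics(categories, targets, prior, a=1.0):
    """Encode each categorical value using only the targets of earlier rows.

    Hypothetical helper: row i receives
        (sum of earlier targets in its category + a * prior) / (earlier count + a)
    so a row's own target never leaks into its encoding.
    """
    sums = defaultdict(float)   # running target sum per category
    counts = defaultdict(int)   # running row count per category
    encoded = []
    for cat, y in zip(categories, targets):
        encoded.append((sums[cat] + a * prior) / (counts[cat] + a))
        # Update the running statistics only after encoding the current row.
        sums[cat] += y
        counts[cat] += 1
    return encoded


# Toy data (assumed): a user-ID feature and a popularity score per post.
cats = ["u1", "u2", "u1", "u1", "u2"]
ys = [2.0, 4.0, 6.0, 1.0, 8.0]
prior = sum(ys) / len(ys)  # global target mean, used as the prior here
encoded = ordered_target_statistics(cats, ys, prior)
print(encoded)
```

The first row of each category falls back to the prior because no earlier rows exist for it. Later rows move toward the category's running mean as more targets accumulate.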
Keywords
catboost, categorical features, social media prediction