Neural Compatibility Modeling with Attentive Knowledge Distillation

SIGIR, pp. 5-14, 2018.

Keywords:
real world, convolutional neural networks, rectified linear unit, stochastic gradient descent, matching rule
Weibo:
We present an attentive knowledge distillation scheme towards compatibility modeling in the context of clothing matching, which jointly learns from both the specific data samples and general knowledge rules

Abstract:

Recently, the booming fashion sector and its huge potential benefits have attracted tremendous attention from many research communities. In particular, increasing research efforts have been dedicated to complementary clothing matching, as matching clothes to make a suitable outfit has become a daily headache for many people, especially...

Introduction
  • According to Goldman Sachs, the 2016 online retail market in China for fashion products, including apparel, footwear, and accessories, reached 187.5 billion US dollars, which demonstrates people’s great demand for clothing.
  • Recent years have witnessed the proliferation of online fashion communities, such as Polyvore and Chictopia, where a great number of outfits composed by fashion experts have been made publicly available, as shown in Figure 1.
  • Based on such rich real-world data, several researchers have attempted to intelligently aid people in clothing matching.
Highlights
  • To address the aforementioned challenges, we present a compatibility modeling scheme with attentive knowledge distillation, dubbed AKD-DBPR, as shown in Figure 3, which is able to learn from both specific data samples and general domain knowledge
  • By checking the context of each example, we found that they both activate certain matching rules, such as “floral + floral”, “coat + dress” and “white + black”, which may contribute to the good performance of AKD-DBPR
  • Extensive experiments on a real-world dataset yield encouraging empirical results that demonstrate the effectiveness of the proposed scheme and indicate the benefits of taking domain knowledge into consideration in compatibility modeling
  • We find that the negative matching rules and category related rules seem to be more powerful than others
Results
  • The authors empirically found that the proposed model achieves its best performance with K = 1 hidden layer of 1024 hidden units.
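
To make the reported setting concrete, the following is a minimal sketch of a one-hidden-layer encoder (K = 1, 1024 units) scored with a BPR-style pairwise loss, which the DBPR baseline name and the cited BPR reference suggest. Everything beyond the layer count and width is an assumption made for illustration: the input size `INPUT_DIM`, the sigmoid activation, the separate encoders for tops and bottoms, the inner-product score, and all function names are hypothetical and are not taken from the paper.

```python
import math
import random

HIDDEN_UNITS = 1024  # reported best setting: K = 1 hidden layer of 1024 units
INPUT_DIM = 16       # hypothetical feature size, for illustration only


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def init_layer(n_in, n_out, rng):
    """Return (weights, biases) for one dense layer with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def encode(features, layer):
    """Map an item feature vector to its latent vector (single hidden layer)."""
    weights, biases = layer
    return [sigmoid(sum(w * x for w, x in zip(row, features)) + b)
            for row, b in zip(weights, biases)]


def compatibility(top_latent, bottom_latent):
    """Assumed scoring function: inner product of the two latent vectors."""
    return sum(t * b for t, b in zip(top_latent, bottom_latent))


def bpr_loss(score_pos, score_neg):
    """Pairwise ranking loss -ln(sigmoid(score_pos - score_neg))."""
    diff = score_pos - score_neg
    # Numerically stable form of ln(1 + exp(-diff)).
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


rng = random.Random(0)
top_encoder = init_layer(INPUT_DIM, HIDDEN_UNITS, rng)     # assumed: one encoder per item type
bottom_encoder = init_layer(INPUT_DIM, HIDDEN_UNITS, rng)

# Random stand-ins for a top, a matching bottom, and a non-matching bottom.
top = [rng.random() for _ in range(INPUT_DIM)]
bottom_pos = [rng.random() for _ in range(INPUT_DIM)]
bottom_neg = [rng.random() for _ in range(INPUT_DIM)]

top_latent = encode(top, top_encoder)
m_pos = compatibility(top_latent, encode(bottom_pos, bottom_encoder))
m_neg = compatibility(top_latent, encode(bottom_neg, bottom_encoder))
loss = bpr_loss(m_pos, m_neg)
```

The loss shrinks as the matching bottom is scored further above the non-matching one, which is the ranking behavior a pairwise objective of this kind rewards.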
Conclusion
  • The authors present an attentive knowledge distillation scheme for compatibility modeling in the context of clothing matching, which jointly learns from both specific data samples and general knowledge rules.
  • Because different rules can have different confidence levels for different samples, the authors integrate the attention mechanism into the knowledge distillation framework to assign the rule confidence attentively.
  • The authors plan to explore the potential of visual signals in rule identification.
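
The idea of assigning rule confidence per sample can be sketched as follows. This is only an illustration under stated assumptions, not the paper's formulation: the softmax over activated rules, the teacher reweighting (modeled on the rule-distillation form of Hu et al., "Harnessing Deep Neural Networks with Logic Rules", listed in the references), the `strength` constant, and every function and parameter name are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a non-empty list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def rule_confidences(relevance_scores, activated):
    """Attention weights over rules; rules the sample does not activate get 0."""
    active = [i for i, on in enumerate(activated) if on]
    weights = [0.0] * len(relevance_scores)
    if not active:
        return weights
    for i, w in zip(active, softmax([relevance_scores[i] for i in active])):
        weights[i] = w
    return weights


def teacher_distribution(student_probs, rule_values, confidences, strength=1.0):
    """Reweight student probabilities by how well each outcome satisfies the rules.

    q(y) is proportional to p(y) * exp(-strength * sum_l a_l * (1 - r_l(y))),
    where rule_values[l][y] in [0, 1] is the truth value of rule l for outcome y
    and a_l is the attention-assigned confidence of rule l.
    """
    unnormalized = []
    for y, p in enumerate(student_probs):
        penalty = sum(a * (1.0 - r[y]) for a, r in zip(confidences, rule_values))
        unnormalized.append(p * math.exp(-strength * penalty))
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


# Two outcomes: index 0 = "bottom j is preferred", index 1 = "bottom k is preferred".
student = [0.5, 0.5]
rules = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]   # hypothetical rule truth values
conf = rule_confidences([2.0, 0.5, 1.0], [True, False, True])
teacher = teacher_distribution(student, rules, conf)
```

In this toy case the two activated rules both favor outcome 0, so the teacher distribution moves probability mass toward it, while the rule that the sample does not activate has no influence.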
Tables
  • Table 1: Value examples of each attribute
  • Table 2: Performance comparison among different approaches in terms of AUC
  • Table 3: Effects of the rule guidance. The first row refers to the performance of the baseline DBPR
  • Table 4: Effects of the attention mechanism
Related work
  • 2.1 Fashion Analyses

    Recently, the huge potential benefits of the fashion industry have attracted attention from researchers in communities ranging from computer vision to multimedia. Existing efforts mainly focus on clothing retrieval [16, 26, 27], fashionability prediction [24] and compatibility modeling [11, 36]. For example, Liu et al. [26] presented a latent Support Vector Machine [9] model for both occasion-oriented outfit and item recommendation based on a manually annotated dataset of wild street photos. Because manually annotated datasets are infeasible to build at scale, several pioneering researchers have resorted to other sources, where real-world data can be harvested automatically. For example, Hu et al. [17] investigated the problem of personalized outfit recommendation with a dataset collected from Polyvore. McAuley et al. [29] proposed a general framework to model the human visual preference for a given pair of objects based on the Amazon real-world co-purchase dataset. In particular, they extracted visual features with convolutional neural networks (CNNs) and introduced a similarity metric to model the human notion of complementary objects. Similarly, He et al. [12] introduced a scalable matrix factorization approach that incorporates visual features of product
Funding
  • This work is supported by the National Basic Research Program of China (973 Program), No.: 2015CB352502; the National Natural Science Foundation of China, No.: 61772310, 61702300, and 61702302; the Fundamental Research Funds of Shandong University, No.: 2018HW010; and the Project of Thousand Youth Talents 2016.
Reference
  • Dzmitry Bahdanau, Kyunghyun Cho, and Yoshua Bengio. 2014. Neural Machine Translation by Jointly Learning to Align and Translate. CoRR abs/1409.0473 (2014).
  • Léon Bottou. 1991. Stochastic gradient learning in neural networks. Proceedings of Neuro-Nîmes 91, 8 (1991).
  • Da Cao, Xiangnan He, Lianhai Miao, Yahui An, Chao Yang, and Richang Hong. 2018. Attentive Group Recommendation. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM.
  • Da Cao, Liqiang Nie, Xiangnan He, Xiaochi Wei, Shunzhi Zhu, and Tat-Seng Chua. 2017. Embedding Factorization Models for Jointly Recommending Items and User Generated Lists. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval. ACM, 585–594.
  • Jingyuan Chen, Xuemeng Song, Liqiang Nie, Xiang Wang, Hanwang Zhang, and Tat-Seng Chua. 2016. Micro tells macro: predicting the popularity of micro-videos via a transductive model. In Proceedings of the ACM International Conference on Multimedia. ACM, 898–907.
  • Jingyuan Chen, Hanwang Zhang, Xiangnan He, Liqiang Nie, Wei Liu, and Tat-Seng Chua. 2017. Attentive Collaborative Filtering: Multimedia Recommendation with Item- and Component-Level Attention. In Proceedings of the International ACM SIGIR Conference. 335–344.
  • Zhiyong Cheng, Ying Ding, Lei Zhu, and Mohan S. Kankanhalli. 2018. Aspect-Aware Latent Factor Model: Rating Prediction with Ratings and Reviews. In Proceedings of the ACM International WWW Conference. 639–648.
  • Zhiyong Cheng, Jialie Shen, Lei Zhu, Mohan S. Kankanhalli, and Liqiang Nie. 2017. Exploiting Music Play Sequence for Music Recommendation. In Proceedings of the International Joint Conference on Artificial Intelligence. 3654–3660.
  • Pedro Felzenszwalb, David McAllester, and Deva Ramanan. 2008. A discriminatively trained, multiscale, deformable part model. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. IEEE, 1–8.
  • Fuli Feng, Xiangnan He, Yiqun Liu, Liqiang Nie, and Tat-Seng Chua. 2018. Learning on Partial-Order Hypergraphs. In Proceedings of the ACM International WWW Conference. 1523–1532.
  • Xintong Han, Zuxuan Wu, Yu-Gang Jiang, and Larry S. Davis. 2017. Learning Fashion Compatibility with Bidirectional LSTMs. In Proceedings of the ACM International Conference on Multimedia. 1078–1086.
  • Ruining He and Julian McAuley. 2016. VBPR: Visual Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the AAAI Conference. AAAI Press, 144–150.
  • Xiangnan He, Lizi Liao, Hanwang Zhang, Liqiang Nie, Xia Hu, and Tat-Seng Chua. 2017. Neural Collaborative Filtering. In Proceedings of the ACM International WWW Conference. ACM, 173–182.
  • Xiangnan He, Hanwang Zhang, Min Yen Kan, and Tat Seng Chua. 2016. Fast Matrix Factorization for Online Recommendation with Implicit Feedback. In Proceedings of the International ACM SIGIR Conference. 549–558.
  • Geoffrey E. Hinton, Oriol Vinyals, and Jeffrey Dean. 2015. Distilling the Knowledge in a Neural Network. CoRR abs/1503.02531 (2015).
  • Diane J Hu, Rob Hall, and Josh Attenberg. 2014. Style in the long tail: Discovering unique interests with latent variable models in large scale social e-commerce. In Proceedings of the International ACM SIGKDD Conference. ACM, 1640–1649.
  • Yang Hu, Xi Yi, and Larry S Davis. 2015. Collaborative fashion recommendation: a functional tensor factorization approach. In Proceedings of the ACM International Conference on Multimedia. ACM, 129–138.
  • Zhiting Hu, Xuezhe Ma, Zhengzhong Liu, Eduard H. Hovy, and Eric P. Xing. 2016. Harnessing Deep Neural Networks with Logic Rules. In Proceedings of the Annual Meeting of the Association for Computational Linguistics. The Association for Computer Linguistics, 2410–2420.
  • Zhiting Hu, Zichao Yang, Ruslan Salakhutdinov, and Eric P. Xing. 2016. Deep Neural Networks with Massive Learned Knowledge. In Proceedings of the Conference on Empirical Methods in Natural Language Processing. The Association for Computational Linguistics, 1670–1679.
  • Yangqing Jia, Evan Shelhamer, Jeff Donahue, Sergey Karayev, Jonathan Long, Ross Girshick, Sergio Guadarrama, and Trevor Darrell. 2014. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the ACM International Conference on Multimedia. ACM, 675–678.
  • Lu Jiang, Shoou-I Yu, Deyu Meng, Yi Yang, Teruko Mitamura, and Alexander G Hauptmann. 2015. Fast and accurate content-based semantic search in 100m internet videos. In Proceedings of the ACM International Conference on Multimedia. ACM, 49–58.
  • Aditya Khosla, Atish Das Sarma, and Raffay Hamid. 2014. What makes an image popular?. In Proceedings of the ACM International WWW Conference. ACM, 867–876.
  • Yoon Kim. 2014. Convolutional Neural Networks for Sentence Classification. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, EMNLP. 1746–1751.
  • Yuncheng Li, Liangliang Cao, Jiang Zhu, and Jiebo Luo. 2017. Mining Fashion Outfit Composition Using an End-to-End Deep Learning Approach on Set Data. IEEE Transactions on Multimedia 19, 8 (2017), 1946–1955.
  • Meng Liu, Liqiang Nie, Meng Wang, and Baoquan Chen. 2017. Towards Micro-video Understanding by Joint Sequential-Sparse Modeling. In Proceedings of the ACM on Multimedia Conference. 970–978.
  • Si Liu, Jiashi Feng, Zheng Song, Tianzhu Zhang, Hanqing Lu, Changsheng Xu, and Shuicheng Yan. 2012. Hi, magic closet, tell me what to wear!. In Proceedings of the ACM International Conference on Multimedia. ACM, 619–628.
  • Si Liu, Zheng Song, Guangcan Liu, Changsheng Xu, Hanqing Lu, and Shuicheng Yan. 2012. Street-to-shop: Cross-scenario clothing retrieval via parts alignment and auxiliary set. In Proceedings of the IEEE International Conference on Computer Vision and Pattern Recognition. IEEE, 3330–3337.
  • Yihui Ma, Jia Jia, Suping Zhou, Jingtian Fu, Yejun Liu, and Zijian Tong. 2017. Towards Better Understanding the Clothing Fashion Styles: A Multimodal Deep Learning Approach. In Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Press, 38–44.
  • Julian McAuley, Christopher Targett, Qinfeng Shi, and Anton Van Den Hengel. 2015. Image-based recommendations on styles and substitutes. In Proceedings of the International ACM SIGIR Conference. ACM, 43–52.
  • Xueming Qian, He Feng, Guoshuai Zhao, and Tao Mei. 2014. Personalized recommendation combining user interest and social circle. IEEE Transactions on Knowledge and Data Engineering 26, 7 (2014), 1763–1777.
  • Meng Qu, Jian Tang, Jingbo Shang, Xiang Ren, Ming Zhang, and Jiawei Han. 2017. An Attention-based Collaboration Framework for Multi-View Network Representation Learning. In Proceedings of the International ACM CIKM Conference. ACM, 1767–1776.
  • Janarthanan Rajendran, Mitesh M Khapra, Sarath Chandar, and Balaraman Ravindran. 2015. Bridge correlational neural networks for multilingual multimodal representation learning. arXiv preprint arXiv:1510.03519 (2015).
  • Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. 2009. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the International Conference on Uncertainty in Artificial Intelligence. AUAI Press, 452–461.
  • Steffen Rendle and Lars Schmidt-Thieme. 2010. Pairwise interaction tensor factorization for personalized tag recommendation. In Proceedings of the ACM International WSDM Conference. ACM, 81–90.
  • Aliaksei Severyn and Alessandro Moschitti. 2015. Twitter Sentiment Analysis with Deep Convolutional Neural Networks. In Proceedings of the International ACM SIGIR Conference. ACM, 959–962.
  • Xuemeng Song, Fuli Feng, Jinhuan Liu, Zekun Li, Liqiang Nie, and Jun Ma. 2017. NeuroStylist: Neural Compatibility Modeling for Clothing Matching. In Proceedings of the ACM International Conference on Multimedia. 753–761.
  • Xuemeng Song, Liqiang Nie, Luming Zhang, Mohammad Akbari, and Tat-Seng Chua. 2015. Multiple social network learning and its application in volunteerism tendency prediction. In Proceedings of the International ACM SIGIR Conference. ACM, 213–222.
  • Xuemeng Song, Liqiang Nie, Luming Zhang, Maofu Liu, and Tat-Seng Chua. 2015. Interest Inference via Structure-Constrained Multi-Source Multi-Task Learning. In Proceedings of the International Joint Conference on Artificial Intelligence. AAAI Press, 2371–2377.
  • Xiang Wang, Xiangnan He, Fuli Feng, Liqiang Nie, and Tat-Seng Chua. 2017. TEM: Tree-enhanced Embedding Model for Explainable Recommendation. In Proceedings of the International Conference on World Wide Web.
  • Xiang Wang, Xiangnan He, Liqiang Nie, and Tat-Seng Chua. 2017. Item silk road: Recommending items from information domains to social users. In Proceedings of the International ACM SIGIR conference on Research and Development in Information Retrieval. ACM, 185–194.
  • Hongzhi Yin, Hongxu Chen, Xiaoshuai Sun, Hao Wang, Yang Wang, and Quoc Viet Hung Nguyen. 2017. SPTF: A Scalable Probabilistic Tensor Factorization Model for Semantic-Aware Behavior Prediction. In IEEE International Conference on Data Mining. 585–594.
  • Hongzhi Yin, Weiqing Wang, Hao Wang, Ling Chen, and Xiaofang Zhou. 2017. Spatial-Aware Hierarchical Collaborative Deep Learning for POI Recommendation. IEEE Transactions on Knowledge and Data Engineering 29, 11 (2017), 2537–2551.
  • Ruichi Yu, Ang Li, Vlad I. Morariu, and Larry S. Davis. 2017. Visual Relationship Detection With Internal and External Linguistic Knowledge Distillation. In Proceedings of the IEEE International Conference on Computer Vision. IEEE Computer Society, 1974–1982.
  • Hanwang Zhang, Zawlin Kyaw, Shih-Fu Chang, and Tat-Seng Chua. 2017. Visual Translation Embedding Network for Visual Relation Detection. In IEEE Conference on Computer Vision and Pattern Recognition. 3107–3115.
  • Hanwang Zhang, Zheng-Jun Zha, Yang Yang, Shuicheng Yan, Yue Gao, and Tat-Seng Chua. 2013. Attribute-augmented semantic hierarchy: towards bridging semantic gap and intention gap in image retrieval. In Proceedings of the ACM International Conference on Multimedia. ACM, 33–42.