Predicting the popularity of web 2.0 items based on user comments

SIGIR, pp. 233-242, 2014.

Keywords:
bipartite graph ranking, popularity prediction, BUIR, information filtering, comments mining
Weibo:
We introduce a new ranking algorithm, Bipartite User-Item Ranking, that realizes these hypotheses under a regularization framework

Abstract:

In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for Web 2.0 items. Incorporating future popularity into ranking is one way to counter this...

Introduction
  • The era of static webpages was surpassed over a decade ago with the advent of Web 2.0.
  • Inspecting the view counts of these search results over the three days, the authors find that the top three results received fewer than 10,000 views, while the top-ranked video of the second season was viewed over 100,000 times.
  • This indicates that, at least for this particular query, Google did not make optimal use of future popularity and did not satisfy many users’ search expectations.
Highlights
  • The era of static webpages was surpassed over a decade ago with the advent of Web 2.0.
  • To enable popularity prediction externally without excessive crawling, we propose an alternative solution that leverages user comments, which are more accessible than view counts.
  • Inspecting the view counts of these search results over the three days, we find that the top three results received fewer than 10,000 views, while the top-ranked video of the second season was viewed over 100,000 times.
  • As normalized discounted cumulative gain takes relevance levels into account, we define the top 10% of items found in the ground truth ranking as relevant, where higher ranked positions are accorded more relevance, computing a relevance score of
  • We introduce a new ranking algorithm, Bipartite User-Item Ranking (BUIR), that realizes these hypotheses under a regularization framework
  • Detailed analysis reveals that the factors individually only predict well for some subset of items, while combining all under the proposed Bipartite User-Item Ranking methodology yields the highest quality predictions
Methods
  • Most applications seek to determine an item’s ranking relative to other items. The authors focus on the relative ranking of items — rather than exact popularity prediction (cf.
  • Edge weights are assigned intuitively. For the time unit, the authors use 1 day for YouTube and Last.fm, where comments are rich and reflective of popularity, and 3 days for Flickr, where comments are posted less frequently. For the time decay function in Eq. (3), the authors empirically set δ = 0.85, a = 1, and b = 0 for all datasets.
  • Methods compared: ML [28], PR [27], and BUIR.
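
The parameter settings above can be illustrated with a minimal sketch. Eq. (3) itself is not reproduced on this page, so the decay form `delta ** (a * age + b)` below is an assumption chosen only because it uses the three quoted parameters; the function names, the per-edge summation, and the sample comments are likewise hypothetical and not taken from the paper.

```python
from collections import defaultdict

# Parameter values quoted in the text above.
DELTA, A, B = 0.85, 1.0, 0.0
SECONDS_PER_DAY = 86_400


def decay(age_units, delta=DELTA, a=A, b=B):
    """ASSUMED decay form delta ** (a * age + b); Eq. (3) is not shown here."""
    return delta ** (a * age_units + b)


def edge_weights(comments, now, unit_days=1):
    """Sum decayed comment contributions per (user, item) edge.

    comments: iterable of (user, item, unix_timestamp) tuples.
    unit_days: time unit in days (1 for YouTube/Last.fm, 3 for Flickr).
    Summing per edge is an illustrative choice, not the paper's definition.
    """
    unit = unit_days * SECONDS_PER_DAY
    weights = defaultdict(float)
    for user, item, ts in comments:
        age_units = int((now - ts) // unit)  # whole time units elapsed
        weights[(user, item)] += decay(age_units)
    return dict(weights)


# Hypothetical comments: (user, item, timestamp in seconds).
comments = [
    ("u1", "v1", 0),
    ("u1", "v1", 2 * SECONDS_PER_DAY),
    ("u2", "v1", 3 * SECONDS_PER_DAY),
]
daily = edge_weights(comments, now=3 * SECONDS_PER_DAY, unit_days=1)
three_day = edge_weights(comments, now=3 * SECONDS_PER_DAY, unit_days=3)
```

A coarser time unit, as used for Flickr, places sparse comments into fewer buckets, so older comments are discounted less sharply under this assumed form.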
Results
  • To assess the predicted ranking against the ground truth ranking, the authors employ rank correlation in the standard form of the Spearman coefficient [2].
  • It measures the agreement between two rankings and is defined as follows: $S(R_1, R_2) = 1 - \frac{6 \times \sum_{i=1}^{N} (s_{1,i} - s_{2,i})^2}{N \times (N^2 - 1)}$ (14).
  • As nDCG takes relevance levels into account, the authors define the top 10% of items found in the ground truth ranking as relevant, where higher ranked positions are accorded more relevance, computing a relevance score of i 0.1×N [37] for the ith ranked item and a score of 0 for items beyond the top 10%
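
As a minimal sketch of the Spearman coefficient in Eq. (14), the code below assumes that s1,i and s2,i are the rank positions of item i in the two rankings and that there are no ties. The helper `to_ranks` and both function names are illustrative and do not come from the paper.

```python
def spearman(ranks1, ranks2):
    """Spearman coefficient of Eq. (14) for two rankings without ties.

    ranks1[i] and ranks2[i] are the rank positions (1..N) of item i.
    """
    n = len(ranks1)
    if n != len(ranks2) or n < 2:
        raise ValueError("need two rankings of equal length N >= 2")
    d2 = sum((a - b) ** 2 for a, b in zip(ranks1, ranks2))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


def to_ranks(scores):
    """Convert scores to rank positions (1 = highest score); assumes no ties."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    ranks = [0] * len(scores)
    for pos, i in enumerate(order, start=1):
        ranks[i] = pos
    return ranks
```

Identical rankings give a coefficient of 1 and fully reversed rankings give -1, so a predicted ranking can be scored by passing `to_ranks(predicted_scores)` together with the ground truth rank positions.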
Conclusion
  • The authors systematically investigate how to best leverage user comments for predicting the popularity of Web 2.0 items.
  • Detailed analysis reveals that the factors individually only predict well for some subset of items, while combining all under the proposed BUIR methodology yields the highest quality predictions.
  • The authors' proposed solution is general: it can be extended to incorporate additional factors, and is applicable to ranking items whenever user comments are available.
Tables
  • Table 1: Statistics of our three Web 2.0 datasets. Avg C:I denotes the average number of comments per item.
  • Table 2: Spearman coefficient (%) of overall evaluation.
  • Table 3: Results (mean ± standard deviation) of query-specific evaluation. “*” denotes statistical significance at p < 0.05.
  • Table 4: Spearman coefficient of overall prediction and performance decrease under different parameter settings.
Related work
  • In this section, we review work on popularity prediction first, then discuss Web 2.0 user comment mining.

    2.1 Popularity Prediction of Online Content

    Popularity prediction can be classified into three broad types: statistics-based, classification-based, and model-based approaches.

    Statistics-based Prediction. These approaches assume that past popularity is a good predictor of future popularity. Szabo and Huberman [32] analyzed the popularity growth of YouTube videos and Digg stories, finding a strong correlation between the logarithmically transformed past popularity and current popularity. They proposed a univariate linear model to capture this correlation. Later, Pinto et al. [28] extended the univariate model to a multivariate one by incorporating additional historical points and features. Radinsky et al. [29] proposed several time series prediction methods for user behavior based on state-space models. All of these techniques require access to the view histories of items, which are difficult for third parties to obtain in practice, as described earlier in Section 1.
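
    To make the statistics-based idea concrete, the sketch below fits a generic least-squares line between log-transformed early and later view counts. It is an illustration of the log-linear correlation described above, not the exact model of [32] or [28]; the function names and the toy view counts are hypothetical.

```python
import math


def fit_log_linear(early_views, later_views):
    """Least-squares fit of ln(later) = slope * ln(early) + intercept."""
    xs = [math.log(v) for v in early_views]
    ys = [math.log(v) for v in later_views]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def predict(early, slope, intercept):
    """Predict a later view count from an early one using the fitted line."""
    return math.exp(slope * math.log(early) + intercept)


# Toy (made-up) data in which later views are exactly 3x the early views.
slope, intercept = fit_log_linear([10, 100, 1000], [30, 300, 3000])
estimate = predict(50, slope, intercept)
```

    Note that such a fit needs early view counts as input, which is exactly the data that the paragraph above describes as hard for third parties to obtain.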
Funding
  • This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office.
References
  • [1] M. Ahmed, S. Spagna, F. Huici, and S. Niccolini. A peek into the future: Predicting the evolution of popularity in user generated content. In Proc. of WSDM ’13, pages 607–616, 2013.
  • [2] R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval, volume 463. ACM Press, New York, 1999.
  • [3] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone’s an influencer: Quantifying influence on Twitter. In Proc. of WSDM ’11, pages 65–74, 2011.
  • [4] Y. Cha, B. Bi, C.-C. Hsieh, and J. Cho. Incorporating popularity in topic models for social network analysis. In Proc. of SIGIR ’13, pages 223–232, 2013.
  • [5] C. Chatfield. The Analysis of Time Series: An Introduction, Sixth Edition. Taylor & Francis, 2003.
  • [6] S. V. Chelaru, C. Orellana-Rodriguez, and I. S. Altingovde. Can social features help learning to rank Youtube videos? In Proc. of WISE ’12, pages 552–566, 2012.
  • [7] F. R. Chung. Spectral Graph Theory, volume 92. American Mathematical Society, 1997.
  • [8] E. Cohen and M. J. Strauss. Maintaining time-decaying stream aggregates. Journal of Algorithms, 59(1):19–36, 2006.
  • [9] H. Deng, M. R. Lyu, and I. King. A generalized Co-HITS algorithm and its application to bipartite graphs. In Proc. of KDD ’09, pages 239–248, 2009.
  • [10] Y. Ding and X. Li. Time weight collaborative filtering. In Proc. of CIKM ’05, pages 485–492, 2005.
  • [11] F. Figueiredo, F. Benevenuto, and J. M. Almeida. The tube over time: characterizing popularity growth of Youtube videos. In Proc. of WSDM ’11, pages 745–754, 2011.
  • [12] K. Filippova and K. B. Hall. Improved video categorization from text metadata and user comments. In Proc. of SIGIR ’11, pages 835–842, 2011.
  • [13] M. A. Gonçalves, J. M. Almeida, L. G. dos Santos, A. H. Laender, and V. Almeida. On popularity in the blogosphere. IEEE Internet Computing, 14(3):42–49, 2010.
  • [14] T. H. Haveliwala. Topic-sensitive PageRank. In Proc. of WWW ’02, pages 517–526, 2002.
  • [15] X. He, M.-Y. Kan, P. Xie, and X. Chen. Comment-based multi-view clustering of web 2.0 items. In Proc. of WWW ’14, pages 771–782, 2014.
  • [16] M. Hu, A. Sun, and E.-P. Lim. Comments-oriented document summarization: understanding documents with readers’ feedback. In Proc. of SIGIR ’08, pages 291–298, 2008.
  • [17] S. Jamali and H. Rangwala. Digging Digg: Comment mining, popularity prediction, and social network analysis. In Proc. of WISM ’09, pages 32–38, 2009.
  • [18] K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR ’00, pages 41–48, 2000.
  • [19] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
  • [20] H. Lakkaraju and J. Ajmera. Attention prediction on social media brand pages. In Proc. of CIKM ’11, pages 2157–2160, 2011.
  • [21] R. Lempel and S. Moran. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. Computer Networks, 33(1):387–401, 2000.
  • [22] K. Lerman and T. Hogg. Using a model of social dynamics to predict popularity of news. In Proc. of WWW ’10, pages 621–630, 2010.
  • [23] H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proc. of WSDM ’11, pages 287–296, 2011.
  • [24] F. McSherry. A uniform approach to accelerated PageRank computation. In Proc. of WWW ’05, pages 575–582, 2005.
  • [25] G. Mishne and N. Glance. Leave a reply: An analysis of Weblog comments. In Third Annual Workshop on the Weblogging Ecosystem, 2006.
  • [26] M. Mitzenmacher and E. Upfal. Probability and Computing: Randomized Algorithms and Probabilistic Analysis. Cambridge University Press, 2005.
  • [27] L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford InfoLab, 1999.
  • [28] H. Pinto, J. M. Almeida, and M. A. Gonçalves. Using early view patterns to predict the popularity of youtube videos. In Proc. of WSDM ’13, pages 365–374, 2013.
  • [29] K. Radinsky, K. Svore, S. Dumais, J. Teevan, A. Bocharov, and E. Horvitz. Modeling and predicting behavioral dynamics on the web. In Proc. of WWW ’12, pages 599–608, 2012.
  • [30] J. San Pedro, T. Yeh, and N. Oliver. Leveraging user comments for aesthetic aware image search reranking. In Proc. of WWW ’12, pages 439–448, 2012.
  • [31] E. Shmueli, A. Kagian, Y. Koren, and R. Lempel. Care to comment?: Recommendations for commenting on news stories. In Proc. of WWW ’12, pages 429–438, 2012.
  • [32] G. Szabo and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
  • [33] A. Tatar, J. Leguay, P. Antoniadis, A. Limbourg, M. D. de Amorim, and S. Fdida. Predicting the popularity of online articles based on user comments. In Proc. of WIMS ’11, pages 67–75, 2011.
  • [34] A. Wang, T. Chen, and M.-Y. Kan. Re-tweeting from a linguistic perspective. In Proc. of NAACL-HLT ’12, pages 46–55, 2012.
  • [35] D. T. Wijaya and S. Bressan. A random walk on the red carpet: Rating movies with user reviews and PageRank. In Proc. of CIKM ’08, pages 951–960, 2008.
  • [36] R. Yan, M. Lapata, and X. Li. Tweet recommendation with graph co-ranking. In Proc. of ACL ’12, pages 516–525, 2012.
  • [37] P. Yin, P. Luo, M. Wang, and W.-C. Lee. A straw shows which way the wind blows: Ranking potentially popular items from early votes. In Proc. of WSDM ’12, pages 623–632, 2012.
  • [38] Y. Zhang, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proc. of SIGIR ’14, 2014.
  • [39] D. Zhou and B. Schölkopf. Regularization on discrete spaces. In Pattern Recognition, pages 361–368.