# Predicting the popularity of web 2.0 items based on user comments

SIGIR, pp. 233-242, 2014.

Abstract:

In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for Web 2.0 items. Incorporating future popularity into ranking is one way to counter this. …

Introduction

- The era of static webpages ended over a decade ago with the advent of Web 2.0.
- Inspecting the view counts of these search results over the three days, the authors find that the top three results received fewer than 10,000 views, while the top-ranked video of the second season was viewed over 100,000 times.
- This indicates that, at least for this particular query, Google did not make optimal use of future popularity and did not satisfy many users' search expectations.

Highlights

- The era of static webpages ended over a decade ago with the advent of Web 2.0
- To enable popularity prediction externally without excessive crawling, we propose an alternative solution by leveraging user comments, which are more accessible than view counts
- Inspecting the view counts of these search results over the three days, we find that the top three results received fewer than 10,000 views, while the top-ranked video of the second season was viewed over 100,000 times
- As normalized discounted cumulative gain (nDCG) takes relevance levels into account, we define the top 10% of items in the ground-truth ranking as relevant, where higher-ranked positions are accorded more relevance and items beyond the top 10% receive a relevance score of 0
- We introduce a new ranking algorithm, Bipartite User-Item Ranking (BUIR), that realizes these hypotheses under a regularization framework
- Detailed analysis reveals that the factors individually only predict well for some subset of items, while combining all under the proposed Bipartite User-Item Ranking methodology yields the highest quality predictions
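The paper's exact BUIR update equations are not reproduced in these notes. As an illustrative sketch only, the idea of mutually reinforcing scores on a weighted user-item bipartite graph can be written as a Co-HITS-style iteration [9]; the function name, damping parameter `lam`, and update rule below are assumptions, not the paper's algorithm:

```python
import collections

def buir_sketch(edges, n_users, n_items, lam=0.85, iters=100):
    """Illustrative bipartite propagation: an item's score reflects the
    scores of the users who comment on it, and vice versa. `edges` is a
    list of (user, item, weight) triples with comment-derived weights.
    NOT the paper's exact BUIR update rule, only a hypothetical sketch."""
    u_deg = collections.defaultdict(float)  # total out-weight per user
    i_deg = collections.defaultdict(float)  # total out-weight per item
    for u, i, w in edges:
        u_deg[u] += w
        i_deg[i] += w
    users = [1.0 / n_users] * n_users  # uniform initial scores
    items = [1.0 / n_items] * n_items
    for _ in range(iters):
        # items gather score from commenting users (regularized by lam)
        new_items = [(1 - lam) / n_items] * n_items
        for u, i, w in edges:
            new_items[i] += lam * (w / u_deg[u]) * users[u]
        # users gather score from the items they comment on
        new_users = [(1 - lam) / n_users] * n_users
        for u, i, w in edges:
            new_users[u] += lam * (w / i_deg[i]) * items[i]
        users, items = new_users, new_items
    return items  # higher score = predicted more popular
```

With two users both commenting on item 0 and only one also commenting on item 1, item 0 ends up ranked higher, matching the intuition that broader commenter attention signals popularity.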

Methods

- Most applications seek to determine an item's ranking relative to other items. The authors therefore focus on the relative ranking of items rather than exact popularity prediction.
- Edge weights are assigned intuitively. For the time unit, as comments on YouTube and Last.fm are rich and reflective of popularity, the authors set it to 1 day; for Flickr, where they find comments are posted less frequently, they set it to 3 days. For the time decay function in Eq. (3), the authors empirically set δ = 0.85, a = 1, and b = 0 for all datasets.
- Baselines compared: the multivariate linear (ML) model [28] and PageRank (PR) [27], against the proposed BUIR.
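The exact form of the time decay function in Eq. (3) is not shown in these notes; one form consistent with parameters δ, a, and b is an exponential decay δ^(a·Δt + b), sketched here purely as an assumption:

```python
def edge_weight(comment_age_units, delta=0.85, a=1.0, b=0.0):
    """Hypothetical time-decayed edge weight: delta ** (a * age + b).
    `comment_age_units` is the comment's age in the dataset's time unit
    (1 day for YouTube/Last.fm, 3 days for Flickr, per the paper's
    settings). The functional form is an assumption, not Eq. (3) itself."""
    return delta ** (a * comment_age_units + b)
```

With the paper's settings (δ = 0.85, a = 1, b = 0), a fresh comment contributes weight 1.0 and each additional time unit multiplies the weight by 0.85, so older comments count exponentially less.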

Results

- To assess the predicted ranking against the ground-truth ranking, the authors employ ranking correlation in the standard form of the Spearman coefficient [2].
- It measures the agreement between two rankings R1 and R2 as: S(R1, R2) = 1 − (6 × Σᵢ₌₁ᴺ (s₁,ᵢ − s₂,ᵢ)²) / (N × (N² − 1)) (Eq. 14), where sₖ,ᵢ is the rank of item i in Rₖ.
- As nDCG takes relevance levels into account, the authors define the top 10% of items in the ground-truth ranking as relevant, where higher-ranked positions are accorded more relevance: the ith-ranked item among the top 0.1×N receives a rank-based relevance score, following [37], and items beyond the top 10% receive a score of 0
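The Spearman coefficient of Eq. (14) can be computed directly from two rankings; a minimal sketch:

```python
def spearman(r1, r2):
    """Spearman rank correlation per Eq. (14):
    S(R1, R2) = 1 - 6 * sum_i (s1_i - s2_i)^2 / (N * (N^2 - 1)),
    where s_k,i is the rank (position) of item i in ranking k.
    r1, r2 are lists of the same items, ordered best-first."""
    assert set(r1) == set(r2), "rankings must cover the same items"
    n = len(r1)
    rank1 = {item: pos for pos, item in enumerate(r1, start=1)}
    rank2 = {item: pos for pos, item in enumerate(r2, start=1)}
    d_sq = sum((rank1[item] - rank2[item]) ** 2 for item in rank1)
    return 1 - 6 * d_sq / (n * (n ** 2 - 1))
```

Identical rankings score 1, fully reversed rankings score −1, which matches the coefficient's usual interpretation as rank agreement.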

Conclusion

- The authors systematically investigate how to best leverage user comments for predicting the popularity of Web 2.0 items.
- Detailed analysis reveals that the factors individually only predict well for some subset of items, while combining all under the proposed BUIR methodology yields the highest quality predictions.
- The authors' proposed solution is general: it can be extended to incorporate additional factors, and is applicable to ranking any items for which user comments are available

Tables

- Table1: Statistics of our three Web 2.0 datasets. Avg C:I denotes the average number of comments per item
- Table2: Spearman coeff. (%) of overall evaluation
- Table3: Results (mean±standard deviation) of query-specific evaluation. “*” denotes the statistical significance for p < 0.05
- Table4: Spearman coefficient of overall prediction and performance decrease of different parameter settings

Related work

- In this section, we review work on popularity prediction first, then discuss Web 2.0 user comment mining.

2.1 Popularity Prediction of Online Content

Popularity prediction can be classified into three broad types: statistics-based, classification-based, and model-based approaches.

Statistics-based Prediction. These approaches assume that past popularity is a good predictor of future popularity. Szabo and Huberman [32] analyzed the popularity growth of YouTube videos and Digg stories, finding a strong correlation between logarithmically transformed past popularity and current popularity. They proposed a univariate linear model to capture this correlation. Later, Pinto et al. [28] extended the univariate model to a multivariate one by incorporating additional historical points and features. Radinsky et al. [29] proposed several time-series methods for predicting user behavior based on state-space models. All of these techniques require access to the view histories of items, which are difficult for third parties to obtain in practice, as described earlier in Section 1.
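Szabo and Huberman's univariate model fits a linear relation between log-transformed past and future popularity; a minimal least-squares sketch (function and variable names are illustrative, not from [32]):

```python
import math

def fit_log_linear(past, future):
    """Least-squares fit of ln(future) = alpha * ln(past) + beta,
    the univariate log-linear relation described by Szabo and Huberman."""
    xs = [math.log(p) for p in past]
    ys = [math.log(f) for f in future]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    alpha = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    beta = my - alpha * mx
    return alpha, beta

def predict(past_views, alpha, beta):
    """Predict future popularity from early view count."""
    return math.exp(alpha * math.log(past_views) + beta)
```

For example, if every item's views double between the observation and prediction times, the fit recovers alpha = 1 and beta = ln 2. Note this requires access to per-item view histories, which is exactly the obstacle the comment-based approach avoids.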

Funding

- ∗This research is supported by the Singapore National Research Foundation under its International Research Centre @ Singapore Funding Initiative and administered by the IDM Programme Office

References

- M. Ahmed, S. Spagna, F. Huici, and S. Niccolini. A peek into the future: Predicting the evolution of popularity in user generated content. In Proc. of WSDM ’13, pages 607–616, 2013.
- R. A. Baeza-Yates and B. Ribeiro-Neto. Modern Information Retrieval. ACM Press, New York, 1999.
- E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone’s an influencer: Quantifying influence on Twitter. In Proc. of WSDM ’11, pages 65–74, 2011.
- Y. Cha, B. Bi, C.-C. Hsieh, and J. Cho. Incorporating popularity in topic models for social network analysis. In Proc. of SIGIR ’13, pages 223–232, 2013.
- C. Chatfield. The Analysis of Time Series: An Introduction, Sixth Edition. Taylor & Francis, 2003.
- S. V. Chelaru, C. Orellana-Rodriguez, and I. S. Altingovde. Can social features help learning to rank Youtube videos? In Proc. of WISE ’12, pages 552–566, 2012.
- F. R. Chung. Spectral Graph Theory, volume 92 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 1997.
- E. Cohen and M. J. Strauss. Maintaining time-decaying stream aggregates. Journal of Algorithms, 59(1):19–36, 2006.
- H. Deng, M. R. Lyu, and I. King. A generalized Co-HITS algorithm and its application to bipartite graphs. In Proc. of KDD ’09, pages 239–248, 2009.
- Y. Ding and X. Li. Time weight collaborative filtering. In Proc. of CIKM ’05, pages 485–492, 2005.
- F. Figueiredo, F. Benevenuto, and J. M. Almeida. The tube over time: characterizing popularity growth of Youtube videos. In Proc. of WSDM ’11, pages 745–754, 2011.
- K. Filippova and K. B. Hall. Improved video categorization from text metadata and user comments. In Proc. of SIGIR ’11, pages 835–842, 2011.
- M. A. Gonçalves, J. M. Almeida, L. G. dos Santos, A. H. Laender, and V. Almeida. On popularity in the blogosphere. Internet Computing, IEEE, 14(3):42–49, 2010.
- T. H. Haveliwala. Topic-sensitive PageRank. In Proc. of WWW ’02, pages 517–526, 2002.
- X. He, M.-Y. Kan, P. Xie, and X. Chen. Comment-based multi-view clustering of web 2.0 items. In Proc. of WWW ’14, pages 771–782, 2014.
- M. Hu, A. Sun, and E.-P. Lim. Comments-oriented document summarization: understanding documents with readers’ feedback. In Proc. of SIGIR ’08, pages 291–298, 2008.
- S. Jamali and H. Rangwala. Digging Digg: Comment mining, popularity prediction, and social network analysis. In Proc. of WISM ’09, pages 32–38, 2009.
- K. Järvelin and J. Kekäläinen. IR evaluation methods for retrieving highly relevant documents. In Proc. of SIGIR ’00, pages 41–48, 2000.
- J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
- H. Lakkaraju and J. Ajmera. Attention prediction on social media brand pages. In Proc. of CIKM ’11, pages 2157–2160, 2011.
- R. Lempel and S. Moran. The stochastic approach for link-structure analysis (SALSA) and the TKC effect. Computer Networks, 33(1):387–401, 2000.
- K. Lerman and T. Hogg. Using a model of social dynamics to predict popularity of news. In Proc. of WWW ’10, pages 621–630, 2010.
- H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In Proc. of WSDM ’11, pages 287–296, 2011.
- F. McSherry. A uniform approach to accelerated PageRank computation. In Proc. of WWW ’05, pages 575–582, 2005.
- G. Mishne and N. Glance. Leave a reply: An analysis of Weblog comments. In Third annual workshop on the Weblogging ecosystem, 2006.
- M. Mitzenmacher and E. Upfal. Probability and computing: Randomized algorithms and probabilistic analysis. Cambridge University Press, 2005.
- L. Page, S. Brin, R. Motwani, and T. Winograd. The PageRank citation ranking: Bringing order to the Web. Technical report, Stanford InfoLab, 1999.
- H. Pinto, J. M. Almeida, and M. A. Gonçalves. Using early view patterns to predict the popularity of youtube videos. In Proc. of WSDM ’13, pages 365–374, 2013.
- K. Radinsky, K. Svore, S. Dumais, J. Teevan, A. Bocharov, and E. Horvitz. Modeling and predicting behavioral dynamics on the web. In Proc. of WWW ’12, pages 599–608, 2012.
- J. San Pedro, T. Yeh, and N. Oliver. Leveraging user comments for aesthetic aware image search reranking. In Proc. of WWW ’12, pages 439–448, 2012.
- E. Shmueli, A. Kagian, Y. Koren, and R. Lempel. Care to comment?: Recommendations for commenting on news stories. In Proc. of WWW ’12, pages 429–438, 2012.
- G. Szabo and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
- A. Tatar, J. Leguay, P. Antoniadis, A. Limbourg, M. D. de Amorim, and S. Fdida. Predicting the popularity of online articles based on user comments. In Proc. of WIMS ’11, pages 67–75, 2011.
- A. Wang, T. Chen, and M.-Y. Kan. Re-tweeting from a linguistic perspective. In Proc. of NAACL-HLT ’12, pages 46–55, 2012.
- D. T. Wijaya and S. Bressan. A random walk on the red carpet: Rating movies with user reviews and PageRank. In Proc. of CIKM ’08, pages 951–960, 2008.
- R. Yan, M. Lapata, and X. Li. Tweet recommendation with graph co-ranking. In Proc. of ACL ’12, pages 516–525, 2012.
- P. Yin, P. Luo, M. Wang, and W.-C. Lee. A straw shows which way the wind blows: Ranking potentially popular items from early votes. In Proc. of WSDM ’12, pages 623–632, 2012.
- Y. Zhang, M. Zhang, Y. Zhang, Y. Liu, and S. Ma. Explicit factor models for explainable recommendation based on phrase-level sentiment analysis. In Proc. of SIGIR ’14, 2014.
- D. Zhou and B. Schölkopf. Regularization on discrete spaces. In Pattern Recognition (Proc. of DAGM ’05), pages 361–368, 2005.
