The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity

Y. Borghol, S. Ardon, N. Carlsson, D. Eager, and A. Mahanti, "The Untold Story of the Clones: Content-agnostic Factors that Impact YouTube Video Popularity", Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2012), Beijing, China, Aug. 2012, pp. 1186-1194. (pdf)

Abstract: Video dissemination through sites such as YouTube can have widespread impacts on opinions, thoughts, and cultures. Not all videos will reach the same popularity and have the same impact. Popularity differences arise not only because of differences in video content, but also because of other “content-agnostic” factors. The latter factors are of considerable interest but it has been difficult to accurately study them. For example, videos uploaded by users with large social networks may tend to be more popular because they tend to have more interesting content, not because social network size has a substantial direct impact on popularity. In this paper, we develop and apply a methodology that is able to accurately assess, both qualitatively and quantitatively, the impacts of various content-agnostic factors on video popularity. When controlling for video content, we observe a strong linear “rich-get-richer” behavior, with the total number of previous views as the most important factor except for very young videos. The second most important factor is found to be video age. We analyze a number of phenomena that may contribute to rich-get-richer, including the first-mover advantage, and search bias towards popular videos. For young videos we find that factors other than the total number of previous views, such as uploader characteristics and number of keywords, become relatively more important. Our findings also confirm that inaccurate conclusions can be reached when not controlling for content.


The datasets used in our paper are made available here for use by the wider research community. The datasets consist of publicly available metadata associated with videos from the YouTube website. Please refer to Section 2 of our paper for a description of the data collection methodology and a summary of the datasets. If you use our datasets in your research, please drop Niklas Carlsson a line at "niklas dot carlsson AT-SIGN liu dot se", and include a reference to our paper (pdf) in your work.