Do we Read what we Share? Analyzing the Click Dynamic of News Articles Shared on Twitter

Jesper Holmstrom, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundstrom, Sebastian Ragnarsson, Anton Forsberg, Karl Andersson, and Niklas Carlsson

Paper: Jesper Holmstrom, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundstrom, Sebastian Ragnarsson, Anton Forsberg, Karl Andersson, and Niklas Carlsson. "Do we Read what we Share? Analyzing the Click Dynamic of News Articles Shared on Twitter", Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (IEEE/ACM ASONAM), Vancouver, Canada, Aug. 2019. (pdf)

Abstract: News and information spread over social media can have a big impact on thoughts, beliefs, and opinions. It is therefore important to understand the sharing dynamics on these forums. However, most studies trying to capture these dynamics rely only on Twitter's open APIs to measure how frequently articles are shared/retweeted, and therefore do not capture how many users actually read the articles linked in these tweets. To address this problem, in this paper, we first develop a novel measurement methodology, which combines the Twitter streaming API, the Bitly API, and careful sample rate selection to simultaneously collect and analyze the timeline of both the number of retweets and clicks generated by news article links. Second, we present a temporal analysis of the news cycle based on five-day-long traces (containing both clicks and retweets over time) for the news article links discovered during a seven-day period. Among other things, our analysis highlights differences in the relative timelines observed for clicks and retweets (e.g., retweet data often lags and underestimates the bias towards reading popular links/articles), and helps answer important questions regarding differences in how age-based biases and churn affect how frequently news articles shared on Twitter are accessed over time. Our temporal findings are shown to be consistent both when comparing data collected a year apart (2017 vs 2018) and across articles published on news websites with vastly different characteristics.

Software and datasets

To help others build upon our work, we make our code and example datasets available below.

Software: The software and code used in our paper are made available here (7.6 MB) for use by the wider research community. (The file contains commented source code and a README file, which should help in getting started.)
Datasets: An example dataset with the longitudinal click data from 2018 can be found here: 2018 click data. The fields are as follows:
(1) the date the article was posted (when it could easily be extracted from the URL);
(2) the long URL;
(3) whether the article was classified as "biased" (fake vs. real) using a very simple Naive Bayes classifier (this classification was not used in the paper; please ignore it);
(4) whether the link was in the "random" set;
(5) whether the link is outside the "top" set (the "bottom" part);
(6) the number of followers;
(7) the number of retweets;
(8) the start date and time when the link was first observed in the tweet stream;
(9-69) the total number of clicks accumulated up to and including the T-th hour, where T = 0, 2, 4, 6, ..., 120 (one field per value of T);
(70) the fraction of clicks obtained after the "0-hour" measurement;
(71) a "dummy marker" (yes/no) that we used with a prior example methodology (please ignore it).
Additional datasets may be added later.
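The column layout above can be read with a short script. The sketch below is a minimal Python parser and is not part of the released code. It assumes that the file is comma-separated, has one row per link and no header line, and stores the follower, retweet, and click columns as plain integers. None of these points is stated above, so check them against the README before relying on the sketch. The names `ClickRecord`, `parse_row`, `load_records`, `clicks_per_interval`, and `fraction_after_first_snapshot` are hypothetical. The recomputation of field (70) as the share of the 120-hour total that arrived after the T=0 snapshot is also our interpretation, not a documented definition.

```python
import csv
from dataclasses import dataclass
from typing import Iterable, List

# Snapshot hours for the cumulative click columns (fields 9-69): T = 0, 2, ..., 120.
HOURS = list(range(0, 121, 2))           # 61 snapshots
NUM_FIELDS = 8 + len(HOURS) + 2          # 71 columns per row


@dataclass
class ClickRecord:
    posted_date: str      # (1) may be empty if not extractable from the URL
    long_url: str         # (2)
    biased_label: str     # (3) Naive Bayes label, not used in the paper
    in_random_set: str    # (4)
    in_bottom_set: str    # (5) outside the "top" set
    followers: int        # (6) ASSUMPTION: stored as an integer
    retweets: int         # (7) ASSUMPTION: stored as an integer
    first_seen: str       # (8) date+time first observed in the tweet stream
    clicks: List[int]     # (9-69) clicks[i] = cumulative clicks up to hour HOURS[i]
    frac_after_0h: float  # (70)
    dummy_marker: str     # (71) not used in the paper


def parse_row(fields: List[str]) -> ClickRecord:
    """Map one row (a list of 71 strings) onto the documented fields."""
    if len(fields) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(fields)}")
    end = 8 + len(HOURS)  # index just past the last click column
    return ClickRecord(
        posted_date=fields[0],
        long_url=fields[1],
        biased_label=fields[2],
        in_random_set=fields[3],
        in_bottom_set=fields[4],
        followers=int(fields[5]),
        retweets=int(fields[6]),
        first_seen=fields[7],
        clicks=[int(x) for x in fields[8:end]],
        frac_after_0h=float(fields[end]),
        dummy_marker=fields[end + 1],
    )


def load_records(lines: Iterable[str]) -> List[ClickRecord]:
    """ASSUMPTION: comma-separated rows with no header line; blank rows skipped."""
    return [parse_row(row) for row in csv.reader(lines) if row]


def clicks_per_interval(clicks: List[int]) -> List[int]:
    """New clicks in each 2-hour interval, derived from the cumulative counts."""
    return [b - a for a, b in zip(clicks, clicks[1:])]


def fraction_after_first_snapshot(clicks: List[int]) -> float:
    """ASSUMPTION: our reading of field (70), i.e. the share of the
    120-hour total that arrived after the T=0 snapshot."""
    total = clicks[-1]
    return 0.0 if total == 0 else (total - clicks[0]) / total
```

To use the sketch, pass an open handle for the downloaded file to `load_records`, which returns one `ClickRecord` per row. Calling `clicks_per_interval(rec.clicks)` then gives the 60 two-hour click increments for a link, which is one way to look at how click activity evolves over the five-day trace. Comparing `fraction_after_first_snapshot(rec.clicks)` with `rec.frac_after_0h` offers a quick check on whether our reading of field (70) matches the file.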

Note: If you use our data files or code in your research, please include a reference to our ASONAM 2019 paper (pdf) in your work.