Do we Read what we Share? Analyzing the Click Dynamic of News Articles Shared on Twitter
Jesper Holmstrom, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundstrom, Sebastian Ragnarsson, Anton Forsberg, Karl Andersson, and Niklas Carlsson
Paper:
Jesper Holmstrom, Daniel Jonsson, Filip Polbratt, Olav Nilsson, Linnea Lundstrom, Sebastian Ragnarsson, Anton Forsberg, Karl Andersson, and Niklas Carlsson.
"Do we Read what we Share? Analyzing the Click Dynamic of News Articles Shared on Twitter",
Proc. IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (IEEE/ACM ASONAM),
Vancouver, Canada, Aug. 2019.
(pdf)
Abstract:
News and information spread over social media can have a big impact on thoughts, beliefs, and opinions.
It is therefore important to understand the sharing dynamics on these forums.
However, most studies trying to capture these dynamics
rely only on Twitter's open APIs to measure how frequently articles
are shared/retweeted, and therefore do not capture how many users
actually read the articles linked in these tweets.
To address this problem, in this paper,
we first develop a novel measurement methodology,
which combines the Twitter streaming API, the Bitly API,
and careful sample rate selection to simultaneously collect and analyze
the timeline of both the number of retweets and clicks generated by news article links.
Second, we present a temporal analysis of the news cycle
based on five-day-long traces (containing both clicks and retweets over time)
for the news article links discovered during a seven-day period.
Among other things, our analysis highlights differences in the relative timelines
observed for clicks and retweets (e.g., retweet data often lags
and underestimates the bias towards reading popular links/articles),
and helps answer important questions regarding differences in how
age-based biases and churn affect how frequently news articles shared on Twitter
are accessed over time.
Our temporal findings are shown to be consistent both when comparing data collected a year apart
(2017 vs 2018) and across articles published on news websites with vastly different characteristics.
Software and datasets
To help others build upon our work, we make code and example datasets available below.
-
Software:
The software and code used in our paper are made available
here (7.6 MB)
for use by the wider research community.
(The file contains commented source code and a
README file, which should help in getting started with the files.)
-
Datasets:
An example dataset with the longitudinal click data from 2018 can be found here:
2018 click data.
The fields consist of
(1) the date the article was posted (when easily extracted from the URL),
(2) the long URL,
(3) whether the article was classified as "biased" (fake vs real) using a very simple
Naive Bayes classifier (this classification was not used in the paper - please ignore),
(4) whether the link was in the "random" set,
(5) whether the link is outside the "top" set ("bottom" part),
(6) the number of followers,
(7) the number of retweets,
(8) the start date+time when the link was first observed in the tweet stream,
(9-69) the total number of clicks accumulated up to and including the T-th hour, where T = 0, 2, 4, 6, ..., 120,
(70) the fraction of clicks obtained after the "0-hour" measurement,
(71) a "dummary marker" (yes/no) that we used with a prior example methodology (please ignore).
Additional datasets may be added later.
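As a rough illustration of the field layout described above, the following Python sketch maps one already-split row of 71 fields onto a small record. It is only a sketch under stated assumptions: the names LinkRecord, parse_row, and click_fraction_after_hour_zero are hypothetical, the on-disk format (delimiter, header row, yes/no encoding) is not specified here and should be checked against the README, and the counts are assumed to be stored as integers. The recomputed fraction reflects one plausible reading of field 70, not a confirmed definition.

```python
from dataclasses import dataclass
from typing import List, Sequence

# Fields 9-69 hold cumulative clicks at T = 0, 2, 4, ..., 120 hours (61 values).
CLICK_HOURS = list(range(0, 121, 2))
NUM_FIELDS = 71


@dataclass
class LinkRecord:
    """Hypothetical container for one row of the 2018 click data."""
    posted_date: str            # field 1 (may be empty if not extractable from URL)
    long_url: str               # field 2
    in_random_set: str          # field 4, kept as raw text
    outside_top_set: str        # field 5, kept as raw text
    followers: int              # field 6 (assumed integer)
    retweets: int               # field 7 (assumed integer)
    first_seen: str             # field 8, kept as raw text
    clicks: List[int]           # fields 9-69, cumulative clicks per CLICK_HOURS
    late_click_fraction: float  # field 70 as stored in the file


def parse_row(fields: Sequence[str]) -> LinkRecord:
    """Map a pre-split row onto LinkRecord; fields 3 and 71 are skipped."""
    if len(fields) != NUM_FIELDS:
        raise ValueError(f"expected {NUM_FIELDS} fields, got {len(fields)}")
    return LinkRecord(
        posted_date=fields[0],
        long_url=fields[1],
        in_random_set=fields[3],
        outside_top_set=fields[4],
        followers=int(fields[5]),
        retweets=int(fields[6]),
        first_seen=fields[7],
        clicks=[int(v) for v in fields[8:69]],
        late_click_fraction=float(fields[69]),
    )


def click_fraction_after_hour_zero(record: LinkRecord) -> float:
    """Share of final clicks that arrived after the 0-hour measurement.

    Assumption: this is how field 70 could be interpreted; verify before use.
    """
    total = record.clicks[-1]
    if total == 0:
        return 0.0
    return (total - record.clicks[0]) / total
```

Rows could be produced with the standard csv module once the delimiter is known, for example by passing each row from csv.reader to parse_row.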
Note: If you use our data files or code in your research,
please include a reference to our ASONAM 2019 paper
(pdf) in your work.