READ ME

This text describes the data presented in the paper: A hierarchical model of non-homogeneous Poisson processes for Twitter retweets

========================
Introductory information
========================
Files included in the data deposit (include a short description of what data are contained): 

1) 20170614_original_a.csv: temporal data for original tweets that were not retweeted during the data collection period
2) 20170614_original_b.csv: temporal data for retweeted original tweets and their retweets
3) 20170614_range.csv: start (t_0) and duration (t_inf) of the data collection period
4) 20170716_original_a.csv: same structure as 20170614_original_a.csv, but for #gots7 on 2017-07-16
5) 20170716_original_b.csv: same structure as 20170614_original_b.csv, but for #gots7 on 2017-07-16
6) 20170716_range.csv: same structure as 20170614_range.csv, but for #gots7 on 2017-07-16



Explain the relationship between multiple data sets, if required:

None



Key words used to describe the data:

Twitter, tweets, retweets, temporal data



========================== 
Methodological information
==========================
A brief method description – what the data is, how and why it was collected or created, and how it was processed:

The data contains the time stamps of original tweets on Twitter and of their associated retweets, for #thehandmaidstale on 2017-06-14 and for #gots7 on 2017-07-16. The raw data was collected live from Twitter on those dates, then cleaned and converted into the deposited comma-separated values (csv) files.



Instruments, hardware and software used:

Docker and Python for raw data collection; R for data cleaning and conversion



Date(s) of data collection:

2017-06-14; 2017-07-16



Geographic coverage of data:

None



Data validation (how was the data checked, proofed and cleaned):

R scripts were written to clean the data without altering the raw data files. The scripts were run multiple times and produced the same csv files as those deposited each time, which shows that the cleaning step is reproducible.
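
A reader who regenerates a csv file may want to confirm that it matches the deposited copy. The following is a minimal Python sketch of one way to do that by comparing SHA-256 digests. It is not the authors' procedure (their scripts are in R), and the function names and any file paths passed to them are hypothetical.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so that large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_content(path_a, path_b):
    """Return True if the two files are byte-for-byte identical (by digest)."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Note that this comparison is byte-level, so two files that differ only in line endings or number formatting would be reported as different.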



Overview of secondary data, if used:

None



=========================
Data-specific information
=========================
Definitions of names, labels, acronyms or specialist terminology used for variables, records and their values:

For each <<date>>_range.csv:
1) t_0: beginning of data collection
2) t_inf: duration of data collection in seconds; data collection therefore ends at t_0 + t_inf
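
As an illustration of how these two values combine, the Python sketch below computes the end of data collection and converts an offset in seconds from t_0 into an absolute time. It assumes that t_0 has already been parsed into a timezone-aware datetime; this README does not state the storage format or time zone of t_0, and the values used are hypothetical, not taken from the deposit.

```python
from datetime import datetime, timedelta, timezone


def collection_end(t_0, t_inf):
    """Return the end of data collection, i.e. t_0 plus t_inf seconds."""
    return t_0 + timedelta(seconds=t_inf)


def offset_to_datetime(t_0, offset_seconds):
    """Convert a time given in seconds relative to t_0 into an absolute time."""
    return t_0 + timedelta(seconds=offset_seconds)


# Hypothetical values for illustration only.
t_0 = datetime(2017, 6, 14, 12, 0, 0, tzinfo=timezone.utc)
t_inf = 3600.0

end = collection_end(t_0, t_inf)
halfway = offset_to_datetime(t_0, t_inf / 2)
```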

For each <<date>>_original_a.csv or <<date>>_original_b.csv:
1) id_str: unique ID of original tweet
2) user_followers_count: follower count of the user who authored the original tweet at its creation
3) retweet_count: retweet count of the original tweet at the end of data collection
4) t_i: creation time of original tweet relative to t_0, in seconds
5) t_ij: creation time of retweet relative to t_0, in seconds; t_ij is always greater than or equal to the corresponding t_i
6) retweeted: whether the original tweet was ever retweeted during data collection
7) log_followers_count: log(1+user_followers_count)
8) log_retweet_count: log(1+retweet_count)
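
The relationships among these variables can be checked row by row. The Python sketch below is one possible check, under several assumptions that this README does not confirm: that "log" means the natural logarithm, that a missing t_ij (if the column is present at all) is stored as an empty string or "NA", and that the columns appear under exactly the names listed above. The sample rows are invented for illustration and are not taken from the deposit.

```python
import csv
import io
import math

MISSING = {"", "NA"}  # assumed encodings of a missing value


def check_row(row):
    """Return a list of inconsistencies found in one row (empty if none)."""
    problems = []
    followers = float(row["user_followers_count"])
    retweets = float(row["retweet_count"])
    # Derived columns: log(1 + count), assumed to be the natural logarithm.
    if not math.isclose(float(row["log_followers_count"]),
                        math.log1p(followers), rel_tol=1e-6, abs_tol=1e-9):
        problems.append("log_followers_count mismatch")
    if not math.isclose(float(row["log_retweet_count"]),
                        math.log1p(retweets), rel_tol=1e-6, abs_tol=1e-9):
        problems.append("log_retweet_count mismatch")
    # A retweet cannot precede the original tweet: t_ij >= t_i.
    t_ij = row.get("t_ij")
    if t_ij is not None and t_ij not in MISSING:
        if float(t_ij) < float(row["t_i"]):
            problems.append("t_ij earlier than t_i")
    return problems


# Invented sample in the documented column layout (second row is deliberately bad).
sample = """\
id_str,user_followers_count,retweet_count,t_i,t_ij,retweeted,log_followers_count,log_retweet_count
1,99,2,10.0,25.5,TRUE,4.605170,1.098612
1,99,2,10.0,5.0,TRUE,4.605170,1.098612
"""

results = [check_row(r) for r in csv.DictReader(io.StringIO(sample))]
```

To apply the same check to a deposited file, the `io.StringIO(sample)` argument could be replaced with an open file handle for that csv file.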



Explanation of weighting and grossing variables:

Not applicable



Outline any missing data:

None



=======
Contact
=======
Please contact rdm@ncl.ac.uk for further information