2/22/2021
Get Twitter Data Using R and Tableau
During the 2020 Tableau Conference-ish, I published a viz called #DATA20 BY THE MINUTE, in which I visualized tweets with the #data20 hashtag counted by the minute. To do this, I used R to collect the data from Twitter. I thought others might find this useful for tracking their own tweets, hashtags, or other people and topics, so here is a quick blog post on how to get this data and bring it into Tableau.
Initiate the Code
To use this code, you will need a Twitter handle and to set up a Twitter Developer App (free) here. After creating an app, you will get an API Key, API Secret Key and Bearer Token. We need these three to execute the code that downloads the data. Note: scroll to the bottom of this blog post if you want to copy and paste all of the code at once.
## General Info
vignette("auth", package = "rtweet")
## install dev version of rtweet from github
remotes::install_github("ropensci/rtweet")
## install httpuv if not already
if (!requireNamespace("httpuv", quietly = TRUE)) {
  install.packages("httpuv")
}
## Install these packages
install.packages("rtweet", dependencies = TRUE)
install.packages("jsonlite")
install.packages("stringr")
library(jsonlite)
library(rtweet)
## name of twitter app and API settings.
app_name <- "[Your Twitter App Name]"
consumer_key <- "[Your Consumer Key]"
consumer_secret <- "[Your Consumer Secret]"
bearer_token <- "[Your Bearer Token]"
## create token
token <- create_token(app_name, consumer_key, consumer_secret)
## print token (just to make sure it’s working)
token
Note: In the R code above, replace the app_name, consumer_key, consumer_secret, and bearer_token placeholders with your own values inside the quotes (without the brackets). Every request sent to Twitter must include a token, so you should store it as an environment variable.
## save token to home directory
path_to_token <- file.path(path.expand("~"), ".twitter_token.rds")
saveRDS(token, path_to_token)
## create env variable TWITTER_PAT (with path to saved token)
env_var <- paste0("TWITTER_PAT=", path_to_token)
## save as .Renviron file (or append if the file already exists)
cat(env_var, file = file.path(path.expand("~"), ".Renviron"),
    fill = TRUE, append = TRUE)
## refresh .Renviron variables
readRenviron("~/.Renviron")
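If you want to see how the save-and-reload steps fit together before touching your real ~/.Renviron, here is a minimal sketch of the same round trip using temporary files. It is not part of my original code: the variable name DEMO_TWITTER_PAT and the stand-in token object are made up for illustration, and only base R functions are used.

```r
## Sketch only: round-trip a saved object through an environment variable.
## Temporary files are used so the real ~/.Renviron is left alone.
demo_token <- list(app = "demo-app")   # hypothetical stand-in for the real token
demo_path <- tempfile(fileext = ".rds")
saveRDS(demo_token, demo_path)

## Forward slashes keep the path intact when the file is parsed
demo_path <- normalizePath(demo_path, winslash = "/")

## Write NAME=value to a temporary .Renviron-style file
demo_renviron <- tempfile()
cat(paste0("DEMO_TWITTER_PAT=", demo_path), file = demo_renviron, fill = TRUE)

## Load the variable, then read the object back from the stored path
readRenviron(demo_renviron)
pat_path <- Sys.getenv("DEMO_TWITTER_PAT")
restored <- readRDS(pat_path)
```

If Sys.getenv() comes back as an empty string in your own session, the line was probably not written to the .Renviron file you expected.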
Get the #data20 Hashtag
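With the setup complete, the next step is to collect the data. The sample code below searches for tweets with the hashtag #data20 and requests up to 25,000 results, not including retweets. Keep in mind that, according to the rtweet documentation, the standard search only reaches back roughly 6 to 9 days, so run it during or shortly after the event you want to track.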
##https://public.tableau.com/profile/jeffrey.shaffer#!/vizhome/DATA20BYTHEMINUTE/DATA20Hashtag
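rt <- search_tweets(
  "#data20", n = 25000, include_rts = FALSE,
  ## one rate-limit window returns at most 18,000 tweets, so wait and retry for the rest
  retryonratelimit = TRUE
)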
location <- users_data(rt)
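Since the viz counts tweets by the minute, it can be handy to check those counts in R before moving to Tableau. The sketch below uses a few invented timestamps as a stand-in for the created_at column of rt; I am assuming that column is a POSIXct date-time, and the by_minute name is just for illustration.

```r
## Toy stand-in for rt$created_at (invented timestamps, in UTC)
created_at <- as.POSIXct(
  c("2020-10-06 15:00:05", "2020-10-06 15:00:40",
    "2020-10-06 15:01:10", "2020-10-06 15:03:59"),
  tz = "UTC"
)

## Drop the seconds so every timestamp is labeled by its minute
minute <- format(created_at, "%Y-%m-%d %H:%M", tz = "UTC")

## Count tweets per minute
by_minute <- as.data.frame(table(minute), stringsAsFactors = FALSE)
names(by_minute) <- c("minute", "tweets")
by_minute
```

Minutes with no tweets do not appear in the result, so a continuous time axis would still need to be filled in, either in R or in Tableau.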
After the data is collected, the next bit of code writes the data to CSV files. Replace the paths and file names below with your desired locations.
## fwrite() comes from the data.table package
library(data.table)
## Change your file paths below as needed
fwrite(rt, file = "D:\\Dropbox\\Data20 Tweets.csv")
fwrite(location, file = "D:\\Dropbox\\Data20 Locations.csv")
The final output is two CSV files: one with the tweets and one with the user information. You can create a relationship (noodle) or join them together in Tableau using the user ID field.
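If you would rather join the two tables in R before exporting, a left join on the user ID does the same job. This is a rough sketch with made-up data: the column names (status_id, user_id, location) are my assumption about what the two tables contain, so check names(rt) and names(location) against your own download.

```r
## Toy tweets table (hypothetical values)
tweets <- data.frame(
  status_id = c("1001", "1002", "1003"),
  user_id = c("u1", "u2", "u1"),
  text = c("first", "second", "third"),
  stringsAsFactors = FALSE
)

## Toy users table, with one user repeated on purpose
users <- data.frame(
  user_id = c("u1", "u2", "u1"),
  location = c("Cincinnati, OH", "London", "Cincinnati, OH"),
  stringsAsFactors = FALSE
)

## Keep one row per user so the join does not multiply tweets
users <- users[!duplicated(users$user_id), ]

## Left join: every tweet is kept, with user columns attached
joined <- merge(tweets, users, by = "user_id", all.x = TRUE)
```

The de-duplication step matters whenever the users table has more than one row per user; without it, each tweet would be repeated once per matching user row.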
Read Twitter Status IDs and Look up Tweets
Another useful tool is looking up specific Twitter Status IDs and the tweets. For example, I used this technique to track the activity of my Tableau Tips last year. I published 194 tips and wanted to see what the most favorited tips were at the end of the year. To do this, I used a Google Sheet that had a list of all of the Tweets, specifically the Tweet Status ID.
In the R code below, I read these Status IDs from a Google Sheet into R, then look up each of them to gather the information about each Tweet. In this case, the Status ID is at the end of the URL, so there is a line of code that parses the Status ID out of the URL link. If you have a simple list of just the Status IDs that you want to track, then you don't need to parse them out of the URL.
## Ex. https://docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU/edit?usp=sharing
#install.packages('gsheet')
library(gsheet)
library(stringr)
tipsheet <- gsheet2tbl('docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU')
##Parse the Status_Id from the URL Link
tipsheet$StatusID <- substr(tipsheet$Link,43,str_length(tipsheet$Link)-5)
status_ids <- tipsheet$StatusID
##Lookup Tweets by Status ID
twt <- lookup_tweets(status_ids,token = bearer_token())
##Save data table to CSV (change your file path below as needed)
library(data.table)
fwrite(twt, file = "D:\\Dropbox\\TableauTip.csv")
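The substr() call above relies on fixed character positions, which works for my sheet because every link has the same layout. If your links vary (different handle lengths, different trailing parameters), a pattern match may be more forgiving. Here is a small sketch using base R; the example links and IDs are invented, and I am assuming the links follow the usual .../status/<digits> form.

```r
## Hypothetical tweet links with different handles and endings
links <- c(
  "https://twitter.com/ExampleUser/status/1234567890123456789?s=20",
  "https://twitter.com/AnotherUser/status/987654321098765432",
  "https://twitter.com/ExampleUser"
)

## Capture the run of digits that follows "/status/"
status_ids <- sub(".*/status/([0-9]+).*", "\\1", links)

## sub() returns the input unchanged when nothing matches,
## so mark links without a status ID as missing
has_id <- grepl("/status/[0-9]+", links)
status_ids[!has_id] <- NA_character_

status_ids
```

The IDs are kept as character strings on purpose: they are too long to store exactly as ordinary R numbers.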
The rtweet package in R
There are a number of other tools available in the rtweet package. For example, you can get followers, mentions, favorites and timelines of a user. You can download members or subscribers of a list. You can retrieve the direct messages you have sent or received. You can download trends on Twitter, globally, using a city name, or even a longitude and latitude. For more information and sample code for doing some of these other things, check out the documentation on the rtweet package here.
Below is all of the code used for this project as a quick reference for you to copy and paste.
## install dev version of rtweet from github
remotes::install_github("ropensci/rtweet")
## install httpuv if not already
if (!requireNamespace("httpuv", quietly = TRUE)) {
  install.packages("httpuv")
}
## Install these packages
install.packages("rtweet", dependencies = TRUE)
install.packages("jsonlite")
install.packages("stringr")
library(jsonlite)
library(rtweet)
## name of twitter app and API settings
app_name <- "[Your Twitter App Name]"
consumer_key <- "[Your Consumer Key]"
consumer_secret <- "[Your Consumer Secret]"
bearer_token <- "[Your Bearer Token]"
## create token
token <- create_token(app_name, consumer_key, consumer_secret)
## print token (just to make sure it's working)
token
## save token to home directory
path_to_token <- file.path(path.expand("~"), ".twitter_token.rds")
saveRDS(token, path_to_token)
## create env variable TWITTER_PAT (with path to saved token)
env_var <- paste0("TWITTER_PAT=", path_to_token)
## save as .Renviron file (or append if the file already exists)
cat(env_var, file = file.path(path.expand("~"), ".Renviron"),
    fill = TRUE, append = TRUE)
## refresh .Renviron variables
readRenviron("~/.Renviron")
## Ex. https://docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU/
#install.packages('gsheet')
library(gsheet)
library(stringr)
tipsheet <- gsheet2tbl('docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU')
## Parse the Status_Id from the URL Link
tipsheet$StatusID <- substr(tipsheet$Link, 43, str_length(tipsheet$Link) - 5)
status_ids <- tipsheet$StatusID
## Lookup Tweets by Status ID
twt <- lookup_tweets(status_ids, token = bearer_token())
## Save data table to CSV (change your file path below as needed)
library(data.table)
fwrite(twt, file = "D:\\Dropbox\\TableauTip.csv")
## This was the code to get the #data20 hashtag for my viz
## https://public.tableau.com/profile/jeffrey.shaffer#!/vizhome/DATA20BYTHEMINUTE/DATA20Hashtag
rt <- search_tweets(
  "#data20", n = 25000, include_rts = FALSE,
  ## one rate-limit window returns at most 18,000 tweets, so wait and retry for the rest
  retryonratelimit = TRUE
)
location <- users_data(rt)
## Change your file paths below as needed
fwrite(rt, file = "D:\\Dropbox\\Data20 Tweets.csv")
fwrite(location, file = "D:\\Dropbox\\Data20 Locations.csv")
I hope you find this information useful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com
Jeffrey A. Shaffer
Follow on Twitter @HighVizAbility