Data + Science

2/22/2021
Get Twitter Data Using R and Tableau

During the 2020 Tableau Conference-ish, I published a viz called #DATA20 BY THE MINUTE, which visualized tweets with the #data20 hashtag counted by the minute. To build it, I used R to collect the data from Twitter. Others might find this useful for tracking their own tweets, hashtags, or other people and topics, so here is a quick blog post on how to get this data and bring it into Tableau.

Initiate the Code

To use this code, you will need a Twitter account and a (free) Twitter Developer App, which you can set up on the Twitter Developer site. After creating an app, you will get an API Key, an API Secret Key and a Bearer Token. The code that downloads the data needs these credentials; in the code below, the API Key and API Secret Key are the consumer key and consumer secret. Note: scroll to the bottom of this blog post if you want to copy and paste all of the code at once.

## install the dev version of rtweet from GitHub (optional; note that
## install.packages("rtweet") below overwrites it with the CRAN release)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ropensci/rtweet")

## install httpuv if not already (used for browser-based authorization)
if (!requireNamespace("httpuv", quietly = TRUE)) {
  install.packages("httpuv")
}

## install these packages
install.packages("rtweet", dependencies = TRUE)
install.packages("jsonlite")
install.packages("stringr")
install.packages("data.table")  ## provides fwrite(), used to save the CSVs
library(jsonlite)
library(rtweet)

## general info: rtweet's authentication vignette
vignette("auth", package = "rtweet")

## name of Twitter app and API settings
app_name <- "[Your Twitter App Name]"
consumer_key <- "[Your Consumer Key]"
consumer_secret <- "[Your Consumer Secret]"
bearer_token <- "[Your Bearer Token]"

## create token
token <- create_token(app_name, consumer_key, consumer_secret)

## print token (just to make sure it's working)
token

Note: In the R code above, replace the values of app_name, consumer_key, consumer_secret and bearer_token with your own, inside the quotes and without the brackets. Every request sent to Twitter must include a token, so save the token to a file and store its path in the TWITTER_PAT environment variable, where rtweet can find it in later R sessions.

## save token to home directory
path_to_token <- file.path(path.expand("~"), ".twitter_token.rds")
saveRDS(token, path_to_token)

## create env variable TWITTER_PAT (with path to saved token)
env_var <- paste0("TWITTER_PAT=", path_to_token)

## save as .Renviron file (or append if the file already exists)
cat(env_var, file = file.path(path.expand("~"), ".Renviron"),
  fill = TRUE, append = TRUE)

## refresh .Renviron variables
readRenviron("~/.Renviron")
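A quick way to confirm the setup worked is to check that TWITTER_PAT now points to an existing token file. The check_twitter_pat() helper below is hypothetical (it is not part of rtweet) and uses only base R; it is a sketch, assuming the token was saved with saveRDS() as above.

```r
## Hypothetical helper (not part of rtweet): confirm that an environment
## variable is set and points to a file that exists on disk.
check_twitter_pat <- function(var = "TWITTER_PAT") {
  path <- Sys.getenv(var, unset = "")
  if (!nzchar(path)) {
    message(var, " is not set; re-run the .Renviron steps above")
    return(FALSE)
  }
  if (!file.exists(path)) {
    message(var, " points to a missing file: ", path)
    return(FALSE)
  }
  TRUE
}

## TRUE when TWITTER_PAT names an existing file, FALSE (with a message) otherwise
check_twitter_pat()
```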

Get the #data20 Hashtag

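With the setup code in place, the next step is the code that collects the data. The sample code below searches for tweets with the hashtag #data20 and returns up to 25,000 results, excluding retweets. Keep in mind that the standard search API only returns tweets from roughly the past 6 to 9 days, so run the search during or shortly after the event you want to track.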

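## this is the code behind the #DATA20 BY THE MINUTE viz:
## https://public.tableau.com/profile/jeffrey.shaffer#!/vizhome/DATA20BYTHEMINUTE/DATA20Hashtag
## retryonratelimit = TRUE lets rtweet wait out rate limits, which is
## required to return more than 18,000 tweets in a single call
rt <- search_tweets("#data20", n = 25000, include_rts = FALSE,
  retryonratelimit = TRUE)

## user information for the accounts that posted those tweets
location <- users_data(rt)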
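Tableau can aggregate the raw timestamps by minute on its own, but if you want per-minute counts in R first, a base R sketch could look like the one below. It assumes the tweets have a POSIXct created_at column (as rtweet returns) in UTC; count_by_minute() is a hypothetical helper, and the timestamps are made up so the example runs without Twitter access.

```r
## Hypothetical helper: count tweets per minute from a POSIXct created_at column.
count_by_minute <- function(tweets, tz = "UTC") {
  ## Truncate each timestamp to the minute as a text label
  minute <- format(tweets$created_at, "%Y-%m-%d %H:%M", tz = tz)
  ## table() sorts the labels, which is chronological for this format
  counts <- as.data.frame(table(minute), stringsAsFactors = FALSE)
  names(counts) <- c("minute", "tweets")
  counts
}

## Made-up stand-in for rt
sample_rt <- data.frame(
  created_at = as.POSIXct(c("2020-10-06 17:00:05", "2020-10-06 17:00:40",
                            "2020-10-06 17:01:15"), tz = "UTC")
)
count_by_minute(sample_rt)
```

With the real data you would call count_by_minute(rt), optionally passing a local time zone. Minutes with no tweets do not appear in the result.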

After the data is collected, the next bit of code writes the data to CSV files. Replace the paths and file names below with your desired location.

## fwrite() comes from the data.table package
library(data.table)

## change your file paths below as needed
fwrite(rt, file = "D:\\Dropbox\\Data20 Tweets.csv")
fwrite(location, file = "D:\\Dropbox\\Data20 Locations.csv")

The final output is two CSV files: one with the tweets and one (the locations file) with the user information. In Tableau, you can relate them (a relationship, or "noodle") or join them using the user_id field.
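If you would rather combine the two tables in R before bringing them into Tableau, a merge on user_id is one option. This is a sketch with toy data frames: apart from user_id, the column names (status_id, text, screen_name, location) are illustrative assumptions. Because the user table may contain one row per tweet rather than one per user, it is deduplicated first so the join does not repeat tweets.

```r
## Toy stand-ins for the tweets (rt) and users (location) tables
tweets <- data.frame(
  status_id = c("1", "2", "3"),
  user_id   = c("u1", "u2", "u1"),
  text      = c("first", "second", "third"),
  stringsAsFactors = FALSE
)
users <- data.frame(
  user_id     = c("u1", "u2", "u1"),  # u1 repeated: one row per tweet
  screen_name = c("alice", "bob", "alice"),
  location    = c("Paris", "", "Paris"),
  stringsAsFactors = FALSE
)

## Keep the first row for each user so every tweet matches at most one user row
users_unique <- users[!duplicated(users$user_id), ]

## Left join: keep every tweet and add user columns where available
joined <- merge(tweets, users_unique, by = "user_id", all.x = TRUE)
joined
```

Without the deduplication step, the two tweets from u1 would each match both u1 rows, giving five rows instead of three. duplicated() is used instead of unique() because repeated user rows can differ in other columns.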

Read Twitter Status IDs and Look up Tweets

Another useful technique is looking up specific tweets by their Twitter status IDs. For example, last year I used it to track the activity of my Tableau Tips. I published 194 tips and wanted to see which tips were favorited most by the end of the year. To do this, I used a Google Sheet that listed all of the tweets, specifically each tweet's status ID.

In the R code below, I read the status IDs from a Google Sheet into R and then look up each of them to gather information about each tweet. Here the status ID sits at the end of each tweet's URL, so one line of code parses it out of the link: substr() keeps everything from character 43 onward and drops the last five characters, which only works if every link has the same layout. If you already have a plain list of status IDs, you don't need this parsing step.

## example sheet:
## https://docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU/edit?usp=sharing
#install.packages("gsheet")
library(gsheet)
library(stringr)
tipsheet <- gsheet2tbl("docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU")

## parse the status ID from the URL link
tipsheet$StatusID <- substr(tipsheet$Link, 43, str_length(tipsheet$Link) - 5)
status_ids <- tipsheet$StatusID

## look up tweets by status ID (bearer_token() calls rtweet's function,
## not the bearer_token string set earlier)
twt <- lookup_tweets(status_ids, token = bearer_token())

## save data table to CSV (change your file path below as needed)
library(data.table)
fwrite(twt, file = "D:\\Dropbox\\TableauTip.csv")
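Two follow-up steps can be sketched in base R: pulling status IDs out of links by pattern instead of by fixed position, and ranking the looked-up tweets by favorites to find the most popular tips. Both helpers, extract_status_id() and top_favorited(), are hypothetical, and the links and counts are made up. The ranking assumes a favorite_count column, which rtweet's tweet data includes.

```r
## Hypothetical helper: pull the numeric ID that follows "/status/" in each URL.
## Returns NA for links without a status ID. IDs stay as character because
## 19-digit IDs exceed the integer precision of R's doubles.
extract_status_id <- function(links) {
  ids <- rep(NA_character_, length(links))
  has_id <- grepl("/status/[0-9]+", links)
  ids[has_id] <- sub(".*/status/([0-9]+).*", "\\1", links[has_id])
  ids
}

## Hypothetical helper: return the n most-favorited tweets, highest count first.
top_favorited <- function(tweets, n = 10) {
  ordered <- tweets[order(tweets$favorite_count, decreasing = TRUE), ]
  head(ordered, n)
}

## Made-up links, including one with a trailing query string
links <- c(
  "https://twitter.com/someuser/status/1234567890123456789?s=20",
  "https://twitter.com/another_handle/status/987654321",
  "not a tweet link"
)
extract_status_id(links)

## Made-up stand-in for the lookup_tweets() result (twt)
sample_twt <- data.frame(
  status_id      = c("101", "102", "103", "104"),
  text           = c("Tip A", "Tip B", "Tip C", "Tip D"),
  favorite_count = c(12L, 85L, 40L, 7L),
  stringsAsFactors = FALSE
)
top_favorited(sample_twt, n = 2)
```

With the real data, tipsheet$StatusID <- extract_status_id(tipsheet$Link) could stand in for the substr() line, and top_favorited(twt, n = 10) would list the ten most-favorited tips.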

The rtweet package in R

There are a number of other tools in the rtweet package. For example, you can get the followers, mentions, favorites and timeline of a user; download the members or subscribers of a list; retrieve the direct messages you have sent or received; and download Twitter trends globally, for a city name, or for a latitude and longitude. For more information and sample code for these tasks, check out the rtweet package documentation.

Below is all of the code used for this project as a quick reference for you to copy and paste.

## install the dev version of rtweet from GitHub (optional; note that
## install.packages("rtweet") below overwrites it with the CRAN release)
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}
remotes::install_github("ropensci/rtweet")

## install httpuv if not already
if (!requireNamespace("httpuv", quietly = TRUE)) {
  install.packages("httpuv")
}

## install these packages
install.packages("rtweet", dependencies = TRUE)
install.packages("jsonlite")
install.packages("stringr")
install.packages("data.table")
library(jsonlite)
library(rtweet)
library(data.table)

## general info: rtweet's authentication vignette
vignette("auth", package = "rtweet")

## name of Twitter app and API settings
app_name <- "[Your Twitter App Name]"
consumer_key <- "[Your Consumer Key]"
consumer_secret <- "[Your Consumer Secret]"
bearer_token <- "[Your Bearer Token]"

## create token
token <- create_token(app_name, consumer_key, consumer_secret)

## print token (just to make sure it's working)
token

## save token to home directory
path_to_token <- file.path(path.expand("~"), ".twitter_token.rds")
saveRDS(token, path_to_token)

## create env variable TWITTER_PAT (with path to saved token)
env_var <- paste0("TWITTER_PAT=", path_to_token)

## save as .Renviron file (or append if the file already exists)
cat(env_var, file = file.path(path.expand("~"), ".Renviron"),
  fill = TRUE, append = TRUE)

## refresh .Renviron variables
readRenviron("~/.Renviron")

## example sheet:
## https://docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU/
#install.packages("gsheet")
library(gsheet)
library(stringr)
tipsheet <- gsheet2tbl("docs.google.com/spreadsheets/d/15_ikKizR52ugsZW2G0pCcmVW1_2qWzOd6dkGg9bBJuU")

## parse the status ID from the URL link
tipsheet$StatusID <- substr(tipsheet$Link, 43, str_length(tipsheet$Link) - 5)
status_ids <- tipsheet$StatusID

## look up tweets by status ID (bearer_token() calls rtweet's function,
## not the bearer_token string set above)
twt <- lookup_tweets(status_ids, token = bearer_token())

## save data table to CSV (change your file path below as needed)
fwrite(twt, file = "D:\\Dropbox\\TableauTip.csv")

## this was the code to get the #data20 hashtag for my viz:
## https://public.tableau.com/profile/jeffrey.shaffer#!/vizhome/DATA20BYTHEMINUTE/DATA20Hashtag
rt <- search_tweets("#data20", n = 25000, include_rts = FALSE,
  retryonratelimit = TRUE)
location <- users_data(rt)

## change your file paths below as needed
fwrite(rt, file = "D:\\Dropbox\\Data20 Tweets.csv")
fwrite(location, file = "D:\\Dropbox\\Data20 Locations.csv")

I hope you find this information useful. If you have any questions, feel free to email me at Jeff@DataPlusScience.com

Jeffrey A. Shaffer

Follow on Twitter @HighVizAbility
