Text analysis
Basic workflow for text analysis
Obtain your text sources
Text data can come from many sources: web sites, Twitter, databases, PDF documents, and digital scans of printed materials. The easier it is to convert your text data into digitally stored text, the cleaner your results and the fewer transcription errors you will have.
Practicing tidytext with song titles
    library(tidyverse)
    library(acs)
    library(tidytext)
    library(here)
    set.seed(1234)
    theme_set(theme_minimal())

Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-fundamentals-and-sentiment-analysis")

Today let’s practice our tidytext skills with a basic analysis of song titles.
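As a rough sketch of what a basic tidytext analysis of titles might look like (this is not the exercise's actual code), the example below tokenizes a few song titles into one word per row, drops common stop words, and counts what remains. The `titles` data frame and its contents are invented for illustration; the exercise itself works with its own data set.

```r
library(dplyr)
library(tidytext)

# Hypothetical song titles, made up for this illustration only
titles <- tibble(
  song_id = 1:3,
  title = c("New York, New York", "Sweet Home Alabama", "Georgia on My Mind")
)

title_words <- titles %>%
  # one row per word; unnest_tokens() lowercases and strips punctuation by default
  unnest_tokens(output = word, input = title) %>%
  # remove common stop words using the stop_words data set shipped with tidytext
  anti_join(stop_words, by = "word") %>%
  # tally how often each remaining word appears across all titles
  count(word, sort = TRUE)

title_words
```

Keeping the result in a one-token-per-row data frame means the usual dplyr and ggplot2 verbs can be applied directly to the word counts.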
Practicing sentiment analysis with Harry Potter
    library(tidyverse)
    library(tidytext)
    library(harrypotter)
    set.seed(1234)
    theme_set(theme_minimal())

Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-fundamentals-and-sentiment-analysis")

Load Harry Potter text
Run the following code to download the harrypotter package:
Practicing tidytext with Hamilton
    library(tidyverse)
    library(tidytext)
    library(ggtext)
    library(here)
    set.seed(123)
    theme_set(theme_minimal())

About seven months ago, my wife and I became addicted to Hamilton.

My name is Alexander Hamilton

I admit, we were quite late to the party.
Supervised classification with text data
    library(tidyverse)
    library(tidymodels)
    library(tidytext)
    set.seed(1234)
    theme_set(theme_minimal())

A common task in social science involves hand-labeling sets of documents for specific variables (e.g. manual coding). In previous years, this required hiring a set of research assistants and training them to read and evaluate text by hand.
Predicting song artist from lyrics
    library(tidyverse)
    library(tidymodels)
    library(stringr)
    library(textrecipes)
    library(themis)
    library(vip)
    set.seed(123)
    theme_set(theme_minimal())

Run the code below in your console to download this exercise as a set of R scripts.

    usethis::use_course("cis-ds/text-analysis-classification-and-topic-modeling")

Beyoncé and Taylor Swift at the 2009 MTV Video Music Awards.
Topic modeling
    library(tidyverse)
    library(tidymodels)
    library(tidytext)
    library(textrecipes)
    library(topicmodels)
    library(here)
    library(rjson)
    library(tm)
    library(tictoc)
    library(appa)
    set.seed(1234)
    theme_set(theme_minimal())

Typically when we search for information online, there are two primary methods:
Keywords - use a search engine and type in words that relate to whatever it is we want to find
Links - use the networked structure of the web to travel from page to page.