Text and Data Mining Basics
Text and data mining (TDM) is a research technique for discovering and extracting patterns in large data sets. TDM can be applied to articles, newspapers, books, websites, social media, and other resources. The type of analysis varies by project and might focus on sentiment, common phrases, word frequency, or word associations, among other things.
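To make two of these analyses concrete, here is a minimal sketch of word frequency and common-phrase (bigram) counting in Python using only the standard library. It is illustrative, not the method used by any project listed below. The function names (`tokenize`, `word_frequencies`, `phrase_frequencies`), the simple regex tokenizer, the small stopword list, and the sample sentences are all hypothetical choices made for this example; real projects usually rely on more careful tokenization and established stopword lists.

```python
import re
from collections import Counter

# Hypothetical, illustrative stopword list (not a standard list).
STOPWORDS = frozenset({"the", "a", "an", "of", "to", "and", "in", "on"})


def tokenize(text):
    """Lowercase text and split it into alphabetic words (keeping simple contractions)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def word_frequencies(docs, stopwords=STOPWORDS):
    """Count how often each non-stopword appears across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(t for t in tokenize(doc) if t not in stopwords)
    return counts


def phrase_frequencies(docs, n=2):
    """Count n-word phrases (n-grams) within each document, never across documents."""
    counts = Counter()
    for doc in docs:
        tokens = tokenize(doc)
        counts.update(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return counts


if __name__ == "__main__":
    # Invented sample sentences, standing in for a real corpus.
    docs = [
        "The influenza epidemic closed schools in the city.",
        "Health officials said the influenza epidemic was spreading.",
        "Schools reopened after the epidemic.",
    ]
    print(word_frequencies(docs).most_common(3))
    # [('epidemic', 3), ('influenza', 2), ('schools', 2)]
    print(phrase_frequencies(docs).most_common(2))
    # [('the influenza', 2), ('influenza epidemic', 2)]
```

Stopwords are removed for single-word counts but kept for phrases, since short function words are often part of meaningful phrases. Phrases are counted per document so that the last word of one text is never joined to the first word of the next.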
TDM Project Examples
- An Epidemiology of Information: A Digging into Data Challenge Project. This project, carried out by the Department of History at Virginia Tech, uses text and data mining to explore how newspapers may have shaped public opinion of the 1918 influenza pandemic.
- Covid Sentiment Analysis on Twitter. This project aimed to identify trends in public sentiment about COVID-19 in Washington and Florida, and to determine how these sentiments may have affected the spread of the disease.
- Gender and Authorship (1665-2011). This University of Washington project applied text and data mining to JSTOR content to examine gender differences in authorship across several academic disciplines.
- Rescued History. This project, from the University of Illinois at Urbana-Champaign, used topic modeling and data visualization to analyze hundreds of thousands of documents from JSTOR and HathiTrust in order to learn more about the historical experiences of African American women.
- Six Basic Emotional Arcs of Storytelling. Andrew Reagan and a group of scientists at the Computational Story Laboratory in Vermont used data mining on more than 1,700 stories to identify the six most common emotional arcs in storytelling.