TDM Concepts
-
Tidy Data: A paper on "data tidying", the practice of structuring datasets to facilitate analysis.
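As a rough illustration of what structuring a dataset for analysis can look like, the sketch below reshapes a small "wide" table (one column per year) into a "long" one with a single observation per row. The table contents, the column names, and the helper `tidy` are hypothetical and chosen only for this example; they are not taken from the paper.

```python
# Hypothetical "wide" table: one row per country, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def tidy(rows, id_column, variable_name, value_name):
    """Reshape wide rows into long rows, one observation per row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            long_rows.append(
                {
                    id_column: row[id_column],
                    variable_name: column,
                    value_name: value,
                }
            )
    return long_rows


tidy_rows = tidy(wide_rows, "country", "year", "count")
for observation in tidy_rows:
    print(observation)
```

In the reshaped rows, each variable (country, year, count) has its own key and each row describes one observation, which is usually easier to filter, group, and plot.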
Text Analysis Methods
-
Digital Methods and Tools: A guide to the most common digital methods and tools for text mining, including text analysis and visual presentation and analysis. It also covers data cleaning and management and links to more example TDM projects and to TDM-related platforms and software packages.
TDM Tools
-
Apache OpenNLP: A machine-learning-based toolkit for processing natural language text.
-
Beautiful Soup Library for Python: A Python library that makes it easier to extract data from HTML or XML files.
-
OpenRefine: An open source tool for cleaning up data and transforming it from one format to another.
-
Python: A general-purpose programming language that can be used to build websites and analyze data.
-
R: A free software environment for statistical computing and graphics.
-
spaCy: A free, open-source library for natural language processing in Python.
-
WordStat: Content analysis and text mining software.
TDM Tutorials
-
Codecademy: A free course on learning to code in Python for data extraction.
-
Data Mining in Python: An introduction to data mining with the Python programming language. Topics include dimensionality reduction, clustering, k-Means, DBSCAN, and more.
-
Data Mining in R: A tutorial on data mining in R using a broad range of algorithms, including machine learning methods. This resource also offers information on laws and policies that affect data mining.
-
Plotting and Programming in Python: An introductory lesson on programming with Python 3 for people with little or no programming experience.
-
The Programming Historian: A variety of lessons covering text and data mining, Python, data manipulation and management, and more.
-
Text Mining in R: A short video from the "Data Mining in R" tutorial.
-
Tools for Data Mining: A short video from the "Data Mining in Python" tutorial covering popular text and data mining tools.
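For readers who want a feel for one of the techniques named in the "Data Mining in Python" entry above, here is a minimal sketch of k-Means clustering in plain Python. It is not taken from any of the listed tutorials: the function name `k_means`, the sample points, and the starting centroids are all assumptions made for illustration, and real projects would more likely rely on an established library.

```python
import math


def k_means(points, centroids, iterations=10):
    """Basic k-Means (Lloyd's algorithm) with caller-supplied starting centroids."""
    centroids = list(centroids)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(
                range(len(centroids)),
                key=lambda i: math.dist(point, centroids[i]),
            )
            clusters[nearest].append(point)

        # Update step: move each centroid to the mean of its cluster.
        new_centroids = []
        for i, cluster in enumerate(clusters):
            if cluster:
                mean = tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
                new_centroids.append(mean)
            else:
                new_centroids.append(centroids[i])  # keep an empty cluster's centroid

        if new_centroids == centroids:
            break  # assignments have stabilized
        centroids = new_centroids
    return centroids, clusters


# Hypothetical sample data: two visibly separate groups of 2-D points.
points = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
centroids, clusters = k_means(points, centroids=[(0.0, 0.0), (10.0, 10.0)])
print(centroids)
print(clusters)
```

Starting centroids are passed in explicitly so the result is reproducible; production implementations usually choose them randomly or with a seeding heuristic.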