Text Data Mining

Resources for working with text as data, including corpus preparation, tutorials, data sources, and lists of tools.

What is Text & Data Mining?

Data mining is a research technique that uses computational analysis to uncover patterns in large data sets. Its applications range from machine learning to GIS and mapping to business intelligence. Because the data involved varies so widely, data mining is harder to pin down as a single set of techniques.

Text mining is the process of deriving information from textual data. Text mining techniques might include sentiment analysis, network analysis, word frequency distributions, pattern recognition, tagging/annotation, information extraction, and the production of granular taxonomies or ontologies.
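To make one of these techniques concrete, here is a minimal Python sketch of a word frequency distribution. It is an illustration only: the function name `word_frequencies`, the sample sentence, and the very simple tokenizing rule are all assumptions made for this example, and a real project would likely use a dedicated text analysis library and a more careful tokenizer.

```python
import re
from collections import Counter


def word_frequencies(text, top_n=5):
    """Return the top_n most common word tokens in text (hypothetical helper)."""
    # Lowercase the text, then treat runs of letters and apostrophes as words.
    # This is a deliberately crude tokenizer, assumed here for illustration.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)


# Made-up sample text, not drawn from any real corpus.
sample = (
    "Text mining derives information from text. "
    "Mining text at scale reveals patterns."
)
print(word_frequencies(sample, top_n=2))
# prints [('text', 3), ('mining', 2)]
```

The same counting step scales from one sentence to a whole corpus; what usually changes is the preparation around it, such as removing very common words or normalizing spelling.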

These analytic techniques are useful in many scholarly fields, from the humanities to the sciences, where data can be "mined" both from large non-text datasets and from text databases of the published literature (Source: UMass Amherst Libraries).

Questions? Contact us by emailing data.services@nyu.edu, or fill out our consultation request form and we'll get back to you.

A warning on copyright!

Before collecting all the data you can, check the copyright and policies of the database, website, or social media platform you plan to mine. Many platforms, including library systems, do NOT allow users to mine their materials. For more information, see the Libraries' guide on Applying Fair Use.

Examples of TDM Projects