Research Guides: Text Data Mining: Getting started

What is Text & Data Mining?

Data mining is a research technique that uses computational analysis to uncover patterns in large data sets. Its applications range from machine learning to GIS and mapping to business intelligence. Because it spans so many data types, data mining is hard to pin down to a single set of techniques.

Text mining is the process of deriving information from textual data. Text mining techniques might include sentiment analysis, network analysis, word frequency distributions, pattern recognition, tagging/annotation, information extraction, and the production of granular taxonomies or ontologies.
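As a small illustration of one technique named above, a word frequency distribution, here is a minimal Python sketch. The function name word_frequencies, the sample sentence, and the stopword list are assumptions made for this example; a real project would likely use a fuller tokenizer and a standard stopword list.

```python
import re
from collections import Counter

def word_frequencies(text, stopwords=frozenset()):
    """Count lowercase word tokens in `text`, skipping any in `stopwords`.

    Hypothetical helper for illustration; the regex is a deliberately
    simple tokenizer that keeps only runs of letters and apostrophes.
    """
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(t for t in tokens if t not in stopwords)

# Made-up sample text and stopword list, used only to show the idea.
sample = ("Text mining derives information from text. "
          "Mining text at scale reveals patterns.")
freqs = word_frequencies(sample, stopwords={"from", "at"})

# Show the two most frequent remaining words with their counts.
print(freqs.most_common(2))  # [('text', 3), ('mining', 2)]
```

The same counts can feed later steps such as comparing vocabularies across documents or plotting the most frequent terms.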

This kind of analytic tool is useful in numerous scholarly fields, from the humanities to the sciences, where useful data can be "mined" from large non-text datasets and from text databases of the published literature (Source: UMass Amherst Libraries).

Questions? Contact us by emailing data.services@nyu.edu, or fill out our consultation request form and we'll get back to you.

A warning on copyright!

Before grabbing all the data you can, you need to check the copyright and policies of the database, website, or social media platform you plan on mining. Many platforms, including library systems, do NOT allow users to mine their materials. For more information, check out the Libraries guide on Applying Fair Use.

Examples of TDM Projects

A comprehensive and quantitative comparison of text-mining in 15 million full-text articles versus their corresponding abstracts
Comparing abstracts and full-text articles.
Low Cost Text Mining as a Strategy for Qualitative Researchers (PDF)
Qualitative methods paper on how to use TDM for qualitative studies.
Practical Text Mining for Trend Analysis: Ontology to visualization in Aerospace Technology
Methods paper on how to use TDM for aerospace technology.
A Text Mining and Multidimensional Sentiment Analysis of Online Restaurant Reviews
Using TDM to analyze restaurant reviews.
Text Mining of Judicial System's Corpora via Clause Elements (PDF)
Law paper on how TDM can help legal practitioners and research scholars trace desired information and identify all cases related to their case.
Thumbs up?: sentiment classification using machine learning techniques
Computational linguistics project using TDM.
Transfer learning for biomedical named entity recognition with neural networks
Deep learning + biomedical project.

Communities and Initiatives at New York University

Non-NYU Communities and Projects

CC

Original work in this LibGuide is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
