Qualitative Data Analysis: Text as Data Workshop

This guide provides information and guidance for researchers interested in conducting qualitative data analysis.

Text as Data Workshop

An introduction to text analysis for literature and to working with text as data in the humanities, with a basic introduction to relevant software packages. This workshop will cover:

  • gathering text corpora,
  • copyright considerations,
  • data cleaning,
  • an introduction to some computational software tools,
  • reading and analyzing the output of topic modeling and cluster analysis, and
  • a general overview of common questions asked in computational literary studies.

Presented with grateful consultation from Dr. David Hoover, NYU English Department. 

General Information

For assistance, reach out by chat below or submit a request

We can be reached by email at

Join our Discord server

If you've met with us before, tell us how we're doing

Service Desk and Chat

Staffed Hours: Spring 2023
   Mondays:     12pm - 5pm
   Tuesdays:    12pm - 5pm
   Wednesdays:  12pm - 5pm
   Thursdays:   12pm - 5pm
   Fridays:     12pm - 5pm

Class Materials


  • A Companion to Digital Literary Studies, ed. Susan Schreibman and Ray Siemens. Oxford: Blackwell, 2008. 
  • Cordell, Ryan. “‘Q i-jtb the Raven’: Taking Dirty OCR Seriously,” Book History 20 (2017), 188-225.
  • Eder, Maciej, Jan Rybicki, and Mike Kestemont. “Stylometry with R: A Package for Computational Text Analysis.” The R Journal, 2016, 8(1): 107-21.
  • Fish, Stanley. “Mind Your P’s and B’s: The Digital Humanities and Interpretation,” The New York Times, January 23, 2012.
  • Hoover, David L. “Argument, Evidence, and the Limits of Digital Humanities,” from Debates in the Digital Humanities 2016, ed. Matthew K. Gold and Lauren F. Klein. Minneapolis, MN: Univ. of Minnesota Press, 2016.
  • Hoover, David L. “Modes of Composition in Henry James: Dictation, Style, and What Maisie Knew,” Henry James Review 35(3), 2014: 257-77.
  • Hoover, David L. “Textual Analysis,” in the MLA volume Literary Studies in the Digital Age.
  • Nair, Goutam. “Text Mining 101.” KDnuggets: Analytics, Big Data, Data Mining, and Data Science, KDnuggets, July 2016.
  • Graham, Shawn, et al. “Getting Started with Topic Modeling and MALLET.” Programming Historian, 2 Sept. 2012.