
Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes
This course offers an introduction to extracting and organizing textual and tabular data using the Optical Character Recognition (OCR) software Tesseract and ABBYY FineReader. Using OCR can significantly reduce manual data entry and enables digital analysis of non-digital materials.
Software: ABBYY FineReader 14, Tesseract 3
Duration: 90 min

Room description:

During the Fall 2021 semester, some tutorials are held remotely and require NYU sign-on to access, while others are held in person without a remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites: None
Skills Taught / Learning Outcomes:
  • Extraction of text from images of documents using ABBYY FineReader and Tesseract
  • Training ABBYY to recognize new fonts from historical documents
  • Image preparation and conversion for OCR readers
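As a rough illustration of one common image-preparation step, binarization (converting a grayscale scan to pure black and white before OCR), the Python sketch below computes a global Otsu threshold. It is not taken from the class materials and may differ from what the class demonstrates. It assumes the image is already loaded as a list of rows of 8-bit grayscale values, and the function names `otsu_threshold` and `binarize` are hypothetical.

```python
from typing import List


def otsu_threshold(pixels: List[int]) -> int:
    """Return a global Otsu threshold (0-255) for a flat list of 8-bit grayscale values.

    Hypothetical helper: pixels <= threshold form the dark class (ink),
    pixels > threshold form the light class (paper).
    """
    # Build a 256-bin histogram of intensities.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1

    total = len(pixels)
    sum_all = sum(i * h for i, h in enumerate(hist))

    sum_dark = 0
    weight_dark = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet, threshold is meaningless
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is now in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance; Otsu picks the t that maximizes it.
        between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: List[List[int]], threshold: int) -> List[List[int]]:
    """Map each pixel to 0 (black) if <= threshold, else 255 (white)."""
    return [[0 if p <= threshold else 255 for p in row] for row in image]


if __name__ == "__main__":
    # Toy 2x4 "scan": dark ink (10) on light paper (200).
    scan = [[10, 200, 200, 10], [200, 10, 200, 200]]
    t = otsu_threshold([p for row in scan for p in row])
    print(t, binarize(scan, t))
```

A single global threshold works reasonably for evenly lit scans; unevenly lit or stained historical pages often benefit from adaptive (local) thresholding instead, which this sketch does not attempt.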
Class Materials:

Slides: https://nyu-dataservices.github.io/ExtractingOCR/

Related Classes:

Introduction to ATLAS.ti

Introduction to NVivo

Introduction to R

Additional Training Materials:

Official how-to videos

NYU Libraries ABBYY guide

Feedback: bit.ly/feedbackds

Upcoming sessions for this tutorial