ABBYY FineReader Tutorial

A guide to extracting text from documents with ABBYY FineReader using OCR (Optical Character Recognition)

Welcome

ABBYY FineReader is optical character recognition (OCR) software that converts images of text documents and tables into editable, machine-readable text. It can also convert image files and non-searchable PDFs into common office formats such as .docx and .pptx, as well as into searchable PDFs. The program recognizes nearly one hundred languages and can process multilingual documents.

This guide explains how to:

  • Prepare documents suitable for working with ABBYY FineReader.
  • Set up ABBYY FineReader to detect and reproduce simple and complex documents.
  • Train the program to recognize characters and create a user pattern for more accurate output.
  • Check and edit the text output.
  • Export the results into the user’s desired format (.txt, .html, .docx, .pdf, .csv, etc.).

Where to Access ABBYY at NYU

Members of the NYU community with access to Bobst Library can access ABBYY at the following workstations:

  • Mac workstations at the Digital Studio, connected to the feeder scanners on the north wall of the Studio (Mac version of ABBYY only)
  • Windows OS workstations 16-19 (walk-in or by reservation) at the Data Services Research Commons
  • Windows OS workstations in room 617 (by reservation only for group/team use; contact data.services@nyu.edu)

 

Note that the Windows version of ABBYY currently has a more extensive and fully developed interface than the Mac version. The tutorial that follows is based on the Windows version.

Researchers can also purchase a license for individual use; ABBYY offers a discounted educational license.