
ABBYY FineReader Tutorial

A guide to using ABBYY FineReader for text extraction from documents using OCR (Optical Character Recognition).

The ABBYY OCR Interface

Step 1. Once ABBYY FineReader 14 is open, you will see a general navigation menu.

Here is a screenshot of how that menu appears:

Screen capture of the ABBYY startup dashboard.

Step 2. Select the menu option "Convert to Other Formats"

This launches the image editor interface, where you can adjust bounding boxes, choose an output format, and train languages or fonts.

Here is a screenshot of the image editor interface:

Screen capture of the main interface used by ABBYY to show a document that a user wishes to apply optical character recognition to.

In the image editor interface, the three main windows you will be working in are:

  1. The image pane, located to the left on the screen,
  2. The text pane, located to the right, and
  3. The zoom pane located at the bottom of the screen.

The default language for ABBYY is English.

Step 3. Introduction to the Output Ribbon Toolbar

Above ABBYY’s original output in the right-hand pane, there is an icon for Microsoft Word and a toolbar for selecting the output type you want. Here is a screenshot of that toolbar:

Screen capture of the small toolbar located at the top of the ABBYY document interface.

The default output for a file is Microsoft Word, and it can be changed from the dropdown menu to alternative formats such as .rtf and .txt.

Next to that, there is a box that says Editable copy. This option, along with the Send option, controls how the output will look in the text pane.

Clicking on the image icon will allow you to remove or include images in the output.

The icon to the right of that offers you the choice to keep or ignore headers and footers during the text recognition process.

The next section explains how to use the tools in ABBYY FineReader to identify the text, image, and table areas that determine what content gets extracted from each page.