
ABBYY FineReader Tutorial

A guide to using ABBYY FineReader to extract text from documents with OCR (Optical Character Recognition).

Adjusting Areas

Step 5. Adjusting Existing Text Areas

By default, ABBYY draws each text area tightly around the text, so separate paragraphs often end up in separate boxes. If the boxes are of the same type, you can expand one box to cover all of them by clicking and dragging its corners.

Here is a screenshot of ABBYY's first-pass division of a page into areas:

Screen capture of the ABBYY interface showing the premade boxes around text or image areas that are created by default when a new document is first loaded


This screenshot shows the result of combining multiple text areas:

Screen capture of the ABBYY interface showing the result when a user has combined multiple capture boxes into just a few.
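Conceptually, expanding one box until it covers its neighbors produces the smallest rectangle that encloses all of them. The Python sketch below models this with a hypothetical `Box` type in page pixel coordinates and a hypothetical `expand_to_cover` helper; it only illustrates the geometry and is not ABBYY's internal representation or any ABBYY API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Hypothetical axis-aligned area in page pixel coordinates."""
    left: int
    top: int
    right: int
    bottom: int


def expand_to_cover(boxes):
    """Return the smallest rectangle that contains every box in `boxes`."""
    if not boxes:
        raise ValueError("need at least one box")
    return Box(
        left=min(b.left for b in boxes),
        top=min(b.top for b in boxes),
        right=max(b.right for b in boxes),
        bottom=max(b.bottom for b in boxes),
    )


# Three paragraph boxes stacked down the page (made-up coordinates).
paragraphs = [Box(100, 120, 900, 300), Box(100, 330, 880, 520), Box(110, 550, 900, 700)]
print(expand_to_cover(paragraphs))
# Box(left=100, top=120, right=900, bottom=700)
```

Because the result is always a rectangle, it also takes in any blank space between and around the original boxes, which is why this approach breaks down for irregularly shaped text.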


The default shape of an area is a rectangle, but sometimes the part of a document you want to recognize does not fit in a simple rectangle. Suppose you want all the text in one text area, as in this screenshot of irregularly shaped text that a rectangular text area cannot easily capture:

Screen capture of the ABBYY interface showing the difficulty of merging capture-area boxes when the text is an irregular shape.

Expanding a box as described above will not work here because of the irregular shape. Instead, click on the area you want to expand, and a floating toolbar like this one appears:

Screen capture of the popup ribbon toolbox that appears when a text-capture area is highlighted.

The two dashed-box icons with the plus and minus signs add a part to, or cut a part from, the area you clicked on. Click the dashed box with the plus sign to draw an extra section that connects the separate pieces into a single text area.

Note: If you merge areas that sit side by side, the text pane reads straight across them as though they were one column, so each line of the left column runs into the matching line of the right. If you want two separate columns, keep them in two separate text areas.
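A simplified way to see why is to think about reading order: within one area, text is read line by line across the area's full width. The Python model below is only an illustration under that assumption; the fragment coordinates and the `read_one_area` and `read_two_areas` helpers are hypothetical and do not represent ABBYY's actual layout engine. It shows one area spanning two columns interleaving them, while two areas keep each column intact.

```python
# Hypothetical text fragments on a two-column page: (x, y, text).
fragments = [
    (100, 10, "Left line 1"), (500, 10, "Right line 1"),
    (100, 30, "Left line 2"), (500, 30, "Right line 2"),
]


def read_one_area(frags):
    """Read one area top to bottom, joining fragments left to right on each line."""
    rows = {}
    for x, y, text in frags:
        rows.setdefault(y, []).append((x, text))
    return [" ".join(text for _, text in sorted(row)) for _, row in sorted(rows.items())]


def read_two_areas(frags, split_x):
    """Read everything left of split_x as one area, then the rest as a second area."""
    left = [f for f in frags if f[0] < split_x]
    right = [f for f in frags if f[0] >= split_x]
    return read_one_area(left) + read_one_area(right)


print(read_one_area(fragments))
# ['Left line 1 Right line 1', 'Left line 2 Right line 2']
print(read_two_areas(fragments, split_x=300))
# ['Left line 1', 'Left line 2', 'Right line 1', 'Right line 2']
```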

As before, you can use the zoom pane to adjust the areas more precisely. The same techniques work for picture areas.

Step 6. Working with Tables

Table areas, however, show different options when you click on them. Here is a screenshot of the ribbon toolbar that appears:

Screen capture of the ABBYY ribbon toolbar that pops up when a table capture box is highlighted.

This toolbar lets you divide the table into rows and columns, delete separators, and have ABBYY analyze the area's table structure automatically so you don't have to draw the divisions by hand.


As you can see in this image, ABBYY did not recognize this page of the document as a table, despite its table-like layout of rows and two columns:

Screen capture of ABBYY interface showing what happens when the software fails to adequately capture a table on a page and needs user intervention to improve the capture.


Using the Table tool on the toolbar, we can draw a table area over the table, shown in blue in this image:

Screen capture showing a blue table area drawn around a table in a text.


The new table area has not yet been divided into columns or rows, so we have to do this ourselves using the pop-up toolbar.

Select the tool that shows a table with a magic wand in front of it. ABBYY will try to guess where the row and column lines belong and draw them inside the blue table area, as in this screenshot:

Screen capture of ABBYY interface showing a properly created table capture area with divisions where table rows and columns are.

The software has now placed most of the columns and rows where they belong, but there are still a couple of mistakes. For example, the zoom pane shows that ABBYY has created an extra row where there should not be one.

To fix this, select the icon with the red X on the pop-up toolbar, then move your cursor to the line you want to remove and click on it. This deletes the dividing line.
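ABBYY does not document how its table analysis guesses where lines go. One common generic technique is a projection profile: count the ink in each pixel column (or row) of the area and place a separator in the middle of each blank gap. The Python sketch below applies that idea to a tiny hypothetical bitmap where `#` is ink; the function names and the `min_gap` threshold are assumptions for illustration, not ABBYY's method.

```python
def column_profile(grid):
    """Count ink pixels in each pixel column of a 0/1 grid."""
    return [sum(row[c] for row in grid) for c in range(len(grid[0]))]


def row_profile(grid):
    """Count ink pixels in each pixel row of a 0/1 grid."""
    return [sum(row) for row in grid]


def find_separators(profile, min_gap=2):
    """Return the center index of each blank run at least `min_gap` long.

    Blank runs touching either edge of the area are margins, not
    separators, so they are skipped.
    """
    separators = []
    start = None  # index where the current blank run began
    for i, count in enumerate(profile):
        if count == 0:
            if start is None:
                start = i
        elif start is not None:
            if start > 0 and i - start >= min_gap:
                separators.append((start + i - 1) // 2)
            start = None
    return separators  # a run still open here touches the far edge


page = [
    "###...####",
    "##....###.",
    "..........",
    "###...##..",
]
grid = [[1 if ch == "#" else 0 for ch in line] for line in page]

print(find_separators(column_profile(grid)))          # [4]
print(find_separators(row_profile(grid), min_gap=1))  # [2]
```

Under this approach, a blank line inside a single cell produces a gap that looks just like a real row boundary, which is one plausible way an automatic guess ends up with an extra row like the one described above.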

Time spent adjusting the position and number of the recognition areas that ABBYY auto-detects pays off: it produces much better results and means less time spent fixing the output afterward.