
Tesseract OCR Software Tutorial

A step-by-step guide to using the open-source Tesseract software to perform optical character recognition (OCR) on a text corpus.

Software to Install and Items to Download

Helpful Utilities for Preparing Documents

A number of utilities are helpful for preparing document files for use in Tesseract. Many standard image-editing tools (Adobe's, for example) can be used. The tools listed below are open source and work well on macOS. Many of them can be installed from the command line.
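After installing tools from the command line, a quick way to confirm that they are actually available is to check whether each executable can be found on your PATH. The Python sketch below does this with the standard-library `shutil.which`. The helper name `find_missing_tools` is hypothetical, and `tesseract` is used as the example tool; add the names of any other utilities you installed.

```python
import shutil


def find_missing_tools(tool_names):
    """Return the names in tool_names that cannot be found on the PATH."""
    # shutil.which returns None when no matching executable is on the PATH
    return [name for name in tool_names if shutil.which(name) is None]


if __name__ == "__main__":
    # "tesseract" is the OCR engine itself; extend this list as needed
    missing = find_missing_tools(["tesseract"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All tools found.")
```

If a tool you just installed is reported as missing, opening a new terminal window (so the updated PATH is loaded) is often enough to fix it.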

Sample Documents for Learning Tesseract

When downloading these documents, note where on your computer they are saved and whether you have renamed them; you will need the correct file location and name when running Tesseract on them.
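Because Tesseract needs the exact location of each file, it can help to check that a downloaded document is where you expect before running OCR on it. The Python sketch below assumes the basic command-line form `tesseract <image> <output base>`, which writes the recognized text to `<output base>.txt`. The helper name `tesseract_command` and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def tesseract_command(image_path, output_base):
    """Build a basic Tesseract command after confirming the input file exists."""
    # Expand "~" so paths like "~/Downloads/..." resolve to your home folder
    image = Path(image_path).expanduser()
    if not image.is_file():
        raise FileNotFoundError(
            f"No file at {image}; check where it was saved and whether it was renamed."
        )
    # Basic form: tesseract <image> <output base>; text goes to <output base>.txt
    return ["tesseract", str(image), str(output_base)]
```

For example, `subprocess.run(tesseract_command("~/Downloads/sample_page.png", "sample_page"), check=True)` would run Tesseract on a (hypothetical) downloaded file, or stop with a clear message if the file has been moved or renamed.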