Tesseract OCR Software Tutorial
A step-by-step guide for users to learn how to use Tesseract open-source software for performing optical character recognition (OCR) on a text corpus.
Tesseract needs high-quality images with clear, prominent text. Converting the file to TIFF already handles some image improvement, but if your image is skewed or hard to read, it is best to edit it before running Tesseract.
ImageMagick is free software for transforming images on the command line. Some of these transformations can be applied while you convert the image to TIFF, as described in the previous section. textcleaner is a script that extends ImageMagick and is a robust command-line tool for editing images: it can process scanned documents and clean up the background and other aspects of the image. Like most scripts, it has a fixed usage order: the options, chosen according to how you want to manipulate the image, come before the input and output file names.
Note that the textcleaner commands in this tutorial assume that the textcleaner script is in your current working directory.
Note: you will not need every parameter every time. Each parameter has a default value, so you only have to specify the ones you want to change.
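Because every parameter has a default, a textcleaner call only needs the flags you want to change. The Python sketch below illustrates that idea with a hypothetical helper, `build_textcleaner_cmd`, which is not part of textcleaner or ImageMagick. It assumes the script is run as `./textcleaner` from the current directory and that the input and output files follow the options. The flag letters follow textcleaner's usage message; in particular, -c, -t, -T, and -b are the letters for the cropping, threshold, trim, and background-color parameters.

```python
from typing import Dict, List

# Flags described in this guide, in the order the guide lists them.
FLAG_ORDER = ["r", "l", "c", "e", "f", "o", "t", "s", "S", "a", "T", "p", "b", "F", "i"]
SWITCHES = {"T"}  # -T (trim) takes no value


def build_textcleaner_cmd(infile: str, outfile: str,
                          options: Dict[str, object],
                          script: str = "./textcleaner") -> List[str]:
    """Build an argument list for textcleaner (hypothetical helper).

    Only flags present in `options` are emitted; every omitted flag
    falls back to the script's own default value.
    """
    unknown = sorted(set(options) - set(FLAG_ORDER))
    if unknown:
        raise ValueError(f"unknown textcleaner flag(s): {unknown}")
    cmd = [script]
    for flag in FLAG_ORDER:
        if flag not in options:
            continue  # not set: textcleaner uses its default
        value = options[flag]
        if flag in SWITCHES:
            if value:  # include -T only when it is switched on
                cmd.append(f"-{flag}")
        else:
            # Multi-number values (such as the crop offset) should be passed
            # as a string written exactly as you would type it on the command line.
            cmd += [f"-{flag}", str(value)]
    cmd += [infile, outfile]  # input and output files come after the options
    return cmd
```

For example, `build_textcleaner_cmd("toolkit.tif", "toolkit_clean.tif", {"r": "cw", "f": 25, "T": True})` (hypothetical file names) returns a list you could pass to Python's `subprocess.run(cmd, check=True)`. Building a list rather than one string avoids shell-quoting problems with file names that contain spaces.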
Here is what each parameter does and the values it accepts:
-r (cw, ccw, or n): rotates the image 90 degrees clockwise (cw) or counterclockwise (ccw); the default, n, applies no rotation.
-l (p or l): sets the document layout to portrait (p) or landscape (l); the default is p (portrait).
-c (one, two, or four numbers): cropping offset applied after rotating the image; one number crops all sides equally, two numbers set the crop for the top/bottom and the sides, respectively, and four numbers set how much is cropped from the left, top, right, and bottom, respectively.
-e (none, stretch, or normalize): enhances the image's brightness before cleaning.
-f (integer > 0): size of the filter used to clean the background.
-o (integer >= 0): offset of the filter, used to reduce noise.
-t: text smoothing threshold.
-s (float >= 0): how much to sharpen the image, in pixels.
-S (integer >= 0): color saturation, expressed as a percent.
-a (number >= 0): alternate text smoothing.
-T: trims the background around the outer part of the image.
-p (integer >= 0): adds a border (padding) around the image.
-b: changes the background color; the default is white.
-F (integer >= 0): fuzz value used when determining the background color.
-i (1 or 2): one-way or two-way inversion; the default is no inversion.
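The value ranges above are easy to check before you run the script. The Python sketch below uses two hypothetical functions, `check_options` and `crop_offsets`, which are not part of textcleaner. `check_options` tests option values against the ranges listed above. `crop_offsets` expands a one-, two-, or four-number cropping offset (textcleaner's -c) into left, top, right, and bottom amounts, following this guide's description, in which two numbers mean top/bottom and then sides. The sketch does not reproduce textcleaner's own argument parsing, so the script remains the final authority on what it accepts.

```python
from numbers import Real
from typing import Dict, List, Sequence, Tuple

# Allowed values, transcribed from the parameter list in this guide.
CHOICES = {
    "r": ("cw", "ccw", "n"),
    "l": ("p", "l"),
    "e": ("none", "stretch", "normalize"),
    "i": (1, 2),
}

# flag -> (smallest allowed value, integer required?)
NUMERIC = {
    "f": (1, True),    # integer > 0
    "o": (0, True),    # integer >= 0
    "s": (0, False),   # float >= 0
    "S": (0, True),    # integer >= 0
    "a": (0, False),   # number >= 0
    "p": (0, True),    # integer >= 0
    "F": (0, True),    # integer >= 0
}


def crop_offsets(values: Sequence[int]) -> Tuple[int, int, int, int]:
    """Expand 1, 2, or 4 crop numbers into (left, top, right, bottom)."""
    values = list(values)
    if not all(isinstance(v, int) and not isinstance(v, bool) for v in values):
        raise ValueError("crop offsets must be integers")
    if len(values) == 1:
        v = values[0]
        return (v, v, v, v)
    if len(values) == 2:
        # As this guide describes it: first number = top/bottom, second = sides.
        top_bottom, sides = values
        return (sides, top_bottom, sides, top_bottom)
    if len(values) == 4:
        left, top, right, bottom = values
        return (left, top, right, bottom)
    raise ValueError("crop offsets take 1, 2, or 4 numbers")


def check_options(options: Dict[str, object]) -> List[str]:
    """Return a list of problems; an empty list means all checked values are valid."""
    problems: List[str] = []
    for flag, value in options.items():
        if flag in CHOICES:
            allowed = CHOICES[flag]
            if value not in allowed:
                problems.append(f"-{flag}: {value!r} is not one of {allowed}")
        elif flag in NUMERIC:
            minimum, integer_only = NUMERIC[flag]
            is_number = isinstance(value, Real) and not isinstance(value, bool)
            if not is_number or (integer_only and not isinstance(value, int)):
                kind = "an integer" if integer_only else "a number"
                problems.append(f"-{flag}: expected {kind}, got {value!r}")
            elif value < minimum:
                problems.append(f"-{flag}: {value} is below the minimum {minimum}")
        elif flag == "c":
            try:
                crop_offsets(value)
            except (TypeError, ValueError) as exc:
                problems.append(f"-c: {exc}")
    return problems
```

Returning a list of problems, rather than stopping at the first one, lets you fix every bad value before running the script.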
Using the CWS Toolkit image, which was not scanned and is askew, we can enter the following command to clean it up: