Tesseract works best with high-quality images in which the text stands out clearly. Converting the file to TIFF already takes care of some of the image improvement, but if your image is skewed or hard to read, it is best to edit it beforehand.
ImageMagick is free software well suited to transforming images on the command line, and some of these transformations can be applied while you convert the image to TIFF, as in the previous section. Built on top of ImageMagick is textcleaner, a Bash script by Fred Weinhaus for cleaning up images from the command line. It can process scanned documents, cleaning up the background and other aspects of the image. Which options you pass to textcleaner depends on how you want to manipulate the image.
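As a rough sketch of cleaning up an image during the TIFF conversion itself, the hypothetical function `to_ocr_tiff` below converts a photo to grayscale and straightens it in a single ImageMagick call. It assumes ImageMagick 7, whose entry point is `magick` (on ImageMagick 6, substitute `convert`). The 40% deskew threshold is only a common starting value and may need tuning for your images.

```bash
#!/usr/bin/env bash
# Hypothetical helper: convert an image to a grayscale, deskewed TIFF for Tesseract.
# Assumes ImageMagick 7 (`magick`); on ImageMagick 6 replace `magick` with `convert`.
to_ocr_tiff() {
  local input=$1 output=$2
  # -colorspace Gray : drop color information
  # -deskew 40%      : straighten slightly rotated text (40% is a typical starting threshold)
  # +repage          : reset the virtual canvas left over from deskewing
  magick "$input" -colorspace Gray -deskew 40% +repage "$output"
}
```

Usage: `to_ocr_tiff photo.jpg photo.tiff` (the file names here are placeholders).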
Note that the following textcleaner command assumes that the script is in your current working directory and is executable.
./textcleaner [-r rotate] [-l layout] [-c cropoff] [-g] [-e enhance] [-f filtersize] [-o offset] [-u] [-t threshold] [-s sharpamt] [-S saturation] [-a adaptblur] [-T] [-p padamt] [-b bgcolor] [-F fuzzval] [-i invert] input_file.* output_file.*
Note: not every parameter is needed every time. Each parameter has a default value, so you only need to specify the ones you want to change.
Here is what each parameter means and which values it accepts:
For example, the CWS Toolkit image is a photo rather than a scan, and it is askew. We can clean it up with this command:
./textcleaner -c "50,250,190,250" -g -e stretch -t 30 -s 2 -u -T Path/to/document/cws_toolkits.jpg cws_toolkits.tiff
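To make the command above easier to adjust, here is the same call written as a Bash function with one comment per option. The function name `clean_cws_toolkit` is hypothetical. The option descriptions paraphrase textcleaner's usage notes as we understand them, so check them against the header comments of your copy of the script.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around the CWS Toolkit example; assumes ./textcleaner
# exists in the current directory and is executable.
clean_cws_toolkit() {
  local input=$1 output=$2
  local opts=(
    -c "50,250,190,250"  # crop offsets in pixels: left,top,right,bottom
    -g                   # convert to grayscale before enhancing
    -e stretch           # enhance brightness/contrast by stretching
    -t 30                # text-smoothing threshold (0-100)
    -s 2                 # sharpening amount
    -u                   # unrotate (deskew) the page
    -T                   # trim background around the outer edge
  )
  ./textcleaner "${opts[@]}" "$input" "$output"
}
```

Usage: `clean_cws_toolkit Path/to/document/cws_toolkits.jpg cws_toolkits.tiff`. Keeping the options in an array lets you comment out or change one option at a time without retyping the whole command.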
There are many other ways you could have cleaned up this image. One drawback of this method compared with a graphical image editor is that finding the right option values takes a lot of trial and error.
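One way to make that trial and error more systematic is to vary a single option while holding the others fixed, then compare the results. The sketch below prints one textcleaner command per `-t` threshold value, reusing the options from the CWS Toolkit example. The function name `sweep_thresholds` and the `<name>_t<value>.tiff` output naming are hypothetical. Printing the commands first lets you review them before running anything.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print one textcleaner command per threshold value.
# Output files are named <input basename>_t<threshold>.tiff in the current directory.
sweep_thresholds() {
  local input=$1; shift
  local base=${input##*/}   # strip the directory part
  base=${base%.*}           # strip the extension
  local t
  for t in "$@"; do
    # %q quotes each path so the printed line is safe to pass back to bash
    printf './textcleaner -c 50,250,190,250 -g -e stretch -t %s -s 2 -u -T %q %q\n' \
      "$t" "$input" "${base}_t${t}.tiff"
  done
}
```

Usage: `sweep_thresholds Path/to/document/cws_toolkits.jpg 10 30 50 | bash` produces three variants. You can then run, for example, `tesseract cws_toolkits_t30.tiff stdout` on each one to see which threshold gives the cleanest text.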