Petter Envall

Tesseract

Mar 01, 2015

I decided to check out the OCR software Tesseract (and its source code).

Building it from source didn't exactly “just work”. These are some of the things I had to do:

  • Add a “tesseract prefix” path to my .profile:
export TESSDATA_PREFIX=/usr/local/share
  • Install the dependency Leptonica
  • To fix “missing image support”, configure Leptonica like this when building it:
$ ./configure LDFLAGS=-L/opt/local/lib/ CFLAGS=-I/opt/local/include/
  • Download and extract the English language pack (it was said to be included with the standard source, but hey), and copy its data to the “prefix” directory:
$ sudo mkdir /usr/local/share/tessdata
$ sudo cp tesseract-ocr/tessdata/*.* /usr/local/share/tessdata/
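
A quick way to sanity-check the steps above is to confirm that the English data file actually ended up under the prefix. The following is only a sketch: check_tessdata is a made-up helper name, and it assumes the English pack ships a file called eng.traineddata that lives in a tessdata directory directly below the prefix, as in the layout used here.

```shell
# Sketch: verify that the English language data sits below the prefix.
# check_tessdata is a hypothetical helper, not something Tesseract provides.
check_tessdata() {
  # Use the first argument if given, otherwise fall back to TESSDATA_PREFIX.
  prefix="${1:-${TESSDATA_PREFIX:-}}"
  if [ -z "$prefix" ]; then
    echo "TESSDATA_PREFIX is not set" >&2
    return 1
  fi
  # Assumed layout: <prefix>/tessdata/eng.traineddata
  if [ ! -f "$prefix/tessdata/eng.traineddata" ]; then
    echo "missing $prefix/tessdata/eng.traineddata" >&2
    return 1
  fi
  echo "found $prefix/tessdata/eng.traineddata"
}
```

Calling check_tessdata with no argument checks whatever TESSDATA_PREFIX points to in the current shell, and check_tessdata /usr/local/share checks the directory used in this post.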

Then, the following worked:

$ tesseract foo.png result
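
The second argument is an output base name, so the recognized text lands in result.txt. A small wrapper that runs the OCR and prints the text right away could look roughly like this. It is a sketch: ocr_print is a hypothetical name, and the default base name "result" is only borrowed from the command above.

```shell
# Sketch: OCR one image and print the recognized text.
# ocr_print is a hypothetical helper name, not part of Tesseract.
ocr_print() {
  image="$1"
  base="${2:-result}"   # output base name; tesseract appends .txt
  if ! command -v tesseract >/dev/null 2>&1; then
    echo "tesseract not found on PATH" >&2
    return 1
  fi
  # Run the OCR; stop here if tesseract reports a failure.
  tesseract "$image" "$base" || return 1
  cat "$base.txt"
}
```

For example, ocr_print foo.png does the same as the command above and then dumps result.txt to the terminal.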