Tesseract Training on Mac OSX
November 28, 2015
The process of training Tesseract OCR on Mac OSX can be pretty confusing. The Tesseract documentation isn't great, and most of the existing blog posts and information online cover training Tesseract on Linux or Windows. My hope is to document my process from start to finish to give others more complete guidance as they work through it.
Start by installing Tesseract with its training tools using Homebrew. Until a recent contribution by Ryan Baumann, that involved building from source, but thanks to him we can just use:
brew install --with-training-tools tesseract
There can be a few gotchas (particularly with font locations, --fonts_dir), so be sure to refer to his post for more information. (Article)
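Once the install finishes, it can help to confirm that the training binaries actually landed on your PATH. The snippet below is a minimal sketch: the list of tool names is my assumption about what a training-tools build provides, and check_training_tools is a hypothetical helper name, not part of Tesseract or Homebrew.

```shell
# Report which Tesseract training binaries are on the PATH.
# The tool list is an assumption about what a training-tools build installs.
check_training_tools() {
  for tool in text2image unicharset_extractor mftraining cntraining combine_tessdata; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "found:   $tool"
    else
      echo "missing: $tool"
    fi
  done
}

check_training_tools
```

If any tool shows up as missing, the install most likely ran without the training-tools option.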
Next, make sure you've got the latest version of the Java SE Development Kit installed on your Mac. (Latest version at time of writing)
Then download jTessBoxEditor, the training tool we'll be using. Grab the latest release, which is named in the format jTessBoxEditor-version.zip, and unzip it. (jTessBoxEditor Download)
Then start jTessBoxEditor by running the following command in the unzipped jTessBoxEditor directory:
java -Xms4096m -Xmx4096m -jar jTessBoxEditor.jar
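The -Xms and -Xmx flags set the JVM's initial and maximum heap size, so the command above reserves 4 GB for jTessBoxEditor. On a machine with less memory you may want a smaller value. Here is a small sketch that builds the same command for a given heap size in megabytes; jtess_cmd is a hypothetical helper, and whether a smaller heap is enough for your training images is something you would need to check yourself.

```shell
# Build the jTessBoxEditor launch command for a given heap size in megabytes.
# jtess_cmd is a hypothetical helper; the default of 4096 mirrors the command above.
jtess_cmd() {
  heap_mb="${1:-4096}"
  echo "java -Xms${heap_mb}m -Xmx${heap_mb}m -jar jTessBoxEditor.jar"
}

# Print the command for a 2 GB heap.
jtess_cmd 2048
```

Copy the printed line and run it from the unzipped jTessBoxEditor directory, just like the original command.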