Tesseract-OCR quick guide
1. Installation
GitHub repository: https://github.com/tesseract-ocr/tesseract
Binary packages: https://digi.bib.uni-mannheim.de/tesseract/
Other installation methods (building from source): https://tesseract-ocr.github.io/tessdoc/Compiling.html
- 1.1 Installing with apt
sudo apt install tesseract-ocr
# Language data is installed by default to: /usr/share/tesseract-ocr/4.00/tessdata/
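- 1.2 Other installation methods
CentOS: the Tesseract project only provides installation packages for Windows and Ubuntu, so on CentOS and similar distributions it has to be compiled from source. If you would rather avoid that, use a ready-made Docker image: https://github.com/tesseract-shadow/tesseract-ocr-re
docker pull tesseractshadow/tesseract4re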
docker run -dt --name t4re tesseractshadow/tesseract4re
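Once the container is running, an image on the host can be recognized by copying it in and calling tesseract through docker exec. The following is a minimal sketch, assuming the container name t4re from the command above; the function name ocr_in_container, the /tmp staging path and the default language eng are illustrative choices, not something the image documents.

```shell
# Hypothetical helper: OCR a host image inside the running t4re container.
# Usage: ocr_in_container IMAGE [LANG]
ocr_in_container() (
    image="$1"
    lang="${2:-eng}"                 # assumed default language
    name="$(basename "$image")"
    # Stage the image inside the container (/tmp is an assumed scratch path),
    # then run tesseract there. "stdout" as the output base prints the text
    # instead of writing a file.
    docker cp "$image" "t4re:/tmp/$name" &&
        docker exec t4re tesseract "/tmp/$name" stdout -l "$lang"
)
```

For example, ocr_in_container ./1.png chi_sim would print the recognized text on the host terminal, provided that language is available inside the container.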
- 1.3 Verifying the installation
tesseract --version
tesseract --list-langs
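In a script it can be handy to test for one specific language rather than read the list by eye. A small sketch, assuming the language list is printed one name per line; has_lang is a hypothetical helper name.

```shell
# Hypothetical helper: succeed if the given language appears in the
# output of `tesseract --list-langs`.
# Usage: has_lang chi_sim
has_lang() {
    # Merge stderr into stdout in case a release prints the list there.
    # -x matches whole lines only, -F treats the name as a literal string.
    tesseract --list-langs 2>&1 | grep -qxF -- "$1"
}
```

A typical call would be: has_lang chi_sim || echo "chi_sim is not installed" >&2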
2. Examples
- 2.1 Importing traineddata files
Trained models: https://github.com/tesseract-ocr/tessdata_fast
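# TESSERACT_HOME is not set by the installer; it is assumed here to point at the tessdata directory
cd "$TESSERACT_HOME/configs"
curl -LO https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/chi_sim.traineddata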
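The same download can be wrapped in a function so that other languages are fetched the same way. This is only a sketch: fetch_traineddata is a hypothetical name, and the URL pattern is simply the one used above with the language name substituted.

```shell
# Hypothetical helper: download LANG.traineddata from tessdata_fast into DIR.
# Usage: fetch_traineddata LANG DIR
fetch_traineddata() (
    lang="$1"
    dir="$2"
    url="https://raw.githubusercontent.com/tesseract-ocr/tessdata_fast/master/$lang.traineddata"
    # -f: fail on HTTP errors instead of saving an error page as the model;
    # -L: follow redirects; -o: write to the target directory.
    mkdir -p "$dir" &&
        curl -fL -o "$dir/$lang.traineddata" "$url"
)
```

For example: fetch_traineddata chi_sim "$TESSERACT_HOME/configs"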
- 2.2 Running image recognition
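# "stdout" prints the recognized text; pass a base name such as "out" instead to write out.txt
tesseract 1.png stdout -l configs/chi_sim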
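To recognize many images in one go, the single command can be put in a loop. A minimal sketch, assuming PNG inputs and the configs/chi_sim language used above as the default; ocr_dir is a hypothetical helper name.

```shell
# Hypothetical helper: OCR every PNG in DIR, writing NAME.txt next to NAME.png.
# Usage: ocr_dir DIR [LANG]
ocr_dir() (
    dir="$1"
    lang="${2:-configs/chi_sim}"
    for img in "$dir"/*.png; do
        # If the glob matched nothing, the pattern itself is left over; skip it.
        [ -e "$img" ] || continue
        # The output base carries no extension; tesseract appends ".txt" itself.
        tesseract "$img" "${img%.png}" -l "$lang" || echo "failed: $img" >&2
    done
)
```

For example, ocr_dir ./scans would leave a .txt file beside each PNG in ./scans and report any image that tesseract could not process.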