OCR: Difference between revisions
Latest revision as of 09:09, 2 December 2024 (Monday)
Tesseract OCR
Build tesseract
Prerequisites[1]:
brew install libgif libjpeg libpng libtiff zlib
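# If brew fails with: Error: xz: undefined method `deny_network_access!' for Formulary::FormulaNamespaceeddce1918855a2fb5cf7427fd5438072::Xz:Class
# then comment out the line `deny_network_access! [:build, :postinstall]` in the formula named in the error (xz), so that it reads `# deny_network_access! [:build, :postinstall]`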
Install leptonica[2]:
# https://stackoverflow.com/questions/40067547/glibtool-on-macbook
brew install libtool automake
git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
./autogen.sh
./configure
make
sudo make install
Install tesseract-ocr:
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure
make
sudo make install
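After both installs finish, a quick check can confirm that the binary ended up on PATH. This is a minimal sketch: `check_tool` is a hypothetical helper name, not part of tesseract. It relies only on the `--version` and `--list-langs` options of the tesseract command line.

```shell
#!/bin/sh
# check_tool is a hypothetical helper (not part of tesseract):
# it succeeds if the named command can be found on PATH.
check_tool() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "$1 found at $(command -v "$1")"
        return 0
    fi
    echo "$1 not found on PATH" >&2
    return 1
}

# Only query tesseract when it is actually installed.
if check_tool tesseract; then
    # Prints the tesseract version and the leptonica version it was built against.
    tesseract --version
    # Lists the languages found in the tessdata directory; this can fail
    # until trained data has been installed, so the failure is tolerated.
    tesseract --list-langs || echo "no language data found yet"
fi
```

If `--list-langs` reports no languages, the trained data has probably not been installed yet.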
Trained data
There are three sets of official .traineddata files, trained at Google, for tesseract versions 4.00 and above. They are available in three separate repositories.
- tessdata_fast (Sep 2017): best “value for money” in speed versus accuracy; integer models.
- tessdata_best (Sep 2017): best results on Google’s evaluation data, but slower; float models. These are the only models that can be used as a base for fine-tune training.
- tessdata (Nov 2016 and Sep 2017): contains the legacy tesseract models from 2016. The LSTM models have been updated with integer versions of the tessdata_best LSTM models. (The Cube-based legacy tesseract models for Hindi, Arabic, etc. have been deleted.)
git clone https://github.com/tesseract-ocr/tessdata_best.git
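The other two model sets can be fetched the same way. The sketch below maps a set name to its repository. `tessdata_repo_url` is a hypothetical helper, the short names `fast`, `best`, and `legacy` are labels chosen here for illustration, and `image.png` and `out` are placeholder file names.

```shell
#!/bin/sh
# tessdata_repo_url is a hypothetical helper: it maps a model set name
# (fast, best, or legacy) to the matching tesseract-ocr repository URL.
tessdata_repo_url() {
    case "$1" in
        fast)   echo "https://github.com/tesseract-ocr/tessdata_fast.git" ;;
        best)   echo "https://github.com/tesseract-ocr/tessdata_best.git" ;;
        legacy) echo "https://github.com/tesseract-ocr/tessdata.git" ;;
        *)      echo "unknown model set: $1" >&2; return 1 ;;
    esac
}

# Example (not run here): shallow-clone the chosen set, then point
# tesseract at the cloned directory.
#   git clone --depth 1 "$(tessdata_repo_url fast)"
#   tesseract image.png out --tessdata-dir ./tessdata_fast -l eng
```

With the example commands, tesseract writes the recognized text to `out.txt`. Setting the `TESSDATA_PREFIX` environment variable to the data directory is an alternative to passing `--tessdata-dir` on each call.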