OCR
Revision as of 03:04, 2 December 2024 (Monday)
Tesseract OCR
Build Tesseract
Install Leptonica[1]:
# https://stackoverflow.com/questions/40067547/glibtool-on-macbook
# pkg-config is used by tesseract's configure script to locate leptonica
brew install libtool automake pkg-config
git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
./autogen.sh
./configure
make
sudo make install
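By default, the make steps above run on a single core. If you want a parallel build, here is a minimal sketch of a small helper that prints the core count to pass to make -j. The function name ncpus is hypothetical. It relies on nproc (GNU coreutils) or sysctl -n hw.ncpu (macOS) and falls back to 1 if neither works.

```shell
# ncpus: hypothetical helper that prints the number of CPU cores.
# Tries nproc (GNU coreutils, typical on Linux), then the macOS sysctl key,
# and falls back to 1 if neither is available.
ncpus() {
  if command -v nproc >/dev/null 2>&1; then
    nproc
  elif n=$(sysctl -n hw.ncpu 2>/dev/null); then
    echo "$n"
  else
    echo 1
  fi
}

# Example use (not run here), in place of the plain "make" step:
#   make -j"$(ncpus)"
```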
Install tesseract-ocr:
git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
./autogen.sh
./configure
make
sudo make install
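A quick post-install check can confirm that the tesseract binary is on PATH before you run tesseract --version or tesseract --list-langs. The helper below is a hypothetical sketch, not part of the Tesseract tooling. It uses only the POSIX command -v builtin.

```shell
# missing_tools: hypothetical helper that prints each named command that is
# not found on PATH. Returns 0 if all are present, 1 if any are missing.
missing_tools() {
  _mt_missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool"
      _mt_missing=1
    fi
  done
  [ "$_mt_missing" -eq 0 ]
}

# Example use (not run here), after "sudo make install":
#   missing_tools tesseract && tesseract --version
# On Linux, a "cannot open shared object file" error at this point usually
# means the dynamic linker cache needs refreshing:
#   sudo ldconfig
```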
Trained data
There are three sets of official .traineddata files, trained at Google, for Tesseract 4.00 and later. Each set is published in its own repository:
- tessdata_fast (Sep 2017): best “value for money” in speed vs. accuracy; integer models.
- tessdata_best (Sep 2017): best results on Google’s eval data, but slower; float models. These are the only models that can be used as the base for fine-tuning.
- tessdata (Nov 2016 and Sep 2017): contains the legacy Tesseract models from 2016. Its LSTM models have been replaced with integer versions of the tessdata_best LSTM models. (The Cube-based legacy models for Hindi, Arabic, etc. have been removed.)
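The choice between the three sets follows the trade-offs listed: fast for speed, best for accuracy or as a fine-tuning base, and tessdata for the legacy engine. The sketch below encodes that choice as a hypothetical helper. The short names fast, best and legacy are illustrative labels, not official identifiers.

```shell
# tessdata_repo_url: hypothetical helper that maps a model-set label to the
# official repository URL. Prints an error on stderr and returns 1 for an
# unknown label.
tessdata_repo_url() {
  case "$1" in
    fast)   echo "https://github.com/tesseract-ocr/tessdata_fast.git" ;;
    best)   echo "https://github.com/tesseract-ocr/tessdata_best.git" ;;
    legacy) echo "https://github.com/tesseract-ocr/tessdata.git" ;;  # legacy + integer LSTM
    *)      echo "unknown model set: $1 (expected fast, best or legacy)" >&2
            return 1 ;;
  esac
}

# Example use (not run here). --depth 1 skips the Git history, which keeps
# the download smaller:
#   git clone --depth 1 "$(tessdata_repo_url fast)"
```

If you plan to fine-tune, pick best, because only those float models can serve as a fine-tuning base.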
git clone https://github.com/tesseract-ocr/tessdata_best.git
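Cloning tessdata_best fetches every language. If you only need one, a minimal sketch is to build the direct GitHub download URL for that single language file. The helper name traineddata_url is hypothetical, and the default branch name main is an assumption worth checking on the repository page. The commented lines show how the file could then be used with the tesseract options --tessdata-dir and --list-langs.

```shell
# traineddata_url: hypothetical helper that builds a direct download URL for
# one <lang>.traineddata file, so the whole repository need not be cloned.
# ASSUMPTION: the repository's default branch is "main"; pass another
# branch name as the third argument if that is wrong.
traineddata_url() {
  repo=$1            # tessdata_fast, tessdata_best or tessdata
  lang=$2            # language code, e.g. eng, deu, chi_sim
  branch=${3:-main}  # assumed default branch
  echo "https://github.com/tesseract-ocr/$repo/raw/$branch/$lang.traineddata"
}

# Example use (not run here):
#   mkdir -p ~/tessdata_best
#   curl -L -o ~/tessdata_best/eng.traineddata "$(traineddata_url tessdata_best eng)"
#   tesseract --tessdata-dir ~/tessdata_best --list-langs
#   tesseract --tessdata-dir ~/tessdata_best image.png out -l eng   # writes out.txt
# For Tesseract 4 and later, setting TESSDATA_PREFIX to the same directory
# is an alternative to --tessdata-dir.
```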