OCR

From WHY42
Revision as of 03:04, 2 December 2024 (Mon), by Riguz

Tesseract OCR

Build Tesseract

Install Leptonica, the image-processing library Tesseract depends on[1]:

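# ./autogen.sh needs libtool and automake on macOS, see:
# https://stackoverflow.com/questions/40067547/glibtool-on-macbook
brew install libtool automake

git clone https://github.com/DanBloomberg/leptonica.git
cd leptonica
# Generate the configure script, then build and install (default prefix /usr/local)
./autogen.sh
./configure
make
sudo make install
cd ..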
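Tesseract's configure script locates Leptonica through pkg-config (module name `lept`), so it can help to confirm that pkg-config sees the freshly installed library before moving on. The sketch below assumes the default prefix /usr/local (no `--prefix` passed to `./configure`), so the `.pc` file is expected under /usr/local/lib/pkgconfig, and it assumes pkg-config itself is installed (e.g. via Homebrew). The helper name `add_pkgconfig_dir` is hypothetical.

```shell
#!/bin/sh
# Sketch: check that pkg-config can see Leptonica before building Tesseract.
# Assumes the default prefix /usr/local; add_pkgconfig_dir is a hypothetical helper.

# Prepend a directory to PKG_CONFIG_PATH unless it is already listed.
add_pkgconfig_dir() {
  case ":${PKG_CONFIG_PATH}:" in
    *":$1:"*) ;; # already present: leave PKG_CONFIG_PATH unchanged
    *) PKG_CONFIG_PATH="$1${PKG_CONFIG_PATH:+:$PKG_CONFIG_PATH}" ;;
  esac
  export PKG_CONFIG_PATH
}

add_pkgconfig_dir /usr/local/lib/pkgconfig

# Leptonica registers itself with pkg-config as the "lept" module.
if command -v pkg-config >/dev/null 2>&1 && pkg-config --exists lept; then
  echo "Leptonica $(pkg-config --modversion lept) found"
else
  echo "Leptonica not visible to pkg-config; check PKG_CONFIG_PATH" >&2
fi
```

If the check fails, exporting the same PKG_CONFIG_PATH in the shell that later runs Tesseract's `./configure` is one way to let it find the library.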

Install tesseract-ocr:

git clone https://github.com/tesseract-ocr/tesseract.git
cd tesseract
# Generate the configure script, then build and install (default prefix /usr/local)
./autogen.sh
./configure
make
sudo make install
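A quick post-install check can confirm the binary is on PATH and report its version. This is a minimal sketch assuming the default install prefix (/usr/local, so the binary lands in /usr/local/bin); `check_tesseract` is a hypothetical helper. On Linux, running `sudo ldconfig` after `make install` is usually needed, otherwise the binary may fail to load the freshly installed shared library.

```shell
#!/bin/sh
# Sketch: post-install sanity check for a source-built Tesseract.
# check_tesseract is a hypothetical helper; on Linux, run `sudo ldconfig` first.

# Print the first line of `tesseract --version`, or a hint on stderr if missing.
check_tesseract() {
  if command -v tesseract >/dev/null 2>&1; then
    # Some versions print the version banner on stderr, so merge the streams.
    tesseract --version 2>&1 | head -n 1
  else
    echo "tesseract not on PATH (make install puts it in /usr/local/bin by default)" >&2
    return 1
  fi
}

if check_tesseract; then
  # Lists the .traineddata languages found in the default tessdata directory.
  tesseract --list-langs
fi
```

A source build does not bundle language models, so `--list-langs` will typically report no languages until .traineddata files such as those described in the Trained data section are installed.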

Trained data

There are three sets of official .traineddata files, trained at Google, for Tesseract 4.00 and above. Each set is published in its own repository:

  • tessdata_fast (Sep 2017): the best “value for money” in speed vs. accuracy; integer models.
  • tessdata_best (Sep 2017): the best results on Google’s eval data, but slower; float models. These are the only models that can be used as a base for fine-tuning.
  • tessdata (Nov 2016 and Sep 2017): contains the legacy Tesseract models from 2016. Its LSTM models have been updated with integer versions of the tessdata_best LSTM models. (The Cube-based legacy models for Hindi, Arabic, etc. have been deleted.)
# Download the float (tessdata_best) models
git clone https://github.com/tesseract-ocr/tessdata_best.git
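To run OCR with the cloned float models instead of the default data directory, Tesseract's `--tessdata-dir` option can be combined with `-l` for the language. The sketch below assumes the clone sits in ./tessdata_best (overridable through the hypothetical TESSDATA_BEST_DIR variable) and that each language is stored as `<lang>.traineddata`, as in the repository; `ocr_best` is an illustrative helper, not part of Tesseract.

```shell
#!/bin/sh
# Sketch: OCR an image with a model from a local tessdata_best clone.
# TESSDATA_BEST_DIR and ocr_best are hypothetical names used for illustration.

TESSDATA_BEST_DIR=${TESSDATA_BEST_DIR:-./tessdata_best}

# ocr_best IMAGE [LANG]: print the recognised text to stdout (LANG defaults to eng).
ocr_best() {
  image=$1
  lang=${2:-eng}
  model="$TESSDATA_BEST_DIR/$lang.traineddata"
  # Fail early with a clear message if the language model is not in the clone.
  if [ ! -f "$model" ]; then
    echo "missing model: $model" >&2
    return 1
  fi
  # "stdout" as the output base writes text to standard output;
  # --oem 1 selects the LSTM engine, which is what tessdata_best models target.
  tesseract "$image" stdout --tessdata-dir "$TESSDATA_BEST_DIR" -l "$lang" --oem 1
}
```

For example, `ocr_best scan.png eng > scan.txt` would write the text of scan.png to scan.txt. Checking for the model file first gives a clearer error than Tesseract's own failure to load the language; `--oem 1` is passed explicitly because tessdata_best ships LSTM models only.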