From: shamoon <4887959+shamoon@users.noreply.github.com>
Date: Mon, 10 Apr 2023 21:04:30 +0000 (-0700)
Subject: Add info re tesseract language codes
X-Git-Tag: v1.14.0~9
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=d872423a7649540d5d23311bc8765b993329e1cc;p=thirdparty%2Fpaperless-ngx.git

Add info re tesseract language codes

Closes #3065
---

diff --git a/docs/configuration.md b/docs/configuration.md
index 046904eaf4..aca9961e28 100644
--- a/docs/configuration.md
+++ b/docs/configuration.md
@@ -1088,10 +1088,13 @@ actual group ID on the host system, which you can get by executing
 : Additional OCR languages to install. By default, paperless comes
 with English, German, Italian, Spanish and French. If your language
 is not in this list, install additional languages with this
-configuration option ([find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)):
+configuration option. You will need to [find the right LangCodes](https://tesseract-ocr.github.io/tessdoc/Data-Files-in-different-versions.html)
+but note that (tesseract-ocr-\* package names)[https://packages.debian.org/bullseye/graphics/]
+do not always correspond with the language codes e.g. "chi_tra" should be
+specified as "chi-tra".
 
 ``` bash
- PAPERLESS_OCR_LANGUAGES=tur ces
+ PAPERLESS_OCR_LANGUAGES=tur ces chi-tra
 ```
 
 Make sure it's a space separated list when using several values.
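
As a rough illustration of the "chi_tra" to "chi-tra" note in this patch, the following sketch turns Tesseract LangCodes into the hyphenated form. The helper name `to_paperless_langs` is hypothetical and not part of paperless-ngx. It assumes the only difference is underscore versus hyphen, which the patch itself says is not guaranteed for every language, so check the result against the Debian package list.

``` bash
#!/bin/sh
# Hypothetical helper (not part of paperless-ngx): replace underscores in
# Tesseract LangCodes with hyphens, as in the chi_tra -> chi-tra example.
# Assumption: underscore vs. hyphen is the only mismatch for these codes.
to_paperless_langs() {
    # "$*" joins all arguments with single spaces, giving the
    # space separated list the option expects.
    printf '%s\n' "$*" | tr '_' '-'
}

PAPERLESS_OCR_LANGUAGES="$(to_paperless_langs tur ces chi_tra)"
echo "PAPERLESS_OCR_LANGUAGES=$PAPERLESS_OCR_LANGUAGES"
```

Codes without an underscore, such as `tur` and `ces`, pass through unchanged.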