Reworks documentation on the max pixels setting

author Trenton Holmes <holmes.trenton@gmail.com>

Mon, 23 May 2022 17:44:33 +0000 (10:44 -0700)

committer Trenton Holmes <holmes.trenton@gmail.com>

Mon, 23 May 2022 17:54:41 +0000 (10:54 -0700)
diff --git a/docs/configuration.rst b/docs/configuration.rst

index 2068a4238f455257120a8e0846565658baec1efd..b7ab978f473b0c11a777ab5641e269bbc348884e 100644
--- a/docs/configuration.rst
+++ b/docs/configuration.rst
@@ -424,14 +424,23 @@ PAPERLESS_OCR_IMAGE_DPI=<num>
      the produced PDF documents are A4 sized.
  
  PAPERLESS_OCR_MAX_IMAGE_PIXELS=<num>
-    Paperless will not OCR images that have more pixels than this limit.
-    This is intended to prevent decompression bombs from overloading paperless.
-    Increasing this limit is desired if you face a DecompressionBombError despite
-    the concerning file not being malicious; this could e.g. be caused by invalidly
-    recognized metadata.
-    If you have enough resources or if you are certain that your uploaded files
-    are not malicious you can increase this value to your needs.
-    The default value is 256000000, an image with more pixels than that would not be parsed.
+    Paperless will raise a warning when OCRing images which are over this limit and
+    will not OCR images which are more than twice this limit.  Note this does not
+    prevent the document from being consumed, but could result in missing text content.
+
+    If unset, this defaults to the value determined by
+    `Pillow <https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.MAX_IMAGE_PIXELS>`_.
+
+    .. note::
+
+        Increasing this limit could cause Paperless to consume additional resources
+        when consuming a file.  Be sure you have sufficient system resources.
+
+    .. caution::
+
+        The limit is intended to prevent malicious files from consuming system resources
+        and causing crashes and other errors.  Only increase this value if you are certain
+        your documents are not malicious and you need the text which was not OCRed.
  
  PAPERLESS_OCR_USER_ARGS=<json>
      OCRmyPDF offers many more options. Use this parameter to specify any
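
The two thresholds described in the reworked text (a warning above the limit, no OCR above twice the limit) can be sketched as follows. This is only an illustration, not Paperless code: the function name `classify_image` and the returned labels are hypothetical, and the default constant is the value Pillow documents for `Image.MAX_IMAGE_PIXELS`.

```python
# Illustrative sketch only; `classify_image` and its labels are hypothetical
# names, not part of Paperless or Pillow.

# Pillow's documented default for Image.MAX_IMAGE_PIXELS (about 89.5 million).
PILLOW_DEFAULT_MAX_PIXELS = int(1024 * 1024 * 1024 // 4 // 3)


def classify_image(width, height, max_pixels=PILLOW_DEFAULT_MAX_PIXELS):
    """Return how an image of the given size relates to the pixel limit."""
    pixels = width * height
    if pixels > 2 * max_pixels:
        # More than twice the limit: the image is not OCRed.
        return "skipped"
    if pixels > max_pixels:
        # Over the limit but within twice the limit: OCR runs with a warning.
        return "warning"
    return "ok"


if __name__ == "__main__":
    for size in [(1000, 1000), (10000, 10000), (20000, 10000)]:
        print(size, classify_image(*size))
```

Under these assumptions, a 10000 x 10000 scan (100 million pixels) exceeds the default limit and would only trigger the warning, while a 20000 x 10000 scan (200 million pixels) exceeds twice the limit and would be left without OCR text unless `PAPERLESS_OCR_MAX_IMAGE_PIXELS` is raised.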