dependabot[bot] [Thu, 4 Jun 2026 21:55:55 +0000 (14:55 -0700)]
Chore(deps): Bump the pre-commit-dependencies group across 1 directory with 2 updates (#12923)
Bumps the pre-commit-dependencies group with 2 updates in the / directory: [https://github.com/astral-sh/ruff-pre-commit](https://github.com/astral-sh/ruff-pre-commit) and [https://github.com/tox-dev/pyproject-fmt](https://github.com/tox-dev/pyproject-fmt).
Updates `https://github.com/astral-sh/ruff-pre-commit` from v0.15.12 to 0.15.15
- [Release notes](https://github.com/astral-sh/ruff-pre-commit/releases)
- [Commits](https://github.com/astral-sh/ruff-pre-commit/compare/v0.15.12...v0.15.15)
Updates `https://github.com/tox-dev/pyproject-fmt` from v2.21.1 to 2.21.2
- [Release notes](https://github.com/tox-dev/pyproject-fmt/releases)
- [Commits](https://github.com/tox-dev/pyproject-fmt/compare/v2.21.1...v2.21.2)
dependabot[bot] [Wed, 3 Jun 2026 15:06:26 +0000 (15:06 +0000)]
Chore(deps-dev): Bump the frontend-eslint-dependencies group across 1 directory with 4 updates (#12913)
Bumps the frontend-eslint-dependencies group with 4 updates in the /src-ui directory: [@typescript-eslint/eslint-plugin](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/eslint-plugin), [@typescript-eslint/parser](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/parser), [@typescript-eslint/utils](https://github.com/typescript-eslint/typescript-eslint/tree/HEAD/packages/utils) and [eslint](https://github.com/eslint/eslint).
Updates `@typescript-eslint/eslint-plugin` from 8.59.1 to 8.60.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/eslint-plugin/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.60.0/packages/eslint-plugin)
Updates `@typescript-eslint/parser` from 8.59.1 to 8.60.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/parser/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.60.0/packages/parser)
Updates `@typescript-eslint/utils` from 8.59.1 to 8.60.0
- [Release notes](https://github.com/typescript-eslint/typescript-eslint/releases)
- [Changelog](https://github.com/typescript-eslint/typescript-eslint/blob/main/packages/utils/CHANGELOG.md)
- [Commits](https://github.com/typescript-eslint/typescript-eslint/commits/v8.60.0/packages/utils)
Updates `eslint` from 10.2.1 to 10.4.0
- [Release notes](https://github.com/eslint/eslint/releases)
- [Commits](https://github.com/eslint/eslint/compare/v10.2.1...v10.4.0)
dependabot[bot] [Wed, 3 Jun 2026 14:16:06 +0000 (14:16 +0000)]
Chore(deps-dev): Bump the frontend-jest-dependencies group (#12908)
Bumps the frontend-jest-dependencies group in /src-ui with 3 updates: [jest](https://github.com/jestjs/jest/tree/HEAD/packages/jest), [jest-environment-jsdom](https://github.com/jestjs/jest/tree/HEAD/packages/jest-environment-jsdom) and [jest-preset-angular](https://github.com/thymikee/jest-preset-angular).
Updates `jest` from 30.3.0 to 30.4.2
- [Release notes](https://github.com/jestjs/jest/releases)
- [Changelog](https://github.com/jestjs/jest/blob/main/CHANGELOG.md)
- [Commits](https://github.com/jestjs/jest/commits/v30.4.2/packages/jest)
Updates `jest-environment-jsdom` from 30.3.0 to 30.4.1
- [Release notes](https://github.com/jestjs/jest/releases)
- [Changelog](https://github.com/jestjs/jest/blob/main/CHANGELOG.md)
- [Commits](https://github.com/jestjs/jest/commits/v30.4.1/packages/jest-environment-jsdom)
Updates `jest-preset-angular` from 16.1.4 to 16.1.5
- [Release notes](https://github.com/thymikee/jest-preset-angular/releases)
- [Changelog](https://github.com/thymikee/jest-preset-angular/blob/main/CHANGELOG.md)
- [Commits](https://github.com/thymikee/jest-preset-angular/compare/v16.1.4...v16.1.5)
Trenton H [Tue, 2 Jun 2026 17:46:29 +0000 (10:46 -0700)]
Fix: Lock AI index during reading and don't index documents many times during a bulk update (#12899)
* Fix: Move LLM index lock outside index dir and skip per-doc tasks on bulk update
Two concurrency bugs from #12893:
[P1] Lock file lived inside LLM_INDEX_DIR. A rebuild calls
shutil.rmtree(LLM_INDEX_DIR), deleting the lock while a worker still
held it. A second worker then acquired a fresh lock on the new path and
ran concurrently, defeating serialisation. Move the lock to
DATA_DIR/locks/llm_index.lock (a new settings constant LLM_INDEX_LOCK)
so rmtree cannot touch it. The locks/ dir is created at settings load
time, matching the existing pattern for LOGGING_DIR.
[P2] document_updated was connected to add_or_update_document_in_llm_index
in apps.py. bulk_update_documents() emits document_updated for every
document in the batch, queuing N per-document LLM tasks, and then also
calls update_llm_index(rebuild=False) once at the end. Pass
skip_ai_index=True when sending document_updated from the bulk path so
the handler skips the per-document enqueue; the existing batch call at
the end of bulk_update_documents is the only LLM update for that path.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: ghost vectors leave KeyError-prone nodes_dict entries after deletion
docstore.delete_document() removes a node from the docstore but leaves its
entry in index_struct.nodes_dict (the FAISS positional-id to node-UUID map).
A subsequent similarity query resolves the ghost position to the deleted UUID,
finds nothing in fetched_nodes_by_id, and raises KeyError inside
_insert_fetched_nodes_into_query_result.
Purge stale nodes_dict entries after each docstore deletion and re-sync the
mutated index_struct into the kvstore so persist() writes the updated mapping.
Dead FAISS vectors remain in the flat index until the next full rebuild
(IndexFlatL2 is append-only); add a try/except KeyError around
retriever.retrieve() as a defensive fallback for any residual ghost positions.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: acquire index lock in query_similar_documents
query_similar_documents() loaded the index and ran the FAISS retriever
without holding the file lock. All write paths (update_llm_index,
llm_index_add_or_update_document, llm_index_remove_document) hold
FileLock(_index_lock_path()), so a concurrent rebuild calling
shutil.rmtree(LLM_INDEX_DIR) while a read is mid-load produces an IOError
or corrupt partial state.
Wrap the load_or_build_index() call and all subsequent retriever work inside
FileLock. The early-return guards (vector_store_file_exists check, empty
allowed_document_ids) remain outside the lock; the DB query for the final
result set also stays outside.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: skip LLM index enqueue on document_updated during version addition
When a document is consumed as a new version of an existing document, the
consumer fires document_consumption_finished (which triggers
add_or_update_document_in_llm_index) and then document_updated for the root
document. Both signals are connected to the same handler, so the root document
was enqueued for LLM indexing twice per version-addition event.
Pass skip_ai_index=True on the consumer's version-addition document_updated
send so the handler's existing guard suppresses the duplicate enqueue.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Test: bulk_update_documents must not enqueue per-doc LLM tasks
With AI enabled, bulk_update_documents() sends document_updated for every
document in the batch. The skip_ai_index=True kwarg (added in the P2 fix)
prevents add_or_update_document_in_llm_index from enqueuing a per-document
task for each one. Only the single update_llm_index call at the end should run.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Debug level log sure
* Update src/paperless_ai/indexing.py
Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
* Apply suggestion from @shamoon
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: shamoon <4887959+shamoon@users.noreply.github.com>
Trenton H [Mon, 1 Jun 2026 20:40:49 +0000 (13:40 -0700)]
Fix: Minor fixes for the AI indexing (#12893)
* Fix: Remove all nodes for multi-chunk documents in update_llm_index incremental path
The existing_nodes dict comprehension keyed on document_id silently dropped all
but the last node per document, so only that one node was deleted when a
modified document was re-indexed, leaving all other chunks as ghost vectors in
the FAISS index. Switch to a defaultdict(list) that collects every node per
document_id, then iterate and delete all of them before inserting fresh nodes.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Wire document_updated signal to LLM index update handler
Connect document_updated to add_or_update_document_in_llm_index in
DocumentsConfig.ready() so REST API edits (PATCH /api/documents/{id}/)
enqueue an LLM vector store update, matching the existing
document_consumption_finished behavior.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Add file lock around FAISS index mutations to prevent concurrent write corruption
Two concurrent Celery workers calling llm_index_add_or_update_document or
llm_index_remove_document each loaded the same on-disk index independently,
made their own change, and the last writer silently overwrote the first's
update. Wrap both functions and the rebuild/persist body of update_llm_index
in a filelock.FileLock keyed on LLM_INDEX_DIR/index.lock. Add a TOCTOU
comment on queue_llm_index_update_if_needed explaining the residual risk
(duplicate rebuild tasks are wasteful but not corrupting because the lock
serialises the actual write).
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Apply _normalize() in extract_unmatched_names to prevent duplicate suggestions
extract_unmatched_names was using .lower() while _match_names_to_queryset
uses _normalize() (which also strips punctuation). A name like "J. Smith"
matched to existing correspondent "J Smith" would still appear in the
unmatched list, causing duplicate object creation.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Skip LLM index update gracefully when document has no indexable content
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Persist empty index when all documents are deleted to clear stale FAISS vectors
The early-return guard in update_llm_index fired before persist() when no
documents existed, leaving a stale on-disk FAISS index that returned phantom
hits for deleted document IDs. Now the guard only returns early for the
incremental (rebuild=False) path when no index exists on disk; the rebuild
path always continues through to persist(), producing an empty clean index.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Chore: Simplify incremental index update — use docs.values() and deduplicate node extend
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Trenton H [Mon, 1 Jun 2026 17:03:27 +0000 (10:03 -0700)]
Fix: Improvements for security around the AI (#12895)
* Fix: Validate and limit chat question input in ChatStreamingView
Add max_length=4000 to ChatStreamingSerializer.q and replace the bare
request.data["q"] read with proper serializer.is_valid(raise_exception=True)
so oversized or missing questions are rejected with HTTP 400 before
reaching the LLM.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Fix: Add defensive prompt framing to mark document content as untrusted
* Also adds a system prompt which is treated higher that this is untrusted stuff
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
Trenton H [Wed, 27 May 2026 16:47:13 +0000 (09:47 -0700)]
Fix: Handle tanvity index lock contention (#12856)
Implements and tests a retry with backoff + jitter for aquring the index update lock. If we still can't get it, dispatch a celery task to handle it later instead (also with retry)
Bumps the npm_and_yarn group with 1 update in the /src-ui directory: [@babel/plugin-transform-modules-systemjs](https://github.com/babel/babel/tree/HEAD/packages/babel-plugin-transform-modules-systemjs).
Updates `@babel/plugin-transform-modules-systemjs` from 7.29.0 to 7.29.4
- [Release notes](https://github.com/babel/babel/releases)
- [Changelog](https://github.com/babel/babel/blob/main/CHANGELOG.md)
- [Commits](https://github.com/babel/babel/commits/v7.29.4/packages/babel-plugin-transform-modules-systemjs)
Trenton H [Mon, 4 May 2026 23:36:05 +0000 (16:36 -0700)]
Feature: Further reduce document importer memory usage (#12707)
* Replaces loaddata with streaming bulk_create
Replaces call_command('loaddata') with a streaming implementation that
reads manifest records one at a time via ijson, accumulates per-model
batches up to --batch-size, and flushes via bulk_create. This reduces
peak memory and no longer scales directly with the size of the import.
* fix(importer): avoid guardian lru_cache poisoning; include M2M through tables in check_constraints
clear_cache() inside the import transaction emptied Django's ContentType
manager cache while fixture PKs were live, causing downstream ContentType
lookups to repopulate guardian's separate @lru_cache(None) with
fixture-PK objects. After the TestCase transaction rolled back to
original PKs, guardian's lru_cache held stale fixture ContentType
objects, causing MixedContentTypeError in unrelated subsequent tests.
Remove clear_cache() since it was defending against a theoretical
stale-cache scenario that doesn't occur in a proper same-install restore.
Fix check_constraints() to explicitly include auto-created M2M through
tables (populated by .set() after bulk_create) alongside the model tables,
addressing the gap where join-table FK violations would have gone
undetected.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
* Excludes the consumer and AnonymousUser from any models which might have a FK relation to it. This prevents orphan things like UI setting, which have a relation to no existing user
* Splits into more sub functions for Sonar
* Improvements to the typing of the new functions
* Coverage for some error cases, and removes handling for pk only models. No need to support these
* Final coverage gaps
---------
Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>