Previously, only server triggered 5xx errors would cause a retry. However,
there are multiple categories of errors from http-client that should
also cause a retry (9xxx errors).
e.g., setting Tika to restart after every processed file can easily
cause either 9003 (connect failed) or 9005 (connection_lost) errors, which
is a temporary condition.
Timo Sirainen [Tue, 21 Apr 2026 11:12:02 +0000 (11:12 +0000)]
lib-sql: sqlite - Apply sqlite_journal_mode via PRAGMA
The sqlite_journal_mode setting was never actually applied to the
database. The code added SQLITE_OPEN_WAL to sqlite3_open_v2() flags,
but that flag is documented as VFS-internal and has no effect on the
journal mode of the database.
Issue a 'PRAGMA journal_mode = <mode>' statement after connecting so
the setting takes effect. Without this, databases remained in the
default rollback journal mode, where readers block on writers and
writers block on readers, causing spurious 'database is locked'
failures under concurrency.
Timo Sirainen [Mon, 27 Apr 2026 10:10:55 +0000 (10:10 +0000)]
lib-storage: Auto-rename non-NFC subscription file entries to NFC on read
Mailbox listing already auto-renames non-NFC mailbox names to NFC form
during iteration. Apply the same treatment to the subscription file:
when mailbox_list_subscriptions_refresh reads a subscription whose name
is not in NFC form, insert the NFC version into the in-memory
subscription tree and rewrite the file on disk to use the NFC name.
The rewrite first adds the NFC subscription and then removes the
non-NFC one, so a crash between the two leaves the subscription intact
under the old name and the next refresh retries. If either subsfile
operation fails, log the error and stop processing further renames.
Timo Sirainen [Mon, 27 Apr 2026 10:10:45 +0000 (10:10 +0000)]
lib-storage: Normalize subscription name to NFC in mailbox_list_set_subscribed
When mailbox_list_normalize_names_to_nfc is enabled, normalize the
subscription name to NFC before passing it to the backend. Ensures that
subscribe/unsubscribe operations on non-NFC mailbox names hit the same
entry as the NFC name, matching the existing NFC handling for mailbox
listing.
Timo Sirainen [Thu, 23 Apr 2026 10:36:26 +0000 (13:36 +0300)]
login-proxy: Fix crash with rawlog and multiplexing during reconnection
If login_proxy_rawlog_dir was enabled, reconnection (especially with
immediate SSL like SMTPS) caused a panic in proxy_rawlog_deinit().
login_proxy_disconnect() destroyed server_input directly, leaving
rawlog_input/rawlog_output pointing to freed streams. On reconnect the
new server_input no longer matched rawlog_input, tripping the
assertions in proxy_rawlog_deinit().
Fix this by having login_proxy_disconnect() tear down the stream chain
(multiplex + rawlog) properly before destroying server_input, and by
clearing pre_rawlog_input/output in proxy_rawlog_deinit().
Also reorder login_proxy_starttls() so multiplex is removed before
rawlog. Multiplex sits on top of rawlog, so rawlog_deinit()'s
server_input == rawlog_input assertion requires multiplex to be gone
first.
proxy_multiplex_deinit() gains a NULL check so it is safe to call
unconditionally from login_proxy_disconnect().
Timo Sirainen [Thu, 23 Apr 2026 10:35:34 +0000 (13:35 +0300)]
lib: Fix truncation of long log messages with event_set_log_message_callback()
event_get_log_message() cached a pointer from str_c(glmctx->log_prefix)
at the top of the function, then kept appending to the same string_t
buffer via str_vprintfa() / str_append(). When the appended data
exceeded the buffer's current allocation, the buffer was reallocated
and the cached pointer was left pointing at the old (truncated) copy.
The stale pointer was then passed to the log_message_callback as
in_message, and also used in the equality check that decides whether
the callback returned its input buffer.
Fix by calling str_c() after each mutation so the pointer always
reflects the current buffer.
Timo Sirainen [Sat, 25 Apr 2026 16:08:32 +0000 (19:08 +0300)]
lib: ostream-multiplex - Take ownership of parent cork across wrap/unwrap
The multiplex's interaction with the parent ostream's cork was loose:
the channel send paths corked the parent on demand and the trailing
o_stream_multiplex_sendv() blindly uncorked it, even when the cork
had been put there by someone outside the multiplex. That made it
ambiguous who owns the parent's cork while a multiplex is wrapping
it, and easy for a caller to leave the parent corked across a wrap
or to cork it directly through the wrapped pointer and have the
multiplex undo it.
Take ownership explicitly:
- A new parent_corked bit tracks the cork state the multiplex
itself has set on the parent. multiplex_cork_parent() asserts
that the observed cork matches the tracked state on every
transition, catching any caller that corks/uncorks the parent
behind the multiplex's back. After the o_stream_(un)cork() call
it re-reads o_stream_is_corked() so a no-op on a closed/errored
stream stays consistent with the tracker.
- The send_packet()/send_stream() transient cork and the matching
uncork at the tail of o_stream_multiplex_sendv() now go through
the helper.
- stream_send_io() in ostream-file.c corks the parent around the
flush callback, which is the only legal external cork. Undo it
on entry to o_stream_multiplex_flush() so the invariant
(parent_corked reflects reality) holds everywhere else.
Transfer the parent's cork across wrap/unwrap. At wrap time, if the
parent is corked, move the cork onto channel 0 - from the caller's
point of view nothing changed, the ostream they got back is corked
and a later o_stream_uncork() on it stays balanced. At destroy time
do the reverse: if channel 0 is corked when it goes away, re-cork
the parent in o_stream_multiplex_try_destroy() before releasing the
multiplex's reference, so a caller still holding the parent ostream
sees the cork preserved.
The transfer cannot go through o_stream_(un)cork(): o_stream_uncork()
flushes data already buffered in the parent (ostream-file.c
buffer_flush(), default cork's o_stream_flush()), which would defeat
the transparency of the transfer - the caller wrapped a corked parent
and expects its buffered bytes to stay buffered until they uncork
channel 0. multiplex_transfer_parent_cork() bypasses the public API
and writes the corked bit directly, asserting first that the parent
is in the expected state and is neither closed nor errored. The
destroy direction uses the same helper for symmetry.
The flush-callback invariant (parent_corked tracks reality) is kept:
the cork is moved off the parent during create, and back on as the
last step of try_destroy() when the multiplex no longer manipulates
the parent.
Adds unit tests covering both transfer directions and the
buffered-parent case that the direct-bit transfer protects against.
Timo Sirainen [Fri, 24 Apr 2026 19:07:06 +0000 (19:07 +0000)]
lib-event: Add event_register_callback_prepend()
A plugin that wants to annotate outgoing events with extra fields has
no good way to get in front of the stats client today. The stats
client registers its event callback in master_service_init(), well
before any module is loaded, and event_register_callback() always
appends - so a plugin's own callback runs after the stats client has
already exported the event.
Add event_register_callback_prepend() for callers that need to run
first. Implementation is a one-line array_push_front() against the
same event_handlers array.
Timo Sirainen [Wed, 22 Apr 2026 12:06:03 +0000 (15:06 +0300)]
lib: iostream_fd_unref() - NULL out the pointer like other unref APIs
iostream_fd_unref() takes a pointer-to-pointer argument but did not
set *_ref to NULL after unreferencing. This left fstream->fd_ref in
both file_istream and file_ostream dangling after close, which made
the fstream->fd_ref != NULL guard in stream_closed() and
i_stream_file_close() ineffective if either function was called more
than once on the same stream. A second call would dereference freed
memory, either hitting the refcount > 0 assertion or corrupting the
heap (detected later as a malloc crash).
Observed in managesieve-login during SSL proxying when the post-login
managesieve process crashed shortly after starting: the iostream_fd
shared between the socketpair istream and ostream got double-unrefed
through iostream_fd_proxy_finished() -> iostream_proxy_unref() ->
iostream_pump_unref() -> o_stream_unref().
libcassandra initializes libcrypto lazily on first SSL use. In processes
that do not load lib-ssl-iostream (e.g. dict-async with a cassandra SSL
dict backend), nobody else passes OPENSSL_INIT_NO_ATEXIT first, so
libcrypto registers atexit(OPENSSL_cleanup).
That handler runs after main() returns, which is after Dovecot has
dlclose()d libdriver_cassandra.so and its libcassandra dependency.
OPENSSL_cleanup() then walks per-thread OpenSSL state holding method
pointers into unmapped libcassandra/libuv code, crashing the process at
exit.
Call OPENSSL_init_crypto(OPENSSL_INIT_NO_ATEXIT, NULL) from
driver_cassandra_init() so that even in minimal processes where
dovecot_openssl_common_global_ref() is never called, libcrypto's unsafe
atexit handler is suppressed. The per-process OpenSSL state is leaked at
exit, which is harmless as the kernel reclaims it.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Timo Sirainen [Tue, 21 Apr 2026 12:34:53 +0000 (12:34 +0000)]
lib-ssl-iostream: Pass OPENSSL_INIT_NO_ATEXIT to OPENSSL_init_ssl()
Prevent libcrypto from registering its own atexit() handler. libcrypto's
handler runs after main() returns, which is after Dovecot has dlclose()d
plugins that may have registered OpenSSL callbacks (e.g. libcassandra).
Accessing those unmapped code pages from OPENSSL_cleanup() crashes the
process at exit.
Dovecot already calls OPENSSL_cleanup() explicitly via
dovecot_openssl_common_global_unref() while the relevant modules are still
loaded, so libcrypto's own atexit handler was redundant and unsafe.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Aki Tuomi [Mon, 20 Apr 2026 11:32:56 +0000 (14:32 +0300)]
lib-dcrypt: Validate hex_to_binary() return in key loading paths
Check return value of hex_to_binary() at all call sites that process
untrusted key material. Previously ignored return value meant invalid
hex input (non-hex characters) would silently produce truncated buffers
rather than an explicit error.
Aki Tuomi [Wed, 8 Apr 2026 12:55:47 +0000 (15:55 +0300)]
lib-dcrypt: Limit key lengths and replace VLA with dynamic buffer
Reject oversized key material early to prevent large datastack/buffer
allocations from untrusted input. DCRYPT_MAX_KEY_BUFFER_SIZE = 16 KB
for raw keys, x2 for hex-encoded fields, x4 for JWK/PEM.
Replace VLA (unsigned char keybuf[keylen]) in
dcrypt_openssl_load_public_key_dovecot_v2() with a pool_datastack_create()
buffer to avoid potential stack overflow.
Aki Tuomi [Sat, 18 Apr 2026 18:00:36 +0000 (21:00 +0300)]
lib-index: Fix dangling index->map pointer when fsck fails in mail_index_map_latest_file()
If mail_index_fsck() returns -1 (e.g. log lock failure), index->map was
left pointing at new_map. The subsequent mail_index_unmap(&new_map) freed
that object, leaving index->map as a dangling pointer.
Fix by restoring index->map = old_map before breaking on fsck error,
mirroring what the success path already does.
Timo Sirainen [Fri, 17 Apr 2026 12:51:02 +0000 (12:51 +0000)]
config: Add __attribute__((weak)) to global variable definitions in all-settings.c
all-settings.c copies global variable definitions (e.g. mail_storage_default_settings,
setting_parser_info structs) verbatim from plugin settings files. The same symbols
are also compiled into the plugin .so files. When doveconf dlopen()'s a plugin at
runtime, ASAN detects duplicate definitions and reports an ODR violation.
Fix by injecting __attribute__((weak)) on these definitions as settings-get.pl
generates all-settings.c. ASAN doesn't flag weak+strong as an ODR violation, and
the strong definition from the .so correctly takes precedence at runtime. Static
symbols are excluded since Clang rejects __attribute__((weak)) on internal-linkage
declarations.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Koda Reef [Tue, 24 Mar 2026 18:45:35 +0000 (18:45 +0000)]
lib: Fix unsigned underflow in buffer_check_limits() bounds check
When pos > buf->max_size, the unsigned subtraction buf->max_size - pos
wraps to a large value instead of triggering the panic. Add an explicit
check before the subtraction so out-of-range positions are caught.
This does not affect buffers created with buffer_create_dynamic(), which
use max_size == SIZE_MAX.
Also add an i_assert in buffer_check_append_limits() to verify the
invariant that used <= writable_size.
Timo Sirainen [Wed, 1 Apr 2026 12:24:52 +0000 (15:24 +0300)]
fts-solr: Fix use-after-free crash during DNS lookup
The solr_http_client singleton outlived the user session that created
it. Its event chain held a ref on the mail_storage_service_user event,
keeping it alive after the user was freed. The SETTINGS_EVENT_INSTANCE on
that event still pointed to the already-freed settings_instance. On the
next user session, dns_lookup() walked the stale event chain and
dereferenced the dangling pointer in settings_instance_get().
Timo Sirainen [Tue, 31 Mar 2026 17:49:42 +0000 (20:49 +0300)]
lib-http: Count lock and ioloop wait statistics beginning when request is queued
Previously they were counted only after the request was sent. However,
it's important to also know where time is spent while the request is still
queued.
This also fixes wrong output for lock wait statistics when the request
was never sent (e.g. DNS lookup failure).
This only works properly if the listener sockets don't change and
there is always a process listening on each of the sockets.
Now master process creates the reuse_port=yes listeners for all of the
processes at startup (up to process_limit). Each child process inherits
exactly one unique listener socket based on its assigned process index.
Since with reuse_port=yes connections are assigned to a specific socket,
processes can no longer rely on other processes picking up connections when
they are full. Instead, each process is now responsible for rejecting
connections when they reach client_limit.
Fred Morcos [Tue, 17 Mar 2026 10:55:19 +0000 (11:55 +0100)]
pop3: Fix memory leak when using DELE+RSET+DELE
This fixes a leak where calls to delete -> reset -> delete would leak memory because the
client->deleted flag is checked but the state of the client->deleted_bitmask isn't.
This removes client->deleted and uses the bitmask as a marker instead.
Timo Sirainen [Wed, 25 Mar 2026 09:30:20 +0000 (11:30 +0200)]
lib-master, master: Fix behavior for services with client_limit>1 and restart_request_count
Especially login processes are commonly configured to use client_limit > 1
and process_limit = process_min_avail = number of CPUs. However, this
prevents using restart_request_count, because long lived connections
can reserve the process and prevent a new one from being launched.
Change the behavior so that when restart_request_count is reached for a
process whose service has client_limit > 1, the process is no longer
counted towards process_limit.
Timo Sirainen [Tue, 31 Mar 2026 15:35:07 +0000 (18:35 +0300)]
lib-http: Count time spent in any ioloop waiting on HTTP request as "http ioloop"
Previously only http_client_wait() caused times to be counted in "http
ioloop". This isn't relevant though. The important difference is that
time spent on "http ioloop" is actually time spent on waiting for the
HTTP request, while "other ioloop" is time spent on waiting for an ioloop
without the HTTP request.
Stephan Bosch [Sat, 28 Mar 2026 19:18:15 +0000 (20:18 +0100)]
lib-auth: auth-scram-server - Assert that hash method never gets to be NULL somehow
Stack-based buffer sizes are based on it a field in the hash method struct. An
assert is easier to debug than a segfault. Also, this makes code consistent with
auth-scram-client.