git.ipfire.org Git - thirdparty/systemd.git/commitdiff
Introduce support for running code in fibers
author Daan De Meyer <daan@amutable.com>
Wed, 12 Nov 2025 16:53:47 +0000 (17:53 +0100)
committer Daan De Meyer <daan@amutable.com>
Thu, 21 May 2026 09:55:04 +0000 (09:55 +0000)

Traditionally, asynchronous programming in systemd has been achieved using
sd-event along with the asynchronous interfaces of sd-bus and sd-varlink.
This works well when the system is reacting to events and all code triggered
by those events can run without blocking. In these scenarios, the global
Manager object is passed as userdata to the callback, and the callback can
use the stack as usual, declaring local state and ensuring proper cleanup via
_cleanup_. Control flow structures, such as loops, work as expected, and
everything runs smoothly.

However, problems arise when the code needs to perform long-running
operations within these callbacks. Since a callback must not block the event
loop, we can't simply invoke a long-running operation and wait for its
result. Instead, we need to initiate the long-running task, register for
completion with sd-event, sd-bus, or sd-varlink, and provide a callback to be
invoked when the operation completes.

This callback, however, only receives a single userdata pointer, which
forces us to bundle all local variables into a struct and pass that as the
callback's userdata. On top of that, after queuing the asynchronous
operation, the caller continues executing and eventually returns. As the
caller's stack unwinds, the resources and state in its local scope are
cleaned up, possibly before the operation has completed. Therefore, the
struct must store copies of the local variables or hold references to them
to prevent premature resource cleanup.

When multiple long-running operations need to be initiated within a loop,
the complexity grows further. We must introduce additional shared state to
track the completion of all operations before we can run any code that
depends on their results.

Furthermore, since the daemon may be shut down at any time, we must track
the lifecycle of each long-running operation in the global Manager struct,
ensuring proper cleanup even when stack unwinding can no longer manage the
resources for us.

Fibers, or green threads, provide a more natural way of handling
asynchronous operations. By enabling cooperative multitasking within a
single thread, fibers allow us to write code that looks like it’s running
synchronously, but with the ability to yield control at predefined points,
such as when waiting for long-running tasks to complete.

With fibers, we can simplify the control flow by running asynchronous
operations within a fiber, allowing us to "pause" execution while waiting
for the long-running operation to finish and then "resume" execution once
it's complete. This eliminates the need for chains of callbacks, extensive
state tracking, and the pitfalls of stack unwinding described above.

This commit introduces the ability to execute long-running operations in a
non-blocking manner while maintaining the simplicity and readability of
synchronous code. The fiber-based approach will significantly improve the
handling of complex workflows, making the code easier to write and maintain.

The implementation is based on ucontext.h's makecontext() (provided by
libucontext on musl, which declares the ucontext.h functions but does not
implement them), sigsetjmp()/siglongjmp() and sd-event. ucontext.h provides
us with alternate stacks that we can switch between. swapcontext() is only
used once per fiber, to bootstrap it onto its new stack; all later switches
use sigsetjmp()/siglongjmp(), because swapcontext() forcibly saves/restores a
per-context signal mask every time it is called. Using
sigsetjmp()/siglongjmp() with savemask=0, we avoid the unnecessary syscall
and keep a per-thread signal mask, which makes much more sense than having a
per-fiber signal mask.
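The following sketch illustrates the switching scheme in a few dozen lines. It is a minimal, hypothetical illustration, not the code in fiber.c. It assumes Linux with glibc (or musl plus libucontext) and uses a single global fiber. It leaves out the guard page and the AddressSanitizer/Valgrind annotations, and all toy_* names are made up. It borrows the siglongjmp_unchecked() alias from fiber.c to sidestep glibc's _FORTIFY_SOURCE check. swapcontext() runs exactly once to get onto the new stack. After that, every switch is a sigsetjmp()/siglongjmp() pair with savemask=0.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <sys/mman.h>
#include <ucontext.h>

/* Call the plain "siglongjmp" symbol so glibc's _FORTIFY_SOURCE wrapper
 * (__longjmp_chk) does not reject jumps onto a different stack. */
_Noreturn extern void siglongjmp_unchecked(sigjmp_buf env, int val) __asm__("siglongjmp");

#define TOY_STACK_SIZE (256 * 1024)

static sigjmp_buf caller_ctx;   /* where the fiber jumps back to */
static sigjmp_buf fiber_ctx;    /* where the caller jumps to resume the fiber */
static int steps;               /* bumped by the fiber each time it runs */
static int finished;            /* set once the fiber body is done */

/* Record the fiber's resume point, then hand control back to the caller.
 * savemask=0: no signal mask is saved or restored, hence no syscall. */
static void toy_yield(void) {
        if (sigsetjmp(fiber_ctx, 0) == 0)
                siglongjmp_unchecked(caller_ctx, 1);
}

static void toy_entry(void) {
        toy_yield();                            /* bootstrap: park right after creation */

        for (int i = 0; i < 3; i++) {
                steps++;
                toy_yield();
        }

        finished = 1;
        siglongjmp_unchecked(caller_ctx, 1);    /* never return: uc_link is NULL */
}

/* Enter the new stack once via makecontext()/swapcontext(); every later
 * switch goes through sigsetjmp()/siglongjmp(). Returns the stack or NULL. */
static void* toy_fiber_start(void) {
        ucontext_t uc, old_uc;

        void *stack = mmap(NULL, TOY_STACK_SIZE, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        if (stack == MAP_FAILED)
                return NULL;

        if (getcontext(&uc) < 0) {
                munmap(stack, TOY_STACK_SIZE);
                return NULL;
        }
        uc.uc_link = NULL;
        uc.uc_stack.ss_sp = stack;
        uc.uc_stack.ss_size = TOY_STACK_SIZE;
        uc.uc_stack.ss_flags = 0;
        makecontext(&uc, toy_entry, 0);

        if (sigsetjmp(caller_ctx, 0) == 0) {
                (void) swapcontext(&old_uc, &uc);
                /* Only reached if swapcontext() failed. */
                munmap(stack, TOY_STACK_SIZE);
                return NULL;
        }

        return stack;                           /* the fiber is parked in toy_yield() */
}

/* Run the fiber until its next yield (or until it finishes). */
static void toy_fiber_resume(void) {
        if (sigsetjmp(caller_ctx, 0) == 0)
                siglongjmp_unchecked(fiber_ctx, 1);
}
```

Each toy_fiber_resume() runs the body up to its next toy_yield(). The fiber's locals (like i) survive between resumes because its stack is never unwound while it is parked.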

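The default stack size is the same as that of a regular thread (typically
8MB). Because we use mmap() to allocate the stack, physical memory is only
used for the pages the fiber actually touches, as the kernel faults them in
on first access, so we don't actually use 8MB per fiber. One page of each
mapping is reserved as a guard page to catch stack overflows.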
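The lazy allocation can be observed with mincore(). This hypothetical sketch assumes Linux/glibc. toy_default_stack_size() approximates the default thread stack size from RLIMIT_STACK and falls back to an assumed 8 MiB when that limit is unlimited. The guard page is installed with mprotect(PROT_NONE), which is the fallback path of fiber_allocate_stack(), rather than with MADV_GUARD_INSTALL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

static size_t toy_page_size(void) {
        return (size_t) sysconf(_SC_PAGESIZE);
}

/* Approximate a regular thread's default stack size via RLIMIT_STACK. */
static size_t toy_default_stack_size(void) {
        size_t page = toy_page_size(), size = 8u * 1024 * 1024;
        struct rlimit rl;

        if (getrlimit(RLIMIT_STACK, &rl) == 0 &&
            rl.rlim_cur != RLIM_INFINITY && rl.rlim_cur >= 4 * page)
                size = (size_t) rl.rlim_cur;

        return (size + page - 1) / page * page;         /* round up to whole pages */
}

/* Reserve the stack and make its lowest page a guard page (downward-growing
 * stack). Nothing is backed by RAM yet: pages are faulted in on first touch. */
static char* toy_allocate_stack(size_t size) {
        void *stack = mmap(NULL, size, PROT_READ | PROT_WRITE,
                           MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK, -1, 0);
        if (stack == MAP_FAILED)
                return NULL;

        if (mprotect(stack, toy_page_size(), PROT_NONE) < 0) {
                munmap(stack, size);
                return NULL;
        }
        return stack;
}

/* Count the pages of [addr, addr + size) that are currently resident. */
static size_t toy_resident_pages(void *addr, size_t size) {
        size_t n = size / toy_page_size(), resident = 0;
        unsigned char *vec = malloc(n);

        if (!vec || mincore(addr, size, vec) < 0) {
                free(vec);
                return (size_t) -1;
        }
        for (size_t i = 0; i < n; i++)
                resident += vec[i] & 1;
        free(vec);
        return resident;
}
```

Right after allocation no page is resident. Touching the top of the stack (where a downward-growing stack starts) makes only a small part of the mapping resident.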

To integrate fibers with the event loop, each fiber is assigned a deferred
event source which resumes the fiber when enabled. The deferred event source
is initially enabled as oneshot, so the fiber runs as soon as the event loop
dispatches it, until it yields or suspends. If it yields, the deferred event
source is immediately re-enabled (oneshot). If it suspends, it first
registers one or more event sources with sd-event that will re-enable the
deferred event source (oneshot), and thereby resume the fiber, once the
operation it is waiting for completes.
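The scheduling policy alone can be modelled without any stack switching. In this hypothetical sketch, each fiber is reduced to a step function with an explicit resume point; a real fiber keeps that state on its own stack. A plain enum stands in for the enabled state of the fiber's defer event source (SD_EVENT_ONESHOT/SD_EVENT_OFF in sd-event). Yielding re-arms the source. Suspending leaves it off until a completion handler, here toy_wake(), re-arms it.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { TOY_OFF, TOY_ONESHOT } ToyEnabled;       /* cf. SD_EVENT_OFF / SD_EVENT_ONESHOT */
typedef enum { STEP_YIELD, STEP_SUSPEND, STEP_DONE } ToyStep;

typedef struct ToyFiber {
        ToyEnabled defer;                       /* stand-in for the fiber's defer event source */
        ToyStep (*step)(struct ToyFiber *f);    /* runs until the fiber yields, suspends or ends */
        int pc;                                 /* resume point; a real fiber keeps this on its stack */
        bool done;
} ToyFiber;

/* One event loop iteration: dispatch every fiber whose defer source is armed. */
static int toy_loop_iteration(ToyFiber *fibers, int n) {
        int dispatched = 0;

        for (int i = 0; i < n; i++) {
                ToyFiber *f = &fibers[i];

                if (f->defer != TOY_ONESHOT)
                        continue;

                f->defer = TOY_OFF;             /* oneshot sources disarm when dispatched */
                dispatched++;

                switch (f->step(f)) {
                case STEP_YIELD:
                        f->defer = TOY_ONESHOT; /* runnable again on the next iteration */
                        break;
                case STEP_SUSPEND:
                        break;                  /* stays off until a completion re-arms it */
                case STEP_DONE:
                        f->done = true;
                        break;
                }
        }

        return dispatched;
}

/* What the source registered before suspending does once the operation completes. */
static void toy_wake(ToyFiber *f) {
        f->defer = TOY_ONESHOT;
}

/* Example fiber: yield once, then suspend once, then finish. */
static ToyStep toy_demo_step(ToyFiber *f) {
        switch (f->pc++) {
        case 0:
                return STEP_YIELD;
        case 1:
                return STEP_SUSPEND;
        default:
                return STEP_DONE;
        }
}
```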

Yielding or suspending the fiber is done by calling sd_fiber_yield() or
sd_fiber_suspend() respectively. Both return zero on success, or the
(negative errno-style) error from the asynchronous operation that caused the
fiber to be resumed.

This is also how fiber cancellation is implemented. When a fiber is cancelled,
sd_fiber_yield() and sd_fiber_suspend() return -ECANCELED when the fiber is
resumed, allowing the fiber to unwind its stack (so that cleanup happens
automatically) and finish.
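The fiber side of this contract can be illustrated without real stack switching. In the sketch below, toy_fiber_yield() is a hypothetical stand-in that only mimics the return values described for sd_fiber_yield(). The point is that cancellation arrives as an ordinary -ECANCELED return. Returning early therefore runs the `__attribute__((cleanup))` handlers (what systemd's _cleanup_ expands to), just like any other error path.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

static int cancel_requested;    /* set when the (imaginary) scheduler cancels the fiber */
static int buffers_freed;       /* counts cleanup runs so unwinding can be observed */

/* Hypothetical stand-in for sd_fiber_yield(): 0 on a normal resume, -ECANCELED
 * once the fiber has been cancelled. No context switch happens here. */
static int toy_fiber_yield(void) {
        return cancel_requested ? -ECANCELED : 0;
}

static void toy_freep(char **p) {
        free(*p);
        buffers_freed++;
}

/* Fiber body holding a heap buffer across each yield point. */
static int toy_fiber_body(int iterations, int cancel_at) {
        for (int i = 0; i < iterations; i++) {
                __attribute__((cleanup(toy_freep))) char *buf = malloc(64);
                if (!buf)
                        return -ENOMEM;

                if (i == cancel_at)
                        cancel_requested = 1;   /* cancellation arrives while "suspended" */

                int r = toy_fiber_yield();
                if (r < 0)
                        return r;               /* early return still frees buf */
        }

        return 0;
}
```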

Instead of having applications work directly with fibers, we hide them behind
a generic futures interface to represent long-running operations, regardless of
whether those operations are running on a fiber or not. Aside from fibers, the
futures library (sd-future) will for example allow waiting for sd-event sources
and doing sd-bus calls in the background as well. Fibers can suspend until a
future is ready with sd_fiber_await() or by having the future wake up the fiber
explicitly in its callback. A future always defaults to waking up the current
fiber.

Each future kind plugs into the library by providing an sd_future_ops vtable
(alloc, free, cancel, set_priority). The library treats the impl pointer
returned by alloc() as a black box. Future implementations retrieve it via
sd_future_get_private().
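The sketch below is a hypothetical, stripped-down mirror of that split between generic code and future kinds. The ToyFuture/ToyFutureOps names are invented for illustration. The real sd_future_ops also carries a .size field (see time_future_ops in event-future.c), which this sketch omits, and its ownership rules may differ.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct ToyFuture ToyFuture;

/* Hypothetical mirror of the vtable described above. */
typedef struct ToyFutureOps {
        void* (*alloc)(void);                           /* allocate the kind's private state */
        void (*free)(ToyFuture *f);                     /* release it */
        int (*cancel)(ToyFuture *f);
        int (*set_priority)(ToyFuture *f, int64_t priority);
} ToyFutureOps;

struct ToyFuture {
        const ToyFutureOps *ops;
        void *impl;                                     /* opaque to the generic code */
};

static int toy_future_new(const ToyFutureOps *ops, ToyFuture **ret) {
        ToyFuture *f = calloc(1, sizeof(*f));
        if (!f)
                return -ENOMEM;

        f->ops = ops;
        f->impl = ops->alloc();
        if (!f->impl) {
                free(f);
                return -ENOMEM;
        }

        *ret = f;
        return 0;
}

static void* toy_future_get_private(ToyFuture *f) {
        return f->impl;
}

static int toy_future_set_priority(ToyFuture *f, int64_t priority) {
        /* Generic code only dispatches; it never looks inside impl. */
        return f->ops->set_priority ? f->ops->set_priority(f, priority) : -EOPNOTSUPP;
}

static void toy_future_free(ToyFuture *f) {
        if (!f)
                return;
        f->ops->free(f);                                /* the kind frees its private state */
        free(f);
}

/* A concrete kind, loosely modelled on TimeFuture in event-future.c. */
typedef struct ToyTimer {
        int64_t priority;
} ToyTimer;

static void* toy_timer_alloc(void) {
        return calloc(1, sizeof(ToyTimer));
}

static void toy_timer_free(ToyFuture *f) {
        free(toy_future_get_private(f));
}

static int toy_timer_set_priority(ToyFuture *f, int64_t priority) {
        ToyTimer *t = toy_future_get_private(f);
        t->priority = priority;
        return 0;
}

static const ToyFutureOps toy_timer_ops = {
        .alloc = toy_timer_alloc,
        .free = toy_timer_free,
        .set_priority = toy_timer_set_priority,
};
```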

A future starts in SD_FUTURE_PENDING and transitions exactly once to
SD_FUTURE_RESOLVED, carrying an integer result. Consumers can react to that
transition either by installing a one-shot callback with
sd_future_set_callback() (callback-style code) or by waiting on it from a
fiber via sd_fiber_await() (synchronous-looking fiber code). sd_fiber_await()
is itself built on a "wait future" that resolves when its target resolves;
sd_future_new_wait() exposes the same primitive directly so non-fiber callers
can chain futures without involving a fiber.

Cancellation is cooperative: sd_future_cancel() invokes the future impl's
cancel callback, which is responsible for tearing down its work and ultimately
resolving the future with -ECANCELED. For fiber futures this is what
surfaces as the -ECANCELED return from sd_fiber_yield()/sd_fiber_suspend()
mentioned above.
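The lifecycle described in the two paragraphs above fits in a small state machine. This is another hypothetical toy, not the sd-future implementation. Resolve succeeds exactly once, and the one-shot callback fires on that transition. Cancellation is delegated to a kind-specific hook that resolves the future with -ECANCELED. The -EALREADY/-EINVAL return codes are choices made for the sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

typedef enum { TOY_PENDING, TOY_RESOLVED } ToyState;    /* cf. SD_FUTURE_PENDING / SD_FUTURE_RESOLVED */

typedef struct ToyFuture ToyFuture;
typedef void (*toy_callback_t)(ToyFuture *f, void *userdata);

struct ToyFuture {
        ToyState state;
        int result;
        toy_callback_t callback;        /* one-shot */
        void *userdata;
        int (*cancel)(ToyFuture *f);    /* kind-specific cancel hook */
};

static int toy_future_resolve(ToyFuture *f, int result) {
        if (f->state == TOY_RESOLVED)
                return -EALREADY;       /* the transition happens exactly once */

        f->state = TOY_RESOLVED;
        f->result = result;

        toy_callback_t cb = f->callback;
        f->callback = NULL;             /* one-shot: clear before invoking */
        if (cb)
                cb(f, f->userdata);
        return 0;
}

static int toy_future_set_callback(ToyFuture *f, toy_callback_t cb, void *userdata) {
        if (f->state != TOY_PENDING)
                return -EINVAL;
        f->callback = cb;
        f->userdata = userdata;
        return 0;
}

/* Cooperative: the generic layer only asks; the kind tears down and resolves. */
static int toy_future_cancel(ToyFuture *f) {
        if (f->state == TOY_RESOLVED)
                return 0;
        return f->cancel(f);
}

/* Example kind: a real one would first disable its event source (cf. time_future_cancel()). */
static int toy_timer_cancel(ToyFuture *f) {
        return toy_future_resolve(f, -ECANCELED);
}

static void toy_record_result(ToyFuture *f, void *userdata) {
        *(int*) userdata = f->result;
}
```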

Fire-and-forget fibers — created by passing a NULL ret to sd_fiber_new() —
take a self-reference on their future so they outlive the caller's scope.
The self-ref is dropped when the fiber resolves. This floating mechanism
(sd_fiber_set_floating()) is restricted to fiber futures because they
uniquely guarantee resolution; allowing it for arbitrary future kinds would
risk silent leaks for kinds that may never resolve.
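The reference juggling can be sketched with a plain reference count. Everything here is hypothetical (ToyFiber, toy_fiber_set_floating(), and so on). In the real API, the self-reference is taken when sd_fiber_new() is called with a NULL ret. The sketch splits that into a separate step so the hand-off is visible, and it follows the same ordering as fiber_resolve().

```c
#include <assert.h>
#include <stdlib.h>

typedef struct ToyFiber {
        unsigned n_ref;
        struct ToyFiber *floating;      /* self-reference held while floating */
        int result;
} ToyFiber;

static unsigned toy_n_freed;            /* lets the demo observe when a fiber is freed */

static ToyFiber* toy_fiber_ref(ToyFiber *f) {
        f->n_ref++;
        return f;
}

static ToyFiber* toy_fiber_unref(ToyFiber *f) {
        if (f && --f->n_ref == 0) {
                free(f);
                toy_n_freed++;
        }
        return NULL;
}

static ToyFiber* toy_fiber_new(void) {
        ToyFiber *f = calloc(1, sizeof(*f));
        if (f)
                f->n_ref = 1;
        return f;
}

static void toy_fiber_set_floating(ToyFiber *f) {
        if (!f->floating)
                f->floating = toy_fiber_ref(f);
}

/* Take the self-ref out first, finish all work on f, and drop the ref last,
 * because dropping it may free f. */
static void toy_fiber_resolve(ToyFiber *f, int result) {
        ToyFiber *floating = f->floating;
        f->floating = NULL;

        f->result = result;             /* callbacks and waiters would run here */

        toy_fiber_unref(floating);      /* no f->... access after this point */
}
```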

Note that fiber cleanup depends on the runtime operating normally. Each
fiber's _cleanup_ handlers live on the fiber's own stack and run
only when the fiber is resumed and allowed to unwind, which requires a
working event loop to drive it to completion. The exit event source
registered for top-level fibers ensures unwind on a normal sd_event_exit(),
but if the event loop itself terminates abnormally (e.g. an unrecoverable
allocation failure mid-dispatch) before all fibers have resolved, their
stacks never unwind and any resources they own leak.

The code lives in libsystemd as sd-future (not exported) for the following reasons:
- We may want to make this a public libsystemd API in the future
- The code can't live in src/basic as it makes heavy use of sd-event
- The code can't live in src/shared as sd-bus and sd-event make use of it

The log and log-context headers are updated with functions to allow
fibers to have their own log prefix and log context.

20 files changed:
meson.build
src/basic/architecture.h
src/basic/basic-forward.h
src/basic/log-context.c
src/basic/log-context.h
src/basic/log.c
src/basic/log.h
src/include/override/sys/mman.h
src/libsystemd/meson.build
src/libsystemd/sd-common/sd-forward.h
src/libsystemd/sd-event/event-future.c [new file with mode: 0644]
src/libsystemd/sd-event/event-future.h [new file with mode: 0644]
src/libsystemd/sd-future/fiber.c [new file with mode: 0644]
src/libsystemd/sd-future/sd-future.c [new file with mode: 0644]
src/libsystemd/sd-future/test-fiber.c [new file with mode: 0644]
src/systemd/_sd-common.h
src/systemd/meson.build
src/systemd/sd-future.h [new file with mode: 0644]
test/integration-tests/TEST-02-UNITTESTS/meson.build
test/test-link-abi.py

diff --git a/meson.build b/meson.build
index 1d2d59e4fa5368cb4e0c90c15e55df5a373861a5..56f73ea30df5fb6e7e552ea3353b88473fdb9a34 100644
--- a/meson.build
+++ b/meson.build
@@ -1017,6 +1017,14 @@ else
 endif
 conf.set10('HAVE_LIBCRYPT', have)
 
+# musl declares the ucontext.h functions but does not implement them; the fiber bootstrap in
+# src/libsystemd/sd-future/fiber.c relies on them, so on musl we have to link to libucontext.
+if get_option('libc') == 'musl'
+        libucontext = dependency('libucontext')
+else
+        libucontext = []
+endif
+
 bpf_framework = get_option('bpf-framework')
 bpf_compiler = get_option('bpf-compiler')
 libbpf = dependency('libbpf',
@@ -1720,6 +1728,7 @@ libsystemd_includes = [basic_includes, include_directories(
         'src/libsystemd/sd-common',
         'src/libsystemd/sd-device',
         'src/libsystemd/sd-event',
+        'src/libsystemd/sd-future',
         'src/libsystemd/sd-hwdb',
         'src/libsystemd/sd-id128',
         'src/libsystemd/sd-journal',
@@ -1760,7 +1769,7 @@ libsystemd = shared_library(
         link_with : [libc_wrapper_static,
                      libbasic_static],
         link_whole : [libsystemd_static],
-        dependencies : [userspace],
+        dependencies : [libucontext, userspace],
         link_depends : libsystemd_sym,
         install : true,
         install_tag: 'libsystemd',
@@ -1782,6 +1791,7 @@ if static_libsystemd != 'false'
                 dependencies : [libgcrypt_cflags,
                                 liblz4_cflags,
                                 libm,
+                                libucontext,
                                 libxz_cflags,
                                 libzstd_cflags,
                                 userspace],
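diff --git a/src/basic/architecture.h b/src/basic/architecture.h
index ee91ea0d349b82555dafdebe0c3a781c9a624686..350a72d087a357e55d26079467eec90941e860aa 100644
--- a/src/basic/architecture.h
+++ b/src/basic/architecture.h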
@@ -242,4 +242,10 @@ Architecture uname_architecture(void);
 #  error "Please register your architecture here!"
 #endif
 
+#if defined(__hppa__) || defined(__hppa64__)
+#  define STACK_GROWS_UP 1
+#else
+#  define STACK_GROWS_UP 0
+#endif
+
 DECLARE_STRING_TABLE_LOOKUP(architecture, Architecture);
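diff --git a/src/basic/basic-forward.h b/src/basic/basic-forward.h
index 5edd494f783c69010c0c8724d1a891ed5b41087d..5f9109cb17c1e0d9650931e57a849a4c5c13e243 100644
--- a/src/basic/basic-forward.h
+++ b/src/basic/basic-forward.h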
@@ -111,10 +111,12 @@ typedef enum UnitNameMangle UnitNameMangle;
 typedef enum UnitType UnitType;
 typedef enum WaitFlags WaitFlags;
 
+typedef struct Fiber Fiber;
 typedef struct Hashmap Hashmap;
 typedef struct HashmapBase HashmapBase;
 typedef struct IteratedCache IteratedCache;
 typedef struct Iterator Iterator;
+typedef struct LogContext LogContext;
 typedef struct OrderedHashmap OrderedHashmap;
 typedef struct OrderedSet OrderedSet;
 typedef struct Set Set;
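diff --git a/src/basic/log-context.c b/src/basic/log-context.c
index a05b4b1980e6b5362b35bde4a94b7e5b13eab06f..44eb7cd36f923b1bf621e1e10102d31358bbef98 100644
--- a/src/basic/log-context.c
+++ b/src/basic/log-context.c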
@@ -177,6 +177,14 @@ size_t log_context_num_fields(void) {
         return _log_context_num_fields;
 }
 
+void log_context_swap(LogContext **log_context, size_t *num_fields) {
+        assert(log_context);
+        assert(num_fields);
+
+        SWAP_TWO(_log_context, *log_context);
+        SWAP_TWO(_log_context_num_fields, *num_fields);
+}
+
 void _reset_log_level(int *saved_log_level) {
         assert(saved_log_level);
 
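diff --git a/src/basic/log-context.h b/src/basic/log-context.h
index ca112fa862acfd3f68028d7fd67336938edba109..638d2ed913b19834fce8a5255e192127775bf822 100644
--- a/src/basic/log-context.h
+++ b/src/basic/log-context.h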
@@ -66,6 +66,8 @@ size_t log_context_num_contexts(void);
 /* Returns the number of fields in all attached log contexts. */
 size_t log_context_num_fields(void);
 
+void log_context_swap(LogContext **log_context, size_t *num_fields);
+
 void _reset_log_level(int *saved_log_level);
 
 #define _LOG_CONTEXT_SET_LOG_LEVEL(level, l) \
diff --git a/src/basic/log.c b/src/basic/log.c
index d8b441bfadf21b03bec7ee3550c2998bd0128fb8..29702abd44e40126c4cc3e224e8b3aee1ecfefce 100644
--- a/src/basic/log.c
+++ b/src/basic/log.c
@@ -87,6 +87,12 @@ bool _log_message_dummy = false; /* Always false */
                 }                                                       \
         } while (false)
 
+void log_prefix_swap(const char **prefix) {
+        assert(prefix);
+
+        SWAP_TWO(log_prefix, *prefix);
+}
+
 static void log_close_console(void) {
         /* See comment in log_close_journal() */
         (void) safe_close_above_stdio(TAKE_FD(console_fd));
diff --git a/src/basic/log.h b/src/basic/log.h
index 46a4339de5565b4065736ccc0d0e498c56c596ec..cb4e8b5d0f15b8b4ba40cde5bc7aed16b22e6f47 100644
--- a/src/basic/log.h
+++ b/src/basic/log.h
@@ -380,6 +380,8 @@ int log_syntax_parse_error_internal(
 void log_setup(void);
 
 const char* _log_set_prefix(const char *prefix, bool force);
+
+void log_prefix_swap(const char **prefix);
 static inline const char* _log_unset_prefixp(const char **p) {
         assert(p);
         _log_set_prefix(*p, true);
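diff --git a/src/include/override/sys/mman.h b/src/include/override/sys/mman.h
index 30ef92b83538e992023b8a7dc1fddf0aed529772..9961ea4b21d93a1ee613910d80487aa25b6fdcbb 100644
--- a/src/include/override/sys/mman.h
+++ b/src/include/override/sys/mman.h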
@@ -18,3 +18,10 @@ static_assert(MFD_NOEXEC_SEAL == 0x0008U, "");
 #else
 static_assert(MFD_EXEC == 0x0010U, "");
 #endif
+
+/* since Linux 6.13 / glibc-2.42 */
+#ifndef MADV_GUARD_INSTALL
+#  define MADV_GUARD_INSTALL 102
+#else
+static_assert(MADV_GUARD_INSTALL == 102, "");
+#endif
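diff --git a/src/libsystemd/meson.build b/src/libsystemd/meson.build
index c6cf0fad34b60ebfc7cea8269718b63b7648e8ba..061ff6213f61e5569b037ade7acdfaebc565ad4f 100644
--- a/src/libsystemd/meson.build
+++ b/src/libsystemd/meson.build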
@@ -33,6 +33,7 @@ sd_daemon_sources = files('sd-daemon/sd-daemon.c')
 ############################################################
 
 sd_event_sources = files(
+        'sd-event/event-future.c',
         'sd-event/event-util.c',
         'sd-event/sd-event.c',
 )
@@ -75,6 +76,13 @@ sd_device_sources = files(
 
 ############################################################
 
+sd_future_sources = files(
+        'sd-future/fiber.c',
+        'sd-future/sd-future.c',
+)
+
+############################################################
+
 sd_login_sources = files('sd-login/sd-login.c')
 
 ############################################################
@@ -135,8 +143,9 @@ libsystemd_sources = files(
         'sd-resolve/sd-resolve.c',
 ) + sd_journal_sources + sd_id128_sources + sd_daemon_sources \
   + sd_event_sources + sd_bus_sources + sd_device_sources \
-  + sd_login_sources + sd_json_sources + sd_varlink_sources \
-  + sd_path_sources + sd_netlink_sources + sd_network_sources
+  + sd_future_sources + sd_login_sources + sd_json_sources \
+  + sd_varlink_sources + sd_path_sources + sd_netlink_sources \
+  + sd_network_sources
 
 sources += libsystemd_sources
 
@@ -151,6 +160,7 @@ libsystemd_static = static_library(
         link_with : [libc_wrapper_static,
                      libbasic_static],
         dependencies : [libm,
+                        libucontext,
                         userspace],
         build_by_default : false)
 
@@ -179,6 +189,7 @@ simple_tests += files(
         'sd-bus/test-bus-vtable.c',
         'sd-device/test-device-util.c',
         'sd-device/test-sd-device-monitor.c',
+        'sd-future/test-fiber.c',
         'sd-hwdb/test-sd-hwdb.c',
         'sd-id128/test-id128.c',
         'sd-journal/test-audit-type.c',
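diff --git a/src/libsystemd/sd-common/sd-forward.h b/src/libsystemd/sd-common/sd-forward.h
index 8abe655209dec9cc7d0ebb08daf39b75613e5ad2..96ab84e9828862a8938634ca326839e531845fd3 100644
--- a/src/libsystemd/sd-common/sd-forward.h
+++ b/src/libsystemd/sd-common/sd-forward.h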
@@ -127,3 +127,9 @@ typedef struct sd_resolve sd_resolve;
 typedef struct sd_resolve_query sd_resolve_query;
 
 typedef struct sd_hwdb sd_hwdb;
+
+typedef struct sd_future sd_future;
+
+typedef int (*sd_future_func_t)(sd_future *f);
+typedef int (*sd_fiber_func_t)(void *userdata);
+typedef _sd_destroy_t sd_fiber_destroy_t;
diff --git a/src/libsystemd/sd-event/event-future.c b/src/libsystemd/sd-event/event-future.c
new file mode 100644
index 0000000..4595a7a
--- /dev/null
+++ b/src/libsystemd/sd-event/event-future.c
@@ -0,0 +1,118 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#include "sd-event.h"
+#include "sd-future.h"
+
+#include "alloc-util.h"
+#include "errno-util.h"
+#include "event-future.h"
+
+typedef struct TimeFuture {
+        sd_event_source *source;
+
+        /* Result the future resolves with on natural expiry (vs. cancellation). 0 for normal sleep,
+         * non-zero (e.g. -ETIMEDOUT) lets a fiber waiting on this future resume with that error. */
+        int result;
+} TimeFuture;
+
+static void* time_future_alloc(void) {
+        return new0(TimeFuture, 1);
+}
+
+static void time_future_free(sd_future *f) {
+        TimeFuture *tf = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        sd_event_source_unref(tf->source);
+        free(tf);
+}
+
+static int time_future_cancel(sd_future *f) {
+        TimeFuture *tf = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        int r;
+
+        r = sd_event_source_set_enabled(tf->source, SD_EVENT_OFF);
+        RET_GATHER(r, sd_future_resolve(f, -ECANCELED));
+        return r;
+}
+
+static int time_future_set_priority(sd_future *f, int64_t priority) {
+        TimeFuture *tf = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        return sd_event_source_set_priority(tf->source, priority);
+}
+
+static const sd_future_ops time_future_ops = {
+        .size = sizeof(sd_future_ops),
+        .alloc = time_future_alloc,
+        .free = time_future_free,
+        .cancel = time_future_cancel,
+        .set_priority = time_future_set_priority,
+};
+
+static int time_handler(sd_event_source *s, usec_t usec, void *userdata) {
+        sd_future *f = ASSERT_PTR(userdata);
+        TimeFuture *tf = ASSERT_PTR(sd_future_get_private(f));
+
+        return sd_future_resolve(f, tf->result);
+}
+
+typedef int (*event_add_time_func)(
+                sd_event *e,
+                sd_event_source **ret,
+                clockid_t clock,
+                uint64_t usec,
+                uint64_t accuracy,
+                sd_event_time_handler_t callback,
+                void *userdata);
+
+static int future_new_time_internal(
+                event_add_time_func add_time,
+                sd_event *e,
+                clockid_t clock,
+                uint64_t usec,
+                uint64_t accuracy,
+                int result,
+                sd_future **ret) {
+
+        int r;
+
+        assert(add_time);
+        assert(e);
+        assert(ret);
+
+        if (IN_SET(sd_event_get_state(e), SD_EVENT_EXITING, SD_EVENT_FINISHED))
+                return -ECANCELED;
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        r = sd_future_new(&time_future_ops, &f);
+        if (r < 0)
+                return r;
+
+        TimeFuture *tf = sd_future_get_private(f);
+        tf->result = result;
+
+        r = add_time(e, &tf->source, clock, usec, accuracy, time_handler, f);
+        if (r < 0)
+                return r;
+
+        if (sd_fiber_is_running()) {
+                int64_t priority;
+
+                r = sd_fiber_get_priority(&priority);
+                if (r < 0)
+                        return r;
+
+                r = sd_event_source_set_priority(tf->source, priority);
+                if (r < 0)
+                        return r;
+        }
+
+        *ret = TAKE_PTR(f);
+        return 0;
+}
+
+int future_new_time(sd_event *e, clockid_t clock, uint64_t usec, uint64_t accuracy, int result, sd_future **ret) {
+        return future_new_time_internal(sd_event_add_time, e, clock, usec, accuracy, result, ret);
+}
+
+int future_new_time_relative(sd_event *e, clockid_t clock, uint64_t usec, uint64_t accuracy, int result, sd_future **ret) {
+        return future_new_time_internal(sd_event_add_time_relative, e, clock, usec, accuracy, result, ret);
+}
diff --git a/src/libsystemd/sd-event/event-future.h b/src/libsystemd/sd-event/event-future.h
new file mode 100644
index 0000000..7e95690
--- /dev/null
+++ b/src/libsystemd/sd-event/event-future.h
@@ -0,0 +1,7 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+#pragma once
+
+#include "sd-forward.h"
+
+int future_new_time(sd_event *e, clockid_t clock, uint64_t usec, uint64_t accuracy, int result, sd_future **ret);
+int future_new_time_relative(sd_event *e, clockid_t clock, uint64_t usec, uint64_t accuracy, int result, sd_future **ret);
diff --git a/src/libsystemd/sd-future/fiber.c b/src/libsystemd/sd-future/fiber.c
new file mode 100644
index 0000000..48d26a2
--- /dev/null
+++ b/src/libsystemd/sd-future/fiber.c
@@ -0,0 +1,812 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#include <pthread.h>
+#include <setjmp.h>
+#include <sys/mman.h>
+#include <sys/resource.h>
+#include <sys/uio.h>
+#include <threads.h>
+#include <ucontext.h>
+#include <unistd.h>
+
+#if HAVE_VALGRIND_VALGRIND_H
+#include <valgrind/valgrind.h>
+#endif
+
+#include "sd-event.h"
+#include "sd-future.h"
+
+#include "alloc-util.h"
+#include "architecture.h"
+#include "errno-util.h"
+#include "event-future.h"
+#include "log-context.h"
+#include "log.h"
+#include "memory-util.h"
+#include "pthread-util.h"
+#include "time-util.h"
+
+#if HAS_FEATURE_ADDRESS_SANITIZER
+#include <sanitizer/common_interface_defs.h>
+#endif
+
+/* glibc's _FORTIFY_SOURCE wraps siglongjmp() with __longjmp_chk, which asserts that the target SP is below
+ * the current SP. That assumption is incompatible with fiber switching, where the target SP lives on a
+ * separately-mmap'd stack and can be at any address relative to the caller. The fortify redirect happens
+ * in <setjmp.h>'s declaration of siglongjmp; we sidestep it by declaring our own alias that links
+ * directly to the unchecked "siglongjmp" symbol. musl doesn't fortify setjmp.h, so the alias is a plain
+ * synonym there. */
+_noreturn_ extern void siglongjmp_unchecked(sigjmp_buf env, int val) __asm__("siglongjmp");
+
+static thread_local Fiber *current_fiber = NULL;
+
+typedef enum FiberState {
+        FIBER_STATE_INITIAL,
+        FIBER_STATE_READY,
+        FIBER_STATE_SUSPENDED,
+        FIBER_STATE_CANCELLED,
+        FIBER_STATE_COMPLETED,
+        _FIBER_STATE_MAX,
+        _FIBER_STATE_INVALID = -EINVAL,
+} FiberState;
+
+typedef struct Fiber {
+        struct iovec stack;
+        sigjmp_buf context;             /* Where to jump to when entering or resuming the fiber. */
+        sigjmp_buf resume_context;      /* Where to jump back to when the fiber yields or completes. */
+
+        /* Caller's stack range, recorded by fiber_run() on each entry so the fiber's siglongjmp back
+         * out (in fiber_swap() or the trampoline's terminate path) can hand AddressSanitizer the
+         * destination stack info. With ucontext this comes for free via uc_link/uc_stack; sigjmp_buf
+         * is opaque and doesn't carry it. */
+        struct iovec resume_stack;
+
+        FiberState state;
+        int result;                     /* Either resume error code or final return value */
+
+        sd_future *floating;            /* Self-ref held while the fiber is floating; dropped on resolve. */
+
+        sd_event *event;
+        sd_event_source *defer_event_source;
+        sd_event_source *exit_event_source;
+
+        char *name;
+        int64_t priority;
+        sd_fiber_func_t func;
+        void *userdata;
+        sd_fiber_destroy_t destroy;
+
+        /* Storage for the swap performed in fiber_run(): while the fiber is suspended these hold the
+         * fiber's own log state; while it is running they hold the caller's log state. The active state
+         * always lives in the thread-locals in log.c / log-context.c. */
+        LIST_HEAD(LogContext, log_context);
+        size_t log_context_num_fields;
+        const char *log_prefix;
+
+#if HAVE_VALGRIND_VALGRIND_H
+        unsigned stack_id;
+#endif
+} Fiber;
+
+static Fiber* fiber_get_current(void) {
+        return current_fiber;
+}
+
+static void fiber_set_current(Fiber *f) {
+        current_fiber = f;
+}
+
+static int fiber_allocate_stack(size_t size, void **ret) {
+        void *stack = NULL;
+        int r;
+
+        /* The effective stack size is one page less than the given size, because we have to use
+         * one page as the guard page for the stack. */
+
+        assert(size > 0 && size % page_size() == 0);
+        assert(ret);
+
+        stack = mmap(/* addr= */ NULL, size,
+                     PROT_READ | PROT_WRITE,
+                     MAP_PRIVATE | MAP_ANONYMOUS | MAP_STACK,
+                     /* fd= */ -EBADF, /* offset= */ 0);
+        if (stack == MAP_FAILED)
+                return -errno;
+
+        /* Place the guard page where stack overflow will hit it: the high end on architectures
+         * where the stack grows up (PA-RISC), the low end everywhere else. fiber_stack_usable()
+         * mirrors this with the inverse offset. */
+        void *guard = STACK_GROWS_UP ? (uint8_t*) stack + size - page_size() : stack;
+
+        /* Prefer MADV_GUARD_INSTALL (Linux 6.13+): unlike mprotect(PROT_NONE) it doesn't split
+         * the VMA, so guard installation skips the mmap-lock contention and per-guard VMA cost.
+         * Fall back to mprotect on older kernels, which return EINVAL for unknown advice.
+         * FIXME: delete when baseline above 6.13. */
+        r = RET_NERRNO(madvise(guard, page_size(), MADV_GUARD_INSTALL));
+        if (r == -EINVAL)
+                r = RET_NERRNO(mprotect(guard, page_size(), PROT_NONE));
+        if (r < 0) {
+                (void) munmap(stack, size);
+                return r;
+        }
+
+        *ret = TAKE_PTR(stack);
+        return 0;
+}
+
+/* Usable stack range of a fiber: the full mmap region minus the guard page. Single source of
+ * truth for the layout assumed by fiber_allocate_stack(); every consumer (ucontext ss_sp,
+ * ASAN handoff iovecs, Valgrind stack registration) goes through here.
+ *
+ * iov_base is the lowest usable byte regardless of growth direction — that matches POSIX's
+ * definition of stack_t.ss_sp, so libc's makecontext() handles the direction for us. Only the
+ * guard page placement (and hence iov_base's offset within the mapping) varies. */
+static struct iovec fiber_stack_usable(const struct iovec *stack) {
+        assert(stack);
+        assert(stack->iov_len > page_size());
+        return (struct iovec) {
+                .iov_base = STACK_GROWS_UP ? stack->iov_base : (uint8_t*) stack->iov_base + page_size(),
+                .iov_len = stack->iov_len - page_size(),
+        };
+}
+
+static inline void start_switch_stack(void **fake_stack_save, const struct iovec *dest) {
+#if HAS_FEATURE_ADDRESS_SANITIZER
+        __sanitizer_start_switch_fiber(fake_stack_save,
+                                       dest ? dest->iov_base : NULL,
+                                       dest ? dest->iov_len : 0);
+#endif
+}
+
+static inline void finish_switch_stack(void *fake_stack_save) {
+#if HAS_FEATURE_ADDRESS_SANITIZER
+        __sanitizer_finish_switch_fiber(fake_stack_save, NULL, NULL);
+#endif
+}
+
+/* Refresh f->resume_stack from whoever is currently the running fiber, so the next siglongjmp() out
+ * of f (in the trampoline or fiber_swap()) can hand the right destination stack to ASAN. Must be
+ * called before fiber_set_current(f) — relies on fiber_get_current() returning the caller. */
+static void fiber_set_resume_stack(Fiber *f, Fiber *resume) {
+        assert(f);
+
+        if (resume)
+                f->resume_stack = fiber_stack_usable(&resume->stack);
+        else
+                f->resume_stack = (struct iovec) {};
+}
+
+_noreturn_ static void fiber_entry_point(void) {
+        Fiber *f = ASSERT_PTR(fiber_get_current());
+        void *fake_stack_save = NULL;
+
+        assert(f->func);
+        assert(IN_SET(f->state, FIBER_STATE_INITIAL, FIBER_STATE_READY, FIBER_STATE_CANCELLED));
+
+        finish_switch_stack(NULL);
+
+        /* Capture our resumable point on the fiber's stack, then bounce back to whoever last set
+         * f->resume_context. On bootstrap that's fiber_bootstrap(); on every subsequent yield it's
+         * the most recent fiber_run(). sigsetjmp(buf, 0) skips the signal-mask save: switching is
+         * thread-shared with respect to signal masks. */
+        if (sigsetjmp(f->context, /* savemask= */ 0) == 0) {
+                start_switch_stack(&fake_stack_save, &f->resume_stack);
+                siglongjmp_unchecked(f->resume_context, /* val= */ 1);
+        }
+
+        /* Re-entered for real via fiber_run()'s siglongjmp(f->context). */
+        finish_switch_stack(fake_stack_save);
+
+        /* Block scope so the cleanups attached to LOG_SET_PREFIX / LOG_CONTEXT_PUSH_KEY_VALUE fire
+         * before the siglongjmp below — siglongjmp skips _cleanup_ attributes, so we have to make
+         * sure the scope ends via a normal control-flow path first. */
+        {
+                LOG_SET_PREFIX(f->name);
+                LOG_CONTEXT_PUSH_KEY_VALUE("FIBER=", f->name);
+
+                f->result = f->state == FIBER_STATE_CANCELLED ? -ECANCELED : f->func(f->userdata);
+                f->state = FIBER_STATE_COMPLETED;
+        }
+
+        /* Pass NULL fake_stack_save to discard the fiber's fake stack since the fiber is done. */
+        start_switch_stack(NULL, &f->resume_stack);
+
+        /* Bounce back to whichever fiber_run() call most recently entered us. resume_context is
+         * per-fiber so nested fiber_run() — e.g. a bus method dispatched as a fiber handler while
+         * sd_event_loop() itself runs in a fiber — is safe. */
+        siglongjmp_unchecked(f->resume_context, 1);
+        assert_not_reached();
+}
+
+static int fiber_init(Fiber *f) {
+        ucontext_t old_uc, uc;
+        void *fake_stack_save = NULL;
+
+        assert(f);
+
+        if (getcontext(&uc) < 0)
+                return -errno;
+
+        struct iovec fiber_stack = fiber_stack_usable(&f->stack);
+
+        uc.uc_link = NULL;              /* Unused: trampoline siglongjmps out instead of returning. */
+        uc.uc_stack.ss_sp = fiber_stack.iov_base;
+        uc.uc_stack.ss_size = fiber_stack.iov_len;
+        uc.uc_stack.ss_flags = 0;
+
+        Fiber *prev = fiber_get_current();
+        fiber_set_current(f);
+
+        makecontext(&uc, fiber_entry_point, /* argc= */ 0);
+
+        fiber_set_resume_stack(f, prev);
+        if (sigsetjmp(f->resume_context, /* savemask= */ 0) == 0) {
+                start_switch_stack(&fake_stack_save, &fiber_stack);
+                if (swapcontext(&old_uc, &uc) < 0) {
+                        finish_switch_stack(fake_stack_save);
+                        fiber_set_current(prev);
+                        return -errno;
+                }
+                assert_not_reached();   /* Trampoline siglongjmps back; swapcontext doesn't return. */
+        }
+
+        finish_switch_stack(fake_stack_save);
+
+        fiber_set_current(prev);
+        return 0;
+}
+
+/* Swap the thread-local log prefix and log context with the values stashed in f. While the fiber is
+ * suspended, f holds the fiber's own log state; while it's running, f holds the caller's log state. The
+ * swap is its own inverse, so the same call drives both directions. */
+static void fiber_swap_log_state(Fiber *f) {
+        assert(f);
+        log_prefix_swap(&f->log_prefix);
+        log_context_swap(&f->log_context, &f->log_context_num_fields);
+}
+
+static void reset_current_fiber(void) {
+        /* Restore the caller's log state stashed in the running fiber (if any) before clearing
+         * current_fiber. Without this, the child of a fork() that happened mid-fiber would inherit the
+         * fiber's log prefix / context list in its thread-locals even though no fiber is running. */
+        Fiber *f = fiber_get_current();
+        if (f)
+                fiber_swap_log_state(f);
+        fiber_set_current(NULL);
+}
+
+static sd_event_source* fiber_current_event_source(Fiber *f) {
+        assert(f);
+        assert(f->state != FIBER_STATE_COMPLETED);
+        assert(f->event);
+
+        return sd_event_get_state(f->event) == SD_EVENT_EXITING ? f->exit_event_source : f->defer_event_source;
+}
+
+static int atfork_ret;
+
+static void install_atfork(void) {
+        /* pthread_atfork() returns 0 on success and a positive error number on failure (POSIX only
+         * specifies ENOMEM). We keep the positive value around and negate it where an errno-style
+         * return is needed. */
+        atfork_ret = pthread_atfork(/* prepare= */ NULL, /* parent= */ NULL, reset_current_fiber);
+        if (atfork_ret != 0)
+                log_debug_errno(atfork_ret, "pthread_atfork() failed: %m");
+}
+
+static void fiber_resolve(sd_future *f) {
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+
+        fiber->defer_event_source = sd_event_source_disable_unref(fiber->defer_event_source);
+        fiber->exit_event_source = sd_event_source_disable_unref(fiber->exit_event_source);
+        /* The floating self-ref (if any) is potentially the last ref keeping the fiber alive — moving it
+         * into a local _cleanup_ slot ensures sd_future_resolve() runs callbacks and waiters while f is
+         * still valid; the local's cleanup drops the ref afterwards, at which point no further f->...
+         * access can happen. */
+        _unused_ _cleanup_(sd_future_unrefp) sd_future *floating = TAKE_PTR(fiber->floating);
+        sd_future_resolve(f, fiber->result);
+}
+
+static void fiber_enter(Fiber *fiber, Fiber *prev, void **fake_stack_save) {
+        fiber_set_current(fiber);
+        fiber_swap_log_state(fiber);
+
+        struct iovec fiber_stack = fiber_stack_usable(&fiber->stack);
+        start_switch_stack(fake_stack_save, &fiber_stack);
+        fiber_set_resume_stack(fiber, prev);
+}
+
+static void fiber_leave(Fiber *fiber, Fiber *prev, void *fake_stack_save) {
+        finish_switch_stack(fake_stack_save);
+        fiber_swap_log_state(fiber);
+        fiber_set_current(prev);
+}
+
+static int fiber_run(sd_future *f) {
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        int r;
+
+        if (fiber->state == FIBER_STATE_COMPLETED)
+                return -ESTALE;
+
+        assert(IN_SET(fiber->state, FIBER_STATE_INITIAL, FIBER_STATE_READY, FIBER_STATE_CANCELLED));
+
+        static pthread_once_t atfork_once = PTHREAD_ONCE_INIT;
+        r = pthread_once(&atfork_once, install_atfork);
+        if (r != 0)
+                return -r;
+        if (atfork_ret != 0)
+                return -atfork_ret;
+
+        LOG_SET_PREFIX(fiber->name);
+        LOG_CONTEXT_PUSH_KEY_VALUE("FIBER=", fiber->name);
+
+        log_debug("Scheduling fiber");
+
+        /* Save the previously-current fiber (if any) so we can restore it when this fiber yields or
+         * completes. This matters when fiber_run() is invoked from within another fiber (e.g. an
+         * sd-event dispatch that happens to be running inside a fiber context itself): the
+         * LOG_SET_PREFIX/LOG_CONTEXT_PUSH above attached to whichever fiber was current at that moment,
+         * and their scope-level cleanup must see the same fiber_get_current() when it runs to detach
+         * them from the correct list. */
+        Fiber *prev = fiber_get_current();
+        void *fake_stack_save = NULL;
+        fiber_enter(fiber, prev, &fake_stack_save);
+
+        /* This is where we start executing the fiber. Once it yields, we continue here as if nothing
+         * happened. resume_context captures this point; the fiber siglongjmps back to it. */
+        if (sigsetjmp(fiber->resume_context, 0) == 0)
+                siglongjmp_unchecked(fiber->context, 1);
+
+        fiber_leave(fiber, prev, fake_stack_save);
+
+        switch (fiber->state) {
+
+        case FIBER_STATE_COMPLETED:
+                if (fiber->result < 0 && fiber->result != -ECANCELED)
+                        log_debug_errno(fiber->result, "Fiber failed with error: %m");
+                else
+                        log_debug("Fiber finished executing");
+
+                fiber_resolve(f);
+                break;
+
+        case FIBER_STATE_CANCELLED:
+        case FIBER_STATE_READY:
+                log_debug("Fiber yielded execution");
+
+                r = sd_event_source_set_enabled(fiber_current_event_source(fiber), SD_EVENT_ONESHOT);
+                if (r < 0)
+                        return r;
+                break;
+
+        case FIBER_STATE_SUSPENDED:
+                log_debug("Fiber suspended execution");
+                /* Fiber is waiting for something - don't re-queue it */
+                break;
+
+        default:
+                assert_not_reached();
+        }
+
+        return 0;
+}
+
+static int fiber_cancel(sd_future *f) {
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        int r;
+
+        assert(fiber != fiber_get_current());
+
+        if (IN_SET(fiber->state, FIBER_STATE_COMPLETED, FIBER_STATE_CANCELLED))
+                return 0;
+
+        if (fiber->state == FIBER_STATE_INITIAL) {
+                /* The fiber's stack was allocated but never entered, so there are no scope-level cleanups
+                 * waiting to run. Skip the dispatch round-trip that would just have fiber_entry_point()
+                 * fall straight through with -ECANCELED, and settle the future right here — mirroring the
+                 * FIBER_STATE_COMPLETED branch of fiber_run(). */
+                fiber->result = -ECANCELED;
+                fiber->state = FIBER_STATE_COMPLETED;
+                fiber_resolve(f);
+                return 1;
+        }
+
+        /* Once we cancel a fiber, we want to immediately resume it with -ECANCELED. */
+        r = sd_event_source_set_enabled(fiber_current_event_source(fiber), SD_EVENT_ONESHOT);
+        if (r < 0)
+                return r;
+
+        fiber->state = FIBER_STATE_CANCELLED;
+
+        return 1;
+}
+
+static int fiber_on_defer(sd_event_source *s, void *userdata) {
+        sd_future *f = ASSERT_PTR(userdata);
+        return fiber_run(f);
+}
+
+static int fiber_on_exit(sd_event_source *s, void *userdata) {
+        sd_future *f = ASSERT_PTR(userdata);
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+        int r;
+
+        /* The fiber may already have completed via the regular defer path before sd_event_exit()
+         * fires the exit source. In that case there's nothing left to drive, and calling fiber_run()
+         * would return -ESTALE, which sd-event would log as a spurious error before disabling the
+         * source. */
+        if (fiber->state == FIBER_STATE_COMPLETED)
+                return 0;
+
+        /* If fiber_cancel() returned 1, the fiber was either settled right away (it never ran) or marked
+         * cancelled with its defer/exit source re-armed; in the latter case we let the event loop dispatch
+         * that source on the next iteration instead of running the fiber directly from here. */
+        r = fiber_cancel(f);
+        if (r != 0)
+                return r;
+
+        return fiber_run(f);
+}
+
+static void* fiber_alloc(void) {
+        return new0(Fiber, 1);
+}
+
+static void fiber_free(sd_future *f) {
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+
+        /* To make sure all memory is released, the fiber must have completed by the time we free it, so
+         * that its stack has finished unwinding (which invokes the registered cleanup functions). Since
+         * this function may be called while we're not running on a fiber ourselves, we can't run the
+         * fiber to completion from here, so we insist that this has happened before we get here. To
+         * ensure fibers are cleaned up before exiting the event loop, exit handlers are added for
+         * fibers created outside of existing fibers. For fibers created within running fibers, unwinding the
+         * outer fiber should take care of cleaning up any created child fibers (for example using
+         * sd_future_cancel_wait_unref()).
+         *
+         * FIBER_STATE_INITIAL is also accepted: the stack was allocated but never entered, so there are no
+         * registered cleanups to run. This covers the partial-construction failure path in sd_fiber_new()
+         * as well as fibers that are unrefed before the event loop ever dispatches them. */
+        assert(IN_SET(fiber->state, FIBER_STATE_INITIAL, FIBER_STATE_COMPLETED));
+
+        if (fiber->destroy)
+                fiber->destroy(fiber->userdata);
+
+#if HAVE_VALGRIND_VALGRIND_H
+        if (fiber->stack.iov_base)
+                VALGRIND_STACK_DEREGISTER(fiber->stack_id);
+#endif
+
+        if (fiber->stack.iov_base)
+                (void) munmap(fiber->stack.iov_base, fiber->stack.iov_len);
+
+        sd_event_source_disable_unref(fiber->defer_event_source);
+        sd_event_source_disable_unref(fiber->exit_event_source);
+        sd_event_unref(fiber->event);
+
+        free(fiber->name);
+        free(fiber);
+}
+
+sd_future* sd_fiber_get_current(void) {
+        Fiber *f = fiber_get_current();
+        if (!f)
+                return NULL;
+
+        return sd_event_source_get_userdata(fiber_current_event_source(f));
+}
+
+int sd_fiber_is_running(void) {
+        return !!fiber_get_current();
+}
+
+sd_event* sd_fiber_get_event(void) {
+        Fiber *f = fiber_get_current();
+        assert_return(f, NULL);
+        return f->event;
+}
+
+int sd_fiber_get_priority(int64_t *ret) {
+        Fiber *f = fiber_get_current();
+
+        assert_return(ret, -EINVAL);
+        assert_return(f, -ESRCH);
+
+        *ret = f->priority;
+        return 0;
+}
+
+static int fiber_swap(FiberState state) {
+        Fiber *f = ASSERT_PTR(fiber_get_current());
+
+        f->state = state;
+
+        void *fake_stack_save = NULL;
+
+        if (sigsetjmp(f->context, 0) == 0) {
+                start_switch_stack(&fake_stack_save, &f->resume_stack);
+                siglongjmp_unchecked(f->resume_context, 1);
+        }
+
+        finish_switch_stack(fake_stack_save);
+
+        /* When we get here, we've been resumed. */
+
+        if (f->state == FIBER_STATE_CANCELLED)
+                return -ECANCELED;
+
+        /* sd_fiber_resume() stashes the resumer's value (an async wakeup error from a deadline
+         * timer, an io_uring CQE result, etc.) into f->result for us to surface here. Consume it
+         * unconditionally so it doesn't pollute subsequent suspends or the fiber's eventual return
+         * value — both negative errors and positive payloads (byte counts, accepted fds, revents
+         * masks) are valid resume values. */
+        return TAKE_GENERIC(f->result, int, 0);
+}
+
+int sd_fiber_yield(void) {
+        assert_return(fiber_get_current(), -ESRCH);
+        return fiber_swap(FIBER_STATE_READY);
+}
+
+int sd_fiber_suspend(void) {
+        assert_return(fiber_get_current(), -ESRCH);
+        return fiber_swap(FIBER_STATE_SUSPENDED);
+}
+
+static int fiber_set_priority(sd_future *f, int64_t priority) {
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+        int r = 0;
+
+        if (fiber->defer_event_source)
+                RET_GATHER(r, sd_event_source_set_priority(fiber->defer_event_source, priority));
+
+        if (fiber->exit_event_source)
+                RET_GATHER(r, sd_event_source_set_priority(fiber->exit_event_source, priority));
+
+        if (r >= 0)
+                fiber->priority = priority;
+
+        return r;
+}
+
+static const sd_future_ops fiber_future_ops;
+
+int sd_fiber_resume(sd_future *f, int result) {
+        assert_return(f, -EINVAL);
+        assert_return(sd_future_get_ops(f) == &fiber_future_ops, -EINVAL);
+
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+
+        if (fiber->state != FIBER_STATE_SUSPENDED)
+                return 0;
+
+        /* Stash the result so fiber_swap() returns it from sd_fiber_suspend(). */
+        fiber->result = result;
+        fiber->state = FIBER_STATE_READY;
+        return sd_event_source_set_enabled(fiber_current_event_source(fiber), SD_EVENT_ONESHOT);
+}
+
+/* The fiber_future ops pass the Fiber pointer through as the future's private state. A fiber resolves its
+ * own future when it finishes, so fiber_cancel() resolves directly only for a fiber that never ran. */
+static const sd_future_ops fiber_future_ops = {
+        .size = sizeof(sd_future_ops),
+        .alloc = fiber_alloc,
+        .free = fiber_free,
+        .cancel = fiber_cancel,
+        .set_priority = fiber_set_priority,
+};
+
+int sd_fiber_new(sd_event *e, const char *name, sd_fiber_func_t func, void *userdata, sd_fiber_destroy_t destroy, sd_future **ret) {
+        int r;
+
+        assert_return(e, -EINVAL);
+        assert_return(name, -EINVAL);
+        assert_return(func, -EINVAL);
+
+        if (IN_SET(sd_event_get_state(e), SD_EVENT_EXITING, SD_EVENT_FINISHED))
+                return -ECANCELED;
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        r = sd_future_new(&fiber_future_ops, &f);
+        if (r < 0)
+                return r;
+
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+
+        struct rlimit rl = { RLIM_INFINITY, RLIM_INFINITY };
+        if (getrlimit(RLIMIT_STACK, &rl) < 0)
+                log_debug_errno(errno, "Reading RLIMIT_STACK failed, ignoring: %m");
+        if (rl.rlim_cur == RLIM_INFINITY)
+                rl.rlim_cur = 8U * U64_MB; /* Same as the default thread stack size */
+
+        /* Reserve room for the guard page so the usable region is at least PTHREAD_STACK_MIN bytes,
+         * which is what libc/pthread routines (e.g. TLS setup on musl) assume. */
+        size_t stack_len = ROUND_UP(rl.rlim_cur, page_size());
+        if (stack_len < (size_t) PTHREAD_STACK_MIN + page_size())
+                stack_len = ROUND_UP((size_t) PTHREAD_STACK_MIN + page_size(), page_size());
+
+        *fiber = (Fiber) {
+                .stack.iov_len = stack_len,
+                .state = FIBER_STATE_INITIAL,
+                .name = strdup(name),
+                .func = func,
+                .userdata = userdata,
+                .event = sd_event_ref(e),
+        };
+        if (!fiber->name)
+                return -ENOMEM;
+
+        r = fiber_allocate_stack(fiber->stack.iov_len, &fiber->stack.iov_base);
+        if (r < 0)
+                return r;
+
+#if HAVE_VALGRIND_VALGRIND_H
+        /* Register the usable stack range (above the guard page) before fiber_bootstrap() so the
+         * trampoline's first sigsetjmp doesn't trip Valgrind's stack-tracking heuristics. */
+        struct iovec usable = fiber_stack_usable(&fiber->stack);
+        fiber->stack_id = VALGRIND_STACK_REGISTER(
+                        usable.iov_base,
+                        (uint8_t*) usable.iov_base + usable.iov_len);
+#endif
+
+        r = fiber_init(fiber);
+        if (r < 0)
+                return r;
+
+        /* Execution of the fiber is driven by two event sources, one deferred, one exit. The exit event
+         * source kicks in when sd_event_exit() is called, as from that point onwards only exit event
+         * sources will be dispatched. */
+
+        r = sd_event_add_defer(e, &fiber->defer_event_source, fiber_on_defer, f);
+        if (r < 0)
+                return r;
+
+        r = sd_event_source_set_description(fiber->defer_event_source, fiber->name);
+        if (r < 0)
+                return r;
+
+        r = sd_event_add_exit(e, &fiber->exit_event_source, fiber_on_exit, f);
+        if (r < 0)
+                return r;
+
+        r = sd_event_source_set_description(fiber->exit_event_source, fiber->name);
+        if (r < 0)
+                return r;
+
+        /* If we're on a fiber, we'll rely on the parent fiber to cancel this fiber if the event loop is
+         * exiting. Otherwise, we'll trigger cancellation of this fiber via the exit event source. Why cancel
+         * via the exit event source? We can only run the fiber while the event loop is active, so we need to
+         * make sure all fibers finish running before the event loop is finished, which an exit event source
+         * allows us to do. */
+        r = sd_event_source_set_enabled(fiber->exit_event_source, sd_fiber_is_running() ? SD_EVENT_OFF : SD_EVENT_ONESHOT);
+        if (r < 0)
+                return r;
+
+        /* Stays in FIBER_STATE_INITIAL until the event loop first dispatches it via fiber_run(). */
+
+        if (ret)
+                *ret = TAKE_PTR(f);
+        else {
+                /* Fire-and-forget: the fiber is guaranteed to resolve (via completion, cancellation, or
+                 * the event loop exit handler), so making the future floating cleans it up. */
+                r = sd_fiber_set_floating(f, true);
+                if (r < 0)
+                        return r;
+        }
+
+        /* We only take ownership of the given userdata pointer on success, so assign the destroy callback
+         * at the very end; that way a failure above never destroys the caller's userdata. */
+        fiber->destroy = destroy;
+
+        return 0;
+}
+
+int sd_fiber_set_floating(sd_future *f, int b) {
+        assert_return(f, -EINVAL);
+        assert_return(sd_future_get_ops(f) == &fiber_future_ops, -EINVAL);
+
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+
+        if (!!fiber->floating == !!b)
+                return 0;
+
+        /* The floating self-ref keeps the future alive until the fiber resolves; fiber_resolve() drops it.
+         * Only valid for fiber futures because fibers uniquely guarantee resolution (via completion,
+         * cancellation, or the event loop exit handler). */
+        if (b)
+                fiber->floating = sd_future_ref(f);
+        else
+                fiber->floating = sd_future_unref(fiber->floating);
+
+        return 0;
+}
+
+int sd_fiber_get_floating(sd_future *f) {
+        assert_return(f, -EINVAL);
+        assert_return(sd_future_get_ops(f) == &fiber_future_ops, -EINVAL);
+
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+        return !!fiber->floating;
+}
+
+int sd_fiber_sleep(uint64_t usec) {
+        Fiber *f = fiber_get_current();
+        int r;
+
+        if (!f)
+                return usleep_safe(usec);
+
+        if (usec == 0)
+                return sd_fiber_yield();
+
+        /* Match usleep_safe(USEC_INFINITY): suspend indefinitely. Passing USEC_INFINITY to
+         * sd_event_add_time_relative() would overflow the absolute deadline and fail with -EOVERFLOW. */
+        if (usec == USEC_INFINITY)
+                return sd_fiber_suspend();
+
+        assert(f->event);
+
+        _cleanup_(sd_future_cancel_wait_unrefp) sd_future *timer = NULL;
+        r = future_new_time_relative(
+                        f->event,
+                        CLOCK_MONOTONIC,
+                        usec,
+                        /* accuracy= */ 1,
+                        /* result= */ 0,
+                        &timer);
+        if (r < 0)
+                return r;
+
+        return sd_fiber_suspend();
+}
+
+int sd_fiber_await(sd_future *target) {
+        sd_future *f = sd_fiber_get_current();
+        int r;
+
+        assert_return(f, -ESRCH);
+        assert_return(target, -EINVAL);
+        assert_return(target != f, -EDEADLK);
+
+        Fiber *fiber = ASSERT_PTR(sd_future_get_private(f));
+
+        if (sd_future_state(target) == SD_FUTURE_RESOLVED)
+                return sd_future_result(target);
+
+        /* Note that we do allow waiting for other fibers when the event loop is exiting, since waiting for
+         * other fibers does not require adding new event sources to the event loop. */
+        if (sd_event_get_state(fiber->event) == SD_EVENT_FINISHED)
+                return -ECANCELED;
+
+        _cleanup_(sd_future_cancel_wait_unrefp) sd_future *wait = NULL;
+        r = sd_future_new_wait(target, &wait);
+        if (r < 0)
+                return r;
+
+        return sd_fiber_suspend();
+}
+
+sd_future* sd_fiber_timeout(uint64_t timeout) {
+        Fiber *fiber = fiber_get_current();
+        int r;
+
+        assert_return(fiber, NULL);
+
+        if (timeout == USEC_INFINITY)
+                return NULL;
+
+        sd_future *timer;
+        r = future_new_time_relative(
+                        fiber->event,
+                        CLOCK_MONOTONIC,
+                        timeout,
+                        /* accuracy= */ 1,
+                        /* result= */ -ETIME,
+                        &timer);
+        if (r < 0)
+                return NULL; /* On allocation failure no timer is armed and the scope becomes a no-op.
+                              * Errors here are rare; if the caller cares they can compare to NULL. */
+
+        return timer;
+}
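The stack switching in fiber_bootstrap(), fiber_run() and fiber_swap() above can be hard to follow. The control flow it implements is a scheduler that runs a fiber until the fiber yields, then resumes it later on the same stack. The sketch below shows that flow in isolation using POSIX ucontext and makes several simplifying assumptions:

- Every switch goes through swapcontext(). The code above uses makecontext()/swapcontext() only to bootstrap and then switches with sigsetjmp()/siglongjmp().
- There is no guard page, sd-event integration, cancellation or log-state handling.
- DemoFiber, demo_yield(), demo_run() and demo_drive() are hypothetical names, not systemd API.
- The ucontext functions were removed from POSIX.1-2008, but glibc still provides them.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical demo fiber; not part of sd-future/sd-fiber. */
typedef struct DemoFiber {
        ucontext_t ctx;     /* the fiber's saved execution state */
        ucontext_t caller;  /* the scheduler's state, restored on yield or return */
        void *stack;        /* heap-allocated stack, no guard page */
        int done;           /* set by the body just before it returns */
        int counter;        /* work done by the body, checked by the caller */
} DemoFiber;

static DemoFiber *demo_current; /* which fiber is executing, like fiber_get_current() */

/* Rough counterpart of sd_fiber_yield(): save our state and switch back to the scheduler. */
static void demo_yield(void) {
        DemoFiber *f = demo_current;
        int r = swapcontext(&f->ctx, &f->caller);
        assert(r == 0);
        (void) r;
}

static void demo_body(void) {
        DemoFiber *f = demo_current;
        for (int i = 0; i < 3; i++) {
                f->counter++;
                demo_yield();
        }
        f->done = 1;
        /* Returning follows uc_link back into the scheduler's last swapcontext(). */
}

static int demo_init(DemoFiber *f, size_t stack_size) {
        if (getcontext(&f->ctx) < 0)
                return -1;
        f->stack = malloc(stack_size);
        if (!f->stack)
                return -1;
        f->ctx.uc_stack.ss_sp = f->stack;
        f->ctx.uc_stack.ss_size = stack_size;
        f->ctx.uc_link = &f->caller;
        makecontext(&f->ctx, demo_body, 0);
        return 0;
}

/* Rough counterpart of fiber_run(): execute the fiber until it yields or returns. */
static void demo_run(DemoFiber *f) {
        DemoFiber *prev = demo_current;
        demo_current = f;
        int r = swapcontext(&f->caller, &f->ctx);
        assert(r == 0);
        (void) r;
        demo_current = prev;
}

/* Drive one fiber to completion; returns how often it was scheduled, or -1 on setup failure. */
static int demo_drive(int *ret_counter) {
        DemoFiber f = { .stack = NULL };
        int runs = 0;

        if (demo_init(&f, 64 * 1024) < 0)
                return -1;

        while (!f.done) {
                demo_run(&f);
                runs++;
        }

        free(f.stack);
        *ret_counter = f.counter;
        return runs;
}
```

The body yields three times and then returns, so demo_drive() schedules it four times. The fourth run ends with the body falling off its stack into `uc_link`, which plays the role of the trampoline's final jump back in the real code.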
diff --git a/src/libsystemd/sd-future/sd-future.c b/src/libsystemd/sd-future/sd-future.c
new file mode 100644 (file)
index 0000000..ba3a1c5
--- /dev/null
@@ -0,0 +1,263 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#include "sd-future.h"
+
+#include "alloc-util.h"
+#include "errno-util.h"
+#include "log.h"
+#include "macro.h"
+#include "set.h"
+
+struct sd_future {
+        unsigned n_ref;
+
+        int state;
+        int result;
+
+        Set *waiters;
+
+        sd_future_func_t callback;
+        void *userdata;
+
+        const sd_future_ops *ops;
+
+        /* Opaque per-future state owned by the future implementation (the code that called
+         * sd_future_new()). The ops vtable above receives this pointer in its callbacks, and
+         * external code can fetch it via sd_future_get_private(). */
+        void *private;
+};
+
+static int fiber_resume_trampoline(sd_future *f) {
+        /* The future's result is what the fiber should resume with. Impls choose the value at
+         * resolution time — e.g. a deadline timer resolves with -ETIME, a wait future resolves
+         * with the target's result, a normal IO/sleep future resolves with 0 on success. */
+        return sd_fiber_resume(sd_future_get_userdata(f), sd_future_result(f));
+}
+
+int sd_future_resolve(sd_future *f, int result) {
+        int r = 0;
+
+        assert_return(f, -EINVAL);
+
+        if (f->state != SD_FUTURE_PENDING)
+                return 0;
+
+        /* Hold a self-ref across callback/waiter dispatch: callbacks (e.g. bus_fiber_resolved()
+         * dropping the tracking-set's ref) may legitimately release what would otherwise be the
+         * last reference, and we still access f->waiters below. The cleanup unrefs at scope exit,
+         * which is when freeing is safe again. */
+        _unused_ _cleanup_(sd_future_unrefp) sd_future *self = sd_future_ref(f);
+
+        f->state = SD_FUTURE_RESOLVED;
+        f->result = result;
+
+        if (f->callback)
+                RET_GATHER(r, f->callback(f));
+
+        /* The set must not be modified while we iterate over it, hence take ownership of it in a local
+         * variable. Otherwise code invoked via sd_future_resolve() (for example wait_future_free()) could
+         * try to modify the set while we're iterating over it. */
+        Set *waiters = TAKE_PTR(f->waiters);
+        sd_future *w;
+        SET_FOREACH(w, waiters)
+                RET_GATHER(r, sd_future_resolve(w, result));
+
+        set_free(waiters);
+
+        return r;
+}
+
+static sd_future* sd_future_free(sd_future *f) {
+        if (!f)
+                return NULL;
+
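+        if (f->state == SD_FUTURE_PENDING) {
+                /* n_ref has already dropped to zero, but sd_future_resolve() takes a temporary self-ref,
+                 * which would trip the n_ref > 0 assertion in sd_future_ref() and re-enter this function
+                 * when released. Resurrect the reference for the duration of the call. */
+                f->n_ref++;
+                sd_future_resolve(f, -ECANCELED);
+                assert(f->n_ref == 1);
+                f->n_ref--;
+        }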
+
+        set_free(f->waiters);
+
+        if (f->ops->free)
+                f->ops->free(f);
+
+        return mfree(f);
+}
+
+DEFINE_TRIVIAL_REF_UNREF_FUNC(sd_future, sd_future, sd_future_free);
+DEFINE_POINTER_ARRAY_CLEAR_FUNC(sd_future*, sd_future_unref);
+DEFINE_POINTER_ARRAY_FREE_FUNC(sd_future*, sd_future_unref);
+
+sd_future* sd_future_cancel_wait_unref(sd_future *f) {
+        int r;
+
+        if (!f)
+                return NULL;
+
+        /* We have to be able to suspend until the future we're waiting for resolves, and that's only
+         * possible if we're running on a fiber ourselves. */
+        if (!sd_fiber_is_running())
+                return sd_future_unref(f);
+
+        r = sd_future_cancel(f);
+        if (r < 0)
+                log_debug_errno(r, "Failed to cancel future, ignoring: %m");
+
+        if (f->state == SD_FUTURE_PENDING) {
+                /* Fast path: when f's resolve callback already targets the current fiber (the default for
+                 * futures created on this fiber), we can suspend directly and let the existing trampoline
+                 * wake us up — no need to allocate a wait future just to learn about the resolution.
+                 * Otherwise fall back to sd_fiber_await() which sets up an explicit waiter. */
+                if (f->callback == fiber_resume_trampoline && f->userdata == sd_fiber_get_current())
+                        r = sd_fiber_suspend();
+                else
+                        r = sd_fiber_await(f);
+                if (r < 0 && r != -ECANCELED)
+                        log_debug_errno(r, "Failed to wait for future to finish, ignoring: %m");
+        }
+
+        return sd_future_unref(f);
+}
+
+DEFINE_POINTER_ARRAY_CLEAR_FUNC(sd_future*, sd_future_cancel_wait_unref);
+DEFINE_POINTER_ARRAY_FREE_FUNC(sd_future*, sd_future_cancel_wait_unref);
+
+int sd_future_new(const sd_future_ops *ops, sd_future **ret) {
+        assert_return(ops, -EINVAL);
+        assert_return(ops->size >= endoffsetof_field(sd_future_ops, set_priority), -EINVAL);
+        assert_return(ops->alloc, -EINVAL);
+        assert_return(ops->free, -EINVAL);
+        assert_return(ret, -EINVAL);
+
+        sd_future *f = new(sd_future, 1);
+        if (!f)
+                return -ENOMEM;
+
+        *f = (sd_future) {
+                .n_ref = 1,
+                .state = SD_FUTURE_PENDING,
+                .ops = ops,
+        };
+
+        f->private = ops->alloc();
+        if (!f->private) {
+                free(f);
+                return -ENOMEM;
+        }
+
+        /* If we're being created on a fiber, default the callback to resuming that fiber on resolve —
+         * this is almost always what you want, and it saves the usual set_callback boilerplate before
+         * sd_fiber_suspend(). Callers that want different behavior can override with
+         * sd_future_set_callback(). */
+        sd_future *fiber = sd_fiber_get_current();
+        if (fiber)
+                (void) sd_future_set_callback(f, fiber_resume_trampoline, fiber);
+
+        *ret = f;
+        return 0;
+}
+
+int sd_future_state(sd_future *f) {
+        assert_return(f, -EINVAL);
+        return f->state;
+}
+
+int sd_future_result(sd_future *f) {
+        assert_return(f, -EINVAL);
+        assert_return(f->state == SD_FUTURE_RESOLVED, -EBUSY);
+        return f->result;
+}
+
+void* sd_future_get_userdata(sd_future *f) {
+        assert_return(f, NULL);
+        return f->userdata;
+}
+
+void* sd_future_get_private(sd_future *f) {
+        assert_return(f, NULL);
+        return f->private;
+}
+
+const sd_future_ops* sd_future_get_ops(sd_future *f) {
+        assert_return(f, NULL);
+        return f->ops;
+}
+
+int sd_future_set_callback(sd_future *f, sd_future_func_t callback, void *userdata) {
+        assert_return(f, -EINVAL);
+
+        f->callback = callback;
+        f->userdata = userdata;
+        return 0;
+}
+
+int sd_future_set_priority(sd_future *f, int64_t priority) {
+        assert_return(f, -EINVAL);
+        assert_return(f->state == SD_FUTURE_PENDING, -ESTALE);
+        assert_return(f->ops->set_priority, -EOPNOTSUPP);
+
+        return f->ops->set_priority(f, priority);
+}
+
+int sd_future_cancel(sd_future *f) {
+        assert_return(f, -EINVAL);
+        assert_return(f->ops->cancel, -EOPNOTSUPP);
+
+        if (f->state == SD_FUTURE_RESOLVED)
+                return 0;
+
+        return f->ops->cancel(f);
+}
+
+typedef struct WaitFuture {
+        sd_future *target;
+} WaitFuture;
+
+static void* wait_future_alloc(void) {
+        return new0(WaitFuture, 1);
+}
+
+static void wait_future_free(sd_future *f) {
+        WaitFuture *wf = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+
+        set_remove(wf->target->waiters, f);
+        sd_future_unref(wf->target);
+        free(wf);
+}
+
+static int wait_future_cancel(sd_future *f) {
+        WaitFuture *wf = ASSERT_PTR(sd_future_get_private(ASSERT_PTR(f)));
+
+        set_remove(wf->target->waiters, f);
+        return sd_future_resolve(f, -ECANCELED);
+}
+
+static const sd_future_ops wait_future_ops = {
+        .size = sizeof(sd_future_ops),
+        .alloc = wait_future_alloc,
+        .free = wait_future_free,
+        .cancel = wait_future_cancel,
+};
+
+int sd_future_new_wait(sd_future *target, sd_future **ret) {
+        int r;
+
+        assert_return(target, -EINVAL);
+        assert_return(ret, -EINVAL);
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        r = sd_future_new(&wait_future_ops, &f);
+        if (r < 0)
+                return r;
+
+        WaitFuture *wf = sd_future_get_private(f);
+        wf->target = sd_future_ref(target);
+
+        if (target->state == SD_FUTURE_RESOLVED)
+                r = sd_future_resolve(f, target->result);
+        else
+                r = set_ensure_put(&target->waiters, &trivial_hash_ops, f);
+        if (r < 0)
+                return r;
+
+        *ret = TAKE_PTR(f);
+        return 0;
+}
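sd_future_resolve() guards against two reentrancy hazards:

- A callback may drop the last reference to the future being resolved.
- Code run while notifying waiters may try to modify the waiter set.

The sketch below shows the same two safeguards in isolation, under these assumptions:

- A plain array of borrowed waiter pointers replaces systemd's Set.
- Fut, fut_resolve(), drop_owner_ref() and the other helpers are hypothetical names, not part of the sd-future API.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stripped-down future; not the sd-future API. */
typedef struct Fut Fut;
typedef void (*fut_cb_t)(Fut *f, void *userdata);

struct Fut {
        unsigned n_ref;
        int resolved;
        int result;
        fut_cb_t callback;
        void *userdata;
        Fut **waiters;      /* borrowed pointers, standing in for the Set in sd_future */
        size_t n_waiters;
};

static unsigned fut_n_freed; /* instrumentation: how many futures were freed */

static Fut *fut_new(void) {
        Fut *f = calloc(1, sizeof(Fut));
        if (f)
                f->n_ref = 1;
        return f;
}

static Fut *fut_ref(Fut *f) {
        assert(f->n_ref > 0);
        f->n_ref++;
        return f;
}

static Fut *fut_unref(Fut *f) {
        if (!f)
                return NULL;
        assert(f->n_ref > 0);
        if (--f->n_ref > 0)
                return NULL;
        free(f->waiters);
        free(f);
        fut_n_freed++;
        return NULL;
}

static int fut_add_waiter(Fut *target, Fut *w) {
        Fut **n = realloc(target->waiters, (target->n_waiters + 1) * sizeof(Fut *));
        if (!n)
                return -1;
        n[target->n_waiters++] = w;
        target->waiters = n;
        return 0;
}

static void fut_resolve(Fut *f, int result) {
        if (f->resolved)
                return;

        /* Pin f: the callback may drop what was the last outside reference. */
        fut_ref(f);

        f->resolved = 1;
        f->result = result;
        if (f->callback)
                f->callback(f, f->userdata);

        /* Detach the waiter list before walking it, so reentrant code can't modify it under us. */
        Fut **waiters = f->waiters;
        size_t n = f->n_waiters;
        f->waiters = NULL;
        f->n_waiters = 0;

        for (size_t i = 0; i < n; i++)
                fut_resolve(waiters[i], result);
        free(waiters);

        fut_unref(f); /* may free f; it must not be touched after this */
}

/* Example callback that releases the owner's reference during dispatch. */
static void drop_owner_ref(Fut *f, void *userdata) {
        Fut **owner = userdata;
        (void) f;
        *owner = fut_unref(*owner);
}
```

Consider a target with one waiter whose callback is drop_owner_ref(). The callback releases the only outside reference, yet the target is freed only after the waiter has been resolved with the same result. Without the fut_ref()/fut_unref() pair, reading `f->waiters` after the callback would be a use-after-free.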
diff --git a/src/libsystemd/sd-future/test-fiber.c b/src/libsystemd/sd-future/test-fiber.c
new file mode 100644 (file)
index 0000000..2760b91
--- /dev/null
@@ -0,0 +1,1171 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+
+#include <signal.h>
+
+#if HAVE_VALGRIND_VALGRIND_H
+#  include <valgrind/valgrind.h>
+#endif
+
+#include "sd-event.h"
+#include "sd-future.h"
+
+#include "architecture.h"
+#include "log-context.h"
+#include "memory-util.h"
+#include "pidref.h"
+#include "process-util.h"
+#include "tests.h"
+#include "time-util.h"
+
+static int simple_fiber(void *userdata) {
+        int *value = ASSERT_PTR(userdata);
+        return *value;
+}
+
+TEST(fiber_simple) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int value = 5;
+        ASSERT_OK(sd_fiber_new(e, "simple", simple_fiber, &value, NULL, &f));
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_EQ(sd_future_result(f), 5);
+}
+
+/* Fiber that yields once */
+static int yielding_fiber(void *userdata) {
+        int *counter = userdata;
+        (*counter)++;
+
+        sd_fiber_yield();
+
+        (*counter)++;
+        return 0;
+}
+
+/* Test: Single fiber that yields */
+TEST(fiber_single_yield) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "yielding", yielding_fiber, &counter, NULL, &f));
+
+        /* First iteration: fiber runs until first yield */
+        ASSERT_EQ(counter, 0);
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_EQ(counter, 1);
+
+        /* Second iteration: fiber runs from yield to completion */
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_EQ(counter, 2);
+
+        /* No more fibers to run */
+        ASSERT_OK_ZERO(sd_event_loop(e));
+}
+
+static int counting_fiber(void *userdata) {
+        int counter = 0;
+
+        for (int i = 0; i < 5; i++) {
+                counter++;
+                sd_fiber_yield();
+        }
+
+        return counter;
+}
+
+/* Test: Multiple fibers yielding cooperatively */
+TEST(fiber_multiple_yield) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *fibers[5] = {};
+        CLEANUP_ELEMENTS(fibers, sd_future_unref_array_clear);
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++) {
+                _cleanup_free_ char *name = NULL;
+                ASSERT_OK(asprintf(&name, "counting-%zu", i));
+                ASSERT_OK(sd_fiber_new(e, name, counting_fiber, NULL, NULL, &fibers[i]));
+        }
+
+        ASSERT_OK(sd_event_loop(e));
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++)
+                ASSERT_OK_EQ(sd_future_result(fibers[i]), 5);
+}
+
+static int priority_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+
+        (*counter)++;
+        sd_fiber_yield();
+
+        return *counter;
+}
+
+/* Test: Priority-based scheduling */
+TEST(fiber_priority_ascending) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *fibers[5] = {};
+        CLEANUP_ELEMENTS(fibers, sd_future_unref_array_clear);
+        int counter = 0;
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++) {
+                _cleanup_free_ char *name = NULL;
+                ASSERT_OK(asprintf(&name, "priority-%zu", i));
+                ASSERT_OK(sd_fiber_new(e, name, priority_fiber, &counter, NULL, &fibers[i]));
+                ASSERT_OK(sd_future_set_priority(fibers[i], i));
+        }
+
+        ASSERT_OK(sd_event_loop(e));
+
+        /* The fibers have ascending priority values and lower values run first, so we expect the first
+         * fiber to run to completion, followed by the second one, etc. */
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++)
+                ASSERT_EQ(sd_future_result(fibers[i]), (int) i + 1);
+}
+
+TEST(fiber_priority_identical) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *fibers[5] = {};
+        CLEANUP_ELEMENTS(fibers, sd_future_unref_array_clear);
+        int counter = 0;
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++) {
+                _cleanup_free_ char *name = NULL;
+                ASSERT_OK(asprintf(&name, "priority-%zu", i));
+                ASSERT_OK(sd_fiber_new(e, name, priority_fiber, &counter, NULL, &fibers[i]));
+        }
+
+        ASSERT_OK(sd_event_loop(e));
+
+        /* The fibers have the same priorities, so we expect all of them to run once first, and then they'll
+         * all run again another time, so they should all return the same value. */
+
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++)
+                ASSERT_EQ(sd_future_result(fibers[i]), (int) 5);
+}
+
+static int error_fiber(void *userdata) {
+        return -ENOENT;
+}
+
+TEST(fiber_error_return) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "error", error_fiber, NULL, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_EQ(sd_future_result(f), -ENOENT);
+}
+
+static int cancel_fiber(void *userdata) {
+        return sd_fiber_yield();
+}
+
+TEST(fiber_cancel_basic) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int value = 42;
+        ASSERT_OK(sd_fiber_new(e, "cancel", cancel_fiber, &value, NULL, &f));
+
+        ASSERT_OK(sd_future_cancel(f));
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_ERROR(sd_future_result(f), ECANCELED);
+}
+
+static int fiber_that_yields(void *userdata) {
+        int *yield_count = userdata;
+        int r;
+
+        for (int i = 0; i < 5; i++) {
+                (*yield_count)++;
+                r = sd_fiber_yield();
+                if (r < 0)
+                        return r;  /* Propagate cancellation error */
+        }
+
+        return 0;
+}
+
+/* Test: sd_fiber_yield() returns an error when the fiber is cancelled externally */
+TEST(fiber_cancel_propagation_via_yield) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int yield_count = 0;
+        ASSERT_OK(sd_fiber_new(e, "yielding", fiber_that_yields, &yield_count, NULL, &f));
+
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_EQ(yield_count, 1);
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_EQ(yield_count, 2);
+
+        ASSERT_OK(sd_future_cancel(f));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        /* The fiber should have been cancelled */
+        ASSERT_ERROR(sd_future_result(f), ECANCELED);
+        ASSERT_EQ(yield_count, 2);
+}
+
+/* Test: Cancel a fiber that has already completed */
+TEST(fiber_cancel_completed) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int value = 42;
+        ASSERT_OK(sd_fiber_new(e, "simple", simple_fiber, &value, NULL, &f));
+
+        /* Run the fiber to completion */
+        ASSERT_OK(sd_event_loop(e));
+
+        /* Canceling a completed fiber should be a no-op */
+        ASSERT_OK(sd_future_cancel(f));
+        ASSERT_EQ(sd_future_result(f), 42);
+}
+
+static int multiple_yield_fiber(void *userdata) {
+        int *counter = userdata;
+        int r;
+
+        for (int i = 0; i < 3; i++) {
+                (*counter)++;
+                r = sd_fiber_yield();
+                if (r < 0)
+                        return r;
+        }
+
+        return 0;
+}
+
+/* Test: Cancel one fiber among multiple */
+TEST(fiber_cancel_one_of_many) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *fibers[3] = {};
+        CLEANUP_ELEMENTS(fibers, sd_future_unref_array_clear);
+        int counters[3] = {0, 0, 0};
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++)
+                ASSERT_OK(sd_fiber_new(e, "multiple-yield", multiple_yield_fiber, &counters[i], NULL, &fibers[i]));
+
+        /* Run three iterations, one per fiber: each fiber increments its counter once and yields */
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_EQ(counters[0], 1);
+        ASSERT_EQ(counters[1], 1);
+        ASSERT_EQ(counters[2], 1);
+
+        /* Cancel the second fiber */
+        ASSERT_OK(sd_future_cancel(fibers[1]));
+
+        /* Run to completion */
+        ASSERT_OK(sd_event_loop(e));
+
+        /* First and third fibers should complete normally */
+        ASSERT_EQ(counters[0], 3);
+        ASSERT_EQ(counters[2], 3);
+        ASSERT_EQ(sd_future_result(fibers[0]), 0);
+        ASSERT_EQ(sd_future_result(fibers[2]), 0);
+
+        /* Second fiber should be canceled with counter at 1 */
+        ASSERT_EQ(counters[1], 1);
+        ASSERT_EQ(sd_future_result(fibers[1]), -ECANCELED);
+}
+
+/* Test: sd_fiber_await() - wait for a fiber to complete */
+static int slow_fiber(void *userdata) {
+        int *counter = userdata;
+
+        for (int i = 0; i < 3; i++) {
+                (*counter)++;
+                sd_fiber_yield();
+        }
+
+        return 42;
+}
+
+static int waiting_fiber(void *userdata) {
+        sd_future *target = userdata;
+        int r;
+
+        r = sd_fiber_await(target);
+        if (r < 0)
+                return r;
+
+        r = sd_future_result(target);
+        return r == 42 ? 0 : -EIO;
+}
+
+TEST(fiber_wait_for_basic) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        /* Create target fiber with lower priority (runs second) */
+        _cleanup_(sd_future_unrefp) sd_future *target = NULL, *waiter = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "slow", slow_fiber, &counter, NULL, &target));
+        ASSERT_OK(sd_future_set_priority(target, 1));
+
+        /* Create waiter fiber with higher priority (runs first) */
+        ASSERT_OK(sd_fiber_new(e, "waiting", waiting_fiber, target, NULL, &waiter));
+        ASSERT_OK(sd_future_set_priority(waiter, 0));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        ASSERT_OK(sd_future_result(waiter));
+        ASSERT_OK_EQ(sd_future_result(target), 42);
+        ASSERT_EQ(counter, 3);
+}
+
+/* Test: wait for already completed fiber */
+static int wait_for_completed_fiber(void *userdata) {
+        sd_future *target = userdata;
+        int r;
+
+        r = sd_fiber_await(target);
+        if (r < 0)
+                return r;
+
+        return sd_future_result(target);
+}
+
+TEST(fiber_wait_for_completed) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *target = NULL, *waiter = NULL;
+        int value = 100;
+
+        /* Create target fiber with higher priority (runs first) */
+        ASSERT_OK(sd_fiber_new(e, "simple", simple_fiber, &value, NULL, &target));
+        ASSERT_OK(sd_future_set_priority(target, 0));
+        /* Create waiter fiber with lower priority (runs second, after target completes) */
+        ASSERT_OK(sd_fiber_new(e, "wait-for-completed", wait_for_completed_fiber, target, NULL, &waiter));
+        ASSERT_OK(sd_future_set_priority(waiter, 1));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        ASSERT_OK_EQ(sd_future_result(waiter), 100);
+        ASSERT_OK_EQ(sd_future_result(target), 100);
+}
+
+/* Test: awaiting an already-resolved future returns the future's result directly */
+static int await_resolved_fiber(void *userdata) {
+        sd_future *target = userdata;
+
+        ASSERT_EQ((int) sd_future_state(target), (int) SD_FUTURE_RESOLVED);
+        ASSERT_OK_EQ(sd_fiber_await(target), 77);
+        return 0;
+}
+
+TEST(fiber_await_resolved_returns_result) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *target = NULL, *waiter = NULL;
+        int value = 77;
+
+        /* Higher-priority target runs to completion before the waiter starts. */
+        ASSERT_OK(sd_fiber_new(e, "target", simple_fiber, &value, NULL, &target));
+        ASSERT_OK(sd_future_set_priority(target, 0));
+        ASSERT_OK(sd_fiber_new(e, "await-resolved", await_resolved_fiber, target, NULL, &waiter));
+        ASSERT_OK(sd_future_set_priority(waiter, 1));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        ASSERT_OK(sd_future_result(waiter));
+        ASSERT_OK_EQ(sd_future_result(target), 77);
+}
+
+/* Test: wait for cancelled fiber */
+static int wait_for_cancelled_fiber(void *userdata) {
+        sd_future *target = userdata;
+        int r;
+
+        r = sd_fiber_await(target);
+        if (r < 0)
+                return r;
+
+        return sd_future_result(target);
+}
+
+TEST(fiber_wait_for_cancelled) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *target = NULL, *waiter = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "yielding", fiber_that_yields, &counter, NULL, &target));
+        ASSERT_OK(sd_fiber_new(e, "wait-for-cancelled", wait_for_cancelled_fiber, target, NULL, &waiter));
+
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+        ASSERT_OK_POSITIVE(sd_event_run(e, 0));
+
+        ASSERT_OK(sd_future_cancel(target));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        ASSERT_ERROR(sd_future_result(waiter), ECANCELED);
+        ASSERT_ERROR(sd_future_result(target), ECANCELED);
+}
+
+/* Test: multiple fibers waiting for the same target */
+static int multi_waiter_fiber(void *userdata) {
+        sd_future *target = userdata;
+        int r;
+
+        r = sd_fiber_await(target);
+        if (r < 0)
+                return r;
+
+        return sd_future_result(target);
+}
+
+TEST(fiber_wait_for_multiple_waiters) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *target = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "slow", slow_fiber, &counter, NULL, &target));
+
+        sd_future *waiters[3] = {};
+        CLEANUP_ELEMENTS(waiters, sd_future_unref_array_clear);
+        for (size_t i = 0; i < ELEMENTSOF(waiters); i++)
+                ASSERT_OK(sd_fiber_new(e, "multi-waiter", multi_waiter_fiber, target, NULL, &waiters[i]));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        for (size_t i = 0; i < ELEMENTSOF(waiters); i++)
+                ASSERT_OK_EQ(sd_future_result(waiters[i]), 42);
+
+        ASSERT_OK_EQ(sd_future_result(target), 42);
+        ASSERT_EQ(counter, 3);
+}
+
+/* Test: chain of waiting fibers */
+static int chain_waiter_fiber(void *userdata) {
+        sd_future *target = userdata;
+        int r;
+
+        r = sd_fiber_await(target);
+        if (r < 0)
+                return r;
+
+        r = sd_future_result(target);
+        return r + 1;
+}
+
+TEST(fiber_wait_for_chain) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *fibers[5] = {};
+        CLEANUP_ELEMENTS(fibers, sd_future_unref_array_clear);
+        int value = 10;
+
+        ASSERT_OK(sd_fiber_new(e, "simple", simple_fiber, &value, NULL, &fibers[0]));
+
+        /* Each subsequent fiber waits for the previous and adds 1 */
+        for (size_t i = 1; i < ELEMENTSOF(fibers); i++)
+                ASSERT_OK(sd_fiber_new(e, "chain-waiter", chain_waiter_fiber, fibers[i - 1], NULL, &fibers[i]));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        /* Check results: 10, 11, 12, 13, 14 */
+        for (size_t i = 0; i < ELEMENTSOF(fibers); i++)
+                ASSERT_OK_EQ(sd_future_result(fibers[i]), 10 + (int) i);
+}
+
+static int nested_run_inner_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+
+        (*counter)++;
+        int r = sd_fiber_yield();
+        if (r < 0)
+                return r;
+        (*counter)++;
+
+        return 0;
+}
+
+static int nested_run_outer_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+        _cleanup_(sd_event_unrefp) sd_event *inner = NULL;
+        _cleanup_(sd_future_unrefp) sd_future *nested = NULL;
+        int r;
+
+        /* Yield once before the nested loop: this forces the outer fiber to later resume through its own
+         * siglongjmp back to its resume_context after the inner fiber_run() has executed, which is
+         * exactly the path that breaks when the resume context is stored thread-globally instead of
+         * per-fiber. */
+        r = sd_fiber_yield();
+        if (r < 0)
+                return r;
+
+        r = sd_event_new(&inner);
+        if (r < 0)
+                return r;
+
+        r = sd_event_set_exit_on_idle(inner, true);
+        if (r < 0)
+                return r;
+
+        /* Spawn a fiber on the inner event loop. Driving it via sd_event_loop(inner) causes fiber_run() to
+         * be invoked while we are already executing inside fiber_run() for the outer fiber. */
+        r = sd_fiber_new(inner, "inner", nested_run_inner_fiber, counter, NULL, &nested);
+        if (r < 0)
+                return r;
+
+        r = sd_event_loop(inner);
+        if (r < 0)
+                return r;
+
+        r = sd_future_result(nested);
+        if (r < 0)
+                return r;
+
+        /* Yield again after the inner loop has returned. If the outer fiber's resume context was clobbered
+         * by the nested fiber_run(), the siglongjmp underneath this yield would jump into an already
+         * unwound stack frame. */
+        return sd_fiber_yield();
+}
+
+TEST(fiber_nested_run) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *outer = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "outer", nested_run_outer_fiber, &counter, NULL, &outer));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK(sd_future_result(outer));
+
+        /* The inner fiber incremented the counter once before yielding and once after resuming. */
+        ASSERT_EQ(counter, 2);
+}
+
+static int nested_current_check_inner_fiber(void *userdata) {
+        sd_future **slots = ASSERT_PTR(userdata);
+
+        slots[1] = sd_fiber_get_current();
+        int r = sd_fiber_yield();
+        if (r < 0)
+                return r;
+        /* After resuming, the current fiber must still be us, not the outer fiber that was current when
+         * fiber_run() re-entered. */
+        if (sd_fiber_get_current() != slots[1])
+                return -EBADF;
+
+        return 0;
+}
+
+static int nested_current_check_outer_fiber(void *userdata) {
+        sd_future **slots = ASSERT_PTR(userdata);
+        _cleanup_(sd_event_unrefp) sd_event *inner = NULL;
+        _cleanup_(sd_future_unrefp) sd_future *nested = NULL;
+        int r;
+
+        slots[0] = sd_fiber_get_current();
+
+        r = sd_event_new(&inner);
+        if (r < 0)
+                return r;
+
+        r = sd_event_set_exit_on_idle(inner, true);
+        if (r < 0)
+                return r;
+
+        r = sd_fiber_new(inner, "inner", nested_current_check_inner_fiber, slots, NULL, &nested);
+        if (r < 0)
+                return r;
+
+        r = sd_event_loop(inner);
+        if (r < 0)
+                return r;
+
+        r = sd_future_result(nested);
+        if (r < 0)
+                return r;
+
+        /* After the nested fiber_run() has returned, the current fiber must have been restored to the
+         * outer fiber rather than left as NULL or pointing at the (now freed) inner fiber. */
+        if (sd_fiber_get_current() != slots[0])
+                return -EBADF;
+
+        return 0;
+}
+
+TEST(fiber_nested_run_current_restored) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *slots[2] = {};
+        _cleanup_(sd_future_unrefp) sd_future *outer = NULL;
+        ASSERT_OK(sd_fiber_new(e, "outer", nested_current_check_outer_fiber, slots, NULL, &outer));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK(sd_future_result(outer));
+
+        ASSERT_NOT_NULL(slots[0]);
+        ASSERT_NOT_NULL(slots[1]);
+        ASSERT_TRUE(slots[0] != slots[1]);
+}
+
+static int nested_cancellation_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+        _cleanup_(sd_future_cancel_wait_unrefp) sd_future *nested = NULL;
+        int r;
+
+        if (*counter >= 5)
+                return sd_fiber_sleep(10 * USEC_PER_SEC);
+
+        (*counter)++;
+
+        _cleanup_free_ char *name = NULL;
+        if (asprintf(&name, "nested-cancellation-%i", *counter) < 0)
+                return -ENOMEM;
+
+        /* Create a nested fiber within this fiber */
+        r = sd_fiber_new(sd_fiber_get_event(), name, nested_cancellation_fiber, counter, NULL, &nested);
+        if (r < 0)
+                return r;
+
+        /* Wait for the nested fiber to complete */
+        r = sd_fiber_await(nested);
+        if (r < 0)
+                return r;
+
+        /* If we got here without cancellation, verify the nested fiber completed */
+        return sd_future_result(nested);
+}
+
+static int exit_loop_fiber(void *userdata) {
+        /* Just exit the event loop, causing the outer fiber to be cancelled */
+        return sd_event_exit(sd_fiber_get_event(), 0);
+}
+
+TEST(fiber_nested_cancellation) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+
+        int counter = 0;
+
+        /* Create outer fiber with higher priority (runs first) */
+        _cleanup_(sd_future_unrefp) sd_future *outer = NULL;
+        ASSERT_OK(sd_fiber_new(e, "outer", nested_cancellation_fiber, &counter, NULL, &outer));
+
+        /* Create exit fiber with lower priority (runs after all nested fibers have suspended) */
+        _cleanup_(sd_future_unrefp) sd_future *exit_fiber = NULL;
+        ASSERT_OK(sd_fiber_new(e, "exit-loop", exit_loop_fiber, NULL, NULL, &exit_fiber));
+        ASSERT_OK(sd_future_set_priority(exit_fiber, 1));
+
+        /* Run the event loop - the exit fiber should cause it to exit,
+         * which should cancel the outer fiber, which should cancel the nested fiber, and so forth. */
+        ASSERT_OK(sd_event_loop(e));
+
+        /* The exit fiber should have completed successfully */
+        ASSERT_OK(sd_future_result(exit_fiber));
+
+        /* The outer fiber should have been cancelled */
+        ASSERT_ERROR(sd_future_result(outer), ECANCELED);
+
+        /* Every fiber in the chain incremented the counter before spawning its nested fiber, so it must
+         * be non-zero once the chain has been cancelled. */
+        ASSERT_GT(counter, 0);
+}
+
+static int nested_fiber_cleanup_nested_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+        int r;
+
+        r = sd_fiber_sleep(10 * USEC_PER_SEC);
+        if (r == -ECANCELED)
+                (*counter)++;
+        else if (r < 0)
+                return r;
+
+        return 0;
+}
+
+static int nested_fiber_cleanup_fiber(void *userdata) {
+        _cleanup_(sd_future_cancel_wait_unrefp) sd_future *nested = NULL;
+        int r;
+
+        /* Create a nested fiber within this fiber. */
+        r = sd_fiber_new(sd_fiber_get_event(), "nested", nested_fiber_cleanup_nested_fiber, userdata, NULL, &nested);
+        if (r < 0)
+                return r;
+
+        /* Yield and then exit, the nested fiber should be cancelled. */
+        return sd_fiber_yield();
+}
+
+TEST(nested_fiber_cleanup) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *outer = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "outer", nested_fiber_cleanup_fiber, &counter, NULL, &outer));
+
+        ASSERT_OK(sd_event_loop(e));
+
+        /* The outer fiber should have finished normally */
+        ASSERT_OK(sd_future_result(outer));
+
+        /* The nested fiber was created and incremented its counter once when it was cancelled. */
+        ASSERT_GT(counter, 0);
+}
+
+static int priority_check_fiber(void *userdata) {
+        int64_t *ret = ASSERT_PTR(userdata);
+
+        /* Verify that sd_fiber_get_priority() returns the value set via sd_future_set_priority() */
+        ASSERT_OK(sd_fiber_get_priority(ret));
+
+        /* Exercise sd_fiber_sleep() which internally creates a time future. This verifies that the priority
+         * is correctly propagated to the time event source (via f->time.source, not f->io.source). */
+        return sd_fiber_sleep(1);
+}
+
+TEST(fiber_priority_get) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        int64_t got_priority = 0;
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "priority-check", priority_check_fiber, &got_priority, NULL, &f));
+        ASSERT_OK(sd_future_set_priority(f, 10));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK(sd_future_result(f));
+
+        /* Verify priority was stored and retrievable */
+        ASSERT_EQ(got_priority, 10);
+}
+
+static int floating_fiber(void *userdata) {
+        int *counter = ASSERT_PTR(userdata);
+
+        (*counter)++;
+        int r = sd_fiber_yield();
+        if (r < 0)
+                return r;
+        (*counter)++;
+
+        return 0;
+}
+
+TEST(fiber_floating) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "floating", floating_fiber, &counter, NULL, &f));
+
+        ASSERT_OK_ZERO(sd_fiber_get_floating(f));
+        ASSERT_OK(sd_fiber_set_floating(f, true));
+        ASSERT_OK_POSITIVE(sd_fiber_get_floating(f));
+
+        /* Drop our handle: the floating ref keeps the future alive until the fiber resolves, after
+         * which the self-unref frees it. If this didn't work we'd either leak (visible under ASan) or
+         * trip fiber_free()'s "state == COMPLETED" assertion. */
+        f = sd_future_unref(f);
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_EQ(counter, 2);
+}
+
+static int drop_extra_ref(sd_future *f) {
+        /* Drop an extra ref the test installed before the callback fires. After this returns, the
+         * floating self-ref is the only thing keeping the future alive — exercising the path where
+         * the floating unref in fiber_run() is the last unref. */
+        sd_future_unref(f);
+        return 0;
+}
+
+TEST(fiber_floating_callback_drops_ref) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sd_future *f = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "floating-cb", floating_fiber, &counter, NULL, &f));
+
+        ASSERT_OK(sd_fiber_set_floating(f, true));
+
+        /* Bump the ref for the callback to drop, then install the callback. */
+        sd_future_ref(f);
+        ASSERT_OK(sd_future_set_callback(f, drop_extra_ref, NULL));
+
+        /* Drop our handle. Refs remaining: floating self-ref + the extra ref the callback will drop. */
+        f = sd_future_unref(f);
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_EQ(counter, 2);
+}
+
+TEST(fiber_floating_toggle) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        int counter = 0;
+        ASSERT_OK(sd_fiber_new(e, "floating-toggle", floating_fiber, &counter, NULL, &f));
+
+        /* Toggling floating on and off again should leave the refcount unchanged: set_floating(true)
+         * takes a ref and set_floating(false) drops it. If the accounting were off, the subsequent
+         * event loop would either free the future while the fiber still runs (fiber_free assertion)
+         * or leak it. */
+        ASSERT_OK(sd_fiber_set_floating(f, true));
+        ASSERT_OK(sd_fiber_set_floating(f, false));
+        ASSERT_OK_ZERO(sd_fiber_get_floating(f));
+
+        /* Setting floating to the same value twice should be a no-op. */
+        ASSERT_OK(sd_fiber_set_floating(f, false));
+        ASSERT_OK(sd_fiber_set_floating(f, true));
+        ASSERT_OK(sd_fiber_set_floating(f, true));
+
+        /* Drop our handle; the still-floating ref drives cleanup. */
+        f = sd_future_unref(f);
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_EQ(counter, 2);
+}
+
+/* Test: SD_FIBER_TIMEOUT scope expires while the fiber is suspended with no other wakeup source. */
+static int timeout_suspend_fiber(void *userdata) {
+        SD_FIBER_TIMEOUT(50 * USEC_PER_MSEC);
+
+        /* Plain suspend with no other future to wake us — only the deadline timer can resume. */
+        return sd_fiber_suspend();
+}
+
+TEST(fiber_timeout_suspend_expires) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "timeout-suspend", timeout_suspend_fiber, NULL, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_ERROR(sd_future_result(f), ETIME);
+}
+
+/* Test: SD_FIBER_TIMEOUT scope around a sleep that finishes before the deadline expires; the
+ * cleanup must cancel the timer cleanly without leaving a stale wakeup. */
+static int timeout_in_time_fiber(void *userdata) {
+        SD_FIBER_TIMEOUT(1 * USEC_PER_SEC);
+        return sd_fiber_sleep(10 * USEC_PER_MSEC);
+}
+
+TEST(fiber_timeout_sleep_in_time) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "in-time", timeout_in_time_fiber, NULL, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK_ZERO(sd_future_result(f));
+}
+
+/* Test: SD_FIBER_TIMEOUT(USEC_INFINITY) is a no-op — no timer is created and the fiber completes
+ * normally. */
+static int timeout_infinite_fiber(void *userdata) {
+        SD_FIBER_TIMEOUT(USEC_INFINITY);
+        return sd_fiber_sleep(10 * USEC_PER_MSEC);
+}
+
+TEST(fiber_timeout_infinite_no_op) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "infinite", timeout_infinite_fiber, NULL, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK_ZERO(sd_future_result(f));
+}
+
+/* Test: SD_FIBER_WITH_TIMEOUT block form returns -ETIME from the suspend inside it. */
+static int with_timeout_block_fiber(void *userdata) {
+        int r = 0;
+        SD_FIBER_WITH_TIMEOUT(50 * USEC_PER_MSEC)
+                r = sd_fiber_suspend();
+        return r;
+}
+
+TEST(fiber_with_timeout_block) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "with-timeout", with_timeout_block_fiber, NULL, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_ERROR(sd_future_result(f), ETIME);
+}
+
+/* Test: nested SD_FIBER_TIMEOUT — inner scope's timer fires first; once we're back in just the
+ * outer scope, suspending again must time out via the still-armed outer timer. */
+static int nested_timeout_fiber(void *userdata) {
+        int *fired = ASSERT_PTR(userdata);
+
+        SD_FIBER_TIMEOUT(50 * USEC_PER_MSEC); /* outer */
+
+        SD_FIBER_WITH_TIMEOUT(20 * USEC_PER_MSEC) { /* inner — expires first */
+                int r = sd_fiber_suspend();
+                if (r != -ETIME)
+                        return -ENOTRECOVERABLE;
+                (*fired)++;
+        }
+
+        /* Inner scope is gone, but the outer timer is still armed (it only used ~20ms of its
+         * 50ms budget). Suspending again must eventually wake us with -ETIME. */
+        int r = sd_fiber_suspend();
+        if (r != -ETIME)
+                return -ENOTRECOVERABLE;
+        (*fired)++;
+
+        return 0;
+}
+
+TEST(fiber_timeout_nested) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        int fired = 0;
+        _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+        ASSERT_OK(sd_fiber_new(e, "nested-timeout", nested_timeout_fiber, &fired, NULL, &f));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK_ZERO(sd_future_result(f));
+        ASSERT_EQ(fired, 2);
+}
+
+/* Test: signal mask is per-thread, not per-fiber. Changes one fiber makes via pthread_sigmask
+ * must be visible to other fibers on the same thread, both while the modifying fiber is
+ * suspended and after it resumes. The fiber switch (sigsetjmp/siglongjmp with savesigs=0)
+ * deliberately doesn't save or restore the mask. */
+static int sigmask_peer_fiber(void *userdata) {
+        sigset_t set, current;
+
+        /* The waiter blocked SIGUSR1 before await'ing us; the per-thread mask should still
+         * have it blocked here. */
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_SETMASK, NULL, &current));
+        ASSERT_TRUE(sigismember(&current, SIGUSR1));
+
+        ASSERT_OK(sigemptyset(&set));
+        ASSERT_OK(sigaddset(&set, SIGUSR1));
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_UNBLOCK, &set, NULL));
+
+        return 0;
+}
+
+static int sigmask_waiter_fiber(void *userdata) {
+        sd_future *peer = ASSERT_PTR(userdata);
+        sigset_t set, current;
+
+        ASSERT_OK(sigemptyset(&set));
+        ASSERT_OK(sigaddset(&set, SIGUSR1));
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_BLOCK, &set, NULL));
+
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_SETMASK, NULL, &current));
+        ASSERT_TRUE(sigismember(&current, SIGUSR1));
+
+        int r = sd_fiber_await(peer);
+        if (r < 0)
+                return r;
+
+        /* The peer unblocked SIGUSR1 while we were suspended. The change is per-thread, so
+         * we must observe it here. */
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_SETMASK, NULL, &current));
+        ASSERT_FALSE(sigismember(&current, SIGUSR1));
+
+        return 0;
+}
+
+TEST(fiber_signal_mask_is_per_thread) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        sigset_t saved;
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_SETMASK, NULL, &saved));
+
+        _cleanup_(sd_future_unrefp) sd_future *waiter = NULL, *peer = NULL;
+        ASSERT_OK(sd_fiber_new(e, "sigmask-peer", sigmask_peer_fiber, NULL, NULL, &peer));
+        ASSERT_OK(sd_future_set_priority(peer, 1));
+        ASSERT_OK(sd_fiber_new(e, "sigmask-waiter", sigmask_waiter_fiber, peer, NULL, &waiter));
+        ASSERT_OK(sd_future_set_priority(waiter, 0));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK(sd_future_result(waiter));
+        ASSERT_OK(sd_future_result(peer));
+
+        ASSERT_OK_ZERO(-pthread_sigmask(SIG_SETMASK, &saved, NULL));
+}
+
+/* Test: log context is per-fiber. fiber_run() swaps the thread-local log context (and prefix) with
+ * a per-fiber stash on entry and exit, so fields pushed by one fiber must not leak into another
+ * fiber that runs while the first is suspended, and must be restored when the first resumes. */
+static int log_context_peer_fiber(void *userdata) {
+        size_t *peer_observed = ASSERT_PTR(userdata);
+
+        /* The waiter pushed a field before await'ing us. If log context were shared across fibers,
+         * we would observe it here. Record what we see and let the caller verify. */
+        *peer_observed = log_context_num_fields();
+
+        return 0;
+}
+
+static int log_context_waiter_fiber(void *userdata) {
+        sd_future *peer = ASSERT_PTR(userdata);
+
+        size_t before_push = log_context_num_fields();
+
+        LOG_CONTEXT_PUSH("WAITER=here");
+        size_t after_push = log_context_num_fields();
+        if (after_push != before_push + 1)
+                return -EBADF;
+
+        int r = sd_fiber_await(peer);
+        if (r < 0)
+                return r;
+
+        /* Our pushed field must be visible again after the peer ran and resumed us. */
+        if (log_context_num_fields() != after_push)
+                return -EBADF;
+
+        return 0;
+}
+
+TEST(fiber_log_context_per_fiber) {
+        _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+        ASSERT_OK(sd_event_new(&e));
+        ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+        size_t baseline = log_context_num_fields();
+
+        size_t peer_observed = 0;
+        _cleanup_(sd_future_unrefp) sd_future *waiter = NULL, *peer = NULL;
+        ASSERT_OK(sd_fiber_new(e, "log-peer", log_context_peer_fiber, &peer_observed, NULL, &peer));
+        ASSERT_OK(sd_future_set_priority(peer, 1));
+        ASSERT_OK(sd_fiber_new(e, "log-waiter", log_context_waiter_fiber, peer, NULL, &waiter));
+        ASSERT_OK(sd_future_set_priority(waiter, 0));
+
+        ASSERT_OK(sd_event_loop(e));
+        ASSERT_OK(sd_future_result(waiter));
+        ASSERT_OK(sd_future_result(peer));
+
+        /* Inside the peer, only the peer's own FIBER= field (pushed by fiber_run) should have been
+         * active — the waiter's WAITER= push must have been swapped out. */
+        ASSERT_EQ(peer_observed, baseline + 1);
+
+        /* The thread-local log context should be exactly as it was before the test ran. */
+        ASSERT_EQ(log_context_num_fields(), baseline);
+}
+
+static int stack_overflow_fiber(void *userdata) {
+        volatile char anchor;
+        size_t pagesz = page_size();
+
+        /* Walk one page at a time away from the fiber's current SP toward the guard page,
+         * writing one byte per page until the kernel raises a fatal signal. On downward
+         * stacks we walk to lower addresses (guard at the base); on upward stacks like
+         * hppa we walk to higher addresses (guard at the top of the mapping). The 64 MiB
+         * ceiling is purely a safety net so the test fails loudly instead of looping if
+         * the guard isn't there. */
+        for (size_t i = 1; i < (64U * U64_MB) / pagesz; i++) {
+                uintptr_t off = i * pagesz;
+                volatile char *p = (volatile char *) (STACK_GROWS_UP
+                                                      ? (uintptr_t) &anchor + off
+                                                      : (uintptr_t) &anchor - off);
+                *p = 0;
+        }
+        return 0;
+}
+
+TEST(fiber_stack_guard) {
+#if HAS_FEATURE_ADDRESS_SANITIZER
+        (void) log_tests_skipped("ASan intercepts deliberate stack OOB writes");
+        return;
+#endif
+#if HAVE_VALGRIND_VALGRIND_H
+        if (RUNNING_ON_VALGRIND) {
+                (void) log_tests_skipped("Valgrind intercepts deliberate stack OOB writes");
+                return;
+        }
+#endif
+
+        _cleanup_(pidref_done) PidRef pidref = PIDREF_NULL;
+        int r = pidref_safe_fork("(stack-overflow)", FORK_RESET_SIGNALS|FORK_LOG, &pidref);
+        ASSERT_OK(r);
+
+        if (r == 0) {
+                _cleanup_(sd_event_unrefp) sd_event *e = NULL;
+                ASSERT_OK(sd_event_new(&e));
+                ASSERT_OK(sd_event_set_exit_on_idle(e, true));
+
+                _cleanup_(sd_future_unrefp) sd_future *f = NULL;
+                ASSERT_OK(sd_fiber_new(e, "overflow", stack_overflow_fiber, NULL, NULL, &f));
+                (void) sd_event_loop(e);
+                _exit(EXIT_SUCCESS);    /* unreachable if the guard fires */
+        }
+
+        siginfo_t si;
+        ASSERT_OK(pidref_wait_for_terminate(&pidref, &si));
+        ASSERT_TRUE(IN_SET(si.si_code, CLD_KILLED, CLD_DUMPED));
+        ASSERT_TRUE(IN_SET(si.si_status, SIGSEGV, SIGBUS));
+}
+
+DEFINE_TEST_MAIN(LOG_DEBUG);
index f9c9a2627d55c0207e5e9f202debc549bce1b2ac..8da080bf18e61435ffe3b6c115c6f52b081d7b96 100644 (file)
@@ -69,6 +69,20 @@ typedef void (*_sd_destroy_t)(void *userdata);
 #  define _SD_STRINGIFY(x) _SD_XSTRINGIFY(x)
 #endif
 
+/* Mirror of CONCATENATE / UNIQ from macro-fundamental.h, available to public sd-* headers. */
+#ifndef _SD_CONCATENATE
+#  define _SD_XCONCATENATE(x, y) x ## y
+#  define _SD_CONCATENATE(x, y) _SD_XCONCATENATE(x, y)
+#endif
+
+#ifndef _SD_UNIQ
+#  ifdef __COUNTER__
+#    define _SD_UNIQ __COUNTER__
+#  else
+#    define _SD_UNIQ __LINE__
+#  endif
+#endif
+
 #ifndef _SD_BEGIN_DECLARATIONS
 #  ifdef __cplusplus
 #    define _SD_BEGIN_DECLARATIONS                              \
index ad455d73b217b0517787695e8f803352a68269e8..c35f811245db06563411f624dffc08d6cdb5e117 100644 (file)
@@ -36,6 +36,7 @@ _not_installed_headers = [
         'sd-dhcp6-option.h',
         'sd-dhcp6-protocol.h',
         'sd-dns-resolver.h',
+        'sd-future.h',
         'sd-ipv4acd.h',
         'sd-ipv4ll.h',
         'sd-lldp-rx.h',
diff --git a/src/systemd/sd-future.h b/src/systemd/sd-future.h
new file mode 100644 (file)
index 0000000..9d0d03a
--- /dev/null
@@ -0,0 +1,101 @@
+/* SPDX-License-Identifier: LGPL-2.1-or-later */
+#ifndef foosdfuturefoo
+#define foosdfuturefoo
+
+/***
+  systemd is free software; you can redistribute it and/or modify it
+  under the terms of the GNU Lesser General Public License as published by
+  the Free Software Foundation; either version 2.1 of the License, or
+  (at your option) any later version.
+
+  systemd is distributed in the hope that it will be useful, but
+  WITHOUT ANY WARRANTY; without even the implied warranty of
+  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+  Lesser General Public License for more details.
+
+  You should have received a copy of the GNU Lesser General Public License
+  along with systemd; If not, see <https://www.gnu.org/licenses/>.
+***/
+
+#include "_sd-common.h"
+
+_SD_BEGIN_DECLARATIONS;
+
+typedef struct sd_event sd_event;
+typedef struct sd_future sd_future;
+typedef struct sd_future_ops sd_future_ops;
+typedef int (*sd_future_func_t)(sd_future *f);
+typedef int (*sd_fiber_func_t)(void *userdata);
+typedef _sd_destroy_t sd_fiber_destroy_t;
+
+struct sd_future_ops {
+        size_t size;
+        void* (*alloc)(void);
+        void (*free)(sd_future *f);
+        int (*cancel)(sd_future *f);
+        int (*set_priority)(sd_future *f, int64_t priority);
+};
+
+__extension__ typedef enum _SD_ENUM_TYPE_S64(sd_future_state_t) {
+        SD_FUTURE_PENDING,
+        SD_FUTURE_RESOLVED,
+        _SD_ENUM_FORCE_S64(SD_FUTURE_STATE)
+} sd_future_state_t;
+
+int sd_future_new(const sd_future_ops *ops, sd_future **ret);
+int sd_future_cancel(sd_future *f);
+int sd_future_resolve(sd_future *f, int result);
+
+_SD_DECLARE_TRIVIAL_REF_UNREF_FUNC(sd_future);
+_SD_DEFINE_POINTER_CLEANUP_FUNC(sd_future, sd_future_unref);
+void sd_future_unref_array_clear(sd_future *array[], size_t n);
+void sd_future_unref_array(sd_future *array[], size_t n);
+
+sd_future* sd_future_cancel_wait_unref(sd_future *f);
+_SD_DEFINE_POINTER_CLEANUP_FUNC(sd_future, sd_future_cancel_wait_unref);
+void sd_future_cancel_wait_unref_array_clear(sd_future *array[], size_t n);
+void sd_future_cancel_wait_unref_array(sd_future *array[], size_t n);
+
+int sd_future_state(sd_future *f);
+int sd_future_result(sd_future *f);
+void* sd_future_get_userdata(sd_future *f);
+void* sd_future_get_private(sd_future *f);
+const sd_future_ops* sd_future_get_ops(sd_future *f);
+
+int sd_future_set_callback(sd_future *f, sd_future_func_t callback, void *userdata);
+int sd_future_set_priority(sd_future *f, int64_t priority);
+
+int sd_future_new_wait(sd_future *target, sd_future **ret);
+
+int sd_fiber_new(sd_event *e, const char *name, sd_fiber_func_t func, void *userdata, sd_fiber_destroy_t destroy, sd_future **ret);
+
+int sd_fiber_set_floating(sd_future *f, int b);
+int sd_fiber_get_floating(sd_future *f);
+
+int sd_fiber_is_running(void);
+sd_future* sd_fiber_get_current(void);
+int sd_fiber_get_priority(int64_t *ret);
+sd_event* sd_fiber_get_event(void);
+
+int sd_fiber_yield(void);
+int sd_fiber_sleep(uint64_t usec);
+int sd_fiber_await(sd_future *target);
+int sd_fiber_suspend(void);
+int sd_fiber_resume(sd_future *f, int result);
+
+sd_future* sd_fiber_timeout(uint64_t timeout);
+
+#define SD_FIBER_TIMEOUT(timeout) _SD_FIBER_TIMEOUT(_SD_UNIQ, (timeout))
+#define _SD_FIBER_TIMEOUT(uniq, timeout)                                                                                                        \
+        sd_future *_SD_CONCATENATE(_sd_fto_, uniq) __attribute__((cleanup(sd_future_cancel_wait_unrefp), unused)) = sd_fiber_timeout(timeout)
+
+#define SD_FIBER_WITH_TIMEOUT(timeout) _SD_FIBER_WITH_TIMEOUT(_SD_UNIQ, (timeout))
+#define _SD_FIBER_WITH_TIMEOUT(uniq, timeout)                                                                                                                   \
+        for (sd_future *_SD_CONCATENATE(_sd_fto_, uniq) __attribute__((cleanup(sd_future_cancel_wait_unrefp), unused)) = sd_fiber_timeout(timeout),             \
+                       *_SD_CONCATENATE(_sd_fto_b_, uniq) = (sd_future*) (uintptr_t) 1;                                                                         \
+             _SD_CONCATENATE(_sd_fto_b_, uniq);                                                                                                                 \
+             _SD_CONCATENATE(_sd_fto_b_, uniq) = NULL)
+
+_SD_END_DECLARATIONS;
+
+#endif
index 9be6f2a6d95322ba94a93a12e019da08579c37f5..b074903c00ac99845a5aa5ee7b30c952a0a562cd 100644 (file)
@@ -3,7 +3,7 @@
 integration_tests += [
         integration_test_template + {
                 'name' : fs.name(meson.current_source_dir()),
-                'coredump-exclude-regex' : '/(bash|python3.[0-9]+|systemd-executor)$',
+                'coredump-exclude-regex' : '/(bash|python3.[0-9]+|systemd-executor|test-fiber)$',
                 'cmdline' : integration_test_template['cmdline'] + [
                         '''
 
index a36e6c32d7ec02970771f3af4d24a21d20450283..8a935a513fe432ce2bf6f1c159f19ec803cf4180 100755 (executable)
@@ -28,6 +28,7 @@ LIBC_LIB_PREFIXES: tuple[str, ...] = (
     'libm.so.',
     'libresolv.so.',
     'libc.musl-',
+    'libucontext.',
 )
 
 # GCC runtime support libraries (stack unwinding, soft-float helpers, C++ standard library). The