---
title: Native Journal Protocol
category: Interfaces
layout: default
SPDX-License-Identifier: LGPL-2.1-or-later
---

# Native Journal Protocol

`systemd-journald.service` accepts log data via various protocols:

* Classic RFC3164 BSD syslog via the `/dev/log` socket
* STDOUT/STDERR of programs via `StandardOutput=journal` + `StandardError=journal` in service files (both of which are default settings)
* Kernel log messages via the `/dev/kmsg` device node
* Audit records via the kernel's audit subsystem
* Structured log messages via `journald`'s native protocol

The last of these is what this document is about: if you are developing a
program and want to pass structured log data to `journald`, the Journal's native
protocol is what you want to use. The systemd project provides the
[`sd_journal_print(3)`](https://www.freedesktop.org/software/systemd/man/sd_journal_print.html)
API that implements the client side of this protocol. This document explains
what this interface does behind the scenes, in case you'd like to implement a
client for it yourself without linking to `libsystemd`, for example because
you work in a programming language other than C or otherwise want to avoid the
dependency.

## Basics

The native protocol of `journald` is spoken on the
`/run/systemd/journal/socket` `AF_UNIX`/`SOCK_DGRAM` socket on which
`systemd-journald.service` listens. Each datagram sent to this socket
encapsulates one journal entry that shall be written. Since datagrams are
subject to a size limit and we want to allow large journal entries, datagrams
sent over this socket may come in one of two formats:

* A datagram with the literal journal entry data as payload, without
  any file descriptors attached.

* A datagram with an empty payload, but with a single
  [`memfd`](https://man7.org/linux/man-pages/man2/memfd_create.2.html)
  file descriptor that contains the literal journal entry data.

Other combinations are not permitted: datagrams that carry both a payload and
file descriptors, datagrams that carry neither, and datagrams that carry more
than one file descriptor are ignored. The `memfd` file descriptor should be
fully sealed. The binary format in the datagram payload and in the `memfd`
memory is identical. Typically a client first attempts to send the data as
datagram payload, and if this fails with an `EMSGSIZE` error it immediately
retries via the `memfd` logic.

A client probably should bump up the `SO_SNDBUF` socket option of its `AF_UNIX`
socket towards `journald` in order to delay blocking I/O as much as possible.
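
The send-then-fall-back logic described above might look roughly like the
following Python sketch. It is an illustration, not a reference
implementation: the helper name `send_entry()`, the memfd name and the 8 MiB
buffer size are arbitrary choices made for this example, and
`os.memfd_create()` requires Linux and Python 3.8 or newer.

```python
import array
import errno
import fcntl
import os
import socket

JOURNAL_SOCKET = "/run/systemd/journal/socket"


def send_entry(payload: bytes, path: str = JOURNAL_SOCKET) -> None:
    """Send one serialized entry as a datagram, or via a sealed memfd."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM) as sock:
        # Best effort: ask for a larger send buffer. The size is an arbitrary
        # example value, and the kernel may clamp the request.
        try:
            sock.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 8 * 1024 * 1024)
        except OSError:
            pass

        # First attempt: the entry is the datagram payload, no fds attached.
        try:
            sock.sendto(payload, path)
            return
        except OSError as e:
            if e.errno != errno.EMSGSIZE:
                raise

        # Fallback: empty payload plus exactly one fully sealed memfd.
        fd = os.memfd_create("journal-entry", os.MFD_ALLOW_SEALING)
        try:
            view = memoryview(payload)
            while view:  # handle short writes
                view = view[os.write(fd, view):]
            fcntl.fcntl(fd, fcntl.F_ADD_SEALS,
                        fcntl.F_SEAL_SHRINK | fcntl.F_SEAL_GROW
                        | fcntl.F_SEAL_WRITE | fcntl.F_SEAL_SEAL)
            rights = array.array("i", [fd])
            sock.sendmsg([b""],
                         [(socket.SOL_SOCKET, socket.SCM_RIGHTS, rights)],
                         0, path)
        finally:
            os.close(fd)
```

A call such as `send_entry(b"MESSAGE=Hello\n")` would then submit a single
entry. A long-running client would more likely keep one socket open instead of
creating a new one per entry.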

## Data Format

Each datagram should consist of a number of environment-like key/value
assignments. Unlike environment variable assignments, however, the value may
contain NUL bytes as well as any other binary data. Keys may not include the
`=` or newline characters (or any other control characters or non-ASCII
characters) and may not be empty.
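
As a rough illustration, the key rules stated above could be checked on the
client side as follows. The function name `is_valid_key()` is made up for this
sketch, and it encodes only the restrictions listed in this section. The
`sd_journal_print(3)` documentation describes a narrower naming convention
(uppercase letters, digits and underscores), so a cautious client may prefer
to check for that instead.

```python
def is_valid_key(key: str) -> bool:
    """Check a field name against the rules stated in this section only."""
    if not key:
        return False  # keys may not be empty
    for ch in key:
        code = ord(ch)
        # Reject control characters (this includes newline and DEL) and
        # anything outside the ASCII range.
        if code < 0x20 or code >= 0x7F:
            return False
        if ch == "=":
            return False  # "=" separates key and value
    return True
```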

Serialization into the datagram payload or `memfd` is straightforward: each
key/value pair is serialized via one of two methods:

* The first method inserts a `=` character between key and value, and suffixes
the result with `\n` (i.e. the newline character, ASCII code 10). Example: a
key `FOO` with a value `BAR` is serialized `F`, `O`, `O`, `=`, `B`, `A`, `R`,
`\n`.

* The second method must be used if the value of a field contains a `\n`
byte. In this case, the key name is serialized as is, followed by a `\n`
character, followed by a (non-aligned) little-endian unsigned 64-bit integer
encoding the size of the value, followed by the literal value data, followed by
`\n`. Example: a key `FOO` with a value `BAR` may be serialized using this
second method as: `F`, `O`, `O`, `\n`, `\003`, `\000`, `\000`, `\000`, `\000`,
`\000`, `\000`, `\000`, `B`, `A`, `R`, `\n`.

If the value of a key/value pair contains a newline character (`\n`), it *must*
be serialized using the second method. If it does not, either method is
permitted. The first method is generally recommended wherever it is applicable,
since the resulting datagrams can be read and understood without any manual
binary decoding. This improves the debugging experience considerably, in
particular with tools such as `strace` that can show datagram content as a text
dump. After all, log messages are highly relevant for debugging programs, hence
optimizing log traffic for readability without special tools is generally
desirable.
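
The two serialization methods can be sketched in a few lines of Python. The
function names and the `force_binary` parameter are inventions of this example
(the latter exists only to reproduce the `FOO`/`BAR` illustration of the second
method). Values are passed as `bytes`, since they may contain arbitrary binary
data.

```python
import struct
from typing import Iterable, Tuple


def serialize_field(key: str, value: bytes, force_binary: bool = False) -> bytes:
    """Serialize one key/value pair, preferring the readable first method."""
    name = key.encode("ascii")
    if force_binary or b"\n" in value:
        # Second method: key, newline, 64-bit little-endian size, value, newline.
        return name + b"\n" + struct.pack("<Q", len(value)) + value + b"\n"
    # First method: key, "=", value, newline.
    return name + b"=" + value + b"\n"


def serialize_entry(fields: Iterable[Tuple[str, bytes]]) -> bytes:
    """Concatenate the serialized fields of one journal entry."""
    return b"".join(serialize_field(k, v) for k, v in fields)
```

For instance, `serialize_field("FOO", b"BAR")` yields `b"FOO=BAR\n"`, while a
value containing a newline is automatically emitted with the size prefix.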

Note that keys that begin with `_` have special semantics in `journald`: they
are *trusted* and implicitly appended by `journald` on the receiving
side. Clients should not send them; if they do anyway, the fields are ignored.

The most important key/value pair to send is `MESSAGE=`, as that contains the
actual log message text. Other relevant keys a client should send in most cases
are `PRIORITY=`, `CODE_FILE=`, `CODE_LINE=`, `CODE_FUNC=`, and `ERRNO=`. It's
recommended to generate these fields implicitly on the client side. For further
information see the [relevant documentation of these
fields](https://www.freedesktop.org/software/systemd/man/systemd.journal-fields.html).
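
One way a client library might generate the code location fields implicitly is
to inspect the caller's stack frame. The following Python sketch shows the
idea. The function name `make_fields()` is hypothetical, `ERRNO=` is left out
for brevity, and `inspect.currentframe()` is assumed to be available (it is on
CPython, but may return `None` on other interpreters).

```python
import inspect
from typing import List, Tuple

LOG_INFO = 6  # syslog-style priority levels run from 0 (emerg) to 7 (debug)


def make_fields(message: str, priority: int = LOG_INFO) -> List[Tuple[str, bytes]]:
    """Build the common fields, filling in the caller's code location."""
    caller = inspect.currentframe().f_back  # frame of whoever called us
    return [
        ("MESSAGE", message.encode("utf-8")),
        ("PRIORITY", str(priority).encode("ascii")),
        ("CODE_FILE", caller.f_code.co_filename.encode("utf-8")),
        ("CODE_LINE", str(caller.f_lineno).encode("ascii")),
        ("CODE_FUNC", caller.f_code.co_name.encode("utf-8")),
    ]
```

The resulting list of pairs can then be handed to whatever serialization
routine the client uses.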

The order in which the fields are serialized within one datagram is undefined
and may be freely chosen by the client. The server side may or may not
preserve this order when writing the entry to the Journal.

Some programs might generate multi-line log messages (e.g. a stack unwinder
generating log output about a stack trace, with one line for each stack
frame). It's highly recommended to send these as a single datagram, using a
single `MESSAGE=` field with embedded newline characters between the lines (the
second serialization method described above must hence be used for this
field). If possible, do not split an individual event into multiple Journal
events, which would then be processed and written into the Journal as separate
entries. The Journal toolchain is capable of handling multi-line log entries
just fine, and it's generally preferred to have a single set of metadata fields
associated with each multi-line message.

Note that the same key may be used multiple times within the same datagram,
with different values. The Journal supports this and will write such entries to
disk without complaining. This is useful for associating a single log entry
with multiple objects of the same type at once. It should only be used for
specific Journal fields where this is expected, however, and not for fields
where code reasonably assumes per-event uniqueness of the keys. In most cases
code that consumes and displays log entries is likely to ignore such non-unique
fields or to consider only the first of the specified values. Specifically, if
a Journal entry contains multiple `MESSAGE=` fields, likely only the first one
is displayed. A well-written logging client library thus will not use a
plain dictionary for accepting structured log metadata, but rather a data
structure that allows non-unique keys, for example an array, or a dictionary
that optionally maps to a set of values instead of a single value.
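
A minimal sketch of such a data structure in Python is an ordered list of
pairs behind a small wrapper. The class name `FieldList` and the field name
`EXAMPLE_OBJECT` are made up for illustration and are not fields defined by
the Journal.

```python
from typing import Iterator, List, Tuple


class FieldList:
    """Ordered collection of key/value pairs that permits repeated keys."""

    def __init__(self) -> None:
        self._items: List[Tuple[str, bytes]] = []

    def add(self, key: str, value: bytes) -> None:
        # Append instead of overwrite, so earlier values of a key are kept.
        self._items.append((key, value))

    def get_all(self, key: str) -> List[bytes]:
        return [v for k, v in self._items if k == key]

    def __iter__(self) -> Iterator[Tuple[str, bytes]]:
        return iter(self._items)


fields = FieldList()
fields.add("MESSAGE", b"Two objects affected.")
fields.add("EXAMPLE_OBJECT", b"first")   # hypothetical multi-valued field
fields.add("EXAMPLE_OBJECT", b"second")
```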

## Example Datagram

Here's an encoded message with various common fields. All of them are encoded
according to the first serialization method, except for one whose value
contains a newline character and which therefore must use the second method.

```
PRIORITY=3\n
SYSLOG_FACILITY=3\n
CODE_FILE=src/foobar.c\n
CODE_LINE=77\n
BINARY_BLOB\n
\004\000\000\000\000\000\000\000xx\nx\n
CODE_FUNC=some_func\n
SYSLOG_IDENTIFIER=footool\n
MESSAGE=Something happened.\n
```

(Lines are broken here after each `\n` to make things more readable. C-style
backslash escaping is used.)

## Automatic Protocol Upgrading

It might be wise to automatically upgrade to logging via the Journal's native
protocol in clients that previously used the BSD syslog protocol. Behaviour in
this case should be pretty obvious: try connecting a socket to
`/run/systemd/journal/socket` first (on success use the native Journal
protocol), and if that fails fall back to `/dev/log` (and use the BSD syslog
protocol).
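
A sketch of this fallback in Python might look as follows. The function name
`open_log_socket()` and the returned protocol labels are made up for this
example, and the sketch assumes that `/dev/log` is a datagram socket, as it is
on systemd-based systems.

```python
import socket
from typing import Tuple


def open_log_socket(native: str = "/run/systemd/journal/socket",
                    syslog: str = "/dev/log") -> Tuple[socket.socket, str]:
    """Connect to the native Journal socket, else fall back to BSD syslog."""
    sock = socket.socket(socket.AF_UNIX, socket.SOCK_DGRAM)
    try:
        sock.connect(native)
        return sock, "native"
    except OSError:
        pass  # journald not reachable, try the classic syslog socket
    try:
        sock.connect(syslog)
        return sock, "syslog"
    except OSError:
        sock.close()
        raise
```

The caller would then pick its message format based on the returned label.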

Programs normally logging to STDERR might also choose to upgrade to native
Journal logging in case they are invoked via systemd's service logic, where
STDOUT and STDERR are going to the Journal anyway. By preferring the native
protocol over STDERR-based logging, structured metadata can be passed along,
including priority information and more, which is not available with
STDERR-based logging. A program that wants to detect automatically whether its
STDERR is connected to the Journal's stream transport can look for the
`$JOURNAL_STREAM` environment variable. The systemd service logic sets this
variable to a colon-separated pair of device and inode number (formatted in
decimal ASCII) of the STDERR file descriptor. If the `.st_dev` and `.st_ino`
fields of the `struct stat` data returned by `fstat(STDERR_FILENO, …)` match
these values, a program can be sure its STDERR is connected to the Journal, and
may then opt to upgrade to the native Journal protocol via an `AF_UNIX` socket
of its own, and cease to use STDERR.

Why bother with this environment variable check? A service program invoked by
systemd might employ shell-style I/O redirection on invoked subprograms, and
those should likely not upgrade to the native Journal protocol, but instead
continue to use the redirected file descriptors passed to them. Thus, by
comparing the device and inode number of the actual STDERR file descriptor with
the one the service manager passed, one can make sure that no I/O redirection
took place for the current program.
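
The `$JOURNAL_STREAM` comparison described above could be implemented along
these lines in Python. The function name `stderr_is_journal()` and its
parameters are choices made for this sketch. Any parse or `fstat()` failure is
treated as "not connected to the Journal".

```python
import os
from typing import Mapping


def stderr_is_journal(fd: int = 2, environ: Mapping[str, str] = os.environ) -> bool:
    """Return True if fd matches the device:inode pair in $JOURNAL_STREAM."""
    value = environ.get("JOURNAL_STREAM")
    if not value:
        return False
    dev_text, sep, ino_text = value.partition(":")
    if not sep:
        return False
    try:
        dev, ino = int(dev_text), int(ino_text)  # decimal ASCII numbers
        st = os.fstat(fd)
    except (ValueError, OSError):
        return False
    # Equal only if no I/O redirection replaced the descriptor.
    return (st.st_dev, st.st_ino) == (dev, ino)
```

A program would call this once at startup and switch to the native protocol
only if it returns `True`.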

## Alternative Implementations

If you are looking for alternative implementations of this protocol (besides
systemd's own in `sd_journal_print()`), consider
[GLib's](https://gitlab.gnome.org/GNOME/glib/-/blob/main/glib/gmessages.c) or
[`dbus-broker`'s](https://github.com/bus1/dbus-broker/blob/main/src/util/log.c).

And that's already all there is to it.