Michael Brown [Mon, 28 Apr 2025 14:20:43 +0000 (15:20 +0100)]
[initrd] Use physical addresses for calculations on initrd locations
Commit ef03849 ("[uaccess] Remove redundant userptr_add() and
userptr_diff()") exposed a signedness bug in the comparison of initrd
locations, since the expression (initrd->data - current) was
effectively no longer coerced to a signed type.
In particular, the common case will be that the top of the initrd
region is the start of the iPXE .textdata region, which has virtual
address zero. This causes initrd->data to compare as being above the
top of the initrd region for all images, when this bug would
previously have been limited to affecting only initrds placed 2GB or
more below the start of .textdata.
Fix by using physical addresses for all comparisons on initrd
locations.
Reported-by: Sven Dreyer <sven@dreyer-net.de> Reported-by: Harald Jensås <hjensas@redhat.com> Reported-by: Jan ONDREJ (SAL) <ondrejj@salstar.sk> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Mon, 28 Apr 2025 10:20:16 +0000 (11:20 +0100)]
[multiboot] Remove userptr_t from Multiboot and ELF image parsing
Simplify Multiboot and ELF image parsing by assuming that the
Multiboot and ELF headers are directly accessible via pointer
dereferences, and add some missing header validations.
GCC 15 generates a warning when a string initializer is too large to
allow for a trailing NUL terminator byte. This type of initializer is
fairly common in signature strings such as ACPI table identifiers.
Michael Brown [Sun, 27 Apr 2025 16:37:44 +0000 (17:37 +0100)]
[build] Remove unsafe disable function wrapper from legacy NIC drivers
The legacy NIC drivers do not consistently take a second parameter in
their disable function. We currently use an unsafe function wrapper
that declares no parameters, and rely on the ABI allowing a second
parameter to be silently ignored if not expected by the caller. As of
GCC 15, this hack results in an incompatible pointer type warning.
Fix by removing the hack, and instead updating all relevant legacy NIC
drivers to take an unused second parameter in their disable function.
Michael Brown [Fri, 25 Apr 2025 12:24:21 +0000 (13:24 +0100)]
[fbcon] Avoid redrawing unchanged characters when scrolling
Scrolling currently involves redrawing every character cell, which can
be frustratingly slow on large framebuffer consoles. Accelerate this
operation by skipping the redraw for any unchanged character cells.
In the common case that large areas of the screen contain whitespace,
this optimises away the vast majority of the redrawing operations.
Michael Brown [Fri, 25 Apr 2025 09:52:26 +0000 (10:52 +0100)]
[fbcon] Remove userptr_t from framebuffer console drivers
Simplify the framebuffer console drivers by assuming that the raw
framebuffer, character cell array, background picture, and glyph data
are all directly accessible via pointer dereferences.
In particular, this avoids the need to copy each glyph during drawing:
the VESA framebuffer driver can simply return a pointer to the glyph
data stored in the video ROM.
Michael Brown [Thu, 24 Apr 2025 22:36:32 +0000 (23:36 +0100)]
[pxe] Remove userptr_t from PXE API call dispatcher
Simplify the PXE API call dispatcher code by assuming that the PXE
parameter block is accessible via a direct pointer dereference. This
avoids the need for the API call dispatcher to know the size of the
parameter block.
Michael Brown [Wed, 23 Apr 2025 11:47:53 +0000 (12:47 +0100)]
[umalloc] Remove userptr_t from user memory allocations
Use standard void pointers for umalloc(), urealloc(), and ufree(),
with the "u" prefix retained to indicate that these allocations are
made from external ("user") memory rather than from the internal heap.
Michael Brown [Wed, 23 Apr 2025 08:53:38 +0000 (09:53 +0100)]
[smbios] Remove userptr_t from SMBIOS structure parsing
Simplify the SMBIOS structure parsing code by assuming that all
structure content is fully accessible via pointer dereferences.
In particular, this allows the convoluted find_smbios_structure() and
read_smbios_structure() to be combined into a single function
smbios_structure() that just returns a direct pointer to the SMBIOS
structure, with smbios_string() similarly now returning a direct
pointer to the relevant string.
Michael Brown [Mon, 21 Apr 2025 23:28:07 +0000 (00:28 +0100)]
[crypto] Remove userptr_t from CMS verification and decryption
Simplify the CMS code by assuming that all content is fully accessible
via pointer dereferences. This avoids the need to use fragment loops
for calculating digests and decrypting (or reencrypting) data.
Michael Brown [Mon, 21 Apr 2025 21:40:59 +0000 (22:40 +0100)]
[crypto] Remove userptr_t from ASN.1 parsers
Simplify the ASN.1 code by assuming that all objects are fully
accessible via pointer dereferences. This allows the concept of
"additional data beyond the end of the cursor" to be removed, and
simplifies parsing of all ASN.1 image formats.
Michael Brown [Mon, 21 Apr 2025 15:16:01 +0000 (16:16 +0100)]
[uaccess] Remove user_to_phys() and phys_to_user()
Remove the intermediate concept of a user pointer from physical
address conversions, leaving virt_to_phys() and phys_to_virt() as the
directly implemented functions.
Michael Brown [Sun, 20 Apr 2025 17:29:48 +0000 (18:29 +0100)]
[uaccess] Remove redundant memcpy_user() and related string functions
The memcpy_user(), memmove_user(), memcmp_user(), memset_user(), and
strlen_user() functions are now just straightforward wrappers around
the corresponding standard library functions.
Michael Brown [Sun, 20 Apr 2025 16:26:48 +0000 (17:26 +0100)]
[uaccess] Change userptr_t to be a pointer type
The original motivation for the userptr_t type was to be able to
support a pure 16-bit real-mode memory model in which a segment:offset
value could be encoded as an unsigned long, with corresponding
copy_from_user() and copy_to_user() functions used to perform
real-mode segmented memory accesses.
Since this memory model was first created almost twenty years ago, no
serious effort has been made to support a pure 16-bit mode of
operation for iPXE. The constraints imposed by the memory model are
becoming increasingly cumbersome to work within: for example, the
parsing of devicetree structures is hugely simplified by being able to
use and return direct pointers to the names and property values. The
devicetree code therefore relies upon virt_to_user(), which is
nominally illegal under the userptr_t memory model.
Drop support for the concept of a memory location that cannot be
reached through a straightforward pointer dereference, by redefining
userptr_t to be a simple pointer type.
Michael Brown [Sun, 20 Apr 2025 16:18:06 +0000 (17:18 +0100)]
[uaccess] Rename userptr_sub() to userptr_diff()
Clarify the intended usage of userptr_sub() by renaming it to
userptr_diff() (to avoid confusion with userptr_add()), and fix the
existing call sites that erroneously use userptr_sub() to subtract an
offset from a userptr_t value.
Michael Brown [Sat, 19 Apr 2025 12:35:23 +0000 (13:35 +0100)]
[time] Use currticks() to provide the null system time
For platforms with no real-time clock (such as RISC-V SBI) we use the
null time source, which currently just returns a constant zero.
Switch to using currticks() to provide a clock that does not represent
the real current time, but does at least advance at approximately the
correct rate. In conjunction with the "ntp" command, this allows
these platforms to use time-dependent features such as X.509
certificate verification for HTTPS connections.
Michael Brown [Wed, 16 Apr 2025 23:29:41 +0000 (00:29 +0100)]
[efi] Inhibit calls to Shutdown() for wireless SNP devices
The UEFI model for wireless network configuration is somewhat
underdefined. At the time of writing, the EDK2 "UEFI WiFi Connection
Manager" driver provides only one way to configure wireless network
credentials, which is to enter them interactively via an HII form.
Credentials are not stored (or exposed via any protocol interface),
and so any temporary disconnection from the wireless network will
inevitably leave the interface in an unusable state that cannot be
recovered without user intervention.
Experimentation shows that at least some wireless network drivers
(observed with an HP Elitebook 840 G10) will disconnect from the
wireless network when the SNP Shutdown() method is called, or if the
device is not polled sufficiently frequently to maintain its
association to the network. We therefore inhibit calls to Shutdown()
and Stop() for any such SNP protocol interfaces, and mark our network
device as insomniac so that it will be polled even when closed.
Note that we need to inhibit not only our own calls to Shutdown() and
Stop(), but also those that will be attempted by MnpDxe when we
disconnect it from the SNP handle. We do this by patching the
installed SNP protocol interface structure to modify the Shutdown()
and Stop() method pointers, which is ugly but unavoidable.
Michael Brown [Wed, 16 Apr 2025 23:27:13 +0000 (00:27 +0100)]
[netdevice] Add the concept of an insomniac network device
Some network devices (observed with the SNP interface to the wireless
network card on an HP Elitebook 840 G10) will stop working if they are
left for too long without being polled.
Add the concept of an insomniac network device, that must continue to
be polled even when closed.
Note that drivers are already permitted to call netdev_rx() et al even
when closed: this will already be happening for USB devices since
polling operates at the level of the whole USB bus, rather than at the
level of individual USB devices.
Michael Brown [Wed, 16 Apr 2025 20:26:45 +0000 (21:26 +0100)]
[efi] Allow for custom methods for disconnecting existing drivers
Allow for greater control over the process used to disconnect existing
drivers from a device handle, by converting the "exclude" field from a
simple protocol GUID to a per-driver method.
Michael Brown [Tue, 15 Apr 2025 19:19:17 +0000 (20:19 +0100)]
[dt] Provide dt_ioremap() to map device registers
Devicetree devices encode register address ranges within the "reg"
property, with the number of cells used for addresses and for sizes
determined by the #address-cells and #size-cells properties of the
immediate parent device.
Record the number of address and size cells for each device, and
provide a dt_ioremap() function to allow drivers to map a specified
range without having to directly handle the "reg" property.
Michael Brown [Tue, 15 Apr 2025 12:11:48 +0000 (13:11 +0100)]
[crypto] Allow for explicit control of external trust sources
We currently disable all external trust sources (such as the UEFI
TlsCaCertificate variable) if an explicit TRUST=... parameter is
provided on the build command line.
Define an explicit TRUST_EXT build parameter that can be used to
explicitly disable external trust sources even if no TRUST=...
parameter is provided, or to explicitly enable external trust sources
even if an explicit TRUST=... parameter is provided. For example:
# Default trusted root certificate, disable external sources
make TRUST_EXT=0
If no TRUST_EXT parameter is specified, then continue to default to
disabling external trust sources if an explicit TRUST=... parameter is
provided, to maintain backwards compatibility with existing build
command lines.
Michael Brown [Mon, 14 Apr 2025 10:34:20 +0000 (11:34 +0100)]
[dt] Add basic concept of a devicetree bus
Add a basic model for devices instantiated by parsing the system
flattened device tree, with drivers matched via the "compatible"
property for any non-root node.
Michael Brown [Tue, 1 Apr 2025 15:53:02 +0000 (16:53 +0100)]
[fdt] Populate boot arguments in constructed device tree
When creating a device tree to pass to a booted operating system,
ensure that the "chosen" node exists, and populate the "bootargs"
property with the image command line.
Michael Brown [Mon, 31 Mar 2025 16:44:59 +0000 (17:44 +0100)]
[x509] Ensure certificate remains valid during x509_append()
The allocation of memory for the certificate chain link may cause the
certificate itself to be freed by the cache discarder, if the only
current reference to the certificate is held by the certificate store
and the system runs out of memory during the call to malloc().
Ensure that this cannot happen by taking out a temporary additional
reference to the certificate within x509_append(), rather than
requiring the caller to do so.
Michael Brown [Mon, 31 Mar 2025 15:36:33 +0000 (16:36 +0100)]
[tls] Support fragmentation of transmitted records
Large transmitted records may arise if we have long client certificate
chains or if a client sends a large block of data (such as a large
HTTP POST payload). Fragment records as needed to comply with the
value that we advertise via the max_fragment_length extension.
Michael Brown [Mon, 31 Mar 2025 13:25:41 +0000 (14:25 +0100)]
[tls] Send an empty client certificate chain if we have no certificate
RFC5246 states that "a client MAY send no certificates if it does not
have an appropriate certificate to send in response to the server's
authentication request". This use case may arise when the server is
using optional client certificate verification and iPXE has not been
provided with a client certificate to use.
Treat the absence of a suitable client certificate as a non-fatal
condition and send a Certificate message containing no certificates as
permitted by RFC5246.
Reported-by: Alexandre Ravey <alexandre@voilab.ch> Originally-implemented-by: Alexandre Ravey <alexandre@voilab.ch> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Mon, 31 Mar 2025 12:33:44 +0000 (13:33 +0100)]
[iobuf] Limit automatic I/O buffer alignment to page size
Without any explicit alignment requirement, we will currently allocate
I/O buffers on their own size rounded up to the nearest power of two.
This is done to simplify driver transmit code paths, which can assume
that a standard Ethernet frame lies within a single physical page and
therefore does not need to be split even for devices with DMA engines
that cannot cross page boundaries.
Limit this automatic alignment to a maximum of the page size, to avoid
requiring excessive alignment for unusually large buffers (such as a
buffer allocated for an HTTP POST with a large parameter list).
Michael Brown [Sun, 30 Mar 2025 23:15:27 +0000 (00:15 +0100)]
[tls] Encrypt data in place to reduce memory usage
Provide a custom xfer_alloc_iob() handler to ensure that transmit I/O
buffers contain sufficient headroom for the TLS record header and
record initialisation vector, and sufficient tailroom for the MAC,
block cipher padding, and authentication tag. This allows us to use
in-place encryption for the actual data within the I/O buffer, which
essentially halves the amount of memory that needs to be allocated for
a TLS data transmission.
Michael Brown [Sun, 30 Mar 2025 20:47:34 +0000 (21:47 +0100)]
[xfer] Use xfer_alloc_iob() for transmit I/O buffers on stream sockets
Datagram sockets such as UDP, ICMP, and fibre channel tend to provide
a custom xfer_alloc_iob() handler to ensure that transmit I/O buffers
contain sufficient headroom to accommodate any required protocol
headers.
Stream sockets such as TCP and TLS do not typically provide a custom
xfer_alloc_iob() handler at present. The default handler simply calls
alloc_iob(), and so stream socket consumers can therefore get away
with using alloc_iob() rather than xfer_alloc_iob().
Fix the HTTP and ONC RPC protocols to use xfer_alloc_iob() where
relevant, in order to operate correctly if the underlying stream
socket chooses to provide a custom xfer_alloc_iob() handler.
Michael Brown [Sat, 29 Mar 2025 23:01:21 +0000 (23:01 +0000)]
[isa] Disable legacy ISA device probing by default
Legacy ISA device probing involves poking at various I/O addresses to
guess whether or not a particular device is present.
Actual legacy ISA cards are essentially nonexistent by now, but the
probed I/O addresses have a habit of being reused for various
OEM-specific functions. This can cause some very undesirable side
effects. For example, probing for the "ne2k_isa" driver on an HP
Elitebook 840 G10 will cause the system to lock up in a way that
requires two cold reboots to recover.
Enable ISA_PROBE_ONLY in config/isa.h by default. This limits ISA
probing to use only the addresses specified in ISA_PROBE_ADDRS, which
is empty by default, and so effectively disables ISA probing. The
vanishingly small number of users who require ISA probing can simply
adjust this configuration in config/local/isa.h.
Michael Brown [Sat, 29 Mar 2025 21:28:53 +0000 (21:28 +0000)]
[efi] Allow for fact that SNP device may be removed by executed image
The executed image may call DisconnectController() to remove our
network device. This will leave the net device unregistered but not
yet freed (since our installed PXE base code protocol retains a
reference to the net device).
Unregistration will cause the network upper-layer driver removal
functions to be called, which will free the SNP device structure.
When the image returns from StartImage(), the snpdev pointer may
therefore no longer be valid.
The SNP device structure is not reference counted, and so we cannot
simply take out a reference to ensure that it remains valid across the
call to StartImage(). However, the code path following the call to
StartImage() doesn't actually require the SNP device pointer, only the
EFI device handle.
Store the device handle in a local variable and ensure that snpdev is
invalidated before the call to StartImage() so that future code cannot
accidentally reintroduce this issue.
Michael Brown [Sat, 29 Mar 2025 14:57:16 +0000 (14:57 +0000)]
[efi] Disconnect existing drivers on a per-protocol basis
UEFI does not provide a direct method to disconnect the existing
driver of a specific protocol from a handle. We currently use
DisconnectController() to remove all drivers from a handle that we
want to drive ourselves, and then rely on recursion in the call to
ConnectController() to reconnect any drivers that did not need to be
disconnected in the first place.
Experience shows that OEMs tend not to ever test the disconnection
code paths in their UEFI drivers, and it is common to find drivers
that refuse to disconnect, fail to close opened handles, fail to
function correctly after reconnection, or lock up the entire system.
Implement a more selective form of disconnection, in which we use
OpenProtocolInformation() to identify the driver associated with a
specific protocol, and then disconnect only that driver.
Perform disconnections in reverse order of attachment priority, since
this is the order likely to minimise the number of cascaded implicit
disconnections.
This allows our MNP driver to avoid performing any disconnections at
all, since it does not require exclusive access to the MNP protocol.
It also avoids performing unnecessary disconnections and reconnections
of unrelated drivers such as the "UEFI WiFi Connection Manager" that
attaches to wireless network interfaces in order to manage wireless
network associations.
Michael Brown [Sat, 29 Mar 2025 15:11:57 +0000 (15:11 +0000)]
[efi] Show all drivers claiming support for a handle in debug messages
UEFI assumes in several places that an image installs only a single
driver binding protocol instance, and that this is installed on the
image handle itself. We therefore provide a single driver binding
protocol instance, which delegates to the various internal drivers
(for EFI_PCI_IO_PROTOCOL, EFI_USB_IO_PROTOCOL, etc) as appropriate.
The debug messages produced by our Supported() method can end up
slightly misleading, since they will report only the first internal
driver that claims support for a device. In the common case of the
all-drivers build, there may be multiple drivers that claim support
for the same handle: for example, the PCI, NII, SNP, and MNP drivers
are all likely to initially find the protocols that they need on the
same device handle.
Report all internal drivers that claim support for a device, to avoid
confusing debug messages.
Michael Brown [Sat, 29 Mar 2025 18:41:01 +0000 (18:41 +0000)]
[efi] Return success from Stop() if driver is already stopped
Return success if asked to stop driving a device that we are not
currently driving. This avoids propagating spurious errors to an
external caller of DisconnectController().
Michael Brown [Fri, 28 Mar 2025 14:20:44 +0000 (14:20 +0000)]
[efi] Install a device tree for the booted OS, if available
If we have a device tree available (e.g. because the user has
explicitly downloaded a device tree using the "fdt" command), then
provide it to the booted operating system as an EFI configuration
table.
Since x86 does not typically use device trees, we create weak symbols
for efi_fdt_install() and efi_fdt_uninstall() to avoid dragging FDT
support into all x86 UEFI binaries.
Michael Brown [Fri, 28 Mar 2025 14:17:29 +0000 (14:17 +0000)]
[fdt] Provide the ability to create a device tree for a booted OS
Provide fdt_create() to create a device tree to be passed to a booted
operating system. The device tree will be created from the FDT image
(if present), falling back to the system device tree (if present).
Michael Brown [Fri, 28 Mar 2025 14:10:55 +0000 (14:10 +0000)]
[efi] Create a copy of the system flattened device tree, if present
EFI configuration tables may be freed at any time, and there is no way
to be notified when the table becomes invalidated. Create a copy of
the system flattened device tree (if present), so that we do not risk
being left with an invalid pointer.
Michael Brown [Fri, 28 Mar 2025 14:08:18 +0000 (14:08 +0000)]
[fdt] Allow for parsing device trees where the length is known in advance
Allow for parsing device trees where an external factor (such as a
downloaded image length) determines the maximum length, which must be
validated against the length within the device tree header.
Michael Brown [Fri, 28 Mar 2025 12:42:30 +0000 (12:42 +0000)]
[fdt] Allow for the existence of multiple device trees
When running on a platform that uses FDT as its hardware description
mechanism, we are likely to have multiple device tree structures. At
a minimum, there will be the device tree passed to us from the
previous boot stage (e.g. OpenSBI), and the device tree that we
construct to be passed to the booted operating system.
Update the internal FDT API to include an FDT pointer in all function
parameter lists.
Michael Brown [Thu, 27 Mar 2025 11:30:27 +0000 (11:30 +0000)]
[fdt] Add the concept of an FDT image
Define the concept of an "FDT" image, representing a Flattened Device
Tree blob that has been downloaded in order to be provided to a kernel
or other executable image. FDT images are represented using an image
tag (as with other special-purpose images such as the UEFI shim), and
are similarly marked as hidden so that they will not be included in a
generated magic initrd or show up in a virtual filesystem directory
listing.
Michael Brown [Wed, 26 Mar 2025 13:58:22 +0000 (13:58 +0000)]
[undi] Ensure forward progress is made even if UNDI IRQ is stuck
If the UNDI interrupt remains constantly asserted (e.g. because the
BIOS has enabled interrupts for an unrelated device sharing the same
IRQ, or because of bugs in the OEM UNDI driver), then we may get stuck
in an interrupt storm.
We cannot safely chain to the previous interrupt handler (which could
plausibly handle an unrelated device interrupt) since there is no
well-defined behaviour for previous interrupt handlers. We have
observed BIOSes to provide default interrupt handlers that variously
do nothing, send EOI, disable the IRQ, or crash the system.
Fix by disabling the UNDI interrupt whenever our handler is triggered,
and rearm it as needed when polling the network device. This ensures
that forward progress continues to be made even if something causes
the interrupt to be constantly asserted.
Michael Brown [Wed, 26 Mar 2025 14:49:08 +0000 (14:49 +0000)]
[pxeprefix] Ensure that UNDI IRQ is disabled before starting iPXE
When using the undionly.kkpxe binary (which is never recommended), the
UNDI interrupt may still be enabled when iPXE starts up. If the PXE
base code interrupt handler is not well-behaved, this can result in
undefined behaviour when interrupts are first enabled (e.g. for
entropy gathering, or for allowing the timer tick to occur).
Fix by detecting and disabling the UNDI interrupt during the prefix
code.
Michael Brown [Wed, 26 Mar 2025 11:35:48 +0000 (11:35 +0000)]
[pxeprefix] Work around missing type values from PXENV_UNDI_GET_NIC_TYPE
The implementation of PXENV_UNDI_GET_NIC_TYPE in some PXE ROMs
(observed with an Intel X710 ROM in a Dell PowerEdge R6515) will fail
to write the NicType byte, leaving it uninitialised.
Prepopulate the NicType byte with a highly unlikely value as a
sentinel to allow us to detect this, and assume that any such devices
are overwhelmingly likely to be PCI devices.