Michael Brown [Wed, 20 Nov 2024 14:21:16 +0000 (14:21 +0000)]
[efi] Ensure local drives are connected when attempting a SAN boot
UEFI systems may choose not to connect drivers for local disk drives
when the boot policy is set to attempt a network boot. This may cause
the "sanboot" command to be unable to boot from a local drive, since
the relevant block device and filesystem drivers may not have been
connected.
Fix by ensuring that all available drivers are connected before
attempting to boot from an EFI block device.
Reported-by: Andrew Cottrell <andrew.cottrell@xtxmarkets.com> Tested-by: Andrew Cottrell <andrew.cottrell@xtxmarkets.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Tue, 29 Oct 2024 12:50:37 +0000 (12:50 +0000)]
[build] Allow for per-architecture cross-compilation prefixes
We currently require the variable CROSS (or CROSS_COMPILE) to be set
to specify the global cross-compilation prefix. This becomes
cumbersome when developing across multiple CPU architectures,
requiring frequent editing of build command lines and preventing
incompatible architectures from being built with a single command.
Allow a default cross-compilation prefix for each architecture to be
specified via the CROSS_COMPILE_<arch> variables. These may then be
provided as environment variables, e.g. using
This change requires some portions of the Makefile to be rearranged,
to allow for the fact that $(CROSS_COMPILE) may not have been set
until the build directory has been parsed to determine the CPU
architecture.
Michael Brown [Mon, 28 Oct 2024 22:58:56 +0000 (22:58 +0000)]
[riscv] Check if seed CSR is accessible from S-mode
The seed CSR defined by the Zkr extension is accessible only in M-mode
by default. Older versions of OpenSBI (prior to version 1.4) do not
set mseccfg.sseed, with the result that attempts to access the seed
CSR from S-mode will raise an illegal instruction exception.
Add a facility for testing the accessibility of arbitrary CSRs, and
use it to check that the seed CSR is accessible before reporting the
seed CSR entropy source as being functional.
Michael Brown [Mon, 28 Oct 2024 00:10:18 +0000 (00:10 +0000)]
[riscv] Add support for the seed CSR as an entropy source
The Zkr entropy source extension defines a potentially unprivileged
seed CSR that can be read to obtain 16 bits of entropy input, with a
mandated requirement that 256 entropy input bits read from the seed
CSR will contain at least 128 bits of min-entropy.
Michael Brown [Mon, 28 Oct 2024 13:48:11 +0000 (13:48 +0000)]
[riscv] Add support for RDTIME as a timer source
The Zicntr extension defines an unprivileged wall-clock time CSR that
roughly matches the behaviour of an invariant TSC on x86. The nominal
frequency of this timer may be read from the "timebase-frequency"
property of the CPU node in the device tree.
Add a timer source using RDTIME to provide implementations of udelay()
and currticks(), modelled on the existing RDTSC-based timer for x86.
Michael Brown [Mon, 28 Oct 2024 11:44:41 +0000 (11:44 +0000)]
[riscv] Add support for checking CPU extensions reported via device tree
RISC-V seems to allow for direct discovery of CPU features only from
M-mode (e.g. by setting up a trap handler and then attempting to
access a CSR), with S-mode code expected to read the resulting
constructed ISA description from the device tree.
Add the ability to check for the presence of named extensions listed
in the "riscv,isa" property of the device tree node corresponding to
the boot hart.
Michael Brown [Fri, 25 Oct 2024 13:21:27 +0000 (14:21 +0100)]
[uaccess] Rename UACCESS_EFI to UACCESS_FLAT
Running with flat physical addressing is a fairly common early boot
environment. Rename UACCESS_EFI to UACCESS_FLAT so that this code may
be reused in non-UEFI boot environments that also use flat physical
addressing.
Michael Brown [Tue, 22 Oct 2024 11:51:48 +0000 (12:51 +0100)]
[riscv] Add support for the SBI debug console
Add the ability to issue Supervisor Binary Interface (SBI) calls via
the ECALL instruction, and use the SBI DBCN extension to implement a
debug console.
Michael Brown [Mon, 21 Oct 2024 15:09:44 +0000 (16:09 +0100)]
[crypto] Add bigint_mod_invert() to calculate inverse modulo a power of two
Montgomery multiplication requires calculating the inverse of the
modulus modulo a larger power of two.
Add bigint_mod_invert() to calculate the inverse of any (odd) big
integer modulo an arbitrary power of two, using a lightly modified
version of the algorithm presented in "A New Algorithm for Inversion
mod p^k (KoƧ, 2017)".
The power of two is taken to be 2^k, where k is the number of bits
available in the big integer representation of the invertend. The
inverse modulo any smaller power of two may be obtained simply by
masking off the relevant bits in the inverse.
Michael Brown [Fri, 18 Oct 2024 12:13:28 +0000 (13:13 +0100)]
[usb] Expose USB device descriptor and strings via settings
Allow scripts to read basic information from USB device descriptors
via the settings mechanism. For example:
echo USB vendor ID: ${usb/${busloc}.8.2}
echo USB device ID: ${usb/${busloc}.10.2}
echo USB manufacturer name: ${usb/${busloc}.14.0}
The general syntax is
usb/<bus:dev>.<offset>.<length>
where bus:dev is the USB bus:device address (as obtained via the
"usbscan" command, or from e.g. ${net0/busloc} for a USB network
device), and <offset> and <length> select the required portion of the
USB device descriptor.
Following the usage of SMBIOS settings tags, a <length> of zero may be
used to indicate that the byte at <offset> contains a USB string
descriptor index, and an <offset> of zero may be used to indicate that
the <length> contains a literal USB string descriptor index.
Since the byte at offset zero can never contain a string index, and a
literal string index can never be zero, the combination of both
<length> and <offset> being zero may be used to indicate that the
entire device descriptor is to be read as a raw hex dump.
Michael Brown [Tue, 15 Oct 2024 12:50:51 +0000 (13:50 +0100)]
[crypto] Separate out bigint_reduce() from bigint_mod_multiply()
Faster modular multiplication algorithms such as Montgomery
multiplication will still require the ability to perform a single
direct modular reduction.
Neaten up the implementation of direct reduction and split it out into
a separate bigint_reduce() function, complete with its own unit tests.
Michael Brown [Tue, 8 Oct 2024 10:52:07 +0000 (11:52 +0100)]
[crypto] Use architecture-independent bigint_is_set()
Every architecture uses the same implementation for bigint_is_set(),
and there is no reason to suspect that a future CPU architecture will
provide a more efficient way to implement this operation.
Simplify the code by providing a single architecture-independent
implementation of bigint_is_set().
Michael Brown [Mon, 7 Oct 2024 11:13:42 +0000 (12:13 +0100)]
[crypto] Rename bigint_rol()/bigint_ror() to bigint_shl()/bigint_shr()
The big integer shift operations are misleadingly described as
rotations since the original x86 implementations are essentially
trivial loops around the relevant rotate-through-carry instruction.
The overall operation performed is a shift rather than a rotation.
Update the function names and descriptions to reflect this.
Michael Brown [Tue, 24 Sep 2024 18:15:11 +0000 (19:15 +0100)]
[arm] Support building as a Linux userspace binary for AArch32
Add support for building as a Linux userspace binary for AArch32.
This allows the self-test suite to be more easily run for the 32-bit
ARM code. For example:
make CROSS=arm-linux-gnu- bin-arm32-linux/tests.linux
Michael Brown [Tue, 24 Sep 2024 10:52:16 +0000 (11:52 +0100)]
[arm] Check PMCCNTR availability before use for profiling
Reading from PMCCNTR causes an undefined instruction exception when
running in PL0 (e.g. as a Linux userspace binary), unless the
PMUSERENR.EN bit is set.
Restructure profile_timestamp() for 32-bit ARM to perform an
availability check on the first invocation, with subsequent
invocations returning zero if PMCCNTR could not be enabled.
Michael Brown [Tue, 24 Sep 2024 13:49:32 +0000 (14:49 +0100)]
[profile] Standardise return type of profile_timestamp()
All consumers of profile_timestamp() currently treat the value as an
unsigned long. Only the elapsed number of ticks is ever relevant: the
absolute value of the timestamp is not used. Profiling is used to
measure short durations that are generally fewer than a million CPU
cycles, for which an unsigned long is easily large enough.
Standardise the return type of profile_timestamp() as unsigned long
across all CPU architectures. This allows 32-bit architectures such
as i386 and riscv32 to omit all logic associated with retrieving the
upper 32 bits of the 64-bit hardware counter, which simplifies the
code and allows riscv32 and riscv64 to share the same implementation.
Michael Brown [Thu, 19 Sep 2024 15:23:32 +0000 (16:23 +0100)]
[crypto] Use constant-time big integer multiplication
Big integer multiplication currently performs immediate carry
propagation from each step of the long multiplication, relying on the
fact that the overall result has a known maximum value to minimise the
number of carries performed without ever needing to explicitly check
against the result buffer size.
This is not a constant-time algorithm, since the number of carries
performed will be a function of the input values. We could make it
constant-time by always continuing to propagate the carry until
reaching the end of the result buffer, but this would introduce a
large number of redundant zero carries.
Require callers of bigint_multiply() to provide a temporary carry
storage buffer, of the same size as the result buffer. This allows
the carry-out from the accumulation of each double-element product to
be accumulated in the temporary carry space, and then added in via a
single call to bigint_add() after the multiplication is complete.
Since the structure of big integer multiplication is identical across
all current CPU architectures, provide a single shared implementation
of bigint_multiply(). The architecture-specific operation then
becomes the multiplication of two big integer elements and the
accumulation of the double-element product.
Note that any intermediate carry arising from accumulating the lower
half of the double-element product may be added to the upper half of
the double-element product without risk of overflow, since the result
of multiplying two n-bit integers can never have all n bits set in its
upper half. This simplifies the carry calculations for architectures
such as RISC-V and LoongArch64 that do not have a carry flag.
Michael Brown [Tue, 17 Sep 2024 12:11:43 +0000 (13:11 +0100)]
[gve] Allocate all possible event counters
The admin queue API requires us to tell the device how many event
counters we have provided via the "configure device resources" admin
queue command. There is, of course, absolutely no documentation
indicating how many event counters actually need to be provided.
We require only two event counters: one for the transmit queue, one
for the receive queue. (The receive queue doesn't seem to actually
make any use of its event counter, but the "create receive queue"
admin queue command will fail if it doesn't have an available event
counter to choose.)
In the absence of any documentation, we currently make the assumption
that allocating and configuring 16 counters (i.e. one whole cacheline)
will be sufficient to allow for the use of two counters.
This assumption turns out to be incorrect. On larger instance types
(observed with a c3d-standard-16 instance in europe-west4-a), we find
that creating the transmit or receive queues will each fail with a
probability of around 50% with the "failed precondition" error code.
Experimentation suggests that even though the device has accepted our
"configure device resources" command indicating that we are providing
only 16 event counters, it will attempt to choose any of its potential
32 event counters (and will then fail since the event counter that it
unilaterally chose is outside of the agreed range).
Work around this firmware bug by always allocating the maximum number
of event counters supported by the device. (This requires deferring
the allocation of the event counters until after issuing the "describe
device" command.)
Michael Brown [Mon, 16 Sep 2024 10:01:20 +0000 (11:01 +0100)]
[efi] Remove redundant EFI_BOOT_FILE definitions
As of commit 79c0173 ("[build] Create util/genfsimg for building
filesystem-based images"), the EFI boot file name for each CPU
architecture is defined within the genfsimg script itself, rather than
being passed in as a Makefile parameter.
Remove the now-redundant Makefile definitions for EFI_BOOT_FILE.
Reported-by: Christian I. Nilsson <nikize@gmail.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Sun, 15 Sep 2024 01:07:45 +0000 (02:07 +0100)]
[linux] Allow a sysroot to be specified via SYSROOT=...
The cross-compiler will typically use the appropriate sysroot
directory automatically. This may not work for toolchains where a
single cross-compiler is used to produce output for multiple CPU
variants (e.g. 32-bit and 64-bit RISC-V).
Add a SYSROOT=... parameter that may be used to specify the relevant
sysroot directory, e.g.
make CROSS=riscv64-linux-gnu- SYSROOT=/usr/riscv32-linux-gnu/sys-root \
bin-riscv32-linux/tests.linux
Michael Brown [Sun, 15 Sep 2024 01:01:46 +0000 (02:01 +0100)]
[efi] Use standard va_args macros instead of VA_START() etc
The EDK2 header macros VA_START(), VA_ARG() etc produce build errors
on some CPU architectures (notably on 32-bit RISC-V, which is not yet
supported by EDK2).
Fix by using the standard variable argument list macros.
Michael Brown [Fri, 13 Sep 2024 13:26:34 +0000 (14:26 +0100)]
[efi] Centralise definition of efi_cpu_nap()
Define a cpu_halt() function which is architecture-specific but
platform-independent, and merge the multiple architecture-specific
implementations of the EFI cpu_nap() function into a single central
efi_cpu_nap() that uses cpu_halt() if applicable.
Michael Brown [Thu, 12 Sep 2024 13:17:20 +0000 (14:17 +0100)]
[libc] Centralise architecture-independent portions of setjmp.h
The definitions of the setjmp() and longjmp() functions are common to
all architectures, with only the definition of the jump buffer
structure being architecture-specific.
Move the architecture-specific portions to bits/setjmp.h and provide a
common setjmp.h for the function definitions.
Michael Brown [Fri, 6 Sep 2024 14:09:12 +0000 (15:09 +0100)]
[cloud] Add family and architecture tags to AWS snapshots and images
Allow for easier identification of images and snapshots created by the
aws-import script by adding tags for image family (e.g. "iPXE") and
architecture (e.g. "x86_64") to both.
Michael Brown [Thu, 5 Sep 2024 13:01:19 +0000 (14:01 +0100)]
[ena] Change reported operating system type to "iPXE"
As described in commit 3b81a4e ("[ena] Provide a host information
page"), we currently report an operating system type of "Linux" in
order to work around broken versions of the ENA firmware that will
fail to create a completion queue if we report the correct operating
system type.
As of September 2024, the ENA team at AWS assures us that the entire
AWS fleet has been upgraded to fix this bug, and that we are now safe
to report the correct operating system type value in the "type" field
of struct ena_host_info.
The ENA team has also clarified that at least some deployed versions
of the ENA firmware still have the defect that requires us to report
an operating system version number of 2 (regardless of operating
system type), and so we continue to report ENA_HOST_INFO_VERSION_WTF
in the "version" field of struct ena_host_info.
Add an explicit warning on the previous known failure path, in case
some deployed versions of the ENA firmware turn out to not have been
upgraded as expected.
Michael Brown [Wed, 4 Sep 2024 12:29:30 +0000 (13:29 +0100)]
[gdb] Allow CPU architectures to omit support for GDB
Move the <gdbmach.h> file to <bits/gdbmach.h>, and provide a common
dummy implementation for all architectures that have not yet
implemented support for GDB.
Michael Brown [Mon, 2 Sep 2024 11:24:57 +0000 (12:24 +0100)]
[etherfabric] Fix use of uninitialised variable in falcon_xaui_link_ok()
The link status check in falcon_xaui_link_ok() reads from the
FCN_XX_CORE_STAT_REG_MAC register only on production hardware (where
the FPGA version reads as zero), but modifies the value and writes
back to this register unconditionally. This triggers an uninitialised
variable warning on newer versions of gcc.
Fix by assuming that the register exists only on production hardware,
and so moving the "modify-write" portion of the "read-modify-write"
operation to also be covered by the same conditional check.
Michael Brown [Thu, 29 Aug 2024 13:00:34 +0000 (14:00 +0100)]
[image] Add the "imgdecrypt" command
Add the "imgdecrypt" command that can be used to decrypt a detached
encrypted data image using a cipher key obtained from a separate CMS
envelope image. For example:
Michael Brown [Fri, 23 Aug 2024 11:28:21 +0000 (12:28 +0100)]
[crypto] Support decryption of images via CMS envelopes
Add support for decrypting images containing detached encrypted data
using a cipher key obtained from a separate CMS envelope image (in DER
or PEM format).
Michael Brown [Mon, 26 Aug 2024 14:15:09 +0000 (15:15 +0100)]
[test] Update CMS self-test terminology
Generalise CMS self-test data structure and macro names to refer to
"messages" rather than "signatures", in preparation for adding image
decryption tests.
Michael Brown [Mon, 26 Aug 2024 22:36:06 +0000 (23:36 +0100)]
[crypto] Allow for extraction of ASN.1 algorithm parameters
Some ASN.1 OID-identified algorithms require additional parameters,
such as an initialisation vector for a block cipher. The structure of
the parameters is defined by the individual algorithm.
Extend asn1_algorithm() to allow these additional parameters to be
returned via a separate ASN.1 cursor.
Michael Brown [Fri, 23 Aug 2024 11:25:36 +0000 (12:25 +0100)]
[crypto] Hold CMS message as a single ASN.1 object
Reduce the number of dynamic allocations required to parse a CMS
message by retaining the ASN.1 cursor returned from image_asn1() for
the lifetime of the CMS message. This allows embedded ASN.1 cursors
to be used for parsed objects within the message, such as embedded
signatures.
Michael Brown [Wed, 21 Aug 2024 15:25:10 +0000 (16:25 +0100)]
[crypto] Remove the concept of a public-key algorithm reusable context
Instances of cipher and digest algorithms tend to get called
repeatedly to process substantial amounts of data. This is not true
for public-key algorithms, which tend to get called only once or twice
for a given key.
Simplify the public-key algorithm API so that there is no reusable
algorithm context. In particular, this allows callers to omit the
error handling currently required to handle memory allocation (or key
parsing) errors from pubkey_init(), and to omit the cleanup calls to
pubkey_final().
This change does remove the ability for a caller to distinguish
between a verification failure due to a memory allocation failure and
a verification failure due to a bad signature. This difference is not
material in practice: in both cases, for whatever reason, the caller
was unable to verify the signature and so cannot proceed further, and
the cause of the error will be visible to the user via the return
status code.
Michael Brown [Wed, 21 Aug 2024 11:15:24 +0000 (12:15 +0100)]
[tls] Group client and server state in TLS connection structure
The TLS connection structure has grown to become unmanageably large as
new features and support for new TLS protocol versions have been added
over time.
Split out the portions of struct tls_connection that are specific to
client and server operations into separate structures, and simplify
some structure field names.
Michael Brown [Wed, 21 Aug 2024 10:45:36 +0000 (11:45 +0100)]
[tls] Group transmit and receive state in TLS connection structure
The TLS connection structure has grown to become unmanageably large as
new features and support for new TLS protocol versions have been added
over time.
Split out the portions of struct tls_connection that are specific to
transmit and receive operations into separate structures, and simplify
some structure field names.
Michael Brown [Tue, 20 Aug 2024 09:11:24 +0000 (10:11 +0100)]
[contrib] Remove obsolete rom-o-matic code
The rom-o-matic code does not form part of the iPXE codebase, has not
been maintained for over a decade, and does not appear to still be in
use anywhere in the world.
It does, however, result in a large number of false positive security
vulnerability reports from some low quality automated code analysis
tools such as Fortify SCA.
Remove this unused and obsolete code to reduce the burden of
responding to these false positives.
Michael Brown [Sun, 18 Aug 2024 19:01:08 +0000 (20:01 +0100)]
[test] Generalise public-key algorithm tests and use okx()
Generalise the existing support for performing RSA public-key
encryption, decryption, signature, and verification tests, and update
the code to use okx() for neater reporting of test results.
Michael Brown [Sun, 18 Aug 2024 09:43:52 +0000 (10:43 +0100)]
[crypto] Pass asymmetric keys as ASN.1 cursors
Asymmetric keys are invariably encountered within ASN.1 structures
such as X.509 certificates, and the various large integers within an
RSA key are themselves encoded using ASN.1.
Simplify all code handling asymmetric keys by passing keys as a single
ASN.1 cursor, rather than separate data and length pointers.
Michael Brown [Wed, 14 Aug 2024 22:40:50 +0000 (23:40 +0100)]
[efi] Allow discovery of PCI bus:dev.fn address ranges
Generalise the logic for identifying the matching PCI root bridge I/O
protocol to allow for identifying the closest matching PCI bus:dev.fn
address range, and use this to provide PCI address range discovery
(while continuing to inhibit automatic PCI bus probing).
This allows the "pciscan" command to work as expected under UEFI.
Michael Brown [Thu, 15 Aug 2024 07:46:41 +0000 (08:46 +0100)]
[pci] Separate permission to probe buses from bus:dev.fn range discovery
The UEFI device model requires us to not probe the PCI bus directly,
but instead to wait to be offered the opportunity to drive devices via
our driver service binding handle.
We currently inhibit PCI bus probing by having pci_discover() return
an empty range when using the EFI PCI I/O API. This has the unwanted
side effect that scanning the bus manually using the "pciscan" command
will also fail to discover any devices.
Separate out the concept of being allowed to probe PCI buses from the
mechanism for discovering PCI bus:dev.fn address ranges, so that this
limitation may be removed.
Michael Brown [Wed, 14 Aug 2024 13:00:48 +0000 (14:00 +0100)]
[crypto] Fix debug name for empty certificate chain validators
An attempt to use a validator for an empty certificate chain will
correctly fail the overall validation with the "empty certificate
chain" error propagated from x509_auto_append().
In a debug build, the call to validator_name() will attempt to call
x509_name() on a non-existent certificate, resulting in garbage in the
debug message.
Fix by checking for the special case of an empty certificate chain.
This issue does not affect non-debug builds, since validator_name() is
(as per its description) called only for debug messages.
Michael Brown [Mon, 12 Aug 2024 11:36:41 +0000 (12:36 +0100)]
[crypto] Generalise cms_signature to cms_message
There is some exploitable similarity between the data structures used
for representing CMS signatures and CMS encryption keys. In both
cases, the CMS message fundamentally encodes a list of participants
(either message signers or message recipients), where each participant
has an associated certificate and an opaque octet string representing
the signature or encrypted cipher key. The ASN.1 structures are not
identical, but are sufficiently similar to be worth exploiting: for
example, the SignerIdentifier and RecipientIdentifier data structures
are defined identically.
Rename data structures and functions, and add the concept of a CMS
message type.
Michael Brown [Fri, 9 Aug 2024 15:33:51 +0000 (16:33 +0100)]
[crypto] Pass image as parameter to CMS functions
The cms_signature() and cms_verify() functions currently accept raw
data pointers. This will not be possible for cms_decrypt(), which
will need the ability to extract fragments of ASN.1 data from a
potentially large image.
Change cms_signature() and cms_verify() to accept an image as an input
parameter, and move the responsibility for setting the image trust
flag within cms_verify() since that now becomes a more natural fit.
Michael Brown [Tue, 13 Aug 2024 11:25:25 +0000 (12:25 +0100)]
[crypto] Allow passing a NULL certificate store to x509_find() et al
Allow passing a NULL value for the certificate list to all functions
used for identifying an X.509 certificate from an existing set of
certificates, and rename function parameters to indicate that this
certificate list represents an unordered certificate store (rather
than an ordered certificate chain).
Michael Brown [Mon, 12 Aug 2024 11:26:52 +0000 (12:26 +0100)]
[crypto] Centralise mechanisms for identifying X.509 certificates
Centralise all current mechanisms for identifying an X.509 certificate
(by raw content, by subject, by issuer and serial number, and by
matching public key), and remove the certstore-specific and
CMS-specific variants of these functions.
Michael Brown [Wed, 7 Aug 2024 12:18:47 +0000 (13:18 +0100)]
[crypto] Extend asn1_enter() to handle partial object cursors
Handling large ASN.1 objects such as encrypted CMS files will require
the ability to use the asn1_enter() and asn1_skip() family of
functions on partial object cursors, where a defined additional length
is known to exist after the end of the data buffer pointed to by the
ASN.1 object cursor.
We already have support for partial object cursors in the underlying
asn1_start() operation used by both asn1_enter() and asn1_skip(), and
this is used by the DER image probe routine to check that the
potential DER file comprises a single ASN.1 SEQUENCE object.
Add asn1_enter_partial() to formalise the process of entering an ASN.1
partial object, and refactor the DER image probe routine to use this
instead of open-coding calls to the underlying asn1_start() operation.
There is no need for an equivalent asn1_skip_partial() function, since
only objects that are wholly contained within the partial cursor may
be successfully skipped.
Calling asn1_skip_if_exists() on a malformed ASN.1 object may
currently leave the cursor in a partially-updated state, where the tag
byte and one of the length bytes have been stripped. The cursor is
left with a valid data pointer and length and so no out-of-bounds
access can arise, but the cursor no longer points to the start of an
ASN.1 object.
Ensure that each ASN.1 cursor manipulation code path leads to the
cursor being either fully updated, left unmodified, or invalidated,
and update the function descriptions to reflect this.
Michael Brown [Tue, 6 Aug 2024 12:56:16 +0000 (13:56 +0100)]
[crypto] Do not return an error when skipping the final ASN.1 object
Successfully reaching the end of a well-formed ASN.1 object list is
arguably not an error, but the current code (dating back to the
original ASN.1 commit in 2007) will explicitly check for and report
this as an error condition.
Remove the explicit check for reaching the end of a well-formed ASN.1
object list, and instead return success along with a zero-length (and
hence implicitly invalidated) cursor.
Almost every existing caller of asn1_skip() or asn1_skip_if_exists()
currently ignores the return value anyway. Skipped objects are (by
definition) not of interest to the caller, and the invalidation
behaviour of asn1_skip() ensures that any errors will be safely caught
on a subsequent attempt to actually use the ASN.1 object content.
Since these existing callers ignore the return value, they cannot be
affected by this change.
There is one existing caller of asn1_skip_if_exists() that does check
the return value: in asn1_skip() itself, an error returned from
asn1_skip_if_exists() will cause the cursor to be invalidated. In the
case of an error indicating only that the cursor length is already
zero, invalidation is a no-op, and so this change affects only the
return value propagated from asn1_skip().
This leaves only a single call site within ocsp_request() where the
return value from asn1_skip() is currently checked. The return status
here is moot since there is no way for the code in question to fail
(absent a bug in the ASN.1 construction or parsing code).
There are therefore no callers of asn1_skip() or asn1_skip_if_exists()
that rely on an error being returned for successfully reaching the end
of a well-formed ASN.1 object list. Simplify the code by redefining
this as a successful outcome.
Michael Brown [Wed, 31 Jul 2024 15:43:27 +0000 (16:43 +0100)]
[cpuid] Allow hypervisor CPUID leaves to be accessed as settings
Redefine bit 30 of an SMBIOS numerical setting to be part of the
function number, in order to allow access to hypervisor CPUID leaves.
This technically breaks backwards compatibility with scripts
attempting to read more than 64 consecutive functions. Since there is
no meaningful block of 64 consecutive related functions, it is
vanishingly unlikely that this capability has ever been used.
Michael Brown [Wed, 31 Jul 2024 15:35:31 +0000 (16:35 +0100)]
[cpuid] Allow reading hypervisor CPUID leaves
Hypervisors typically intercept CPUID leaves in the range 0x40000000
to 0x400000ff, with leaf 0x40000000 returning the maximum supported
function within this range in register %eax.
iPXE currently masks off bit 30 from the requested CPUID leaf when
checking to see if a function is supported, which causes this check to
read from leaf 0x00000000 instead of 0x40000000.
Michael Brown [Wed, 31 Jul 2024 15:26:48 +0000 (16:26 +0100)]
[smbios] Allow reading an entire SMBIOS data structure as a setting
The general syntax for SMBIOS settings:
smbios/<instance>.<type>.<offset>.<length>
is currently extended such that a <length> of zero indicates that the
byte at <offset> contains a string index, and an <offset> of zero
indicates that the <length> contains a literal string index.
Since the byte at offset zero can never contain a string index, and a
literal string index can never have a zero value, the combination of
both <length> and <offset> being zero is currently invalid and will
always return "not found".
Extend the syntax such that the combination of both <length> and
<offset> being zero may be used to read the entire data structure.
Michael Brown [Tue, 30 Jul 2024 14:40:11 +0000 (15:40 +0100)]
[cloud] Add utility to read INT13CON partition in Google Compute Engine
Following the example of aws-int13con, add a utility that can be used
to read the INT13 console log from a used iPXE boot disk in Google
Compute Engine.
There seems to be no easy way to directly read the contents of either
a disk image or a snapshot in Google Cloud. Work around this
limitation by creating a snapshot and attaching this snapshot as a
data disk to a temporary Linux instance, which is then used to echo
the INT13 console log to the serial port.
Michael Brown [Wed, 24 Jul 2024 23:10:38 +0000 (00:10 +0100)]
[gve] Increase number of receive buffers to reduce packet loss
Experiments suggest that using fewer than 64 receive buffers leads to
excessive packet drop rates on some instance types (observed with a
c3-standard-4 instance in europe-west4-a).
Fix by increasing the number of receive data buffers (and adjusting
the length of the registrable queue page address list to match).
Michael Brown [Wed, 24 Jul 2024 13:30:58 +0000 (14:30 +0100)]
[gve] Add driver for Google Virtual Ethernet NIC
The Google Virtual Ethernet NIC (GVE or gVNIC) is found only in Google
Cloud instances. There is essentially zero documentation available
beyond the mostly uncommented source code in the Linux kernel.
Michael Brown [Fri, 5 Jul 2024 22:54:50 +0000 (23:54 +0100)]
[cloud] Add utility for importing images to Google Compute Engine
Following the example of aws-import, add a utility that can be used to
upload an iPXE disk image to Google Compute Engine as a bootable
image. For example:
make CONFIG=cloud EMBED=config/cloud/gce.ipxe \
bin-x86_64-pcbios/ipxe.usb bin-x86_64-efi/ipxe.usb
make CONFIG=cloud EMBED=config/cloud/gce.ipxe \
CROSS=aarch64-linux-gnu- bin-arm64-efi/ipxe.usb
The iPXE disk image is automatically wrapped into a tarball containing
a single file named "disk.raw", uploaded to a temporary bucket in
Google Cloud Storage, and used to create a bootable image. The
temporary bucket is deleted after use.
An appropriate image family name is identified automatically: "ipxe"
for BIOS images, "ipxe-uefi-x86-64" for x86_64 UEFI images, and
"ipxe-uefi-arm64" for AArch64 UEFI images. This allows the latest
image within each family to be launched within needing to know the
precise image name.
Google Compute Engine images are globally scoped and are available
(and cached upon first use) in all regions. The initial placement of
the image may be controlled indirectly by using the "--location"
option to specify the Google Cloud Storage location used for the
temporary upload bucket: the image will then be created in the closest
multi-region to the storage location.
Michael Brown [Thu, 27 Jun 2024 12:26:39 +0000 (13:26 +0100)]
[ipv6] Expose router address for DHCPv6 leased addresses
The DHCPv6 protocol does not itself provide a router address or a
prefix length. This information is instead obtained from the router
advertisements.
Our IPv6 minirouting table construction logic will first construct an
entry for each advertised prefix, and later update the entry to
include an address assigned within that prefix via stateful DHCPv6 (if
applicable).
This logic fails if the address assigned via stateful DHCPv6 does not
fall within any of the advertised prefixes (e.g. if the network is
configured to use DHCPv6-assigned /128 addresses with no advertised
on-link prefixes). We will currently treat this situation as
equivalent to having a manually assigned address with no corresponding
router address or prefix length: the routing table entry will use the
default /64 prefix length and will not include the router address.
DHCPv6 is triggered only in response to a router advertisement with
the "Managed Address Configuration (M)" or "Other Configuration (O)"
flags set, and a router address is therefore available at the point
that we initiate DHCPv6.
Record the router address when initiating DHCPv6, and expose this
router address as part of the DHCPv6 settings block. This allows the
routing table entry for any address assigned via stateful DHCPv6 to
correctly include the router address, even if the assigned address
does not fall within an advertised prefix.
Also provide a fixed /128 prefix length as part of the DHCPv6 settings
block. When an address assigned via stateful DHCPv6 does not fall
within an advertised prefix, this will cause the routing table entry
to have a /128 prefix length as expected. (When such an address does
fall within an advertised prefix, it will continue to use the
advertised prefix length.)
Originally-fixed-by: Guvenc Gulce <guevenc.guelce@sap.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 26 Jun 2024 11:29:38 +0000 (04:29 -0700)]
[ipv4] Support small subnets with no directed broadcast address
In a small subnet (with a /31 or /32 subnet mask), all addresses
within the subnet are valid host addresses: there is no separate
network address or directed broadcast address.
The logic used in iPXE to determine whether or not to use a link-layer
broadcast address will currently fail in these subnets. In a /31
subnet, the higher of the two host addresses (i.e. the address with
all host bits set) will be treated as a broadcast address. In a /32
subnet, the single valid host address will be treated as a broadcast
address.
Fix by adding the concept of a host mask, defined such that an address
in the local subnet with all of the mask bits set to zero represents
the network address, and an address in the local subnet with all of
the mask bits set to one represents the directed broadcast address.
For most subnets, this is simply the inverse of the subnet mask. For
small subnets (/31 or /32) we can obtain the desired behaviour by
setting the host mask to all ones, so that only the local broadcast
address 255.255.255.255 will be treated as a broadcast address.
Originally-fixed-by: Lukas Stockner <lstockner@genesiscloud.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Thu, 20 Jun 2024 22:48:59 +0000 (15:48 -0700)]
[hci] Remove the generalised widget user interface abstraction
Remove the now-unused generalised text widget user interface, along
with the associated concept of a widget set and the implementation of
a read-only label widget.
Michael Brown [Fri, 31 May 2024 09:10:53 +0000 (10:10 +0100)]
[form] Add support for dynamically created interactive forms
Add support for presenting a dynamic user interface as an interactive
form, alongside the existing support for presenting a dynamic user
interface as a menu.
An interactive form may be used to allow a user to input (or edit)
values for multiple settings on a single screen, as a user-friendly
alternative to prompting for setting values via the "read" command.
In the present implementation, all input fields must fit on a single
screen (with no scrolling), and the only supported widget type is an
editable text box.
Michael Brown [Thu, 20 Jun 2024 21:24:53 +0000 (14:24 -0700)]
[dynui] Generalise the concept of a menu to a dynamic user interface
We currently have an abstract model of a dynamic menu as a list of
items, each of which has a name, a description, and assorted metadata
such as a shortcut key. The "menu" and "item" commands construct
representations in this abstract model, and the "choose" command then
presents the items as a single-choice menu, with the selected item's
name used as the output value.
This same abstraction may be used to model a dynamic form as a list of
editable items, each of which has a corresponding setting name, an
optional description label, and assorted metadata such as a shortcut
key. By defining a "form" command as an alias for the "menu" command,
we could construct and present forms using commands such as:
#!ipxe
form Login to ${url}
item username Username or email address
item --secret password Password
present
or
#!ipxe
form Configure IPv4 networking for ${netX/ifname}
item netX/ip IPv4 address
item netX/netmask Subnet mask
item netX/gateway Gateway address
item netX/dns DNS server address
present
Reusing the same abstract model for both menus and forms allows us to
minimise the increase in code size, since the implementation of the
"form" and "item" commands is essentially zero-cost.
Rename everything within the abstract data model from "menu" to
"dynamic user interface" to reflect this generalisation.
Michael Brown [Thu, 20 Jun 2024 19:21:57 +0000 (12:21 -0700)]
[hci] Allow tab key to be used to cycle through UI elements
Add support for wraparound scrolling and allow the tab key to be used
to move forward through a list of elements, wrapping back around to
the beginning of the list on overflow.
This is mildly useful for a menu, and likely to be a strong user
expectation for an interactive form.
Michael Brown [Tue, 18 Jun 2024 22:17:03 +0000 (15:17 -0700)]
[hci] Rename "item" command's first parameter from "label" to "name"
Switch terminology for the "item" command from "item <label> <text>"
to "item <name> <text>", in preparation for repurposing the "item"
command to cover interactive forms as well as menus.
Since this renaming affects only a positional parameter, it does not
break compatibility with any existing scripts.
Michael Brown [Fri, 14 Jun 2024 10:47:55 +0000 (11:47 +0100)]
[hci] Draw all widgets on the standard screen
The curses concept of a window has been supported but never actively
used in iPXE since the mucurses library was first implemented in 2006.
Simplify the code by removing the ability to place a widget set in a
specified window, and instead use the standard screen for all drawing
operations.
This simplification allows the widget set parameter to be omitted for
the draw_widget() and edit_widget() operations, since the only reason
for its inclusion was to provide access to the specified window.
Michael Brown [Wed, 15 May 2024 13:10:33 +0000 (14:10 +0100)]
[hci] Provide a general concept of a text widget set
Create a generic abstraction of a text widget, refactor the existing
editable text box widget to use this abstraction, add an
implementation of a non-editable text label widget, and generalise the
login user interface to use this generic widget abstraction.
Michael Brown [Wed, 17 Apr 2024 14:45:36 +0000 (15:45 +0100)]
[hci] Fix semantics of replace_string() to match code comments
The comments for replace_string() state that a successful return
status guarantees that the dynamically allocated string pointer is no
longer NULL (even if it was initially NULL and the replacement string
is NULL or empty). This is relied upon by readline() to guarantee
that it will always return a non-NULL string if successful.
The code behaviour does not currently match this comment: an empty
replacement string may result in a successful return status even if
the (single-byte) allocation fails.
Fix up the code behaviour to match the comments, and to additionally
ensure that the edit history is filled in even in the event of an
allocation failure.
Michael Brown [Tue, 16 Apr 2024 13:19:01 +0000 (14:19 +0100)]
[efi] Veto the Dhcp6Dxe driver on all platforms
The reference implementation of Dhcp6Dxe in EDK2 has a fatal flaw: the
code in EfiDhcp6Stop() will poll the network in a tight loop until
either a response is received or a timer tick (at TPL_CALLBACK)
occurs. When EfiDhcp6Stop() is called at TPL_CALLBACK or higher, this
will result in an endless loop and an apparently frozen system.
Since this is the reference implementation of Dhcp6Dxe, it is likely
that almost all platforms have the same problem.
Fix by vetoing the broken driver. If the upstream driver is ever
fixed and a new version number issued, then we could plausibly test
against the version number exposed via the driver binding protocol.
Michael Brown [Mon, 15 Apr 2024 14:59:49 +0000 (15:59 +0100)]
[hci] Use dynamically allocated buffers for editable strings
Editable strings currently require a fixed-size buffer, which is
inelegant and limits the potential for creating interactive forms with
a variable number of edit box widgets.
Remove this limitation by switching to using a dynamically allocated
buffer for editable strings and edit box widgets.
Michael Brown [Mon, 15 Apr 2024 13:28:38 +0000 (14:28 +0100)]
[efi] Do not attempt to download autoexec.ipxe without a valid base URI
If we do not have a current working URI (after applying the EFI device
path settings and any cached DHCP settings), then an attempt to
download autoexec.ipxe will fail since there is no base URI from which
to resolve the full autoexec.ipxe URI.
Avoid this potentially confusing error message by attempting the
download only if we have successfully obtained a current working URI.
Pavel Krotkiy [Wed, 3 Apr 2024 11:52:56 +0000 (12:52 +0100)]
[netdevice] Add "linktype" setting
Add a new setting to provide access to the link layer protocol type
from scripts. This can be useful in order to skip configuring
interfaces based on their link layer protocol or, conversely,
configure only selected interface types (Ethernet, IPoIB, etc.)
Michael Brown [Fri, 29 Mar 2024 13:03:38 +0000 (13:03 +0000)]
[efi] Restructure handling of autoexec.ipxe script
We currently attempt to obtain the autoexec.ipxe script via early use
of the EFI_SIMPLE_FILE_SYSTEM_PROTOCOL or EFI_PXE_BASE_CODE_PROTOCOL
interfaces to obtain an opaque block of memory, which is then
registered as an image at an appropriate point during our startup
sequence. The early use of these existent interfaces allows us to
obtain the script even if our subsequent actions (e.g. disconnecting
drivers in order to connect up our own) may cause the script to become
inaccessible.
This mirrors the approach used under BIOS, where the autoexec.ipxe
script is provided by the prefix (e.g. as an initrd image when using
the .lkrn build of iPXE) and so must be copied into a normally
allocated image from wherever it happens to previously exist in
memory.
We do not currently have support for downloading an autoexec.ipxe
script if we were ourselves downloaded via UEFI HTTP boot.
There is an EFI_HTTP_PROTOCOL defined within the UEFI specification,
but it is so poorly designed as to be unusable for the simple purpose
of downloading an additional file from the same directory. It
provides almost nothing more than a very slim wrapper around
EFI_TCP4_PROTOCOL (or EFI_TCP6_PROTOCOL). It will not handle
redirection, content encoding, retries, or even fundamentals such as
the Content-Length header, leaving all of this up to the caller.
The UEFI HTTP Boot driver will install an EFI_LOAD_FILE_PROTOCOL
instance on the loaded image's device handle. This looks promising at
first since it provides the LoadFile() API call which is specified to
accept an arbitrary filename parameter. However, experimentation (and
inspection of the code in EDK2) reveals a multitude of problems that
prevent this from being usable. Calling LoadFile() will idiotically
restart the entire DHCP process (and potentially pop up a UI requiring
input from the user for e.g. a wireless network password). The
filename provided to LoadFile() will be ignored. Any downloaded file
will be rejected unless it happens to match one of the limited set of
types expected by the UEFI HTTP Boot driver. The list of design
failures and conceptual mismatches is fairly impressive.
Choose to bypass every possible aspect of UEFI HTTP support, and
instead use our own HTTP client and network stack to download the
autoexec.ipxe script over a temporary MNP network device. Since this
approach works for TFTP as well as HTTP, drop the direct use of
EFI_PXE_BASE_CODE_PROTOCOL. For consistency and simplicity, also drop
the direct use of EFI_SIMPLE_FILE_SYSTEM_PROTOCOL and rely upon our
existing support to access local files via "file:" URIs.
This approach results in console output during the "iPXE initialising
devices...ok" message that appears while startup is in progress.
Remove the trailing "ok" so that this intermediate output appears at a
sensible location on the screen. The welcome banner that will be
printed immediately afterwards provides an indication that startup has
completed successfully even absent the explicit "ok".
Michael Brown [Tue, 2 Apr 2024 18:36:00 +0000 (19:36 +0100)]
[cachedhcp] Allow cached DHCPACK to apply to temporary network devices
Retain a reference to the cached DHCPACK until the late startup phase,
and allow it to be recycled for reuse. This allows the cached DHCPACK
to be used for a temporary MNP network device and then subsequently
reused for the corresponding real network device.