Michael Brown [Mon, 4 Mar 2024 14:21:59 +0000 (14:21 +0000)]
[efi] Generalise block device boot to support arbitrary EFI handles
SAN devices created by iPXE are visible to the firmware, and may be
accessed using the firmware's standard block I/O device interface
(e.g. INT 13 for BIOS, or EFI_BLOCK_IO_PROTOCOL for UEFI). The iPXE
code to perform a SAN boot acts as a client of this standard block I/O
device interface, even when the underlying block I/O is being
performed by iPXE itself.
We rely on this separation to allow the "sanboot" command to be used
to boot from a local disk: since the code to perform a SAN boot does
not need direct access to an underlying iPXE SAN device, it may be
used to boot from any device providing the firmware's standard block
I/O device interface.
Clean up the EFI SAN boot code to require only a drive number and an
EFI_BLOCK_IO_PROTOCOL handle, in preparation for adding support for
booting from a local disk under UEFI.
Michael Brown [Sun, 3 Mar 2024 15:25:11 +0000 (15:25 +0000)]
[efi] Use file system protocol to check for SAN boot filename existence
The "sanboot" command allows a custom boot filename to be specified
via the "--filename" option. We currently rely on LoadImage() to
perform both the existence check and to load the image ready for
execution. This may give a false negative result if Secure Boot is
enabled and the boot file is not correctly signed.
Carry out the existence check using EFI_SIMPLE_FILE_SYSTEM_PROTOCOL
separately from loading the image via LoadImage().
Michael Brown [Mon, 4 Mar 2024 12:50:25 +0000 (12:50 +0000)]
[block] Use drive number as debug message stream ID
We currently use the SAN device pointer as the debug message stream
identifier. This pointer is not always available: for example, when
booting from a local disk there is no underlying SAN device.
Switch to using the drive number as the debug message colour stream
identifier, so that all block device debug messages may be colourised
consistently.
Michael Brown [Mon, 4 Mar 2024 12:08:01 +0000 (12:08 +0000)]
[efi] Use long forms of device paths in debug messages
We currently call ConvertDevicePathToText() with DisplayOnly=TRUE when
constructing a device path to appear within a debug message. For
ATAPI device paths, this will unfortunately omit some key information:
the textual representation will not indicate which ATA bus or drive is
represented. This can lead to misleading debug messages that appear
to refer to identical devices.
Fix by setting DisplayOnly=FALSE to select the long form of device
path textual representations.
Michael Brown [Thu, 29 Feb 2024 14:04:47 +0000 (14:04 +0000)]
[settings] Add parsing for UUID and GUID settings types
The ":uuid" and ":guid" settings types are currently format-only: it
is possible to format a setting as a UUID (via e.g. "show foo:uuid")
but it is not currently possible to parse a string into a UUID setting
(via e.g. "set foo:uuid 406343fe-998b-44be-8a28-44ca38cb202b").
Use uuid_aton() to implement parsing of these settings types, and add
appropriate test cases for both.
Michael Brown [Thu, 29 Feb 2024 13:58:50 +0000 (13:58 +0000)]
[uuid] Add uuid_aton() to parse a UUID from a string
Add uuid_aton() to parse a UUID value from a string (analogous to
inet_aton(), inet6_aton(), sock_aton(), etc), treating it as a
32-digit hex string with optional hyphen separators. The placement of
the separators is not checked: each byte within the hex string may be
separated by a hyphen, or not separated at all.
Add dedicated self-tests for UUID parsing and formatting (already
partially covered by the ":uuid" and ":guid" settings self-tests).
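The parsing rule described above can be sketched in standalone C. This is a hypothetical illustration, not iPXE's uuid_aton(): the names parse_uuid_sketch() and hex_digit_value() are invented here. The sketch assumes that a single optional hyphen may precede any byte after the first, and that nothing may follow the 32nd hex digit.

```c
#include <stdint.h>

/* Hypothetical helper: value of one hex digit, or -1 if not a hex digit */
static int hex_digit_value ( char c ) {
	if ( ( c >= '0' ) && ( c <= '9' ) )
		return ( c - '0' );
	if ( ( c >= 'a' ) && ( c <= 'f' ) )
		return ( c - 'a' + 10 );
	if ( ( c >= 'A' ) && ( c <= 'F' ) )
		return ( c - 'A' + 10 );
	return -1;
}

/* Hypothetical sketch: parse 16 bytes written as 32 hex digits, allowing
 * one hyphen before any byte except the first. Separator placement is not
 * otherwise checked. Returns 0 on success, -1 on error. */
static int parse_uuid_sketch ( const char *string, uint8_t bytes[16] ) {
	const char *p = string;
	unsigned int i;
	int hi;
	int lo;

	for ( i = 0 ; i < 16 ; i++ ) {
		/* Optional separator before this byte */
		if ( ( i > 0 ) && ( *p == '-' ) )
			p++;
		/* Check the first digit before reading the second, so that
		 * we never read past a terminating NUL */
		if ( ( hi = hex_digit_value ( p[0] ) ) < 0 )
			return -1;
		if ( ( lo = hex_digit_value ( p[1] ) ) < 0 )
			return -1;
		bytes[i] = ( ( hi << 4 ) | lo );
		p += 2;
	}

	/* Reject trailing characters */
	return ( ( *p == '\0' ) ? 0 : -1 );
}
```

Both "406343fe-998b-44be-8a28-44ca38cb202b" and the unseparated "406343fe998b44be8a2844ca38cb202b" parse to the same bytes, while a hyphen that splits a byte (e.g. "4-06...") is rejected.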
Michael Brown [Tue, 27 Feb 2024 13:34:17 +0000 (13:34 +0000)]
[efi] Work around broken boot services table manipulation by UEFI shim
The UEFI shim installs wrappers around several boot services functions
before invoking its next stage bootloader, in an attempt to enforce
its desired behaviour upon the aforementioned bootloader. For
example, shim checks that the bootloader has either invoked
StartImage() or has called into the "shim lock protocol" before
allowing an ExitBootServices() call to proceed.
When invoking a shim, iPXE will also install boot services function
wrappers in order to work around assorted bugs in the UEFI shim code
that would otherwise prevent it from being used to boot a kernel. For
details on these workarounds, see commits 28184b7 ("[efi] Add support
for executing images via a shim") and 5b43181 ("[efi] Support versions
of shim that perform SBAT verification").
Using boot services function wrappers in this way is not intrinsically
problematic, provided that wrappers are installed before starting the
wrapped program, and uninstalled only after the wrapped program exits.
This strict ordering requirement ensures that all layers of wrappers
are called in the expected order, and that no calls are issued through
a no-longer-valid function pointer.
Unfortunately, the UEFI shim does not respect this strict ordering
requirement, and will instead uninstall (and reinstall) its wrappers
midway through the execution of the wrapped program. This leaves the
wrapped program with an inconsistent view of the boot services table,
leading to incorrect behaviour.
This results in a boot failure when a first shim is used to boot iPXE,
which then uses a second shim to boot a Linux kernel:
- First shim installs StartImage() and ExitBootServices() wrappers
- First shim invokes iPXE via its own PE loader
- iPXE installs ExitBootServices() wrapper
- iPXE invokes second shim via StartImage()
At this point, the first shim's StartImage() wrapper will illegally
uninstall its ExitBootServices() wrapper, without first checking that
nothing else has modified the ExitBootServices function pointer. This
effectively bypasses iPXE's own ExitBootServices() wrapper, which
causes a boot failure since the code within that wrapper does not get
called.
A proper fix would be for shim to install its wrappers before starting
the image and uninstall its wrappers only after the started image has
exited. Instead of repeatedly uninstalling and reinstalling its
wrappers while the wrapped program is running, shim should simply use
a flag to keep track of whether or not it needs to modify the
behaviour of the wrapped calls.
Experience shows that there is unfortunately no point in trying to get
a fix for this upstreamed into shim. We therefore work around the
shim bug by removing our ExitBootServices() wrapper and moving the
relevant code into our GetMemoryMap() wrapper.
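The ordering problem can be modelled in a few lines of C. In this hypothetical sketch (none of these names come from shim or iPXE), a single function pointer stands in for the ExitBootServices() entry in the boot services table, "layer A" plays the first shim and "layer B" plays iPXE. A blind uninstall by layer A silently discards layer B's wrapper, whereas a checked uninstall refuses to proceed unless its own wrapper is still outermost. Note that iPXE's actual workaround is different: it moves the relevant code into its GetMemoryMap() wrapper.

```c
/* Hypothetical stand-in for one boot services table entry */
typedef int ( * service_fn ) ( void );

struct services_table {
	service_fn exit_boot_services;
};

/* The underlying "firmware" implementation */
static int firmware_exit ( void ) {
	return 0;
}

/* Each layer saves the pointer it replaced so that it can chain to it */
static service_fn layer_a_saved;
static service_fn layer_b_saved;
static unsigned int layer_b_calls;

static int layer_a_exit ( void ) {
	return layer_a_saved ();
}

static int layer_b_exit ( void ) {
	layer_b_calls++;	/* code that must run on every call */
	return layer_b_saved ();
}

static void wrapper_install ( struct services_table *table,
			      service_fn wrapper, service_fn *saved ) {
	*saved = table->exit_boot_services;
	table->exit_boot_services = wrapper;
}

/* Unsafe: restores the saved pointer, discarding any later wrapper */
static void wrapper_uninstall_blind ( struct services_table *table,
				      service_fn saved ) {
	table->exit_boot_services = saved;
}

/* Safer: only uninstall if this wrapper is still the outermost one */
static int wrapper_uninstall_checked ( struct services_table *table,
				       service_fn wrapper, service_fn saved ) {
	if ( table->exit_boot_services != wrapper )
		return -1;
	table->exit_boot_services = saved;
	return 0;
}
```

After installing A then B, the checked uninstall of A fails and calls still pass through B; the blind uninstall of A leaves B's wrapper unreachable, which is the failure mode described above.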
Michael Brown [Fri, 23 Feb 2024 14:15:22 +0000 (14:15 +0000)]
[eap] Add support for the MS-CHAPv2 authentication method
Add support for EAP-MSCHAPv2 (note that this is not the same as
PEAP-MSCHAPv2), controllable via the build configuration option
EAP_METHOD_MSCHAPV2 in config/general.h.
Our model for EAP does not encompass mutual authentication: we will
start sending plaintext packets (e.g. DHCP requests) over the link
even before EAP completes, and our only use for an EAP success is to
mark the link as unblocked.
We therefore ignore the content of the EAP-MSCHAPv2 success request
(containing the MS-CHAPv2 authenticator response) and just send back
an EAP-MSCHAPv2 success response, so that the EAP authenticator will
complete the process and send through the real EAP success packet
(which will, in turn, cause us to unblock the link).
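For reference, the EAP-MSCHAPv2 success response is a fixed six-byte packet. The sketch below builds one in standalone C. The constant values (EAP Response code 2, EAP type 26 for MS-CHAPv2, MS-CHAPv2 Success opcode 3) are taken from RFC 3748 and the EAP-MSCHAPv2 Internet-Draft (draft-kamath-pppext-eap-mschapv2) rather than from iPXE's source, and the function name is hypothetical. As described above, the content of the success request is not examined; only its identifier is echoed.

```c
#include <stddef.h>
#include <stdint.h>

#define EAP_CODE_RESPONSE 2
#define EAP_TYPE_MSCHAPV2 26
#define MSCHAPV2_OP_SUCCESS 3
#define MSCHAPV2_SUCCESS_RESPONSE_LEN 6

/* Build an EAP-MSCHAPv2 Success Response into buf. Returns the packet
 * length, or 0 if buf is too small. */
static size_t build_mschapv2_success_response ( uint8_t id, uint8_t *buf,
						size_t len ) {
	if ( len < MSCHAPV2_SUCCESS_RESPONSE_LEN )
		return 0;
	buf[0] = EAP_CODE_RESPONSE;
	buf[1] = id;				/* echo request identifier */
	buf[2] = 0;				/* length, high byte */
	buf[3] = MSCHAPV2_SUCCESS_RESPONSE_LEN;	/* length, low byte */
	buf[4] = EAP_TYPE_MSCHAPV2;
	buf[5] = MSCHAPV2_OP_SUCCESS;
	return MSCHAPV2_SUCCESS_RESPONSE_LEN;
}
```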
Michael Brown [Fri, 23 Feb 2024 12:33:57 +0000 (12:33 +0000)]
[eap] Allow MD5-Challenge authentication method to be disabled
RFC 3748 states that implementations must support the MD5-Challenge
method. However, some network environments may wish to disable it as
a matter of policy.
Allow support for MD5-Challenge to be controllable via the build
configuration option EAP_METHOD_MD5 in config/general.h.
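As a sketch of how such a build option might be toggled, assuming the usual iPXE convention of placing local overrides in config/local/general.h rather than editing config/general.h directly:

```c
/* config/local/general.h: local build overrides (sketch) */

/* Disable the MD5-Challenge EAP method as a matter of policy */
#undef EAP_METHOD_MD5
```

The EAP_METHOD_MSCHAPV2 option described above can be controlled in the same way.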
Relocation relaxations have also been introduced. Recent GCC (13.2)
and binutils 2.41+ use these types of relocations, which confuse the
elf2efi tool. As a result, iPXE EFI images for LoongArch fail to
build with the following error:
Unrecognised relocation type 103
Fix by ignoring R_LARCH_B{16,21} and R_LARCH_PCREL20_S2 (as with other
PC-relative relocations), and by ignoring relaxations (R_LARCH_RELAX).
Relocation relaxations are basically optimizations: ignoring them
results in a correct binary (although it might be suboptimal).
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
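The shape of this fix can be sketched as a classification over relocation types. This is not elf2efi's code: the function and enum names are hypothetical, and the numeric values (R_LARCH_64 = 2, R_LARCH_B16 = 64, R_LARCH_B21 = 65, R_LARCH_B26 = 66, R_LARCH_RELAX = 100, R_LARCH_PCREL20_S2 = 103) are assumed from the LoongArch ELF psABI; only 103 is confirmed by the error message quoted above. PC-relative relocations need no PE base relocation because they remain correct wherever the image is loaded, and relaxation markers can be skipped because they only permit optional optimisations.

```c
/* Relocation type numbers, assumed from the LoongArch ELF psABI */
enum larch_reloc_type {
	LARCH_RELOC_64 = 2,
	LARCH_RELOC_B16 = 64,
	LARCH_RELOC_B21 = 65,
	LARCH_RELOC_B26 = 66,
	LARCH_RELOC_RELAX = 100,
	LARCH_RELOC_PCREL20_S2 = 103,
};

enum reloc_action {
	RELOC_IGNORE,		/* no PE base relocation needed */
	RELOC_ABSOLUTE,		/* needs a PE base relocation */
	RELOC_UNRECOGNISED,	/* would trigger "Unrecognised relocation type" */
};

static enum reloc_action classify_larch_reloc ( unsigned int type ) {
	switch ( type ) {
	case LARCH_RELOC_B16:
	case LARCH_RELOC_B21:
	case LARCH_RELOC_B26:
	case LARCH_RELOC_PCREL20_S2:
		/* PC-relative: unaffected by the load address */
		return RELOC_IGNORE;
	case LARCH_RELOC_RELAX:
		/* Relaxation hint only: ignoring it leaves correct code */
		return RELOC_IGNORE;
	case LARCH_RELOC_64:
		/* Absolute 64-bit address */
		return RELOC_ABSOLUTE;
	default:
		return RELOC_UNRECOGNISED;
	}
}
```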
Geert Stappers [Sun, 18 Feb 2024 11:29:59 +0000 (12:29 +0100)]
[drivers] Sort PCI_ROM() entries numerically
Done with the help of this Perl script:
$MARKER = 'PCI_ROM'; # a regex
$AB = 1; # At Begin
@HEAD = ();
@ITEMS = ();
@TAIL = ();
foreach $fn (@ARGV) {
    open(IN, $fn) or die "Can't open file '$fn': $!\n";
    while (<IN>) {
        if (/$MARKER/) {
            push @ITEMS, $_;
            $AB = 0; # not anymore at begin
        }
        else {
            if ($AB) {
                push @HEAD, $_;
            }
            else {
                push @TAIL, $_;
            }
        }
    }
} continue {
    close IN;
    open(OUT, ">$fn") or die "Can't open file '$fn' for output: $!\n";
    print OUT @HEAD;
    print OUT sort @ITEMS;
    print OUT @TAIL;
    close OUT;
    # For a next file
    $AB = 1;
    @HEAD = ();
    @ITEMS = ();
    @TAIL = ();
}
Executed that script with src/drivers/ as the current working directory,
providing '$(grep -rl PCI_ROM)' as the arguments.
Michael Brown [Thu, 22 Feb 2024 12:55:59 +0000 (12:55 +0000)]
[crypto] Force inlining of trivial wrapper functions
Inspection of the generated assembly shows that gcc will often emit
standalone implementations of frequently invoked functions such as
digest_update(), which contain no logic and exist only as syntactic
sugar.
Force inlining of these functions to reduce the overall binary size.
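A minimal sketch of the technique, using a hypothetical and heavily simplified digest descriptor rather than iPXE's own digest structures. The wrapper does nothing but forward its arguments, so marking it always_inline (a GCC and Clang function attribute) asks the compiler to expand it at each call site instead of emitting and calling a standalone copy.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily simplified digest descriptor */
struct digest_sketch {
	void ( * update ) ( void *ctx, const void *data, size_t len );
};

/* Pure syntactic sugar: forwards its arguments and contains no logic */
static inline __attribute__ (( always_inline )) void
digest_update_sketch ( const struct digest_sketch *digest, void *ctx,
		       const void *data, size_t len ) {
	digest->update ( ctx, data, len );
}

/* Toy "digest" that sums bytes, for demonstration only */
static void sum_update ( void *ctx, const void *data, size_t len ) {
	const uint8_t *bytes = data;
	uint32_t *sum = ctx;

	while ( len-- )
		*sum += *bytes++;
}
```

Calling digest_update_sketch ( &digest, &sum, "abc", 3 ) with digest.update set to sum_update leaves sum equal to 294, with the indirect call to sum_update emitted directly at the call site.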
Michael Brown [Wed, 21 Feb 2024 16:45:50 +0000 (16:45 +0000)]
[crypto] Add implementation of MS-CHAPv2 authentication
Add an implementation of the authentication portions of the MS-CHAPv2
algorithm as defined in RFC 2759, along with the single test vector
provided therein.
Michael Brown [Wed, 14 Feb 2024 16:02:43 +0000 (16:02 +0000)]
[crypto] Allow for multiple cross-signed certificate download attempts
Certificates issued by Let's Encrypt have two options for their chain
of trust: the chain can either terminate in the self-signed ISRG Root
X1 root certificate, or in an intermediate ISRG Root X1 certificate
that is signed in turn by the self-signed DST Root CA X3 root
certificate. This is a historical artifact: when Let's Encrypt first
launched as a project, the chain ending in DST Root CA X3 was used
since existing clients would not have recognised the ISRG Root X1
certificate as a trusted root certificate.
The DST Root CA X3 certificate expired in September 2021, and so is no
longer trusted by clients (such as iPXE) that validate the expiry
times of all certificates in the certificate chain.
In order to maintain usability of certificates on older Android
devices, the default certificate chain provided by Let's Encrypt still
terminates in DST Root CA X3, even though that certificate has now
expired. On newer devices which include ISRG Root X1 as a trusted
root certificate, the intermediate version of ISRG Root X1 in the
certificate chain is ignored and validation is performed as though the
chain had terminated in the self-signed ISRG Root X1 root certificate.
On older Android devices which do not include ISRG Root X1 as a
trusted root certificate, the validation succeeds since Android
chooses to ignore expiry times for root certificates and so continues
to trust the DST Root CA X3 root certificate.
This backwards compatibility hack unfortunately breaks the cross-
signing mechanism used by iPXE, which assumes that the certificate
chain will always terminate in a non-expired root certificate.
Generalise the validator's cross-signed certificate download mechanism
to walk up the certificate chain in the event of a failure, attempting
to find a replacement cross-signed certificate chain starting from the
next level up. This allows the validator to step over the expired
(and hence invalidatable) DST Root CA X3 certificate, and instead
download the cross-signed version of the ISRG Root X1 certificate.
This generalisation also gives us the ability to handle servers that
provide a full certificate chain including their root certificate:
iPXE will step over the untrusted public root certificate and attempt
to find a cross-signed version of it instead.
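The retry logic can be modelled as a simple search. This is a hypothetical sketch, not iPXE's validator code. It assumes that the chain is indexed from the server's own certificate (index 0) upwards, that the first attempt extends the chain exactly as before, and that each subsequent failure discards one more link before retrying, never discarding the server's certificate itself. Under these assumptions, the Let's Encrypt chain (server certificate, intermediate, ISRG Root X1 signed by DST Root CA X3) corresponds to a failed first attempt followed by a successful attempt that replaces the ISRG Root X1 link.

```c
#include <stddef.h>

/* Hypothetical download hook: try to obtain a cross-signed chain that
 * replaces every link from index 'keep' onwards (keep == chain length
 * means "extend the existing chain"). Returns 0 on success. */
typedef int ( * crosssign_fetch_fn ) ( size_t keep, void *ctx );

/* Returns the number of original links kept on success, or -1 if every
 * attempt fails. */
static long crosssign_walk ( size_t chain_len, crosssign_fetch_fn fetch,
			     void *ctx ) {
	size_t keep;

	for ( keep = chain_len ; keep >= 1 ; keep-- ) {
		if ( fetch ( keep, ctx ) == 0 )
			return ( ( long ) keep );
	}
	return -1;
}

/* Example hook: succeeds only when replacing from index 2 onwards */
static int fetch_succeeds_at_two ( size_t keep, void *ctx ) {
	unsigned int *attempts = ctx;

	( *attempts )++;
	return ( ( keep == 2 ) ? 0 : -1 );
}

/* Example hook: every attempt fails */
static int fetch_always_fails ( size_t keep, void *ctx ) {
	unsigned int *attempts = ctx;

	( void ) keep;
	( *attempts )++;
	return -1;
}
```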
Michael Brown [Tue, 13 Feb 2024 16:27:31 +0000 (16:27 +0000)]
[crypto] Add x509_truncate() to truncate a certificate chain
Downloading a cross-signed certificate chain to partially replace
(rather than simply extend) an existing chain will require the ability
to discard all certificates after a specified link in the chain.
Extract the relevant logic from x509_free_chain() and expose it
separately as x509_truncate().
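A minimal sketch of truncation on a hypothetical, simplified chain representation: a singly linked list of integers standing in for certificates. iPXE's real chain uses its own list and reference-counting types. Freeing a whole chain then reduces to truncating after the first link and freeing that link, which mirrors the extraction described above.

```c
#include <stdlib.h>

/* Hypothetical, simplified certificate chain link */
struct chain_link {
	int cert_id;			/* stand-in for a certificate */
	struct chain_link *next;
};

/* Discard every link after 'link', leaving 'link' as the last link */
static void chain_truncate ( struct chain_link *link ) {
	struct chain_link *victim = link->next;
	struct chain_link *next;

	link->next = NULL;
	while ( victim ) {
		next = victim->next;
		free ( victim );
		victim = next;
	}
}

/* Free an entire chain: truncate after the head, then free the head */
static void chain_free ( struct chain_link *head ) {
	if ( ! head )
		return;
	chain_truncate ( head );
	free ( head );
}

/* Append a new link after 'tail' (which may be NULL for a new chain) */
static struct chain_link * chain_append ( struct chain_link *tail,
					  int cert_id ) {
	struct chain_link *link = malloc ( sizeof ( *link ) );

	if ( ! link )
		return NULL;
	link->cert_id = cert_id;
	link->next = NULL;
	if ( tail )
		tail->next = link;
	return link;
}
```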
Michael Brown [Sat, 10 Feb 2024 14:41:29 +0000 (14:41 +0000)]
[build] Fix build failures with older versions of gcc
Some versions of gcc (observed with gcc 4.8.5 in CentOS 7) will report
spurious build_assert() failures for some assertions about structure
layouts. There is no clear pattern as to what causes these spurious
failures, and the build assertion does succeed in that no unresolvable
symbol reference is generated in the compiled code.
Adjust the assertions to work around these apparent compiler issues.
Michael Brown [Thu, 8 Feb 2024 16:39:35 +0000 (16:39 +0000)]
[libc] Allow build_assert() failures to be ignored via NO_WERROR=1
We build with -Werror by default so that any warning is treated as an
error and aborts the build. The build system allows NO_WERROR=1 to be
used to override this behaviour, in order to allow builds to succeed
when spurious warnings occur (e.g. when using a newer compiler that
includes checks for which the codebase is not yet prepared).
Some versions of gcc (observed with gcc 4.8.5 in CentOS 7) will report
spurious build_assert() failures: the compilation will fail due to an
allegedly unelided call to the build assertion's external function
declared with __attribute__((error)) even though the compiler does
manage to successfully elide the call (as verified by the fact that
there are no unresolvable symbol references in the compiler output).
Change build_assert() to declare __attribute__((warning)) instead of
__attribute__((error)) on its extern function. This will still abort
a normal build if the assertion fails, but may be overridden using
NO_WERROR=1 if necessary to work around a spurious assertion failure.
Note that if the build assertion has genuinely failed (i.e. if the
compiler has genuinely not been able to elide the call) then the
object will still contain an unresolvable symbol reference that will
cause the link to fail (which matches the behaviour of the old
linker_assert() mechanism).
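A minimal sketch of this kind of mechanism, using a hypothetical macro name and GCC's warning function attribute; it is not iPXE's exact definition. If the compiler can prove the condition true, the call is removed as dead code and nothing is reported. Otherwise the surviving call produces a compile-time warning (fatal under -Werror) and, because the function is never defined, an unresolvable symbol at link time. The sketch relies on the compiler folding the condition, so it is intended for constant expressions such as sizeof and offsetof.

```c
#include <stddef.h>
#include <stdint.h>

/* Token pasting helpers, so that each assertion gets its own symbol */
#define SKETCH_CONCAT2( a, b ) a ## b
#define SKETCH_CONCAT( a, b ) SKETCH_CONCAT2 ( a, b )

#define build_assert_sketch( condition )				\
	do {								\
		if ( ! ( condition ) ) {				\
			extern void __attribute__ (( warning (		\
				"build_assert(" #condition ") failed"	\
			) )) SKETCH_CONCAT ( build_assert_failed_,	\
					     __LINE__ ) ( void );	\
			SKETCH_CONCAT ( build_assert_failed_,		\
					__LINE__ ) ();			\
		}							\
	} while ( 0 )

/* Example: a wire-format header whose layout must be exactly 4 bytes */
struct wire_header_sketch {
	uint16_t type;
	uint16_t length;
};

static size_t wire_header_size ( void ) {
	build_assert_sketch ( sizeof ( struct wire_header_sketch ) == 4 );
	build_assert_sketch ( offsetof ( struct wire_header_sketch, length ) == 2 );
	return sizeof ( struct wire_header_sketch );
}
```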
Michael Brown [Wed, 7 Feb 2024 21:20:20 +0000 (21:20 +0000)]
[crypto] Add implementation of the DES cipher
The DES block cipher dates back to the 1970s. It is no longer
relevant for use in TLS cipher suites, but it is still used by the
MS-CHAPv2 authentication protocol which remains unfortunately common
for 802.1x port authentication.
Add an implementation of the DES block cipher, complete with the
extremely comprehensive test vectors published by NBS (the precursor
to NIST) in the form of an utterly adorable typewritten and hand-drawn
paper document.
Michael Brown [Wed, 7 Feb 2024 21:16:47 +0000 (21:16 +0000)]
[test] Remove dummy initialisation vector for ECB-mode AES tests
A block cipher in ECB mode has no concept of an initialisation vector,
and any data provided to cipher_setiv() for an ECB cipher will be
ignored. There is no requirement within our cipher algorithm
abstraction for a dummy initialisation vector to be provided.
Remove the entirely spurious dummy 16-byte initialisation vector from
the ECB test cases.
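The difference can be seen with a deliberately trivial block cipher. In this sketch the toy cipher (XOR with the key) and all names are hypothetical, and the toy cipher must never be used for real encryption. It only illustrates that ECB encrypts each block independently and so has no initialisation vector to consume, whereas CBC chains each block to the previous ciphertext block, starting from the IV.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCK_LEN 8

/* Toy block "cipher": XOR with the key. NOT secure; illustration only. */
static void toy_encrypt_block ( const uint8_t key[TOY_BLOCK_LEN],
				const uint8_t in[TOY_BLOCK_LEN],
				uint8_t out[TOY_BLOCK_LEN] ) {
	size_t i;

	for ( i = 0 ; i < TOY_BLOCK_LEN ; i++ )
		out[i] = ( in[i] ^ key[i] );
}

/* ECB: every block is encrypted independently, so there is no IV */
static void ecb_encrypt ( const uint8_t key[TOY_BLOCK_LEN], const uint8_t *in,
			  uint8_t *out, size_t len ) {
	size_t offset;

	for ( offset = 0 ; ( offset + TOY_BLOCK_LEN ) <= len ;
	      offset += TOY_BLOCK_LEN ) {
		toy_encrypt_block ( key, ( in + offset ), ( out + offset ) );
	}
}

/* CBC: XOR each plaintext block with the previous ciphertext block
 * (or with the IV for the first block) before encrypting */
static void cbc_encrypt ( const uint8_t key[TOY_BLOCK_LEN],
			  const uint8_t iv[TOY_BLOCK_LEN], const uint8_t *in,
			  uint8_t *out, size_t len ) {
	uint8_t chain[TOY_BLOCK_LEN];
	uint8_t tmp[TOY_BLOCK_LEN];
	size_t offset;
	size_t i;

	memcpy ( chain, iv, TOY_BLOCK_LEN );
	for ( offset = 0 ; ( offset + TOY_BLOCK_LEN ) <= len ;
	      offset += TOY_BLOCK_LEN ) {
		for ( i = 0 ; i < TOY_BLOCK_LEN ; i++ )
			tmp[i] = ( in[offset + i] ^ chain[i] );
		toy_encrypt_block ( key, tmp, ( out + offset ) );
		memcpy ( chain, ( out + offset ), TOY_BLOCK_LEN );
	}
}
```

With two identical plaintext blocks, ECB produces two identical ciphertext blocks, while CBC does not.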
Michael Brown [Fri, 2 Feb 2024 17:09:06 +0000 (17:09 +0000)]
[crypto] Fix stray references to AES
The CBC_CIPHER() macro contains some accidentally hardcoded references
to an underlying AES cipher, instead of using the cipher specified in
the macro parameters.
Michael Brown [Wed, 31 Jan 2024 13:49:35 +0000 (13:49 +0000)]
[tls] Tidy up error handling flow in tls_send_plaintext()
Coverity reported that tls_send_plaintext() failed to check the return
status from tls_generate_random(), which could potentially result in
uninitialised random data being used as the block initialisation
vector (instead of intentionally random data).
Add the missing return status check, and separate out the error
handling code paths (since on the successful exit code path there will
be no need to free either the plaintext or the ciphertext anyway).
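The resulting shape is the familiar goto-based cleanup pattern. In this hypothetical sketch (not iPXE's tls_send_plaintext()), the random IV is generated into the output buffer and its return status is checked. On success, ownership of the buffer passes to the caller, so the success path frees nothing and only the error path releases it.

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_IV_LEN 16

/* Hypothetical random number source that may fail */
typedef int ( * random_fn ) ( void *buf, size_t len );

/* Build an IV-prefixed record. On success, *record is owned by the caller. */
static int build_record_sketch ( const void *data, size_t len,
				 random_fn get_random,
				 unsigned char **record ) {
	unsigned char *buf;
	int rc;

	buf = malloc ( SKETCH_IV_LEN + len );
	if ( ! buf ) {
		rc = -ENOMEM;
		goto err_alloc;
	}

	/* Check the status: never proceed with an uninitialised IV */
	if ( ( rc = get_random ( buf, SKETCH_IV_LEN ) ) != 0 )
		goto err_random;

	/* Real code would encrypt here; the sketch just copies */
	memcpy ( ( buf + SKETCH_IV_LEN ), data, len );

	/* Success: hand the buffer over, nothing to free */
	*record = buf;
	return 0;

 err_random:
	free ( buf );
 err_alloc:
	return rc;
}

static int random_ok ( void *buf, size_t len ) {
	memset ( buf, 0xa5, len );
	return 0;
}

static int random_fails ( void *buf, size_t len ) {
	( void ) buf;
	( void ) len;
	return -EIO;
}
```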
Ross Lagerwall [Tue, 30 Jan 2024 10:52:29 +0000 (10:52 +0000)]
[efi] Fix hang during ExitBootServices()
When ExitBootServices() invokes efi_shutdown_hook(), there may be
nothing to generate an interrupt since the timer is disabled in the
first step of ExitBootServices(). Additionally, when running in a VM,
OVMF masks everything from the PIC (except the timer) by default. This means
that calling cpu_nap() may hang indefinitely. This was seen in
practice in netfront_reset() when running in a VM on XenServer.
Fix this by skipping the halt if an EFI shutdown is in progress.
Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
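A simplified model of the fix, with hypothetical names throughout: a polling loop naps between polls, but the nap is skipped once a shutdown is in progress, since the timer interrupt may already be disabled and nothing would wake the CPU. The halt instruction is simulated by a counter so that the sketch can run anywhere.

```c
#include <stdbool.h>

/* Hypothetical flag, set on entry to the ExitBootServices() shutdown hook */
static bool shutdown_in_progress;
static unsigned int halts;

/* Stand-in for halting until the next interrupt */
static void halt_until_interrupt ( void ) {
	halts++;
}

/* Nap between polls, but never halt once shutdown has begun */
static void nap_sketch ( void ) {
	if ( shutdown_in_progress )
		return;
	halt_until_interrupt ();
}

/* Poll a simulated device until it reports ready, napping in between */
static unsigned int wait_ready_sketch ( unsigned int polls_needed ) {
	unsigned int polls = 0;

	while ( ++polls < polls_needed )
		nap_sketch ();
	return polls;
}
```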
Michael Brown [Tue, 30 Jan 2024 16:48:46 +0000 (16:48 +0000)]
[tls] Make key exchange algorithms selectable via build configuration
Allow the choice of key exchange algorithms to be controlled via build
configuration options in config/crypto.h, as is already done for the
choices of public-key algorithms, cipher algorithms, and digest
algorithms.
Michael Brown [Tue, 30 Jan 2024 15:23:39 +0000 (15:23 +0000)]
[tls] Split out Diffie-Hellman parameter signature verification
DHE and ECDHE use essentially the same mechanism for verifying the
signature over the Diffie-Hellman parameters, though the format of the
parameters is different between the two methods.
Split out the verification of the parameter signature so that it may
be shared between the DHE and ECDHE key exchange algorithms.
Michael Brown [Tue, 30 Jan 2024 15:17:49 +0000 (15:17 +0000)]
[tls] Generate key material after sending ClientKeyExchange
The construction of the key material for the pending cipher suites
from the TLS master secret must happen regardless of which key
exchange algorithm is in use, and the key material is not required to
send the ClientKeyExchange handshake (which is sent before changing
cipher suites).
Centralise the call to tls_generate_keys() after performing key
exchange via the selected algorithm.
Michael Brown [Tue, 30 Jan 2024 13:38:15 +0000 (13:38 +0000)]
[tls] Restructure construction of ClientHello message
Define an individual local structure for each extension and a single
structure for the list of extensions. This makes it viable to add
extensions such as the Supported Elliptic Curves extension, which must
not be present if the list of curves is empty.
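The omission rule can be sketched as follows. This is a hypothetical byte-level serialiser rather than iPXE's per-extension structures. The extension type 10 (supported groups, formerly "elliptic curves") and named group 29 (x25519) are the IANA-assigned TLS values. When no curves are configured, the extension is left out entirely rather than being sent with an empty list, which the wire format does not permit.

```c
#include <stddef.h>
#include <stdint.h>

#define TLS_EXT_SUPPORTED_GROUPS 10
#define TLS_GROUP_X25519 29

/* Write a big-endian 16-bit value, returning the new offset */
static size_t put_u16 ( uint8_t *buf, size_t offset, uint16_t value ) {
	buf[offset] = ( value >> 8 );
	buf[offset + 1] = ( value & 0xff );
	return ( offset + 2 );
}

/* Build an extensions list (2-byte total length, then extensions).
 * No bounds checking: the caller must supply at least
 * ( 8 + 2 * num_groups ) bytes. Returns the total length written. */
static size_t build_extensions_sketch ( uint8_t *buf, const uint16_t *groups,
					size_t num_groups ) {
	size_t offset = 2;	/* room for the overall length */
	size_t i;

	if ( num_groups ) {
		offset = put_u16 ( buf, offset, TLS_EXT_SUPPORTED_GROUPS );
		/* Extension data length: list length field plus list */
		offset = put_u16 ( buf, offset, ( 2 + ( 2 * num_groups ) ) );
		/* Named group list length */
		offset = put_u16 ( buf, offset, ( 2 * num_groups ) );
		for ( i = 0 ; i < num_groups ; i++ )
			offset = put_u16 ( buf, offset, groups[i] );
	}
	put_u16 ( buf, 0, ( offset - 2 ) );
	return offset;
}
```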
Michael Brown [Tue, 30 Jan 2024 13:14:21 +0000 (13:14 +0000)]
[crypto] Check for all-zeros result from X25519 key exchange
RFC7748 states that it is entirely optional for X25519 Diffie-Hellman
implementations to check whether or not the result is the all-zero
value (indicating that an attacker sent a malicious public key with a
small order). RFC8422 states that implementations in TLS must abort
the handshake if the all-zero value is obtained.
Return an error if the all-zero value is obtained, so that the TLS
code will not require knowledge specific to the X25519 curve.
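A minimal sketch of the check, with hypothetical function names. It ORs all 32 bytes together without an early exit, so that the running time does not depend on the position of the first nonzero byte (a common precaution, although a compiler is not formally obliged to preserve it).

```c
#include <stddef.h>
#include <stdint.h>

#define X25519_SIZE_SKETCH 32

/* Returns nonzero if all len bytes are zero */
static int is_all_zero_sketch ( const uint8_t *data, size_t len ) {
	uint8_t accumulated = 0;
	size_t i;

	for ( i = 0 ; i < len ; i++ )
		accumulated |= data[i];
	return ( accumulated == 0 );
}

/* Reject an all-zero X25519 shared secret (small-order public key) */
static int check_x25519_result_sketch ( const uint8_t *result ) {
	return ( is_all_zero_sketch ( result, X25519_SIZE_SKETCH ) ? -1 : 0 );
}
```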
Michael Brown [Fri, 19 Jan 2024 12:36:11 +0000 (12:36 +0000)]
[crypto] Add X25519 key exchange algorithm
Add an implementation of the X25519 key exchange algorithm as defined
in RFC7748.
This implementation is inspired by and partially based upon the paper
"Implementing Curve25519/X25519: A Tutorial on Elliptic Curve
Cryptography" by Martin Kleppmann, available for download from
https://www.cl.cam.ac.uk/teaching/2122/Crypto/curve25519.pdf
The underlying modular addition, subtraction, and multiplication
operations are completely redesigned for substantially improved
efficiency compared to the TweetNaCl implementation studied in that
paper (approximately 5x-10x faster and with 70% less memory usage).
Michael Brown [Fri, 19 Jan 2024 16:35:02 +0000 (16:35 +0000)]
[loong64] Replace broken big integer arithmetic implementations
The slightly incomprehensible LoongArch64 implementation for
bigint_subtract() is observed to produce incorrect results for some
input values.
Replace the suspicious LoongArch64 implementations of bigint_add(),
bigint_subtract(), bigint_rol() and bigint_ror(), and add a test case
for a subtraction that was producing an incorrect result with the
previous implementation.
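A portable reference implementation is useful for cross-checking architecture-specific assembly. The sketch below is a hypothetical plain-C multi-limb subtraction with borrow over little-endian 32-bit limbs (it is not iPXE's bigint API). It computes value = value - subtrahend and returns the final borrow.

```c
#include <stddef.h>
#include <stdint.h>

/* value -= subtrahend over 'limbs' little-endian 32-bit limbs.
 * Returns the final borrow (1 if the result wrapped below zero). */
static unsigned int bigint_subtract_sketch ( const uint32_t *subtrahend,
					     uint32_t *value, size_t limbs ) {
	unsigned int borrow = 0;
	size_t i;

	for ( i = 0 ; i < limbs ; i++ ) {
		/* Widen to 64 bits: a negative difference wraps modulo
		 * 2^64, so the high 32 bits become all ones */
		uint64_t diff = ( ( uint64_t ) value[i] - subtrahend[i] -
				  borrow );
		value[i] = ( ( uint32_t ) diff );
		borrow = ( ( diff >> 32 ) & 1 );
	}
	return borrow;
}
```

Subtracting 1 from 0x1_00000000 (limbs { 0, 1 }) yields limbs { 0xffffffff, 0 } with no final borrow, exercising a borrow that propagates across a limb boundary.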
Michael Brown [Fri, 19 Jan 2024 12:29:29 +0000 (12:29 +0000)]
[crypto] Add bigint_copy() as a convenient wrapper macro
Big integers may be efficiently copied using bigint_shrink() (which
will always copy only the size of the destination integer), but this
is potentially confusing to a reader of the code.
Provide bigint_copy() as an alias for bigint_shrink() so that the
intention of the calling code may be more obvious.
Michael Brown [Tue, 16 Jan 2024 13:24:29 +0000 (13:24 +0000)]
[libc] Replace linker_assert() with build_assert()
We currently implement build-time assertions via a mechanism that
generates a call to an undefined external function that will cause the
link to fail unless the compiler can prove that the asserted condition
is true (and thereby eliminate the undefined function call).
This assertion mechanism can be used for conditions that are not
amenable to the use of static_assert(), since static_assert() will not
allow for proofs via dead code elimination.
Add __attribute__((error(...))) to the undefined external function, so
that the error is raised at compile time rather than at link time.
This allows us to provide a more meaningful error message (which will
include the file name and line number, as with any other compile-time
error), and avoids the need for the caller to specify a unique symbol
name for the external function.
Change the name from linker_assert() to build_assert(), since the
assertion now takes place at compile time rather than at link time.
Michael Brown [Sun, 14 Jan 2024 12:12:18 +0000 (12:12 +0000)]
[build] Fix building with newer binutils
Newer versions of the GNU assembler (observed with binutils 2.41) will
complain about the ".arch i386" in files assembled with "as --64",
with the message "Error: 64bit mode not supported on 'i386'".
In files such as stack.S that contain no instructions to be assembled,
the ".arch i386" is redundant and may be removed entirely.
In the remaining files, fix by moving ".arch i386" below the relevant
".code16" or ".code32" directive, so that the assembler is no longer
expecting 64-bit instructions to be used by the time that the ".arch
i386" directive is encountered.
Reported-by: Ali Mustakim <alim@forwardcomputers.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 10 Jan 2024 15:23:07 +0000 (15:23 +0000)]
[eap] Add support for the MD5-Challenge authentication type
RFC 3748 states that support for MD5-Challenge is mandatory for EAP
implementations. The MD5 and CHAP code is already included in the
default build since it is required by iSCSI, and so this does not
substantially increase the binary size.
Michael Brown [Wed, 10 Jan 2024 15:30:36 +0000 (15:30 +0000)]
[eap] Add support for sending an EAP identity
Allow the ${netX/username} setting to be used to specify an EAP
identity to be returned in response to a Request-Identity, and provide
a mechanism for responding with a NAK to indicate which authentication
types we support.
If no identity is specified then fall back to the current behaviour of
not sending any Request-Identity response, so that switches will time
out and switch to MAC Authentication Bypass (MAB) if applicable.
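A sketch of the NAK mechanism, building an EAP Response of type Nak whose data lists the desired authentication types. The values (Response code 2, type 3 for Nak, type 4 for MD5-Challenge) are from RFC 3748, and the function name is hypothetical. RFC 3748 uses a single zero byte to indicate that no alternative method is proposed, which the sketch emits when the list is empty.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EAP_CODE_RESPONSE 2
#define EAP_TYPE_NAK 3
#define EAP_TYPE_MD5 4

/* Build an EAP Response/Nak into buf. Returns the packet length, or 0 if
 * buf is too small. */
static size_t build_eap_nak_sketch ( uint8_t id, const uint8_t *types,
				     size_t num_types, uint8_t *buf,
				     size_t len ) {
	size_t data_len = ( num_types ? num_types : 1 );
	size_t pkt_len = ( 5 + data_len );

	if ( len < pkt_len )
		return 0;
	buf[0] = EAP_CODE_RESPONSE;
	buf[1] = id;			/* echo request identifier */
	buf[2] = ( pkt_len >> 8 );
	buf[3] = ( pkt_len & 0xff );
	buf[4] = EAP_TYPE_NAK;
	if ( num_types ) {
		memcpy ( &buf[5], types, num_types );
	} else {
		buf[5] = 0;		/* no alternative proposed */
	}
	return pkt_len;
}
```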
Michael Brown [Tue, 19 Dec 2023 16:56:34 +0000 (16:56 +0000)]
[efi] Fix Coverity warning about unintended sign extension
The result of multiplying a uint16_t by another uint16_t will be a
signed int. Comparing this against a size_t will perform an unwanted
sign extension.
Fix by explicitly casting e_phnum to an unsigned int, thereby matching
the data type used for the loop index variable (and avoiding the
unwanted sign extension).
This mirrors wimboot commit 15f6162 ("[efi] Fix Coverity warning about
unintended sign extension").
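The promotion rule can be checked with C11 _Generic. This sketch (the macro and function names are hypothetical, and it assumes a platform where int is 32 bits wide) shows that a uint16_t multiplied by a uint16_t has type int, while casting one operand to unsigned int makes the whole multiplication unsigned. For 0xffff * 0xffff the int version would overflow, which is undefined behaviour, whereas the unsigned version yields 0xfffe0001.

```c
#include <stddef.h>
#include <stdint.h>

/* Evaluates to 1 if the expression has type int (not evaluated at runtime) */
#define EXPR_HAS_TYPE_INT( expr ) _Generic ( ( expr ), int: 1, default: 0 )

/* Size of an ELF program header table, computed without signed arithmetic */
static size_t phdr_table_size ( uint16_t e_phnum, uint16_t e_phentsize ) {
	/* Casting one operand makes the multiplication unsigned int */
	return ( ( unsigned int ) e_phnum * e_phentsize );
}
```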
Michael Brown [Wed, 29 Nov 2023 12:49:06 +0000 (12:49 +0000)]
[efi] Avoid modifying PE/COFF debug filename
The function efi_pecoff_debug_name() (called by efi_handle_name()) is
used to extract a filename from the debug data directory entry located
within a PE/COFF image. The name is copied into a temporary static
buffer to allow for modifications, but the code currently erroneously
modifies the original name within the loaded PE/COFF image.
Fix by performing the modification on the copy in the temporary
buffer, as originally intended.
Michael Brown [Mon, 27 Nov 2023 12:08:19 +0000 (12:08 +0000)]
[efi] Extend PE header size to cover space up to first section
Hybrid bzImage and UEFI binaries (such as wimboot) may place sections
at explicit offsets within the PE file, as described in commit b30a098
("[efi] Use load memory address as file offset for hybrid binaries").
This can leave a gap after the PE headers that is not covered by any
section. It is not entirely clear whether or not such gaps are
permitted in binaries submitted for Secure Boot signing.
To minimise potential problems, extend the PE header size to cover any
space before the first explicitly placed section.
Michael Brown [Fri, 24 Nov 2023 12:26:43 +0000 (12:26 +0000)]
[efi] Maximise image base address
iPXE images are linked with a starting virtual address of zero. Other
images (such as wimboot) may use a non-zero starting virtual address.
There is no direct equivalent of the PE ImageBase address field within
ELF object files. Choose to use the highest possible address that
accommodates all sections and the PE header itself, since this will
minimise the memory allocated to hold the loaded image.
Michael Brown [Fri, 24 Nov 2023 15:55:41 +0000 (15:55 +0000)]
[efi] Do not assume canonical PE section ordering
The BaseOfCode (and, in PE32, BaseOfData) fields imply an assumption
that binaries are laid out as code followed by initialised data
followed by uninitialised data. This assumption may not be valid for
complex binaries such as wimboot.
Remove this implicit assumption, and use arguably justifiable values
for the assorted summary start and size fields within the PE headers.
Michael Brown [Fri, 24 Nov 2023 12:16:49 +0000 (12:16 +0000)]
[efi] Treat 16-bit sections as hidden in hybrid binaries
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
sections such as .bss16 that do not need to consume an entry in the PE
section list. Treat any such sections as hidden.
Michael Brown [Thu, 23 Nov 2023 14:17:36 +0000 (14:17 +0000)]
[efi] Place PE debug information in a hidden section
The PE debug information generated by elf2efi is used only to hold the
image filename, and the debug information is located via the relevant
data directory entry rather than via the section table.
Make the .debug section a hidden section in order to save one entry in
the PE section list. Choose to place the debug information in the
unused space at the end of the PE headers, since it no longer needs to
satisfy the general section alignment constraints.
Michael Brown [Thu, 23 Nov 2023 14:54:12 +0000 (14:54 +0000)]
[efi] Fix recorded overall size of headers in NT optional header
Commit 1e4c378 ("[efi] Shrink size of data directory in PE header")
reduced the number of entries used in the data directory and reduced
the recorded size of the NT "optional" header, but did not also adjust
the recorded overall size of the PE headers, resulting in unused space
between the PE headers and the first section.
Fix by reducing the initial recorded size of the PE headers by the
size of the omitted data directory entries.
Michael Brown [Wed, 22 Nov 2023 15:38:22 +0000 (15:38 +0000)]
[efi] Write out PE header only after writing sections
Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section.
Allow for this by treating a section placed at offset zero as hidden,
and by deferring the writing of the PE header until after the output
sections have been written.
Michael Brown [Wed, 22 Nov 2023 14:57:05 +0000 (14:57 +0000)]
[efi] Use load memory address as file offset for hybrid binaries
Hybrid bzImage and UEFI binaries (such as wimboot) may be loaded as a
single contiguous blob without reference to the PE headers, and the
placement of sections within the PE file must therefore be known at
link time.
Use the load memory address (extracted from the ELF program headers)
to determine the physical placement of the section within the PE file
when generating a hybrid binary.
Michael Brown [Wed, 22 Nov 2023 23:05:39 +0000 (23:05 +0000)]
[efi] Mark PE images as large address aware
The images generated by elf2efi can be loaded anywhere in the address
space, and are not limited to the low 2GB.
Indicate this by setting the "large address aware" flag within the PE
header, for compatibility with EFI images generated by the EDK2 build
process. (The EDK2 PE loader does not ever check this flag, and it is
unlikely that any other EFI PE loader ever does so, but we may as well
report it accurately.)
Michael Brown [Wed, 22 Nov 2023 23:14:38 +0000 (23:14 +0000)]
[efi] Treat writable sections as data sections
Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
executable code that is opaque data from the perspective of a UEFI PE
binary, as described in wimboot commit fe456ca ("[efi] Use separate
.text and .data PE sections").
The ELF section will be marked as containing both executable code and
writable data. Choose to treat such a section as a data section
rather than a code section, since that matches the expected semantics
for ELF files that we expect to process.
Michael Brown [Tue, 7 Nov 2023 18:05:45 +0000 (18:05 +0000)]
[cloud] Add utility script to read iPXE output from INT13CON partition
Some AWS instance types still do not support serial console output or
screenshots. For these instance types, the only viable way to extract
debugging information is to use the INT13 console (which is already
enabled via CONFIG=cloud for all AWS images).
Obtaining the INT13 console output can be very cumbersome, since there
is no direct way to read from an AWS volume. The simplest current
approach is to stop the instance under test, detach its root volume,
and reattach the volume to a Linux instance in the same region.
Add a utility script aws-int13con to retrieve the INT13 console output
by creating a temporary snapshot, reading the first block from the
snapshot, and extracting the INT13 console partition content.
Michael Brown [Tue, 7 Nov 2023 15:54:59 +0000 (15:54 +0000)]
[cloud] Add ability to overwrite existing AMI images
AMI names must be unique within a region. Add a --overwrite option
that allows an existing AMI of the same name to be deregistered (and
its underlying snapshot deleted).
Michael Brown [Tue, 7 Nov 2023 11:08:33 +0000 (11:08 +0000)]
[eapol] Delay EAPoL-Start while waiting for EAP to complete
EAP exchanges may take a long time to reach a final status, especially
when relying upon MAC Authentication Bypass (MAB). Our current
behaviour of sending EAPoL-Start every few seconds until a final
status is obtained can prevent these exchanges from ever completing.
Fix by redefining the EAP supplicant state to allow EAPoL-Start to be
suppressed: either temporarily (while waiting for a full EAP exchange
to complete, in which case we need to eventually resend EAPoL-Start if
the final Success or Failure packet is lost), or permanently (while
waiting for the potentially very long MAC Authentication Bypass
timeout period).
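The three supplicant states described above (free to send, temporarily deferred, suppressed) can be sketched as a small decision function. This is a minimal illustration only: the enum, the struct and should_send_start() are hypothetical names, not iPXE's actual definitions, and time is modelled as an abstract tick counter.

```c
#include <stdbool.h>

/* Hypothetical EAPoL-Start policies (illustrative names, not iPXE's) */
enum start_policy {
	START_ALLOWED,		/* no EAP exchange in progress: send EAPoL-Start */
	START_DEFERRED,		/* EAP exchange in progress: wait, then resend */
	START_SUPPRESSED,	/* waiting for MAB timeout: do not send */
};

/* Hypothetical supplicant state */
struct supplicant {
	bool done;			/* final Success or Failure received */
	enum start_policy policy;	/* current EAPoL-Start policy */
	unsigned long defer_until;	/* tick after which a deferred Start is resent */
};

/* Decide whether EAPoL-Start should be transmitted at time "now" */
static bool should_send_start ( const struct supplicant *supp,
				unsigned long now ) {
	if ( supp->done )
		return false;
	switch ( supp->policy ) {
	case START_ALLOWED:
		return true;
	case START_DEFERRED:
		/* Eventually resend, in case Success/Failure was lost */
		return ( now >= supp->defer_until );
	case START_SUPPRESSED:
	default:
		return false;
	}
}
```

Keeping the deferral as a deadline rather than a flag means a lost final packet cannot stall the supplicant forever, which is the failure mode the temporary suppression has to guard against.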
Michael Brown [Thu, 2 Nov 2023 16:11:38 +0000 (16:11 +0000)]
[pci] Require discovery of a PCI device when determining usable PCI APIs
The PCI cloud API (PCIAPI_CLOUD) currently selects the first PCI API
that successfully discovers a PCI device address range. The ECAM API
may discover an address range but subsequently be unable to map the
configuration space region, which would result in the selected PCI API
being unusable.
Fix by instead selecting the first PCI API that can be successfully
used to discover a PCI device.
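The revised selection rule can be sketched as a loop that accepts an API only once it has actually discovered a device. The descriptor type, select_pci_api() and the two example probes are hypothetical stand-ins (the first models an ECAM API that finds an address range but cannot map it), not iPXE's PCIAPI_CLOUD code.

```c
#include <stddef.h>

/* Hypothetical PCI API descriptor */
struct pci_api_sketch {
	const char *name;
	int ( * discover_device ) ( void );	/* 0 if a device was found */
};

/* Select the first API that can discover a device, not merely a range */
static const struct pci_api_sketch *
select_pci_api ( const struct pci_api_sketch *apis, size_t count ) {
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		if ( apis[i].discover_device() == 0 )
			return &apis[i];
	}
	return NULL;
}

/* Example probes: the first finds a range but fails to map it */
static int ecam_probe ( void ) { return -1; }
static int legacy_probe ( void ) { return 0; }
```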
Michael Brown [Thu, 2 Nov 2023 15:38:08 +0000 (15:38 +0000)]
[pci] Check that ECAM configuration space is within reachable memory
Some machines (observed with an AWS EC2 m7a.large instance) will place
the ECAM configuration space window above 4GB, thereby making it
unreachable from non-paged 32-bit code. This problem is currently
ignored by iPXE, since the address is silently truncated in the call
to ioremap(). (Note that other uses of ioremap() are not affected
since the PCI core will already have checked for unreachable 64-bit
BARs when retrieving the physical address to be mapped.)
Fix by adding an explicit check that the region to be mapped starts
within the reachable memory address space. (Assume that no machines
will be sufficiently perverse to provide a region that straddles the
4GB boundary.)
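A minimal sketch of the check and of the truncation it guards against, assuming a 32-bit reachable address space. Both helper names are hypothetical; as in the commit, only the start address is checked.

```c
#include <stdbool.h>
#include <stdint.h>

/* Check that a region starts within the 32-bit reachable address space.
 * (A region straddling the 4GB boundary is assumed not to occur.) */
static bool region_start_reachable ( uint64_t base ) {
	return ( base <= UINT32_MAX );
}

/* What happens without the check: the high bits are silently dropped */
static uint32_t truncated_address ( uint64_t base ) {
	return ( ( uint32_t ) base );
}
```

For an ECAM window at 0x1e0000000, the truncated address 0xe0000000 is a valid-looking but wrong physical address, which is why the failure was previously silent.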
Michael Brown [Thu, 2 Nov 2023 15:16:19 +0000 (15:16 +0000)]
[pci] Cache ECAM mapping errors
When an error occurs during ECAM configuration space mapping, preserve
the error within the existing cached mapping (instead of invalidating
the cached mapping) in order to avoid flooding the debug log with
repeated identical mapping errors.
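The caching behaviour can be sketched as follows: a failed mapping is stored in the cache entry along with its status code, so repeated accesses to the same region return the cached error without retrying (and without logging again). The structure, try_map() and ecam_access() are hypothetical illustrations; try_map() always fails and counts its calls so the caching effect is observable.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cached mapping that records failures as well as successes */
struct ecam_cache {
	uint64_t key;	/* identifies the mapped region */
	void *regs;	/* mapped region, or NULL on failure */
	int rc;		/* cached mapping status (0 = success) */
	int valid;	/* entry populated */
};

/* Hypothetical mapper: always fails, and counts attempts */
static unsigned int map_attempts;
static int try_map ( uint64_t key, void **regs ) {
	( void ) key;
	map_attempts++;
	*regs = NULL;
	return -1;
}

/* Look up (or create) the mapping for "key", returning the cached status */
static int ecam_access ( struct ecam_cache *cache, uint64_t key,
			 void **regs ) {
	if ( ! ( cache->valid && ( cache->key == key ) ) ) {
		cache->key = key;
		cache->rc = try_map ( key, &cache->regs );
		cache->valid = 1;	/* keep the entry even on failure */
	}
	*regs = cache->regs;
	return cache->rc;
}
```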
Michael Brown [Thu, 2 Nov 2023 15:05:15 +0000 (15:05 +0000)]
[pci] Handle non-zero starting bus in ECAM allocations
The base address provided in the PCI ECAM allocation within the ACPI
MCFG table is the base address for the segment as a whole, not for the
starting bus within that allocation. On machines that provide ECAM
allocations with a non-zero starting bus number (observed with an AWS
EC2 m7a.large instance), this will result in iPXE accessing the wrong
memory addresses within the ECAM region.
Fix by adding the appropriate starting bus offset to the base address.
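The address arithmetic uses the standard ECAM layout (bus in bits 20 and above, device in bits 15-19, function in bits 12-14, register offset in bits 0-11). The helper names below are hypothetical. The key point is that the MCFG base address corresponds to bus 0 of the segment, so the region actually covered by an allocation starting at bus N begins N * 1MB above that base.

```c
#include <stdint.h>

/* Physical address of an ECAM configuration register, where "mcfg_base"
 * is the MCFG allocation base (i.e. the address of bus 0 in the segment,
 * even if the allocation's starting bus is non-zero) */
static uint64_t ecam_address ( uint64_t mcfg_base, unsigned int bus,
			       unsigned int dev, unsigned int fn,
			       unsigned int reg ) {
	return ( mcfg_base + ( ( uint64_t ) bus << 20 ) + ( dev << 15 ) +
		 ( fn << 12 ) + reg );
}

/* Physical start of the region covered by the allocation: treating
 * "mcfg_base" itself as the start of "start_bus" is the bug being fixed */
static uint64_t ecam_region_start ( uint64_t mcfg_base,
				    unsigned int start_bus ) {
	return ( mcfg_base + ( ( uint64_t ) start_bus << 20 ) );
}
```

For example, an allocation with base 0xe0000000 and starting bus 0x10 covers memory from 0xe1000000 upwards.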
Michael Brown [Wed, 1 Nov 2023 22:03:34 +0000 (22:03 +0000)]
[pci] Force completion of ECAM configuration space writes
The PCIe specification requires that "processor and host bridge
implementations must ensure that a method exists for the software to
determine when the write using the ECAM is completed by the completer"
but does not specify any particular method to be used. Some platforms
might treat writes to the ECAM region as non-posted, others might
require reading back from a dedicated (and implementation-specific)
completion register to determine when the configuration space write
has completed.
Since PCI configuration space writes will never be used for any
performance-critical datapath operations (on any sane hardware), a
simple and platform-independent solution is to always read back from
the written register in order to guarantee that the write must have
completed. This is safe to do, since the PCIe specification defines a
limited set of configuration register types, none of which have read
side effects.
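A minimal sketch of the write-then-read-back approach. A plain volatile pointer stands in for the platform's MMIO accessors (real code would use the appropriate writel()/readl() style wrappers), and ecam_write_dword() is a hypothetical name.

```c
#include <stdint.h>

/* Write a 32-bit configuration register via ECAM, then read it back so
 * that the write must have completed before this function returns.
 * This is safe because configuration registers have no read side effects. */
static void ecam_write_dword ( volatile uint32_t *reg, uint32_t value ) {
	uint32_t readback;

	*reg = value;
	readback = *reg;
	( void ) readback;
}
```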
Michael Brown [Tue, 24 Oct 2023 10:43:56 +0000 (11:43 +0100)]
[iphone] Add missing va_start()/va_end() around reused argument list
The ipair_tx() function uses a va_list twice (first to calculate the
formatted string length before allocation, then to construct the
string in the allocated buffer) but is missing the va_start() and
va_end() around the second usage. This is undefined behaviour that
happens to work on some build platforms.
Fix by adding the missing va_start() and va_end() around the second
usage of the variadic argument list.
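The corrected pattern, sketched as a generic "format into an allocated buffer" helper (format_alloc() is a hypothetical name, not the ipair_tx() code itself): each traversal of the argument list is bracketed by its own va_start() and va_end(). Within the variadic function itself, restarting the list this way is valid; va_copy() is the standard alternative when the list has been passed in as a va_list.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Format a string into a newly allocated buffer */
static char * format_alloc ( const char *fmt, ... ) {
	va_list args;
	char *buf;
	int len;

	/* First traversal: calculate the formatted length */
	va_start ( args, fmt );
	len = vsnprintf ( NULL, 0, fmt, args );
	va_end ( args );
	if ( len < 0 )
		return NULL;

	buf = malloc ( len + 1 );
	if ( ! buf )
		return NULL;

	/* Second traversal: restart the argument list before reusing it */
	va_start ( args, fmt );
	vsnprintf ( buf, ( len + 1 ), fmt, args );
	va_end ( args );

	return buf;
}
```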
Reported-by: Andreas Hammarskjöld <andreas@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 6 Oct 2023 11:43:02 +0000 (12:43 +0100)]
[libc] Use wall clock time as seed for the (non-cryptographic) RNG
We currently use the number of timer ticks since power-on as a seed
for the non-cryptographic RNG implemented by random(). Since iPXE is
often executed directly after power-on, and since the timer tick
resolution is generally low, this can often result in identical seed
values being used on each cold boot attempt.
As of commit 41f786c ("[settings] Add "unixtime" builtin setting to
expose the current time"), the current wall-clock time is always
available within the default build of iPXE. Use this time instead, to
introduce variability between cold boot attempts on the same host.
(Note that variability between different hosts is obtained by using
the MAC address as an additional seed value.)
This has no effect on the separate DRBG used by cryptographic code.
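A hypothetical illustration of why the two seed sources are complementary: wall-clock time varies between cold boots of the same host, and the MAC address varies between hosts. The mixing function rng_seed() is an assumption for illustration only and does not reflect how iPXE actually combines its seed values.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical seed mixing: XOR the MAC address bytes into the time */
static unsigned int rng_seed ( time_t now, const uint8_t mac[6] ) {
	unsigned int seed = ( unsigned int ) now;
	unsigned int i;

	for ( i = 0 ; i < 6 ; i++ )
		seed ^= ( ( unsigned int ) mac[i] << ( 8 * ( i % 4 ) ) );
	return seed;
}
```

A real caller would pass time ( NULL ) as the first argument; passing the time explicitly keeps the sketch testable.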
Suggested-by: Heiko <heik0@xs4all.nl>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 15 Sep 2023 15:14:59 +0000 (16:14 +0100)]
[eapol] Send EAPoL-Start packets to trigger EAP authentication
We have no way to force a link-layer restart in iPXE, and therefore no
way to explicitly trigger a restart of EAP authentication. If an iPXE
script has performed some action that requires such a restart
(e.g. registering a device such that the port VLAN assignment will be
changed), then the only means currently available to effect the
restart is to reboot the whole system. If iPXE is taking over a
physical link already used by a preceding bootloader, then even a
reboot may not work.
In the EAP model, the supplicant is a pure responder and never
initiates transmissions. EAPoL extends this to include an EAPoL-Start
packet type that may be sent by the supplicant to (re)trigger EAP.
Add support for sending EAPoL-Start packets at two-second intervals on
links that are open and have reached physical link-up, but for which
EAP has not yet completed. This allows "ifclose ; ifopen" to be used
to restart the EAP process.
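For reference, an EAPoL-Start frame is simply an Ethernet header addressed to the PAE group address 01:80:c2:00:00:03 with EtherType 0x888e, followed by a four-byte EAPoL header (version, packet type 1 for Start, and a zero body length). The sketch below builds such a frame. The protocol version value 2 is an assumption, and build_eapol_start() is a hypothetical helper. A real implementation would also pad the frame to the Ethernet minimum and retransmit it at the two-second interval while EAP remains incomplete.

```c
#include <stdint.h>
#include <string.h>

#define EAPOL_ETHERTYPE 0x888e		/* EAPoL EtherType */
#define EAPOL_TYPE_START 1		/* EAPoL-Start packet type */
#define EAPOL_VERSION_SKETCH 2		/* assumed protocol version */

/* Build an unpadded EAPoL-Start frame; returns the frame length */
static size_t build_eapol_start ( uint8_t *frame, const uint8_t src[6] ) {
	static const uint8_t pae_group[6] =
		{ 0x01, 0x80, 0xc2, 0x00, 0x00, 0x03 };
	uint8_t *p = frame;

	memcpy ( p, pae_group, 6 );		/* destination */
	p += 6;
	memcpy ( p, src, 6 );			/* source */
	p += 6;
	*p++ = ( EAPOL_ETHERTYPE >> 8 );
	*p++ = ( EAPOL_ETHERTYPE & 0xff );
	*p++ = EAPOL_VERSION_SKETCH;		/* protocol version */
	*p++ = EAPOL_TYPE_START;		/* packet type */
	*p++ = 0;				/* body length (high byte) */
	*p++ = 0;				/* body length (low byte) */
	return ( ( size_t ) ( p - frame ) );
}
```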
Michael Brown [Fri, 15 Sep 2023 15:10:07 +0000 (16:10 +0100)]
[eap] Define a supplicant model for EAP and EAPoL
Extend the EAP model to include a record of whether or not EAP
authentication has completed (successfully or otherwise), and to
provide a method for transmitting EAP responses.
Michael Brown [Wed, 13 Sep 2023 21:43:24 +0000 (22:43 +0100)]
[vmware] Use driver-private data to hold GuestInfo settings block
Simplify the per-netdevice GuestInfo settings code by using
driver-private data to hold the settings block, instead of using a
separate allocation.
The settings block (if present) will be automatically unregistered
when the parent network device settings block is unregistered, and no
longer needs to be separately freed. The guestinfo_net_remove()
function may therefore be omitted completely.
Michael Brown [Wed, 13 Sep 2023 19:23:59 +0000 (20:23 +0100)]
[lldp] Use driver-private data to hold LLDP settings block
Simplify the LLDP code by using driver-private data to hold the LLDP
settings block, instead of using a separate allocation. This avoids
the need to maintain a list of LLDP settings blocks (since the LLDP
settings block pointer can always be obtained using netdev_priv()) and
obviates several failure paths.
Any recorded LLDP data is now freed when the network device is
unregistered, since there is no longer a dedicated reference counter
for the LLDP settings block. To minimise surprise, we also now
explicitly unregister the settings block. This is not strictly
necessary (since the block will be automatically unregistered when the
parent network device settings block is unregistered), but it
maintains symmetry between lldp_probe() and lldp_remove().
The overall reduction in the size of the LLDP code is around 15%.
Michael Brown [Wed, 13 Sep 2023 15:29:59 +0000 (16:29 +0100)]
[netdevice] Allocate private data for each network upper-layer driver
Allow network upper-layer drivers (such as LLDP, which attaches to
each network device in order to provide a corresponding LLDP settings
block) to specify a size for private data, which will be allocated as
part of the network device structure (as with the existing private
data allocated for the underlying device driver).
This will allow network upper-layer drivers to be simplified by
omitting memory allocation and freeing code. If the upper-layer
driver requires a reference counter (e.g. for interface
initialisation), then it may use the network device's existing
reference counter, since this is now the reference counter for the
containing block of memory.
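The single-allocation layout can be sketched by computing an offset for each upper-layer driver's private data after the device structure and device driver private data. Everything here (layout_priv(), priv_at(), the 16-byte alignment) is a hypothetical illustration in the spirit of netdev_priv(), not its actual implementation. Because all private data lives in one block, freeing that block (when the shared reference count drops to zero) frees every driver's data at once.

```c
#include <stddef.h>

#define PRIV_ALIGN 16	/* assumed alignment for each private data area */

/* Round a length up to the private data alignment */
static size_t priv_align ( size_t len ) {
	return ( ( len + PRIV_ALIGN - 1 ) & ~( ( size_t ) PRIV_ALIGN - 1 ) );
}

/* Lay out "count" upper-layer private data areas after "base_len" bytes
 * (device structure plus device driver private data); returns the total
 * size of the single allocation and fills in each area's offset */
static size_t layout_priv ( size_t base_len, const size_t *upper_lens,
			    size_t count, size_t *offsets ) {
	size_t total = priv_align ( base_len );
	size_t i;

	for ( i = 0 ; i < count ; i++ ) {
		offsets[i] = total;
		total += priv_align ( upper_lens[i] );
	}
	return total;
}

/* Obtain a driver's private data pointer from the containing block */
static void * priv_at ( void *block, size_t offset ) {
	return ( ( char * ) block + offset );
}
```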
Michael Brown [Tue, 5 Sep 2023 11:46:39 +0000 (12:46 +0100)]
[librm] Use explicit operand size when pushing a label address
We currently use "push $1f" within inline assembly to push the address
of the real-mode code fragment, relying on the assembler to treat this
as "pushl" for 32-bit code or "pushq" for 64-bit code.
As of binutils commit 5cc0077 ("x86: further adjust extend-to-32bit-
address conditions"), first included in binutils-2.41, this implicit
operand size is no longer calculated as expected and 64-bit builds
will fail with:
    Error: operand size mismatch for `push'
Fix by adding an explicit operand size to the "push" instruction.
Originally-fixed-by: Justin Cano <jstncno@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
The current implementation of vpm_ioread32() erroneously reads only 16
bits of data, which fails when used with the (stricter) virtio device
emulation in VirtualBox.
Fix by using the correct readl()/inl() I/O wrappers.
Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 7 Jul 2023 14:05:39 +0000 (15:05 +0100)]
[console] Restore compatibility with "--key" values in existing scripts
Commit 3ef4f7e ("[console] Avoid overlap between special keys and
Unicode characters") renumbered the special key encoding to avoid
collisions with Unicode key values outside the ASCII range. This
change broke backwards compatibility with existing scripts that
specify key values using e.g. "prompt --key" or "menu --key".
Restore compatibility with existing scripts by tweaking the special
key encoding so that the relative key value (i.e. the delta from
KEY_MIN) is numerically equal to the old pre-Unicode key value, and by
modifying parse_key() to accept a relative key value.
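The encoding invariant can be sketched as "new key = KEY_MIN + old value", so that a numeric value from an existing script can be mapped back onto the new encoding. The KEY_MIN value used here (the first value above the Unicode code space) and both helper names are assumptions. The sketch also does not show how parse_key() decides whether a given number should be treated as relative.

```c
#include <stdlib.h>

/* Assumed: special keys start just above the Unicode code space */
#define KEY_MIN_SKETCH 0x110000UL

/* Encode a special key so that its delta from KEY_MIN equals its old
 * pre-Unicode numeric value */
static unsigned long special_key ( unsigned long old_value ) {
	return ( KEY_MIN_SKETCH + old_value );
}

/* Hypothetical parse of a relative key value (e.g. "0x141");
 * returns 0 for a malformed string */
static unsigned long parse_relative_key ( const char *text ) {
	char *end;
	unsigned long rel = strtoul ( text, &end, 0 );

	if ( ( *text == '\0' ) || ( *end != '\0' ) )
		return 0;
	return special_key ( rel );
}
```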
Reported-by: Sven Dreyer <sven@dreyer-net.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 5 Jul 2023 14:24:32 +0000 (15:24 +0100)]
[linux] Set a default MAC address for tap devices
Avoid the need to always specify a local MAC address on the command
line by setting a default hardware MAC address (using the same default
address as for slirp devices).