git.ipfire.org Git - thirdparty/ipxe.git/log

[build] Reduce scope of wildcard .gitignore rules

Ensure that .gitignore rules do not cover any files that do exist
within the repository.

Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[smbios] Support scanning for the 64-bit SMBIOS3 entry point

Support scanning for the 64-bit SMBIOS3 entry point in addition to the
32-bit SMBIOS2 entry point.

Prefer use of the 32-bit entry point if present, since this is
guaranteed to be within accessible memory.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[intel] Add PCI ID for I219-LM (23)

Successfully tested on FUJITSU LIFEBOOK U7413.

Signed-off-by: Christian Helmuth <christian.helmuth@genode-labs.com>

[efi] Add potentially missing relocation types

Add definitions for relocation types that may be missing on older
versions of the host system's elf.h.

This mirrors wimboot commit 47f6298 ("[efi] Add potentially missing
relocation types").

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Fix Coverity warning about unintended sign extension

The result of multiplying a uint16_t by another uint16_t will be a
signed int. Comparing this against a size_t will perform an unwanted
sign extension.

Fix by explicitly casting e_phnum to an unsigned int, thereby matching
the data type used for the loop index variable (and avoiding the
unwanted sign extension).

This mirrors wimboot commit 15f6162 ("[efi] Fix Coverity warning about
unintended sign extension").

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add relocation types generated by clang

Add additional PC-relative relocation types that may be encountered
when converting binaries compiled with clang.

This mirrors the relevant elf2efi portions of wimboot commit 7910830
("[build] Support building with the clang compiler").

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Use SOURCE_DATE_EPOCH for FAT serial number if it exists

Reported-by: Bernhard M. Wiedemann <bwiedemann@suse.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow compiling elf2efi with clang

The clang compiler does not (and apparently will not ever) allow for
variable-length arrays within structs.

Work around this limitation by using a fixed-length array to hold the
PDB filename in the debug section.

This mirrors wimboot commit f52c3ff ("[efi] Allow compiling elf2efi
with clang").

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Avoid modifying PE/COFF debug filename

The function efi_pecoff_debug_name() (called by efi_handle_name()) is
used to extract a filename from the debug data directory entry located
within a PE/COFF image. The name is copied into a temporary static
buffer to allow for modifications, but the code currently erroneously
modifies the original name within the loaded PE/COFF image.

Fix by performing the modification on the copy in the temporary
buffer, as originally intended.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Extend PE header size to cover space up to first section

Hybrid bzImage and UEFI binaries (such as wimboot) may place sections
at explicit offsets within the PE file, as described in commit b30a098
("[efi] Use load memory address as file offset for hybrid binaries").
This can leave a gap after the PE headers that is not covered by any
section. It is not entirely clear whether or not such gaps are
permitted in binaries submitted for Secure Boot signing.

To minimise potential problems, extend the PE header size to cover any
space before the first explicitly placed section.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Fix dependency list construction in EDK2 header import script

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Maximise image base address

iPXE images are linked with a starting virtual address of zero. Other
images (such as wimboot) may use a non-zero starting virtual address.

There is no direct equivalent of the PE ImageBase address field within
ELF object files. Choose to use the highest possible address that
accommodates all sections and the PE header itself, since this will
minimise the memory allocated to hold the loaded image.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Do not assume canonical PE section ordering

The BaseOfCode (and, in PE32, BaseOfData) fields imply an assumption
that binaries are laid out as code followed by initialised data
followed by uninitialised data. This assumption may not be valid for
complex binaries such as wimboot.

Remove this implicit assumption, and use arguably justifiable values
for the assorted summary start and size fields within the PE headers.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Treat 16-bit sections as hidden in hybrid binaries

Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
sections such as .bss16 that do not need to consume an entry in the PE
section list. Treat any such sections as hidden.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Place PE debug information in a hidden section

The PE debug information generated by elf2efi is used only to hold the
image filename, and the debug information is located via the relevant
data directory entry rather than via the section table.

Make the .debug section a hidden section in order to save one entry in
the PE section list. Choose to place the debug information in the
unused space at the end of the PE headers, since it no longer needs to
satisfy the general section alignment constraints.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Fix recorded overall size of headers in NT optional header

Commit 1e4c378 ("[efi] Shrink size of data directory in PE header")
reduced the number of entries used in the data directory and reduced
the recorded size of the NT "optional" header, but did not also adjust
the recorded overall size of the PE headers, resulting in unused space
between the PE headers and the first section.

Fix by reducing the initial recorded size of the PE headers by the
size of the omitted data directory entries.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Write out PE header only after writing sections

Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section.

Allow for this by treating a section placed at offset zero as hidden,
and by deferring the writing of the PE header until after the output
sections have been written.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Use load memory address as file offset for hybrid binaries

Hybrid bzImage and UEFI binaries (such as wimboot) may be loaded as a
single contiguous blob without reference to the PE headers, and the
placement of sections within the PE file must therefore be known at
link time.

Use the load memory address (extracted from the ELF program headers)
to determine the physical placement of the section within the PE file
when generating a hybrid binary.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Mark PE images as large address aware

The images generated by elf2efi can be loaded anywhere in the address
space, and are not limited to the low 2GB.

Indicate this by setting the "large address aware" flag within the PE
header, for compatibility with EFI images generated by the EDK2 build
process. (The EDK2 PE loader does not ever check this flag, and it is
unlikely that any other EFI PE loader ever does so, but we may as well
report it accurately.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Set NXCOMPAT bit in PE header

Indicate that the binary is compatible with W^X protections by setting
the NXCOMPAT bit in the DllCharacteristics field of the PE header.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Treat writable sections as data sections

Hybrid bzImage and UEFI binaries (such as wimboot) may include 16-bit
executable code that is opaque data from the perspective of a UEFI PE
binary, as described in wimboot commit fe456ca ("[efi] Use separate
.text and .data PE sections").

The ELF section will be marked as containing both executable code and
writable data. Choose to treat such a section as a data section
rather than a code section, since that matches the expected semantics
for ELF files that we expect to process.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Update to current EDK2 headers

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[cloud] Add utility script to read iPXE output from INT13CON partition

Some AWS instance types still do not support serial console output or
screenshots. For these instance types, the only viable way to extract
debugging information is to use the INT13 console (which is already
enabled via CONFIG=cloud for all AWS images).

Obtaining the INT13 console output can be very cumbersome, since there
is no direct way to read from an AWS volume. The simplest current
approach is to stop the instance under test, detach its root volume,
and reattach the volume to a Linux instance in the same region.

Add a utility script aws-int13con to retrieve the INT13 console output
by creating a temporary snapshot, reading the first block from the
snapshot, and extracting the INT13 console partition content.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[cloud] Add ability to overwrite existing AMI images

AMI names must be unique within a region. Add a --overwrite option
that allows an existing AMI of the same name to be deregistered (and
its underlying snapshot deleted).

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[eapol] Limit number of EAPoL-Start packets transmitted per attempt

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[eapol] Delay EAPoL-Start while waiting for EAP to complete

EAP exchanges may take a long time to reach a final status, especially
when relying upon MAC Authentication Bypass (MAB). Our current
behaviour of sending EAPoL-Start every few seconds until a final
status is obtained can prevent these exchanges from ever completing.

Fix by redefining the EAP supplicant state to allow EAPoL-Start to be
suppressed: either temporarily (while waiting for a full EAP exchange
to complete, in which case we need to eventually resend EAPoL-Start if
the final Success or Failure packet is lost), or permanently (while
waiting for the potentially very long MAC Authentication Bypass
timeout period).

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[pci] Require discovery of a PCI device when determining usable PCI APIs

The PCI cloud API (PCIAPI_CLOUD) currently selects the first PCI API
that successfully discovers a PCI device address range. The ECAM API
may discover an address range but subsequently be unable to map the
configuration space region, which would result in the selected PCI API
being unusable.

Fix by instead selecting the first PCI API that can be successfully
used to discover a PCI device.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[pci] Check that ECAM configuration space is within reachable memory

Some machines (observed with an AWS EC2 m7a.large instance) will place
the ECAM configuration space window above 4GB, thereby making it
unreachable from non-paged 32-bit code.  This problem is currently
ignored by iPXE, since the address is silently truncated in the call
to ioremap().  (Note that other uses of ioremap() are not affected
since the PCI core will already have checked for unreachable 64-bit
BARs when retrieving the physical address to be mapped.)

Fix by adding an explicit check that the region to be mapped starts
within the reachable memory address space.  (Assume that no machines
will be sufficiently peverse to provide a region that straddles the
4GB boundary.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[pci] Cache ECAM mapping errors

When an error occurs during ECAM configuration space mapping, preserve
the error within the existing cached mapping (instead of invalidating
the cached mapping) in order to avoid flooding the debug log with
repeated identical mapping errors.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[pci] Handle non-zero starting bus in ECAM allocations

The base address provided in the PCI ECAM allocation within the ACPI
MCFG table is the base address for the segment as a whole, not for the
starting bus within that allocation. On machines that provide ECAM
allocations with a non-zero starting bus number (observed with an AWS
EC2 m7a.large instance), this will result in iPXE accessing the wrong
memory addresses within the ECAM region.

Fix by adding the appropriate starting bus offset to the base address.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[pci] Force completion of ECAM configuration space writes

The PCIe specification requires that "processor and host bridge
implementations must ensure that a method exists for the software to
determine when the write using the ECAM is completed by the completer"
but does not specify any particular method to be used. Some platforms
might treat writes to the ECAM region as non-posted, others might
require reading back from a dedicated (and implementation-specific)
completion register to determine when the configuration space write
has completed.

Since PCI configuration space writes will never be used for any
performance-critical datapath operations (on any sane hardware), a
simple and platform-independent solution is to always read back from
the written register in order to guarantee that the write must have
completed. This is safe to do, since the PCIe specification defines a
limited set of configuration register types, none of which have read
side effects.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[iphone] Add missing va_start()/va_end() around reused argument list

The ipair_tx() function uses a va_list twice (first to calculate the
formatted string length before allocation, then to construct the
string in the allocated buffer) but is missing the va_start() and
va_end() around the second usage. This is undefined behaviour that
happens to work on some build platforms.

Fix by adding the missing va_start() and va_end() around the second
usage of the variadic argument list.

Reported-by: Andreas Hammarskjöld <andreas@2PintSoftware.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[libc] Use wall clock time as seed for the (non-cryptographic) RNG

We currently use the number of timer ticks since power-on as a seed
for the non-cryptographic RNG implemented by random(). Since iPXE is
often executed directly after power-on, and since the timer tick
resolution is generally low, this can often result in identical seed
values being used on each cold boot attempt.

As of commit 41f786c ("[settings] Add "unixtime" builtin setting to
expose the current time"), the current wall-clock time is always
available within the default build of iPXE. Use this time instead, to
introduce variability between cold boot attempts on the same host.
(Note that variability between different hosts is obtained by using
the MAC address as an additional seed value.)

This has no effect on the separate DRBG used by cryptographic code.

Suggested-by: Heiko <heik0@xs4all.nl>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[eapol] Send EAPoL-Start packets to trigger EAP authentication

We have no way to force a link-layer restart in iPXE, and therefore no
way to explicitly trigger a restart of EAP authentication.  If an iPXE
script has performed some action that requires such a restart
(e.g. registering a device such that the port VLAN assignment will be
changed), then the only means currently available to effect the
restart is to reboot the whole system.  If iPXE is taking over a
physical link already used by a preceding bootloader, then even a
reboot may not work.

In the EAP model, the supplicant is a pure responder and never
initiates transmissions.  EAPoL extends this to include an EAPoL-Start
packet type that may be sent by the supplicant to (re)trigger EAP.

Add support for sending EAPoL-Start packets at two-second intervals on
links that are open and have reached physical link-up, but for which
EAP has not yet completed.  This allows "ifclose ; ifopen" to be used
to restart the EAP process.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[eap] Define a supplicant model for EAP and EAPoL

Extend the EAP model to include a record of whether or not EAP
authentication has completed (successfully or otherwise), and to
provide a method for transmitting EAP responses.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[fcoe] Use driver-private data to hold FCoE port structure

Simplify the FCoE code by using driver-private data to hold the FCoE
port for each network device, instead of using a separate allocation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[vmware] Use driver-private data to hold GuestInfo settings block

Simplify the per-netdevice GuestInfo settings code by using
driver-private data to hold the settings block, instead of using a
separate allocation.

The settings block (if existent) will be automatically unregistered
when the parent network device settings block is unregistered, and no
longer needs to be separately freed. The guestinfo_net_remove()
function may therefore be omitted completely.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[ipv6] Use driver-private data to hold link-local IPv6 settings block

Simplify the IPv6 link-local settings code by using driver-private
data to hold the settings block, instead of using a separate
allocation.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[lldp] Use driver-private data to hold LLDP settings block

Simplify the LLDP code by using driver-private data to hold the LLDP
settings block, instead of using a separate allocation.  This avoids
the need to maintain a list of LLDP settings blocks (since the LLDP
settings block pointer can always be obtained using netdev_priv()) and
obviates several failure paths.

Any recorded LLDP data is now freed when the network device is
unregistered, since there is no longer a dedicated reference counter
for the LLDP settings block.  To minimise surprise, we also now
explicitly unregister the settings block.  This is not strictly
necessary (since the block will be automatically unregistered when the
parent network device settings block is unregistered), but it
maintains symmetry between lldp_probe() and lldp_remove().

The overall reduction in the size of the LLDP code is around 15%.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[netdevice] Allocate private data for each network upper-layer driver

Allow network upper-layer drivers (such as LLDP, which attaches to
each network device in order to provide a corresponding LLDP settings
block) to specify a size for private data, which will be allocated as
part of the network device structure (as with the existing private
data allocated for the underlying device driver).

This will allow network upper-layer drivers to be simplified by
omitting memory allocation and freeing code. If the upper-layer
driver requires a reference counter (e.g. for interface
initialisation), then it may use the network device's existing
reference counter, since this is now the reference counter for the
containing block of memory.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[netdevice] Remove netdev_priv() helper function

Some network device drivers use the trivial netdev_priv() helper
function while others use the netdev->priv pointer directly.

Standardise on direct use of netdev->priv, in order to free up the
function name netdev_priv() for reuse.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[librm] Use explicit operand size when pushing a label address

We currently use "push $1f" within inline assembly to push the address
of the real-mode code fragment, relying on the assembler to treat this
as "pushl" for 32-bit code or "pushq" for 64-bit code.

As of binutils commit 5cc0077 ("x86: further adjust extend-to-32bit-
address conditions"), first included in binutils-2.41, this implicit
operand size is no longer calculated as expected and 64-bit builds
will fail with

Error: operand size mismatch for `push'

Fix by adding an explicit operand size to the "push" instruction.

Originally-fixed-by: Justin Cano <jstncno@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[virtio] Fix implementation of vpm_ioread32()

The current implementation of vpm_ioread32() erroneously reads only 16
bits of data, which fails when used with the (stricter) virtio device
emulation in VirtualBox.

Fix by using the correct readl()/inl() I/O wrappers.

Reworded-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[dhcp] Request NTP server option

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[ntp] Define NTP server setting

Define the IPv4 NTP server setting to simplify the use of a
DHCP-provided NTP server in scripts, using e.g.

  #!ipxe
  dhcp
  ntp ${ntp}

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[console] Restore compatibility with "--key" values in existing scripts

Commit 3ef4f7e ("[console] Avoid overlap between special keys and
Unicode characters") renumbered the special key encoding to avoid
collisions with Unicode key values outside the ASCII range. This
change broke backwards compatibility with existing scripts that
specify key values using e.g. "prompt --key" or "menu --key".

Restore compatibility with existing scripts by tweaking the special
key encoding so that the relative key value (i.e. the delta from
KEY_MIN) is numerically equal to the old pre-Unicode key value, and by
modifying parse_key() to accept a relative key value.

Reported-by: Sven Dreyer <sven@dreyer-net.de>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[linux] Set a default MAC address for tap devices

Avoid the need to always specify a local MAC address on the command
line by setting a default hardware MAC address (using the same default
address as for slirp devices).

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[linux] Fix error control flow in af_packet_nic_probe()

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[linux] Fix error control flow in tap_probe()

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[netdevice] Stop link block timer when device is closed

A running link block timer holds a reference to the network device and
will prevent it from being freed until the timer expires. It is
impossible for free_netdev() to be called while the timer is still
running: the call to stop_timer() therein is therefore a no-op.

Stop the link block timer when the device is closed, to allow a
link-blocked device to be freed immediately upon unregistration of the
device. (Since link block state is updated in response to received
packets, the state is effectively undefined for a closed device: there
is therefore no reason to leave the timer running.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[interface] Fix debug message values for temporary interfaces

The interface debug message values constructed by INTF_DBG() et al
rely on the interface being embedded within a containing object. This
assumption is not valid for the temporary outbound-only interfaces
constructed on the stack by intf_shutdown() and xfer_vredirect().

Formalise the notion of a temporary outbound-only interface as having
a NULL interface descriptor, and overload the "original interface
descriptor" field to contain a pointer to the original interface that
the temporary interface is shadowing.

Originally-fixed-by: Vincent Fazio <vfazio@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Inhibit more linker warnings about an implied executable stack

Add .note.GNU-stack section declarations to the autogenerated PCI
device ID list objects.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Silence the "creating blib.a" message

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[console] Avoid overlap between special keys and Unicode characters

The special key range (from KEY_MIN upwards) currently overlaps with
the valid range for Unicode characters, and therefore prohibits the
use of Unicode key values outside the ASCII range.

Create space for Unicode key values by moving the special keys to the
range immediately above the maximum valid Unicode character. This
allows the existing encoding of special keys as an efficiently packed
representation of the equivalent ANSI escape sequence to be maintained
almost as-is.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[console] Avoid overlap between remapping flags and character values

The keyboard remapping flags currently occupy bits 8 and upwards of
the to-be-mapped character value. This overlaps the range used for
special keys (KEY_MIN and upwards) and also overlaps the valid Unicode
character range.

No conflict is created by this overlap, since by design only ASCII
character values (as generated by an ASCII-only keyboard driver) are
subject to remapping, and so the to-be-remapped character values exist
in a conceptually separate namespace from either special keys or
non-ASCII Unicode characters. However, the overlap is potentially
confusing for readers of the code.

Minimise cognitive load by using bits 24 and upwards for the keyboard
remapping flags.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Use separate code segment if supported by linker

Some versions of ld will complain that the automatically created (and
unused by our build process) ELF program headers include a "LOAD
segment with RWX permissions".

Silence this warning by adding "-z separate-code" to the linker
options, where supported.

For BIOS builds, where the prefix will generally require writable
access to its own (tiny) code segment, simply inhibit the warning
completely via "--no-warn-rwx-segments".

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Inhibit linker warnings about an implied executable stack

Signed-off-by: Geert Stappers <stappers@stappers.it>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[build] Avoid using multiple target patterns in pattern rules

Multiple target patterns in pattern rules are treated as grouped
targets regardless of the separator character. Newer verions of make
will generate "warning: pattern recipe did not update peer target" to
warn that the rule was expected to update all of the (implicitly)
grouped targets.

Fix by splitting all multiple target pattern rules into single target
pattern rules.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[loong64] Add support for building EFI binaries

Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[loong64] Add CPU sleeping API for EFI LoongArch64

Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[loong64] Add I/O API for LoongArch64

Signed-off-by: Xiaotian Wu <wuxiaotian@loongson.cn>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>

[ioapi] Centralise definitions for dummy PIO

There is no common standard for I/O-space access for non-x86 CPU
families, and non-MMIO peripherals are vanishingly rare.

Generalise the existing ARM definitions for dummy PIO to allow for
reuse by other CPU architectures.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[arm] Add missing arch/arm/core source directory

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[arm] Remove redundant inclusion of io.h

The PCI I/O API (supporting accesses to PCI configuration space) is
not related to the general I/O API (supporting accesses to
memory-mapped I/O peripherals).

Remove the spurious inclusion of ipxe/io.h from the PCI I/O header.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Process veto objects in reverse order of enumeration

While not guaranteed by the UEFI specification, the enumeration of
handles, protocols, and openers will generally return results in order
of creation. Processing these objects in reverse order (as is already
done when calling DisconnectController() on the list of all handles)
will generally therefore perform the forcible uninstallation
operations in reverse order of object creation, which minimises the
number of implicit operations performed (e.g. when disconnecting a
controller that itself still has existent child controllers).

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Check for protocols opened by vetoed driver and image handles

The UEFI specification states that the AgentHandle may be either the
driving binding protocol handle or the image handle.

Check for both handles when searching for stale handles to be forcibly
closed on behalf of a vetoed driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Unload vetoed drivers by image handle rather than driver handle

In most cases, the driver handle will be the image handle itself.
However, this is not required by the UEFI specification, and some
images will install multiple driver binding handles.

Use the image handle (extracted from the driver binding protocol
instance) when attempting to unload the driver's image.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Pass more detailed driver information to veto methods

Pass the driver binding handle, the driver binding protocol instance,
the image handle, and the loaded image protocol instance to all veto
methods.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Show manufacturer in veto debug output

Simplify the process of adding new entries to the veto list by
including the manufacturer name within the standard debug output.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Always poll for TX completions

Polling for TX completions is arguably redundant when there are no
transmissions currently in progress.  Commit c6c7e78 ("[efi] Poll for
TX completions only when there is an outstanding TX buffer") switched
to setting the PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS flag only when
there is an in-progress transmission awaiting completion, in order to
reduce reported TX errors and debug message noise from buggy NII
implementations that report spurious TX completions whenever the
transmit queue is empty.

Some other NII implementations (observed with the Realtek driver in a
Dell Latitude 3440) seem to have a bug in the transmit datapath
handling which results in the transmit ring freezing after sending a
few hundred packets under heavy load.  The symptoms are that the
TPPoll register's NPQ bit remains set and the 256-entry transmit ring
contains a large number of uncompleted descriptors (with the OWN bit
set), the first two of which have identical data buffer addresses.

Though iPXE will submit at most one in-progress transmission via NII,
the Dell/Realtek driver seems to make a page-aligned copy of each
transmit data buffer and to report TX completions immediately without
waiting for the packet to actually be transmitted.  These synthetic TX
completions continue even after the hardware transmit ring freezes.

Setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll reduces the
probability of this Dell/Realtek driver bug being triggered by a
factor of around 500, which brings the failure rate down to the point
that it can sensibly be managed by external logic such as the
"--timeout" option for image downloads.  Closing and reopening the
interface (via "ifclose"/"ifopen") will clear the error condition and
allow transmissions to resume.

Revert to setting PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll,
and silently ignore situations in which the hardware reports a
completion when no transmission is in progress.  This approximately
matches the behaviour of the SnpDxe driver, which will also generally
set PXE_OPFLAGS_GET_TRANSMITTED_BUFFERS on every poll.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Provide read-only access to EFI variables via settings mechanism

EFI variables do not map neatly to the iPXE settings mechanism, since
the EFI variable identifier includes a namespace GUID that cannot
cleanly be supplied as part of a setting name.  Creating a new EFI
variable requires the variable's attributes to be specified, which
does not fit within iPXE's settings concept.

However, EFI variable names are generally unique even without the
namespace GUID, and EFI does provide a mechanism to iterate over all
existent variables.  We can therefore provide read-only access to EFI
variables by comparing only the names and ignoring the namespace
GUIDs.

Provide an "efi" settings block that implements this mechanism using a
syntax such as:

  echo Platform language is ${efi/PlatformLang:string}

  show efi/SecureBoot:int8

Settings are returned as raw binary values by default since an EFI
variable may contain boolean flags, integer values, ASCII strings,
UCS-2 strings, EFI device paths, X.509 certificates, or any other
arbitrary blob of data.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Veto the VMware UefiPxeBcDxe driver

The EDK2 UefiPxeBcDxe driver includes some remarkably convoluted and
unsafe logic in its driver binding protocol Start() and Stop() methods
in order to support a pair of nominally independent driver binding
protocols (one for IPv4, one for IPv6) sharing a single dynamically
allocated data structure.  This PXEBC_PRIVATE_DATA structure is
installed as a dummy protocol on the NIC handle in order to allow both
IPv4 and IPv6 driver binding protocols to locate it as needed.

The error handling code path in the UefiPxeBcDxe driver's Start()
method may attempt to uninstall the dummy protocol but fail to do so.
This failure is ignored and the containing memory is subsequently
freed anyway.  On the next invocation of the driver binding protocol,
it will find and use this already freed block of memory.  At some
point another memory allocation will occur, the PXEBC_PRIVATE_DATA
structure will be corrupted, and some undefined behaviour will occur.

The UEFI firmware used in VMware ESX 8 includes some proprietary
changes which attempt to install copies of the EFI_LOAD_FILE_PROTOCOL
and EFI_PXE_BASE_CODE_PROTOCOL instances from the IPv4 child handle
onto the NIC handle (along with a VMware-specific protocol with GUID
5190120d-453b-4d48-958d-f0bab3bc2161 and a NULL instance pointer).
This will inevitably fail with iPXE, since the NIC handle already
includes an EFI_LOAD_FILE_PROTOCOL instance.

These VMware proprietary changes end up triggering the unsafe error
handling code path described above.  The typical symptom is that an
attempt to exit from iPXE back to the UEFI firmware will crash the VM
with a General Protection fault from within the UefiPxeBcDxe driver:
this happens when the UefiPxeBcDxe driver's Stop() method attempts to
call through a function pointer in the (freed) PXEBC_PRIVATE_DATA
structure, but the function pointer has by then been overwritten by
UCS-2 character data from an unrelated memory allocation.

Work around this failure by adding the VMware UefiPxeBcDxe driver to
the driver veto list.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Include protocol interface address in debug output

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add UefiPxeBcDxe module GUID

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add HttpBootDxe module GUID

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add new IScsiDxe module GUID

The old IPv4-only IScsiDxe driver in MdeModulePkg/Universal/Network
was replaced by a dual-stack IScsiDxe driver in NetworkPkg.

Add the module GUID for this driver.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add HTTP header and GUID definitions

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add DNS headers and GUID definitions

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add Ip4Config2 header and GUID definition

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add IPv6 versions of existing IPv4 headers and GUID definitions

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Update to current EDK2 headers

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Disable static assertions in EFI headers on non-EFI platforms

The EDK2 headers may be included even in builds for non-EFI platforms.
Commits such as 9de6c45 ("[arm] Use -fno-short-enums for all 32-bit
ARM builds") have so far ensured that the compile-time checks within
the EDK2 headers will pass even when building for a non-EFI platform.

As a more general solution, temporarily disable static assertions
while including UefiBaseType.h if building on a non-EFI platform.
This avoids the need to modify the ABI on other platforms.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[crypto] Add support for PKCS#8 private key format

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Implement "shim" as a dummy command on non-EFI platforms

The "shim" command will skip downloading the shim binary (and is
therefore a conditional no-op) if there is already a selected EFI
image that can be executed directly via LoadImage()/StartImage().
This allows the same iPXE script to be used with Secure Boot either
enabled or disabled.

Generalise this further to provide a dummy "shim" command that is an
unconditional no-op on non-EFI platforms. This then allows the same
iPXE script to be used for BIOS, EFI with Secure Boot disabled, or EFI
with Secure Boot enabled.

The same effect could be achieved by using "iseq ${platform} efi"
within the script, but this would complicate end-user documentation.

To minimise the code size impact, the dummy "shim" command is a pure
no-op that does not call parse_options() and so will ignore even
standardised arguments such as "--help".

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Support versions of shim that perform SBAT verification

The UEFI shim implements a fairly nicely designed revocation mechanism
designed around the concept of security generations. Unfortunately
nobody in the shim community has thus far added the relevant metadata
to the Linux kernel, with the result that current versions of shim are
incapable of booting current versions of the Linux kernel.

Experience shows that there is unfortunately no point in trying to get
a fix for this upstreamed into shim. We therefore default to working
around this undesirable behaviour by patching data read from the
"SbatLevel" variable used to hold SBAT configuration.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Separate GetMemoryMap() wrapper from shim unlocker

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add "shim" command

Allow a shim to be used to facilitate booting a kernel using a script
such as:

    kernel /images/vmlinuz console=ttyS0,115200n8
    initrd /images/initrd.img
    shim /images/shimx64.efi
    boot

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add support for executing images via a shim

Add support for using a shim as a helper to execute an EFI image.
When a shim has been specified via shim(), the shim image will be
passed to LoadImage() instead of the selected EFI image and the
command line will be prepended with the name of the selected EFI
image. The selected EFI image will be accessible to the shim via the
virtual filesystem as a hidden file.

Reduce the Secure Boot attack surface by removing, where possible, the
spurious requirement for a third party second stage loader binary such
as GRUB to be used solely in order to call the "shim lock protocol"
entry point.

Do not install the EFI PXE APIs when using a shim, since if shim finds
EFI_PXE_BASE_CODE_PROTOCOL on the loaded image's device handle then it
will attempt to download files afresh instead of using the files
already downloaded by iPXE and exposed via the EFI_SIMPLE_FILE_SYSTEM
protocol. (Experience shows that there is no point in trying to get a
fix for this upstreamed into shim.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add definitions for the UEFI shim lock protocol

The UEFI shim includes a "shim lock protocol" that can be used by a
third party second stage loader such as GRUB to verify a kernel image.

Add definitions for the relevant portions of this protocol interface.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Add efi_asprintf() and efi_vasprintf()

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[image] Generalise concept of selected image

Most image flags are independent values: any combination of flags may
be set for any image, and the flags for one image are independent of
the flags for any other image.  The "selected" flag does not follow
this pattern: at most one image may be marked as selected at any time.

When invoking a kernel via the UEFI shim, there will be multiple
"special" images: the selected kernel itself, the shim image, and
potentially a shim-signed GRUB binary to be used as a crutch to assist
shim in loading the kernel (since current versions of the UEFI shim
are not capable of directly loading a Linux kernel).

Remove the "selected" image flag and replace it with a general concept
of an image tag with the same semantics: a given tag may be assigned
to at most one image, an image may be found by its tag only while the
image is currently registered, and a tag will survive unregistration
and reregistration of an image (if it has not already been assigned to
a new image).  For visual consistency, also replace the current image
pointer with a current image tag.

The image pointer stored within the image tag holds only a weak
reference to the image, since the selection of an image should not
prevent that image from being freed.  (The strong reference to the
currently executing image is held locally within the execution scope
of image_exec(), and is logically separate from the current image
pointer.)

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Attempt to detect EFI images that fail Secure Boot verification

An EFI image that is rejected by LoadImage() due to failing Secure
Boot verification is still an EFI image. Unfortunately, the extremely
broken UEFI Secure Boot model provides no way for us to unambiguously
determine that a valid EFI executable image was rejected only because
it failed signature verification. We must therefore use heuristics to
guess whether not an image that was rejected by LoadImage() could
still be loaded via a separate PE loader such as the UEFI shim.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[ci] Work around Ubuntu packaging metadata issues

The libc6-dbg:i386 package has spontaneously started failing to
install from the Azure package repositories used by the GitHub Actions
runners, with the somewhat recalcitrant error message:

libc6:i386: Depends: libgcc-s1:i386 but it is not going to be installed

Work around this unexplained issue by explicitly requesting
installation of the libgcc-s1:i386 package.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow currently selected image to be opened as "grub*.efi"

Versions 15.4 and earlier of the UEFI shim are incapable of correctly
parsing the command line in order to extract the second stage loader
filename, and will always attempt to load "grubx64.efi" or equivalent.

Versions 15.3 and later of the UEFI shim are currently incapable of
loading a Linux kernel directly anyway, since the kernel does not
include SBAT metadata. These versions will require a genuine
shim-signed GRUB binary to be used as a crutch to assist shim in
loading a Linux kernel.

This leaves versions 15.2 and earlier of the UEFI shim (as currently
used in e.g. RHEL7) as being capable of directly loading a Linux
kernel, but incorrectly attempting to load it using the filename
"grubx64.efi" or equivalent. To support the bugs in these older
versions of the UEFI shim, allow the currently selected image to be
opened via any filename of the form "grub*.efi".

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow currently executing image to be opened via virtual filesystem

When invoking a kernel via the UEFI shim, the kernel image must be
accessible via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be present
in the magic initrd constructed from all registered images.

Re-register a currently executing EFI image and mark it as hidden,
thereby allowing it to be accessed via the virtual filesystem exposed
via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL without appearing in the magic
initrd contents.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[image] Allow for images to be hidden from lists of all images

When invoking a kernel via the UEFI shim, the kernel (and potentially
also a helper binary such as GRUB) must be accessible via the virtual
filesystem exposed via EFI_SIMPLE_FILE_SYSTEM_PROTOCOL but must not be
present in the magic initrd constructed from all registered images.

Allow for images to be flagged as hidden, which will cause them to be
excluded from API-level lists of all images such as the virtual
filesystem directory contents, the magic initrd, or the Multiboot
module list. Hidden images remain visible to iPXE commands including
"imgstat", which will show a "[HIDDEN]" flag for such images.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Show original filenames in debug messages

Show the original filename as used by the consumer when calling our
EFI_SIMPLE_FILE_SYSTEM_PROTOCOL's Open() method.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow downloaded images to take precedence over constructed files

Try searching for a matching registered image before checking for
fixed filenames (such as "initrd.magic" for the dynamically generated
magic initrd file). This minimises surprise by ensuring that an
explicitly downloaded image will always be used verbatim.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow for sections to be excluded from the generated PE file

Hybrid bzImage and UEFI binaries (such as wimboot) include a bzImage
header within a section starting at offset zero, with the PE header
effectively occupying unused space within this section. This section
should not appear as a named section in the resulting PE file.

Allow for the existence of hidden sections that do not result in a
section header being written to the PE file.

Signed-off-by: Michael Brown <mcb30@ipxe.org>

[efi] Allow elf2efi to be used for hybrid binaries

Hybrid 32-bit BIOS and 64-bit UEFI binaries (such as wimboot) may
include R_X86_64_32 relocation records for the 32-bit BIOS portions.
These should be ignored when generating PE relocation records, since
they apply only to code that cannot be executed within the context of
the 64-bit UEFI binary, and creating a 4-byte relocation record is
invalid in a binary that may be relocated anywhere within the 64-bit
address space (see commit 907cffb "[efi] Disallow R_X86_64_32
relocations").

Add a "--hybrid" option to elf2efi, which will cause R_X86_64_32
relocation records to be silently discarded.

Signed-off-by: Michael Brown <mcb30@ipxe.org>