Michael Brown [Fri, 22 Sep 2017 13:02:40 +0000 (14:02 +0100)]
[efi] Inhibit our driver Start() method during disconnection attempts
Some HP BIOSes (observed with a Z840) seem to attempt to connect our
drivers in the middle of our call to DisconnectController(). The
precise chain of events is unclear, but the symptom is that we see
several calls to our Supported() and Start() methods, followed by a
system lock-up.
Work around this dubious BIOS behaviour by explicitly failing calls to
our Start() method while we are in the middle of attempting to
disconnect drivers.
Reported-by: Jordan Wright <jordan.m.wright@disney.com> Debugged-by: Adrian Lucrèce Céleste <adrianlucrececeleste@airmail.cc> Debugged-by: Christian Nilsson <nikize@gmail.com> Tested-by: Jordan Wright <jordan.m.wright@disney.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Mon, 18 Sep 2017 12:32:39 +0000 (13:32 +0100)]
[build] Exclude selected directories from Secure Boot builds
When submitting binaries for UEFI Secure Boot signing, certain
known-dubious subsystems (such as 802.11 and NFS) must be excluded
from the build. Mark the directories containing these subsystems as
insecure, and allow the build target to include an explicit "security
flag" (a literal "-sb" appended to the build platform) to exclude
these source directories from the build process.
For example:
make bin-x86_64-efi-sb/ipxe.efi
will build iPXE with all code from the 802.11 and NFS subsystems
excluded from the build.
Michael Brown [Wed, 13 Sep 2017 07:07:55 +0000 (10:07 +0300)]
[efi] Continue to connect remaining handles after connection errors
Some UEFI BIOSes will deliberately break the implementation of
ConnectController() to return errors for devices that have been
"disabled" via the BIOS setup screen. (As an added bonus, such BIOSes
may return garbage EFI_STATUS values such as 0xff.)
Work around these broken UEFI BIOSes by ignoring failures and
continuing to attempt to connect any remaining handles.
Michael Brown [Wed, 6 Sep 2017 22:56:22 +0000 (23:56 +0100)]
[efi] Match behaviour of SnpDxe for truncated received packets
The UEFI specification does not state whether or not a return value of
EFI_BUFFER_TOO_SMALL from the SNP Receive() method should follow the
usual EFI API behaviour of allowing the caller to retry the request
with an increased buffer size.
Examination of the SnpDxe driver in EDK2 suggests that Receive() will
just return the truncated packet (complete with any requested
link-layer header fields), so match this behaviour.
Michael Brown [Wed, 6 Sep 2017 22:18:29 +0000 (23:18 +0100)]
[efi] Check buffer length for packets retrieved via our SNP protocol
We do not currently check the length of the caller's buffer for
received packets. This creates a potential buffer overrun when iPXE
is being used via the SNP or UNDI protocols.
Fix by checking the buffer length and correctly returning the required
length and an EFI_BUFFER_TOO_SMALL error.
Reported-by: Paul McMillan <paul.mcmillan@oracle.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Tue, 5 Sep 2017 21:55:05 +0000 (22:55 +0100)]
[peerdist] Gather and report peer statistics during download
Record and report the number of peers (calculated as the maximum
number of peers discovered for a block's segment at the time that the
block download is complete), and the percentage of blocks retrieved
from peers rather than from the origin server.
Michael Brown [Tue, 5 Sep 2017 22:21:34 +0000 (23:21 +0100)]
[monojob] Check for job progress only once per timer tick
Checking for job progress is essentially a user interface activity,
and can safely be performed only once per timer tick (as is already
done with checking for keypresses).
Michael Brown [Tue, 5 Sep 2017 11:21:11 +0000 (12:21 +0100)]
[netdevice] Cancel all pending transmissions on any transmit error
Some external code (such as the UEFI UNDI driver for the Realtek USB
NIC on a Microsoft Surface Book) will block during transmission
attempts and can take several seconds to report a transmit error. If
there is a large queue of pending transmissions, then the accumulated
time from a series of such failures can easily exceed the EFI watchdog
timeout, resulting in what appears to be a system lockup followed by a
reboot.
Work around this problem by immediately cancelling any pending
transmissions as soon as any transmit error occurs.
The only expected transmit error under normal operation is ENOBUFS
arising when the hardware transmit queue is full. By definition, this
can happen only for drivers that do not utilise deferred
transmissions, and so this new behaviour will not affect these
drivers.
Michael Brown [Tue, 5 Sep 2017 09:48:41 +0000 (10:48 +0100)]
[efi] Raise TPL when calling UNDI entry point
The SnpDxe driver raises the task priority level to TPL_CALLBACK when
calling the UNDI entry point. This does not appear to be a documented
requirement, but we should probably match the behaviour of SnpDxe to
minimise surprises to third party code.
Michael Brown [Mon, 4 Sep 2017 13:00:32 +0000 (14:00 +0100)]
[malloc] Avoid false positive warnings from valgrind
Calling discard_cache() is likely to result in a call to
free_memblock(), which will call valgrind_make_blocks_noaccess()
before returning. This causes valgrind to report an invalid read on
the next iteration through the loop in alloc_memblock().
Fix by explicitly calling valgrind_make_blocks_defined() after
discard_cache() returns. Also call valgrind_make_blocks_noaccess()
before calling discard_cache(), to guard against free list corruption
while executing cache discarders.
Michael Brown [Wed, 30 Aug 2017 09:15:25 +0000 (10:15 +0100)]
[romprefix] Avoid unaligned accesses within ROM headers
Ensure that all headers (PCI, UNDI, PnP, iPXE) are aligned to at least
four bytes, so that all accesses to header fields will be correctly
aligned even when reading directly from the expansion ROM BAR.
Reported-by: Peter von Konigsmark <peter@exablaze.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 28 Jul 2017 20:19:45 +0000 (21:19 +0100)]
[hyperv] Do not steal ownership from the Gen 2 UEFI firmware
We must not steal ownership from the Gen 2 UEFI firmware, since doing
so will cause an immediate system crash (most likely in the form of a
reboot).
This problem was masked before commit a0f6e75 ("[hyperv] Do not fail
if guest OS ID MSR is already set"), since prior to that commit we
would always fail if we found any non-zero guest OS identity. We now
accept a non-zero previous guest OS identity in order to allow for
situations such as chainloading from iPXE to another iPXE, and as a
prerequisite for commit b91cc98 ("[hyperv] Cope with Windows Server
2016 enlightenments").
A proper fix would be to reverse engineer the UEFI protocols exposed
within the Hyper-V Gen 2 firmware and use these to bind to the VMBus
device representing the network connection, (with the native Hyper-V
driver moved to become a BIOS-only feature).
As an interim solution, fail to initialise the native Hyper-V driver
if we detect the guest OS identity known to be used by the Gen 2 UEFI
firmware. This will cause the standard all-drivers build (ipxe.efi)
to fall back to using the SNP driver.
Michael Brown [Fri, 28 Jul 2017 14:40:44 +0000 (15:40 +0100)]
[build] Fix ARM32 EFI builds with current EDK2 headers
EDK2 commit 6440385 ("MdePkg/Include: Add enumeration size checks to
Base.h") enforced the UEFI specification mandate that enums should
always be 32 bits. This revealed a latent bug in iPXE, which does not
build with -fno-short-enums.
Fix by adding -fno-short-enums to CFLAGS for ARM32 EFI builds.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 28 Jul 2017 12:50:35 +0000 (13:50 +0100)]
[build] Fix use of inline assembly on GCC 4.8 ARM64 builds
The inline assembly used in include/errno.h to generate the einfo
blocks requires the ability to generate an immediate constant with no
immediate-value prefix (such as the dollar sign for x86 assembly).
We currently achieve this via the undocumented "%c0" form of operand.
This causes an "invalid operand prefix" error on GCC 4.8 for ARM64
builds.
Fix by switching to the equally undocumented "%a0" form of operand,
which appears to work correctly on all tested versions of GCC.
Reported-by: Benjamin S. Allen <bsallen@alcf.anl.gov> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 21 Jul 2017 13:48:27 +0000 (14:48 +0100)]
[efi] Enumerate PCI BARs in same order as SnpDxe
The UEFI specification has an implicit and demonstrably incorrect
requirement (in the Mem_IO() calling convention) that any UNDI network
device has at most one memory BAR and one I/O BAR.
Some UEFI platforms have been observed to report the existence of
non-existent additional I/O BARs, causing iPXE to select the wrong
BAR. This problem does not affect the SnpDxe driver, since that
driver will always choose the lowest numbered existent BAR of each
type.
Adjust iPXE's behaviour to match that of SnpDxe, i.e. to always select
the lowest numbered BAR(s).
Debugged-by: Andreas Hammarskjöld <junior@2PintSoftware.com> Debugged-by: Adklei <adklei@realtek.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 7 Jul 2017 16:25:37 +0000 (17:25 +0100)]
[smsc75xx] Expose functionality shared with LAN78xx devices
The LAN78xx datapath is essentially identical to that of the SMSC75xx.
Expose the transmit, poll, and bulk IN endpoint operations to allow
for reuse by the LAN78xx driver.
Jason Wang [Mon, 10 Jul 2017 09:10:41 +0000 (17:10 +0800)]
[virtio] Support VIRTIO_NET_F_IOMMU_PLATFORM
Since we don't enable IOMMU at all, we can then simply enable the
IOMMU support by claiming the support of VIRITO_F_IOMMU_PLATFORM.
This fixes booting failure when iommu_platform is set from qemu cli.
Signed-off-by: Jason Wang <jasowang@redhat.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Thu, 6 Jul 2017 15:58:22 +0000 (16:58 +0100)]
[smscusb] Abstract out common SMSC USB device functionality
The smsc75xx and smsc95xx drivers include a substantial amount of
identical functionality, varying only in the base address of register
sets. Abstract out this common functionality to allow code to be
shared between the drivers.
Michael Brown [Mon, 3 Jul 2017 12:38:55 +0000 (13:38 +0100)]
[usb] Use non-zero language ID to retrieve strings
We currently use a zero language ID to retrieve strings such as the
ECM/NCM MAC address. This works on most hardware devices, but is
known to fail on some software emulated CDC-NCM devices.
Fix by using the first supported language ID, falling back to English
(0x0409) if any error occurs when fetching the list of supported
languages. This matches the behaviour of the Linux kernel.
Michael Brown [Tue, 13 Jun 2017 11:12:11 +0000 (12:12 +0100)]
[crypto] Provide asn1_built() to construct a cursor from a builder
Our ASN.1 parsing code uses a struct asn1_cursor, while the object
construction code uses a struct asn1_builder. These structures are
identical apart from the const modifier applied to the data pointer in
struct asn1_cursor.
Provide asn1_built() to safely typecast a struct asn1_builder to a
struct asn1_cursor, allowing constructed objects to be passed to
functions expecting a struct asn1_cursor.
Michael Brown [Thu, 15 Jun 2017 13:50:20 +0000 (14:50 +0100)]
[cpuid] Allow input %ecx value to be specified
For some CPUID leaves (e.g. %eax=0x00000004), the result depends on
the input value of %ecx. Allow this subfunction number to be
specified as a parameter to the cpuid() wrapper.
The subfunction number is exposed via the ${cpuid/...} settings
mechanism using the syntax
Michael Brown [Wed, 14 Jun 2017 11:33:16 +0000 (12:33 +0100)]
[build] Use -no-pie on newer versions of gcc
Some distributions patch gcc to generate position independent
executables by default. We currently include a workaround to check
for this and to add -fno-PIE -nopie to CFLAGS if required.
Newer patched versions of gcc require -fno-PIE -no-pie instead. Check
for both variants.
Reported-by: Nathan Rennie-Waldock <nathan.renniewaldock@gmail.com> Originally-fixed-by: Markos Chandras <mchandras@suse.de> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Tue, 13 Jun 2017 12:16:26 +0000 (13:16 +0100)]
[hdprefix] Avoid attempts to read beyond the end of the disk
When booting from a hard disk image (e.g. bin/ipxe.usb) within an
emulator such as QEMU, the disk may not exist beyond the end of the
image. Limit all reads to the length of the image to avoid spurious
errors when loading the iPXE image.
Michael Brown [Tue, 23 May 2017 14:44:22 +0000 (15:44 +0100)]
[acpi] Expose ACPI tables via settings mechanism
Allow values to be read from ACPI tables using the syntax
${acpi/<signature>.<index>.0.<offset>.<length>}
where <signature> is the ACPI table signature as a 32-bit hexadecimal
number (e.g. 0x41504093 for the 'APIC' signature on the MADT), <index>
is the index into the array of tables matching this signature,
<offset> is the byte offset within the table, and <length> is the
field length in bytes.
Numeric values are returned in reverse byte order, since ACPI numeric
values are usually little-endian.
For example:
${acpi/0x41504943.0.0.0.0} - entire MADT table in raw hex
${acpi/0x41504943.0.0.0x0a.6:string} - MADT table OEM ID
${acpi/0x41504943.0.0.0x24.4:uint32} - local APIC address
Michael Brown [Mon, 22 May 2017 12:17:23 +0000 (13:17 +0100)]
[tls] Keep cipherstream window open until TLS negotiation is complete
When performing a SAN boot, the plainstream window size will be zero
(since this is the mechanism used internally to indicate that no data
should be fetched via the initial request). This zero value currently
propagates to the advertised TCP window size, which prevents the TLS
negotiation from completing.
Fix by ensuring that the cipherstream window is held open until TLS
negotiation is complete, and only then falling back to passing through
the plainstream window size.
Reported-by: John Wigley <johnwigley#ipxe@acorna.co.uk> Tested-by: John Wigley <johnwigley#ipxe@acorna.co.uk> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Fri, 19 May 2017 11:21:18 +0000 (12:21 +0100)]
[efi] Prevent EFI code from being linked in to non-EFI builds
Ensure that efi_systab is an undefined symbol in non-EFI builds. In
particular, this prevents users from incorrectly enabling IMAGE_EFI in
a BIOS build of iPXE.
Michael Brown [Fri, 19 May 2017 01:50:21 +0000 (02:50 +0100)]
[xen] Provide 18 4kB receive buffers to work around xen-netback bug
The Xen network backend (xen-netback) suffered from a regression
between upstream Linux kernels 3.18 and 4.2 inclusive, which would
cause packet reception to fail unless at least 18 receive buffers were
available. This bug was fixed in kernel commit 1d5d485 ("xen-netback:
require fewer guest Rx slots when not using GSO").
Work around this bug in affected versions of xen-netback by providing
the requisite 18 receive buffers.
Reported-by: Taylor Schneider <tschneider@live.com> Tested-by: Taylor Schneider <tschneider@live.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 10 May 2017 14:30:51 +0000 (15:30 +0100)]
[iscsi] Fix iBFT when no explicit initiator name setting exists
Commit 7cfdd76 ("[block] Describe all SAN devices via ACPI tables")
changed the definition of the iSCSI initiator IQN in the iBFT to
represent a common initiator IQN used for all iSCSI sessions, and
attempted to calculate this common initiator IQN by fetching the
common ${initiator-iqn} setting.
This fails when no explicit ${initiator-iqn} has been specified
(i.e. when an initiator IQN has instead been constructed from either
the hostname or system UUID), and results in an empty initiator IQN in
the iBFT.
Fix by using the initiator IQN of an arbitrary iSCSI session
present in the iBFT.
Debugged-by: Tal Aloni <tal.aloni.il@gmail.com> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 3 May 2017 12:01:11 +0000 (13:01 +0100)]
[iscsi] Always send FirstBurstLength parameter
As of kernel 4.11, the LIO target will propose a value for
FirstBurstLength if the initiator did not do so. This is entirely
redundant in our case, since FirstBurstLength is defined by RFC 3720
to be
"Irrelevant when: ( InitialR2T=Yes and ImmediateData=No )"
and we already enforce both InitialR2T=Yes and ImmediateData=No in our
initial proposal. However, LIO (arguably correctly) complains when we
do not respond to its redundant proposal of an already-irrelevant
value.
Fix by always proposing the default value for FirstBurstLength.
Debugged-by: Patrick Seeburger <info@8bit.de> Tested-by: Patrick Seeburger <info@8bit.de> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Mon, 1 May 2017 13:01:54 +0000 (14:01 +0100)]
[efi] Standardise PCI debug messages
Use the PCI bus:dev.fn address in debug messages, falling back to the
EFI handle name only if we do not yet have enough information to
determine the bus:dev.fn address.
Include the vendor and device IDs in debug messages when no suitable
driver is found, to match the diagnostics available in a BIOS
environment.
Michael Brown [Tue, 25 Apr 2017 13:13:22 +0000 (14:13 +0100)]
[hyperv] Cope with Windows Server 2016 enlightenments
An "enlightened" external bootloader (such as Windows Server 2016's
winload.exe) may take ownership of the Hyper-V connection before all
INT 13 operations have been completed. When this happens, all VMBus
devices are implicitly closed and we are left with a non-functional
network connection.
Detect when our Hyper-V connection has been lost (by checking the
SynIC message page MSR). Reclaim ownership of the Hyper-V connection
and reestablish any VMBus devices, without disrupting any existing
iPXE state (such as IPv4 settings attached to the network device).
Windows Server 2016 will not cleanly take ownership of an active
Hyper-V connection. Experimentation shows that we can quiesce by
resetting only the SynIC message page MSR; this results in a
successful SAN boot (on a Windows 2012 R2 physical host). Choose to
quiesce by resetting (almost) all MSRs, in the hope that this will be
more robust against corner cases such as a stray synthetic interrupt
occurring during the handover.
Michael Brown [Wed, 26 Apr 2017 19:02:23 +0000 (20:02 +0100)]
[block] Provide abstraction to allow system to be quiesced
When performing a SAN boot via INT 13, there is no way for the
operating system to indicate that it has finished using the INT 13 SAN
device. We therefore have no opportunity to clean up state before the
loaded operating system's native drivers take over. This can cause
problems when booting Windows, which tends not to be forgiving of
unexpected system state.
Windows will typically write a flag to the SAN device as the last
action before transferring control to the native drivers. We can use
this as a heuristic to bring the system to a quiescent state (without
performing a full shutdown); this provides us an opportunity to
temporarily clean up state that could otherwise prevent a successful
Windows boot.
Michael Brown [Sun, 16 Apr 2017 20:26:13 +0000 (21:26 +0100)]
[intel] Do not enable ASDE on i350 backplane NIC
On most Intel NICs, Auto-Speed Detection Enable (ASDE) can be used to
automatically detect the correct link speed by sampling the link using
the internal PHY. This feature is automatically inhibited when not
appropriate for the physical link (e.g. when using internal SerDes
mode on the 8254x).
On the i350 datasheet ASDE is a reserved bit, but the relevant
auto-speed detection hardware appears still to be present. However,
enabling ASDE on the i350 1000BASE-KX backplane NIC seems to cause an
immediate link failure. It is possible that the auto-speed detection
hardware is still present, is not connected to a physical link, and is
not inhibited from being applied in this mode.
Work around this problem by adding an INTEL_NO_ASDE flag bit
(analogous to INTEL_NO_PHY_RST), and applying this for the i350
backplane NIC.
Michael Brown [Fri, 14 Apr 2017 09:09:57 +0000 (10:09 +0100)]
[intel] Show original CTRL and STATUS values in debugging output
In situations where iPXE fails to reach link-up as expected, it is
useful to know the original values of the CTRL and STATUS registers
prior to our reset attempt.
Michael Brown [Wed, 12 Apr 2017 14:03:25 +0000 (15:03 +0100)]
[block] Allow use of a non-default EFI SAN boot filename
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
Adamczyk, Konrad [Thu, 30 Mar 2017 13:54:59 +0000 (13:54 +0000)]
[thunderx] Use ThunderxConfigProtocol to obtain board configuration
Following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed GUID of ThunderxConfigProtocol, as this is not compatible
with previous version
- changed UINTN* to UINT64* buffer type to fix issue on 32-bit
platforms with MAC address
This change allows us to avoid alignment of BOARD_CFG definitions
every time it changes in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com> Modified-by: Michael Brown <mcb30@ipxe.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 29 Mar 2017 09:29:44 +0000 (12:29 +0300)]
[scsi] Retry TEST UNIT READY command
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Michael Brown [Tue, 28 Mar 2017 20:37:03 +0000 (23:37 +0300)]
[http] Notify data transfer interface when underlying connection is ready
HTTP implements xfer_window_changed() on the underlying server
connection using http_step(), which does not propagate the window
change notification to the data transfer interface. This breaks the
multipath-capable SAN boot code, which relies on the window change
notification to discover that the HTTP block device is ready for
commands to be issued.
Fix by sending xfer_window_changed() in http_step() once the
underlying connection has been determined to be ready.
Michael Brown [Mon, 27 Mar 2017 15:20:34 +0000 (18:20 +0300)]
[block] Describe all SAN devices via ACPI tables
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
For some block device protocols, the active path may continue to
receive xfer_window_changed() notifications during normal use. These
currently result in the active path being erroneously closed.
Fix by ignoring any xfer_window_changed() messages if this path is
already the active path.
Michael Brown [Mon, 27 Mar 2017 10:18:14 +0000 (13:18 +0300)]
[block] Retry reopening indefinitely for multipath devices
For multipath SAN devices, verify that the device is capable of being
opened (i.e. that all URIs are parseable and that at least one path is
alive) and thereafter retry indefinitely to reopen the device as
needed.
Michael Brown [Mon, 27 Mar 2017 12:32:29 +0000 (15:32 +0300)]
[block] Add a small delay between attempts to reopen SAN targets
When all SAN targets are completely unreachable, there will be a
natural delay between reopening attempts due to the network connection
timeout on the unreachable targets.
However, some SAN targets may accept connections instantly and report
a temporary unavailability by e.g. failing the TEST UNIT READY
command. If all targets are behaving this way then there will be no
natural delay, and we will attempt to saturate the network with
connection attempts.
Fix by introducing a small delay between attempts.
Michael Brown [Mon, 27 Mar 2017 10:06:16 +0000 (13:06 +0300)]
[block] Allow SAN retry count to be reconfigured
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
Michael Brown [Mon, 27 Mar 2017 07:50:59 +0000 (10:50 +0300)]
[int13con] Avoid overwriting random portions of SAN boot disks
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output. If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.
Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses. This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).
Michael Brown [Sun, 26 Mar 2017 18:03:50 +0000 (21:03 +0300)]
[int13] Improve geometry guessing for unaligned partitions
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.
Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.
Michael Brown [Sun, 26 Mar 2017 12:12:11 +0000 (15:12 +0300)]
[block] Add basic multipath support
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
Michael Brown [Sun, 26 Mar 2017 12:42:52 +0000 (15:42 +0300)]
[block] Add dummy SAN device
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Michael Brown [Sun, 26 Mar 2017 08:21:14 +0000 (11:21 +0300)]
[scsi] Avoid duplicate call to scsicmd_close() on TEST UNIT READY failure
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Michael Brown [Thu, 23 Mar 2017 16:15:24 +0000 (18:15 +0200)]
[iobuf] Increase minimum I/O buffer size to 128 bytes
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
Mike McCormack [Thu, 23 Mar 2017 15:54:03 +0000 (17:54 +0200)]
[sky2] Use 32-bit read to read Y2_VAUX_AVAIL
B0_CTST is a 24bit register according to the vendor driver (sk98lin).
A 16bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org> Signed-off-by: Michael Brown <mcb30@ipxe.org>