Michael Brown [Wed, 12 Apr 2017 14:03:25 +0000 (15:03 +0100)]
[block] Allow use of a non-default EFI SAN boot filename
Some older operating systems (e.g. RHEL6) use a non-default filename
on the root disk and rely on setting an EFI variable to point to the
bootloader. This does not work when performing a SAN boot on a
machine where the EFI variable is not present.
Fix by allowing a non-default filename to be specified via the
"sanboot --filename" option or the "san-filename" setting. For
example:
Adamczyk, Konrad [Thu, 30 Mar 2017 13:54:59 +0000 (13:54 +0000)]
[thunderx] Use ThunderxConfigProtocol to obtain board configuration
The following changes were introduced:
- added GetBgxProp and GetLmacProp methods to ThunderxConfigProtocol
- replaced direct BOARD_CFG access with usage of introduced methods
- removed redundant BOARD_CFG
- changed the GUID of ThunderxConfigProtocol, as it is not compatible
with the previous version
- changed UINTN* to UINT64* buffer type to fix an issue with MAC
addresses on 32-bit platforms
This change avoids the need to resynchronise the BOARD_CFG definitions
every time they change in UEFI.
Signed-off-by: Konrad Adamczyk <konrad.adamczyk@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Wed, 29 Mar 2017 09:29:44 +0000 (12:29 +0300)]
[scsi] Retry TEST UNIT READY command
The TEST UNIT READY command is issued automatically when the device is
opened, and is not the result of a command being issued by the caller.
This is required in order that a permanent TEST UNIT READY failure can
be used to identify unusable paths in a multipath SAN device.
Since the TEST UNIT READY command is not part of the caller's command
issuing process, it is not covered by any external retry loops (such
as the main retry loop in sandev_command()).
We must therefore be prepared to retry the TEST UNIT READY command
within the SCSI layer itself. We retry only the TEST UNIT READY
command so as not to multiply the number of potential retries for
normal commands (which are already retried by sandev_command()).
Michael Brown [Tue, 28 Mar 2017 20:37:03 +0000 (23:37 +0300)]
[http] Notify data transfer interface when underlying connection is ready
HTTP implements xfer_window_changed() on the underlying server
connection using http_step(), which does not propagate the window
change notification to the data transfer interface. This breaks the
multipath-capable SAN boot code, which relies on the window change
notification to discover that the HTTP block device is ready for
commands to be issued.
Fix by sending xfer_window_changed() in http_step() once the
underlying connection has been determined to be ready.
Michael Brown [Mon, 27 Mar 2017 15:20:34 +0000 (18:20 +0300)]
[block] Describe all SAN devices via ACPI tables
Describe all SAN devices via ACPI tables such as the iBFT. For tables
that can describe only a single device (i.e. the aBFT and sBFT), one
table is installed per device. For multi-device tables (i.e. the
iBFT), all devices are described in a single table.
An underlying SAN device connection may be closed at the time that we
need to construct an ACPI table. We therefore introduce the concept
of an "ACPI descriptor" which enables the SAN boot code to maintain an
opaque pointer to the underlying object, and an "ACPI model" which can
build tables from a list of such descriptors. This separates the
lifecycles of ACPI descriptions from the lifecycles of the block
device interfaces, and allows for construction of the ACPI tables even
if the block device interface has been closed.
For a multipath SAN device, iPXE will wait until sufficient
information is available to describe all devices but will not wait for
all paths to connect successfully. For example: with a multipath
iSCSI boot iPXE will wait until at least one path has become available
and name resolution has completed on all other paths. We do this
since the iBFT has to include IP addresses rather than DNS names. We
will commence booting without waiting for the inactive paths to either
become available or close; this avoids unnecessary boot delays.
Note that the Linux kernel will refuse to accept an iBFT with more
than two NIC or target structures. We therefore describe only the
NICs that are actually required in order to reach the described
targets. Any iBFT with at most two targets is therefore guaranteed to
describe at most two NICs.
For some block device protocols, the active path may continue to
receive xfer_window_changed() notifications during normal use. These
currently result in the active path being erroneously closed.
Fix by ignoring any xfer_window_changed() messages if this path is
already the active path.
Michael Brown [Mon, 27 Mar 2017 10:18:14 +0000 (13:18 +0300)]
[block] Retry reopening indefinitely for multipath devices
For multipath SAN devices, verify that the device is capable of being
opened (i.e. that all URIs are parseable and that at least one path is
alive) and thereafter retry indefinitely to reopen the device as
needed.
Michael Brown [Mon, 27 Mar 2017 12:32:29 +0000 (15:32 +0300)]
[block] Add a small delay between attempts to reopen SAN targets
When all SAN targets are completely unreachable, there will be a
natural delay between reopening attempts due to the network connection
timeout on the unreachable targets.
However, some SAN targets may accept connections instantly and report
a temporary unavailability by e.g. failing the TEST UNIT READY
command. If all targets are behaving this way then there will be no
natural delay, and we will attempt to saturate the network with
connection attempts.
Fix by introducing a small delay between attempts.
Michael Brown [Mon, 27 Mar 2017 10:06:16 +0000 (13:06 +0300)]
[block] Allow SAN retry count to be reconfigured
Allow the SAN retry count to be configured via the ${san-retry}
setting, defaulting to the current value of 10 retries if not
specified.
Note that setting a retry count of zero is inadvisable, since iSCSI
targets in particular will often report spurious errors such as "power
on occurred" for the first few commands.
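To illustrate why a retry count of zero is inadvisable, the following
minimal sketch models a target that reports a spurious error for its
first few commands. All names (struct fake_target, fake_command(),
run_with_retries()) are hypothetical and are not the actual
sandev_command() implementation. The sketch also assumes that a retry
count of N means one initial attempt plus up to N retries.

```c
#include <errno.h>

/* Hypothetical target that fails its first few commands, e.g. with a
 * spurious "power on occurred" style error.
 */
struct fake_target {
	unsigned int spurious_errors;	/* Failures still to be reported */
	unsigned int attempts;		/* Commands received so far */
};

/* Issue one command to the hypothetical target */
static int fake_command ( struct fake_target *target ) {
	target->attempts++;
	if ( target->spurious_errors ) {
		target->spurious_errors--;
		return -EIO;
	}
	return 0;
}

/* Issue a command, retrying up to "retries" times after the first
 * attempt (assumed semantics, for illustration only).
 */
static int run_with_retries ( struct fake_target *target,
			      unsigned int retries ) {
	unsigned int i;
	int rc = -EIO;

	for ( i = 0 ; i <= retries ; i++ ) {
		rc = fake_command ( target );
		if ( rc == 0 )
			break;
	}
	return rc;
}
```

With two spurious errors pending, a retry count of zero gives up after
the first attempt, whereas the default of 10 succeeds on the third
attempt.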
Michael Brown [Mon, 27 Mar 2017 07:50:59 +0000 (10:50 +0300)]
[int13con] Avoid overwriting random portions of SAN boot disks
The INT13 console type (CONSOLE_INT13) autodetects at initialisation
time a magic partition to be used for logging iPXE console output. If
the INT13 drive number mapping is subsequently changed (e.g. because
iPXE was used to perform a SAN boot), then the console logging output
will be written to the incorrect disk.
Fix by recording the INT13 vector at initialisation time, and using
this original vector to emulate INT13 calls for all subsequent
accesses. This should be robust against drive remapping performed
either by ourselves or by another bootloader (e.g. a chainloaded
undionly.kpxe which then performs a SAN boot).
Michael Brown [Sun, 26 Mar 2017 18:03:50 +0000 (21:03 +0300)]
[int13] Improve geometry guessing for unaligned partitions
Some partition tables have partitions that are not aligned to a
cylinder boundary, which confuses the current geometry guessing logic.
Enhance the existing logic to ensure that we never reduce our guesses
for the number of heads or sectors per track, and add extra logic to
calculate the exact number of sectors per track if we find a partition
that starts within cylinder zero.
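The two rules described above can be sketched as follows. This is an
illustration only and not the int13 code itself: the structures and
update_guess() are hypothetical, and the exact calculation relies on
the standard CHS relation lba = ( ( cylinder * heads + head ) *
sectors_per_track ) + ( sector - 1 ), which within cylinder zero can be
solved for sectors_per_track whenever the head number is non-zero.

```c
#include <stdint.h>

/* Hypothetical running geometry guess */
struct geometry_guess {
	unsigned int heads;
	unsigned int sectors_per_track;
};

/* Hypothetical summary of one partition table entry */
struct partition_chs {
	unsigned int start_cylinder;
	unsigned int start_head;
	unsigned int start_sector;	/* 1-based, as in CHS addressing */
	uint32_t start_lba;
	unsigned int end_head;
	unsigned int end_sector;	/* 1-based */
};

/* Update the guess using one partition table entry */
static void update_guess ( struct geometry_guess *guess,
			   const struct partition_chs *part ) {

	/* Heuristic guesses are only ever increased */
	if ( ( part->end_head + 1 ) > guess->heads )
		guess->heads = ( part->end_head + 1 );
	if ( part->end_sector > guess->sectors_per_track )
		guess->sectors_per_track = part->end_sector;

	/* Within cylinder zero:
	 *
	 *   lba = ( head * sectors_per_track ) + ( sector - 1 )
	 *
	 * so an exact value replaces the heuristic guess whenever
	 * the starting head is non-zero.
	 */
	if ( ( part->start_cylinder == 0 ) && ( part->start_head != 0 ) ) {
		guess->sectors_per_track =
			( ( part->start_lba + 1 - part->start_sector ) /
			  part->start_head );
	}
}
```

For example, a partition starting at LBA 2048 with a starting CHS
address of 0/32/33 yields ( 2048 + 1 - 33 ) / 32 = 63 sectors per
track.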
Michael Brown [Sun, 26 Mar 2017 12:12:11 +0000 (15:12 +0300)]
[block] Add basic multipath support
Add basic support for multipath block devices. The "sanboot" and
"sanhook" commands now accept a list of SAN URIs. We open all URIs
concurrently. The first connection to become available for issuing
block device commands is marked as the active path and used for all
subsequent commands; all other connections are then closed. Whenever
the active path fails, we reopen all URIs and repeat the process.
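The path selection policy described above can be modelled in a few
lines. This is a simplified sketch under stated assumptions, not the
SAN device code: struct multipath_model and the model_*() functions
are hypothetical, and real paths are asynchronous connections rather
than boolean flags.

```c
#include <stdbool.h>

#define MAX_PATHS 4

/* Hypothetical model of a multipath device */
struct multipath_model {
	bool open[MAX_PATHS];	/* Is each path currently open? */
	unsigned int count;	/* Number of paths (URIs) */
	int active;		/* Index of active path, or -1 if none */
};

/* (Re)open all paths and forget any active path */
static void model_reopen ( struct multipath_model *dev ) {
	unsigned int i;

	for ( i = 0 ; i < dev->count ; i++ )
		dev->open[i] = true;
	dev->active = -1;
}

/* A path has become available for issuing block device commands */
static void model_path_ready ( struct multipath_model *dev,
			       unsigned int index ) {
	unsigned int i;

	/* Nothing to do if an active path has already been chosen */
	if ( dev->active >= 0 )
		return;

	/* Mark this path as active and close all other paths */
	dev->active = ( ( int ) index );
	for ( i = 0 ; i < dev->count ; i++ ) {
		if ( i != index )
			dev->open[i] = false;
	}
}

/* The active path has failed: reopen all URIs and start again */
static void model_path_failed ( struct multipath_model *dev ) {
	model_reopen ( dev );
}
```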
Michael Brown [Sun, 26 Mar 2017 12:42:52 +0000 (15:42 +0300)]
[block] Add dummy SAN device
Add a dummy SAN device which allows the "sanhook" command to be tested
even when no SAN booting capability is present on the platform. This
allows substantial portions of the SAN boot code to be run in Linux
under Valgrind.
Michael Brown [Sun, 26 Mar 2017 08:21:14 +0000 (11:21 +0300)]
[scsi] Avoid duplicate call to scsicmd_close() on TEST UNIT READY failure
When the TEST UNIT READY command receives an error response, the
shutdown of the command's block data interface will result in
scsidev_ready() closing the SCSI device. This will subsequently
result in a duplicate call to scsicmd_close(), leading to an assertion
failure when list_del() is called for the second time.
Fix by removing the command from the list of outstanding commands
before shutting down the command's interfaces.
Michael Brown [Thu, 23 Mar 2017 16:15:24 +0000 (18:15 +0200)]
[iobuf] Increase minimum I/O buffer size to 128 bytes
The eIPoIB translation layer needs to translate outbound ARP packets
from Ethernet to IPoIB. A 64-byte buffer (starting with the Ethernet
header) does not provide enough tailroom to expand to hold the two
20-byte IPoIB MAC addresses. The result is that an UNDI API user will
be unable to send ARP packets.
We could potentially shuffle the packet contents to reuse the space
occupied by the stripped Ethernet link-layer header, but this would
add complexity. Instead, fix by increasing the minimum allocation
size to 128 bytes.
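The size arithmetic behind this change can be checked with a short
sketch. It assumes the standard ARP layout (an 8-byte fixed header
followed by two hardware addresses and two protocol addresses), a
14-byte Ethernet header, and that the translated packet may occupy
only the space following the Ethernet header. The helper names are
hypothetical, and any IPoIB link-layer header is ignored.

```c
#include <stddef.h>

#define ETH_HEADER_LEN 14	/* Ethernet link-layer header */
#define ARP_FIXED_LEN 8		/* htype, ptype, hlen, plen, oper */
#define ETH_ADDR_LEN 6		/* Ethernet MAC address */
#define IPOIB_ADDR_LEN 20	/* IPoIB MAC address */
#define IPV4_ADDR_LEN 4		/* IPv4 address */

/* Length of an ARP packet for the given address lengths */
static size_t arp_len ( size_t hw_alen, size_t proto_alen ) {
	return ( ARP_FIXED_LEN + ( 2 * hw_alen ) + ( 2 * proto_alen ) );
}

/* Check whether a translated IPoIB ARP packet fits in the portion of
 * the buffer that follows the Ethernet header.
 */
static int translated_arp_fits ( size_t buf_len ) {
	return ( ( buf_len >= ETH_HEADER_LEN ) &&
		 ( ( buf_len - ETH_HEADER_LEN ) >=
		   arp_len ( IPOIB_ADDR_LEN, IPV4_ADDR_LEN ) ) );
}
```

An Ethernet ARP packet is 28 bytes and its IPoIB equivalent is 56
bytes. A 64-byte buffer leaves only 50 bytes after the Ethernet header,
whereas a 128-byte buffer leaves 114 bytes.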
Mike McCormack [Thu, 23 Mar 2017 15:54:03 +0000 (17:54 +0200)]
[sky2] Use 32-bit read to read Y2_VAUX_AVAIL
B0_CTST is a 24-bit register according to the vendor driver (sk98lin).
A 16-bit read on B0_CTST will always return 0 for Y2_VAUX_AVAIL
(1<<16), so use a 32-bit read when testing Y2_VAUX_AVAIL.
[This patch is copied directly from the Linux kernel tree.]
Signed-off-by: Mike McCormack <mikem@ring3k.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
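The effect of the read width can be demonstrated with a small
simulation. The sim_read16() and sim_read32() helpers are hypothetical
stand-ins for the driver's register accessors and simply truncate a
simulated register value. Only the bit position (1<<16) is taken from
the commit message.

```c
#include <stdint.h>

#define Y2_VAUX_AVAIL ( UINT32_C ( 1 ) << 16 )

/* Simulated 16-bit register read: bits 16 and above are discarded */
static uint16_t sim_read16 ( uint32_t reg_value ) {
	return ( ( uint16_t ) reg_value );
}

/* Simulated 32-bit register read: all bits are preserved */
static uint32_t sim_read32 ( uint32_t reg_value ) {
	return reg_value;
}

/* Test for Y2_VAUX_AVAIL using a 16-bit read (always zero) */
static int vaux_avail_16 ( uint32_t reg_value ) {
	return ( ( sim_read16 ( reg_value ) & Y2_VAUX_AVAIL ) != 0 );
}

/* Test for Y2_VAUX_AVAIL using a 32-bit read */
static int vaux_avail_32 ( uint32_t reg_value ) {
	return ( ( sim_read32 ( reg_value ) & Y2_VAUX_AVAIL ) != 0 );
}
```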
Michael Brown [Thu, 23 Mar 2017 15:35:10 +0000 (17:35 +0200)]
[pcnet32] Eliminate redundant register read
The value of ( ( x & 0x0c00 ) | 0x0c00 ) is always 0x0c00 regardless
of the value of x, and so the read_csr() is redundant. (There are no
read side effects for this register, according to the datasheet.)
This line of code originated in Linux kernel 2.3.19pre1 as
In the absence of commit messages, the intention of the code is
unclear. However, the logic resulting in a fixed value of 0x0c00 has
remained unaltered for over 17 years, and can probably be assumed to
have the correct overall result.
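The claim that the expression is constant is easy to verify. The
function name below is hypothetical and the sketch assumes a 16-bit
register value; it is not the driver code.

```c
#include <stdint.h>

/* Mask with 0x0c00 and then set the same bits: whatever survives the
 * mask is a subset of 0x0c00, so the OR always yields 0x0c00.
 */
static uint16_t mask_then_set ( uint16_t x ) {
	return ( ( uint16_t ) ( ( x & 0x0c00 ) | 0x0c00 ) );
}
```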
Michael Brown [Wed, 22 Mar 2017 18:20:53 +0000 (20:20 +0200)]
[travis] Add minimal .travis.yml file
Allow for automated builds via Travis CI (https://travis-ci.org).
Note that the bin-i386-linux build platform is deliberately omitted
since the required linux-libc-dev:i386 package is not on the allowed
packages list for the Travis 14.04 ("trusty") container environment.
Michael Brown [Wed, 22 Mar 2017 14:46:03 +0000 (16:46 +0200)]
[coverity] Add Coverity user model
Add a trivial model file to prevent Coverity from making various
incorrect assumptions about functions where the iPXE behaviour
diverges from POSIX or Linux norms.
Michael Brown [Wed, 22 Mar 2017 12:41:01 +0000 (14:41 +0200)]
[xen] Use standard calling pattern for asprintf()
Our asprintf() implementation guarantees that strp will be NULL on
allocation failure, but this is not standard behaviour. Detect errors
by checking for a negative return value instead of a NULL pointer.
Michael Brown [Wed, 22 Mar 2017 12:11:19 +0000 (14:11 +0200)]
[pixbuf] Avoid potential division by zero
Avoid potential division by zero when performing the check against
multiplication overflow. (Note that if the width is zero then there
can be no overflow anyway, so it is then safe to bypass the check.)
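A common form of this check is sketched below. It is a generic
illustration of a division-based multiplication overflow test with a
zero-width guard, not the pixbuf code itself; checked_area() is a
hypothetical name.

```c
#include <stddef.h>

/* Calculate ( width * height ), returning -1 on overflow */
static int checked_area ( size_t width, size_t height, size_t *area ) {
	size_t product = ( width * height );

	/* A zero width cannot overflow, and must not be used as a
	 * divisor, so the division is skipped in that case.
	 */
	if ( ( width != 0 ) && ( ( product / width ) != height ) )
		return -1;

	*area = product;
	return 0;
}
```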
Michael Brown [Wed, 22 Mar 2017 08:47:46 +0000 (10:47 +0200)]
[infiniband] Return status code from ib_create_cq() and ib_create_qp()
Any underlying errors arising during ib_create_cq() or ib_create_qp()
are lost since the functions simply return NULL on error. This makes
debugging harder, since a debug-enabled build is required to discover
the root cause of the error.
Fix by returning a status code from these functions, thereby allowing
any underlying errors to be propagated.
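The general shape of such a change is sketched below using a
hypothetical queue object. These are not the real ib_create_cq() or
ib_create_qp() signatures; the sketch only contrasts a constructor that
returns NULL on any failure with one that returns a status code and
passes the new object back via a pointer argument.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical object */
struct queue_model {
	unsigned int num_entries;
};

/* Old style: every failure collapses into a NULL return */
static struct queue_model * create_queue_old ( unsigned int num_entries ) {
	struct queue_model *queue;

	if ( num_entries == 0 )
		return NULL;
	queue = malloc ( sizeof ( *queue ) );
	if ( ! queue )
		return NULL;
	queue->num_entries = num_entries;
	return queue;
}

/* New style: the status code identifies the underlying error */
static int create_queue ( unsigned int num_entries,
			  struct queue_model **new_queue ) {
	struct queue_model *queue;

	if ( num_entries == 0 )
		return -EINVAL;
	queue = malloc ( sizeof ( *queue ) );
	if ( ! queue )
		return -ENOMEM;
	queue->num_entries = num_entries;
	*new_queue = queue;
	return 0;
}
```

A caller of the second form can report or propagate the status code
directly, without needing a debug-enabled build to learn why the
creation failed.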
Michael Brown [Wed, 22 Mar 2017 06:19:33 +0000 (08:19 +0200)]
[build] Avoid confusing sparse in single-argument DBG() macros
For visual consistency with surrounding lines, the definitions of
DBG_MORE(), DBG_PAUSE(), etc include an unnecessary ##__VA_ARGS__
argument which is always elided. This confuses sparse, which
complains about DBG_MORE_IF() being called with more than one
argument.
Work around this problem by adding an unused variable argument list to
the single-argument macros DBG_MORE_IF() and DBG_PAUSE_IF().
Michael Brown [Tue, 21 Mar 2017 13:07:10 +0000 (15:07 +0200)]
[eoib] Avoid passing a NULL I/O buffer to netdev_tx_complete_err()
Report errors in eoib_duplicate() via netdev_tx_err() rather than
netdev_tx_complete_err(), since netdev_tx_complete_err() accepts only
valid I/O buffers that are currently in the network device's transmit
queue.
Michael Brown [Tue, 21 Mar 2017 09:46:17 +0000 (11:46 +0200)]
[arbel] Avoid potential integer overflow when calculating memory mappings
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
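The difference between the two comparisons can be seen in a minimal
sketch. It assumes 32-bit unsigned arithmetic and a block size that
never exceeds the end address; the variable names are borrowed from
the description above and the function names are hypothetical. The
same reasoning applies to the equivalent change in the hermon driver.

```c
#include <stdint.h>

/* Naive check: ( high + size ) may wrap around to a small value */
static int block_fits_naive ( uint32_t high, uint32_t size,
			      uint32_t end ) {
	return ( ( ( uint32_t ) ( high + size ) ) <= end );
}

/* Safer check: ( end - size ) cannot underflow if size <= end */
static int block_fits ( uint32_t high, uint32_t size, uint32_t end ) {
	return ( high <= ( end - size ) );
}
```

With high = 0x80000000, size = 0x80000000 and end = 0x80001000, the
sum wraps to zero and the naive check incorrectly reports that the
block fits, while the rearranged check correctly rejects it.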
Michael Brown [Tue, 21 Mar 2017 09:46:17 +0000 (11:46 +0200)]
[hermon] Avoid potential integer overflow when calculating memory mappings
When the area to be mapped straddles the 2GB boundary, the expression
(high+size) will overflow on the first loop iteration. Fix by using
(end-size), which cannot underflow.
Michael Brown [Sun, 19 Mar 2017 15:57:24 +0000 (15:57 +0000)]
[undi] Move PXE API caller back into UNDI driver
As of commit 10d19bd ("[pxe] Always retrieve cached DHCPACK and apply
to relevant network device"), the UNDI driver has been the only user
of pxeparent_call(). Remove the unnecessary layer of abstraction by
refactoring this code back into undinet.c, and fix the ability of
undiisr.S to fall back to chaining to the original handler if we were
unable to unhook our own ISR.
This effectively reverts commit 337e1ed ("[pxe] Separate parent PXE
API caller from UNDINET driver").
Michael Brown [Sun, 19 Mar 2017 13:22:33 +0000 (13:22 +0000)]
[efi] Skip cable detection at initialisation where possible
We currently request cable detection in PXE_OPCODE_INITIALIZE to work
around buggy Emulex drivers (see commit c0b61ba ("[efi] Work around
bugs in Emulex NII driver")).
This causes problems with some other NII drivers (e.g. Mellanox),
which may time out if the underlying link is intrinsically slow to
come up.
Attempt to work around both problems simultaneously by requesting
cable detection only if the underlying NII driver does not support
link status reporting via PXE_OPCODE_GET_STATUS. (This is based on a
potentially incorrect assumption that the buggy Emulex drivers do not
claim to report link status via PXE_OPCODE_GET_STATUS.)
Michael Brown [Mon, 13 Mar 2017 12:18:46 +0000 (12:18 +0000)]
[efi] Provide ACPI table description for SAN devices
Provide a basic proof of concept ACPI table description (e.g. iBFT for
iSCSI) for SAN devices in a UEFI environment, using a control flow
that is functionally identical to that used in a BIOS environment.
Originally-implemented-by: Vishvananda Ishaya Abrams <vish.ishaya@oracle.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Some iSCSI targets send NOP-In. Rather than closing the connection
when we receive one, it is more user friendly to log a debug message
and keep the connection open. Eventually, it would be nice if iPXE
supported replying to NOP-Ins, but we might as well keep the
connection open until the target disconnects us.
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Thu, 9 Mar 2017 12:45:45 +0000 (12:45 +0000)]
[scsi] Avoid duplicate calls to scsicmd_close()
When a SCSI device is closed in error, the shutdown of the device's
block data interface will probably lead to any outstanding commands
being closed (by whichever object is currently connected to the block
data interface). However, commands remain in the list of outstanding
commands until the final reference is dropped. The result is that
scsidev_close() will make a second call to scsicmd_close() for each
command. This is harmless, but produces confusing debug messages.
Fix by treating the outstanding command list as holding an explicit
reference to each command, and removing the command from the list of
outstanding commands in scsicmd_close().
Michael Brown [Tue, 7 Mar 2017 16:11:22 +0000 (16:11 +0000)]
[block] Retry any SAN device operation
The SCSI layer currently implements a retry loop in order to retry
commands that fail due to spurious "error" conditions such as "power
on occurred". Move this retry loop to the generic SAN device layer:
this allows for retries due to other transient error conditions such as
an iSCSI target having dropped the connection due to inactivity.
Michael Brown [Mon, 6 Mar 2017 12:25:20 +0000 (12:25 +0000)]
[block] Centralise "san-drive" setting
The concept of the SAN drive number is meaningful only in a BIOS
environment, where it represents the INT13 drive number (0x80 for the
first hard disk). We retain this concept in a UEFI environment to
allow for a simple way for iPXE commands to refer to SAN drives.
Centralise the concept of the default drive number, since it is shared
between all supported environments.
The BGX context state for the LMAC must be reset using the
BGX_CMR_CONFIG register.
This solves a problem with network connectivity in Linux booted from
iPXE.
Signed-off-by: Bartosz Szczepanek <bartosz.szczepanek@cavium.com>
Modified-by: Michael Brown <mcb30@ipxe.org>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Thu, 2 Feb 2017 16:52:55 +0000 (16:52 +0000)]
[http] Cleanly shut down potentially looped interfaces
Use intfs_shutdown() and intfs_restart() to cleanly shut down multiple
interfaces that may loop back to the same object.
This fixes a regression introduced by commit daa8ed9 ("[interface]
Provide intf_reinit() to reinitialise nullified interfaces") which
broke the use of HTTP Basic and Digest authentication.
Reported-by: murmansk <murmansk@hotmail.com>
Reported-by: Brett Waldo <brettwaldo@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
Michael Brown [Thu, 2 Feb 2017 15:49:21 +0000 (15:49 +0000)]
[interface] Provide the ability to shut down multiple interfaces
Shutting down (and optionally restarting) multiple interfaces is
fraught with problems if there are loops in the interface connectivity
(e.g. the HTTP content-decoded and transfer-decoded interfaces, which
will generally loop back to each other). Various workarounds
currently exist across the codebase, generally involving preceding
calls to intf_nullify() to avoid problems due to known loops.
Provide intfs_shutdown() and intfs_restart() to allow all of an
object's interfaces to be shut down (or restarted) in a single call,
without having to worry about potential external loops.
Michael Brown [Thu, 26 Jan 2017 11:39:25 +0000 (11:39 +0000)]
[settings] Add "unixtime" builtin setting to expose the current time
Expose the current wall-clock time (in seconds since the Epoch), since
this is often useful in captured boot logs and can also be useful when
checking unexpected X.509 certificate validation failures.
Use a :uint32 setting to avoid Y2K38 rollover, thereby ensuring that
this will eventually be somebody else's problem.
Originally-implemented-by: Malte zu Klampen <malte@pclab.ifg.uni-kiel.de>
Originally-implemented-by: Richard Moore <rich@richud.com>
Tested-by: Esben Storgaard Nielsen <esn@solar.dk>
Signed-off-by: Christian Nilsson <nikize@gmail.com>
Signed-off-by: Michael Brown <mcb30@ipxe.org>
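The range difference between a signed and an unsigned 32-bit seconds
counter can be checked directly. The helper names are hypothetical. A
signed 32-bit count of seconds since the Epoch overflows after
2^31 - 1 seconds (in January 2038), whereas an unsigned 32-bit count
lasts until 2^32 - 1 seconds (in 2106).

```c
#include <stdint.h>

/* Can this time be represented as a signed 32-bit value? */
static int fits_int32 ( uint64_t seconds ) {
	return ( seconds <= ( ( uint64_t ) INT32_MAX ) );
}

/* Can this time be represented as an unsigned 32-bit value? */
static int fits_uint32 ( uint64_t seconds ) {
	return ( seconds <= ( ( uint64_t ) UINT32_MAX ) );
}
```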