device: handle broader range of errors in RoutineReceiveIncoming
RoutineReceiveIncoming exits immediately on net.ErrClosed,
but not on other errors. However, for errors that are known
to be permanent, such as syscall.EAFNOSUPPORT,
we may as well exit immediately instead of retrying.
This considerably speeds up the package device tests right now,
because the Bind sometimes (incorrectly) returns syscall.EAFNOSUPPORT
instead of net.ErrClosed.
Googlers have a habit of graffiting their name in TODO items that then
are never addressed, and other people won't go near those because
they're marked territory of another animal. I've been gradually cleaning
these up as I see them, but this commit just goes all the way and
removes the remaining stragglers.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
We're loosing our ownership of the port when bringing the device down,
which means another test process could reclaim it. Avoid this by
retrying for 4 seconds.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: make RoutineReadFromTUN keep encryption queue alive
RoutineReadFromTUN can trigger a call to SendStagedPackets.
SendStagedPackets attempts to protect against sending
on the encryption queue by checking peer.isRunning and device.isClosed.
However, those are subject to TOCTOU bugs.
If that happens, we get this:
goroutine 1254 [running]:
golang.zx2c4.com/wireguard/device.(*Peer).SendStagedPackets(0xc000798300)
.../wireguard-go/device/send.go:321 +0x125
golang.zx2c4.com/wireguard/device.(*Device).RoutineReadFromTUN(0xc000014780)
.../wireguard-go/device/send.go:271 +0x21c
created by golang.zx2c4.com/wireguard/device.NewDevice
.../wireguard-go/device/device.go:315 +0x298
Fix this with a simple, big hammer: Keep the encryption queue
alive as long as it might be written to.
The immediate motivation for this change is an observed deadlock.
1. A goroutine calls peer.Stop. That calls peer.queue.Lock().
2. Another goroutine is in RoutineSequentialReceiver.
It receives an elem from peer.queue.inbound.
3. The peer.Stop goroutine calls close(peer.queue.inbound),
close(peer.queue.outbound), and peer.stopping.Wait().
It blocks waiting for RoutineSequentialReceiver
and RoutineSequentialSender to exit.
4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets().
SendStagedPackets attempts peer.queue.RLock().
That blocks forever because the peer.Stop
goroutine holds a write lock on that mutex.
A background motivation for this change is that it can be expensive
to have a mutex in the hot code path of RoutineSequential*.
The mutex was necessary to avoid attempting to send elems on a closed channel.
This commit removes that danger by never closing the channel.
Instead, we send a sentinel nil value on the channel to indicate
to the receiver that it should exit.
The only problem with this is that if the receiver exits,
we could write an elem into the channel which would never get received.
If it never gets received, it cannot get returned to the device pools.
To work around this, we use a finalizer. When the channel can be GC'd,
the finalizer drains any remaining elements from the channel and
restores them to the device pool.
After that change, peer.queue.RWMutex no longer makes sense where it is.
It is only used to prevent concurrent calls to Start and Stop.
Move it to a more sensible location and make it a plain sync.Mutex.
device: remove device.state.stopping from RoutineTUNEventReader
The TUN event reader does three things: Change MTU, device up, and device down.
Changing the MTU after the device is closed does no harm.
Device up and device down don't make sense after the device is closed,
but we can check that condition before proceeding with changeState.
There's thus no reason to block device.Close on RoutineTUNEventReader exiting.
This commit simplifies device state management.
It creates a single unified state variable and documents its semantics.
It also makes state changes more atomic.
As an example of the sort of bug that occurred due to non-atomic state changes,
the following sequence of events used to occur approximately every 2.5 million test runs:
* RoutineTUNEventReader received an EventDown event.
* It called device.Down, which called device.setUpDown.
* That set device.state.changing, but did not yet attempt to lock device.state.Mutex.
* Test completion called device.Close.
* device.Close locked device.state.Mutex.
* device.Close blocked on a call to device.state.stopping.Wait.
* device.setUpDown then attempted to lock device.state.Mutex and blocked.
Deadlock results. setUpDown cannot progress because device.state.Mutex is locked.
Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done.
Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked.
As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked.
This commit fixes that deadlock by holding device.state.mu
when checking that the device is not closed.
The leak test had rare flakes.
If a system goroutine started at just the wrong moment, you'd get a false positive.
Instead of looping until the goroutines look good and then checking,
exit completely as soon as the number of goroutines looks good.
Also, check more frequently, in an attempt to complete faster.
device: benchmark the waitpool to compare it to the prior channels
Here is the old implementation:
type WaitPool struct {
c chan interface{}
}
func NewWaitPool(max uint32, new func() interface{}) *WaitPool {
p := &WaitPool{c: make(chan interface{}, max)}
for i := uint32(0); i < max; i++ {
p.c <- new()
}
return p
}
device: use linked list for per-peer allowed-ip traversal
This makes the IpcGet method much faster.
We also refactor the traversal API to use a callback so that we don't
need to allocate at all. Avoiding allocations we do self-masking on
insertion, which in turn means that split intermediate nodes require a
copy of the bits.
benchmark old ns/op new ns/op delta
BenchmarkUAPIGet-16 3243 2659 -18.01%
benchmark old allocs new allocs delta
BenchmarkUAPIGet-16 35 30 -14.29%
benchmark old bytes new bytes delta
BenchmarkUAPIGet-16 1218 737 -39.49%
This benchmark is good, though it's only for a pair of peers, each with
only one allowedips. As this grows, the delta expands considerably.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
device: combine debug and info log levels into 'verbose'
There are very few cases, if any, in which a user only wants one of
these levels, so combine it into a single level.
While we're at it, reduce indirection on the loggers by using an empty
function rather than a nil function pointer. It's not like we have
retpolines anyway, and we were always calling through a function with a
branch prior, so this seems like a net gain.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
The primary, motivating change is to use a function instead
of a *log.Logger as the basic unit of logging.
Using functions provides a lot more flexibility for
people to bring their own logging system.
It also introduces logging helper methods on Device.
These reduce line noise at the call site.
They also allow for log functions to be nil;
when nil, instead of generating a log line and throwing it away,
we don't bother generating it at all.
This spares allocation and pointless work.
This is a breaking change, although the fix required
of clients is fairly straightforward.