git.ipfire.org Git - thirdparty/wireguard-go.git/log
thirdparty/wireguard-go.git
4 years agotun: windows: protect reads from closing
Jason A. Donenfeld [Tue, 27 Apr 2021 02:22:45 +0000 (22:22 -0400)] 
tun: windows: protect reads from closing

The code previously used the old errors channel for checking, rather
than the simpler boolean, which caused issues on shutdown, since the
errors channel was meaningless. However, looking at this exposed a more
basic problem: Close() and all the other functions that check the closed
boolean can race. So protect with a basic RW lock, to ensure that
Close() waits for all pending operations to complete.
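
The locking pattern described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: tunDevice, errClosed, and the stubbed Read body are hypothetical names chosen for the example.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed is a hypothetical error value for this sketch.
var errClosed = errors.New("device closed")

// tunDevice is a hypothetical stand-in for a TUN device handle.
type tunDevice struct {
	mu     sync.RWMutex // held shared by operations, exclusively by Close
	closed bool
}

// Read takes the lock in shared mode, so several operations may run
// concurrently, but none of them can overlap with Close.
func (d *tunDevice) Read(buf []byte) (int, error) {
	d.mu.RLock()
	defer d.mu.RUnlock()
	if d.closed {
		return 0, errClosed
	}
	// A real implementation would read a packet into buf here.
	return 0, nil
}

// Close takes the lock exclusively, so it waits for pending operations
// to finish before marking the device closed.
func (d *tunDevice) Close() error {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.closed = true
	return nil
}

func main() {
	d := &tunDevice{}
	_, err := d.Read(make([]byte, 1500))
	fmt.Println(err)
	d.Close()
	_, err = d.Read(make([]byte, 1500))
	fmt.Println(err)
}
```

Because the closed flag is only read under RLock and only written under Lock, the check and the operation it guards cannot interleave with Close.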

Reported-by: Joshua Sjoding <joshua.sjoding@scjalliance.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: windows: do not error out when receiving UDP jumbogram
Jason A. Donenfeld [Tue, 27 Apr 2021 02:07:03 +0000 (22:07 -0400)] 
conn: windows: do not error out when receiving UDP jumbogram

If we receive a large UDP packet, don't return an error to receive.go,
which then terminates the receive loop. Instead, simply retry.

Considering Winsock's general finickiness, we might consider other
places where an attacker on the wire can generate error conditions like
this.

Reported-by: Sascha Dierberg <sascha.dierberg@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoversion: bump snapshot 0.0.20210424
Jason A. Donenfeld [Sat, 24 Apr 2021 17:07:27 +0000 (13:07 -0400)] 
version: bump snapshot

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: avoid OOB writes
Jason A. Donenfeld [Mon, 19 Apr 2021 21:10:23 +0000 (15:10 -0600)] 
tun: freebsd: avoid OOB writes

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: become controlling process when reopening tun FD
Jason A. Donenfeld [Mon, 19 Apr 2021 21:01:36 +0000 (15:01 -0600)] 
tun: freebsd: become controlling process when reopening tun FD

When we pass the TUN FD to the child, we have to call TUNSIFPID;
otherwise when we close the device, we get a splat in dmesg.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: restructure and cleanup
Jason A. Donenfeld [Mon, 19 Apr 2021 20:54:59 +0000 (14:54 -0600)] 
tun: freebsd: restructure and cleanup

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: remove horrific hack for getting tunnel name
Jason A. Donenfeld [Mon, 19 Apr 2021 02:26:32 +0000 (20:26 -0600)] 
tun: freebsd: remove horrific hack for getting tunnel name

As of FreeBSD 12.1, there's TUNGIFNAME.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: set IFF_MULTICAST for routing daemons
Jason A. Donenfeld [Mon, 19 Apr 2021 02:09:04 +0000 (20:09 -0600)] 
tun: freebsd: set IFF_MULTICAST for routing daemons

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agomain: print kernel warning on OpenBSD and FreeBSD too
Jason A. Donenfeld [Fri, 16 Apr 2021 05:32:44 +0000 (23:32 -0600)] 
main: print kernel warning on OpenBSD and FreeBSD too

More kernels!

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: don't defer unlocking from loop
Jason A. Donenfeld [Mon, 12 Apr 2021 22:19:35 +0000 (16:19 -0600)] 
device: don't defer unlocking from loop
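
For context, a deferred call runs when the enclosing function returns, not at the end of a loop iteration. The sketch below uses hypothetical peer and countAll names to show the explicit-unlock form; it is not taken from the commit.

```go
package main

import (
	"fmt"
	"sync"
)

// peer is a hypothetical type for this sketch.
type peer struct {
	mu      sync.Mutex
	packets int
}

// countAll locks each peer only for the duration of one iteration.
// Writing `defer p.mu.Unlock()` inside the loop would instead keep every
// lock held until countAll returns.
func countAll(peers []*peer) int {
	total := 0
	for _, p := range peers {
		p.mu.Lock()
		total += p.packets
		p.mu.Unlock()
	}
	return total
}

func main() {
	peers := []*peer{{packets: 2}, {packets: 3}}
	fmt.Println(countAll(peers)) // 5
}
```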

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: reconstruct v4 vs v6 receive function based on symtab
Jason A. Donenfeld [Fri, 9 Apr 2021 23:21:35 +0000 (17:21 -0600)] 
conn: reconstruct v4 vs v6 receive function based on symtab

This is kind of gross but it's better than the alternatives.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: allocate new buffer in receive death spiral
Kristupas Antanavičius [Mon, 12 Apr 2021 11:50:58 +0000 (14:50 +0300)] 
device: allocate new buffer in receive death spiral

Note: this bug is currently "hidden" because 6228659 ("device: handle
broader range of errors in RoutineReceiveIncoming") avoids the "death
spiral" code path.

If the code reached "death spiral" mechanism, there would be multiple
double frees happening. This results in a deadlock on iOS, because the
pools are fixed size and goroutine might stop until somebody makes
space in the pool.

This was almost 100% repro on the new ARM Macbooks:

- Build with 'ios' tag for Mac. This will enable bounded pools.
- Somehow call device.IpcSet at least a couple of times (update config)
- device.BindUpdate() would be triggered
- RoutineReceiveIncoming would enter "death spiral".
- RoutineReceiveIncoming would stall on double free (pool is already
  full)
- The stuck routine would deadlock 'device.closeBindLocked()' function
  on line 'netc.stopping.Wait()'

Signed-off-by: Kristupas Antanavičius <kristupas.antanavicius@nordsec.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: windows: reset ring to starting position after free
Jason A. Donenfeld [Sat, 10 Apr 2021 00:08:48 +0000 (18:08 -0600)] 
conn: windows: reset ring to starting position after free

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: windows: compare head and tail properly
Jason A. Donenfeld [Fri, 9 Apr 2021 00:17:59 +0000 (18:17 -0600)] 
conn: windows: compare head and tail properly

By not comparing these with the modulo, the ring became nearly never
full, resulting in completion queue buffers filling up prematurely.
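
A rough sketch of why the modulo matters, assuming free-running head and tail counters. The ring type, ringSize, and the method names are hypothetical and much simpler than the Windows ring buffer itself.

```go
package main

import "fmt"

const ringSize = 4 // hypothetical capacity

// ring tracks free-running head and tail counters; slots are addressed
// modulo ringSize.
type ring struct {
	head, tail uint32
	full       bool
}

// push claims the next slot and reports its index, or false when full.
func (r *ring) push() (uint32, bool) {
	if r.full {
		return 0, false
	}
	slot := r.tail % ringSize
	r.tail++
	// Compare positions modulo the ring size. Comparing the raw counters
	// would miss this case: a full ring has tail == head+ringSize, so the
	// raw values differ even though both point at the same slot.
	if r.tail%ringSize == r.head%ringSize {
		r.full = true
	}
	return slot, true
}

// pop releases the oldest slot and reports whether there was one.
func (r *ring) pop() bool {
	if !r.full && r.head%ringSize == r.tail%ringSize {
		return false // empty
	}
	r.head++
	r.full = false
	return true
}

func main() {
	r := &ring{}
	for i := 0; i < ringSize+1; i++ {
		slot, ok := r.push()
		fmt.Println(slot, ok)
	}
}
```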

Reported-by: Joshua Sjoding <joshua.sjoding@scjalliance.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agowinrio: test that IOCP-based RIO is supported
Jason A. Donenfeld [Tue, 6 Apr 2021 17:45:10 +0000 (11:45 -0600)] 
winrio: test that IOCP-based RIO is supported

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoall: make conn.Bind.Open return a slice of receive functions
Josh Bleecher Snyder [Wed, 31 Mar 2021 20:55:18 +0000 (13:55 -0700)] 
all: make conn.Bind.Open return a slice of receive functions

Instead of hard-coding exactly two sources from which
to receive packets (an IPv4 source and an IPv6 source),
allow the conn.Bind to specify a set of sources.

Beneficial consequences:

* If there's no IPv6 support on a system,
  conn.Bind.Open can choose not to return a receive function for it,
  which is simpler than tracking that state in the bind.
  This simplification removes existing data races from both
  conn.StdNetBind and bindtest.ChannelBind.
* If there are more than two sources on a system,
  the conn.Bind no longer needs to add a separate muxing layer.
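
The shape of this change can be illustrated with simplified types. ReceiveFunc, Bind, v4OnlyBind, and startReceivers below are hypothetical stand-ins; the real signatures in package conn carry more information (endpoints, for example).

```go
package main

import "fmt"

// ReceiveFunc and Bind are simplified, hypothetical shapes for this sketch.
type ReceiveFunc func(buf []byte) (n int, err error)

type Bind interface {
	Open(port uint16) (fns []ReceiveFunc, actualPort uint16, err error)
}

// v4OnlyBind models a system without IPv6 support.
type v4OnlyBind struct{}

func (v4OnlyBind) Open(port uint16) ([]ReceiveFunc, uint16, error) {
	recv4 := func(buf []byte) (int, error) { return copy(buf, "v4 packet"), nil }
	// No IPv6 here: return one function instead of tracking the
	// missing socket as state inside the bind.
	return []ReceiveFunc{recv4}, port, nil
}

// startReceivers shows the caller side: one receive loop per source,
// however many the bind reports.
func startReceivers(b Bind, port uint16) (int, error) {
	fns, _, err := b.Open(port)
	if err != nil {
		return 0, err
	}
	for _, fn := range fns {
		buf := make([]byte, 64)
		n, _ := fn(buf) // a real device would loop in one goroutine per fn
		fmt.Printf("received %q\n", buf[:n])
	}
	return len(fns), nil
}

func main() {
	n, err := startReceivers(v4OnlyBind{}, 51820)
	fmt.Println(n, err)
}
```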

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agoconn: winrio: pass key parameter into struct
Jason A. Donenfeld [Fri, 2 Apr 2021 16:36:41 +0000 (10:36 -0600)] 
conn: winrio: pass key parameter into struct

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: handle broader range of errors in RoutineReceiveIncoming
Josh Bleecher Snyder [Tue, 30 Mar 2021 19:36:59 +0000 (12:36 -0700)] 
device: handle broader range of errors in RoutineReceiveIncoming

RoutineReceiveIncoming exits immediately on net.ErrClosed,
but not on other errors. However, for errors that are known
to be permanent, such as syscall.EAFNOSUPPORT,
we may as well exit immediately instead of retrying.

This considerably speeds up the package device tests right now,
because the Bind sometimes (incorrectly) returns syscall.EAFNOSUPPORT
instead of net.ErrClosed.
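
A minimal sketch of such a check, assuming a helper named isPermanent (hypothetical) and an illustrative, incomplete set of permanent errors:

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"syscall"
)

// Example errors for the sketch; both names are hypothetical.
var (
	errWrapped   = fmt.Errorf("recv: %w", syscall.EAFNOSUPPORT)
	errTransient = errors.New("transient failure")
)

// isPermanent reports whether a receive error can never succeed on retry.
// The set of errors listed here is illustrative only.
func isPermanent(err error) bool {
	return errors.Is(err, net.ErrClosed) || errors.Is(err, syscall.EAFNOSUPPORT)
}

func main() {
	fmt.Println(isPermanent(errWrapped))    // true
	fmt.Println(isPermanent(net.ErrClosed)) // true
	fmt.Println(isPermanent(errTransient))  // false
}
```

A receive loop would exit when isPermanent returns true and keep retrying otherwise.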

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agoconn: document retry loop in StdNetBind.Open
Josh Bleecher Snyder [Mon, 29 Mar 2021 20:27:21 +0000 (13:27 -0700)] 
conn: document retry loop in StdNetBind.Open

It's not obvious on a first read what the loop is doing.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agoconn: use local ipvN vars in StdNetBind.Open
Josh Bleecher Snyder [Mon, 29 Mar 2021 20:21:06 +0000 (13:21 -0700)] 
conn: use local ipvN vars in StdNetBind.Open

This makes it clearer that they are fresh on each attempt,
and avoids the bookkeeping required to clear them on failure.

Also, remove an unnecessary err != nil.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agoconn: unify code in StdNetBind.Send
Josh Bleecher Snyder [Mon, 29 Mar 2021 20:11:11 +0000 (13:11 -0700)] 
conn: unify code in StdNetBind.Send

The sending code is identical for ipv4 and ipv6;
select the conn, then use it.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agodevice: rename unsafeCloseBind to closeBindLocked
Josh Bleecher Snyder [Mon, 29 Mar 2021 19:36:09 +0000 (12:36 -0700)] 
device: rename unsafeCloseBind to closeBindLocked

And document a bit.
This name is more idiomatic.

Signed-off-by: Josh Bleecher Snyder <josharian@gmail.com>
4 years agoversion: bump snapshot 0.0.20210323
Jason A. Donenfeld [Tue, 23 Mar 2021 19:07:19 +0000 (13:07 -0600)] 
version: bump snapshot

4 years agotun: freebsd: use broadcast mode instead of PPP mode
Jason A. Donenfeld [Tue, 23 Mar 2021 18:41:34 +0000 (12:41 -0600)] 
tun: freebsd: use broadcast mode instead of PPP mode

It makes the routing configuration simpler.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: signal to close device in separate routine
Jason A. Donenfeld [Thu, 11 Mar 2021 16:29:10 +0000 (09:29 -0700)] 
device: signal to close device in separate routine

Otherwise we wind up deadlocking.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: linux: do not spam events every second from hack listener
Jason A. Donenfeld [Thu, 11 Mar 2021 16:23:11 +0000 (09:23 -0700)] 
tun: linux: do not spam events every second from hack listener

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: freebsd: allow empty names
Kay Diam [Sun, 7 Mar 2021 16:21:31 +0000 (17:21 +0100)] 
tun: freebsd: allow empty names

This change allows the tun interface name to be omitted. When the
name is not set, the kernel automatically picks the tun name and
index.

Signed-off-by: Kay Diam <kay.diam@gmail.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agowinpipe: move syscalls into x/sys
Jason A. Donenfeld [Wed, 3 Mar 2021 11:26:59 +0000 (12:26 +0100)] 
winpipe: move syscalls into x/sys

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agomemmod: use resource functions from x/sys
Jason A. Donenfeld [Wed, 3 Mar 2021 14:05:19 +0000 (15:05 +0100)] 
memmod: use resource functions from x/sys

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agomemmod: do not use IsBadReadPtr
Jason A. Donenfeld [Wed, 3 Mar 2021 13:38:26 +0000 (14:38 +0100)] 
memmod: do not use IsBadReadPtr

It should be enough to check for the trailing zero name.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: linux: unexport mutex
Jason A. Donenfeld [Sat, 6 Mar 2021 16:20:46 +0000 (09:20 -0700)] 
conn: linux: unexport mutex

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agomod: bump x/sys
Jason A. Donenfeld [Fri, 5 Mar 2021 22:06:08 +0000 (15:06 -0700)] 
mod: bump x/sys

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agomod: rename COPYING to LICENSE
Jason A. Donenfeld [Sat, 6 Mar 2021 16:03:28 +0000 (09:03 -0700)] 
mod: rename COPYING to LICENSE

Otherwise the netstack module doesn't show up on the package site.

https://github.com/golang/go/issues/43817#issuecomment-764987580

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun/netstack: bump deps and api
Jason A. Donenfeld [Sat, 6 Mar 2021 15:21:18 +0000 (08:21 -0700)] 
tun/netstack: bump deps and api

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: get rid of peers.empty boolean in timersActive
Jason A. Donenfeld [Thu, 25 Feb 2021 11:28:53 +0000 (12:28 +0100)] 
device: get rid of peers.empty boolean in timersActive

There's no way for len(peers)==0 when a current peer has
isRunning==false.

This requires some struct reshuffling so that the uint64 pointer is
aligned.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: implement RIO for fast Windows UDP sockets
Jason A. Donenfeld [Mon, 22 Feb 2021 17:47:41 +0000 (18:47 +0100)] 
conn: implement RIO for fast Windows UDP sockets

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoglobal: remove TODO name graffiti
Jason A. Donenfeld [Mon, 22 Feb 2021 14:43:08 +0000 (15:43 +0100)] 
global: remove TODO name graffiti

Googlers have a habit of graffiting their name in TODO items that then
are never addressed, and other people won't go near those because
they're marked territory of another animal. I've been gradually cleaning
these up as I see them, but this commit just goes all the way and
removes the remaining stragglers.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: test up/down using virtual conn
Jason A. Donenfeld [Mon, 22 Feb 2021 03:30:31 +0000 (04:30 +0100)] 
device: test up/down using virtual conn

This prevents port clashing bugs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: cleanup unused test components
Jason A. Donenfeld [Mon, 22 Feb 2021 01:57:41 +0000 (02:57 +0100)] 
device: cleanup unused test components

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: make binds replacable
Jason A. Donenfeld [Mon, 22 Feb 2021 01:01:50 +0000 (02:01 +0100)] 
conn: make binds replacable

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: disable waitpool tests
Jason A. Donenfeld [Mon, 22 Feb 2021 14:12:03 +0000 (15:12 +0100)] 
device: disable waitpool tests

This code is stable, and the test is finicky, especially on high core
count systems, so just disable it.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: make NativeTun.Close well behaved, not crash on double close
Brad Fitzpatrick [Thu, 18 Feb 2021 22:53:22 +0000 (14:53 -0800)] 
tun: make NativeTun.Close well behaved, not crash on double close

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years agoREADME: bump document Go requirement to 1.16
Brad Fitzpatrick [Thu, 18 Feb 2021 22:42:04 +0000 (14:42 -0800)] 
README: bump document Go requirement to 1.16

Signed-off-by: Brad Fitzpatrick <bradfitz@tailscale.com>
4 years agoglobal: stop using ioutil
Jason A. Donenfeld [Wed, 17 Feb 2021 21:19:27 +0000 (22:19 +0100)] 
global: stop using ioutil

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: bump to 1.16 and get rid of NetErrClosed hack
Jason A. Donenfeld [Tue, 16 Feb 2021 20:05:25 +0000 (21:05 +0100)] 
conn: bump to 1.16 and get rid of NetErrClosed hack

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoversion: bump snapshot 0.0.20210212
Jason A. Donenfeld [Fri, 12 Feb 2021 17:00:59 +0000 (18:00 +0100)] 
version: bump snapshot

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: remove old version file
Jason A. Donenfeld [Fri, 12 Feb 2021 16:59:50 +0000 (17:59 +0100)] 
device: remove old version file

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agogitignore: remove old hacks
Jason A. Donenfeld [Thu, 11 Feb 2021 14:48:56 +0000 (15:48 +0100)] 
gitignore: remove old hacks

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: use container/list instead of open coding it
Jason A. Donenfeld [Wed, 10 Feb 2021 17:19:11 +0000 (18:19 +0100)] 
device: use container/list instead of open coding it

This linked list implementation is awful, but maybe Go 2 will help
eventually, and at least we're not open coding the hlist any more.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: retry Up() in up/down test
Jason A. Donenfeld [Wed, 10 Feb 2021 00:01:37 +0000 (01:01 +0100)] 
device: retry Up() in up/down test

We're losing our ownership of the port when bringing the device down,
which means another test process could reclaim it. Avoid this by
retrying for 4 seconds.
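
A bounded retry of this kind could look like the sketch below. The retry helper, errBusy, and the stubbed up function are hypothetical; only the 4 second budget comes from the message above.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// errBusy is a hypothetical error standing in for a port clash.
var errBusy = errors.New("address already in use")

// retry calls fn until it succeeds or the time budget is used up,
// sleeping between attempts.
func retry(total, interval time.Duration, fn func() error) error {
	deadline := time.Now().Add(total)
	for {
		err := fn()
		if err == nil {
			return nil
		}
		if time.Now().After(deadline) {
			return fmt.Errorf("giving up: %w", err)
		}
		time.Sleep(interval)
	}
}

func main() {
	attempts := 0
	up := func() error { // stands in for device.Up()
		attempts++
		if attempts < 3 {
			return errBusy
		}
		return nil
	}
	err := retry(4*time.Second, 10*time.Millisecond, up)
	fmt.Println(attempts, err) // 3 <nil>
}
```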

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: close old fd before trying again
Jason A. Donenfeld [Tue, 9 Feb 2021 23:43:31 +0000 (00:43 +0100)] 
conn: close old fd before trying again

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: flush peer queues before starting device
Jason A. Donenfeld [Tue, 9 Feb 2021 23:39:28 +0000 (00:39 +0100)] 
device: flush peer queues before starting device

In case some old packets snuck in there before, this flushes before
starting afresh.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: create peer queues at peer creation time
Jason A. Donenfeld [Tue, 9 Feb 2021 23:21:12 +0000 (00:21 +0100)] 
device: create peer queues at peer creation time

Rather than racing with Start(), since we're never destroying these
queues, we just set the variables at creation time.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: return error from Up() and Down()
Jason A. Donenfeld [Tue, 9 Feb 2021 23:12:23 +0000 (00:12 +0100)] 
device: return error from Up() and Down()

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agorwcancel: add an explicit close call
Jason A. Donenfeld [Tue, 9 Feb 2021 19:18:21 +0000 (20:18 +0100)] 
rwcancel: add an explicit close call

This lets us collect FDs even if the GC doesn't do it for us.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agorwcancel: use errors.Is for unwrapping
Jason A. Donenfeld [Tue, 9 Feb 2021 18:54:00 +0000 (19:54 +0100)] 
rwcancel: use errors.Is for unwrapping

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotun: use errors.Is for unwrapping
Jason A. Donenfeld [Tue, 9 Feb 2021 18:48:27 +0000 (19:48 +0100)] 
tun: use errors.Is for unwrapping

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agoconn: use errors.Is for unwrapping
Jason A. Donenfeld [Tue, 9 Feb 2021 18:46:57 +0000 (19:46 +0100)] 
conn: use errors.Is for unwrapping

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: handshake routine writes into encryption queue
Jason A. Donenfeld [Tue, 9 Feb 2021 18:26:45 +0000 (19:26 +0100)] 
device: handshake routine writes into encryption queue

Since RoutineHandshake calls peer.SendKeepalive(), it potentially is a
writer into the encryption queue, so we need to bump the wg count.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: make RoutineReadFromTUN keep encryption queue alive
Josh Bleecher Snyder [Tue, 9 Feb 2021 17:53:00 +0000 (09:53 -0800)] 
device: make RoutineReadFromTUN keep encryption queue alive

RoutineReadFromTUN can trigger a call to SendStagedPackets.
SendStagedPackets attempts to protect against sending
on the encryption queue by checking peer.isRunning and device.isClosed.
However, those are subject to TOCTOU bugs.

If that happens, we get this:

goroutine 1254 [running]:
golang.zx2c4.com/wireguard/device.(*Peer).SendStagedPackets(0xc000798300)
        .../wireguard-go/device/send.go:321 +0x125
golang.zx2c4.com/wireguard/device.(*Device).RoutineReadFromTUN(0xc000014780)
        .../wireguard-go/device/send.go:271 +0x21c
created by golang.zx2c4.com/wireguard/device.NewDevice
        .../wireguard-go/device/device.go:315 +0x298

Fix this with a simple, big hammer: Keep the encryption queue
alive as long as it might be written to.
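
One way to express "keep the queue alive while writers exist" is a wait group that gates the close. The queue type and demo function below are a hypothetical sketch of that idea, not the code from this commit.

```go
package main

import (
	"fmt"
	"sync"
)

// queue closes its channel only after every registered writer is done.
type queue struct {
	c  chan int
	wg sync.WaitGroup
}

func newQueue(size int) *queue {
	q := &queue{c: make(chan int, size)}
	q.wg.Add(1) // reference held by the owner, released on shutdown
	go func() {
		q.wg.Wait()
		close(q.c)
	}()
	return q
}

// demo registers one writer, lets the owner shut down early, and sums
// whatever the writer still manages to send.
func demo() int {
	q := newQueue(8)

	// A potential writer registers itself before it may send.
	q.wg.Add(1)
	go func() {
		defer q.wg.Done()
		for i := 0; i < 3; i++ {
			q.c <- i
		}
	}()

	q.wg.Done() // owner shuts down; the channel stays open until the writer exits

	sum := 0
	for v := range q.c {
		sum += v
	}
	return sum
}

func main() {
	fmt.Println(demo()) // 3
}
```

The writer never sends on a closed channel, because the close can only happen after its Done call.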

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agoconn: try harder to have v4 and v6 ports agree
Jason A. Donenfeld [Tue, 9 Feb 2021 17:45:12 +0000 (18:45 +0100)] 
conn: try harder to have v4 and v6 ports agree

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: only allocate peer queues once
Josh Bleecher Snyder [Tue, 9 Feb 2021 17:08:17 +0000 (09:08 -0800)] 
device: only allocate peer queues once

This serves two purposes.

First, it makes repeatedly stopping then starting a peer cheaper.
Second, it prevents a data race observed accessing the queues.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: clarify device.state.state docs (again)
Josh Bleecher Snyder [Tue, 9 Feb 2021 16:27:48 +0000 (08:27 -0800)] 
device: clarify device.state.state docs (again)

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: run fewer iterations in TestUpDown
Josh Bleecher Snyder [Tue, 9 Feb 2021 16:20:11 +0000 (08:20 -0800)] 
device: run fewer iterations in TestUpDown

The high iteration count was useful when TestUpDown
was the nexus of new bugs to investigate.

Now that it has stabilized, that's less valuable.
And it slows down running the tests and crowds out other tests.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: run fewer trials in TestWaitPool when race detector enabled
Josh Bleecher Snyder [Tue, 9 Feb 2021 16:18:47 +0000 (08:18 -0800)] 
device: run fewer trials in TestWaitPool when race detector enabled

On a many-core machine with the race detector enabled,
this test can take several minutes to complete.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: remove nil elem check in finalizers
Josh Bleecher Snyder [Tue, 9 Feb 2021 16:15:21 +0000 (08:15 -0800)] 
device: remove nil elem check in finalizers

This is not necessary, and removing it speeds up detection of UAF bugs.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: rename unsafeRemovePeer to removePeerLocked
Jason A. Donenfeld [Tue, 9 Feb 2021 15:11:33 +0000 (16:11 +0100)] 
device: rename unsafeRemovePeer to removePeerLocked

This matches the new naming scheme of upLocked and downLocked.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: remove deviceStateNew
Jason A. Donenfeld [Tue, 9 Feb 2021 14:39:19 +0000 (15:39 +0100)] 
device: remove deviceStateNew

It's never used and we won't have a use for it. Also, move to go-running
stringer, for those without GOPATHs.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: fix comment typo and shorten state.mu.Lock to state.Lock
Jason A. Donenfeld [Tue, 9 Feb 2021 14:35:43 +0000 (15:35 +0100)] 
device: fix comment typo and shorten state.mu.Lock to state.Lock

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: fix typo in comment
Jason A. Donenfeld [Tue, 9 Feb 2021 14:32:55 +0000 (15:32 +0100)] 
device: fix typo in comment

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: fix alignment on 32-bit machines and test for it
Jason A. Donenfeld [Tue, 9 Feb 2021 14:30:32 +0000 (15:30 +0100)] 
device: fix alignment on 32-bit machines and test for it

The test previously checked the offset within a substruct, not the
offset within the allocated struct, so this adds the two together.

It then fixes an alignment crash on 32-bit machines.
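
The offset arithmetic can be sketched as follows, assuming hypothetical stats and peer types. On 32-bit platforms, 64-bit atomic operations need 8-byte alignment, which the padding field here provides by hand.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"unsafe"
)

// stats is embedded at a non-zero offset inside peer, so the offset of
// rxBytes within stats alone says nothing about its final alignment.
type stats struct {
	rxBytes uint64 // accessed with sync/atomic
}

type peer struct {
	running bool
	_       [7]byte // hypothetical explicit padding up to an 8-byte boundary
	stats   stats
}

// rxOffset adds the offset of the substruct to the offset of the field
// inside it, giving the offset within the allocated struct.
func rxOffset() uintptr {
	var p peer
	return unsafe.Offsetof(p.stats) + unsafe.Offsetof(p.stats.rxBytes)
}

func main() {
	fmt.Println(rxOffset()%8 == 0) // true
	p := new(peer)
	atomic.AddUint64(&p.stats.rxBytes, 1)
	fmt.Println(atomic.LoadUint64(&p.stats.rxBytes)) // 1
}
```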

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: do not log on idempotent device state change
Jason A. Donenfeld [Tue, 9 Feb 2021 14:25:43 +0000 (15:25 +0100)] 
device: do not log on idempotent device state change

Part of being actually idempotent is that we shouldn't penalize code
that takes advantage of this property with a log splat.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: do not attach finalizer to non-returned object
Jason A. Donenfeld [Tue, 9 Feb 2021 14:09:50 +0000 (15:09 +0100)] 
device: do not attach finalizer to non-returned object

Before, the code attached a finalizer to an object that wasn't returned,
resulting in immediate garbage collection. Instead return the actual
pointer.
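
A minimal sketch of the corrected pattern, with a hypothetical drainQueue type. The finalizer is set on the same pointer that the constructor returns, so it can only run once callers have dropped that pointer.

```go
package main

import (
	"fmt"
	"runtime"
)

// drainQueue is a hypothetical wrapper whose finalizer flushes leftovers.
type drainQueue struct {
	c chan int
}

// newDrainQueue attaches the finalizer to the pointer it returns.
// Attaching it to a temporary wrapper and returning only the channel would
// leave the wrapper unreachable at once, so the finalizer could run while
// the channel is still in use.
func newDrainQueue(size int) *drainQueue {
	q := &drainQueue{c: make(chan int, size)}
	runtime.SetFinalizer(q, func(q *drainQueue) {
		for {
			select {
			case <-q.c: // a real implementation would return the element to a pool
			default:
				return
			}
		}
	})
	return q
}

func main() {
	q := newDrainQueue(4)
	q.c <- 1
	fmt.Println(len(q.c)) // 1
	runtime.KeepAlive(q)
}
```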

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: lock elem in autodraining queue before freeing
Jason A. Donenfeld [Tue, 9 Feb 2021 14:00:59 +0000 (15:00 +0100)] 
device: lock elem in autodraining queue before freeing

Without this, we wind up freeing packets that the encryption/decryption
queues still have, resulting in a UaF.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: remove listen port race in tests
Jason A. Donenfeld [Mon, 8 Feb 2021 23:59:39 +0000 (00:59 +0100)] 
device: remove listen port race in tests

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: generate test keys on the fly
Jason A. Donenfeld [Mon, 8 Feb 2021 23:33:18 +0000 (00:33 +0100)] 
device: generate test keys on the fly

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: remove mutex from Peer send/receive
Josh Bleecher Snyder [Mon, 8 Feb 2021 21:02:52 +0000 (13:02 -0800)] 
device: remove mutex from Peer send/receive

The immediate motivation for this change is an observed deadlock.

1. A goroutine calls peer.Stop. That calls peer.queue.Lock().
2. Another goroutine is in RoutineSequentialReceiver.
   It receives an elem from peer.queue.inbound.
3. The peer.Stop goroutine calls close(peer.queue.inbound),
   close(peer.queue.outbound), and peer.stopping.Wait().
   It blocks waiting for RoutineSequentialReceiver
   and RoutineSequentialSender to exit.
4. The RoutineSequentialReceiver goroutine calls peer.SendStagedPackets().
   SendStagedPackets attempts peer.queue.RLock().
   That blocks forever because the peer.Stop
   goroutine holds a write lock on that mutex.

A background motivation for this change is that it can be expensive
to have a mutex in the hot code path of RoutineSequential*.

The mutex was necessary to avoid attempting to send elems on a closed channel.
This commit removes that danger by never closing the channel.
Instead, we send a sentinel nil value on the channel to indicate
to the receiver that it should exit.
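
The sentinel approach can be sketched like this. The elem type, receiver, and demo are hypothetical and leave out the pooling that the real queues do.

```go
package main

import "fmt"

type elem struct{ id int }

// receiver drains the channel until it sees the nil sentinel. The channel
// itself is never closed, so a late sender cannot panic.
func receiver(c chan *elem, done chan<- int) {
	handled := 0
	for e := range c {
		if e == nil {
			break
		}
		handled++
	}
	done <- handled
}

// demo sends two elements followed by the sentinel and reports how many
// elements the receiver handled before exiting.
func demo() int {
	c := make(chan *elem, 4)
	done := make(chan int)
	go receiver(c, done)
	c <- &elem{1}
	c <- &elem{2}
	c <- nil // stop signal instead of close(c)
	return <-done
}

func main() {
	fmt.Println(demo()) // 2
}
```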

The only problem with this is that if the receiver exits,
we could write an elem into the channel which would never get received.
If it never gets received, it cannot get returned to the device pools.

To work around this, we use a finalizer. When the channel can be GC'd,
the finalizer drains any remaining elements from the channel and
restores them to the device pool.

After that change, peer.queue.RWMutex no longer makes sense where it is.
It is only used to prevent concurrent calls to Start and Stop.
Move it to a more sensible location and make it a plain sync.Mutex.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: create channels.go
Josh Bleecher Snyder [Mon, 8 Feb 2021 20:38:19 +0000 (12:38 -0800)] 
device: create channels.go

We have a bunch of stupid channel tricks, and I'm about to add more.
Give them their own file. This commit is 100% code movement.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: print direction when ping transit fails
Josh Bleecher Snyder [Mon, 8 Feb 2021 19:36:55 +0000 (11:36 -0800)] 
device: print direction when ping transit fails

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: separate timersInit from timersStart
Josh Bleecher Snyder [Mon, 8 Feb 2021 18:01:35 +0000 (10:01 -0800)] 
device: separate timersInit from timersStart

timersInit sets up the timers.
It need only be done once per peer.

timersStart does the work to prepare the timers
for a newly running peer. It needs to be done
every time a peer starts.

Separate the two and call them in the appropriate places.
This prevents data races on the peer's timers fields
when starting and stopping peers.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: don't track device interface state in RoutineTUNEventReader
Josh Bleecher Snyder [Thu, 21 Jan 2021 17:26:14 +0000 (09:26 -0800)] 
device: don't track device interface state in RoutineTUNEventReader

We already track this state elsewhere. No need to duplicate.
The cost of calling changeState is negligible.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: improve MTU change handling
Josh Bleecher Snyder [Thu, 21 Jan 2021 17:23:45 +0000 (09:23 -0800)] 
device: improve MTU change handling

The old code silently accepted negative MTUs.
It also set MTUs above the maximum.
It also had hard to follow deeply nested conditionals.

Add more paranoid handling,
and make the code more straight-line.
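
A straight-line validation in this spirit might look like the following. clampMTU and the maxContentSize value are assumptions made for the sketch, not the limits used by the device package.

```go
package main

import "fmt"

const maxContentSize = 65535 // hypothetical upper bound for this sketch

// clampMTU validates a reported MTU. Negative values are rejected and
// oversized values are capped, each case handled by an early return.
func clampMTU(mtu int) (int, error) {
	if mtu < 0 {
		return 0, fmt.Errorf("negative MTU %d", mtu)
	}
	if mtu > maxContentSize {
		return maxContentSize, nil
	}
	return mtu, nil
}

func main() {
	fmt.Println(clampMTU(1420))
	fmt.Println(clampMTU(-1))
	fmt.Println(clampMTU(1 << 20))
}
```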

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: remove device.state.stopping from RoutineTUNEventReader
Josh Bleecher Snyder [Mon, 8 Feb 2021 18:19:28 +0000 (10:19 -0800)] 
device: remove device.state.stopping from RoutineTUNEventReader

The TUN event reader does three things: Change MTU, device up, and device down.
Changing the MTU after the device is closed does no harm.
Device up and device down don't make sense after the device is closed,
but we can check that condition before proceeding with changeState.
There's thus no reason to block device.Close on RoutineTUNEventReader exiting.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: overhaul device state management
Josh Bleecher Snyder [Tue, 19 Jan 2021 17:02:16 +0000 (09:02 -0800)] 
device: overhaul device state management

This commit simplifies device state management.
It creates a single unified state variable and documents its semantics.

It also makes state changes more atomic.
As an example of the sort of bug that occurred due to non-atomic state changes,
the following sequence of events used to occur approximately every 2.5 million test runs:

* RoutineTUNEventReader received an EventDown event.
* It called device.Down, which called device.setUpDown.
* That set device.state.changing, but did not yet attempt to lock device.state.Mutex.
* Test completion called device.Close.
* device.Close locked device.state.Mutex.
* device.Close blocked on a call to device.state.stopping.Wait.
* device.setUpDown then attempted to lock device.state.Mutex and blocked.

Deadlock results. setUpDown cannot progress because device.state.Mutex is locked.
Until setUpDown returns, RoutineTUNEventReader cannot call device.state.stopping.Done.
Until device.state.stopping.Done gets called, device.state.stopping.Wait is blocked.
As long as device.state.stopping.Wait is blocked, device.state.Mutex cannot be unlocked.
This commit fixes that deadlock by holding device.state.mu
when checking that the device is not closed.
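
The core idea, checking for closure and changing state under one lock acquisition, can be sketched as below. The device type, state constants, and errClosed are hypothetical simplifications of the real state machine.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

type deviceState int

const (
	stateDown deviceState = iota
	stateUp
	stateClosed
)

var errClosed = errors.New("device closed")

type device struct {
	mu    sync.Mutex // guards state
	state deviceState
}

// changeState checks for closure and applies the transition while holding
// the same lock, so Close cannot slip in between the check and the change.
func (d *device) changeState(want deviceState) error {
	d.mu.Lock()
	defer d.mu.Unlock()
	if d.state == stateClosed {
		return errClosed
	}
	d.state = want
	return nil
}

// Close marks the device closed; later state changes are refused.
func (d *device) Close() {
	d.mu.Lock()
	defer d.mu.Unlock()
	d.state = stateClosed
}

func main() {
	d := &device{}
	fmt.Println(d.changeState(stateUp))
	d.Close()
	fmt.Println(d.changeState(stateDown))
}
```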

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: remove unnecessary zeroing in peer.SendKeepalive
Josh Bleecher Snyder [Mon, 8 Feb 2021 17:21:31 +0000 (09:21 -0800)] 
device: remove unnecessary zeroing in peer.SendKeepalive

elem.packet is always already nil.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: remove device.state.stopping from RoutineHandshake
Josh Bleecher Snyder [Wed, 3 Feb 2021 00:14:54 +0000 (16:14 -0800)] 
device: remove device.state.stopping from RoutineHandshake

It is no longer necessary.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: remove device.state.stopping from RoutineDecryption
Josh Bleecher Snyder [Tue, 19 Jan 2021 19:10:05 +0000 (11:10 -0800)] 
device: remove device.state.stopping from RoutineDecryption

It is no longer necessary, as of 454de6f3e64abd2a7bf9201579cd92eea5280996
(device: use channel close to shut down and drain decryption channel).

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agomain: add back version file
Jason A. Donenfeld [Thu, 4 Feb 2021 14:33:04 +0000 (15:33 +0100)] 
main: add back version file

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agotai64n: add string representation for error messages
Jason A. Donenfeld [Wed, 3 Feb 2021 16:56:46 +0000 (17:56 +0100)] 
tai64n: add string representation for error messages

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: take peer handshake when reinitializing last sent handshake
Jason A. Donenfeld [Wed, 3 Feb 2021 16:52:31 +0000 (17:52 +0100)] 
device: take peer handshake when reinitializing last sent handshake

This papers over other unrelated races, unfortunately.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: fix goroutine leak test
Josh Bleecher Snyder [Wed, 3 Feb 2021 16:26:27 +0000 (08:26 -0800)] 
device: fix goroutine leak test

The leak test had rare flakes.
If a system goroutine started at just the wrong moment, you'd get a false positive.
Instead of looping until the goroutines look good and then checking,
exit completely as soon as the number of goroutines looks good.
Also, check more frequently, in an attempt to complete faster.

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: add up/down stress test
Jason A. Donenfeld [Wed, 3 Feb 2021 16:43:41 +0000 (17:43 +0100)] 
device: add up/down stress test

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: pass cfg strings around in tests instead of reader
Jason A. Donenfeld [Wed, 3 Feb 2021 16:29:01 +0000 (17:29 +0100)] 
device: pass cfg strings around in tests instead of reader

This makes it easier to tag things onto the end manually for quick hacks.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: benchmark the waitpool to compare it to the prior channels
Jason A. Donenfeld [Wed, 3 Feb 2021 15:54:45 +0000 (16:54 +0100)] 
device: benchmark the waitpool to compare it to the prior channels

Here is the old implementation:

    type WaitPool struct {
        c chan interface{}
    }

    func NewWaitPool(max uint32, new func() interface{}) *WaitPool {
        p := &WaitPool{c: make(chan interface{}, max)}
        for i := uint32(0); i < max; i++ {
            p.c <- new()
        }
        return p
    }

    func (p *WaitPool) Get() interface{} {
        return <-p.c
    }

    func (p *WaitPool) Put(x interface{}) {
        p.c <- x
    }

It performs worse than the new one:

    name         old time/op  new time/op  delta
    WaitPool-16  16.4µs ± 5%  15.1µs ± 3%  -7.86%  (p=0.008 n=5+5)

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: test that we do not leak goroutines
Josh Bleecher Snyder [Tue, 2 Feb 2021 18:41:20 +0000 (10:41 -0800)] 
device: test that we do not leak goroutines

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: tie encryption queue lifetime to the peers that write to it
Josh Bleecher Snyder [Tue, 2 Feb 2021 18:46:34 +0000 (10:46 -0800)] 
device: tie encryption queue lifetime to the peers that write to it

Signed-off-by: Josh Bleecher Snyder <josh@tailscale.com>
4 years agodevice: use a waiting sync.Pool instead of a channel
Jason A. Donenfeld [Tue, 2 Feb 2021 17:37:49 +0000 (18:37 +0100)] 
device: use a waiting sync.Pool instead of a channel

Channels are FIFO which means we have guaranteed cache misses.
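
For illustration, a bounded pool can be layered on sync.Pool with a counter and a condition variable. The sketch below is one possible shape and not necessarily the code from this commit; it guards the counter with a mutex for simplicity.

```go
package main

import (
	"fmt"
	"sync"
)

// WaitPool hands out at most max items at a time. Get blocks while the
// limit is reached. Storage is delegated to sync.Pool, which does not
// impose FIFO ordering on the items it returns.
type WaitPool struct {
	pool  sync.Pool
	lock  sync.Mutex
	cond  *sync.Cond
	count uint32 // items currently handed out; guarded by lock
	max   uint32
}

func NewWaitPool(max uint32, new func() interface{}) *WaitPool {
	p := &WaitPool{pool: sync.Pool{New: new}, max: max}
	p.cond = sync.NewCond(&p.lock)
	return p
}

// Get waits until fewer than max items are outstanding, then takes one.
func (p *WaitPool) Get() interface{} {
	p.lock.Lock()
	for p.count >= p.max {
		p.cond.Wait()
	}
	p.count++
	p.lock.Unlock()
	return p.pool.Get()
}

// Put returns an item and wakes one waiting Get, if any.
func (p *WaitPool) Put(x interface{}) {
	p.pool.Put(x)
	p.lock.Lock()
	p.count--
	p.lock.Unlock()
	p.cond.Signal()
}

func main() {
	p := NewWaitPool(2, func() interface{} { return new([1500]byte) })
	a, b := p.Get(), p.Get()
	p.Put(a)
	p.Put(b)
	fmt.Println(p.count) // 0
}
```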

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: reduce number of append calls when padding
Jason A. Donenfeld [Fri, 29 Jan 2021 19:10:48 +0000 (20:10 +0100)] 
device: reduce number of append calls when padding

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: use int64 instead of atomic.Value for time stamp
Jason A. Donenfeld [Fri, 29 Jan 2021 17:54:19 +0000 (18:54 +0100)] 
device: use int64 instead of atomic.Value for time stamp
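
As a rough illustration of the idea, a timestamp can be kept as Unix nanoseconds in an int64 and accessed through sync/atomic. The peerTimes type and its methods are hypothetical names for this sketch.

```go
package main

import (
	"fmt"
	"sync/atomic"
	"time"
)

// peerTimes stores a timestamp as Unix nanoseconds in a plain int64,
// which sync/atomic can load and store without the interface boxing
// that atomic.Value involves.
type peerTimes struct {
	lastHandshakeNano int64 // accessed atomically; kept first for alignment
}

func (p *peerTimes) touch(t time.Time) {
	atomic.StoreInt64(&p.lastHandshakeNano, t.UnixNano())
}

func (p *peerTimes) last() time.Time {
	return time.Unix(0, atomic.LoadInt64(&p.lastHandshakeNano))
}

// roundTrip stores the current time and checks that it reads back as
// the same instant.
func roundTrip() bool {
	var p peerTimes
	now := time.Now()
	p.touch(now)
	return p.last().Equal(now)
}

func main() {
	fmt.Println(roundTrip()) // true
}
```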

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
4 years agodevice: use new model queues for handshakes
Jason A. Donenfeld [Fri, 29 Jan 2021 17:24:45 +0000 (18:24 +0100)] 
device: use new model queues for handshakes

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>