Timo Sirainen [Mon, 24 Oct 2016 21:22:20 +0000 (00:22 +0300)]
director: Prevent race conditions by adding USER_KILL_STATE_FLUSHING
In theory it's possible that a user is freed during a flush and added back
before flush is finished, possibly even being moved again. This check makes
sure that we don't finish such move unless we're actually at the correct
flushing state. (If there's another flush also running for the user it'll
be ignored.)
Timo Sirainen [Mon, 24 Oct 2016 21:13:23 +0000 (00:13 +0300)]
director: If user host conflict is detected, make sure new host is sent back.
USER-KICK-HASH was sent, but the sender didn't get back a USER reply with
the new host. This could have increased how long the user's host differed
between directors.
Avoids repeating this error:
Error: User hash 2957018085 is being redirected to two hosts: 10.0.0.30 and 10.0.0.201 (old_ts=1477338836)
Timo Sirainen [Mon, 24 Oct 2016 19:41:25 +0000 (22:41 +0300)]
director: HOST-RESET-USERS moves users more slowly now.
By default only 100 users can be moved in parallel. This can be overridden
with HOST-RESET-USERS parameter.
This delay is especially useful when director_flush_socket is used to
avoid huge floods to the script service. Even without the socket it's still
good for avoiding unnecessary load spikes when all users are kicked at once
and they reconnect back at the same time.
Timo Sirainen [Mon, 24 Oct 2016 19:22:28 +0000 (22:22 +0300)]
director: Fix sending up/down state in handshakes.
They were never sent, because HOSTs were sent before director had waited for
the remote to send its version number. So sender thought that the remote's
minor_version was too old and it didn't send the up/down state at all.
This caused errors like:
Warning: director(10.0.0.30:9090/left): Host 10.0.0.30 is being updated before previous update had finished (up -> down) - setting to state=down vhosts=100
Error: director(10.0.0.30:9090/left): Director 10.0.0.30 SYNC request hosts don't match us - resending hosts (seq=6, remote hosts_hash=262126213, my hosts_hash=2458934259)
Timo Sirainen [Mon, 24 Oct 2016 17:16:57 +0000 (20:16 +0300)]
director: Make sure IP address parsing works in DIRECTOR-ADD/REMOVE
We were passing the entire string through to net_addr2ip(). It seems that
inet_aton() stops at whitespace though, so this wasn't actually causing
errors at least on Linux.
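
The idea of the fix can be sketched in plain C: isolate the address token
first and hand only that to the parser. This is an illustration, not the
director code. It uses POSIX inet_pton() rather than Dovecot's
net_addr2ip(), handles IPv4 only, and the helper name parse_first_ip() and
the whitespace delimiter set are assumptions.

```c
#include <arpa/inet.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: parse only the first whitespace-delimited token of
   args as an IPv4 address. Returns true on success. */
static bool parse_first_ip(const char *args, struct in_addr *ip_r)
{
	char token[INET_ADDRSTRLEN];
	size_t len = strcspn(args, " \t\r\n");

	if (len == 0 || len >= sizeof(token))
		return false;
	memcpy(token, args, len);
	token[len] = '\0';
	/* inet_pton() is strict: trailing data after the address makes it
	   fail, so the token has to be isolated before the call. */
	return inet_pton(AF_INET, token, ip_r) == 1;
}
```

Unlike inet_aton(), a strict parser such as inet_pton() rejects the whole
"ip<TAB>port" string, which is why relying on the lenient behavior is
fragile.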
Aki Tuomi [Tue, 25 Oct 2016 07:29:50 +0000 (10:29 +0300)]
imap-login: Skip NIL value in ID handler
A NIL value can cause a hard crash, depending on what
the key is. For x-proxy-ttl, NIL will crash
on any system; x-originating-ip will crash on
some nss versions (e.g. centos 6.7).
The mitigating factor here is that the NIL value is only
accepted from trusted networks.
Timo Sirainen [Mon, 24 Oct 2016 10:08:47 +0000 (13:08 +0300)]
lib-http: Add ioloop and lock wait information to timeout messages.
It'll now log for example:
9007 Request timed out (Request sent 7.087 secs ago, 0.076 in other ioloops, 7.012 in locks, connected 7.087 secs ago)
Which points out that the problem wasn't really with the HTTP, but with
locking. This likely should be fixed in some way also in lib-http so that
it gives a bit of extra time for reading the request, but that's a separate
fix.
Timo Sirainen [Mon, 24 Oct 2016 09:17:44 +0000 (12:17 +0300)]
dict-client: Fix logging how much time was spent in other ioloops in slow lookup warnings.
The warning's idea is to show why the lookup could have been slow. We
differentiate between time spent in dict_wait() waiting only for the
dict result and time spent in other ioloops waiting for potentially other
things as well (and time spent waiting for locks during this time).
The previous code didn't work right when multiple ioloops were used, which
happened sometimes.
Also changed %u to %d just in case some calculation is wrong. It's nicer to
get a slightly negative value rather than a huge positive one.
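
The difference between the two format specifiers can be shown with a small
standalone C sketch. It is not the dict-client code; format_msecs() is a
hypothetical helper that formats the same slightly wrong value both ways.

```c
#include <stdio.h>

/* Hypothetical helper: format the same int with %d and with %u so the two
   results can be compared. Both buffers must hold at least size bytes. */
static void format_msecs(int msecs, char *as_d, char *as_u, size_t size)
{
	snprintf(as_d, size, "%d", msecs);
	/* Explicit cast so that passing the value to %u is well defined. */
	snprintf(as_u, size, "%u", (unsigned int)msecs);
}
```

With a 32-bit int, a miscalculated value of -3 is formatted as "-3" by %d
but as "4294967293" by %u, which is much harder to recognize as a small
error in a log line.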
Aki Tuomi [Thu, 13 Oct 2016 13:11:48 +0000 (16:11 +0300)]
director: Support flush socket
This allows specifying a URI to execute
on user kill. It can be of form
exec:/path/to/bin, unix:/path/to/socket or
tcp:ip:port
FLUSH username-hash is sent to the location
for each killed user. You can execute some
action there, and you are expected to
return '+\nOK\n' as reply once you are
done.
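
A minimal sketch of the receiving side's request handling, in C. Only the
FLUSH keyword and the '+\nOK\n' reply come from the description above. The
single-space separator, the decimal hash, the newline terminator and the
function name handle_flush_request() are assumptions for illustration.

```c
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical handler: parse one "FLUSH <username-hash>" request line and
   write the reply into reply. Returns true if the request was understood. */
static bool handle_flush_request(const char *line, unsigned int *hash_r,
				 char *reply, size_t reply_size)
{
	const char *arg;
	char *end;
	unsigned long hash;

	if (strncmp(line, "FLUSH ", 6) != 0)
		return false;
	arg = line + 6;
	/* Assumed format: the hash is a plain decimal number. */
	if (*arg < '0' || *arg > '9')
		return false;
	hash = strtoul(arg, &end, 10);
	if (*end != '\0' && *end != '\n')
		return false;
	*hash_r = (unsigned int)hash;
	/* Site-specific cleanup for this user hash would run here, before
	   the reply is produced. */
	snprintf(reply, reply_size, "+\nOK\n");
	return true;
}
```

The reply is written only after the action has completed, since the
director treats it as the signal that the flush is done.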
Timo Sirainen [Fri, 21 Oct 2016 11:34:47 +0000 (14:34 +0300)]
director: Fix shutdown_clients=no to not break
The director process must shut down even with shutdown_clients=no.
Otherwise the two director processes will keep competing with each
other and log errors like:
director: Warning: Director 10.0.0.123:9090/right disconnected us with reason: Replacing with new incoming connection
director: Warning: Director 10.0.0.123:9090/right disconnected us with reason: Replacing with 10.0.0.124:9090
Timo Sirainen [Thu, 20 Oct 2016 21:25:20 +0000 (00:25 +0300)]
imap-hibernate: Fix "DONE" handling.
1. If only "DONE\r\n" was sent, it randomly failed with BAD because of
out-of-bounds buffer read.
2. If "DONE\r\n" was followed by a command tag but no space afterwards, we
kept waiting for the input to continue. But since the DONE was already sent,
we should break the IDLE already at that point without any further waiting.
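
Both points can be illustrated with a bounds-checked prefix test. This is a
simplified sketch, not the imap-hibernate code: the function name
idle_done_received() is hypothetical, and the comparison is case-sensitive
for brevity.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: returns true as soon as the buffer begins with a
   complete "DONE\r\n" line, no matter what follows it. Never reads more
   than size bytes. */
static bool idle_done_received(const unsigned char *data, size_t size)
{
	static const char done[] = "DONE\r\n";
	const size_t done_len = sizeof(done) - 1;

	if (size < done_len)
		return false; /* need more input; don't read past the buffer */
	return memcmp(data, done, done_len) == 0;
}
```

Because the check looks only at the first six bytes, a partial command tag
after "DONE\r\n" does not delay leaving IDLE.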
Timo Sirainen [Thu, 20 Oct 2016 14:45:44 +0000 (17:45 +0300)]
global: Replaced t_strsplit_tab() calls with t_strsplit_tabescaped()
This is useful especially in auth code to support LFs in extra fields.
Other pieces of code were also tab-escaping strings, but never unescaping
them. Usually it didn't matter, because nobody would use the escaped
characters. Still, the code wasn't exactly behaving correctly.
One downside to this change is that it's now possible to pass TABs,
CRs and LFs through the various protocols. In theory this shouldn't cause
any problems, but combined with other bugs this could trigger some security
problems.
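
Why escaping without unescaping is incorrect can be seen in a small
round-trip sketch in C. The encoding here (byte \001 followed by 't', 'r',
'n' or '1') is an assumption chosen for illustration, and tab_escape() and
tab_unescape() are hypothetical helpers, not Dovecot's functions.

```c
#include <stddef.h>

/* Escape src into dest: TAB, CR, LF and the escape byte \001 itself are
   written as \001 followed by 't', 'r', 'n' or '1'. dest must have room
   for 2*strlen(src)+1 bytes. */
static void tab_escape(const char *src, char *dest)
{
	for (; *src != '\0'; src++) {
		switch (*src) {
		case '\001': *dest++ = '\001'; *dest++ = '1'; break;
		case '\t': *dest++ = '\001'; *dest++ = 't'; break;
		case '\r': *dest++ = '\001'; *dest++ = 'r'; break;
		case '\n': *dest++ = '\001'; *dest++ = 'n'; break;
		default: *dest++ = *src; break;
		}
	}
	*dest = '\0';
}

/* Reverse of tab_escape(), done in place. An unknown escape keeps the
   byte that follows it; a truncated escape at the end is dropped. */
static void tab_unescape(char *str)
{
	char *dest = str;

	for (; *str != '\0'; str++) {
		if (*str != '\001') {
			*dest++ = *str;
			continue;
		}
		str++;
		if (*str == '\0')
			break;
		switch (*str) {
		case '1': *dest++ = '\001'; break;
		case 't': *dest++ = '\t'; break;
		case 'r': *dest++ = '\r'; break;
		case 'n': *dest++ = '\n'; break;
		default: *dest++ = *str; break;
		}
	}
	*dest = '\0';
}
```

A reader that splits on TAB but skips the unescape step would see the
two-byte escape sequences in the field value instead of the original TAB or
LF.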
Timo Sirainen [Thu, 20 Oct 2016 07:11:53 +0000 (10:11 +0300)]
login proxy: Hanging outgoing SSL connections caused using already-freed memory
This mainly happened when login proxy closed the connection due to connect
timeout. The ssl-proxy still had a reference and existed for a longer time.
If SSL handshake still succeeded afterwards, it now called
login_proxy_ssl_handshaked(), which accessed the already-freed proxy and
likely crashed.
Fixed the ssl-client proxy code specifically. Alternatively ssl_proxy_free()
could be calling ssl_proxy_destroy() always, but since ssl-server side of
the code seems to have been working fine, I don't want to accidentally
break it.
Aki Tuomi [Wed, 19 Oct 2016 15:44:35 +0000 (18:44 +0300)]
lib: Add drop_setuid_root for restrict_access
drop_setuid_root, when set to true, will detect
and try to drop getuid()==0. This is done by
recovering current effective UID to set->uid
if set->uid == -1, and then doing seteuid(0).
It will also drop out any other extra privileges,
such as extra groups not requested for.
lib-storage: clean up mailbox_list_create to improve readability
There is no reason to use mailbox_list_driver_find() here instead of
mailbox_list_find_class() as (1) we do not need the index into the list
driver array, and (2) dealing with double-pointers is harder than regular
pointers.