Stephan Bosch [Thu, 15 Sep 2016 00:09:47 +0000 (02:09 +0200)]
lib-http: client: Unlink all queues from peer when it is disconnected.
Before, queues were only destroyed when the whole client was destroyed.
This change and subsequent changes prepare for being able to destroy a queue when it becomes unused.
Stephan Bosch [Wed, 14 Sep 2016 19:37:38 +0000 (21:37 +0200)]
lib-http: client: If a peer object is no longer linked to a queue, don't close it until all connections are inactive.
The peer object is canceled, rather than closed. Which means that any newly started and idle connections are closed immediately.
Requests may be pending though.
This is only relevant when hosts/queues are removed at some point.
This is a preparational change for having a maximum lifetime on hosts/queues, in which case this becomes a possibility.
Timo Sirainen [Tue, 1 Nov 2016 10:43:03 +0000 (12:43 +0200)]
lib-storage: Don't enable modseqs on STATUS_HIGHESTMODSEQ.
This is requested always by IMAP's SELECT command even when the IMAP client
hasn't requested it. We don't want to unnecessarily enable modseqs that use up
memory and disk space when they're not really needed. Other callers may also be
interested in asking for HIGHESTMODSEQ (which is nowadays actually always
available) without enabling full modseq tracking.
Aki Tuomi [Mon, 31 Oct 2016 13:37:00 +0000 (15:37 +0200)]
lib-test: Do not init/deinit library twice
If lib is initialized, do not initialize
or deinitialize library. This is done to
allow using master_service in unit tests
which also wants to perform lib init
and deinit itself.
Paul Howarth [Fri, 28 Oct 2016 14:10:16 +0000 (17:10 +0300)]
configure: Improve check for OpenSSL without EC support
The original test was for EC_KEY_new but some systems had that and not
EVP_PKEY_CTX_new_id, so the test was switched to that function.
However, Fedora releases 12 through 17 have EVP_PKEY_CTX_new_id but
not EC_KEY_new. So we need to test for both functions before enabling
the dcrypt build.
Timo Sirainen [Fri, 21 Oct 2016 11:34:47 +0000 (14:34 +0300)]
director: Fix shutdown_clients=no to not break
The director process must shut down even with with shutdown_clients=no.
Otherwise the two director processes will try to keep competing with each
others and log errors like:
director: Warning: Director 10.0.0.123:9090/right disconnected us with reason: Replacing with new incoming connection
director: Warning: Director 10.0.0.123:9090/right disconnected us with reason: Replacing with 10.0.0.124:9090
Timo Sirainen [Thu, 20 Oct 2016 21:25:20 +0000 (00:25 +0300)]
imap-hibernate: Fix "DONE" handling.
1. If only "DONE\r\n" was sent, it randomly failed with BAD because of
out-of-bounds buffer read.
2. If "DONE\r\n" was followed by a command tag but no space afterwards, we
kept waiting for the input to continue. But since the DONE was already sent,
we should break the IDLE already at that point without any further waiting.
Timo Sirainen [Wed, 26 Oct 2016 15:06:36 +0000 (18:06 +0300)]
*-login: Removed enforcing maximum calculated fd limit.
Just use the regular ulimit. Login process has become complicated enough
that counting the exact fd size isn't so easy anymore.
Also apparently this low fd limit is causing errors with new Linux kernels:
pop3-login: Error: fd_send(pop3, 18) failed: Too many references: cannot splice
Timo Sirainen [Sun, 16 Oct 2016 22:07:50 +0000 (01:07 +0300)]
director: Moved all user killing state to struct director_kill_context
This should make it a bit easier to understand the life time of user
killing. It also simplifies code by removing struct
director_user_kill_finish_ctx.
Finally, this already reduces memory usage with 32bit systems, and would
make it possible to reduce also on 64bit systems if timestamp is shrank to
31 bits and weak bit moved after it. I'm not sure if that would be better
for performance though. In any case it would provide free space for 4 extra
bytes if that were needed in future.
Timo Sirainen [Mon, 24 Oct 2016 21:22:20 +0000 (00:22 +0300)]
director: Prevent race conditions by adding USER_KILL_STATE_FLUSHING
In theory it's possible that a user is freed during a flush and added back
before flush is finished, possibly even being moved again. This check makes
sure that we don't finish such move unless we're actually at the correct
flushing state. (If there's another flush also running for the user it'll
be ignored.)
Timo Sirainen [Mon, 24 Oct 2016 21:13:23 +0000 (00:13 +0300)]
director: If user host conflict is detected, make sure new host is sent back.
USER-KICK-HASH was sent, but the sender didn't get back a USER reply with
the new host. This could have increased how long user's host differred in
directors.
Avoids repeating this error:
Error: User hash 2957018085 is being redirected to two hosts: 10.0.0.30 and 10.0.0.201 (old_ts=1477338836)
Timo Sirainen [Mon, 24 Oct 2016 19:41:25 +0000 (22:41 +0300)]
director: HOST-RESET-USERS moves users more slowly now.
By default only 100 users can be moved in parallel. This can be overridden
with HOST-RESET-USERS parameter.
This delaying is especially useful when director_flush_socket is used to
avoid huge floods to the script service. Even without the socket it's still
good for avoiding unnecessary load spikes when all users are kicked at once
and they reconnect back at the same time.
Timo Sirainen [Mon, 24 Oct 2016 19:22:28 +0000 (22:22 +0300)]
director: Fix sending up/down state in handshakes.
They were never sent, because HOSTs were sent before director had waited for
the remote to send its version number. So sender thought that the remote's
minor_version was too old and it didn't send the up/down state at all.
This caused errors like:
Warning: director(10.0.0.30:9090/left): Host 10.0.0.30 is being updated before previous update had finished (up -> down) - setting to state=down vhosts=100
Error: director(10.0.0.30:9090/left): Director 10.0.0.30 SYNC request hosts don't match us - resending hosts (seq=6, remote hosts_hash=262126213, my hosts_hash=2458934259)
Timo Sirainen [Mon, 24 Oct 2016 17:16:57 +0000 (20:16 +0300)]
director: Make sure IP address parsing works in DIRECTOR-ADD/REMOVE
We were passing the entire string through to net_addr2ip(). It seems that
inet_aton() stops at whitespace though, so this wasn't actually causing
errors at least on Linux.
Aki Tuomi [Tue, 25 Oct 2016 07:29:50 +0000 (10:29 +0300)]
imap-login: Skip NIL value in ID handler
NIL value can cause hard crash, depending what
the key is. For x-proxy-ttl, NIL will crash
on any system, x-originating-ip will crash on
some nss versions (e.g. centos 6.7).
Migitating factor here is that the NIL value is only
accepted from trusted network.
Timo Sirainen [Mon, 24 Oct 2016 10:08:47 +0000 (13:08 +0300)]
lib-http: Add ioloop and lock wait information to timeout messages.
It'll now log for example:
9007 Request timed out (Request sent 7.087 secs ago, 0.076 in other ioloops, 7.012 in locks, connected 7.087 secs ago)
Which points out that the problem wasn't really with the HTTP, but with
locking. This likely should be fixed in some way also in lib-http so that
it gives a bit of extra time for reading the request, but that's a separate
fix.
Timo Sirainen [Mon, 24 Oct 2016 09:17:44 +0000 (12:17 +0300)]
dict-client: Fix logging how much time was spent in other ioloops in slow lookup warnings.
The warning's idea is to show why the lookup could have been slow. We
differentiate between time spent in dict_wait() waiting only for the
dict result and time spent in other ioloops waiting for potentially other
things as well (and time spent waiting for locks during this time).
The previous code didn't work right when multiple ioloops were used, which
happened sometimes.
Also changed %u to %d just in case some calculation is wrong. It's nicer to
get a slightly negative value rather than a huge positive one.