MINOR: threads: add a MAX_THREADS define instead of LONGBITS

This avoids inflating some structures when threads are
disabled. Now struct global is 1.4 kB instead of 33 kB.

Should be backported to 1.8 for ease of backporting of upcoming
patches.
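
As a rough illustration of why the define matters, the sketch below sizes
a per-thread table with MAX_THREADS rather than LONGBITS. The struct name
cpu_map_sketch, its layout and the helper function are assumptions made
for this example, not the actual definitions in the tree.

```c
#include <limits.h>
#include <stddef.h>

/* Number of bits in a long, usable as an array dimension. */
#define LONGBITS ((unsigned int)(sizeof(long) * CHAR_BIT))

/* Only reserve per-thread slots when threads are compiled in. */
#ifdef USE_THREAD
#define MAX_THREADS LONGBITS
#else
#define MAX_THREADS 1
#endif

/* Hypothetical per-process, per-thread CPU mask table. */
struct cpu_map_sketch {
	unsigned long thread[LONGBITS][MAX_THREADS];
};

/* Size of the table for the current build options. */
static size_t cpu_map_sketch_size(void)
{
	return sizeof(struct cpu_map_sketch);
}
```

With a 64-bit long, such a table takes 64 * 64 * 8 = 32768 bytes when
MAX_THREADS equals LONGBITS, and 64 * 1 * 8 = 512 bytes when it is 1.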

MINOR: global/threads: move cpu_map at the end of the global struct

The "thread" part is 32kB long, better move it at the end of the
structure since it's only used during initialization, to keep the
rest grouped together.

Should be backported to 1.8 to ease backporting of upcoming patches,
no functional impact.

MINOR: servers: Don't report duplicate dyncookies for disabled servers.

Especially with server-templates, servers can start with a
placeholder IP, in the disabled state. In this case, we don't want to report
that the same cookie was generated for multiple servers. So defer the test
until the server is enabled.

This should be backported to 1.8.

BUG/MEDIUM: peers: fix expire date wasn't updated if entry is modified remotely.

stktable_touch_remote() relies on the expire field stored in the stksess
struct. This field was only updated on a newly created stksess before
storing it.

If a stksess with the same key was still present, its expire field was
not updated.

This patch postpones the update of the expire field of the stksess until
just before processing the "touch".

This bug was introduced in commit:

MEDIUM: threads/stick-tables: handle multithreads on stick tables.

And the fix should be backported to 1.8.

MINOR: sample: add date_us sample

Add a date_us sample that returns the microsecond part of the timeval
structure representing the date. The "second" part of the timeval can
already be fetched by the "date" sample.

BUG/MINOR: poll: too large size allocation for FD events

Commit 80da05a ("MEDIUM: poll: do not use FD_* macros anymore") which
appeared in 1.5-dev18 and which was backported to 1.4.23 made explicit
use of arrays of FDs mapped to unsigned ints. The problem lies in the
allocated size for poll(), as the resulting size is in bits and not
bytes, resulting in poll() arrays being 8 times larger than necessary!

In practice poll() is not used on highly loaded systems, explaining why
nobody noticed. But it definitely has to be addressed.

This fix needs to be backported to all stable versions.
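
To make the bits-versus-bytes confusion concrete, here is a small sketch
of sizing a bitmap that holds one bit per FD. Both function names are
made up for this example, and the oversized variant is only a guess at
how an 8x factor can arise, not a quote of the old code.

```c
#include <limits.h>
#include <stddef.h>

/* Bytes needed for a bitmap holding one bit per FD, rounded up to a
 * whole number of unsigned ints.
 */
static size_t fd_bitmap_bytes(size_t maxfd)
{
	const size_t bits_per_word = sizeof(unsigned int) * CHAR_BIT;
	size_t words = (maxfd + bits_per_word - 1) / bits_per_word;

	return words * sizeof(unsigned int);
}

/* Oversized variant: rounds to a multiple of the word size in bytes,
 * so it reserves one byte per FD instead of one bit per FD.
 */
static size_t fd_bitmap_bytes_oversized(size_t maxfd)
{
	const size_t word = sizeof(unsigned int);

	return (maxfd + word - 1) / word * word;
}
```

For 1024 FDs and 8-bit bytes, the first function yields 128 bytes while
the second yields 1024 bytes.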

CONTRIB: debug: fix a few flags definitions

Commit f4cfcf9 ("MINOR: debug/flags: Add missing flags") added a number
of missing flags but a few of them were incorrect, hiding real values.
This can be backported to 1.8.

DOC: clarify the scope of ssl_fc_is_resumed

Clarify that it's for incoming connections.

MINOR: spoe: Don't queue a SPOE context if nothing is sent

When some messages must be sent to an agent, the SPOE context of the stream is
queued to be handled by an SPOE applet. If there is no available applet, a new
one is created, thus opening a connection with the agent.

Since the support of ACLs on messages, some processing can now be discarded. So,
to avoid opening a connection for nothing, the SPOE context is now queued after
the messages encoding.

MINOR: spoe: add register-var-names directive in spoe-agent configuration

In addition to "option force-set-var", recently added, this directive can be
used to selectively register unknown variable names, without totally relaxing
their registration during the runtime, like "option force-set-var" does.

So there is no way for a malicious agent to exhaust memory by defining too
many variable names. On the other hand, you need to enumerate all
variable names. This could be painful in some circumstances.

Remember, this directive is only useful when the variable names are not
referenced anywhere in the HAProxy configuration or the SPOE one.

Thanks to Etienne Carrière for his help on this part.

BUG/MEDIUM: stream: properly handle client aborts during redispatch

James Mc Bride reported an interesting case affecting all versions since
at least 1.5 : if a client aborts a connection on an empty buffer at the
exact moment a server redispatch happens, the CF_SHUTW_NOW flag on the
channel is immediately turned into CF_SHUTW, which is not caught by
check_req_may_abort(), leading the redispatch to be performed anyway
with the channel marked as shut in both directions while the stream
interface correctly establishes. This situation makes no sense.
Ultimately the transfer times out and the server-side stream interface
remains in EST state while the client is in CLO state, and this case
doesn't correspond to anything we can handle in process_stream, leading
to poll() being woken up all the time without any progress being made.
And the session cannot even be killed from the CLI.

So we must ensure that check_req_may_abort() also considers the case
where the channel is already closed, which is what this patch does.
Thanks to James for providing detailed captures that allowed the
problem to be diagnosed.

This fix must be backported to all maintained versions.
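
A minimal sketch of the condition change, under stated assumptions: the
flag values below are invented for the example (only the names
CF_SHUTW_NOW and CF_SHUTW come from this message), both function names
are hypothetical, and the other conditions the real
check_req_may_abort() may test are left out.

```c
/* Hypothetical flag values; only the names come from the commit message. */
#define CF_SHUTW_NOW 0x1u  /* shutdown for writes requested */
#define CF_SHUTW     0x2u  /* shutdown for writes already done */

/* Returns non-zero when the request should be aborted rather than
 * redispatched: the channel is empty and a write shutdown is either
 * pending or already performed.
 */
static int req_may_abort_sketch(unsigned int flags, int channel_is_empty)
{
	return channel_is_empty && (flags & (CF_SHUTW_NOW | CF_SHUTW)) != 0;
}

/* Previous behaviour for comparison: only the pending shutdown is seen. */
static int req_may_abort_old_sketch(unsigned int flags, int channel_is_empty)
{
	return channel_is_empty && (flags & CF_SHUTW_NOW) != 0;
}
```

Once CF_SHUTW_NOW has been turned into CF_SHUTW, the old test no longer
matches while the new one still does.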

BUILD/MINOR: ancient gcc versions atomic fix

Commit 1a69af6d3892fe1946bb8babb3044d2d26afd46e introduced code
for atomics with gcc prior to 4.7. Unfortunately clang uses those
constants as well, which is misleading.

MINOR: hathreads: add support for gcc < 4.7

Till now the use of __atomic_* gcc builtins required gcc >= 4.7. Since
some supported and quite common operating systems like CentOS 6 still
come with older versions (4.4) and the mapping to the older builtins
is reasonably simple, let's implement it.

This code is only used for gcc < 4.7. It has been quickly tested on a
machine using gcc 4.4.4 and provided expected results.

This patch should be backported to 1.8.

BUG/MEDIUM: mworker: execvp failure depending on argv[0]

The copy_argv() function lacks a check on '-' to remove the -x, -sf and
-st parameters.

When reloading a master process with a path starting with /st, /sf, or
/x.. the copy_argv() function skipped argv[0], leading to an execvp()
without the binary.
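
The missing check can be sketched as follows. The helper name is
hypothetical and only covers the detection of the three options; the
handling of the values that follow them is left out.

```c
#include <string.h>

/* Returns 1 when arg is one of the reload-only options that must not be
 * copied to the new command line, 0 otherwise. The leading '-' is checked
 * first so that a path such as "/sf/haproxy" is never mistaken for "-sf".
 */
static int is_reload_option_sketch(const char *arg)
{
	if (arg == NULL || arg[0] != '-')
		return 0;
	return strcmp(arg + 1, "x") == 0 ||
	       strcmp(arg + 1, "sf") == 0 ||
	       strcmp(arg + 1, "st") == 0;
}
```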

MINOR: dns: Handle SRV record weight correctly.

A SRV record weight can range from 0 to 65535, while haproxy weight goes
from 0 to 256, so we have to divide it by 256 before handing it to haproxy.
Also, a SRV record with a weight of 0 doesn't mean the server shouldn't be
used, so use a minimum weight of 1.

This should probably be backported to 1.8.
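
The mapping described above fits in a few lines. The function name is
made up for this sketch.

```c
/* Maps a DNS SRV weight (0..65535) to a server weight, using the
 * division by 256 and the minimum of 1 described above.
 */
static unsigned int srv_weight_to_server_weight(unsigned int srv_weight)
{
	unsigned int w = srv_weight / 256;

	return w ? w : 1;
}
```

An SRV weight of 0 or 255 thus maps to 1, 512 maps to 2 and 65535 maps
to 255.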

BUG/MINOR: lua: Fix return value of Socket.settimeout

The `socket.tcp.settimeout` method of Lua returns `1` in all cases,
while the `Socket.settimeout` method of haproxy returns `0` in all
cases. This breaks the `socket.http` module, because it validates
the return value of `settimeout`.

This bug was introduced in commit 7e7ac32dad1e15c19152d37aaf9ea6b3f00a7226
(which is the very first commit adding the Socket class to Lua). This
bugfix should be backported to every branch containing that commit:
- 1.6
- 1.7
- 1.8

A test case for this bug is as follows:

The 'Test' response header will contain an HTTP status code with the
patch applied and will be zero (nil) without the patch applied.

http.lua:
  http = require("socket.http")

  core.register_action("bug", { "http-req" }, function(txn)
    local b, c, h = http.request {
      url = "http://93.184.216.34",
      headers = {
        Host = "example.com"
      },
      create = core.tcp,
      redirect = false
    }

    txn:set_var("txn.foo", c)
  end)

haproxy.cfg:
  global
   lua-load /scratch/haproxy/http.lua

  frontend fe
   bind 127.0.0.1:8080
   http-request lua.bug
   http-response set-header Test %[var(txn.foo)]

   default_backend be

  backend be
   server s example.com:80

BUG/MEDIUM: lua: Fix IPv6 with separate port support for Socket.connect

The `socket.tcp.connect` method of Lua requires at least two parameters:
The host and the port. The `Socket.connect` method of haproxy requires
only one when a host with a combined port is provided. This stems from
the fact that `str2sa_range` is used internally in `hlua_socket_connect`.
This very fact unfortunately causes a divergence between the behaviour of
Lua's socket class and haproxy's for IPv6 addresses:

sock:connect("::1", "80")

works fine with Lua, but fails with:

connect: cannot parse destination address '::1'

in haproxy, because `str2sa_range` parses the trailing `:1` as the port.

This patch forcefully adds a `:` to the end of the address iff a port
number greater than `0` is given as the second parameter.

Technically this breaks backwards compatibility, because the docs state:

> The syntax "127.0.0.1:1234" is valid. in this case, the
> parameter *port* is ignored.

But: The connect() call can only succeed if the second parameter is left
out (which causes no breakage) or if the second parameter is an integer
or a numeric string.

It seems unlikely that someone would provide an address with a port number
and would also provide a second parameter containing a number other than
zero. Thus I feel this breakage is warranted to fix the mismatch between
haproxy's socket class and Lua's one.

This commit should be backported to haproxy 1.8 only, because of the
possible breakage of existing Lua scripts.
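
The rule for the trailing colon can be sketched like this. The helper
name and its buffer-based interface are assumptions for the example; how
the separately supplied port is applied afterwards is not shown.

```c
#include <stdio.h>

/* Writes into dst the address string to hand to the parser. A trailing
 * ':' is appended only when a port greater than 0 is supplied separately,
 * so that the last group of an IPv6 address is not read as the port.
 * Returns 0 on success, -1 if dst is too small.
 */
static int build_connect_addr_sketch(char *dst, size_t len,
                                     const char *addr, int port)
{
	int n;

	if (port > 0)
		n = snprintf(dst, len, "%s:", addr);
	else
		n = snprintf(dst, len, "%s", addr);

	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```

With this rule "::1" and port 80 become "::1:", while "127.0.0.1:1234"
without a second parameter is passed through unchanged.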

DOC: lua: Fix typos in comments of hlua_socket_receive

BUG/MINOR: lua: Fix default value for pattern in Socket.receive

The default value of the pattern in `Socket.receive` is `*l` according
to the documentation and in the `socket.tcp.receive` method of Lua.

The default value of `wanted` in `int hlua_socket_receive(struct lua_State *)`
reflects this requirement, but the function fails to ensure this
nonetheless:

If no parameter is given the top of the Lua stack will have the index 1.
`lua_pushinteger(L, wanted);` then pushes the default value onto the stack
(with index 2).
The following `lua_replace(L, 2);` then pops the top index (2) and tries to
replace the index 2 with it.
I am not sure why exactly that happens (possibly, because one cannot replace
non-existent stack indices), but this causes the stack index to be lost.

`hlua_socket_receive_yield` then tries to read the stack index 2, to
determine what to read, and gets the value `0` instead of the correct
HLSR_READ_LINE, thus taking the wrong branch.

Fix this by ensuring that the top of the stack is not replaced by itself.
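
One plausible explanation for the lost slot is the documented behaviour
of lua_replace(): it moves the top element into the given index and then
pops the top. When the given index is the top itself, the value is copied
onto itself and then popped. The toy model below mimics this with a plain
int array; it does not use the Lua C API, and all names in it are
invented for the illustration.

```c
#define TOY_STACK_MAX 8

/* Toy stand-in for a Lua stack: 1-based indices, top = element count. */
struct toy_stack {
	int slot[TOY_STACK_MAX];
	int top;
};

static void toy_push(struct toy_stack *s, int v)
{
	if (s->top < TOY_STACK_MAX)
		s->slot[s->top++] = v;
}

/* Mimics the documented lua_replace(): copy the top element into idx,
 * then pop the top element.
 */
static void toy_replace(struct toy_stack *s, int idx)
{
	if (idx < 1 || idx > s->top)
		return;
	s->slot[idx - 1] = s->slot[s->top - 1];
	s->top--;
}

/* Guarded variant: skip the replace when idx already is the top. */
static void toy_set_default(struct toy_stack *s, int idx, int v)
{
	toy_push(s, v);
	if (s->top != idx)
		toy_replace(s, idx);
}
```

Starting from a stack holding one element, pushing the default and
calling toy_replace(s, 2) leaves a single element again, whereas
toy_set_default(s, 2, v) keeps the default at index 2.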

This bug was introduced in commit 7e7ac32dad1e15c19152d37aaf9ea6b3f00a7226
(which is the very first commit adding the Socket class to Lua). This
bugfix should be backported to every branch containing that commit:
- 1.6
- 1.7
- 1.8

A test case for this bug is as follows:

The 'Test' response header will contain an HTTP status line with the
patch applied and will be empty without the patch applied. Replacing
the `sock:receive()` with `sock:receive("*l")` will cause the status
line to appear with and without the patch.

http.lua:
  core.register_action("bug", { "http-req" }, function(txn)
   local sock = core.tcp()
   sock:settimeout(60)
   sock:connect("127.0.0.1:80")
   sock:send("GET / HTTP/1.0\r\n\r\n")
   response = sock:receive()
   sock:close()
   txn:set_var("txn.foo", response)
  end)

haproxy.cfg (bits omitted for brevity):
  global
   lua-load /scratch/haproxy/http.lua

  frontend fe
   bind 127.0.0.1:8080
   http-request lua.bug
   http-response set-header Test %[var(txn.foo)]

   default_backend be

  backend be
   server s 127.0.0.1:80

BUG/MEDIUM: ssl: cache doesn't release shctx blocks

Since the rework of the shctx with the hot list system, the ssl cache
was putting sessions inside the hot list, without removing them.
Once all blocks were used, they were all locked in the hot list, which
prevented them from being reused for new sessions.

Bug introduced by 4f45bb9 ("MEDIUM: shctx: separate ssl and shctx")

Thanks to Jeffrey J. Persch for reporting this bug.

Must be backported to 1.8.

CLEANUP: rbtree: remove

Remove the rbtree implementation. It's not used, it's not even connected to
the build, and we probably have no use for it.

BUILD: ssl: silence a warning when building without NPN nor ALPN support

When building with a library not offering any of these, ssl_conf_cur
is not used.

Can be backported to 1.8.

BUG/MEDIUM: h2: properly handle the END_STREAM flag on empty DATA frames

Peter Lindegaard Hansen reported a problem affecting some POST requests
sent by MSIE on 1.8.3. Lukas found that we incorrectly dealt with the
END_STREAM flag on empty DATA frames.

What happens in fact is that while we correctly report that we've read a
zero-byte frame, since commit 8fc016d ("BUG/MEDIUM: h2: support uploading
partial DATA frames") backported into 1.8.2, we've been able to return
without updating the parser's state nor checking the frame flags in this
case.

The fix is trivial: we just need not to return too early.

This fix must be backported to 1.8.

MEDIUM: h2: prepare a graceful shutdown when the frontend is stopped

During a reload operation, instead of keeping the H2 connections opened
forever causing confusion during configuration changes, let's send a
graceful shutdown so that the client knows that it had better open a
new connection for future requests. We can't really catch the signal
from H2, but we can advertise this graceful shutdown upon the next I/O
event (eg: a WINDOW_UPDATE from the client or a new request). One of
the visible effects is that the old process quits much faster.

This patch should be backported to 1.8 since it is affected by this
problem.

CONTRIB: hpack: add an hpack decoder

This decoder takes a series of hex codes on stdin using one line
per HEADERS frame and shows the decoded headers.

DEBUG: hpack: add more traces to the hpack decoder

These ones are only enabled when DEBUG_HPACK is defined so they have no
effect on the production code.

DEBUG: hpack: make hpack_dht_dump() expose the output file

It's more convenient to be able to choose between stdout and stderr.

MINOR: h2: add a function to report pseudo-header names

For debugging we need to be able to dump pseudo headers when we know
their name, let's put this there as we already have the other way
around.

BUG/MAJOR: hpack: don't return direct references to the dynamic headers table

Maximilian Böhm and Lucas Rolff both reported some random failed requests
with HTTP/2. Upon deep investigation on detailed traces provided by Lucas,
it turned out that some header names were occasionally corrupted and used
to point to random strings within the dynamic headers table.

The HPACK decoder must always return copies of header names that point
to the dynamic headers table. Otherwise, the insertion of a header after
the current one leading to a reorganization of the table will change the
data the pointer designates. Unfortunately, one such copy was missing for
indexed names, leading to random request failures due to invalid header
names.

Many thanks to Lucas who ran a large number of tests with full traces
helping to capture a reproducible sequence exhibiting this issue.

This patch must be backported to 1.8.

BUG/MEDIUM: http: don't automatically forward request close

Maximilian Böhm and Lucas Rolff reported some frequent HTTP/2 POST
failures affecting version 1.8.2 that were not affecting 1.8.1. Lukas
Tribus determined that these ones appeared after commit a48c141
("BUG/MAJOR: connection: refine the situations where we don't send shutw()").

It turns out that the HTTP request forwarding engine lets a shutr from
the client be automatically forwarded to the server unless chunked
encoding is in use. It's a bit tricky to meet this condition as it only
happens if the shutr is not reported in the initial request. So if a
request is large enough or the body is delayed after the headers (eg:
Expect: 100-continue), the function quits with channel_auto_close()
left enabled. The patch above was not really related in fact. It's just
that a previous bug was causing this shutw to be skipped at the lower
layers, and the two bugs used to cancel themselves.

In the HTTP request we should only pass the close in tunnel mode, as
other cases either need to keep the connection alive (eg: for reuse)
or will force-close it. Also the forced close will properly take care
of avoiding the painful time-wait, which is not possible with the early
close.

This patch must be backported to 1.8 as it directly impacts HTTP/2, and
may be backported to older versions to save them from being abused by
clients causing TIME_WAITs between haproxy and the server.

Thanks to Lukas and Lucas for running many tests with captures allowing
the bug to be narrowed down.

MINOR: don't close stdio anymore

Closing the standard IO FDs (0,1,2) can be troublesome, especially in
the case of the master-worker.

Instead of closing those FDs, they are now pointing to /dev/null which
prevents sending debugging messages to the wrong FDs.

This patch could be backported to 1.8.
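
A minimal sketch of pointing a descriptor to /dev/null instead of closing
it, assuming a POSIX system. The helper name is hypothetical and this is
not claimed to be the code used by the patch.

```c
#include <fcntl.h>
#include <unistd.h>

/* Makes fd refer to /dev/null instead of closing it, so the descriptor
 * number stays reserved. Returns 0 on success, -1 on error.
 */
static int point_fd_to_devnull(int fd)
{
	int null_fd = open("/dev/null", O_RDWR);
	int ret = 0;

	if (null_fd < 0)
		return -1;
	if (null_fd != fd) {
		if (dup2(null_fd, fd) < 0)
			ret = -1;
		close(null_fd);
	}
	return ret;
}
```

Since the descriptor number remains in use, a later open() or socket()
call cannot be handed 0, 1 or 2, and stray debugging output written to
them is discarded.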

BUG/MEDIUM: mworker: don't close stdio several time

This patch makes sure that a frontend socket that gets created after
initialization won't be closed when the master gets re-executed.

When used in daemon mode, the master-worker is closing the FDs 0, 1, 2
after the fork of the children.

When the master was reloading, those FDs were assigned again during the
parsing of the configuration (probably for some listeners), and the
workers were closing them thinking it was the stdio.

This patch must be backported to 1.8.

BUG/MEDIUM: h2: ensure we always know the stream before sending a reset

The recent patch introducing the H2_CS_FRAME_E state to emit stream
resets was not totally correct in that in the rare case where there is
no room left to emit the reset, the next call to process it later could
use an uninitialized stream. This only affects responses to frames that
are sent on closed streams though.

This fix must be backported to 1.8.

DOC/MINOR: configuration: typo, formatting fixes

- Add simple typo and formatting fixes
- Eliminate a couple > 80 column lines

Changes do not affect technical content and can be backported.

BUG/MEDIUM: h2: improve handling of frames received on closed streams

The h2spec utility found certain situations where we're returning an
RST_STREAM while a GOAWAY is expected. While we can't always reliably
decide which one to use (eg: after a stream has been closed for a long
time), in practice we often still have the stream available until it's
destroyed at the application level. This provides the flags we need to
verify the conditions that led to its closure, namely if RST was sent
or received, or if it was regularly closed using a double ES.

The first step consists in marking all closed streams as having already
sent an RST_STREAM frame. This will ensure that we can send an RST_STREAM
for a late transmission on a stream we have forgotten about instead of
risking to break the connection. The next steps consist in re-arranging
the H2_SS_CLOSED checks so that we can deliver a GOAWAY frame for the
few cases where an unexpected frame was received after a double ES.

By carefully taking care of these specificities, we can reduce by 4 the
number of remaining compliance issues.

Note: some tests start to become a bit long and to be repeated at various
places. Adding a bitmask of allowed/forbidden frame types per state
and/or per situation could probably help significantly. It's likely
that some deeper tests in the frame handlers could also be removed now
as they can't be triggered anymore.

This fix should be backported to 1.8.

BUG/MEDIUM: h2: properly handle and report some stream errors

Some stream errors applied to half-closed and closed streams are not
properly reported, especially after the stream transitions to the
closed state. The reason is that the code checks for this "error"
stream state in order to send an RST frame. But if the stream was
just closed or was already closed, there's no way to validate this
condition, and the error is never reported to the peer.

In order to address this situation, we'll add a new FRAME_E demux state
which indicates that the previously parsed frame triggered a stream error
of type STREAM CLOSED that needs to be reported. Proceeding like this
will ensure that we don't lose that information even if we can't
immediately send the message. It also removes the confusion where FRAME_A
could be used either for ACKs or for RST.

The state transition has been added after every h2s_error() on the demux
path. It seems that we might need to have two distinct h2s_error()
functions, one for the mux and another one for the demux, though it
would provide little benefit. It also becomes more apparent that the
H2_SS_ERROR state is only used to detect the need to report an error
on the mux direction. Maybe this will have to be revisited later.

This simple change managed to eliminate 5 bugs reported by h2spec.

This fix must be backported to 1.8.

BUG/MEDIUM: checks: properly set servers to stopping state on 404

Paul Lockaby reported that since 1.8, disable-on-404 doesn't work
anymore in that the server stays up despite returning 404. Cyril spotted
that this was caused by a copy-paste error introduced by commit 5a13351
("BUG/MEDIUM: log: check result details truncated.") causing
set_server_running() to be called instead of set_server_stopping() in
this case.

It can be reproduced with the simple test config below :

  defaults
     mode http
     timeout connect 1s
     timeout client  10s
     timeout server  10s

  listen http
     bind :8888
     option httpchk GET /
     http-check disable-on-404
     server s1 127.0.0.1:9001 check
     server s2 127.0.0.1:9002 check
     http-response add-header x-served-by %s

  listen s1
     bind :9001
     server next 127.0.0.1:9002
     http-response set-status 404

  frontend s2
     bind :9002
     http-request redirect location /

S1 is supposed to be stopping and s2 up, which is not the case. After
calling the correct function, only S2 is used now.

This needs to be backported to 1.8.

BUG/MAJOR: connection: refine the situations where we don't send shutw()

Since commit f9ce57e ("MEDIUM: connection: make conn_sock_shutw() aware
of lingering"), we refrain from performing the shutw() on the socket if
there is no lingering risk. But there is a problem with this in tunnel
and in TCP modes where a client is explicitly allowed to send a shutw
to the server, even though it is risky.

Not doing it creates this situation reported by Ricardo Fraile and
diagnosed by Christopher : a typical HTTP client (eg: curl) connecting
via the config below to an HTTP server would receive its response,
immediately close while the server remains in keep-alive mode. The
shutr() received by haproxy from the client is "propagated" to the
server side but not acted upon because fdtab[fd].linger_risk is set,
so we expect that the next close will immediately complete this
operation.

  listen proxy-tcp
    bind 127.0.0.1:8888
    mode tcp
    timeout connect 5s
    timeout server  10s
    timeout client  10s
    server server1 127.0.0.1:8000

But since the whole stream will not end until the server closes in
turn, the server doesn't close and haproxy expires on server timeout.
This problem has already struck by waking up an older bug and was
partially fixed with commit 8059351 ("BUG/MEDIUM: http: don't disable
lingering on requests with tunnelled responses") though it was not
enough.

The problem is that linger_risk is not suited here. In fact we need to
know whether or not it is desired to close normally or silently, and
whether or not a shutr() has already been received on this connection.

This is the approach this patch takes, and it solves the problem for
the various difficult modes (tcp, http-server-close, pretend-keepalive).

This fix needs to be backported to 1.8. Many thanks to Ricardo for
providing very detailed traces and configurations.

BUG/MEDIUM: cache: don't cache the response on no-cache="set-cookie"

If the server mentions no-cache="set-cookie" in the response headers,
we must guarantee that any set-cookie field will not be stored. We
cannot edit the stored response on the fly to trim the set-cookie
header so we can refrain from storing a response containing such a
header. In theory we could use TX_SCK_PRESENT for this but this one
is only set when the cookie is being watched by the configuration.
Since these responses are not very frequent and often accompanied
with a set-cookie header, let's simply refrain from caching whenever
such directive is present.

This needs to be backported to 1.8.

BUG/MEDIUM: cache: respect the request cache-control header

Till now if a client emitted a request featuring a cache-control header,
this one was not respected and a stale object could still be delivered.
This patch ensures that :
  - cache-control: no-cache disables retrieval from the cache but does
    not prevent the newly fetched object from being stored ;
  - cache-control: no-store can safely retrieve from the cache but prevents
    from storing any fetched object ;
  - cache-control: max-age/max-stale/min-fresh act like no-cache ;
  - pragma: no-cache acts like cache-control: no-cache.

This needs to be backported to 1.8.
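
The four rules can be sketched as a small flag update. The flag names
TX_CACHEABLE and TX_CACHE_IGNORE appear elsewhere in this log, but their
values here, both helper names and the way the header is passed in are
assumptions made for the example.

```c
#include <string.h>

/* Hypothetical bit values; the flag names come from the commit log. */
#define TX_CACHEABLE    0x1u  /* the response may be stored */
#define TX_CACHE_IGNORE 0x2u  /* do not look the object up in the cache */

/* Returns non-zero if s starts with prefix. */
static int has_prefix(const char *s, const char *prefix)
{
	return strncmp(s, prefix, strlen(prefix)) == 0;
}

/* Applies one request directive to the transaction flags, following the
 * four rules listed above. name is the header name in lower case and
 * value the first directive of that header.
 */
static unsigned int apply_request_directive(unsigned int flags,
                                            const char *name,
                                            const char *value)
{
	if (strcmp(name, "pragma") == 0) {
		if (strcmp(value, "no-cache") == 0)
			flags |= TX_CACHE_IGNORE;
		return flags;
	}
	if (strcmp(name, "cache-control") != 0)
		return flags;

	if (strcmp(value, "no-store") == 0)
		flags &= ~TX_CACHEABLE;
	else if (strcmp(value, "no-cache") == 0 ||
	         has_prefix(value, "max-age") ||
	         has_prefix(value, "max-stale") ||
	         has_prefix(value, "min-fresh"))
		flags |= TX_CACHE_IGNORE;
	return flags;
}
```

In this model no-cache only blocks the lookup and leaves storing
allowed, while no-store leaves the lookup allowed and blocks storing.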

BUG/MEDIUM: cache: replace old object on store

Currently the cache aborts a store operation if the object to store
already exists in the cache. This is used to avoid storing multiple
copies at the same time on concurrent accesses. It causes an issue
though, which is that existing unexpired objects cannot be updated.
This happens when any request criterion disables the retrieval from
the cache (eg: with max-age or any other cache-control condition).

For now, let's simply replace the previous existing entry by unlinking
it from the index. This could possibly be improved in the future if
needed.

This fix needs to be backported to 1.8.

BUG/MEDIUM: cache: do not try to retrieve host-less requests from the cache

All HTTP/1.1 requests without the Host header share the same hash key 0
and will return the first cached object. Let's add the check on the call
to sha1_hosturi() to prevent this from happening.

This must be backported to 1.8.

MINOR: http: add a function to check request's cache-control header field

The new function check_request_for_cacheability() is used to check if
a request may be served from the cache, and/or allows the response to
be stored into the cache. For this it checks the cache-control and
pragma header fields, and adjusts the existing TX_CACHEABLE and a new
TX_CACHE_IGNORE flags.

For now, just like its response side counterpart, it only checks the
first value of the header field. These functions should be reworked to
improve their parsers and validate all elements.

BUG/MINOR: cache: do not force the TX_CACHEABLE flag before checking cacheability

The cache used to set this flag before calling
check_response_for_cacheability() due to the way the flags were previously
set (too late), but this is a bad idea as it loses the information of the
implicit caching rules related to the method and the status code. Let's
only rely on what was determined during the request and response parsing
instead and not change it.

This fix must be backported to 1.8, and it requires that the following
patches are also merged :
- MINOR: http: adjust the list of supposedly cacheable methods
- MINOR: http: update the list of cacheable status codes as per RFC7231
- MINOR: http: start to compute the transaction's cacheability from the request
- BUG/MINOR: http: do not ignore cache-control: public

BUG/MINOR: http: properly detect max-age=0 and s-maxage=0 in responses

In 1.3.8, commit a15645d ("[MAJOR] completed the HTTP response processing.")
improved the response parser by taking care of the cache-control header
field. The parser is wrong because it is split in two parts, one checking
for elements containing an equal sign and the other one for those without.
The "max-age=0" and "s-maxage=0" tests were located at the wrong place and
thus have never matched. In practice the side effect was very minimal given
that this code used to be enabled only when checking if a cookie had the
risk of being cached or not. Recently in 1.8 it was also used to decide if
the response could be cached but in practice the cache takes care of these
values by itself so there is very limited impact.

This fix can be backported to all stable versions.

BUG/MINOR: http: do not ignore cache-control: public

In check_response_for_cacheability(), we don't check the
cache-control flags if the response is already supposed not to be
cacheable. This was introduced very early when cache-control:public
was not checked, and it basically results in this last one not being
able to properly mark the response as cacheable if it uses a status
code which is non-cacheable by default. Till now the impact is very
limited as it doesn't check that cookies set on non-default status
codes are not cacheable, and it prevents the cache from caching such
responses.

Let's fix this by doing two things :
  - remove the test for !TX_CACHEABLE in the aforementioned function
  - however take care of 1xx status codes here (which used to be
    implicitly dealt with by the test above) and remove the explicit
    check for 101 in the caller

This fix must be backported to 1.8.

MINOR: http: start to compute the transaction's cacheability from the request

There has always been something odd with the way the cache-control flags
are checked. Since it was made for checking for the risk of leaking cookies
only, all the processing was done in the response. Because of this it is not
possible to reuse the transaction flags correctly for use with the cache.

This patch starts to change this by moving the method check in the request
so that we know very early whether the transaction is expected to be cacheable
and that this status evolves along with checked headers. For now it's not
enough to use from the cache yet but at least it makes the flag more
consistent along the transaction processing.

MINOR: http: update the list of cacheable status codes as per RFC7231

Since RFC2616, the following codes were added to the list of codes
cacheable by default : 204, 404, 405, 414, 501. For now this is only
checked by the checkcache option to detect cacheable cookies.
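
For reference, the complete list of codes that RFC 7231 section 6.1
describes as cacheable by default can be written as a simple lookup. The
function name is made up for this sketch and is not the helper used in
the tree.

```c
/* Returns 1 if the status code is cacheable by default according to
 * RFC 7231 section 6.1, 0 otherwise.
 */
static int status_cacheable_by_default(int status)
{
	switch (status) {
	case 200: case 203: case 204: case 206:
	case 300: case 301:
	case 404: case 405: case 410: case 414:
	case 501:
		return 1;
	default:
		return 0;
	}
}
```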

MINOR: http: adjust the list of supposedly cacheable methods

We used to have a rule inherited from RFC2616 saying that the POST
method was the only uncacheable one, but things have changed since
and RFC7231+7234 made it clear that in fact only GET/HEAD/OPTIONS/TRACE
are cacheable. Currently this rule is only used to detect cacheable
cookies.

BUG/MEDIUM: lua: fix crash when using bogus mode in register_service()

When using an incorrect 'mode' as 2nd argument of core.register_service(),
HAProxy crashes while displaying the error message.

To be backported to 1.8, 1.7 and 1.6.

BUG/MEDIUM: checks: a server passed in maint state was not forced down.

When setting a server in maint mode, the required next_state was not set
before calling the 'lb_down' function, so the system state was never
committed.

This patch should be backported to 1.8.

BUG/MEDIUM: stream: don't consider abortonclose on muxes which close cleanly

The H2 mux can cleanly report an error when a client closes, which is not
the case for the pass-through mux which only reports shutr. That was the
reason why "option abortonclose" was created since there was no way to
distinguish a clean shutdown after sending the request from an abort.

The problem is that in case of H2, the streams are always shut read after
the request is complete (when the END_STREAM flag is received), and that
when this lands on a backend configured with "option abortonclose", this
aborts the request. Disabling abortonclose is not always an option when
H1 and H2 have to coexist.

This patch makes use of the newly introduced mux capabilities reported
via the stream interface's SI_FL_CLEAN_ABRT indicating that the mux is
safe and that there is no need to turn a clean shutread into an abort.
This way abortonclose has no effect on requests initiated from an H2
mux.

This patch as well as these 3 previous ones need to be backported to
1.8 :
- BUG/MINOR: h2: properly report a stream error on RST_STREAM
- MINOR: mux: add flags to describe a mux's capabilities
- MINOR: stream-int: set flag SI_FL_CLEAN_ABRT when mux supports clean aborts

MINOR: stream-int: set flag SI_FL_CLEAN_ABRT when mux supports clean aborts

By copying the info in the stream interface that the mux cleanly reports
aborts, we'll have the ability to check this flag wherever needed regardless
of the presence of a mux or not.

MINOR: mux: add flags to describe a mux's capabilities

This new field will be used to describe certain properties of some
muxes. For now we only add MX_FL_CLEAN_ABRT to indicate that a mux
is able to unambiguously report aborts using CS_FL_ERROR contrary
to others who may only report it via a read0. This will be used to
improve handling of the abortonclose option with H2. Other flags
may come later to report multiplexing capabilities or not, support
of client/server sides etc.

BUG/MINOR: h2: properly report a stream error on RST_STREAM

We want to report such an error since H2 allows us to differentiate
between an end of stream and an abort.

To be backported to 1.8.

CONTRIB: halog: Fix compiler warnings in halog.c

There were several unused variables in halog.c that each caused a
compiler warning [-Wunused-but-set-variable]. This patch simply
removes the declaration of said variables and any instance where the
unused variable was assigned a value.

CONTRIB: iprange: Fix compiler warning in iprange.c

The declaration of main() in iprange.c did not specify a type, causing
a compiler warning [-Wimplicit-int]. This patch simply declares main()
to be type 'int' and calls exit(0) at the end of the function.

MINOR: spoe: add force-set-var option in spoe-agent configuration

For security reasons, the spoe filter was only able to change values of
existing variables. In specific cases (e.g. with Lua code), the names of
variables are unknown at the configuration parsing phase.
The force-set-var option can be enabled to register all variables.

MEDIUM: netscaler: add support for standard NetScaler CIP protocol

It looks like two versions of the protocol exist, as reported by
Andreas Mahnke. This patch adds support for both the legacy and standard
CIP protocols according to NetScaler specifications.

MEDIUM: netscaler: do not analyze original IP packet size

Original information about the client is stored in the CIP encapsulated
IP header, hence there is no need to consider the original IP packet length
to determine if data are missing. Instead, this change decides whether
data are missing by checking that the remaining buffer is large enough to
contain a minimal IP and TCP header and that the buffer holds as much data
as CIP announces.

MINOR: netscaler: check in one-shot if buffer is large enough for IP and TCP header

There is minimal gain in checking first the IP header length and then
the TCP header length since we always want to capture information about
both protocols.

IPv4 length calculation was incorrect since IPv4 ip_len actually defines
the total length of IPv4 header and following data.
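
As a rough illustration of both points, here is a minimal sketch. The
helper names are hypothetical and do not come from the HAProxy sources;
the header sizes and the position of the IPv4 total length field are the
standard ones.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define IPV4_MIN_HDR_LEN 20 /* IPv4 header without options */
#define TCP_MIN_HDR_LEN  20 /* TCP header without options */

/* Hypothetical helper: returns 1 if <len> bytes can hold a minimal IPv4
 * header followed by a minimal TCP header, checked in one shot.
 */
int has_ipv4_tcp_hdrs(size_t len)
{
    return len >= IPV4_MIN_HDR_LEN + TCP_MIN_HDR_LEN;
}

/* Reads the IPv4 "total length" field (bytes 2 and 3, network byte
 * order). It covers the IPv4 header plus all the data following it.
 */
uint16_t ipv4_total_len(const uint8_t *hdr)
{
    assert(hdr != NULL);
    return (uint16_t)((hdr[2] << 8) | hdr[3]);
}
```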

BUG/MAJOR: netscaler: address truncated CIP header detection

The buffer line is manually incremented in order to progress in the trash
buffer, but calculations are made omitting this manual offset.

This leads to random packets being rejected with the following error:

HTTP/1: Truncated NetScaler Client IP header received

Instead, once original IP header is found, use the IP header length
without considering the CIP encapsulation.

BUG/MEDIUM: netscaler: use the appropriate IPv6 header size

IPv6 header has a fixed size of 40 bytes, not 20.

MINOR: netscaler: rename cip_len to clarify its usage

cip_len was meant to be the length of the data encapsulated in the CIP
protocol, that is, the size of the IP and TCP headers.

MINOR: netscaler: remove the use of cip_magic only used once

MINOR: netscaler: respect syntax

As per doc/coding-style.txt

DOC/MINOR: intro: typo, wording, formatting fixes

- Fix a couple typos
- Introduce a couple simple rewordings
- Eliminate > 80 column lines

Changes do not affect technical content and can be backported.

BUG/MEDIUM: mworker: Set FD_CLOEXEC flag on log fd

A log socket (UDP or UNIX) is opened by the master during its startup, when the
first log message is sent. So, to prevent FD leaks, we must ensure we correctly
close it during a reload. By setting the FD_CLOEXEC bit on it, we are sure it
will be automatically closed during a reload.
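
For reference, setting the close-on-exec bit on a descriptor is usually
done with fcntl(). The sketch below is generic POSIX code, not the patch
itself, and the helper name set_cloexec() is hypothetical.

```c
#include <fcntl.h>

/* Marks <fd> close-on-exec while preserving its other descriptor flags.
 * Returns 0 on success, -1 on failure (errno is set by fcntl()).
 */
int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1 ? -1 : 0;
}
```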

This patch must be backported in 1.8.

MINOR: sample: rename the "len" converter to "length"

This converter was recently introduced by commit ed0d24e ("MINOR:
sample: add len converter").

As found by Cyril, it causes an issue in "http-request capture"
statements. The non-obvious problem is that an old syntax for sample
expressions and converters used to support a series of words, each
representing a converter. This used to be how the "stick" directives
were created initially. By having a converter called "len", a
statement such as "http-request capture foo len 10" considers "len"
as a converter and not as the capture length.

This obsolete syntax needs to be changed in 1.9 but it's too late
for other versions. It's worth noting that the same problem can
happen if converters are registered on the fly using Lua. Other
language keywords that currently have to be avoided in converters
include "id", "table", "if", "unless".

BUG: MINOR: http: don't check http-request capture id when len is provided

Randomly, haproxy could fail to start when a "http-request capture"
action is defined, without any change to the configuration. The issue
depends on the memory content, which may raise a fatal error like :
unable to find capture id 'xxxx' referenced by http-request capture
rule

Commit fd608dd2 already prevents the condition from happening, but this one
should be included for completeness and to reflect the code on the
response side.

The issue was introduced recently by commit 29730ba5 and should only be
backported to haproxy 1.8.

BUG: MAJOR: lb_map: server map calculation broken

Adrian Williams reported that several balancing methods were broken and
sent all requests to one backend. This is a regression in haproxy 1.8 where
the server score was not correctly recalculated.

This fix must be backported to the 1.8 branch.

MINOR: sample: add len converter

Add len converter that returns the length of a string

BUG/MINOR: stream-int: don't try to receive again after receiving an EOS

When an end of stream has been reported, we should not try to receive again
as the mux layer might not be prepared for this and could report unexpected
errors.

This is more of a strengthening measure that follows the introduction of
conn_stream that came in 1.8. It's desired to backport this into 1.8 though
it's uncertain at this time whether it may have caused real issues.

BUG/MEDIUM: h2: fix stream limit enforcement

Commit 4974561 ("BUG/MEDIUM: h2: enforce the per-connection stream limit")
implemented a stream limit enforcement on the connection but it was not
correctly done as it would count streams still known by the connection,
which includes the lingering ones that are already marked close. We need
to count only the non-closed ones, which this patch does. The effect is
that some streams are rejected a bit before the limit.
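
The counting rule can be sketched as below. The state enum and both
helpers are hypothetical simplifications made for this example; they are
not HAProxy's H2 structures.

```c
#include <stddef.h>

/* Hypothetical, simplified stream states */
enum sketch_st { SK_ST_OPEN, SK_ST_HALF_CLOSED, SK_ST_CLOSED };

/* Counts only the streams which are not closed yet; lingering closed
 * streams still attached to the connection are ignored.
 */
size_t count_active_streams(const enum sketch_st *st, size_t nb)
{
    size_t i, active = 0;

    for (i = 0; i < nb; i++)
        if (st[i] != SK_ST_CLOSED)
            active++;
    return active;
}

/* Returns 1 if one more stream may be accepted under <limit> */
int may_accept_stream(const enum sketch_st *st, size_t nb, size_t limit)
{
    return count_active_streams(st, nb) < limit;
}
```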

This fix needs to be backported to 1.8.

BUG/MEDIUM: http: don't disable lingering on requests with tunnelled responses

The HTTP forwarding engine needs to disable lingering on requests in
case the connection to the server has to be suddenly closed due to
http-server-close being used, so that we don't accumulate lethal
TIME_WAIT sockets on the outgoing side. A problem happens when the
server doesn't advertise a response size, because the response
message quickly goes through the MSG_DONE and MSG_TUNNEL states,
and once the client has transferred all of its data, it turns to
MSG_DONE and immediately sets NOLINGER and closes before the server
has a chance to respond. The problem is that this destroys some of
the pending DATA being uploaded, the server doesn't receive all of
them, detects an error and closes.

This early NOLINGER is inappropriate in this situation because it
happens before the response is transmitted. This state transition
to MSG_TUNNEL doesn't happen when the response size is known since
we stay in MSG_DATA (and related states) during all the transfer.

Given that the issue is only related to connections not advertising
a response length and that by definition these connections cannot be
reused, there's no need for NOLINGER when the response's transfer
length is not known, which can be verified when entering the CLOSED
state. That's what this patch does.

This fix needs to be backported to 1.8 and very likely to 1.7 and
older as it affects the very rare case where a client immediately
closes after the last uploaded byte (typically a script). However
given that the risk of occurrence in HTTP/1 is extremely low, it is
probably wise to wait before backporting it before 1.8.

BUG/MEDIUM: h2: don't close after the first DATA frame on tunnelled responses

Tunnelled responses are those with neither a content-length nor a chunked
encoding. They are specially dealt with in the current code but the
behaviour is not correct. The fact that the chunk size is left to zero
with a state artificially set to CHUNK_SIZE validates the test on
whether or not to set the end of stream flag. Thus the first DATA
frame always carries the ES flag and subsequent ones remain blocked.

This patch fixes it in two ways :
  - update h1m->curr_len to the size of the current buffer so that it
    is properly subtracted later to find the real end ;
  - don't set the state to CHUNK_SIZE when there's no content-length
    and instead set it to CHUNK_SIZE only when there's chunking.

This fix needs to be backported to 1.8.

BUG/MEDIUM: h2: don't switch the state to HREM before end of DATA frame

We used to switch the stream's state to HREM when seeing an ES bit on
the DATA frame before actually being able to process that frame, possibly
resulting in the DATA frame being processed after the stream was seen as
half-closed and possibly being rejected. The state must not change before
the frame is really processed.

Also fixes a harmless typo in the flag name which should have DATA and
not HEADERS in its name (but all values are equal).

Must be backported to 1.8.

MINOR: h2: don't demand that a DATA frame is complete before processing it

Since last commit it's not required that the DATA frames are complete anymore
so better start with what we have. Only the HEADERS frame requires this. This
may be backported as part of the upload fixes.

BUG/MEDIUM: h2: support uploading partial DATA frames

We currently have a problem with DATA frames when they don't fit into
the destination buffer. While it was imagined that in theory this never
happens, in practice it does when "option http-buffer-request" is set,
because the headers don't leave the target buffer before trying to read
so if the frame is full, there's never enough room.

This fix consists in reading what can be read from the frame and advancing
the input buffer. Once the contents left are only the padding, the frame
is completely processed. This also solves another problem we had which is
that it was possible to fill a request buffer beyond its reserve because
the <count> argument was not respected in h2_rcv_buf(). Thus it's possible
that some POST requests sent at once with a headers+body filling exactly a
buffer could result in "400 bad req" when trying to add headers.
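
A simplified model of this approach is sketched below. It is not the
actual h2_rcv_buf() code: the structure, its field names and the helper
sketch_rcv_partial() are assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical demux state for one DATA frame */
struct sketch_frame {
    size_t left;   /* bytes of the frame not consumed yet, padding included */
    size_t padlen; /* trailing padding bytes not consumed yet */
};

/* Copies as much payload as possible from <in> (holding <in_len> bytes)
 * into <out>, never more than <count> bytes. Returns the number of
 * payload bytes copied and stores in <*consumed> how far the input must
 * be advanced. Once only padding is left, the available padding is
 * skipped as well.
 */
size_t sketch_rcv_partial(struct sketch_frame *f, const char *in, size_t in_len,
                          char *out, size_t count, size_t *consumed)
{
    size_t n;

    assert(f->left >= f->padlen);
    n = f->left - f->padlen;       /* payload still expected */
    if (n > in_len)
        n = in_len;                /* limited by what was received */
    if (n > count)
        n = count;                 /* limited by the caller's room */

    memcpy(out, in, n);
    f->left -= n;
    *consumed = n;

    if (f->left == f->padlen) {
        /* only padding remains: drop what is already there */
        size_t pad = in_len - n < f->padlen ? in_len - n : f->padlen;

        f->left -= pad;
        f->padlen -= pad;
        *consumed += pad;
    }
    return n;
}
```

The frame is fully processed once f->left reaches zero.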

This fix must be backported to 1.8.

MINOR: h2: store the demux padding length in the h2c struct

We'll try to process partial frames and for this we need to know the
padding length. The first step is to extract it during parsing
and store it in the demux context in the connection. Till now it was only
processed at once.

BUG/MEDIUM: h2: debug incoming traffic in h2_wake()

Even after previous commit ("BUG/MEDIUM: h2: work around a connection
API limitation") there is still a problem with some requests. Sometimes
when polling for more request data while some pending data lies in the
buffer, there's no way to enter h2_recv() because the FD is not marked
ready for reading.

We need to slightly change the approach and make h2_recv() only receive
from the buffer and h2_wake() always attempt to demux if the demux is not
blocked.

However, if the connection is already being polled for reading, it will
not wake up from polling. For this reason we need to cheat and also
pretend a request for sending data, which ensures that as soon as any
direction may move, we can continue to demux. This shows that in the
long term we probably need a better way to resume an interrupted
operation at the mux level.

With this fix, no more hangups happen during uploads. Note that this
time the setup required to provoke the hangups was a bit complex :
  - client is "curl" running on local host, uploading 1.7 MB of
    data via haproxy
  - haproxy running on local host, forwarding to a remote server
    through a 100 Mbps only switch
  - timeouts disabled on haproxy
  - remote server made of thttpd executing a cgi reading request data
    through "dd bs=10" to slow down everything.

With such a setup, around 3-5% of the connections would hang up.

This fix needs to be backported to 1.8.

BUG/MEDIUM: h2: work around a connection API limitation

The connection API permits us to enable or disable receiving on a
connection. The underlying FD layer arranges this with the polling
and the fd cache. In practice, if receiving was allowed and an end
of buffer was reached, the FD is subscribed to the polling. If later
we want to process pending data from the buffer, we have to enable
receiving again, but since it's already enabled (in polled mode),
nothing happens and the pending data remain stuck until a new event
happens on the connection to wake the FD up. This is a limitation of
the internal connection API which is not very friendly to the new mux
architecture.

The visible effect is that certain uploads to slow servers experience
truncation on timeout on their last blocks because nothing new comes
from the connection to wake it up while it's being polled.

In order to work around this, there are two solutions :
  - either cheat on the connection so that conn_update_xprt_polling()
    always performs a call to fd_may_recv() after fd_want_recv(), that
    we can trigger from the mux by always calling conn_xprt_stop_recv()
    before conn_xprt_want_recv(), but that's a bit tricky and may have
    side effects on other parts (eg: SSL)

  - or we refrain from receiving in the mux as soon as we're busy on
    anything else, regardless of whether or not some room is available
    in the receive buffer.

This patch takes the second approach above. This way once we read some
data, as soon as we detect that we're stuck, we immediately stop receiving.
This ensures the event doesn't go into polled mode for this period and
that as soon as we're unstuck we can continue. In fact this guarantees
that we can only wait on one side of the mux for a given direction. A
future improvement of the connection layer should make it possible to
resume processing of an interrupted receive operation.

This fix must be backported to 1.8.

BUG/MEDIUM: h2: enable recv polling whenever demuxing is possible

In order to allow demuxing when the dmux buffer is full, we need to
enable data receipt in multiple conditions. Since the conditions are a
bit complex, they have been delegated to a new function h2_recv_allowed()
which follows these rules :

  - if an error or a shutdown was detected on the connection and the buffer
    is empty, we must not attempt to receive
  - if the demux buf failed to be allocated, we must not try to receive and
    we know there is nothing pending
  - if the buffer is not full, we may attempt to receive
  - if no flag indicates a blocking condition, we may attempt to receive
  - otherwise we must not attempt to receive
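
The rules above, applied in that order, can be sketched as follows. This
is an illustration only: the structure and its fields are hypothetical and
do not match the real h2c layout used by h2_recv_allowed().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical view of the connection state used by the rules */
struct sketch_conn {
    bool err_or_shut;           /* error or shutdown seen on the connection */
    bool buf_allocated;         /* the demux buffer could be allocated */
    size_t buf_data;            /* bytes currently in the demux buffer */
    size_t buf_size;            /* capacity of the demux buffer */
    unsigned int blocked_flags; /* non-zero when a blocking condition is set */
};

/* Returns true if the mux may attempt to receive */
bool sketch_recv_allowed(const struct sketch_conn *c)
{
    if (c->err_or_shut && c->buf_data == 0)
        return false; /* nothing left and nothing more to come */
    if (!c->buf_allocated)
        return false; /* no buffer, hence nothing pending */
    if (c->buf_data < c->buf_size)
        return true;  /* buffer not full */
    if (!c->blocked_flags)
        return true;  /* full, but nothing prevents demuxing */
    return false;
}
```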

No more truncated payloads are detected in tests anymore, which seems to
indicate that the issue was worked around. A better connection API will
have to be created for new versions to make this stuff simpler and more
intuitive.

This fix needs to be backported to 1.8 along with the rest of the patches
related to CS_FL_RCV_MORE.

BUG/MEDIUM: h2: automatically set CS_FL_RCV_MORE when the output buffer is full

If we can't demux pending data due to a stream buffer full condition, we
now set CS_FL_RCV_MORE on the conn_stream so that the stream layer knows
it must call back as soon as possible to restart demuxing. Without this,
some uploaded payloads are truncated if the server does not consume them
fast enough and buffers fill up.

Note that this is still not enough to solve the problem, some changes are
required on the recv() and update_poll() paths to allow to restart reading
even with a buffer full condition.

This patch must be backported to 1.8.

BUG/MEDIUM: stream-int: always set SI_FL_WAIT_ROOM on CS_FL_RCV_MORE

When a stream interface tries to read data from a mux using rcv_buf(),
sometimes it sees 0 as the return value and concludes that there's no
more data while there are, resulting in the connection being polled for
more data and no new attempt being made at reading these pending data.

Now it will automatically check for flag CS_FL_RCV_MORE to know if the
mux really did not have anything available or was not able to provide
these data by lack of room in the destination buffer, and will set
SI_FL_WAIT_ROOM accordingly. This will ensure that once current data
lying in the buffer are forwarded to the other side, reading chk_rcv()
will be called to re-enable reading.
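
The flag propagation can be pictured with the sketch below. The flag
values are made up for the example and the helper sketch_update_si_flags()
is hypothetical; only the flag names come from the message above.

```c
/* Illustrative values only; the real ones live in HAProxy's headers */
#define CS_FL_RCV_MORE  0x1u
#define SI_FL_WAIT_ROOM 0x2u

/* Hypothetical helper: returns the stream interface flags to use after a
 * read attempt. When the mux reports pending data it could not deliver,
 * the stream interface is marked as waiting for room.
 */
unsigned int sketch_update_si_flags(unsigned int si_flags, unsigned int cs_flags)
{
    if (cs_flags & CS_FL_RCV_MORE)
        si_flags |= SI_FL_WAIT_ROOM;
    return si_flags;
}
```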

It's important to note that in practice it will rely on the mux's
update_poll() function to re-enable reading and that where the calls
are placed in the stream interface, it's not possible to perform a
new synchronous rcv_buf() call. Thus a corner case remains where the
mux cannot receive due to a full buffer or any similar condition, but
needs to be able to wake itself up to deliver pending data. This is a
limitation of the current connection/conn_stream API which will likely
need a new event subscription to at least call ->wake() asynchronously
(eg: mux->{kick,restart,touch,update} ?).

For now the affected mux (h2 only) will have to take care of the extra
logic to carefully enable polling to restart processing incoming data.

This patch relies on previous one (MINOR: conn_stream: add new flag
CS_FL_RCV_MORE to indicate pending data) and both must be backported to
1.8.

MINOR: conn_stream: add new flag CS_FL_RCV_MORE to indicate pending data

Due to the nature of multiplexed protocols, it will often happen that
some operations are only performed on full frames, preventing any partial
operation from being performed. HTTP/2 is one such example. The current
MUX API causes a problem here because the rcv_buf() function has no way
to let the stream layer know that some data could not be read due to a
lack of room in the buffer, but that data are definitely present. The
problem with this is that the stream layer might not know it needs to
call the function again after it has made some room. And if the frame
in the buffer is not followed by any other, nothing will move anymore.

This patch introduces a new conn_stream flag CS_FL_RCV_MORE whose purpose
is to indicate on the stream that more data than what was received are
already available for reading as soon as more room will be available in
the buffer.

This patch doesn't make use of this flag yet, it only declares it. It is
expected that other similar flags may come in the future, such as reports
of pending end of stream, errors or any such event that might save the
caller from having to poll, or simply let it know that it can take some
actions after having processed data.

BUG/MEDIUM: lua/notification: memory leak

The thread patches added a refcount for notifications. The notifications are
used with the Lua cosocket. This refcount frees the notifications when
the session is cleared. A Lua task has no session, so its notifications
are never cleared.

This patch adds a garbage collector for signals. The garbage collector
just cleans the notifications for which the end point is disconnected.

This patch should be backported to 1.8.

DOC: notifications: add precisions about thread usage

Clarify the terms of use of the notification functions.

MINOR: systemd: remove comment about HAPROXY_STATS_SOCKET

This variable was used by the wrapper which was removed in
a6cfa9098e5a. The correct way to do seamless reload is now to enable
"expose-fd listeners" on the stat socket.

BUG/MEDIUM: threads/vars: Fix deadlock in register_name

In register_name, before locking the var_names array, we check the variable name
validity. So if we try to register an invalid or empty name, we need to return
without unlocking it (because it was never locked).
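
The locking pattern at stake looks like the sketch below. It uses a plain
pthread mutex as a stand-in, which is an assumption: HAProxy has its own
locking primitives, and sketch_register_name() is a hypothetical function.

```c
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t names_lock = PTHREAD_MUTEX_INITIALIZER;
static int nb_names; /* stand-in for the protected array of names */

/* Hypothetical sketch: returns 0 on success, -1 on an invalid name */
int sketch_register_name(const char *name)
{
    /* The validity check happens before the lock is taken, so this
     * early return must not unlock anything.
     */
    if (name == NULL || *name == '\0')
        return -1;

    pthread_mutex_lock(&names_lock);
    nb_names++;
    pthread_mutex_unlock(&names_lock);
    return 0;
}
```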

This patch must be backported in 1.8.

BUG/MEDIUM: email-alert: don't set server check status from a email-alert task

This avoids a possible 100% CPU usage deadlock on EMAIL_ALERTS_LOCK and
avoids sending lots of emails when 'option log-health-checks' is used.
Changing the server state and possibly queuing a new email while
processing the email alert is avoided by setting check->status to
HCHK_STATUS_UNKNOWN, which makes set_server_check_status(..) exit early.

This needs to be backported to 1.8.

CONTRIB: halog: Add help text for -s switch in halog program

It was not documented. May be backported to older releases.

MINOR: mworker: Improve wording in `void mworker_wait()`

Replace "left" / "leaving" with "exit" / "exiting".

This should be backported to haproxy 1.8.

MINOR: mworker: Update messages referencing exit-on-failure

Commit 4cfede87a313456fcbce7a185312460b4e1d05b7 removed
`exit-on-failure` in favor of `no-exit-on-failure`, but failed
to update references to the former in user facing messages.

This should be backported to haproxy 1.8.

BUG/MEDIUM: h2: fix handling of end of stream again

Commit 9470d2c ("BUG/MINOR: h2: try to abort closed streams as
soon as possible") tried to address the situations where a stream
is closed by the client, but caused a side effect which is that in
some cases, a regularly closed stream reports an error to the stream
layer. The reason is that we purposely matched H2_SS_CLOSED in the
test for H2_SS_ERROR to report this so that we can check for RST,
but it accidentally catches certain end of transfers as well. This
results in valid requests to report flags "CD" in the logs.

Instead, let's roll back to detecting H2_SS_ERROR and explicitly check
for a received RST. This way we can correctly abort transfers without
mistakenly reporting errors in normal situations.

This fix needs to be backported to 1.8 as the fix above was merged into
1.8.1.

BUG/MEDIUM: peers: set NOLINGER on the outgoing stream interface

Since peers were ported to an applet in 1.5, an issue appeared which
is that certain attempts to close an outgoing connection are a bit
"too nice". Specifically, protocol errors and stream timeouts result
in a clean shutdown to be sent, waiting for the other side to confirm.
This is particularly problematic in the case of timeouts since by
definition the other side will not confirm as it has disappeared.

As found by Fred, this issue was further emphasized in 1.8 by commit
f9ce57e ("MEDIUM: connection: make conn_sock_shutw() aware of
lingering") which causes clean shutdowns not to be sent if the fd is
marked as linger_risk, because now even a clean timeout will not be
sent on an idle peers session, and the other one will have nothing
to respond to.

The solution here is to set NOLINGER on the outgoing stream interface
to ensure we always close whenever we attempt a simple shutdown.
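
At the socket level, disabling lingering is commonly done with SO_LINGER
set to a zero timeout, which makes close() drop the connection at once
instead of waiting for the peer. The sketch below shows that generic
technique; it assumes, without the message above confirming it, that this
is what NOLINGER ends up doing on the file descriptor, and the helper name
is hypothetical.

```c
#include <sys/socket.h>

/* Enables SO_LINGER with a zero timeout on <fd>.
 * Returns 0 on success, -1 on failure (errno is set by setsockopt()).
 */
int sketch_set_nolinger(int fd)
{
    struct linger lg = { .l_onoff = 1, .l_linger = 0 };

    return setsockopt(fd, SOL_SOCKET, SO_LINGER, &lg, sizeof(lg));
}
```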

However it is important to keep in mind that this also underlines
some weaknesses of the shutr/shutw processing inside process_stream()
and that all this part needs to be reworked to clearly consider the
abort case, and to stop the confusion between linger_risk and NOLINGER.

This fix needs to be backported as far as 1.5 (all versions are affected).
However, during testing of the backport it was found that 1.5 never tries
to close the peers connection on timeout, so it suffers from another issue.

BUG/MEDIUM: checks: a down server going to maint remains definitely stuck on down state.

The new admin state was not correctly committed in this case.
Checks were fully disabled but the server was not marked in MAINT state.
This results in a server permanently stuck in the DOWN state.

This patch should be backported to haproxy 1.8

BUG/MEDIUM: ssl engines: Fix async engines fds were not considered to fix fd limit automatically.

The number of async fd is computed considering the maxconn, the number
of sides using ssl and the number of engines using async mode.

This patch should be backported on haproxy 1.8

BUG/MEDIUM: mworker: also close peers sockets in the master

There's a nasty case related to signaling all processes via SIGUSR1.
Since the master process still holds the peers sockets, the old process
trying to connect to the new one to teach it its tables risks connecting
to the master instead, which will not do anything, causing the
old process to hang instead of quitting.

This patch ensures we correctly close the peers in the master process
on startup, just like it is done for proxies. Ultimately we would rather
have a complete list of listeners to avoid such issues. But that's a bit
trickier as it would require using unbind_all() and avoiding side effects
the master could cause to other processes (like unlinking unix sockets).

To be backported to 1.8.

BUG/MINOR: ssl: support tune.ssl.cachesize 0 again

Since the split of the shctx and the ssl cache, we lost the ability to
disable the cache with tune.ssl.cachesize 0.

Worse than that, when using this configuration, haproxy segfaults during
the configuration parsing.

Must be backported to 1.8.