From: Christopher Faulet
Date: Thu, 7 Apr 2016 13:30:10 +0000 (+0200)
Subject: DOC: filters: Add filters documentation
X-Git-Tag: v1.7-dev3~26
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=c3fe5330be4c2d15ea50ae3a2d01e9287461d13c;p=thirdparty%2Fhaproxy.git

DOC: filters: Add filters documentation

The configuration documentation has been updated. Doc about the filter line
has been added and a new chapter (§ 9) has been created to list and document
supported filters (for now, flt_trace and flt_http_comp).

The developer documentation about filters has also been added.

This is a "pre" version. Incoming changes in the filter API will require an
update. This documentation requires a deeper review and some TODOs still need
to be completed.
---

diff --git a/doc/configuration.txt b/doc/configuration.txt
index 7a3fff174f..5a815a175c 100644
--- a/doc/configuration.txt
+++ b/doc/configuration.txt
@@ -103,6 +103,10 @@ Summary
 8.8.    Capturing HTTP headers
 8.9.    Examples of logs
 
+9.    Supported filters
+9.1.    Trace
+9.2.    HTTP compression
+
 
 1. Quick reminder about HTTP
 ----------------------------
@@ -1724,6 +1728,7 @@ errorloc302                    X          X         X        X
 -- keyword -------------------------- defaults - frontend - listen -- backend -
 errorloc303                               X          X         X        X
 force-persist                             -          X         X        X
+filter                                    -          X         X        X
 fullconn                                  X          -         X        X
 grace                                     X          X         X        X
 hash-type                                 X          -         X        X
@@ -2616,6 +2621,7 @@ compression offload
         compression algo gzip
         compression type text/html text/plain
+
 contimeout <timeout> (deprecated)
   Set the maximum time to wait for a connection attempt to a server to
   succeed.
   May be used in sections :   defaults | frontend | listen | backend
@@ -3175,6 +3181,38 @@ force-persist { if | unless } <condition>
   and section 7 about ACL usage.
 
 
+filter <name> [param*]
+  Add the filter <name> in the filter list attached to the proxy.
+  May be used in sections :   defaults | frontend | listen | backend
+                                 no    |    yes   |   yes  |   yes
+  Arguments :
+    <name>   is the name of the filter. Officially supported filters are
+             referenced in section 9.
+
+    <param*> is a list of parameters accepted by the filter <name>. The
+             parsing of these parameters is the responsibility of the
+             filter. Please refer to the documentation of the corresponding
+             filter (section 9) for all details on the supported parameters.
+
+  Multiple occurrences of the filter line can be used for the same proxy. The
+  same filter can be referenced many times if needed.
+
+  Example:
+    listen
+      bind *:80
+
+      filter trace name BEFORE-HTTP-COMP
+      filter compression
+      filter trace name AFTER-HTTP-COMP
+
+      compression algo gzip
+      compression offload
+
+      server srv1 192.168.0.1:80
+
+  See also : section 9.
+
+
 fullconn <conns>
   Specify at what backend load the servers will reach their maxconn
   May be used in sections :   defaults | frontend | listen | backend
@@ -15476,6 +15514,59 @@ reading. Their sole purpose is to explain how to decipher them.
       connection because of too many already established.
 
 
+9. Supported filters
+--------------------
+
+Here are listed the officially supported filters with the list of parameters
+they accept. Depending on compile options, some of these filters might be
+unavailable. The list of available filters is reported by 'haproxy -vv'.
+
+See also : "filter"
+
+9.1. Trace
+----------
+
+filter trace [name <name>] [random-parsing] [random-forwarding]
+
+  Arguments:
+    <name>              is an arbitrary name that will be reported in
+                        messages. If no name is provided, "TRACE" is used.
+
+    <random-parsing>    enables the random parsing of data exchanged between
+                        the client and the server. By default, this filter
+                        parses all available data. With this parameter, it
+                        only parses a random amount of the available data.
+
+    <random-forwarding> enables the random forwarding of parsed data. By
+                        default, this filter forwards all previously parsed
+                        data. With this parameter, it only forwards a random
+                        amount of the parsed data.
+
+This filter can be used as a base to develop new filters. It defines all
+callbacks and prints a message on the standard error stream (stderr) with useful
It may be useful to debug the activity of other +filters or, quite simply, HAProxy's activity. + +Using and/or parameters is a good way to +tests the behavior of a filter that parses data exchanged between a client and +a server by adding some latencies in the processing. + + +9.2. HTTP compression +--------------------- + +filter compression + +The HTTP compression has been moved in a filter in HAProxy 1.7. "compression" +keyword must still be used to enable and configure the HTTP compression. And +when no other filter is used, it is enough. But it is mandatory to explicitly +use a filter line to enable the HTTP compression when two or more filters are +used for the same listener/frontend/backend. This is important to know the +filters evaluation order. + +See also : "compression" + + /* * Local variables: * fill-column: 79 diff --git a/doc/internals/filters.txt b/doc/internals/filters.txt new file mode 100644 index 0000000000..71013439d8 --- /dev/null +++ b/doc/internals/filters.txt @@ -0,0 +1,1058 @@ + ----------------------------------------- + Filters Guide - version 0.1 + ( Last update: 2016-04-18 ) + ------------------------------------------ + Author : Christopher Faulet + Contact : christopher dot faulet at capflam dot org + + +ABSTRACT +-------- + +The filters support is a new feature of HAProxy 1.7. It is a way to extend +HAProxy without touching its core code and, in certain extent, without knowing +its internals. This feature will ease contributions, reducing impact of +changes. Another advantage will be to simplify HAProxy by replacing some parts +by filters. As we will see, and as an example, the HTTP compression is the first +feature moved in a filter. + +This document describes how to write a filter and what you have to keep in mind +to do so. It also talks about the known limits and the pitfalls to avoid. + +As said, filters are quite new for now. The API is not freezed and will be +updated/modified/improved/extended as needed. 
+
+
+
+SUMMARY
+-------
+
+  1.    Filters introduction
+  2.    How to use filters
+  3.    How to write a new filter
+  3.1.    API Overview
+  3.2.    Defining the filter name and its configuration
+  3.3.    Managing the filter lifecycle
+  3.4.    Handling the streams creation and destruction
+  3.5.    Analyzing the channels activity
+  3.6.    Filtering the data exchanged
+  4.    FAQ
+
+
+
+1. FILTERS INTRODUCTION
+-----------------------
+
+First of all, to fully understand how filters work and how to create one, it is
+best to know, at least from a distance, what a proxy (frontend/backend), a
+stream and a channel are in HAProxy and how these entities are linked to each
+other. doc/internals/entities.pdf is a good overview.
+
+Then, to support filters, many callbacks have been added to HAProxy at different
+places, mainly around channel analyzers. Their purpose is to allow filters to
+be involved in the data processing, from the stream creation/destruction to
+the data forwarding. Depending on what it should do, a filter can implement all
+or part of these callbacks. For now, existing callbacks are focused on
+streams, but future improvements could enlarge the filters scope. For example,
+it could be useful to handle events at the connection level.
+
+In the HAProxy configuration file, a filter is declared in a proxy section,
+except 'defaults'. So the configuration corresponding to a filter declaration
+is attached to a specific proxy, and will be shared by all the filter's
+instances. It is opaque from the HAProxy point of view; managing it is the
+filter's responsibility. Each filter declaration matches a unique
+configuration. Several declarations of the same filter in the same proxy will
+be handled as different filters by HAProxy.
+
+A filter instance is represented by a partially opaque context (or a state)
+attached to a stream and passed as argument to callbacks. Through this context,
+filter instances are stateful.
+Depending on whether the filter is declared in a frontend or a backend
+section, its instances will be created, respectively, when a stream is
+created or when a backend is selected. Their behaviors will also be
+different. Only instances of filters declared in a frontend section will be
+aware of the creation and the destruction of the stream, and will take part in
+the channels analyzing before the backend is defined.
+
+It is important to remember that the configuration of a filter is shared by all
+its instances, while the context of an instance is owned by a unique stream.
+
+Filters are designed to be chained. It is possible to declare several filters in
+the same proxy section. The declaration order is important because filters will
+be called one after the other respecting this order. Frontend and backend
+filters are also chained, frontend ones being called first. Even if the filters
+processing is serialized, each filter will behave as if it were alone (unless it
+was developed to be aware of other filters). For all that, some constraints are
+imposed on filters, especially when data exchanged between the client and the
+server are processed. We will discuss these constraints again when we tackle
+the subject of writing a filter.
+
+
+
+2. HOW TO USE FILTERS
+---------------------
+
+To use a filter, you must use the parameter 'filter' followed by the filter name
+and, optionally, its configuration in the desired listen, frontend or backend
+section. For example:
+
+    listen test
+        ...
+        filter trace name TST
+        ...
+
+
+See doc/configuration.txt for a formal definition of the parameter 'filter'.
+Note that additional parameters on the filter line must be parsed by the filter
+itself.
+
+The list of available filters is reported by 'haproxy -vv':
+
+    $> haproxy -vv
+    HA-Proxy version 1.7-dev2-3a1d4a-33 2016/03/21
+    Copyright 2000-2016 Willy Tarreau
+
+    [...]
+
+    Available filters :
+        [COMP] compression
+        [TRACE] trace
+
+
+Multiple filter lines can be used in a proxy section to chain filters. Filters
+will be called in the declaration order.
+
+Some filters can support implicit declarations in certain circumstances
+(without the filter line). This is not recommended for new features but is
+useful for existing ones moved into a filter, for backward compatibility
+reasons. Implicit declarations are supported when there is only one filter used
+on a proxy. When several filters are used, explicit declarations are mandatory.
+The HTTP compression filter is one of these filters. Alone, using the
+'compression' keywords is enough to use it. But when at least a second filter is
+used, a filter line must be added.
+
+    # filter line is optional
+    listen t1
+        bind *:80
+        compression algo gzip
+        compression offload
+        server srv x.x.x.x:80
+
+    # filter line is mandatory for the compression filter
+    listen t2
+        bind *:81
+        filter trace name T2
+        filter compression
+        compression algo gzip
+        compression offload
+        server srv x.x.x.x:80
+
+
+
+
+3. HOW TO WRITE A NEW FILTER
+----------------------------
+
+If you want to write a filter, there are 2 header files that you must know:
+
+  * include/types/filters.h: This is the main header file, containing all
+                             important structures you will use. It represents
+                             the filter API.
+  * include/proto/filters.h: This header file contains helper functions that
+                             you may need to use. It also contains the internal
+                             API used by HAProxy to handle filters.
+
+To ease the filters integration, it is better to follow some conventions:
+
+  * Use the 'flt_' prefix to name your filter (e.g: flt_http_comp or flt_trace).
+  * Keep everything related to your filter in the same file.
+
+The filter 'trace' can be used as a template to write your own filter. It is a
+good start to see how filters really work.
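Before diving into the API, the calling convention described in this guide can be seen in miniature: filters are called one after the other in declaration order, a filter may leave a callback undefined, and a callback returning 0 makes the chain wait so that the next iteration resumes on the same filter (see § 3.5). The sketch below is a toy model compiled outside HAProxy, not HAProxy code: 'toy_filter', 'toy_ops', 'toy_analyze' and 'waiting_analyze' are hypothetical names, and the index '*current' only plays the role of 'strm_flt.current'.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: hypothetical stand-ins, not HAProxy's 'struct filter'
 * or 'struct flt_ops'. */
struct toy_filter;

struct toy_ops {
    /* Same return convention as 'flt_ops.channel_analyze':
     * <0: error, 0: wait (resume on this filter next time), >0: continue */
    int (*analyze)(struct toy_filter *f);
};

struct toy_filter {
    const struct toy_ops *ops;  /* callbacks; a NULL entry means "not defined" */
    int wait_rounds;            /* toy state: how many more times to ask to wait */
    int calls;                  /* how many times 'analyze' was invoked */
};

/* Call 'analyze' on each filter in declaration order, starting at '*current'.
 * Returns <0 on error, 0 if a filter asked to wait ('*current' then points to
 * it and the following filters are skipped), 1 once every filter agreed to
 * continue ('*current' is then reset to the first filter). */
static int toy_analyze(struct toy_filter *chain, size_t n, size_t *current)
{
    assert(*current <= n);
    for (size_t i = *current; i < n; i++) {
        int ret;

        if (chain[i].ops->analyze == NULL)   /* callback not defined: skip */
            continue;
        ret = chain[i].ops->analyze(&chain[i]);
        if (ret < 0)
            return ret;
        if (ret == 0) {
            *current = i;                    /* resume on this filter */
            return 0;
        }
    }
    *current = 0;
    return 1;
}

/* Example callback: asks to wait 'wait_rounds' times, then lets things go. */
static int waiting_analyze(struct toy_filter *f)
{
    f->calls++;
    if (f->wait_rounds > 0) {
        f->wait_rounds--;
        return 0;
    }
    return 1;
}

static const struct toy_ops waiting_ops = { .analyze = waiting_analyze };
static const struct toy_ops no_ops      = { .analyze = NULL };
```

For instance, with three filters where the first one continues immediately, the second one defines no callback and the third one asks to wait once, a first call to toy_analyze() returns 0 with '*current' set to 2. The next call resumes directly on the third filter and returns 1, so the first filter is not called a second time.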
+
+3.1. API OVERVIEW
+-----------------
+
+Writing a filter comes down to writing functions and attaching them to the
+existing callbacks. Available callbacks are listed in the following structure:
+
+    struct flt_ops {
+        /*
+         * Callbacks to manage the filter lifecycle
+         */
+        int  (*init)  (struct proxy *p, struct flt_conf *fconf);
+        void (*deinit)(struct proxy *p, struct flt_conf *fconf);
+        int  (*check) (struct proxy *p, struct flt_conf *fconf);
+
+        /*
+         * Stream callbacks
+         */
+        int  (*stream_start)     (struct stream *s, struct filter *f);
+        void (*stream_stop)      (struct stream *s, struct filter *f);
+
+        /*
+         * Channel callbacks
+         */
+        int  (*channel_start_analyze)(struct stream *s, struct filter *f,
+                                      struct channel *chn);
+        int  (*channel_analyze)      (struct stream *s, struct filter *f,
+                                      struct channel *chn,
+                                      unsigned int an_bit);
+        int  (*channel_end_analyze)  (struct stream *s, struct filter *f,
+                                      struct channel *chn);
+
+        /*
+         * HTTP callbacks
+         */
+        int  (*http_data)          (struct stream *s, struct filter *f,
+                                    struct http_msg *msg);
+        int  (*http_chunk_trailers)(struct stream *s, struct filter *f,
+                                    struct http_msg *msg);
+        int  (*http_end)           (struct stream *s, struct filter *f,
+                                    struct http_msg *msg);
+        int  (*http_forward_data)  (struct stream *s, struct filter *f,
+                                    struct http_msg *msg,
+                                    unsigned int len);
+
+        void (*http_reset)         (struct stream *s, struct filter *f,
+                                    struct http_msg *msg);
+        void (*http_reply)         (struct stream *s, struct filter *f,
+                                    short status,
+                                    const struct chunk *msg);
+
+        /*
+         * TCP callbacks
+         */
+        int  (*tcp_data)        (struct stream *s, struct filter *f,
+                                 struct channel *chn);
+        int  (*tcp_forward_data)(struct stream *s, struct filter *f,
+                                 struct channel *chn,
+                                 unsigned int len);
+    };
+
+
+We will explain in the following parts when these callbacks are called and what
+they should do.
+
+Filters are declared in proxy sections. So each proxy has an ordered list of
+filters, possibly empty if no filter is used.
+When the configuration of a proxy is parsed, each filter line represents an
+entry in this list. In the structure 'proxy', the filters configurations are
+stored in the field 'filter_configs', each one of type 'struct flt_conf *':
+
+    /*
+     * Structure representing the filter configuration, attached to a proxy and
+     * accessible from a filter when instantiated in a stream
+     */
+    struct flt_conf {
+        const char     *id;   /* The filter id */
+        struct flt_ops *ops;  /* The filter callbacks */
+        void           *conf; /* The filter configuration */
+        struct list     list; /* Next filter for the same proxy */
+    };
+
+  * 'flt_conf.id' is an identifier, defined by the filter. It can be
+    NULL. HAProxy does not use this field. Filters can use it in log messages or
+    as a unique identifier to check multiple declarations. It is the filter's
+    responsibility to free it, if necessary.
+
+  * 'flt_conf.conf' is opaque. It is the internal configuration of a filter,
+    generally allocated and filled by its parsing function (See § 3.2). It is
+    the filter's responsibility to free it.
+
+  * 'flt_conf.ops' references the callbacks implemented by the filter. This
+    field must be set during the parsing phase (See § 3.2) and can be refined
+    during the initialization phase (See § 3.3). If it is dynamically allocated,
+    it is the filter's responsibility to free it.
+
+
+The filter configuration is global and shared by all its instances. A filter
+instance is created in the context of a stream and attached to this stream. In
+the structure 'stream', the field 'strm_flt' is the state of all filter
+instances attached to a stream:
+
+    /*
+     * Structure representing the "global" state of filters attached to a
+     * stream.
+     */
+    struct strm_flt {
+        struct list    filters;      /* List of filters attached to a stream */
+        struct filter *current[2];   /* From which filter resume processing, for a specific channel.
+                                      * This is used for resumable callbacks only,
+                                      * If NULL, we start from the first filter.
+                                      * 0: request channel, 1: response channel */
+        unsigned short flags;        /* STRM_FL_* */
+        unsigned char  nb_req_data_filters; /* Number of data filters registered on the request channel */
+        unsigned char  nb_rsp_data_filters; /* Number of data filters registered on the response channel */
+    };
+
+
+Filter instances attached to a stream are stored in the field
+'strm_flt.filters', each instance being of type 'struct filter *':
+
+    /*
+     * Structure representing a filter instance attached to a stream
+     *
+     * 2D-Array fields are used to store info per channel. The first index
+     * stands for the request channel, and the second one for the response
+     * channel. Especially, <next> and <fwd> are offsets representing the amount
+     * of data that the filter has, respectively, parsed and forwarded on a
+     * channel. Filters can access these values using FLT_NXT and FLT_FWD
+     * macros.
+     */
+    struct filter {
+        struct flt_conf *config;  /* the filter's configuration */
+        void            *ctx;     /* The filter context (opaque) */
+        unsigned short   flags;   /* FLT_FL_* */
+        unsigned int     next[2]; /* Offset, relative to buf->p, to the next
+                                   * byte to parse for a specific channel
+                                   * 0: request channel, 1: response channel */
+        unsigned int     fwd[2];  /* Offset, relative to buf->p, to the next
+                                   * byte to forward for a specific channel
+                                   * 0: request channel, 1: response channel */
+        struct list      list;    /* Next filter for the same proxy/stream */
+    };
+
+  * 'filter.config' is the filter configuration previously described. All
+    instances of a filter share it.
+
+  * 'filter.ctx' is an opaque context. It is managed by the filter, so it is its
+    responsibility to free it.
+
+  * 'filter.next' and 'filter.fwd' will be described later (See § 3.6).
+
+
+3.2. DEFINING THE FILTER NAME AND ITS CONFIGURATION
+---------------------------------------------------
+
+When you write a filter, the first thing to do is to add it to the supported
+filters.
+To do so, you must register its name as a valid keyword on the filter line:
+
+    /* Declare the filter parser for "my_filter" keyword */
+    static struct flt_kw_list flt_kws = { "MY_FILTER_SCOPE", { }, {
+            { "my_filter", parse_my_filter_cfg },
+            { NULL, NULL },
+        }
+    };
+
+    __attribute__((constructor))
+    static void
+    __my_filter_init(void)
+    {
+        flt_register_keywords(&flt_kws);
+    }
+
+
+Then you must define the internal configuration your filter will use. For
+example:
+
+    struct my_filter_config {
+        struct proxy *proxy;
+        char         *name;
+        /* ... */
+    };
+
+
+You also must list all callbacks implemented by your filter. Here, we use a
+global variable:
+
+    struct flt_ops my_filter_ops = {
+        .init   = my_filter_init,
+        .deinit = my_filter_deinit,
+        .check  = my_filter_config_check,
+
+        /* ... */
+    };
+
+
+Finally, you must define the function to parse your filter configuration, here
+'parse_my_filter_cfg'. This function must parse all remaining keywords on the
+filter line:
+
+    /* Return -1 on error, else 0 */
+    static int
+    parse_my_filter_cfg(char **args, int *cur_arg, struct proxy *px,
+                        struct flt_conf *flt_conf, char **err)
+    {
+        struct my_filter_config *my_conf;
+        int                      pos = *cur_arg;
+
+        /* Allocate the internal configuration used by the filter */
+        my_conf = calloc(1, sizeof(*my_conf));
+        if (!my_conf) {
+            memprintf(err, "%s: out of memory", args[*cur_arg]);
+            return -1;
+        }
+        my_conf->proxy = px;
+
+        /* ... */
+
+        /* Parse all keywords supported by the filter and fill the internal
+         * configuration */
+        pos++; /* Skip the filter name */
+        while (*args[pos]) {
+            if (!strcmp(args[pos], "name")) {
+                if (!*args[pos + 1]) {
+                    memprintf(err, "'%s' : '%s' option without value",
+                              args[*cur_arg], args[pos]);
+                    goto error;
+                }
+                my_conf->name = strdup(args[pos + 1]);
+                if (!my_conf->name) {
+                    memprintf(err, "%s: out of memory", args[*cur_arg]);
+                    goto error;
+                }
+                pos += 2;
+            }
+            /* ... parse other keywords here, as 'else if' branches ... */
+            else {
+                /* Unknown keyword: fail instead of looping forever */
+                memprintf(err, "'%s' : unknown option '%s'",
+                          args[*cur_arg], args[pos]);
+                goto error;
+            }
+        }
+        *cur_arg = pos;
+
+        /* Set callbacks supported by the filter */
+        flt_conf->ops = &my_filter_ops;
+
+        /* Last, save the internal configuration */
+        flt_conf->conf = my_conf;
+        return 0;
+
+      error:
+        free(my_conf->name); /* free(NULL) is a no-op */
+        free(my_conf);
+        return -1;
+    }
+
+
+WARNING: In your parsing function, you must define 'flt_conf->ops'. You must
+         also parse all arguments on the filter line. This is mandatory.
+
+In the previous example, we expect to read a filter line as follows:
+
+    filter my_filter name MY_NAME ...
+
+
+Optionally, by implementing the 'flt_ops.check' callback, you add a step to
+check the internal configuration of your filter after the parsing phase, when
+the HAProxy configuration is fully defined. For example:
+
+    /* Check the configuration of my_filter for a specified proxy.
+     * Return 1 on error, else 0. */
+    static int
+    my_filter_config_check(struct proxy *px, struct flt_conf *fconf)
+    {
+        if (px->mode != PR_MODE_HTTP) {
+            Alert("The filter 'my_filter' cannot be used in non-HTTP mode.\n");
+            return 1;
+        }
+
+        /* ... */
+
+        return 0;
+    }
+
+
+
+3.3. MANAGING THE FILTER LIFECYCLE
+----------------------------------
+
+Once the configuration is parsed and checked, filters are ready to be used.
+There are two callbacks to manage the filter lifecycle:
+
+  * 'flt_ops.init': It initializes the filter for a proxy. You may define this
+                    callback if you need to complete your filter configuration.
+
+  * 'flt_ops.deinit': It cleans up what the parsing function and the init
+                      callback have done. This callback is useful to release
+                      memory allocated for the filter configuration.
+
+Here is an example:
+
+    /* Initialize the filter. Returns -1 on error, else 0. */
+    static int
+    my_filter_init(struct proxy *px, struct flt_conf *fconf)
+    {
+        struct my_filter_config *my_conf = fconf->conf;
+
+        /* ... */
+
+        return 0;
+    }
+
+    /* Free resources allocated by the filter.
+     */
+    static void
+    my_filter_deinit(struct proxy *px, struct flt_conf *fconf)
+    {
+        struct my_filter_config *my_conf = fconf->conf;
+
+        if (my_conf) {
+            free(my_conf->name);
+            /* ... */
+            free(my_conf);
+        }
+        fconf->conf = NULL;
+    }
+
+
+TODO: Add callbacks to handle creation/destruction of filter instances. And
+      document it.
+
+
+3.4. HANDLING THE STREAMS CREATION AND DESTRUCTION
+--------------------------------------------------
+
+You may be interested in handling stream creation and destruction. If so, you
+must define the following callbacks:
+
+  * 'flt_ops.stream_start': It is called when a stream is started. This callback
+                            can fail by returning a negative value. It will be
+                            considered as a critical error by HAProxy which
+                            disables the listener for a short time.
+
+  * 'flt_ops.stream_stop': It is called when a stream is stopped. This callback
+                           always succeeds. Anyway, it is too late to return an
+                           error.
+
+For example:
+
+    /* Called when a stream is created. Returns -1 on error, else 0. */
+    static int
+    my_filter_stream_start(struct stream *s, struct filter *filter)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        /* ... */
+
+        return 0;
+    }
+
+    /* Called when a stream is destroyed */
+    static void
+    my_filter_stream_stop(struct stream *s, struct filter *filter)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        /* ... */
+    }
+
+
+WARNING: Handling the streams creation and destruction is only possible for
+         filters defined on proxies with the frontend capability.
+
+
+3.5. ANALYZING THE CHANNELS ACTIVITY
+------------------------------------
+
+The main purpose of filters is to take part in the channels analyzing. To do so,
+there is a callback, 'flt_ops.channel_analyze', called before each analyzer
+attached to a channel, except analyzers responsible for the data
+parsing/forwarding (TCP data or HTTP body).
+Concretely, on the request channel, 'flt_ops.channel_analyze' could be called
+before the following analyzers:
+
+  * tcp_inspect_request        (AN_REQ_INSPECT_FE and AN_REQ_INSPECT_BE)
+  * http_wait_for_request      (AN_REQ_WAIT_HTTP)
+  * http_wait_for_request_body (AN_REQ_HTTP_BODY)
+  * http_process_req_common    (AN_REQ_HTTP_PROCESS_FE)
+  * process_switching_rules    (AN_REQ_SWITCHING_RULES)
+  * http_process_req_common    (AN_REQ_HTTP_PROCESS_BE)
+  * http_process_tarpit        (AN_REQ_HTTP_TARPIT)
+  * process_server_rules       (AN_REQ_SRV_RULES)
+  * http_process_request       (AN_REQ_HTTP_INNER)
+  * tcp_persist_rdp_cookie     (AN_REQ_PRST_RDP_COOKIE)
+  * process_sticking_rules     (AN_REQ_STICKING_RULES)
+  * flt_analyze_http_headers   (AN_FLT_HTTP_HDRS)
+
+And on the response channel:
+
+  * tcp_inspect_response       (AN_RES_INSPECT)
+  * http_wait_for_response     (AN_RES_WAIT_HTTP)
+  * process_store_rules        (AN_RES_STORE_RULES)
+  * http_process_res_common    (AN_RES_HTTP_PROCESS_BE)
+  * flt_analyze_http_headers   (AN_FLT_HTTP_HDRS)
+
+Note that 'flt_analyze_http_headers' (AN_FLT_HTTP_HDRS) is a new analyzer. It
+has been added to let filters analyze HTTP headers after all processing, just
+before the data parsing/forwarding.
+
+Unlike the other callbacks seen so far, 'flt_ops.channel_analyze' can
+interrupt the stream processing. So a filter can decide not to execute the
+analyzer that follows and wait for the next iteration. If there is more than
+one filter, the following ones are skipped. On the next iteration, the
+filtering resumes where it was stopped, i.e. on the filter that has previously
+stopped the processing. So it is possible for a filter to stop the stream
+processing for a while before continuing. For example:
+
+    /* Called before a processing happens on a given channel.
+     * Returns a negative value if an error occurs, 0 if it needs to wait,
+     * any other value otherwise.
+     */
+    static int
+    my_filter_chn_analyze(struct stream *s, struct filter *filter,
+                          struct channel *chn, unsigned an_bit)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        switch (an_bit) {
+            case AN_REQ_WAIT_HTTP:
+                if (/* wait until a condition is verified before continuing */)
+                    return 0;
+                break;
+            /* ... */
+        }
+        return 1;
+    }
+
+  * 'an_bit' is the analyzer id. All analyzers are listed in
+    'include/types/channel.h'.
+
+  * 'chn' is the channel on which the analyzing is done. You can know if it is
+    the request or the response channel by testing if the CF_ISRESP flag is set:
+
+        ((chn->flags & CF_ISRESP) == CF_ISRESP)
+
+
+In the previous example, the stream processing is blocked before receipt of the
+HTTP request until a condition is verified.
+
+To surround the activity of a filter during the channel analyzing, two new
+analyzers have been added:
+
+  * 'flt_start_analyze' (AN_FLT_START_FE/AN_FLT_START_BE): For a specific
+    filter, this analyzer is called before any call to the 'channel_analyze'
+    callback. From the filter point of view, it calls the
+    'flt_ops.channel_start_analyze' callback.
+
+  * 'flt_end_analyze' (AN_FLT_END): For a specific filter, this analyzer is
+    called when all other analyzers have finished their processing. From the
+    filter point of view, it calls the 'flt_ops.channel_end_analyze' callback.
+
+For TCP streams, these analyzers are called only once. For HTTP streams, if the
+client connection is kept alive, this happens at each request/response
+roundtrip.
+
+'flt_ops.channel_start_analyze' and 'flt_ops.channel_end_analyze' callbacks can
+interrupt the stream processing, as 'flt_ops.channel_analyze'. Here is an
+example:
+
+    /* Called when the analysis starts for a given channel
+     * Returns a negative value if an error occurs, 0 if it needs to wait,
+     * any other value otherwise.
+     */
+    static int
+    my_filter_chn_start_analyze(struct stream *s, struct filter *filter,
+                                struct channel *chn)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        /* ... TODO ... */
+
+        return 1;
+    }
+
+    /* Called when the analysis ends for a given channel
+     * Returns a negative value if an error occurs, 0 if it needs to wait,
+     * any other value otherwise. */
+    static int
+    my_filter_chn_end_analyze(struct stream *s, struct filter *filter,
+                              struct channel *chn)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        /* ... TODO ... */
+
+        return 1;
+    }
+
+
+Workflow on channels can be summarized as follows:
+
+                              |
+                   +----------+-----------+
+                   | flt_ops.stream_start |
+                   +----------+-----------+
+                              |
+                             ...
+                              |
+    [1] ------->--------------+
+                              |
+                              V
+              +-------------------------------+
+              |       flt_start_analyze       |  called for each filter,
+              |(flt_ops.channel_start_analyze)|  FRONTEND filters first,
+              +---------------+---------------+  then BACKEND filters once
+                              |                  the backend is selected
+                              V
+              +-------------------------------+
+              |    flt_ops.channel_analyze    |  called for each filter,
+              |               V               |  before each analyzer
+              |           analyzer            |
+              +---------------+---------------+
+                              |
+                             ...
+                              |
+                [ data filtering (see below) ]
+                              |
+                             ...
+                              |
+                              V
+              +-------------------------------+
+              |        flt_end_analyze        |  called for each filter
+              | (flt_ops.channel_end_analyze) |
+              +---------------+---------------+
+                              |
+                              +-->-- If HTTP stream, go back to [1]
+                              |
+                             ...
+                              |
+                   +----------+-----------+
+                   | flt_ops.stream_stop  |
+                   +----------+-----------+
+                              |
+                              V
+
+
+TODO: Add pre/post analyzer callbacks with a mask. So, this part will be
+      massively refactored very soon.
+
+
+3.6. FILTERING THE DATA EXCHANGED
+---------------------------------
+
+WARNING: To fully understand this part, you must be aware of how the buffers
+         work in HAProxy. In particular, you must be comfortable with the idea
+         of circular buffers. See doc/internals/buffer-operations.txt and
+         doc/internals/buffer-ops.fig for details.
+         doc/internals/body-parsing.txt could also be useful.
+
+An extended feature of the filters is the data filtering. By default a filter
+does not look into data exchanged between the client and the server because it
+is expensive. Indeed, instead of forwarding data without any processing, each
+byte needs to be buffered.
+
+So, to enable the data filtering on a channel, at any time, in one of the
+previous callbacks, you should call the 'register_data_filter' function. And
+conversely, to disable it, you should call the 'unregister_data_filter'
+function. For example:
+
+    static int
+    my_filter_chn_analyze(struct stream *s, struct filter *filter,
+                          struct channel *chn, unsigned an_bit)
+    {
+        struct my_filter_config *my_conf = FLT_CONF(filter);
+
+        /* 'chn' must be the request channel */
+        if (!(chn->flags & CF_ISRESP) && an_bit == AN_FLT_HTTP_HDRS) {
+            struct http_txn *txn = s->txn;
+            struct http_msg *msg = &txn->req;
+            struct buffer   *req = msg->chn->buf;
+            struct hdr_ctx   ctx = { .idx = 0 }; /* search from the first header */
+
+            /* Enable the data filtering for the request if 'X-Filter' header
+             * is set to 'true'.
+             */
+            if (http_find_header2("X-Filter", 8, req->p, &txn->hdr_idx, &ctx) &&
+                ctx.vlen == 4 && memcmp(ctx.line + ctx.val, "true", 4) == 0)
+                register_data_filter(s, chn, filter);
+        }
+
+        return 1;
+    }
+
+Here, the data filtering is enabled if the HTTP header 'X-Filter' is found and
+set to 'true'.
+
+If several filters are declared, the evaluation order remains the same,
+regardless of the order of the registrations to the data filtering.
+
+Depending on the stream type, TCP or HTTP, the way to handle data filtering will
+be slightly different. Among other things, for HTTP streams, there are more
+callbacks to help you fully handle all steps of an HTTP transaction. But the
+basis is the same. The data filtering is done in 2 stages:
+
+  * The data parsing: At this stage, filters will analyze input data on a
+    channel. Once a filter has parsed some data, it cannot parse it again. At
+    any time, a filter can choose not to parse all available data. So, it is
+    possible for a filter to retain data for a while. Because filters are
+    chained, a filter cannot parse more data than its predecessors. Thus only
+    data considered as parsed by the last filter will be available to the next
+    stage, the data forwarding.
+
+  * The data forwarding: At this stage, filters will decide how much data
+    HAProxy can forward among those considered as parsed at the previous
+    stage. Once a filter has marked data as forwardable, it cannot analyze it
+    anymore. At any time, a filter can choose not to forward all parsed
+    data. So, it is possible for a filter to retain data for a while. Because
+    filters are chained, a filter cannot forward more data than its
+    predecessors. Thus only data marked as forwardable by the last filter will
+    be actually forwarded by HAProxy.
+
+Internally, filters own 2 offsets, relative to 'buf->p', representing the
+number of bytes already parsed in the available input data and the number of
+bytes considered as forwarded.
We will call these offsets 'nxt' and 'fwd', respectively. The following macros
reference these offsets:

  * FLT_NXT(flt, chn), flt_req_nxt(flt) and flt_rsp_nxt(flt)

  * FLT_FWD(flt, chn), flt_req_fwd(flt) and flt_rsp_fwd(flt)

where 'flt' is the 'struct filter' passed as argument to all callbacks and
'chn' is the considered channel.

Using these offsets, the following operations on buffers are possible:

    chn->buf->p + FLT_NXT(flt, chn) // the pointer to parsable data for
                                    // the filter 'flt' on the channel 'chn'.
                                    // Everything between chn->buf->p and the
                                    // 'nxt' offset was already parsed by the
                                    // filter.

    chn->buf->i - FLT_NXT(flt, chn) // the number of bytes of parsable data
                                    // for the filter 'flt' on the channel
                                    // 'chn'.

    chn->buf->p + FLT_FWD(flt, chn) // the pointer to forwardable data for
                                    // the filter 'flt' on the channel 'chn'.
                                    // Everything between chn->buf->p and the
                                    // 'fwd' offset was already forwarded by
                                    // the filter.


Note that, for a given filter, the 'nxt' offset is always greater than or equal
to the 'fwd' offset.

TODO: Add schema with buffer states when there are 2 filters that analyze data.


3.6.1 FILTERING DATA ON TCP STREAMS
-----------------------------------

TCP data filtering is the easy case, because HAProxy does not parse this data.
So there are only two callbacks you need to consider:

  * 'flt_ops.tcp_data': This callback is called when unparsed data are
    available. If not defined, all available data are considered as parsed by
    the filter.

  * 'flt_ops.tcp_forward_data': This callback is called when parsed data are
    available. If not defined, all parsed data are considered as forwarded by
    the filter.

Here is an example:

    /* Returns a negative value if an error occurs, else the number of
     * consumed bytes.
     */
    static int
    my_filter_tcp_data(struct stream *s, struct filter *filter,
                       struct channel *chn)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);
        int avail = chn->buf->i - FLT_NXT(filter, chn);
        int ret   = avail;

        /* Do not parse more than 'my_conf->max_parse' bytes at a time */
        if (my_conf->max_parse != 0 && ret > my_conf->max_parse)
            ret = my_conf->max_parse;

        /* If available data are not completely parsed, wake up the stream
         * to be sure not to freeze it. */
        if (ret != avail)
            task_wakeup(s->task, TASK_WOKEN_MSG);
        return ret;
    }


    /* Returns a negative value if an error occurs, else the number of
     * forwarded bytes. */
    static int
    my_filter_tcp_forward_data(struct stream *s, struct filter *filter,
                               struct channel *chn, unsigned int len)
    {
        struct my_filter_config *my_conf = FLT_CONF(filter);
        int ret = len;

        /* Do not forward more than 'my_conf->max_forward' bytes at a time */
        if (my_conf->max_forward != 0 && ret > my_conf->max_forward)
            ret = my_conf->max_forward;

        /* If parsed data are not completely forwarded, wake up the stream
         * to be sure not to freeze it. */
        if (ret != len)
            task_wakeup(s->task, TASK_WOKEN_MSG);
        return ret;
    }



3.6.2 FILTERING DATA ON HTTP STREAMS
------------------------------------

HTTP data filtering is a bit trickier because HAProxy parses the body
structure, especially for chunked bodies. So basically there are HTTP
counterparts to the previous callbacks:

  * 'flt_ops.http_data': This callback is called when unparsed data are
    available. If not defined, all available data are considered as parsed by
    the filter.

  * 'flt_ops.http_forward_data': This callback is called when parsed data are
    available. If not defined, all parsed data are considered as forwarded by
    the filter.

But the prototype of these callbacks is slightly different: instead of the
channel, they take the HTTP message (struct http_msg) as parameter.
You need to be careful when you use the 'http_msg.chunk_len' value. It is the
number of bytes remaining to parse in the HTTP body (or in the current chunk
for chunked messages). The HAProxy HTTP parser uses it to compute the number of
bytes it can consume:

    /* Available input data in the current chunk from the HAProxy point of
     * view. msg->next bytes were already parsed. Without data filtering,
     * HAProxy will consume all of it. */
    Bytes = MIN(msg->chunk_len, chn->buf->i - msg->next);


But in your filter, you need to recompute it:

    /* Available input data in the current chunk from the filter point of
     * view. 'nxt' bytes were already parsed. */
    Bytes = MIN(msg->chunk_len + msg->next, chn->buf->i) - FLT_NXT(flt, chn);


In addition to these callbacks, there are two others:

  * 'flt_ops.http_end': This callback is called when the whole HTTP
    request/response is processed. It can interrupt the stream processing, so
    it could be used to synchronize the HTTP request with the HTTP response,
    for example:

        /* Returns a negative value if an error occurs, 0 if it needs to
         * wait, any other value otherwise. */
        static int
        my_filter_http_end(struct stream *s, struct filter *filter,
                           struct http_msg *msg)
        {
            struct my_filter_ctx *my_ctx = filter->ctx;

            if (!(msg->chn->flags & CF_ISRESP)) /* The request */
                my_ctx->end_of_req = 1;
            else  /* The response */
                my_ctx->end_of_rsp = 1;

            /* Both the request and the response are finished */
            if (my_ctx->end_of_req == 1 && my_ctx->end_of_rsp == 1)
                return 1;

            /* Wait */
            return 0;
        }


  * 'flt_ops.http_chunk_trailers': This callback is called for chunked HTTP
    messages only, when all chunks were parsed. HTTP trailers can be parsed in
    several passes, and this callback is called for each of them. The number
    of bytes parsed by HAProxy at each iteration is stored in 'msg->sol'.

Finally, there are 2 informational callbacks:

  * 'flt_ops.http_reset': This callback is called when an HTTP message is
    reset. This only happens when a '100-continue' response is received. It
    could be useful to reset the filter context before receiving the true
    response.

  * 'flt_ops.http_reply': This callback is called when HAProxy decides, at any
    time, to stop processing an HTTP message and to send an internal response
    to the client. This mainly happens when an error or a redirect occurs.


3.6.3 REWRITING DATA
--------------------

The last and trickiest part of data filtering is data rewriting. For now, the
filter API does not offer many functions to handle it. There are only
functions to notify HAProxy that the data size has changed, so that it can
update the internal state of the filters. It is your responsibility to update
the data itself and the buffer state (e.g. 'buf->i'). For an HTTP message, you
must also update the 'msg->next' and 'msg->chunk_len' values accordingly:

  * 'flt_change_next_size': This function must be called when a filter alters
    incoming data. It updates the 'nxt' offset value of all its predecessors.
    Changing the size of incoming data without calling this function leads to
    undefined behavior.

        unsigned int avail = MIN(msg->chunk_len + msg->next, msg->chn->buf->i) -
                             flt_rsp_nxt(filter);

        if (avail > 10 /* && ...some condition... */) {
            /* Move the buffer forward to have buf->p pointing on unparsed
             * data */
            b_adv(msg->chn->buf, flt_rsp_nxt(filter));

            /* Remove the first 10 bytes by shifting all remaining input data.
             * To simplify this example, we consider a non-wrapping buffer */
            memmove(msg->chn->buf->p, msg->chn->buf->p + 10,
                    msg->chn->buf->i - 10);

            /* Restore the buf->p value */
            b_rew(msg->chn->buf, flt_rsp_nxt(filter));

            /* Now update the other filters */
            flt_change_next_size(filter, msg->chn, -10);

            /* Update the buffer state */
            msg->chn->buf->i -= 10;

            /* And update the HTTP message state */
            msg->chunk_len -= 10;

            return (avail - 10);
        }
        else
            return 0; /* Wait for more data */


  * 'flt_change_forward_size': This function must be called when a filter
    alters parsed data. It updates the offset values ('nxt' and 'fwd') of all
    filters. Changing the size of parsed data without calling this function
    leads to undefined behavior.

        /* len is the number of bytes of forwardable data */
        if (len > 10 /* && ...some condition... */) {
            /* Move the buffer forward to have buf->p pointing on
             * non-forwarded data */
            b_adv(msg->chn->buf, flt_rsp_fwd(filter));

            /* Remove the first 10 bytes by shifting all remaining input data.
             * To simplify this example, we consider a non-wrapping buffer */
            memmove(msg->chn->buf->p, msg->chn->buf->p + 10,
                    msg->chn->buf->i - 10);

            /* Restore the buf->p value */
            b_rew(msg->chn->buf, flt_rsp_fwd(filter));

            /* Now update the other filters */
            flt_change_forward_size(filter, msg->chn, -10);

            /* Update the buffer state */
            msg->chn->buf->i -= 10;

            /* And update the HTTP message state */
            msg->next -= 10;

            return (len - 10);
        }
        else
            return 0; /* Wait for more data */


TODO: Implement helpers to easily rewrite data. For HTTP messages, this
      requires a chunked message; otherwise, the size of the data cannot be
      changed.




4. FAQ
------

4.1. Detect multiple declarations of the same filter
----------------------------------------------------

TODO