From: Joe Orton

There are a number of common pitfalls encountered when writing
+ output filters; this page aims to document best practice for
+ authors of new or existing filters.

This document is applicable to both version 2.0 and version 2.2
+ of the Apache HTTP Server; it specifically targets
+ RESOURCE-level or CONTENT_SET-level
+ filters though some advice is generic to all types of filter.

Each time a filter is invoked, it is passed a bucket
+ brigade, containing a sequence of buckets which
+ represent both data content and metadata. Every bucket has a
+ bucket type; a number of bucket types are defined and
+ used by the httpd core modules (and the
+ apr-util library which provides the bucket brigade
+ interface), but modules are free to define their own types.

A filter can tell whether a bucket represents either data or
+ metadata using the APR_BUCKET_IS_METADATA macro.
+ Generally, all metadata buckets should be passed up the filter
+ chain by an output filter. Filters may transform, delete, and
+ insert data buckets as appropriate.

There are two metadata bucket types which all filters must pay
+ attention to: the EOS bucket type, and the
+ FLUSH bucket type. An EOS bucket
+ indicates that the end of the response has been reached and no
+ further buckets need be processed. A FLUSH bucket
+ indicates that the filter should flush any buffered buckets (if
+ applicable) down the filter chain immediately.

FLUSH buckets are sent when the
+ content generator (or a downstream filter) knows that there may be
+ a delay before more content can be sent. By passing
+ FLUSH buckets up the filter chain immediately,
+ filters ensure that the client is not kept waiting for pending
+ data longer than necessary.

Filters can create FLUSH buckets and pass these up
+ the filter chain if desired. Generating FLUSH
+ buckets unnecessarily, or too frequently, can harm network
+ utilisation since it may force large numbers of small packets to
+ be sent, rather than a small number of larger packets. The
+ section on Non-blocking bucket reads
+ covers a case where filters are encouraged to generate
+ FLUSH buckets.

HEAP FLUSH FILE EOS

This shows a bucket brigade which may be passed to a filter; it
+ contains two metadata buckets (FLUSH and
+ EOS), and two data buckets (HEAP and
+ FILE).

For any given request, an output filter might be invoked only
+ once and be given a single brigade representing the entire response.
+ It is also possible that the number of times a filter is invoked
+ for a single response is proportional to the size of the content
+ being filtered, with the filter being passed a brigade containing
+ a single bucket each time. Filters must operate correctly in
+ either case.

An output filter can distinguish the final invocation for a
+ given response by the presence of an EOS bucket in
+ the brigade. Any buckets in the brigade after an EOS should be
+ ignored.

An output filter should never pass an empty brigade up the
+ filter chain. But, for good defensive programming, filters should
+ be prepared to accept an empty brigade, and do nothing.

apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    if (APR_BRIGADE_EMPTY(bb)) {
+        return APR_SUCCESS;
+    }
+    ....

A bucket brigade is a doubly-linked list of buckets. The list
+ is terminated (at both ends) by a sentinel which can be
+ distinguished from a normal bucket by comparing it with the
+ pointer returned by APR_BRIGADE_SENTINEL. The list
+ sentinel is in fact not a valid bucket structure; any attempt to
+ call normal bucket functions (such as
+ apr_bucket_read) on the sentinel will have undefined
+ behaviour (i.e. will crash the process).

There are a variety of functions and macros for traversing and
+ manipulating bucket brigades; see the apr_buckets.h
+ header for complete coverage. Commonly used macros include:
+
+ APR_BRIGADE_FIRST(bb)
+ APR_BRIGADE_LAST(bb)
+ APR_BUCKET_NEXT(e)
+ APR_BUCKET_PREV(e)
+
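The sentinel-terminated list and the traversal macros listed above can be illustrated with a small self-contained model. This is only a sketch: the toy_ types and TOY_ macros are hypothetical stand-ins written for this example, not the APR definitions, and the sentinel here is an embedded node rather than whatever representation APR uses internally. It shows why a traversal loop compares against the sentinel instead of NULL.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for apr_bucket and apr_bucket_brigade; NOT the APR types. */
typedef struct toy_bucket {
    struct toy_bucket *next, *prev;
    size_t length;               /* payload size; meaningless for the sentinel */
} toy_bucket;

typedef struct {
    toy_bucket ring;             /* sentinel node: carries links only */
} toy_brigade;

#define TOY_SENTINEL(bb) (&(bb)->ring)
#define TOY_FIRST(bb)    ((bb)->ring.next)
#define TOY_LAST(bb)     ((bb)->ring.prev)
#define TOY_NEXT(e)      ((e)->next)
#define TOY_PREV(e)      ((e)->prev)
#define TOY_EMPTY(bb)    (TOY_FIRST(bb) == TOY_SENTINEL(bb))

/* An empty brigade is a sentinel that points at itself in both directions. */
static void toy_brigade_init(toy_brigade *bb)
{
    bb->ring.next = bb->ring.prev = &bb->ring;
    bb->ring.length = 0;
}

/* Link e in just before the sentinel, i.e. at the tail of the list. */
static void toy_insert_tail(toy_brigade *bb, toy_bucket *e)
{
    assert(e != TOY_SENTINEL(bb));
    e->prev = TOY_LAST(bb);
    e->next = TOY_SENTINEL(bb);
    TOY_LAST(bb)->next = e;
    bb->ring.prev = e;
}

/* Walk from the first bucket until the sentinel is reached again. */
static size_t toy_total_length(toy_brigade *bb)
{
    size_t total = 0;
    toy_bucket *e;

    for (e = TOY_FIRST(bb); e != TOY_SENTINEL(bb); e = TOY_NEXT(e)) {
        total += e->length;
    }
    return total;
}
```

In this model the loop in toy_total_length never touches the sentinel's payload, which mirrors the rule that bucket functions must not be called on the brigade sentinel.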
The apr_bucket_brigade structure itself is
+ allocated out of a pool, so if a filter creates a new brigade, it
+ must ensure that memory use is correctly bounded. A filter which
+ allocates a new brigade out of the request pool
+ (r->pool) on every invocation, for example, will fall
+ foul of the warning above concerning
+ memory use. Such a filter should instead create a brigade on the
+ first invocation per request, and store that brigade in its state structure.
Filters should never use apr_brigade_destroy to "destroy" a brigade. The
+ memory used by the brigade structure will not be released by
+ calling this function (since it comes from a pool), but the
+ associated pool cleanup is unregistered. Using
+ apr_brigade_destroy can in fact cause memory leaks;
+ if a "destroyed" brigade still contains buckets when its
+ containing pool is destroyed, those buckets will not be
+ immediately destroyed.

When dealing with non-metadata buckets, it is important to
+ understand that the "apr_bucket *" object is an
+ abstract representation of data:
+
+ - A bucket may represent data of indeterminate length, in which
+   case its ->length field is set to the value (apr_size_t)-1. The
+   PIPE bucket type is an example of a bucket type which has an
+   indeterminate length; it represents the output from a pipe.
+ - The data represented by a bucket may or may not be mapped into
+   memory. The FILE bucket type, for example, represents data
+   stored in a file on disk.
+
Filters read the data from a bucket using the
+ apr_bucket_read function. When this function is
+ invoked, the bucket may morph into a different bucket
+ type, and may also insert a new bucket into the bucket brigade.
+ This must happen for buckets which represent data not mapped into
+ memory.
+
+ To give an example, consider a bucket brigade containing a
+ single FILE bucket representing an entire file, 24
+ kilobytes in size:
FILE(0K-24K)
When this bucket is read, it will read a block of data from the
+ file, morph into a HEAP bucket to represent that
+ data, and return the data to the caller. It also inserts a new
+ FILE bucket representing the remainder of the file;
+ after the apr_bucket_read call, the brigade looks
+ like:
HEAP(8K) FILE(8K-24K)
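
The morphing behaviour in this example can be modelled in a few lines of plain C. The sketch below is an illustration only: the toy_ names are hypothetical, the brigade is modelled as a simple array rather than a linked list, and the 8 kilobyte block size is an assumption chosen to match the example above rather than a statement about the block size APR uses.

```c
#include <assert.h>
#include <stddef.h>

enum toy_type { TOY_FILE, TOY_HEAP };

/* Toy bucket: a typed byte range [start, start + length) of one file. */
typedef struct {
    enum toy_type type;
    size_t start;
    size_t length;
} toy_bucket;

/* Assumed block size, picked only to match the 8K figure in the example. */
#define TOY_READ_BLOCK ((size_t)8 * 1024)

/*
 * Model a read of brigade[i]. A FILE bucket larger than one block morphs
 * into a HEAP bucket holding the first block, and a new FILE bucket for
 * the remainder is inserted directly after it. Returns the new count.
 */
static size_t toy_read(toy_bucket *brigade, size_t count, size_t capacity,
                       size_t i)
{
    toy_bucket *e;
    size_t j;

    assert(i < count);
    e = &brigade[i];

    if (e->type != TOY_FILE) {
        return count;                 /* data is already in memory */
    }
    if (e->length <= TOY_READ_BLOCK) {
        e->type = TOY_HEAP;           /* last block: morph, nothing to insert */
        return count;
    }

    assert(count < capacity);
    for (j = count; j > i + 1; j--) { /* open a slot after position i */
        brigade[j] = brigade[j - 1];
    }
    brigade[i + 1].type = TOY_FILE;
    brigade[i + 1].start = e->start + TOY_READ_BLOCK;
    brigade[i + 1].length = e->length - TOY_READ_BLOCK;

    e->type = TOY_HEAP;
    e->length = TOY_READ_BLOCK;
    return count + 1;
}
```

Starting from a single 24K FILE bucket, one call to toy_read on the first element leaves a HEAP bucket of 8K followed by a FILE bucket covering 8K-24K, which is the state shown above. Reading every bucket in turn ends with three HEAP buckets, so the whole file is held in memory.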
The basic function of any output filter will be to iterate
+ through the passed-in brigade and transform (or simply examine)
+ the content in some manner. The implementation of the iteration
+ loop is critical to producing a well-behaved output filter.
+
Take an example which loops through the entire brigade as
+ follows:
+
apr_bucket *e = APR_BRIGADE_FIRST(bb);
+const char *data;
+apr_size_t length;
+
+/* Reads every bucket before passing anything on: memory use is unbounded. */
+while (e != APR_BRIGADE_SENTINEL(bb)) {
+    apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
+    e = APR_BUCKET_NEXT(e);
+}
+
+return ap_pass_brigade(f->next, bb);

The above implementation would consume memory proportional to the
+ size of the content. If passed a FILE bucket, for example,
+ the entire file contents would be read into memory as each
+ apr_bucket_read call morphed a FILE
+ bucket into a HEAP bucket.
In contrast, the implementation below will consume a fixed
+ amount of memory to filter any brigade; a temporary brigade is
+ needed and must be allocated only once per response, see the
+ Maintaining state section.
+
apr_bucket *e;
+const char *data;
+apr_size_t length;
+apr_status_t rv;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+    rv = apr_bucket_read(e, &data, &length, APR_BLOCK_READ);
+    if (rv != APR_SUCCESS) return rv;
+    /* Remove bucket e from bb. */
+    APR_BUCKET_REMOVE(e);
+    /* Insert it into temporary brigade. */
+    APR_BRIGADE_INSERT_HEAD(tmpbb, e);
+    /* Pass brigade upstream. */
+    rv = ap_pass_brigade(f->next, tmpbb);
+    if (rv != APR_SUCCESS) return rv;
+    /* Empty tmpbb so that it can be reused on the next iteration. */
+    apr_brigade_cleanup(tmpbb);
+}

A filter which needs to maintain state over multiple
+ invocations per response can use the ->ctx field of
+ its ap_filter_t structure. It is typical to store a
+ temporary brigade in such a structure, to avoid having to allocate
+ a new brigade per invocation as described in the Brigade structure section.
struct dummy_state {
+    apr_bucket_brigade *tmpbb;
+    int filter_state;
+    ....
+};
+
+apr_status_t dummy_filter(ap_filter_t *f, apr_bucket_brigade *bb)
+{
+    struct dummy_state *state;
+
+    state = f->ctx;
+    if (state == NULL) {
+        /* First invocation for this response: initialise state structure. */
+        f->ctx = state = apr_palloc(f->r->pool, sizeof *state);
+
+        state->tmpbb = apr_brigade_create(f->r->pool, f->c->bucket_alloc);
+        state->filter_state = ...;
+    }
+    ...

If a filter decides to store buckets beyond the duration of a
+ single filter function invocation (for example storing them in its
+ ->ctx state structure), those buckets must be set
+ aside. This is necessary because some bucket types provide
+ buckets which represent temporary resources (such as stack memory)
+ which will fall out of scope as soon as the filter chain completes
+ processing the brigade.
To set aside a bucket, the apr_bucket_setaside
+ function can be called. Not all bucket types can be set aside, but
+ if successful, the bucket will have morphed to ensure it has a
+ lifetime at least as long as the pool given as an argument to the
+ apr_bucket_setaside function.
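
The lifetime problem that setting aside solves can be shown without APR at all. The following is a minimal sketch under stated assumptions: the toy_ names are hypothetical, and malloc stands in for the pool that a real setaside call would tie the data to. It models a bucket that refers to a caller's stack buffer and must copy that data before the caller returns.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for a bucket; NOT the real apr_bucket. */
typedef struct {
    const char *data;   /* bytes the bucket refers to */
    size_t length;
    int owns_copy;      /* 0: borrowed storage, 1: private heap copy */
} toy_bucket;

/*
 * Make the bucket independent of borrowed storage by copying the data
 * to the heap. Returns 0 on success, -1 if the allocation fails.
 */
static int toy_setaside(toy_bucket *e)
{
    char *copy;

    if (e->owns_copy) {
        return 0;                      /* already long-lived */
    }
    copy = malloc(e->length ? e->length : 1);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, e->data, e->length);
    e->data = copy;
    e->owns_copy = 1;
    return 0;
}

/* Free the private copy, if any, and reset the bucket. */
static void toy_release(toy_bucket *e)
{
    if (e->owns_copy) {
        free((char *)e->data);
    }
    e->data = NULL;
    e->length = 0;
    e->owns_copy = 0;
}

/* A generator whose buffer lives on its own stack frame. */
static int toy_generate(toy_bucket *saved)
{
    char scratch[6] = "hello";

    saved->data = scratch;
    saved->length = 5;
    saved->owns_copy = 0;
    /* Without this call, saved->data would dangle once we return. */
    return toy_setaside(saved);
}
```

After toy_generate returns, the saved bucket still refers to valid memory because the data was copied before the stack frame went away. A filter that keeps buckets in its state structure is in the same position as the caller of toy_generate here.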
Alternatively, the ap_save_brigade function can be
+ used, which will create a new brigade containing buckets with a
+ lifetime as long as the given pool argument. This function must
+ be used with great care, however: on return it guarantees that all
+ the buckets in the returned brigade will represent data mapped
+ into memory. If given an input brigade containing, for example, a
+ PIPE bucket, ap_save_brigade will consume an
+ arbitrary amount of memory to store the entire output of the
+ pipe.
The apr_bucket_read function takes an
+ apr_read_type_e argument which determines whether a
+ blocking or non-blocking read will be performed
+ from the data source. A good filter will first attempt to read
+ from every data bucket using a non-blocking read; if that fails
+ with APR_EAGAIN, then send a FLUSH
+ bucket up the filter chain, and retry using a blocking read.
This mode of operation ensures that any filters further up the
+ filter chain will flush any buffered buckets if a slow content
+ source is being used.
+
A CGI script is an example of a slow content source which is
+ implemented as a bucket type. PIPE buckets represent the output
+ from a CGI script; reading from such a bucket will block when
+ waiting for the CGI script to produce more output.
apr_bucket *e;
+const char *data;
+apr_size_t length;
+apr_read_type_e mode = APR_NONBLOCK_READ;
+
+while ((e = APR_BRIGADE_FIRST(bb)) != APR_BRIGADE_SENTINEL(bb)) {
+    apr_status_t rv;
+
+    rv = apr_bucket_read(e, &data, &length, mode);
+    if (rv == APR_EAGAIN && mode == APR_NONBLOCK_READ) {
+        /* Pass up a brigade containing a flush bucket: */
+        APR_BRIGADE_INSERT_TAIL(tmpbb, apr_bucket_flush_create(...));
+        rv = ap_pass_brigade(f->next, tmpbb);
+        apr_brigade_cleanup(tmpbb);
+        if (rv != APR_SUCCESS) return rv;
+
+        /* Retry, using a blocking read. */
+        mode = APR_BLOCK_READ;
+        continue;
+    } else if (rv != APR_SUCCESS) {
+        /* handle errors */
+    }
+
+    /* Next time, try a non-blocking read first. */
+    mode = APR_NONBLOCK_READ;
+    ...
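+}

In summary, here is a set of rules for all output filters to
+ follow:
+
+ - Output filters must pass all metadata buckets up the filter
+   chain; FLUSH buckets should be respected by passing
+   any pending or buffered buckets up the filter chain.
+ - Output filters should ignore any buckets following an
+   EOS bucket.
+ - After calling ap_pass_brigade to pass a brigade
+   up the filter chain, output filters should call
+   apr_brigade_cleanup to ensure the brigade is empty
+   before reusing that brigade structure; output filters should
+   never use apr_brigade_destroy to "destroy"
+   brigades.
+ - Output filters must not ignore the return value of
+   ap_pass_brigade, and must return appropriate errors
+   back down the filter chain.
+ - Output filters should first attempt non-blocking reads from
+   each data bucket, and send a FLUSH bucket up the
+   filter chain if the read blocks, before retrying with a blocking
+   read.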