From: André Malo Date: Sat, 5 Apr 2003 02:06:36 +0000 (+0000) Subject: new XML X-Git-Tag: pre_ajp_proxy~1912 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=b3637f6e123ddc00c4509bf754da26e783e8ecfd;p=thirdparty%2Fapache%2Fhttpd.git new XML git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@99224 13f79535-47bb-0310-9956-ffa450edef68 --- diff --git a/docs/manual/developer/filters.html b/docs/manual/developer/filters.html deleted file mode 100644 index d5ea5911b01..00000000000 --- a/docs/manual/developer/filters.html +++ /dev/null @@ -1,169 +0,0 @@ - - - - - - - How filters work in Apache 2.0 - - - - - - -

How filters work in Apache 2.0

- -

Warning - this is a cut 'n paste job from an email: - <022501c1c529$f63a9550$7f00000a@KOJ>

- -
-There are three basic filter types (each of these is actually broken
-down into two categories, but that comes later).
-
-CONNECTION:  Filters of this type are valid for the lifetime of this
-             connection. (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
-
-PROTOCOL:    Filters of this type are valid for the lifetime of this
-             request from the point of view of the client, this means
-             that the request is valid from the time that the request
-             is sent until the time that the response is received.
-             (AP_FTYPE_PROTOCOL, AP_FTYPE_TRANSCODE)
-
-RESOURCE:    Filters of this type are valid for the time that this
-             content is used to satisfy a request.  For simple
-             requests, this is identical to PROTOCOL, but internal redirects
-             and sub-requests can change the content without ending
-             the request. (AP_FTYPE_RESOURCE, AP_FTYPE_CONTENT_SET)
-
-It is important to make the distinction between a protocol and a
-resource filter.  A resource filter is tied to a specific resource, it
-may also be tied to header information, but the main binding is to a
-resource.  If you are writing a filter and you want to know if it is
-resource or protocol, the correct question to ask is:  "Can this filter
-be removed if the request is redirected to a different resource?"  If
-the answer is yes, then it is a resource filter.  If it is no, then it
-is most likely a protocol or connection filter.  I won't go into
-connection filters, because they seem to be well understood.
-
-With this definition, a few examples might help:
-Byterange:  We have coded it to be inserted for all
-requests, and it is removed if not used.  Because this filter is active
-at the beginning of all requests, it can not be removed if it is
-redirected, so this is a protocol filter.
-
-http_header:  This filter actually writes the headers to the
-network.  This is obviously a required filter (except in the asis case
-which is special and will be dealt with below) and so it is a protocol
-filter.
-
-Deflate:  The administrator configures this filter based on
-which file has been requested.  If we do an internal redirect from an
-autoindex page to an index.html page, the deflate filter may be added or
-removed based on config, so this is a resource filter.
-
-The further breakdown of each category into two more filter types is
-strictly for ordering.  We could remove it, and only allow for one
-filter type, but the order would tend to be wrong, and we would need to
-hack things to make it work.  Currently, the RESOURCE filters only have
-one filter type, but that should change.
-
-How are filters inserted?
-This is actually rather simple in theory, but the code is
-complex.  First of all, it is important that everybody realize that
-there are three filter lists for each request, but they are all
-concatenated together.  So, the first list is r->output_filters, then
-r->proto_output_filters, and finally r->connection->output_filters.
-These correspond to the RESOURCE, PROTOCOL, and CONNECTION filters
-respectively.  The problem previously, was that we used a singly linked
-list to create the filter stack, and we started from the "correct"
-location.  This means that if I had a RESOURCE filter on the stack, and
-I added a CONNECTION filter, the CONNECTION filter would be ignored.
-This should make sense, because we would insert the connection filter at
-the top of the c->output_filters list, but the end of r->output_filters
-pointed to the filter that used to be at the front of c->output_filters.
-This is obviously wrong.  The new insertion code uses a doubly linked
-list.  This has the advantage that we never lose a filter that has been
-inserted.  Unfortunately, it comes with a separate set of headaches.
-
-The problem is that we have two different cases were we use subrequests.
-The first is to insert more data into a response.  The second is to
-replace the existing response with an internal redirect.  These are two
-different cases and need to be treated as such.
-
-In the first case, we are creating the subrequest from within a handler
-or filter.  This means that the next filter should be passed to
-make_sub_request function, and the last resource filter in the
-sub-request will point to the next filter in the main request.  This
-makes sense, because the sub-request's data needs to flow through the
-same set of filters as the main request.  A graphical representation
-might help:
-
-Default_handler --> includes_filter --> byterange --> content_length ->
-etc
-
-If the includes filter creates a sub request, then we don't want the
-data from that sub-request to go through the includes filter, because it
-might not be SSI data.  So, the subrequest adds the following:
-
-Default_handler --> includes_filter -/-> byterange --> content_length -> etc
-                                    /
-Default_handler --> sub_request_core
-
-What happens if the subrequest is SSI data?  Well, that's easy, the
-includes_filter is a resource filter, so it will be added to the sub
-request in between the Default_handler and the sub_request_core filter.
-
-The second case for sub-requests is when one sub-request is going to
-become the real request.  This happens whenever a sub-request is created
-outside of a handler or filter, and NULL is passed as the next filter to
-the make_sub_request function.
-
-In this case, the resource filters no longer make sense for the new
-request, because the resource has changed.  So, instead of starting from
-scratch, we simply point the front of the resource filters for the
-sub-request to the front of the protocol filters for the old request.
-This means that we won't lose any of the protocol filters, neither will
-we try to send this data through a filter that shouldn't see it.
-
-The problem is that we are using a doubly-linked list for our filter
-stacks now. But, you should notice that it is possible for two lists to
-intersect in this model.  So, you do you handle the previous pointer?
-This is a very difficult question to answer, because there is no "right"
-answer, either method is equally valid.  I looked at why we use the
-previous pointer.  The only reason for it is to allow for easier
-addition of new servers.  With that being said, the solution I chose was
-to make the previous pointer always stay on the original request.
-
-This causes some more complex logic, but it works for all cases.  My
-concern in having it move to the sub-request, is that for the more
-common case (where a sub-request is used to add data to a response), the
-main filter chain would be wrong.  That didn't seem like a good idea to
-me.
-
-asis:
-The final topic.  :-)  Mod_Asis is a bit of a hack, but the
-handler needs to remove all filters except for connection filters, and
-send the data.  If you are using mod_asis, all other bets are off.
-
-The absolutely last point is that the reason this code was so hard to
-get right, was because we had hacked so much to force it to work.  I
-wrote most of the hacks originally, so I am very much to blame.
-However, now that the code is right, I have started to remove some
-hacks.  Most people should have seen that the reset_filters and
-add_required_filters functions are gone.  Those inserted protocol level
-filters for error conditions, in fact, both functions did the same
-thing, one after the other, it was really strange.  Because we don't
-lose protocol filters for error cases any more, those hacks went away.
-The HTTP_HEADER, Content-length, and Byterange filters are all added in
-the insert_filters phase, because if they were added earlier, we had
-some interesting interactions.  Now, those could all be moved to be
-inserted with the HTTP_IN, CORE, and CORE_IN filters.  That would make
-the code easier to follow.
-
- - - - - diff --git a/docs/manual/developer/filters.html.en b/docs/manual/developer/filters.html.en new file mode 100644 index 00000000000..35697f94621 --- /dev/null +++ b/docs/manual/developer/filters.html.en @@ -0,0 +1,204 @@ + + + +How filters work in Apache 2.0 - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.1

How filters work in Apache 2.0

+

Warning

+

This is a cut 'n paste job from an email + (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for + better readability. It's not up to date but may be a good start for + further research.

+
+
+ +
top
+
+

Filter Types

+

There are three basic filter types (each of these is actually broken + down into two categories, but that comes later).

+ +
+
CONNECTION
+
Filters of this type are valid for the lifetime of this connection. + (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
+ +
PROTOCOL
+
Filters of this type are valid for the lifetime of this request from + the point of view of the client, this means that the request is valid + from the time that the request is sent until the time that the response + is received. (AP_FTYPE_PROTOCOL, + AP_FTYPE_TRANSCODE)
+ +
RESOURCE
+
Filters of this type are valid for the time that this content is used + to satisfy a request. For simple requests, this is identical to + PROTOCOL, but internal redirects and sub-requests can change + the content without ending the request. (AP_FTYPE_RESOURCE, + AP_FTYPE_CONTENT_SET)
+
+ +

It is important to make the distinction between a protocol and a + resource filter. A resource filter is tied to a specific resource, it + may also be tied to header information, but the main binding is to a + resource. If you are writing a filter and you want to know if it is + resource or protocol, the correct question to ask is: "Can this filter + be removed if the request is redirected to a different resource?" If + the answer is yes, then it is a resource filter. If it is no, then it + is most likely a protocol or connection filter. I won't go into + connection filters, because they seem to be well understood. With this + definition, a few examples might help:

+ +
+
Byterange
+
We have coded it to be inserted for all requests, and it is removed + if not used. Because this filter is active at the beginning of all + requests, it can not be removed if it is redirected, so this is a + protocol filter.
+ +
http_header
+
This filter actually writes the headers to the network. This is + obviously a required filter (except in the asis case which is special + and will be dealt with below) and so it is a protocol filter.
+ +
Deflate
+
The administrator configures this filter based on which file has been + requested. If we do an internal redirect from an autoindex page to an + index.html page, the deflate filter may be added or removed based on + config, so this is a resource filter.
+
+ +

The further breakdown of each category into two more filter types is + strictly for ordering. We could remove it, and only allow for one + filter type, but the order would tend to be wrong, and we would need to + hack things to make it work. Currently, the RESOURCE filters + only have one filter type, but that should change.

+
top
+
+

How are filters inserted?

+

This is actually rather simple in theory, but the code is + complex. First of all, it is important that everybody realize that + there are three filter lists for each request, but they are all + concatenated together. So, the first list is + r->output_filters, then r->proto_output_filters, + and finally r->connection->output_filters. These correspond + to the RESOURCE, PROTOCOL, and + CONNECTION filters respectively. The problem previously, was + that we used a singly linked list to create the filter stack, and we + started from the "correct" location. This means that if I had a + RESOURCE filter on the stack, and I added a + CONNECTION filter, the CONNECTION filter would + be ignored. This should make sense, because we would insert the connection + filter at the top of the c->output_filters list, but the end + of r->output_filters pointed to the filter that used to be + at the front of c->output_filters. This is obviously wrong. + The new insertion code uses a doubly linked list. This has the advantage + that we never lose a filter that has been inserted. Unfortunately, it comes + with a separate set of headaches.

+ +

The problem is that we have two different cases were we use subrequests. + The first is to insert more data into a response. The second is to + replace the existing response with an internal redirect. These are two + different cases and need to be treated as such.

+ +

In the first case, we are creating the subrequest from within a handler + or filter. This means that the next filter should be passed to + make_sub_request function, and the last resource filter in the + sub-request will point to the next filter in the main request. This + makes sense, because the sub-request's data needs to flow through the + same set of filters as the main request. A graphical representation + might help:

+ +
+Default_handler --> includes_filter --> byterange --> ...
+
+ +

If the includes filter creates a sub request, then we don't want the + data from that sub-request to go through the includes filter, because it + might not be SSI data. So, the subrequest adds the following:

+ +
    
+Default_handler --> includes_filter -/-> byterange --> ...
+                                    /
+Default_handler --> sub_request_core
+
+ +

What happens if the subrequest is SSI data? Well, that's easy, the + includes_filter is a resource filter, so it will be added to + the sub request in between the Default_handler and the + sub_request_core filter.

+ +

The second case for sub-requests is when one sub-request is going to + become the real request. This happens whenever a sub-request is created + outside of a handler or filter, and NULL is passed as the next filter to + the make_sub_request function.

+ +

In this case, the resource filters no longer make sense for the new + request, because the resource has changed. So, instead of starting from + scratch, we simply point the front of the resource filters for the + sub-request to the front of the protocol filters for the old request. + This means that we won't lose any of the protocol filters, neither will + we try to send this data through a filter that shouldn't see it.

+ +

The problem is that we are using a doubly-linked list for our filter + stacks now. But, you should notice that it is possible for two lists to + intersect in this model. So, you do you handle the previous pointer? + This is a very difficult question to answer, because there is no "right" + answer, either method is equally valid. I looked at why we use the + previous pointer. The only reason for it is to allow for easier + addition of new servers. With that being said, the solution I chose was + to make the previous pointer always stay on the original request.

+ +

This causes some more complex logic, but it works for all cases. My + concern in having it move to the sub-request, is that for the more + common case (where a sub-request is used to add data to a response), the + main filter chain would be wrong. That didn't seem like a good idea to + me.

+
top
+
+

Asis

+

The final topic. :-) Mod_Asis is a bit of a hack, but the + handler needs to remove all filters except for connection filters, and + send the data. If you are using mod_asis, all other + bets are off.

+
top
+
+

Explanations

+

The absolutely last point is that the reason this code was so hard to + get right, was because we had hacked so much to force it to work. I + wrote most of the hacks originally, so I am very much to blame. + However, now that the code is right, I have started to remove some + hacks. Most people should have seen that the reset_filters + and add_required_filters functions are gone. Those inserted + protocol level filters for error conditions, in fact, both functions did + the same thing, one after the other, it was really strange. Because we + don't lose protocol filters for error cases any more, those hacks went away. + The HTTP_HEADER, Content-length, and + Byterange filters are all added in the + insert_filters phase, because if they were added earlier, we + had some interesting interactions. Now, those could all be moved to be + inserted with the HTTP_IN, CORE, and + CORE_IN filters. That would make the code easier to + follow.

+
+ + \ No newline at end of file diff --git a/docs/manual/developer/filters.xml b/docs/manual/developer/filters.xml new file mode 100644 index 00000000000..1ce61ef1a91 --- /dev/null +++ b/docs/manual/developer/filters.xml @@ -0,0 +1,191 @@ + + + + + + + +How filters work in Apache 2.0 + + + Warning +

This is a cut 'n paste job from an email + (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for + better readability. It's not up to date but may be a good start for + further research.

+
+
+ +
Filter Types +

There are three basic filter types (each of these is actually broken + down into two categories, but that comes later).

+ +
+
CONNECTION
+
Filters of this type are valid for the lifetime of this connection. + (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
+ +
PROTOCOL
+
Filters of this type are valid for the lifetime of this request from + the point of view of the client, this means that the request is valid + from the time that the request is sent until the time that the response + is received. (AP_FTYPE_PROTOCOL, + AP_FTYPE_TRANSCODE)
+ +
RESOURCE
+
Filters of this type are valid for the time that this content is used + to satisfy a request. For simple requests, this is identical to + PROTOCOL, but internal redirects and sub-requests can change + the content without ending the request. (AP_FTYPE_RESOURCE, + AP_FTYPE_CONTENT_SET)
+
+ +

It is important to make the distinction between a protocol and a + resource filter. A resource filter is tied to a specific resource, it + may also be tied to header information, but the main binding is to a + resource. If you are writing a filter and you want to know if it is + resource or protocol, the correct question to ask is: "Can this filter + be removed if the request is redirected to a different resource?" If + the answer is yes, then it is a resource filter. If it is no, then it + is most likely a protocol or connection filter. I won't go into + connection filters, because they seem to be well understood. With this + definition, a few examples might help:

+ +
+
Byterange
+
We have coded it to be inserted for all requests, and it is removed + if not used. Because this filter is active at the beginning of all + requests, it can not be removed if it is redirected, so this is a + protocol filter.
+ +
http_header
+
This filter actually writes the headers to the network. This is + obviously a required filter (except in the asis case which is special + and will be dealt with below) and so it is a protocol filter.
+ +
Deflate
+
The administrator configures this filter based on which file has been + requested. If we do an internal redirect from an autoindex page to an + index.html page, the deflate filter may be added or removed based on + config, so this is a resource filter.
+
+ +

The further breakdown of each category into two more filter types is + strictly for ordering. We could remove it, and only allow for one + filter type, but the order would tend to be wrong, and we would need to + hack things to make it work. Currently, the RESOURCE filters + only have one filter type, but that should change.

+
+ +
How are filters inserted? +

This is actually rather simple in theory, but the code is + complex. First of all, it is important that everybody realize that + there are three filter lists for each request, but they are all + concatenated together. So, the first list is + r->output_filters, then r->proto_output_filters, + and finally r->connection->output_filters. These correspond + to the RESOURCE, PROTOCOL, and + CONNECTION filters respectively. The problem previously, was + that we used a singly linked list to create the filter stack, and we + started from the "correct" location. This means that if I had a + RESOURCE filter on the stack, and I added a + CONNECTION filter, the CONNECTION filter would + be ignored. This should make sense, because we would insert the connection + filter at the top of the c->output_filters list, but the end + of r->output_filters pointed to the filter that used to be + at the front of c->output_filters. This is obviously wrong. + The new insertion code uses a doubly linked list. This has the advantage + that we never lose a filter that has been inserted. Unfortunately, it comes + with a separate set of headaches.

+ +

The problem is that we have two different cases were we use subrequests. + The first is to insert more data into a response. The second is to + replace the existing response with an internal redirect. These are two + different cases and need to be treated as such.

+ +

In the first case, we are creating the subrequest from within a handler + or filter. This means that the next filter should be passed to + make_sub_request function, and the last resource filter in the + sub-request will point to the next filter in the main request. This + makes sense, because the sub-request's data needs to flow through the + same set of filters as the main request. A graphical representation + might help:

+ + +
+Default_handler --> includes_filter --> byterange --> ...
+
+
+ +

If the includes filter creates a sub request, then we don't want the + data from that sub-request to go through the includes filter, because it + might not be SSI data. So, the subrequest adds the following:

+ + +
    
+Default_handler --> includes_filter -/-> byterange --> ...
+                                    /
+Default_handler --> sub_request_core
+
+
+ +

What happens if the subrequest is SSI data? Well, that's easy, the + includes_filter is a resource filter, so it will be added to + the sub request in between the Default_handler and the + sub_request_core filter.

+ +

The second case for sub-requests is when one sub-request is going to + become the real request. This happens whenever a sub-request is created + outside of a handler or filter, and NULL is passed as the next filter to + the make_sub_request function.

+ +

In this case, the resource filters no longer make sense for the new + request, because the resource has changed. So, instead of starting from + scratch, we simply point the front of the resource filters for the + sub-request to the front of the protocol filters for the old request. + This means that we won't lose any of the protocol filters, neither will + we try to send this data through a filter that shouldn't see it.

+ +

The problem is that we are using a doubly-linked list for our filter + stacks now. But, you should notice that it is possible for two lists to + intersect in this model. So, you do you handle the previous pointer? + This is a very difficult question to answer, because there is no "right" + answer, either method is equally valid. I looked at why we use the + previous pointer. The only reason for it is to allow for easier + addition of new servers. With that being said, the solution I chose was + to make the previous pointer always stay on the original request.

+ +

This causes some more complex logic, but it works for all cases. My + concern in having it move to the sub-request, is that for the more + common case (where a sub-request is used to add data to a response), the + main filter chain would be wrong. That didn't seem like a good idea to + me.

+
+ +
Asis +

The final topic. :-) Mod_Asis is a bit of a hack, but the + handler needs to remove all filters except for connection filters, and + send the data. If you are using mod_asis, all other + bets are off.

+
+ +
Explanations +

The absolutely last point is that the reason this code was so hard to + get right, was because we had hacked so much to force it to work. I + wrote most of the hacks originally, so I am very much to blame. + However, now that the code is right, I have started to remove some + hacks. Most people should have seen that the reset_filters + and add_required_filters functions are gone. Those inserted + protocol level filters for error conditions, in fact, both functions did + the same thing, one after the other, it was really strange. Because we + don't lose protocol filters for error cases any more, those hacks went away. + The HTTP_HEADER, Content-length, and + Byterange filters are all added in the + insert_filters phase, because if they were added earlier, we + had some interesting interactions. Now, those could all be moved to be + inserted with the HTTP_IN, CORE, and + CORE_IN filters. That would make the code easier to + follow.

+
+
+