From 4816d4a5ab9bb543bbfc68da364769e4775dcf73 Mon Sep 17 00:00:00 2001
From: "(no author)" <(no author)@unknown>
Date: Sat, 5 Apr 2003 02:09:39 +0000
Subject: [PATCH] This commit was manufactured by cvs2svn to create branch 'APACHE_2_0_BRANCH'.

git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/branches/APACHE_2_0_BRANCH@99225 13f79535-47bb-0310-9956-ffa450edef68
---
 docs/manual/developer/filters.html.en | 204 ++++++++++++++++++++++++++
 docs/manual/developer/filters.xml     | 191 ++++++++++++++++++++++++
 2 files changed, 395 insertions(+)
 create mode 100644 docs/manual/developer/filters.html.en
 create mode 100644 docs/manual/developer/filters.xml

diff --git a/docs/manual/developer/filters.html.en b/docs/manual/developer/filters.html.en
new file mode 100644
index 00000000000..35697f94621
--- /dev/null
+++ b/docs/manual/developer/filters.html.en
@@ -0,0 +1,204 @@
+
+
+
+Apache HTTP Server Version 2.1
+
This is a cut 'n paste job from an email
+ (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for
+ better readability. It's not up to date but may be a good start for
+ further research.
+There are three basic filter types (each of these is actually broken
+ down into two categories, but that comes later).
+
+CONNECTION
+ (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
+PROTOCOL
+ (AP_FTYPE_PROTOCOL,
+ AP_FTYPE_TRANSCODE)
+RESOURCE
+ ... PROTOCOL, but internal redirects and sub-requests can change
+ the content without ending the request. (AP_FTYPE_RESOURCE,
+ AP_FTYPE_CONTENT_SET)
It is important to make the distinction between a protocol and a
+ resource filter. A resource filter is tied to a specific resource; it
+ may also be tied to header information, but the main binding is to a
+ resource. If you are writing a filter and you want to know if it is
+ resource or protocol, the correct question to ask is: "Can this filter
+ be removed if the request is redirected to a different resource?" If
+ the answer is yes, then it is a resource filter. If it is no, then it
+ is most likely a protocol or connection filter. I won't go into
+ connection filters, because they seem to be well understood. With this
+ definition, a few examples might help:
+ +The further breakdown of each category into two more filter types is
+ strictly for ordering. We could remove it, and only allow for one
+ filter type, but the order would tend to be wrong, and we would need to
+ hack things to make it work. Currently, the RESOURCE filters
+ only have one filter type, but that should change.
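The ordering role of the filter types can be sketched in C as a sorted insert. This is only an illustration under stated assumptions: the `demo_` names and the numeric ranks are made up for the example and are not the server's own declarations; the enum members loosely mirror the AP_FTYPE_* names mentioned above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical ordering ranks: lower values sit earlier in the output chain. */
typedef enum {
    DEMO_RESOURCE    = 10,
    DEMO_CONTENT_SET = 20,
    DEMO_PROTOCOL    = 30,
    DEMO_TRANSCODE   = 40,
    DEMO_CONNECTION  = 50,
    DEMO_NETWORK     = 60
} demo_ftype;

/* Hypothetical filter node, reduced to what ordering needs. */
typedef struct demo_filter {
    const char *name;
    demo_ftype type;
    struct demo_filter *next;
} demo_filter;

/* Insert f so the list stays sorted by type. A new filter is placed
 * after any filters of the same type that are already in the list. */
void demo_insert_sorted(demo_filter **head, demo_filter *f)
{
    assert(head != NULL && f != NULL);
    demo_filter **pos = head;
    while (*pos != NULL && (*pos)->type <= f->type) {
        pos = &(*pos)->next;
    }
    f->next = *pos;
    *pos = f;
}
```

With two ranks per category, a filter can be told to sit early or late inside its own category without any special-case code in the insert routine, which is the point the paragraph above makes about ordering.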
This is actually rather simple in theory, but the code is
+ complex. First of all, it is important that everybody realize that
+ there are three filter lists for each request, but they are all
+ concatenated together. So, the first list is
+ r->output_filters, then r->proto_output_filters,
+ and finally r->connection->output_filters. These correspond
+ to the RESOURCE, PROTOCOL, and
+ CONNECTION filters respectively. The problem previously was
+ that we used a singly linked list to create the filter stack, and we
+ started from the "correct" location. This means that if I had a
+ RESOURCE filter on the stack, and I added a
+ CONNECTION filter, the CONNECTION filter would
+ be ignored. This should make sense, because we would insert the connection
+ filter at the top of the c->output_filters list, but the end
+ of r->output_filters pointed to the filter that used to be
+ at the front of c->output_filters. This is obviously wrong.
+ The new insertion code uses a doubly linked list. This has the advantage
+ that we never lose a filter that has been inserted. Unfortunately, it comes
+ with a separate set of headaches.
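To make the difference concrete, here is a small, self-contained C sketch of inserting a filter at the front of one segment of a concatenated chain. The `chain_filter` type and the `chain_insert_front` function are hypothetical names for this example and do not reproduce the server's insertion code; the sketch only shows why a back pointer lets the upstream filter be relinked instead of skipping the new filter.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical doubly linked filter node. */
typedef struct chain_filter {
    const char *name;
    struct chain_filter *next;
    struct chain_filter *prev;
} chain_filter;

/* Insert f in front of *segment_head, where *segment_head points
 * somewhere into a longer chain. Because every node knows its
 * predecessor, the upstream filter is relinked to f as well. */
void chain_insert_front(chain_filter **segment_head, chain_filter *f)
{
    assert(segment_head != NULL && f != NULL);
    chain_filter *old = *segment_head;
    f->next = old;
    f->prev = (old != NULL) ? old->prev : NULL;
    if (f->prev != NULL) {
        f->prev->next = f;  /* the step a singly linked list cannot do */
    }
    if (old != NULL) {
        old->prev = f;
    }
    *segment_head = f;
}
```

The sketch deliberately ignores two things real code would have to handle: an empty segment (where the predecessor cannot be found through `old`), and other head pointers that alias the same node and must be updated separately.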
The problem is that we have two different cases where we use subrequests.
+ The first is to insert more data into a response. The second is to
+ replace the existing response with an internal redirect. These are two
+ different cases and need to be treated as such.
+ +In the first case, we are creating the subrequest from within a handler
+ or filter. This means that the next filter should be passed to the
+ make_sub_request function, and the last resource filter in the
+ sub-request will point to the next filter in the main request. This
+ makes sense, because the sub-request's data needs to flow through the
+ same set of filters as the main request. A graphical representation
+ might help:
+Default_handler --> includes_filter --> byterange --> ...
+
If the includes filter creates a sub request, then we don't want the
+ data from that sub-request to go through the includes filter, because it
+ might not be SSI data. So, the subrequest adds the following:
+
+
+Default_handler --> includes_filter -/-> byterange --> ...
+                                     /
+Default_handler --> sub_request_core
+
What happens if the subrequest is SSI data? Well, that's easy, the
+ includes_filter is a resource filter, so it will be added to
+ the sub request in between the Default_handler and the
+ sub_request_core filter.
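The first case can be sketched in C as follows. This is a minimal model, not the server's code: `sr_filter`, `sr_join`, and `sr_reaches` are hypothetical names, and the chain is reduced to a singly linked list because only the forward direction matters here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter node: just a name and a forward link. */
typedef struct sr_filter {
    const char *name;
    struct sr_filter *next;
} sr_filter;

/* Point the last filter of the sub-request chain at the filter that
 * follows the caller in the main request. */
void sr_join(sr_filter *sub_head, sr_filter *next_in_main)
{
    assert(sub_head != NULL);
    sr_filter *last = sub_head;
    while (last->next != NULL) {
        last = last->next;
    }
    last->next = next_in_main;
}

/* Return 1 if target is reachable from head, 0 otherwise. */
int sr_reaches(const sr_filter *head, const sr_filter *target)
{
    for (; head != NULL; head = head->next) {
        if (head == target) {
            return 1;
        }
    }
    return 0;
}
```

If the includes filter calls `sr_join` with its own `next` pointer, the sub-request's data reaches byterange and everything after it, but never passes back through the includes filter of the main request. A sub-request that serves SSI data simply has its own includes filter in front of `sub_request_core`, and the join works the same way.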
The second case for sub-requests is when one sub-request is going to
+ become the real request. This happens whenever a sub-request is created
+ outside of a handler or filter, and NULL is passed as the next filter to
+ the make_sub_request function.
In this case, the resource filters no longer make sense for the new
+ request, because the resource has changed. So, instead of starting from
+ scratch, we simply point the front of the resource filters for the
+ sub-request to the front of the protocol filters for the old request.
+ This means that we won't lose any of the protocol filters, nor will
+ we try to send this data through a filter that shouldn't see it.
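A rough C sketch of that pointer adjustment follows. The `rd_` types and the `rd_redirect` function are hypothetical; only the field names `output_filters` and `proto_output_filters` are taken from the text, and the real request structure carries far more state than this stand-in.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical filter node. */
typedef struct rd_filter {
    const char *name;
    struct rd_filter *next;
} rd_filter;

/* Minimal stand-in for a request: two entry points into one chain. */
typedef struct rd_request {
    rd_filter *output_filters;        /* front of the resource filters */
    rd_filter *proto_output_filters;  /* front of the protocol filters */
} rd_request;

/* Set up the request that replaces old_req after an internal redirect:
 * the protocol filters are shared, the old resource filters are left
 * behind because the resource has changed. */
void rd_redirect(rd_request *new_req, const rd_request *old_req)
{
    assert(new_req != NULL && old_req != NULL);
    new_req->proto_output_filters = old_req->proto_output_filters;
    new_req->output_filters = old_req->proto_output_filters;
}
```

Resource filters for the new resource would then be inserted in front of `new_req->output_filters` in the usual way; nothing in the old request has to be torn down for this to work.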
+
+The problem is that we are using a doubly-linked list for our filter
+ stacks now. But, you should notice that it is possible for two lists to
+ intersect in this model. So, how do you handle the previous pointer?
+ This is a very difficult question to answer, because there is no "right"
+ answer; either method is equally valid. I looked at why we use the
+ previous pointer. The only reason for it is to allow for easier
+ addition of new servers. With that being said, the solution I chose was
+ to make the previous pointer always stay on the original request.
+
+This causes some more complex logic, but it works for all cases. My
+ concern in having it move to the sub-request is that for the more
+ common case (where a sub-request is used to add data to a response), the
+ main filter chain would be wrong. That didn't seem like a good idea to
+ me.
+The final topic. :-) mod_asis is a bit of a hack, but the
+ handler needs to remove all filters except for connection filters, and
+ send the data. If you are using mod_asis, all other
+ bets are off.
The absolutely last point is that the reason this code was so hard to
+ get right was that we had hacked so much to force it to work. I
+ wrote most of the hacks originally, so I am very much to blame.
+ However, now that the code is right, I have started to remove some
+ hacks. Most people should have seen that the reset_filters
+ and add_required_filters functions are gone. Those inserted
+ protocol level filters for error conditions. In fact, both functions did
+ the same thing, one after the other, which was really strange. Because we
+ don't lose protocol filters for error cases any more, those hacks went away.
+ The HTTP_HEADER, Content-length, and
+ Byterange filters are all added in the
+ insert_filters phase, because if they were added earlier, we
+ had some interesting interactions. Now, those could all be moved to be
+ inserted with the HTTP_IN, CORE, and
+ CORE_IN filters. That would make the code easier to
+ follow.
This is a cut 'n paste job from an email
+ (<022501c1c529$f63a9550$7f00000a@KOJ>) and only reformatted for
+ better readability. It's not up to date but may be a good start for
+ further research.
+There are three basic filter types (each of these is actually broken
+ down into two categories, but that comes later).
+
+CONNECTION
+ (AP_FTYPE_CONNECTION, AP_FTYPE_NETWORK)
+PROTOCOL
+ (AP_FTYPE_PROTOCOL,
+ AP_FTYPE_TRANSCODE)
+RESOURCE
+ ... PROTOCOL, but internal redirects and sub-requests can change
+ the content without ending the request. (AP_FTYPE_RESOURCE,
+ AP_FTYPE_CONTENT_SET)
It is important to make the distinction between a protocol and a
+ resource filter. A resource filter is tied to a specific resource; it
+ may also be tied to header information, but the main binding is to a
+ resource. If you are writing a filter and you want to know if it is
+ resource or protocol, the correct question to ask is: "Can this filter
+ be removed if the request is redirected to a different resource?" If
+ the answer is yes, then it is a resource filter. If it is no, then it
+ is most likely a protocol or connection filter. I won't go into
+ connection filters, because they seem to be well understood. With this
+ definition, a few examples might help:
+ +The further breakdown of each category into two more filter types is
+ strictly for ordering. We could remove it, and only allow for one
+ filter type, but the order would tend to be wrong, and we would need to
+ hack things to make it work. Currently, the RESOURCE filters
+ only have one filter type, but that should change.
This is actually rather simple in theory, but the code is
+ complex. First of all, it is important that everybody realize that
+ there are three filter lists for each request, but they are all
+ concatenated together. So, the first list is
+ r->output_filters, then r->proto_output_filters,
+ and finally r->connection->output_filters. These correspond
+ to the RESOURCE, PROTOCOL, and
+ CONNECTION filters respectively. The problem previously was
+ that we used a singly linked list to create the filter stack, and we
+ started from the "correct" location. This means that if I had a
+ RESOURCE filter on the stack, and I added a
+ CONNECTION filter, the CONNECTION filter would
+ be ignored. This should make sense, because we would insert the connection
+ filter at the top of the c->output_filters list, but the end
+ of r->output_filters pointed to the filter that used to be
+ at the front of c->output_filters. This is obviously wrong.
+ The new insertion code uses a doubly linked list. This has the advantage
+ that we never lose a filter that has been inserted. Unfortunately, it comes
+ with a separate set of headaches.
The problem is that we have two different cases where we use subrequests.
+ The first is to insert more data into a response. The second is to
+ replace the existing response with an internal redirect. These are two
+ different cases and need to be treated as such.
+ +In the first case, we are creating the subrequest from within a handler
+ or filter. This means that the next filter should be passed to the
+ make_sub_request function, and the last resource filter in the
+ sub-request will point to the next filter in the main request. This
+ makes sense, because the sub-request's data needs to flow through the
+ same set of filters as the main request. A graphical representation
+ might help:
+Default_handler --> includes_filter --> byterange --> ...
+
+
If the includes filter creates a sub request, then we don't want the
+ data from that sub-request to go through the includes filter, because it
+ might not be SSI data. So, the subrequest adds the following:
+
+
+Default_handler --> includes_filter -/-> byterange --> ...
+                                     /
+Default_handler --> sub_request_core
+
+
What happens if the subrequest is SSI data? Well, that's easy, the
+ includes_filter is a resource filter, so it will be added to
+ the sub request in between the Default_handler and the
+ sub_request_core filter.
The second case for sub-requests is when one sub-request is going to
+ become the real request. This happens whenever a sub-request is created
+ outside of a handler or filter, and NULL is passed as the next filter to
+ the make_sub_request function.
In this case, the resource filters no longer make sense for the new
+ request, because the resource has changed. So, instead of starting from
+ scratch, we simply point the front of the resource filters for the
+ sub-request to the front of the protocol filters for the old request.
+ This means that we won't lose any of the protocol filters, nor will
+ we try to send this data through a filter that shouldn't see it.
+
+The problem is that we are using a doubly-linked list for our filter
+ stacks now. But, you should notice that it is possible for two lists to
+ intersect in this model. So, how do you handle the previous pointer?
+ This is a very difficult question to answer, because there is no "right"
+ answer; either method is equally valid. I looked at why we use the
+ previous pointer. The only reason for it is to allow for easier
+ addition of new servers. With that being said, the solution I chose was
+ to make the previous pointer always stay on the original request.
+
+This causes some more complex logic, but it works for all cases. My
+ concern in having it move to the sub-request is that for the more
+ common case (where a sub-request is used to add data to a response), the
+ main filter chain would be wrong. That didn't seem like a good idea to
+ me.
+The final topic. :-) mod_asis is a bit of a hack, but the
+ handler needs to remove all filters except for connection filters, and
+ send the data. If you are using mod_asis, all other
+ bets are off.
The absolutely last point is that the reason this code was so hard to
+ get right was that we had hacked so much to force it to work. I
+ wrote most of the hacks originally, so I am very much to blame.
+ However, now that the code is right, I have started to remove some
+ hacks. Most people should have seen that the reset_filters
+ and add_required_filters functions are gone. Those inserted
+ protocol level filters for error conditions. In fact, both functions did
+ the same thing, one after the other, which was really strange. Because we
+ don't lose protocol filters for error cases any more, those hacks went away.
+ The HTTP_HEADER, Content-length, and
+ Byterange filters are all added in the
+ insert_filters phase, because if they were added earlier, we
+ had some interesting interactions. Now, those could all be moved to be
+ inserted with the HTTP_IN, CORE, and
+ CORE_IN filters. That would make the code easier to
+ follow.