From: Rich Bowen Date: Thu, 17 Mar 2005 00:56:26 +0000 (+0000) Subject: Move some of the examples, technical details, and tutorial-style stuff into X-Git-Tag: 2.1.5~287 X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=52233be38353d696b0d240ca69ff91426a79dbbd;p=thirdparty%2Fapache%2Fhttpd.git Move some of the examples, technical details, and tutorial-style stuff into its own place, somewhat like we did with mod_ssl and others. The main module reference document should just be a module reference document, and nothing more. git-svn-id: https://svn.apache.org/repos/asf/httpd/httpd/trunk@157838 13f79535-47bb-0310-9956-ffa450edef68 --- diff --git a/docs/manual/rewrite/index.html b/docs/manual/rewrite/index.html new file mode 100644 index 00000000000..5f97bff8c6d --- /dev/null +++ b/docs/manual/rewrite/index.html @@ -0,0 +1,3 @@ +URI: index.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/docs/manual/rewrite/index.html.en b/docs/manual/rewrite/index.html.en new file mode 100644 index 00000000000..a0562060795 --- /dev/null +++ b/docs/manual/rewrite/index.html.en @@ -0,0 +1,97 @@ + + + +Apache mod_rewrite - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.0

Apache mod_rewrite

+
+

Available Languages:  en 

+
+ +
+

``The great thing about mod_rewrite is it gives you + all the configurability and flexibility of Sendmail. + The downside to mod_rewrite is that it gives you all + the configurability and flexibility of Sendmail.''

+ +

-- Brian Behlendorf
+ Apache Group

+ +
+ +
+

`` Despite the tons of examples and docs, + mod_rewrite is voodoo. Damned cool voodoo, but still + voodoo. ''

+ +

-- Brian Moore
+ bem@news.cmc.net

+ +
+ +

Welcome to mod_rewrite, the Swiss Army Knife of URL + manipulation!

+ +


This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule, providing a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests: server variables, environment variables, HTTP headers, time stamps, and even external database lookups in various formats can be used to achieve granular URL matching.

+ +
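For instance, a single rule is enough to map one URL space onto another; a minimal, hypothetical sketch (the /docs/ and /new-docs/ paths are illustrative only, not taken from this document):
+RewriteEngine on
+RewriteRule   ^/docs/(.*)$  /new-docs/$1
Here every request below /docs/ is internally served from /new-docs/ without the client noticing.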


This module operates on the full URLs (including the path-info part) both in per-server context (httpd.conf) and per-directory context (.htaccess) and can even generate query-string parts in the result. The rewritten result can lead to internal sub-processing, external request redirection, or even an internal proxy throughput.

+ +

But all this functionality and flexibility has its + drawback: complexity. So don't expect to understand this + entire module in just one day.

+ +
+ +
top
+
top
+
+

mod_rewrite

+


Extensive documentation on the directives provided by this module is available in the mod_rewrite reference documentation.

+
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/docs/manual/rewrite/index.xml b/docs/manual/rewrite/index.xml new file mode 100644 index 00000000000..9ef34906b5e --- /dev/null +++ b/docs/manual/rewrite/index.xml @@ -0,0 +1,92 @@ + + + + + + + + + + + Apache mod_rewrite + + +
+

``The great thing about mod_rewrite is it gives you + all the configurability and flexibility of Sendmail. + The downside to mod_rewrite is that it gives you all + the configurability and flexibility of Sendmail.''

+ +

-- Brian Behlendorf
+ Apache Group

+ +
+ +
+

`` Despite the tons of examples and docs, + mod_rewrite is voodoo. Damned cool voodoo, but still + voodoo. ''

+ +

-- Brian Moore
+ bem@news.cmc.net

+ +
+ +

Welcome to mod_rewrite, the Swiss Army Knife of URL + manipulation!

+ +


This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly. It supports an unlimited number of rules and an unlimited number of attached rule conditions for each rule, providing a really flexible and powerful URL manipulation mechanism. The URL manipulations can depend on various tests: server variables, environment variables, HTTP headers, time stamps, and even external database lookups in various formats can be used to achieve granular URL matching.

+ +


This module operates on the full URLs (including the path-info part) both in per-server context (httpd.conf) and per-directory context (.htaccess) and can even generate query-string parts in the result. The rewritten result can lead to internal sub-processing, external request redirection, or even an internal proxy throughput.

+ +

But all this functionality and flexibility has its + drawback: complexity. So don't expect to understand this + entire module in just one day.

+ +
+ +
Documentation + +
+ +
+ + diff --git a/docs/manual/rewrite/index.xml.meta b/docs/manual/rewrite/index.xml.meta new file mode 100644 index 00000000000..dd496b6d2cc --- /dev/null +++ b/docs/manual/rewrite/index.xml.meta @@ -0,0 +1,11 @@ + + + + index + /rewrite/ + .. + + + en + + diff --git a/docs/manual/rewrite/rewrite_guide.html b/docs/manual/rewrite/rewrite_guide.html new file mode 100644 index 00000000000..49f623e98f9 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide.html @@ -0,0 +1,3 @@ +URI: rewrite_guide.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/docs/manual/rewrite/rewrite_guide.html.en b/docs/manual/rewrite/rewrite_guide.html.en new file mode 100644 index 00000000000..4b169e77395 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide.html.en @@ -0,0 +1,788 @@ + + + +URL Rewriting Guide - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.0

URL Rewriting Guide

+
+

Available Languages:  en 

+
+ + +


This document supplements the mod_rewrite reference documentation. It describes how one can use Apache's mod_rewrite to solve typical URL-based problems with which webmasters are commonly confronted. We give detailed descriptions on how to solve each problem by configuring URL rewriting rulesets.

+ +
ATTENTION: Depending on your server configuration it may be necessary to slightly change the examples for your situation, e.g. adding the [PT] flag when additionally using mod_alias and mod_userdir, etc., or rewriting a ruleset to fit .htaccess context instead of per-server context. Always try to understand what a particular ruleset really does before you use it; this avoids many problems.
+ +
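As a hedged illustration of such an adaptation (the /images/ and /img/ paths are hypothetical and not part of the recipes below): a per-server rule matches the full URL-path including the leading slash, while the same rule moved into a .htaccess file must omit the per-directory prefix which mod_rewrite strips before matching:
+# per-server context (httpd.conf): the pattern sees the leading slash
+RewriteRule   ^/images/(.+)  /img/$1  [PT]
+
+# the same rewrite in .htaccess context at the document root:
+# the per-directory prefix has already been stripped
+RewriteEngine on
+RewriteBase   /
+RewriteRule   ^images/(.+)   /img/$1  [PT]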
+ +
top
+
+

Canonical URLs

+ + + +
+
Description:
+ +
+


On some webservers there is more than one URL for a resource. Usually there are canonical URLs (which should actually be used and distributed) and those which are just shortcuts, internal ones, etc. Regardless of which URL the user supplied with the request, he should finally see only the canonical one.

+
+ +
Solution:
+ +
+


We do an external HTTP redirect for all non-canonical URLs, so the canonical URL is shown in the browser's location view and used for all subsequent requests. In the example ruleset below we replace /~user by the canonical /u/user and fix a missing trailing slash for /u/user.

+ +
+RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R]
+RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R]
+
+
+
+ +
top
+
+

Canonical Hostnames

+ +
+
Description:
+ +
The goal of this rule is to force the use of a particular + hostname, in preference to other hostnames which may be used to + reach the same site. For example, if you wish to force the use + of www.example.com instead of + example.com, you might use a variant of the + following recipe.
+ +
Solution:
+ +
+

For sites running on a port other than 80:

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteCond %{SERVER_PORT} !^80$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]
+
+ +

And for a site running on port 80:

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]
+
+
+
+ +
top
+
+

Moved DocumentRoot

+ + + +
+
Description:
+ +
+

Usually the DocumentRoot +of the webserver directly relates to the URL "/". +But often this data is not really of top-level priority. For example, +you may wish for visitors, on first entering a site, to go to a +particular subdirectory /about/. This may be accomplished +using the following ruleset:

+
+ +
Solution:
+ +
+

We redirect the URL / to + /about/: +

+ +
+RewriteEngine on
+RewriteRule   ^/$  /about/  [R]
+
+ +

Note that this can also be handled using the RedirectMatch directive:

+ +

+RedirectMatch ^/$ http://example.com/about/ +

+
+
+ +
top
+
+

Trailing Slash Problem

+ + + +
+
Description:
+ +

The vast majority of "trailing slash" problems can be dealt + with using the techniques discussed in the FAQ + entry. However, occasionally, there is a need to use mod_rewrite + to handle a case where a missing trailing slash causes a URL to + fail. This can happen, for example, after a series of complex + rewrite rules.

+
+ +
Solution:
+ +
+


The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did an internal rewrite, this would only work for the directory page, but would go wrong when any images are included in this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect!

+ +

So, to do this trick we write:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo$  foo/  [R]
+
+ +


Alternatively, you can put the following in a top-level .htaccess file in the content directory. But note that this creates some processing overhead.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteCond    %{REQUEST_FILENAME}  -d
+RewriteRule    ^(.+[^/])$           $1/  [R]
+
+
+
+ +
top
+
+

Move Homedirs to Different Webserver

+ + + +
+
Description:
+ +
+


Many webmasters have asked for a solution to the following situation: they want to redirect all homedirs on one webserver to another webserver. They usually need such a setup when establishing a newer webserver which will replace the old one over time.

+
+ +
Solution:
+ +
+

The solution is trivial with mod_rewrite. + On the old webserver we just redirect all + /~user/anypath URLs to + http://newserver/~user/anypath.

+ +
+RewriteEngine on
+RewriteRule   ^/~(.+)  http://newserver/~$1  [R,L]
+
+
+
+ +
top
+
+

Search pages in more than one directory

+ + + +
+
Description:
+ +
+

Sometimes it is necessary to let the webserver search + for pages in more than one directory. Here MultiViews or + other techniques cannot help.

+
+ +
Solution:
+ +
+


We program an explicit ruleset which searches for the files in the directories.

+ +
+RewriteEngine on
+
+#   first try to find it in custom/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir1/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir1/$1  [L]
+
+#   second try to find it in pub/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir2/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir2/$1  [L]
+
+#   else go on for other Alias or ScriptAlias directives,
+#   etc.
+RewriteRule   ^(.+)  -  [PT]
+
+
+
+ +
top
+
+

Set Environment Variables According To URL Parts

+ + + +
+
Description:
+ +
+

Perhaps you want to keep status information between + requests and use the URL to encode it. But you don't want + to use a CGI wrapper for all pages just to strip out this + information.

+
+ +
Solution:
+ +
+

We use a rewrite rule to strip out the status information + and remember it via an environment variable which can be + later dereferenced from within XSSI or CGI. This way a + URL /foo/S=java/bar/ gets translated to + /foo/bar/ and the environment variable named + STATUS is set to the value "java".

+ +
+RewriteEngine on
+RewriteRule   ^(.*)/S=([^/]+)/(.*)    $1/$3 [E=STATUS:$2]
+
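The variable can then be read like any other environment variable, e.g. from an XSSI page; a minimal, hypothetical snippet (note that after an internal redirect such variables may only be visible with a REDIRECT_ prefix, i.e. as REDIRECT_STATUS):
+<!--#echo var="STATUS" -->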
+
+
+ +
top
+
+

Virtual User Hosts

+ + + +
+
Description:
+ +
+


Assume that you want to provide www.username.host.com for the homepage of username via just DNS A records to the same machine and without any virtualhosts on this machine.

+
+ +
Solution:
+ +
+

For HTTP/1.0 requests there is no solution, but for + HTTP/1.1 requests which contain a Host: HTTP header we + can use the following ruleset to rewrite + http://www.username.host.com/anypath + internally to /home/username/anypath:

+ +
+RewriteEngine on
+RewriteCond   %{HTTP_HOST}                 ^www\.[^.]+\.host\.com$
+RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
+RewriteRule   ^www\.([^.]+)\.host\.com(.*) /home/$1$2
+
+
+
+ +
top
+
+

Redirect Homedirs For Foreigners

+ + + +
+
Description:
+ +
+


We want to redirect homedir URLs to another webserver, www.somewhere.com, when the requesting user does not come from the local domain ourdomain.com. This is sometimes used in virtual host contexts.

+
+ +
Solution:
+ +
+

Just a rewrite condition:

+ +
+RewriteEngine on
+RewriteCond   %{REMOTE_HOST}  !^.+\.ourdomain\.com$
+RewriteRule   ^(/~.+)         http://www.somewhere.com/$1 [R,L]
+
+
+
+ +
top
+
+

Redirecting Anchors

+ + + +
+
Description:
+ +
+

By default, redirecting to an HTML anchor doesn't work, + because mod_rewrite escapes the # character, + turning it into %23. This, in turn, breaks the + redirection.

+
+ +
Solution:
+ +
+

Use the [NE] flag on the + RewriteRule. NE stands for No Escape. +

+
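A minimal sketch, assuming a hypothetical target page with a #section anchor:
+RewriteRule   ^deep-link$  /docs/page.html#section  [NE,R]
With [NE] the # is not escaped to %23, so the browser is redirected to the anchor itself.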
+
+ +
top
+
+

Time-Dependent Rewriting

+ + + +
+
Description:
+ +
+


For tricks like time-dependent content, a lot of webmasters still use CGI scripts which, for instance, redirect to specialized pages. How can it be done via mod_rewrite?

+
+ +
Solution:
+ +
+

There are a lot of variables named TIME_xxx + for rewrite conditions. In conjunction with the special + lexicographic comparison patterns <STRING, + >STRING and =STRING we can + do time-dependent redirects:

+ +
+RewriteEngine on
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0700
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} <1900
+RewriteRule   ^foo\.html$             foo.day.html
+RewriteRule   ^foo\.html$             foo.night.html
+
+ +


This provides the content of foo.day.html under the URL foo.html from 07:00-19:00, and the contents of foo.night.html during the remaining time. Just a nice feature for a homepage...

+
+
+ +
top
+
+

Backward Compatibility for YYYY to XXXX migration

+ + + +
+
Description:
+ +
+

How can we make URLs backward compatible (still + existing virtually) after migrating document.YYYY + to document.XXXX, e.g. after translating a + bunch of .html files to .phtml?

+
+ +
Solution:
+ +
+

We just rewrite the name to its basename and test for + existence of the new extension. If it exists, we take + that name, else we rewrite the URL to its original state.

+ + +
+#   backward compatibility ruleset for
+#   rewriting document.html to document.phtml
+#   when and only when document.phtml exists
+#   but no longer document.html
+RewriteEngine on
+RewriteBase   /~quux/
+#   parse out basename, but remember the fact
+RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
+#   rewrite to document.phtml if exists
+RewriteCond   %{REQUEST_FILENAME}.phtml -f
+RewriteRule   ^(.*)$ $1.phtml                   [S=1]
+#   else reverse the previous basename cutout
+RewriteCond   %{ENV:WasHTML}            ^yes$
+RewriteRule   ^(.*)$ $1.html
+
+
+
+ +
top
+
+

Content Handling

+ + + +

From Old to New (intern)

+ + + +
+
Description:
+ +
+


Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. Actually, we want users of the old URL not even to recognize that the page was renamed.

+
+ +
Solution:
+ +
+

We rewrite the old URL to the new one internally via the + following rule:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html
+
+
+
+ + + +

From Old to New (extern)

+ + + +
+
Description:
+ +
+


Assume again that we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. But this time we want the users of the old URL to be pointed to the new one, i.e. their browser's location field should change, too.

+
+ +
Solution:
+ +
+


We force an HTTP redirect to the new URL, which leads to a change in the browser's (and thus the user's) view:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html  [R]
+
+
+
+ + + +

From Static to Dynamic

+ + + +
+
Description:
+ +
+


How can we transform a static page foo.html into a dynamic variant foo.cgi in a seamless way, i.e. without the browser/user noticing?

+
+ +
Solution:
+ +
+


We just rewrite the URL to the CGI-script and force the correct MIME-type so it really gets run as a CGI-script. This way a request to /~quux/foo.html internally leads to the invocation of /~quux/foo.cgi.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  foo.cgi  [T=application/x-httpd-cgi]
+
+
+
+ + +
top
+
+

Access Restriction

+ + + +

Blocking of Robots

+ + + +
+
Description:
+ +
+

How can we block a really annoying robot from + retrieving pages of a specific webarea? A + /robots.txt file containing entries of the + "Robot Exclusion Protocol" is typically not enough to get + rid of such a robot.

+
+ +
Solution:
+ +
+


We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory-indexed area where the robot traversal would create a big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough. This would block users from that host, too. We accomplish this by also matching the User-Agent HTTP header information.

+ +
+RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*
+RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
+RewriteRule ^/~quux/foo/arc/.+   -   [F]
+
+
+
+ + + +

Blocked Inline-Images

+ + + +
+
Description:
+ +
+

Assume we have under http://www.quux-corp.de/~quux/ + some pages with inlined GIF graphics. These graphics are + nice, so others directly incorporate them via hyperlinks to + their pages. We don't like this practice because it adds + useless traffic to our server.

+
+ +
Solution:
+ +
+


While we cannot 100% protect the images from inclusion, we can at least restrict the cases to those where the browser sends an HTTP Referer header.

+ +
+RewriteCond %{HTTP_REFERER} !^$
+RewriteCond %{HTTP_REFERER} !^http://www\.quux-corp\.de/~quux/.*$ [NC]
+RewriteRule .*\.gif$        -                                    [F]
+
+ +
+RewriteCond %{HTTP_REFERER}         !^$
+RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
+RewriteRule ^inlined-in-foo\.gif$   -                        [F]
+
+
+
+ + + +

Proxy Deny

+ + + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache webserver. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+
+
+ + + +
top
+
+

Other

+ + + +

External Rewriting Engine

+ + + +
+
Description:
+ +
+


A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems to be no solution using mod_rewrite...

+
+ +
Solution:
+ +
+


Use an external RewriteMap, i.e. a program which acts like a RewriteMap. It is run once, on startup of Apache; it receives the requested URLs on STDIN and has to put the resulting (usually rewritten) URLs on STDOUT (in the same order!).

+ +
+RewriteEngine on
+RewriteMap    quux-map       prg:/path/to/map.quux.pl
+RewriteRule   ^/~quux/(.*)$  /~quux/${quux-map:$1}
+
+ +
+#!/path/to/perl
+
+#   disable buffered I/O which would lead
+#   to deadlocks for the Apache server
+$| = 1;
+
+#   read URLs one per line from stdin and
+#   generate substitution URL on stdout
+while (<>) {
+    s|^foo/|bar/|;
+    print $_;
+}
+
+ +


This is a demonstration-only example and just rewrites all URLs /~quux/foo/... to /~quux/bar/.... Actually you can program whatever you like. But notice that while such maps can also be used by an average user, only the system administrator can define them.

+
+
+ + + +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/docs/manual/rewrite/rewrite_guide.xml b/docs/manual/rewrite/rewrite_guide.xml new file mode 100644 index 00000000000..0b1aa36dd95 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide.xml @@ -0,0 +1,781 @@ + + + + + + + + + + + URL Rewriting Guide + + + +


This document supplements the mod_rewrite reference documentation. It describes how one can use Apache's mod_rewrite to solve typical URL-based problems with which webmasters are commonly confronted. We give detailed descriptions on how to solve each problem by configuring URL rewriting rulesets.

+ + ATTENTION: Depending on your server configuration it may be necessary to slightly change the examples for your situation, e.g. adding the [PT] flag when additionally using mod_alias and mod_userdir, etc., or rewriting a ruleset to fit .htaccess context instead of per-server context. Always try to understand what a particular ruleset really does before you use it; this avoids many problems. + +
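As a hedged illustration of such an adaptation (the /images/ and /img/ paths are hypothetical and not part of the recipes below): a per-server rule matches the full URL-path including the leading slash, while the same rule moved into a .htaccess file must omit the per-directory prefix which mod_rewrite strips before matching:
+# per-server context (httpd.conf): the pattern sees the leading slash
+RewriteRule   ^/images/(.+)  /img/$1  [PT]
+
+# the same rewrite in .htaccess context at the document root:
+# the per-directory prefix has already been stripped
+RewriteEngine on
+RewriteBase   /
+RewriteRule   ^images/(.+)   /img/$1  [PT]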
+Module +documentation +mod_rewrite +introduction +Technical details + + +
+ +Canonical URLs + +
+
Description:
+ +
+


On some webservers there is more than one URL for a resource. Usually there are canonical URLs (which should actually be used and distributed) and those which are just shortcuts, internal ones, etc. Regardless of which URL the user supplied with the request, he should finally see only the canonical one.

+
+ +
Solution:
+ +
+


We do an external HTTP redirect for all non-canonical URLs, so the canonical URL is shown in the browser's location view and used for all subsequent requests. In the example ruleset below we replace /~user by the canonical /u/user and fix a missing trailing slash for /u/user.

+ +
+RewriteRule   ^/~([^/]+)/?(.*)    /u/$1/$2  [R]
+RewriteRule   ^/([uge])/([^/]+)$  /$1/$2/   [R]
+
+
+
+ +
+ +
Canonical Hostnames + +
+
Description:
+ +
The goal of this rule is to force the use of a particular + hostname, in preference to other hostnames which may be used to + reach the same site. For example, if you wish to force the use + of www.example.com instead of + example.com, you might use a variant of the + following recipe.
+ +
Solution:
+ +
+

For sites running on a port other than 80:

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteCond %{SERVER_PORT} !^80$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name:%{SERVER_PORT}/$1 [L,R]
+
+ +

And for a site running on port 80:

+
+RewriteCond %{HTTP_HOST}   !^fully\.qualified\.domain\.name [NC]
+RewriteCond %{HTTP_HOST}   !^$
+RewriteRule ^/(.*)         http://fully.qualified.domain.name/$1 [L,R]
+
+
+
+ +
+ +
+ + Moved <code>DocumentRoot</code> + +
+
Description:
+ +
+

Usually the DocumentRoot +of the webserver directly relates to the URL "/". +But often this data is not really of top-level priority. For example, +you may wish for visitors, on first entering a site, to go to a +particular subdirectory /about/. This may be accomplished +using the following ruleset:

+
+ +
Solution:
+ +
+

We redirect the URL / to + /about/: +

+ +
+RewriteEngine on
+RewriteRule   ^/$  /about/  [R]
+
+ +

Note that this can also be handled using the RedirectMatch directive:

+ + +RedirectMatch ^/$ http://example.com/about/ + +
+
+ +
+ +
+ + Trailing Slash Problem + +
+
Description:
+ +

The vast majority of "trailing slash" problems can be dealt + with using the techniques discussed in the FAQ + entry. However, occasionally, there is a need to use mod_rewrite + to handle a case where a missing trailing slash causes a URL to + fail. This can happen, for example, after a series of complex + rewrite rules.

+
+ +
Solution:
+ +
+


The solution to this subtle problem is to let the server add the trailing slash automatically. To do this correctly we have to use an external redirect, so the browser correctly requests subsequent images etc. If we only did an internal rewrite, this would only work for the directory page, but would go wrong when any images are included in this page with relative URLs, because the browser would request an in-lined object. For instance, a request for image.gif in /~quux/foo/index.html would become /~quux/image.gif without the external redirect!

+ +

So, to do this trick we write:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo$  foo/  [R]
+
+ +


Alternatively, you can put the following in a top-level .htaccess file in the content directory. But note that this creates some processing overhead.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteCond    %{REQUEST_FILENAME}  -d
+RewriteRule    ^(.+[^/])$           $1/  [R]
+
+
+
+ +
+ +
+ + Move Homedirs to Different Webserver + +
+
Description:
+ +
+


Many webmasters have asked for a solution to the following situation: they want to redirect all homedirs on one webserver to another webserver. They usually need such a setup when establishing a newer webserver which will replace the old one over time.

+
+ +
Solution:
+ +
+

The solution is trivial with mod_rewrite. + On the old webserver we just redirect all + /~user/anypath URLs to + http://newserver/~user/anypath.

+ +
+RewriteEngine on
+RewriteRule   ^/~(.+)  http://newserver/~$1  [R,L]
+
+
+
+ +
+ +
+ + Search pages in more than one directory + +
+
Description:
+ +
+

Sometimes it is necessary to let the webserver search + for pages in more than one directory. Here MultiViews or + other techniques cannot help.

+
+ +
Solution:
+ +
+


We program an explicit ruleset which searches for the files in the directories.

+ +
+RewriteEngine on
+
+#   first try to find it in custom/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir1/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir1/$1  [L]
+
+#   second try to find it in pub/...
+#   ...and if found stop and be happy:
+RewriteCond         /your/docroot/dir2/%{REQUEST_FILENAME}  -f
+RewriteRule  ^(.+)  /your/docroot/dir2/$1  [L]
+
+#   else go on for other Alias or ScriptAlias directives,
+#   etc.
+RewriteRule   ^(.+)  -  [PT]
+
+
+
+ +
+ +
+ + Set Environment Variables According To URL Parts + +
+
Description:
+ +
+

Perhaps you want to keep status information between + requests and use the URL to encode it. But you don't want + to use a CGI wrapper for all pages just to strip out this + information.

+
+ +
Solution:
+ +
+

We use a rewrite rule to strip out the status information + and remember it via an environment variable which can be + later dereferenced from within XSSI or CGI. This way a + URL /foo/S=java/bar/ gets translated to + /foo/bar/ and the environment variable named + STATUS is set to the value "java".

+ +
+RewriteEngine on
+RewriteRule   ^(.*)/S=([^/]+)/(.*)    $1/$3 [E=STATUS:$2]
+
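The variable can then be read like any other environment variable, e.g. from an XSSI page; a minimal, hypothetical snippet (note that after an internal redirect such variables may only be visible with a REDIRECT_ prefix, i.e. as REDIRECT_STATUS):
+<!--#echo var="STATUS" -->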
+
+
+ +
+ +
+ + Virtual User Hosts + +
+
Description:
+ +
+


Assume that you want to provide www.username.host.com for the homepage of username via just DNS A records to the same machine and without any virtualhosts on this machine.

+
+ +
Solution:
+ +
+

For HTTP/1.0 requests there is no solution, but for + HTTP/1.1 requests which contain a Host: HTTP header we + can use the following ruleset to rewrite + http://www.username.host.com/anypath + internally to /home/username/anypath:

+ +
+RewriteEngine on
+RewriteCond   %{HTTP_HOST}                 ^www\.[^.]+\.host\.com$
+RewriteRule   ^(.+)                        %{HTTP_HOST}$1          [C]
+RewriteRule   ^www\.([^.]+)\.host\.com(.*) /home/$1$2
+
+
+
+ +
+ +
+ + Redirect Homedirs For Foreigners + +
+
Description:
+ +
+


We want to redirect homedir URLs to another webserver, www.somewhere.com, when the requesting user does not come from the local domain ourdomain.com. This is sometimes used in virtual host contexts.

+
+ +
Solution:
+ +
+

Just a rewrite condition:

+ +
+RewriteEngine on
+RewriteCond   %{REMOTE_HOST}  !^.+\.ourdomain\.com$
+RewriteRule   ^(/~.+)         http://www.somewhere.com/$1 [R,L]
+
+
+
+ +
+ +
+ + Redirecting Anchors + +
+
Description:
+ +
+

By default, redirecting to an HTML anchor doesn't work, + because mod_rewrite escapes the # character, + turning it into %23. This, in turn, breaks the + redirection.

+
+ +
Solution:
+ +
+

Use the [NE] flag on the + RewriteRule. NE stands for No Escape. +

+
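A minimal sketch, assuming a hypothetical target page with a #section anchor:
+RewriteRule   ^deep-link$  /docs/page.html#section  [NE,R]
With [NE] the # is not escaped to %23, so the browser is redirected to the anchor itself.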
+
+ +
+ +
+ + Time-Dependent Rewriting + +
+
Description:
+ +
+


For tricks like time-dependent content, a lot of webmasters still use CGI scripts which, for instance, redirect to specialized pages. How can it be done via mod_rewrite?

+
+ +
Solution:
+ +
+

There are a lot of variables named TIME_xxx + for rewrite conditions. In conjunction with the special + lexicographic comparison patterns <STRING, + >STRING and =STRING we can + do time-dependent redirects:

+ +
+RewriteEngine on
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} >0700
+RewriteCond   %{TIME_HOUR}%{TIME_MIN} <1900
+RewriteRule   ^foo\.html$             foo.day.html
+RewriteRule   ^foo\.html$             foo.night.html
+
+ +


This provides the content of foo.day.html under the URL foo.html from 07:00-19:00, and the contents of foo.night.html during the remaining time. Just a nice feature for a homepage...

+
+
+ +
+ +
+ + Backward Compatibility for YYYY to XXXX migration + +
+
Description:
+ +
+

How can we make URLs backward compatible (still + existing virtually) after migrating document.YYYY + to document.XXXX, e.g. after translating a + bunch of .html files to .phtml?

+
+ +
Solution:
+ +
+

We just rewrite the name to its basename and test for + existence of the new extension. If it exists, we take + that name, else we rewrite the URL to its original state.

+ + +
+#   backward compatibility ruleset for
+#   rewriting document.html to document.phtml
+#   when and only when document.phtml exists
+#   but no longer document.html
+RewriteEngine on
+RewriteBase   /~quux/
+#   parse out basename, but remember the fact
+RewriteRule   ^(.*)\.html$              $1      [C,E=WasHTML:yes]
+#   rewrite to document.phtml if exists
+RewriteCond   %{REQUEST_FILENAME}.phtml -f
+RewriteRule   ^(.*)$ $1.phtml                   [S=1]
+#   else reverse the previous basename cutout
+RewriteCond   %{ENV:WasHTML}            ^yes$
+RewriteRule   ^(.*)$ $1.html
+
+
+
+ +
+ +
+ + Content Handling + +
+ + From Old to New (intern) + +
+
Description:
+ +
+


Assume we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. Actually, we want users of the old URL not even to recognize that the page was renamed.

+
+ +
Solution:
+ +
+

We rewrite the old URL to the new one internally via the + following rule:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html
+
+
+
+ +
+ +
+ + From Old to New (extern) + +
+
Description:
+ +
+


Assume again that we have recently renamed the page foo.html to bar.html and now want to provide the old URL for backward compatibility. But this time we want the users of the old URL to be pointed to the new one, i.e. their browser's location field should change, too.

+
+ +
Solution:
+ +
+


We force an HTTP redirect to the new URL, which leads to a change in the browser's (and thus the user's) view:

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  bar.html  [R]
+
+
+
+ +
+ +
+ + From Static to Dynamic + +
+
Description:
+ +
+


How can we transform a static page foo.html into a dynamic variant foo.cgi in a seamless way, i.e. without the browser/user noticing?

+
+ +
Solution:
+ +
+


We just rewrite the URL to the CGI-script and force the correct MIME-type so it really gets run as a CGI-script. This way a request to /~quux/foo.html internally leads to the invocation of /~quux/foo.cgi.

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^foo\.html$  foo.cgi  [T=application/x-httpd-cgi]
+
+
+
+ +
+
+ +
+ + Access Restriction + +
+ + Blocking of Robots + +
+
Description:
+ +
+

How can we block a really annoying robot from + retrieving pages of a specific webarea? A + /robots.txt file containing entries of the + "Robot Exclusion Protocol" is typically not enough to get + rid of such a robot.

+
+ +
Solution:
+ +
+


We use a ruleset which forbids the URLs of the webarea /~quux/foo/arc/ (perhaps a very deep directory-indexed area where the robot traversal would create a big server load). We have to make sure that we forbid access only to the particular robot, i.e. just forbidding the host where the robot runs is not enough. This would block users from that host, too. We accomplish this by also matching the User-Agent HTTP header information.

+ +
+RewriteCond %{HTTP_USER_AGENT}   ^NameOfBadRobot.*
+RewriteCond %{REMOTE_ADDR}       ^123\.45\.67\.[8-9]$
+RewriteRule ^/~quux/foo/arc/.+   -   [F]
+
+
+
+ +
+ +
+ + Blocked Inline-Images + +
+
Description:
+ +
+

Assume we have under http://www.quux-corp.de/~quux/ + some pages with inlined GIF graphics. These graphics are + nice, so others directly incorporate them via hyperlinks to + their pages. We don't like this practice because it adds + useless traffic to our server.

+
+ +
Solution:
+ +
+


While we cannot 100% protect the images from inclusion, we can at least restrict the cases to those where the browser sends an HTTP Referer header.

+ +
+RewriteCond %{HTTP_REFERER} !^$
+RewriteCond %{HTTP_REFERER} !^http://www\.quux-corp\.de/~quux/.*$ [NC]
+RewriteRule .*\.gif$        -                                    [F]
+
+ +
+RewriteCond %{HTTP_REFERER}         !^$
+RewriteCond %{HTTP_REFERER}         !.*/foo-with-gif\.html$
+RewriteRule ^inlined-in-foo\.gif$   -                        [F]
+
+
+
+ +
+ +
+ + Proxy Deny + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache webserver. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+
+
+ +
+ +
+ +
+ + Other + +
+ + External Rewriting Engine + +
+
Description:
+ +
+


A FAQ: How can we solve the FOO/BAR/QUUX/etc. problem? There seems to be no solution using mod_rewrite...

+
+ +
Solution:
+ +
+


Use an external RewriteMap, i.e. a program which acts like a RewriteMap. It is run once, on startup of Apache; it receives the requested URLs on STDIN and has to put the resulting (usually rewritten) URLs on STDOUT (in the same order!).

+ +
+RewriteEngine on
+RewriteMap    quux-map       prg:/path/to/map.quux.pl
+RewriteRule   ^/~quux/(.*)$  /~quux/${quux-map:$1}
+
+ +
+#!/path/to/perl
+
+#   disable buffered I/O which would lead
+#   to deadlocks for the Apache server
+$| = 1;
+
+#   read URLs one per line from stdin and
+#   generate substitution URL on stdout
+while (<>) {
+    s|^foo/|bar/|;
+    print $_;
+}
+
+ +


This is a demonstration-only example and just rewrites all URLs /~quux/foo/... to /~quux/bar/.... Actually you can program whatever you like. But notice that while such maps can also be used by an average user, only the system administrator can define them.

+
+
+ +
+ +
+ +
+ diff --git a/docs/manual/rewrite/rewrite_guide.xml.meta b/docs/manual/rewrite/rewrite_guide.xml.meta new file mode 100644 index 00000000000..65639c1a252 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide.xml.meta @@ -0,0 +1,11 @@ + + + + rewrite_guide + /rewrite/ + .. + + + en + + diff --git a/docs/manual/rewrite/rewrite_guide_advanced.html b/docs/manual/rewrite/rewrite_guide_advanced.html new file mode 100644 index 00000000000..07850dd6bac --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide_advanced.html @@ -0,0 +1,3 @@ +URI: rewrite_guide_advanced.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/docs/manual/rewrite/rewrite_guide_advanced.html.en b/docs/manual/rewrite/rewrite_guide_advanced.html.en new file mode 100644 index 00000000000..7244e42e716 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide_advanced.html.en @@ -0,0 +1,1288 @@ + + + +URL Rewriting Guide - Advanced topics - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.0

URL Rewriting Guide - Advanced topics

+
+

Available Languages:  en 

+
+ + +


This document supplements the mod_rewrite reference documentation. It describes how one can use Apache's mod_rewrite to solve typical URL-based problems with which webmasters are commonly confronted. We give detailed descriptions on how to solve each problem by configuring URL rewriting rulesets.

+ +
ATTENTION: Depending on your server configuration it may be necessary to slightly change the examples for your situation, e.g. adding the [PT] flag when additionally using mod_alias and mod_userdir, etc., or rewriting a ruleset to fit .htaccess context instead of per-server context. Always try to understand what a particular ruleset really does before you use it; this avoids many problems.
+ +
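As a hedged illustration of such an adaptation (the /images/ and /img/ paths are hypothetical and not part of the recipes below): a per-server rule matches the full URL-path including the leading slash, while the same rule moved into a .htaccess file must omit the per-directory prefix which mod_rewrite strips before matching:
+# per-server context (httpd.conf): the pattern sees the leading slash
+RewriteRule   ^/images/(.+)  /img/$1  [PT]
+
+# the same rewrite in .htaccess context at the document root:
+# the per-directory prefix has already been stripped
+RewriteEngine on
+RewriteBase   /
+RewriteRule   ^images/(.+)   /img/$1  [PT]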
+ +
top
+
+

Webcluster through Homogeneous URL Layout

+ + + +
+
Description:
+ +
+


We want to create a homogeneous and consistent URL layout across all WWW servers on an Intranet webcluster, i.e. all URLs (by definition server-local and thus server-dependent!) actually become server-independent! What we want is to give the WWW namespace a consistent server-independent layout: no URL should have to include any physically correct target server. The cluster itself should drive us automatically to the physical target host.

+
+ +
Solution:
+ +
+


First, the knowledge of the target servers comes from (distributed) external maps which contain information on where our users, groups, and entities reside. They have the form

+ +
+user1  server_of_user1
+user2  server_of_user2
+:      :
+
+ +

We put them into files map.xxx-to-host. + Second we need to instruct all servers to redirect URLs + of the forms

+ +
+/u/user/anypath
+/g/group/anypath
+/e/entity/anypath
+
+ +

to

+ +
+http://physical-host/u/user/anypath
+http://physical-host/g/group/anypath
+http://physical-host/e/entity/anypath
+
+ +


when the URL is not locally valid for a server. The following ruleset does this for us with the help of the map files (assuming that server0 is a default server which will be used if a user has no entry in the map):

+ +
+RewriteEngine on
+
+RewriteMap      user-to-host   txt:/path/to/map.user-to-host
+RewriteMap     group-to-host   txt:/path/to/map.group-to-host
+RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host
+
+RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
+RewriteRule   ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2
+RewriteRule   ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2
+
+RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
+RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
+
+
+
+ +
top
+
+

Structured Homedirs

+ + + +
+
Description:
+ +
+

Some sites with thousands of users usually use a + structured homedir layout, i.e. each homedir is in a + subdirectory which begins for instance with the first + character of the username. So, /~foo/anypath + is /home/f/foo/.www/anypath + while /~bar/anypath is + /home/b/bar/.www/anypath.

+
+ +
Solution:
+ +
+

We use the following ruleset to expand the tilde URLs + into exactly the above layout.

+ +
+RewriteEngine on
+RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
+
+
+
+ +
top
+
+

Filesystem Reorganization

+ + + +
+
Description:
+ +
+


This really is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted. Background: net.sw is my archive of freely available Unix software packages, which I started to collect in 1992. It is both my hobby and job to do this, because while studying computer science I have also worked for many years as a system and network administrator in my spare time. Every week I need some sort of software, so I created a deep hierarchy of directories where I stored the packages:

+ +
+drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
+drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
+drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
+drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
+drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
+drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
+drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
+drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
+drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
+drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
+drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
+drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
+drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
+drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
+
+ +


In July 1996 I decided to make this archive public to the world via a nice Web interface. "Nice" means that I wanted to offer an interface where you can browse directly through the archive hierarchy. And "nice" means that I didn't want to change anything inside this hierarchy - not even by putting some CGI scripts at the top of it. Why? Because the above structure should later be accessible via FTP as well, and I didn't want any Web or CGI stuff to be there.

+
+ +
Solution:
+ +
+

The solution has two parts: The first is a set of CGI + scripts which create all the pages at all directory + levels on-the-fly. I put them under + /e/netsw/.www/ as follows:

+ +
+-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
+drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
+-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
+-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
+-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
+-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
+-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
+-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
+drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
+-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
+-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
+-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
+-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
+
+ +

The DATA/ subdirectory holds the above + directory structure, i.e. the real + net.sw stuff and gets + automatically updated via rdist from time to + time. The second part of the problem remains: how to link + these two structures together into one smooth-looking URL + tree? We want to hide the DATA/ directory + from the user while running the appropriate CGI scripts + for the various URLs. Here is the solution: first I put + the following into the per-directory configuration file + in the DocumentRoot + of the server to rewrite the announced URL + /net.sw/ to the internal path + /e/netsw:

+ +
+RewriteRule  ^net.sw$       net.sw/        [R]
+RewriteRule  ^net.sw/(.*)$  e/netsw/$1
+
+ +

The first rule is for requests which miss the trailing + slash! The second rule does the real thing. And then + comes the killer configuration which stays in the + per-directory config file + /e/netsw/.www/.wwwacl:

+ +
+Options       ExecCGI FollowSymLinks Includes MultiViews
+
+RewriteEngine on
+
+#  we are reached via /net.sw/ prefix
+RewriteBase   /net.sw/
+
+#  first we rewrite the root dir to
+#  the handling cgi script
+RewriteRule   ^$                       netsw-home.cgi     [L]
+RewriteRule   ^index\.html$            netsw-home.cgi     [L]
+
+#  strip out the subdirs when
+#  the browser requests us from perdir pages
+RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]
+
+#  and now break the rewriting for local files
+RewriteRule   ^netsw-home\.cgi.*       -                  [L]
+RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
+RewriteRule   ^netsw-search\.cgi.*     -                  [L]
+RewriteRule   ^netsw-tree\.cgi$        -                  [L]
+RewriteRule   ^netsw-about\.html$      -                  [L]
+RewriteRule   ^netsw-img/.*$           -                  [L]
+
+#  anything else is a subdir which gets handled
+#  by another cgi script
+RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
+RewriteRule   (.*)                     netsw-lsdir.cgi/$1
+
+ +

Some hints for interpretation:

+ +
    +
1. Notice the L (last) flag and no substitution field ('-') in the fourth part
  2. + +
  3. Notice the ! (not) character and + the C (chain) flag at the first rule + in the last part
  4. + +
  5. Notice the catch-all pattern in the last rule
  6. +
+
+
+ +
top
+
+

Redirect Failing URLs To Other Webserver

+ + + +
+
Description:
+ +
+

A typical FAQ about URL rewriting is how to redirect + failing requests on webserver A to webserver B. Usually + this is done via ErrorDocument CGI-scripts in Perl, but + there is also a mod_rewrite solution. + But notice that this performs more poorly than using an + ErrorDocument + CGI-script!

+
+ +
Solution:
+ +
+

The first solution has the best performance but less + flexibility, and is less error safe:

+ +
+RewriteEngine on
+RewriteCond   /your/docroot/%{REQUEST_FILENAME} !-f
+RewriteRule   ^(.+)                             http://webserverB.dom/$1
+
+ +


The problem here is that this will only work for pages inside the DocumentRoot. While you can add more Conditions (for instance to also handle homedirs, etc.), there is a better variant:

+ +
+RewriteEngine on
+RewriteCond   %{REQUEST_URI} !-U
+RewriteRule   ^(.+)          http://webserverB.dom/$1
+
+ +


This uses the URL look-ahead feature of mod_rewrite. The result is that this will work for all types of URLs and is a safe way. But it has a performance impact on the webserver, because for every request there is one more internal subrequest. So, if your webserver runs on a powerful CPU, use this one. If it is a slow machine, use the first approach, or better, an ErrorDocument CGI-script.

+
+
+ +
top
+
+

Archive Access Multiplexer

+ + + +
+
Description:
+ +
+


Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? This does a redirect to one of several FTP servers around the world which carry a CPAN mirror and are approximately near the location of the requesting client. Actually this can be called an FTP access multiplexing service. While CPAN runs via CGI scripts, how can a similar approach be implemented via mod_rewrite?

+
+ +
Solution:
+ +
+

First we notice that from version 3.0.0 + mod_rewrite can + also use the "ftp:" scheme on redirects. + And second, the location approximation can be done by a + RewriteMap + over the top-level domain of the client. + With a tricky chained ruleset we can use this top-level + domain as a key to our multiplexing map.

+ +
+RewriteEngine on
+RewriteMap    multiplex                txt:/path/to/map.cxan
+RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
+RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]
+
+ +
+##
+##  map.cxan -- Multiplexing Map for CxAN
+##
+
+de        ftp://ftp.cxan.de/CxAN/
+uk        ftp://ftp.cxan.uk/CxAN/
+com       ftp://ftp.cxan.com/CxAN/
+ :
+##EOF##
+
+
+
+ +
top
+
+

Content Handling

+ + + +

Browser Dependent Content

+ + + +
+
Description:
+ +
+


At least for important top-level pages it is sometimes necessary to provide the optimum of browser-dependent content, i.e. one has to provide a maximum version for the latest Netscape variants, a minimum version for the Lynx browsers, and an average feature version for all others.

+
+ +
Solution:
+ +
+


We cannot use content negotiation because the browsers do not provide their type in that form. Instead we have to act on the HTTP header "User-Agent". The following config does the following: if the HTTP header "User-Agent" begins with "Mozilla/3", the page foo.html is rewritten to foo.NS.html and the rewriting stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2, the URL becomes foo.20.html. All other browsers receive page foo.32.html. This is done by the following ruleset:

+ +
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
+RewriteRule ^foo\.html$         foo.NS.html          [L]
+
+RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
+RewriteRule ^foo\.html$         foo.20.html          [L]
+
+RewriteRule ^foo\.html$         foo.32.html          [L]
+
+
+
+ + + +

Dynamic Mirror

+ + + +
+
Description:
+ +
+


Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program which actually maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program webcopy which acts similarly via HTTP. But both techniques have one major drawback: the local copy is only as up-to-date as the last time we ran the program. It would be much better if the mirror were not a static one we have to establish explicitly. Instead we want a dynamic mirror with data which gets updated automatically as needed (i.e. when there is updated data on the remote host).

+
+ +
Solution:
+ +
+

To provide this feature we map the remote webpage or even + the complete remote webarea to our namespace by the use + of the Proxy Throughput feature + (flag [P]):

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]
+
+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]
+
+
+
+ + + +

Reverse Dynamic Mirror

+ + + +
+
Description:
+ +
...
+ +
Solution:
+ +
+
+RewriteEngine on
+RewriteCond   /mirror/of/remotesite/$1           -U
+RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
+
+
+
+ + + +

Retrieve Missing Data from Intranet

+ + + +
+
Description:
+ +
+


This is a tricky way of virtually running a corporate (external) Internet webserver (www.quux-corp.dom), while actually keeping and maintaining its data on an (internal) Intranet webserver (www2.quux-corp.dom) which is protected by a firewall. The trick is that on the external webserver we retrieve the requested data on-the-fly from the internal one.

+
+ +
Solution:
+ +
+

First, we have to make sure that our firewall still + protects the internal webserver and that only the + external webserver is allowed to retrieve data from it. + For a packet-filtering firewall we could for instance + configure a firewall ruleset like the following:

+ +
+ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
+DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80
+
+ +

Just adjust it to your actual configuration syntax. + Now we can establish the mod_rewrite + rules which request the missing data in the background + through the proxy throughput feature:

+ +
+RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
+RewriteCond %{REQUEST_FILENAME}       !-f
+RewriteCond %{REQUEST_FILENAME}       !-d
+RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]
+
+
+
+ + + +

Load Balancing

+ + + +
+
Description:
+ +
+

Suppose we want to load balance the traffic to + www.foo.com over www[0-5].foo.com + (a total of 6 servers). How can this be done?

+
+ +
Solution:
+ +
+

There are a lot of possible solutions for this problem. + We will discuss first a commonly known DNS-based variant + and then the special one with mod_rewrite:

+ +
    +
  1. + DNS Round-Robin + +

The simplest method for load-balancing is to use the DNS round-robin feature of BIND. Here you just configure www[0-5].foo.com as usual in your DNS with A (address) records, e.g.

    + +
    +www0   IN  A       1.2.3.1
    +www1   IN  A       1.2.3.2
    +www2   IN  A       1.2.3.3
    +www3   IN  A       1.2.3.4
    +www4   IN  A       1.2.3.5
    +www5   IN  A       1.2.3.6
    +
    + +

    Then you additionally add the following entry:

    + +
    +www    IN  CNAME   www0.foo.com.
    +       IN  CNAME   www1.foo.com.
    +       IN  CNAME   www2.foo.com.
    +       IN  CNAME   www3.foo.com.
    +       IN  CNAME   www4.foo.com.
    +       IN  CNAME   www5.foo.com.
    +
    + +

Notice that this seems wrong, but is actually an intended feature of BIND and can be used in this way. However, now when www.foo.com gets resolved, BIND gives out www0-www5 - but in a slightly permuted/rotated order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load balancing scheme, because DNS resolution information gets cached by the other nameservers on the net, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all subsequent requests also go to this particular name wwwN.foo.com. But the final result is OK, because the total sum of the requests is really spread over the various webservers.

    +
  2. + +
  3. + DNS Load-Balancing + +

A sophisticated DNS-based method for load-balancing is to use the program lbnamed which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. It is a Perl 5 program in conjunction with auxiliary tools which provides real load-balancing for DNS.

    +
  4. + +
  5. + Proxy Throughput Round-Robin + +

    In this variant we use mod_rewrite + and its proxy throughput feature. First we dedicate + www0.foo.com to be actually + www.foo.com by using a single

    + +
    +www    IN  CNAME   www0.foo.com.
    +
    + +

    entry in the DNS. Then we convert + www0.foo.com to a proxy-only server, + i.e. we configure this machine so all arriving URLs + are just pushed through the internal proxy to one of + the 5 other servers (www1-www5). To + accomplish this we first establish a ruleset which + contacts a load balancing script lb.pl + for all URLs.

    + +
    +RewriteEngine on
    +RewriteMap    lb      prg:/path/to/lb.pl
    +RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
    +
    + +

    Then we write lb.pl:

    + +
    +#!/path/to/perl
    +##
    +##  lb.pl -- load balancing script
    +##
    +
    +$| = 1;
    +
    +$name   = "www";     # the hostname base
    +$first  = 1;         # the first server (not 0 here, because 0 is myself)
    +$last   = 5;         # the last server in the round-robin
    +$domain = "foo.dom"; # the domainname
    +
    +$cnt = 0;
    +while (<STDIN>) {
    +    $cnt = (($cnt+1) % ($last+1-$first));
    +    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    +    print "http://$server/$_";
    +}
    +
    +##EOF##
    +
    + +
A last notice: why is this useful? Doesn't www0.foo.com still end up overloaded? The answer is yes, it is overloaded, but with plain proxy throughput requests only! All SSI, CGI, ePerl, etc. processing is completely done on the other machines. This is the essential point.
    +
  6. + +
  7. + Hardware/TCP Round-Robin + +

    There is a hardware solution available, too. Cisco has a beast called LocalDirector which does load balancing at the TCP/IP level. Actually this is some sort of circuit-level gateway in front of a webcluster. If you have enough money and really need a solution with high performance, use this one.

    +
  8. +
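Because prg: rewrite maps speak a simple line protocol - mod_rewrite writes the lookup key plus a newline to the script's stdin and reads a single line back from its stdout - the lb.pl script above can be sanity-checked directly from the shell. With the settings above, a freshly started instance answers www2 first and then cycles through www3, www4, www5, www1, and so on:

    $ echo "any/path" | ./lb.pl
    http://www2.foo.dom/any/path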
+
+
+ + + +

New MIME-type, New Service

+ + + +
+
Description:
+ +
+

On the net there are a lot of nifty CGI programs. But their usage is usually boring, so a lot of webmasters don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO and QUERY_STRINGS) as their input. First, let us configure a new file type with extension .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that if, for instance, we use a Homogeneous URL Layout (see above), a file inside the user homedirs has the URL /u/user/foo/bar.scgi. But cgiwrap needs the URL in the form /~user/foo/bar.scgi/. The following rule solves the problem:

+ +
+RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
+... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-httpd-cgi]
+
+ +

Or assume we have some more nifty programs: wwwlog (which displays the access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these programs so they know which area they have to act on. But usually this is ugly, because they are still always requested from those areas, i.e., typically we would run the wwwidx program from within /u/user/foo/ via a hyperlink to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

which is ugly, because we have to hard-code both the location of the area and the location of the CGI inside the hyperlink. When we have to reorganize the area, we spend a lot of time changing the various hyperlinks.

+
+ +
Solution:
+ +
+

The solution here is to provide a special new URL format + which automatically leads to the proper CGI invocation. + We configure the following:

+ +
+RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
+RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
+
+ +

Now the hyperlink to search at + /u/user/foo/ reads only

+ +
+HREF="*"
+
+ +

which internally gets automatically transformed to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

The same approach leads to an invocation for the + access log CGI program when the hyperlink + :log gets used.

+
+
+ + + +

On-the-fly Content-Regeneration

+ + + +
+
Description:
+ +
+

Here comes a really esoteric feature: dynamically generated but statically served pages, i.e., pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless someone (or a cronjob) removes the static contents. Then the contents get refreshed.

+
+ +
Solution:
+ +
+ This is done via the following ruleset: + +
+RewriteCond %{REQUEST_FILENAME}   !-s
+RewriteRule ^page\.html$          page.cgi   [T=application/x-httpd-cgi,L]
+
+ +

Here a request to page.html leads to an internal run of a corresponding page.cgi if page.html is still missing or has zero file size. The trick here is that page.cgi is a regular CGI script which (in addition to its STDOUT) writes its output to the file page.html. Once it has run, the server sends out the data of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).
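As an illustration (this sketch is not part of the original ruleset), a minimal page.cgi could regenerate the static copy and answer the current request at the same time; it assumes Apache's usual CGI behaviour of starting the script in its own directory, where page.html lives:

    #!/path/to/perl
    ##
    ##  page.cgi -- regenerate page.html and serve it (sketch)
    ##
    $| = 1;
    $html = "&lt;html&gt;&lt;body&gt;Generated at " . localtime() . "&lt;/body&gt;&lt;/html&gt;\n";
    open(OUT, ">page.html") || die;        # write the static copy for later requests
    print OUT $html;
    close(OUT);
    print "Content-type: text/html\n\n";   # and answer this request via STDOUT
    print $html;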

+
+
+ + + +

Document With Autorefresh

+ + + +
+
Description:
+ +
+

Wouldn't it be nice, while creating a complex webpage, if the web browser automatically refreshed the page every time we write a new version from within our editor? Impossible?

+
+ +
Solution:
+ +
+

No! We just combine the MIME multipart feature, the webserver NPH feature and the URL manipulation power of mod_rewrite. First, we establish a new URL feature: adding just :refresh to any URL causes the page to be refreshed every time it gets updated on the filesystem.

+ +
+RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1
+
+ +

Now when we reference the URL

+ +
+/u/foo/bar/page.html:refresh
+
+ +

this leads to the internal invocation of the URL

+ +
+/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
+
+ +

The only missing part is the NPH-CGI script. Although + one would usually say "left as an exercise to the reader" + ;-) I will provide this, too.

+ +
+#!/sw/bin/perl
+##
+##  nph-refresh -- NPH/CGI script for auto refreshing pages
+##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
+##
+$| = 1;
+
+#   split the QUERY_STRING variable
+@pairs = split(/&/, $ENV{'QUERY_STRING'});
+foreach $pair (@pairs) {
+    ($name, $value) = split(/=/, $pair);
+    $name =~ tr/A-Z/a-z/;
+    $name = 'QS_' . $name;
+    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
+    eval "\$$name = \"$value\"";
+}
+$QS_s = 1 if ($QS_s eq '');
+$QS_n = 3600 if ($QS_n eq '');
+if ($QS_f eq '') {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
+    exit(0);
+}
+if (! -f $QS_f) {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
+    exit(0);
+}
+
+sub print_http_headers_multipart_begin {
+    print "HTTP/1.0 200 OK\n";
+    $bound = "ThisRandomString12345";
+    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
+    &print_http_headers_multipart_next;
+}
+
+sub print_http_headers_multipart_next {
+    print "\n--$bound\n";
+}
+
+sub print_http_headers_multipart_end {
+    print "\n--$bound--\n";
+}
+
+sub displayhtml {
+    local($buffer) = @_;
+    $len = length($buffer);
+    print "Content-type: text/html\n";
+    print "Content-length: $len\n\n";
+    print $buffer;
+}
+
+sub readfile {
+    local($file) = @_;
+    local(*FP, $size, $buffer, $bytes);
+    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
+    $size = sprintf("%d", $size);
+    open(FP, "&lt;$file");
+    $bytes = sysread(FP, $buffer, $size);
+    close(FP);
+    return $buffer;
+}
+
+$buffer = &readfile($QS_f);
+&print_http_headers_multipart_begin;
+&displayhtml($buffer);
+
+sub mystat {
+    local($file) = $_[0];
+    local($mtime);
+
+    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
+    return $mtime;
+}
+
+$mtimeL = &mystat($QS_f);
+$mtime  = $mtimeL;
+for ($n = 0; $n &lt; $QS_n; $n++) {
+    while (1) {
+        $mtime = &mystat($QS_f);
+        if ($mtime ne $mtimeL) {
+            $mtimeL = $mtime;
+            sleep(2);
+            $buffer = &readfile($QS_f);
+            &print_http_headers_multipart_next;
+            &displayhtml($buffer);
+            sleep(5);
+            $mtimeL = &mystat($QS_f);
+            last;
+        }
+        sleep($QS_s);
+    }
+}
+
+&print_http_headers_multipart_end;
+
+exit(0);
+
+##EOF##
+
+
+
+ + + +

Mass Virtual Hosting

+ + + +
+
Description:
+ +
+

The <VirtualHost> feature of Apache is nice and works great when you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of virtual hosts to provide, this feature is not the best choice.

+
+ +
Solution:
+ +
+

To provide this feature we map the requested hostname to the corresponding document root through an external map file, letting the rewriting engine of the main server do the mass virtual hosting:

+ +
+##
+##  vhost.map
+##
+www.vhost1.dom:80  /path/to/docroot/vhost1
+www.vhost2.dom:80  /path/to/docroot/vhost2
+     :
+www.vhostN.dom:80  /path/to/docroot/vhostN
+
+ +
+##
+##  httpd.conf
+##
+    :
+#   use the canonical hostname on redirects, etc.
+UseCanonicalName on
+
+    :
+#   add the virtual host in front of the CLF-format
+CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
+    :
+
+#   enable the rewriting engine in the main server
+RewriteEngine on
+
+#   define two maps: one for fixing the URL and one which defines
+#   the available virtual hosts with their corresponding
+#   DocumentRoot.
+RewriteMap    lowercase    int:tolower
+RewriteMap    vhost        txt:/path/to/vhost.map
+
+#   Now do the actual virtual host mapping
+#   via a huge and complicated single rule:
+#
+#   1. make sure we don't map for common locations
+RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
+RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
+    :
+RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
+#
+#   2. make sure we have a Host header, because
+#      currently our approach only supports
+#      virtual hosting through this header
+RewriteCond   %{HTTP_HOST}  !^$
+#
+#   3. lowercase the hostname
+RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
+#
+#   4. lookup this hostname in vhost.map and
+#      remember it only when it is a path
+#      (and not "NONE" from above)
+RewriteCond   ${vhost:%1}  ^(/.*)$
+#
+#   5. finally we can map the URL to its docroot location
+#      and remember the virtual host for logging purposes
+RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
+    :
+
+
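To see the rule chain in action, here is one request walked through it (a worked example assuming the vhost.map above):

    #   Request: GET /index.html  with header  "Host: WWW.VHOST1.DOM"
    #   step 3:  ${lowercase:WWW.VHOST1.DOM}  ->  "www.vhost1.dom"          (%1)
    #   step 4:  ${vhost:www.vhost1.dom}      ->  "/path/to/docroot/vhost1" (new %1)
    #   step 5:  /index.html  ->  /path/to/docroot/vhost1/index.html,
    #            with VHOST=www.vhost1.dom set for the CustomLog line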
+
+ + + +
top
+
+

Access Restriction

+ + + +

Host Deny

+ + + +
+
Description:
+ +
+

How can we forbid a list of externally configured hosts + from using our server?

+
+ +
Solution:
+ +
+

For Apache >= 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
+RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
+RewriteRule   ^/.*  -  [F]
+
+ +

For Apache < 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ /$1
+
+ +
+##
+##  hosts.deny
+##
+##  ATTENTION! This is a map, not a list, even when we treat it as such.
+##             mod_rewrite parses it for key/value pairs, so at least a
+##             dummy value "-" must be present for each entry.
+##
+
+193.102.180.41 -
+bsdti1.sdm.de  -
+192.76.162.40  -
+
+
+
+ + + +

Proxy Deny

+ + + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache webserver. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
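For reference, the build-time module order mentioned above might look like this in an Apache 1.3 src/Configuration fragment (a sketch; remember that modules listed later are invoked earlier at run time):

    AddModule modules/proxy/libproxy.a
    AddModule modules/standard/mod_rewrite.o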
+
+
+ + + +

Special Authentication Variant

+ + + +
+
Description:
+ +
+

Sometimes a very special kind of authentication is needed, for instance an authentication which checks for a set of explicitly configured users. Only these should receive access, and without explicit prompting (which would occur when using Basic Auth via mod_auth).

+
+ +
Solution:
+ +
+

We use a list of rewrite conditions to exclude all except + our friends:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1\.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2\.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3\.quux-corp\.com$
+RewriteRule ^/~quux/only-for-friends/      -                                 [F]
+
+
+
+ + + +

Referer-based Deflector

+ + + +
+
Description:
+ +
+

How can we program a flexible URL Deflector which acts + on the "Referer" HTTP header and can be configured with as + many referring pages as we like?

+
+ +
Solution:
+ +
+

Use the following really tricky ruleset...

+ +
+RewriteMap  deflector txt:/path/to/deflector.map
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
+RewriteRule ^.* %{HTTP_REFERER} [R,L]
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
+RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
+
+ +

... in conjunction with a corresponding rewrite + map:

+ +
+##
+##  deflector.map
+##
+
+http://www.badguys.com/bad/index.html    -
+http://www.badguys.com/bad/index2.html   -
+http://www.badguys.com/bad/index3.html   http://somewhere.com/
+
+ +

This automatically redirects the request back to the referring page (when "-" is used as the value in the map) or to a specific URL (when a URL is specified in the map as the second argument).

+
+
+ + + +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/docs/manual/rewrite/rewrite_guide_advanced.xml b/docs/manual/rewrite/rewrite_guide_advanced.xml new file mode 100644 index 00000000000..257f3889346 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide_advanced.xml @@ -0,0 +1,1290 @@ + + + + + + + + + + + URL Rewriting Guide - Advanced topics + + + +

This document supplements the mod_rewrite reference documentation. It describes how one can use Apache's mod_rewrite to solve typical URL-based problems with which webmasters are commonly confronted. We give detailed descriptions on how to solve each problem by configuring URL rewriting rulesets.

+ ATTENTION: Depending on your server configuration it may be necessary to slightly adjust the examples for your situation, e.g., adding the [PT] flag when additionally using mod_alias and mod_userdir, etc., or rewriting a ruleset to fit in .htaccess context instead of per-server context. Always try to understand what a particular ruleset really does before you use it; this avoids many problems.
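For instance, a per-server rule whose substitution must still be handled by mod_alias or mod_userdir might carry the flag like this (a sketch; the paths are placeholders):

    RewriteRule  ^/somepath(.*)  /otherpath$1  [PT]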
+Module +documentation +mod_rewrite +introduction +Technical details + + +
+ + Webcluster through Homogeneous URL Layout + +
+
Description:
+ +
+

We want to create a homogeneous and consistent URL layout over all WWW servers on an Intranet webcluster, i.e., all URLs (by definition server-local and thus server-dependent!) become actually server-independent! What we want is to give the WWW namespace a consistent server-independent layout: no URL should have to include any physically correct target server. The cluster itself should drive us automatically to the physical target host.

+
+ +
Solution:
+ +
+

First, the knowledge of the target servers comes from (distributed) external maps which contain information on where our users, groups and entities reside. They have the form

+ +
+user1  server_of_user1
+user2  server_of_user2
+:      :
+
+ +

We put them into files map.xxx-to-host. + Second we need to instruct all servers to redirect URLs + of the forms

+ +
+/u/user/anypath
+/g/group/anypath
+/e/entity/anypath
+
+ +

to

+ +
+http://physical-host/u/user/anypath
+http://physical-host/g/group/anypath
+http://physical-host/e/entity/anypath
+
+ +

when the URL is not locally valid to a server. The + following ruleset does this for us by the help of the map + files (assuming that server0 is a default server which + will be used if a user has no entry in the map):

+ +
+RewriteEngine on
+
+RewriteMap      user-to-host   txt:/path/to/map.user-to-host
+RewriteMap     group-to-host   txt:/path/to/map.group-to-host
+RewriteMap    entity-to-host   txt:/path/to/map.entity-to-host
+
+RewriteRule   ^/u/([^/]+)/?(.*)   http://${user-to-host:$1|server0}/u/$1/$2
+RewriteRule   ^/g/([^/]+)/?(.*)  http://${group-to-host:$1|server0}/g/$1/$2
+RewriteRule   ^/e/([^/]+)/?(.*) http://${entity-to-host:$1|server0}/e/$1/$2
+
+RewriteRule   ^/([uge])/([^/]+)/?$          /$1/$2/.www/
+RewriteRule   ^/([uge])/([^/]+)/([^.]+.+)   /$1/$2/.www/$3
+
+
+
+ +
+ +
+ + Structured Homedirs + +
+
Description:
+ +
+

Some sites with thousands of users usually use a + structured homedir layout, i.e. each homedir is in a + subdirectory which begins for instance with the first + character of the username. So, /~foo/anypath + is /home/f/foo/.www/anypath + while /~bar/anypath is + /home/b/bar/.www/anypath.

+
+ +
Solution:
+ +
+

We use the following ruleset to expand the tilde URLs + into exactly the above layout.

+ +
+RewriteEngine on
+RewriteRule   ^/~(([a-z])[a-z0-9]+)(.*)  /home/$2/$1/.www$3
+
+
+
+ +
+ +
+ + Filesystem Reorganization + +
+
Description:
+ +
+

This really is a hardcore example: a killer application which heavily uses per-directory RewriteRules to get a smooth look and feel on the Web while its data structure is never touched or adjusted. Background: net.sw is my archive of freely available Unix software packages, which I started to collect in 1992. It is both my hobby and my job to do this, because while studying computer science I have also worked for many years as a system and network administrator in my spare time. Every week I need some sort of software, so I created a deep hierarchy of directories where I stored the packages:

+ +
+drwxrwxr-x   2 netsw  users    512 Aug  3 18:39 Audio/
+drwxrwxr-x   2 netsw  users    512 Jul  9 14:37 Benchmark/
+drwxrwxr-x  12 netsw  users    512 Jul  9 00:34 Crypto/
+drwxrwxr-x   5 netsw  users    512 Jul  9 00:41 Database/
+drwxrwxr-x   4 netsw  users    512 Jul 30 19:25 Dicts/
+drwxrwxr-x  10 netsw  users    512 Jul  9 01:54 Graphic/
+drwxrwxr-x   5 netsw  users    512 Jul  9 01:58 Hackers/
+drwxrwxr-x   8 netsw  users    512 Jul  9 03:19 InfoSys/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:21 Math/
+drwxrwxr-x   3 netsw  users    512 Jul  9 03:24 Misc/
+drwxrwxr-x   9 netsw  users    512 Aug  1 16:33 Network/
+drwxrwxr-x   2 netsw  users    512 Jul  9 05:53 Office/
+drwxrwxr-x   7 netsw  users    512 Jul  9 09:24 SoftEng/
+drwxrwxr-x   7 netsw  users    512 Jul  9 12:17 System/
+drwxrwxr-x  12 netsw  users    512 Aug  3 20:15 Typesetting/
+drwxrwxr-x  10 netsw  users    512 Jul  9 14:08 X11/
+
+ +

In July 1996 I decided to make this archive public to the world via a nice Web interface. "Nice" means that I wanted to offer an interface where you can browse directly through the archive hierarchy. And "nice" means that I didn't want to change anything inside this hierarchy - not even by putting some CGI scripts at the top of it. Why? Because the above structure should later be accessible via FTP as well, and I didn't want any Web or CGI stuff to be there.

+
+ +
Solution:
+ +
+

The solution has two parts: The first is a set of CGI + scripts which create all the pages at all directory + levels on-the-fly. I put them under + /e/netsw/.www/ as follows:

+ +
+-rw-r--r--   1 netsw  users    1318 Aug  1 18:10 .wwwacl
+drwxr-xr-x  18 netsw  users     512 Aug  5 15:51 DATA/
+-rw-rw-rw-   1 netsw  users  372982 Aug  5 16:35 LOGFILE
+-rw-r--r--   1 netsw  users     659 Aug  4 09:27 TODO
+-rw-r--r--   1 netsw  users    5697 Aug  1 18:01 netsw-about.html
+-rwxr-xr-x   1 netsw  users     579 Aug  2 10:33 netsw-access.pl
+-rwxr-xr-x   1 netsw  users    1532 Aug  1 17:35 netsw-changes.cgi
+-rwxr-xr-x   1 netsw  users    2866 Aug  5 14:49 netsw-home.cgi
+drwxr-xr-x   2 netsw  users     512 Jul  8 23:47 netsw-img/
+-rwxr-xr-x   1 netsw  users   24050 Aug  5 15:49 netsw-lsdir.cgi
+-rwxr-xr-x   1 netsw  users    1589 Aug  3 18:43 netsw-search.cgi
+-rwxr-xr-x   1 netsw  users    1885 Aug  1 17:41 netsw-tree.cgi
+-rw-r--r--   1 netsw  users     234 Jul 30 16:35 netsw-unlimit.lst
+
+ +

The DATA/ subdirectory holds the above directory structure, i.e., the real net.sw stuff, and gets automatically updated via rdist from time to time. The second part of the problem remains: how to link these two structures together into one smooth-looking URL tree? We want to hide the DATA/ directory from the user while running the appropriate CGI scripts for the various URLs. Here is the solution: first I put the following into the per-directory configuration file in the DocumentRoot of the server to rewrite the announced URL /net.sw/ to the internal path /e/netsw:

+ +
+RewriteRule  ^net.sw$       net.sw/        [R]
+RewriteRule  ^net.sw/(.*)$  e/netsw/$1
+
+ +

The first rule is for requests which miss the trailing + slash! The second rule does the real thing. And then + comes the killer configuration which stays in the + per-directory config file + /e/netsw/.www/.wwwacl:

+ +
+Options       ExecCGI FollowSymLinks Includes MultiViews
+
+RewriteEngine on
+
+#  we are reached via /net.sw/ prefix
+RewriteBase   /net.sw/
+
+#  first we rewrite the root dir to
+#  the handling cgi script
+RewriteRule   ^$                       netsw-home.cgi     [L]
+RewriteRule   ^index\.html$            netsw-home.cgi     [L]
+
+#  strip out the subdirs when
+#  the browser requests us from perdir pages
+RewriteRule   ^.+/(netsw-[^/]+/.+)$    $1                 [L]
+
+#  and now break the rewriting for local files
+RewriteRule   ^netsw-home\.cgi.*       -                  [L]
+RewriteRule   ^netsw-changes\.cgi.*    -                  [L]
+RewriteRule   ^netsw-search\.cgi.*     -                  [L]
+RewriteRule   ^netsw-tree\.cgi$        -                  [L]
+RewriteRule   ^netsw-about\.html$      -                  [L]
+RewriteRule   ^netsw-img/.*$           -                  [L]
+
+#  anything else is a subdir which gets handled
+#  by another cgi script
+RewriteRule   !^netsw-lsdir\.cgi.*     -                  [C]
+RewriteRule   (.*)                     netsw-lsdir.cgi/$1
+
+ +

Some hints for interpretation:

+ +
    +
  1. Notice the L (last) flag and no substitution field ('-') in the fourth part
  2. + +
  3. Notice the ! (not) character and + the C (chain) flag at the first rule + in the last part
  4. + +
  5. Notice the catch-all pattern in the last rule
  6. +
+
+
+ +
+ +
+ + Redirect Failing URLs To Other Webserver + +
+
Description:
+ +
+

A typical FAQ about URL rewriting is how to redirect failing requests on webserver A to webserver B. Usually this is done via ErrorDocument CGI scripts in Perl, but there is also a mod_rewrite solution. Notice, however, that this performs more poorly than using an ErrorDocument CGI script!

+
+ +
Solution:
+ +
+

The first solution has the best performance but less flexibility, and is less error-safe:

+ +
+RewriteEngine on
+RewriteCond   /your/docroot/%{REQUEST_FILENAME} !-f
+RewriteRule   ^(.+)                             http://webserverB.dom/$1
+
+ +

The problem here is that this will only work for pages inside the DocumentRoot. While you can add more Conditions (for instance to also handle homedirs, etc.), there is a better variant:

+ +
+RewriteEngine on
+RewriteCond   %{REQUEST_URI} !-U
+RewriteRule   ^(.+)          http://webserverB.dom/$1
+
+ +

This uses the URL look-ahead feature of mod_rewrite. The result is that this will work for all types of URLs and is a safe way. But it has a performance impact on the webserver, because for every request there is one more internal subrequest. So, if your webserver runs on a powerful CPU, use this one. If it is a slow machine, use the first approach, or better, an ErrorDocument CGI script.

+
+
+ +
+ +
+ + Archive Access Multiplexer + +
+
Description:
+ +
+

Do you know the great CPAN (Comprehensive Perl Archive Network) under http://www.perl.com/CPAN? This does a redirect to one of several FTP servers around the world which carry a CPAN mirror, chosen to be approximately near the location of the requesting client. Actually this can be called an FTP access multiplexing service. While CPAN runs via CGI scripts, how can a similar approach be implemented via mod_rewrite?

+
+ +
Solution:
+ +
+

First we notice that from version 3.0.0 + mod_rewrite can + also use the "ftp:" scheme on redirects. + And second, the location approximation can be done by a + RewriteMap + over the top-level domain of the client. + With a tricky chained ruleset we can use this top-level + domain as a key to our multiplexing map.

+ +
+RewriteEngine on
+RewriteMap    multiplex                txt:/path/to/map.cxan
+RewriteRule   ^/CxAN/(.*)              %{REMOTE_HOST}::$1                 [C]
+RewriteRule   ^.+\.([a-zA-Z]+)::(.*)$  ${multiplex:$1|ftp.default.dom}$2  [R,L]
+
+ +
+##
+##  map.cxan -- Multiplexing Map for CxAN
+##
+
+de        ftp://ftp.cxan.de/CxAN/
+uk        ftp://ftp.cxan.uk/CxAN/
+com       ftp://ftp.cxan.com/CxAN/
+ :
+##EOF##
+
+
+
+ +
+ +
+ + Content Handling + +
+ + Browser Dependent Content + +
+
Description:
+ +
+

At least for important top-level pages it is sometimes necessary to provide the optimum of browser-dependent content, i.e., one has to provide a maximum version for the latest Netscape variants, a minimum version for the Lynx browsers, and an average-feature version for all others.

+
+ +
Solution:
+ +
+

We cannot use content negotiation because the browsers do not provide their type in that form. Instead we have to act on the HTTP header "User-Agent". The following config does the following: if the HTTP header "User-Agent" begins with "Mozilla/3", the page foo.html is rewritten to foo.NS.html and the rewriting stops. If the browser is "Lynx" or "Mozilla" of version 1 or 2, the URL becomes foo.20.html. All other browsers receive page foo.32.html. This is done by the following ruleset:

+ +
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/3.*
+RewriteRule ^foo\.html$         foo.NS.html          [L]
+
+RewriteCond %{HTTP_USER_AGENT}  ^Lynx/.*         [OR]
+RewriteCond %{HTTP_USER_AGENT}  ^Mozilla/[12].*
+RewriteRule ^foo\.html$         foo.20.html          [L]
+
+RewriteRule ^foo\.html$         foo.32.html          [L]
+
+
+
+ +
+ +
+ + Dynamic Mirror + +
+
Description:
+ +
+

Assume there are nice webpages on remote hosts we want to bring into our namespace. For FTP servers we would use the mirror program which actually maintains an explicit up-to-date copy of the remote data on the local machine. For a webserver we could use the program webcopy which acts similarly via HTTP. But both techniques have one major drawback: the local copy is only as up-to-date as the last time we ran the program. It would be much better if the mirror were not a static one we have to establish explicitly. Instead we want a dynamic mirror with data which gets updated automatically when there is need (updated data on the remote host).

+
+ +
Solution:
+ +
+

To provide this feature we map the remote webpage or even + the complete remote webarea to our namespace by the use + of the Proxy Throughput feature + (flag [P]):

+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^hotsheet/(.*)$  http://www.tstimpreso.com/hotsheet/$1  [P]
+
+ +
+RewriteEngine  on
+RewriteBase    /~quux/
+RewriteRule    ^usa-news\.html$   http://www.quux-corp.com/news/index.html  [P]
+
+
+
+ +
+ +
+ + Reverse Dynamic Mirror + +
+
Description:
+ +
...
+ +
Solution:
+ +
+
+RewriteEngine on
+RewriteCond   /mirror/of/remotesite/$1           -U
+RewriteRule   ^http://www\.remotesite\.com/(.*)$ /mirror/of/remotesite/$1
+
+
+
+ +
+ +
+ + Retrieve Missing Data from Intranet + +
+
Description:
+ +
+

This is a tricky way of virtually running a corporate (external) Internet webserver (www.quux-corp.dom), while actually keeping and maintaining its data on an (internal) Intranet webserver (www2.quux-corp.dom) which is protected by a firewall. The trick is that on the external webserver we retrieve the requested data on-the-fly from the internal one.

+
+ +
Solution:
+ +
+

First, we have to make sure that our firewall still + protects the internal webserver and that only the + external webserver is allowed to retrieve data from it. + For a packet-filtering firewall we could for instance + configure a firewall ruleset like the following:

+ +
+ALLOW Host www.quux-corp.dom Port >1024 --> Host www2.quux-corp.dom Port 80
+DENY  Host *                 Port *     --> Host www2.quux-corp.dom Port 80
+
+ +

Just adjust it to your actual configuration syntax. + Now we can establish the mod_rewrite + rules which request the missing data in the background + through the proxy throughput feature:

+ +
+RewriteRule ^/~([^/]+)/?(.*)          /home/$1/.www/$2
+RewriteCond %{REQUEST_FILENAME}       !-f
+RewriteCond %{REQUEST_FILENAME}       !-d
+RewriteRule ^/home/([^/]+)/.www/?(.*) http://www2.quux-corp.dom/~$1/pub/$2 [P]
+
+
+
+ +
+ +
+ + Load Balancing + +
+
Description:
+ +
+

Suppose we want to load balance the traffic to + www.foo.com over www[0-5].foo.com + (a total of 6 servers). How can this be done?

+
+ +
Solution:
+ +
+

There are a lot of possible solutions for this problem. We will first discuss a commonly known DNS-based variant and then a special one using mod_rewrite:

+ +
    +
  1. + DNS Round-Robin + +

    The simplest method for load-balancing is to use the DNS round-robin feature of BIND. Here you just configure www[0-5].foo.com as usual in your DNS with A (address) records, e.g.

    + +
    +www0   IN  A       1.2.3.1
    +www1   IN  A       1.2.3.2
    +www2   IN  A       1.2.3.3
    +www3   IN  A       1.2.3.4
    +www4   IN  A       1.2.3.5
    +www5   IN  A       1.2.3.6
    +
    + +

    Then you additionally add the following entry:

    + +
    +www    IN  CNAME   www0.foo.com.
    +       IN  CNAME   www1.foo.com.
    +       IN  CNAME   www2.foo.com.
    +       IN  CNAME   www3.foo.com.
    +       IN  CNAME   www4.foo.com.
    +       IN  CNAME   www5.foo.com.
    +
    + +

    Notice that this seems wrong, but it is actually an intended feature of BIND and can be used this way. However, now when www.foo.com gets resolved, BIND hands out www0-www5 - but in a slightly permuted/rotated order every time. This way the clients are spread over the various servers. But notice that this is not a perfect load-balancing scheme, because DNS resolution information gets cached by the other nameservers on the net, so once a client has resolved www.foo.com to a particular wwwN.foo.com, all subsequent requests also go to this particular name wwwN.foo.com. But the final result is fine, because the total sum of the requests really is spread over the various webservers.

    +
  2. + +
  3. + DNS Load-Balancing + +

    A sophisticated DNS-based method for load-balancing is to use the program lbnamed, which can be found at http://www.stanford.edu/~schemers/docs/lbnamed/lbnamed.html. It is a Perl 5 program which, in conjunction with auxiliary tools, provides real load-balancing for DNS.

    +
  4. + +
  5. + Proxy Throughput Round-Robin + +

    In this variant we use mod_rewrite + and its proxy throughput feature. First we dedicate + www0.foo.com to be actually + www.foo.com by using a single

    + +
    +www    IN  CNAME   www0.foo.com.
    +
    + +

    entry in the DNS. Then we convert + www0.foo.com to a proxy-only server, + i.e. we configure this machine so all arriving URLs + are just pushed through the internal proxy to one of + the 5 other servers (www1-www5). To + accomplish this we first establish a ruleset which + contacts a load balancing script lb.pl + for all URLs.

    + +
    +RewriteEngine on
    +RewriteMap    lb      prg:/path/to/lb.pl
    +RewriteRule   ^/(.+)$ ${lb:$1}           [P,L]
    +
    + +

    Then we write lb.pl:

    + +
    +#!/path/to/perl
    +##
    +##  lb.pl -- load balancing script
    +##
    +
    +$| = 1;
    +
    +$name   = "www";     # the hostname base
    +$first  = 1;         # the first server (not 0 here, because 0 is myself)
    +$last   = 5;         # the last server in the round-robin
    +$domain = "foo.dom"; # the domainname
    +
    +$cnt = 0;
    +while (<STDIN>) {
    +    $cnt = (($cnt+1) % ($last+1-$first));
    +    $server = sprintf("%s%d.%s", $name, $cnt+$first, $domain);
    +    print "http://$server/$_";
    +}
    +
    +##EOF##
    +
    + A last notice: why is this useful? Doesn't www0.foo.com remain overloaded? Yes, it is still loaded, but with plain proxy throughput requests only! All SSI, CGI, ePerl, etc. processing is done entirely on the other machines. This is the essential point. (A quick manual test of the lb.pl script appears after this list.)
  6. + +
  7. + Hardware/TCP Round-Robin + +

    There is a hardware solution available, too. Cisco has a beast called LocalDirector which does load balancing at the TCP/IP level. Actually this is some sort of circuit-level gateway in front of a webcluster. If you have enough money and really need a solution with high performance, use this one.

    +
  8. +
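Because prg: rewrite maps speak a simple line protocol - mod_rewrite writes the lookup key plus a newline to the script's stdin and reads a single line back from its stdout - the lb.pl script above can be sanity-checked directly from the shell. With the settings above, a freshly started instance answers www2 first and then cycles through www3, www4, www5, www1, and so on:

    $ echo "any/path" | ./lb.pl
    http://www2.foo.dom/any/path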
+
+
+ +
+ +
+ + New MIME-type, New Service + +
+
Description:
+ +
+

On the net there are a lot of nifty CGI programs. But their usage is usually boring, so a lot of webmasters don't use them. Even Apache's Action handler feature for MIME-types is only appropriate when the CGI programs don't need special URLs (actually PATH_INFO and QUERY_STRINGS) as their input. First, let us configure a new file type with extension .scgi (for secure CGI) which will be processed by the popular cgiwrap program. The problem here is that if, for instance, we use a Homogeneous URL Layout (see above), a file inside the user homedirs has the URL /u/user/foo/bar.scgi. But cgiwrap needs the URL in the form /~user/foo/bar.scgi/. The following rule solves the problem:

+ +
+RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
+... /internal/cgi/user/cgiwrap/~$1/$2.scgi$3  [NS,T=application/x-httpd-cgi]
+
+ +

Or assume we have some more nifty programs: wwwlog (which displays the access.log for a URL subtree) and wwwidx (which runs Glimpse on a URL subtree). We have to provide the URL area to these programs so they know which area they have to act on. But usually this is ugly, because they are still always requested from those areas, i.e., typically we would run the wwwidx program from within /u/user/foo/ via a hyperlink to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

which is ugly, because we have to hard-code both the location of the area and the location of the CGI inside the hyperlink. When we have to reorganize the area, we spend a lot of time changing the various hyperlinks.

+
+ +
Solution:
+ +
+

The solution here is to provide a special new URL format + which automatically leads to the proper CGI invocation. + We configure the following:

+ +
+RewriteRule   ^/([uge])/([^/]+)(/?.*)/\*  /internal/cgi/user/wwwidx?i=/$1/$2$3/
+RewriteRule   ^/([uge])/([^/]+)(/?.*):log /internal/cgi/user/wwwlog?f=/$1/$2$3
+
+ +

Now the hyperlink to search at + /u/user/foo/ reads only

+ +
+HREF="*"
+
+ +

which internally gets automatically transformed to

+ +
+/internal/cgi/user/wwwidx?i=/u/user/foo/
+
+ +

The same approach leads to an invocation for the + access log CGI program when the hyperlink + :log gets used.

+
+
+ +
+ +
+ + On-the-fly Content-Regeneration + +
+
Description:
+ +
+

Here comes a really esoteric feature: dynamically generated but statically served pages, i.e., pages should be delivered as pure static pages (read from the filesystem and just passed through), but they have to be generated dynamically by the webserver if missing. This way you can have CGI-generated pages which are statically served unless someone (or a cronjob) removes the static contents. Then the contents get refreshed.

+
+ +
Solution:
+ +
+ This is done via the following ruleset: + +
+RewriteCond %{REQUEST_FILENAME}   !-s
+RewriteRule ^page\.html$          page.cgi   [T=application/x-httpd-cgi,L]
+
+ +

Here a request to page.html leads to an internal run of a corresponding page.cgi if page.html is still missing or has zero file size. The trick here is that page.cgi is a regular CGI script which (in addition to its STDOUT) writes its output to the file page.html. Once it has run, the server sends out the data of page.html. When the webmaster wants to force a refresh of the contents, he just removes page.html (usually done by a cronjob).
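As an illustration (this sketch is not part of the original ruleset), a minimal page.cgi could regenerate the static copy and answer the current request at the same time; it assumes Apache's usual CGI behaviour of starting the script in its own directory, where page.html lives:

    #!/path/to/perl
    ##
    ##  page.cgi -- regenerate page.html and serve it (sketch)
    ##
    $| = 1;
    $html = "&lt;html&gt;&lt;body&gt;Generated at " . localtime() . "&lt;/body&gt;&lt;/html&gt;\n";
    open(OUT, ">page.html") || die;        # write the static copy for later requests
    print OUT $html;
    close(OUT);
    print "Content-type: text/html\n\n";   # and answer this request via STDOUT
    print $html;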

+
+
+ +
+ +
+ + Document With Autorefresh + +
+
Description:
+ +
+

Wouldn't it be nice, while creating a complex webpage, if the web browser automatically refreshed the page every time we write a new version from within our editor? Impossible?

+
+ +
Solution:
+ +
+

No! We just combine the MIME multipart feature, the webserver NPH feature and the URL manipulation power of mod_rewrite. First, we establish a new URL feature: adding just :refresh to any URL causes the page to be refreshed every time it gets updated on the filesystem.

+ +
+RewriteRule   ^(/[uge]/[^/]+/?.*):refresh  /internal/cgi/apache/nph-refresh?f=$1
+
+ +

Now when we reference the URL

+ +
+/u/foo/bar/page.html:refresh
+
+ +

this leads to the internal invocation of the URL

+ +
+/internal/cgi/apache/nph-refresh?f=/u/foo/bar/page.html
+
+ +

The only missing part is the NPH-CGI script. Although + one would usually say "left as an exercise to the reader" + ;-) I will provide this, too.

+ +
+#!/sw/bin/perl
+##
+##  nph-refresh -- NPH/CGI script for auto refreshing pages
+##  Copyright (c) 1997 Ralf S. Engelschall, All Rights Reserved.
+##
+$| = 1;
+
+#   split the QUERY_STRING variable
+@pairs = split(/&/, $ENV{'QUERY_STRING'});
+foreach $pair (@pairs) {
+    ($name, $value) = split(/=/, $pair);
+    $name =~ tr/A-Z/a-z/;
+    $name = 'QS_' . $name;
+    $value =~ s/%([a-fA-F0-9][a-fA-F0-9])/pack("C", hex($1))/eg;
+    eval "\$$name = \"$value\"";
+}
+$QS_s = 1 if ($QS_s eq '');
+$QS_n = 3600 if ($QS_n eq '');
+if ($QS_f eq '') {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "&lt;b&gt;ERROR&lt;/b&gt;: No file given\n";
+    exit(0);
+}
+if (! -f $QS_f) {
+    print "HTTP/1.0 200 OK\n";
+    print "Content-type: text/html\n\n";
+    print "&lt;b&gt;ERROR&lt;/b&gt;: File $QS_f not found\n";
+    exit(0);
+}
+
+sub print_http_headers_multipart_begin {
+    print "HTTP/1.0 200 OK\n";
+    $bound = "ThisRandomString12345";
+    print "Content-type: multipart/x-mixed-replace;boundary=$bound\n";
+    &print_http_headers_multipart_next;
+}
+
+sub print_http_headers_multipart_next {
+    print "\n--$bound\n";
+}
+
+sub print_http_headers_multipart_end {
+    print "\n--$bound--\n";
+}
+
+sub displayhtml {
+    local($buffer) = @_;
+    $len = length($buffer);
+    print "Content-type: text/html\n";
+    print "Content-length: $len\n\n";
+    print $buffer;
+}
+
+sub readfile {
+    local($file) = @_;
+    local(*FP, $size, $buffer, $bytes);
+    ($x, $x, $x, $x, $x, $x, $x, $size) = stat($file);
+    $size = sprintf("%d", $size);
+    open(FP, "&lt;$file");
+    $bytes = sysread(FP, $buffer, $size);
+    close(FP);
+    return $buffer;
+}
+
+$buffer = &readfile($QS_f);
+&print_http_headers_multipart_begin;
+&displayhtml($buffer);
+
+sub mystat {
+    local($file) = $_[0];
+    local($mtime);
+
+    ($x, $x, $x, $x, $x, $x, $x, $x, $x, $mtime) = stat($file);
+    return $mtime;
+}
+
+$mtimeL = &mystat($QS_f);
+$mtime  = $mtimeL;
+for ($n = 0; $n &lt; $QS_n; $n++) {
+    while (1) {
+        $mtime = &mystat($QS_f);
+        if ($mtime ne $mtimeL) {
+            $mtimeL = $mtime;
+            sleep(2);
+            $buffer = &readfile($QS_f);
+            &print_http_headers_multipart_next;
+            &displayhtml($buffer);
+            sleep(5);
+            $mtimeL = &mystat($QS_f);
+            last;
+        }
+        sleep($QS_s);
+    }
+}
+
+&print_http_headers_multipart_end;
+
+exit(0);
+
+##EOF##
+
+
+
+ +
+ +
+ + Mass Virtual Hosting + +
+
Description:
+ +
+

The VirtualHost feature of Apache is nice and works great when you just have a few dozen virtual hosts. But when you are an ISP and have hundreds of virtual hosts to provide, this feature is not the best choice.

+
+ +
Solution:
+ +
+

To provide this feature we map the requested hostname to the corresponding document root through an external map file, letting the rewriting engine of the main server do the mass virtual hosting:

+ +
+##
+##  vhost.map
+##
+www.vhost1.dom:80  /path/to/docroot/vhost1
+www.vhost2.dom:80  /path/to/docroot/vhost2
+     :
+www.vhostN.dom:80  /path/to/docroot/vhostN
+
+ +
+##
+##  httpd.conf
+##
+    :
+#   use the canonical hostname on redirects, etc.
+UseCanonicalName on
+
+    :
+#   add the virtual host in front of the CLF-format
+CustomLog  /path/to/access_log  "%{VHOST}e %h %l %u %t \"%r\" %>s %b"
+    :
+
+#   enable the rewriting engine in the main server
+RewriteEngine on
+
+#   define two maps: one for fixing the URL and one which defines
+#   the available virtual hosts with their corresponding
+#   DocumentRoot.
+RewriteMap    lowercase    int:tolower
+RewriteMap    vhost        txt:/path/to/vhost.map
+
+#   Now do the actual virtual host mapping
+#   via a huge and complicated single rule:
+#
+#   1. make sure we don't map for common locations
+RewriteCond   %{REQUEST_URI}  !^/commonurl1/.*
+RewriteCond   %{REQUEST_URI}  !^/commonurl2/.*
+    :
+RewriteCond   %{REQUEST_URI}  !^/commonurlN/.*
+#
+#   2. make sure we have a Host header, because
+#      currently our approach only supports
+#      virtual hosting through this header
+RewriteCond   %{HTTP_HOST}  !^$
+#
+#   3. lowercase the hostname
+RewriteCond   ${lowercase:%{HTTP_HOST}|NONE}  ^(.+)$
+#
+#   4. lookup this hostname in vhost.map and
+#      remember it only when it is a path
+#      (and not "NONE" from above)
+RewriteCond   ${vhost:%1}  ^(/.*)$
+#
+#   5. finally we can map the URL to its docroot location
+#      and remember the virtual host for logging purposes
+RewriteRule   ^/(.*)$   %1/$1  [E=VHOST:${lowercase:%{HTTP_HOST}}]
+    :
+
+
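To see the rule chain in action, here is one request walked through it (a worked example assuming the vhost.map above):

    #   Request: GET /index.html  with header  "Host: WWW.VHOST1.DOM"
    #   step 3:  ${lowercase:WWW.VHOST1.DOM}  ->  "www.vhost1.dom"          (%1)
    #   step 4:  ${vhost:www.vhost1.dom}      ->  "/path/to/docroot/vhost1" (new %1)
    #   step 5:  /index.html  ->  /path/to/docroot/vhost1/index.html,
    #            with VHOST=www.vhost1.dom set for the CustomLog line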
+
+ +
+ +
+ +
+ + Access Restriction + +
+ + Host Deny + +
+
Description:
+ +
+

How can we forbid a list of externally configured hosts + from using our server?

+
+ +
Solution:
+ +
+

For Apache >= 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteCond   ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND} !=NOT-FOUND [OR]
+RewriteCond   ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND} !=NOT-FOUND
+RewriteRule   ^/.*  -  [F]
+
+ +

For Apache < 1.3b6:

+ +
+RewriteEngine on
+RewriteMap    hosts-deny  txt:/path/to/hosts.deny
+RewriteRule   ^/(.*)$ ${hosts-deny:%{REMOTE_HOST}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ ${hosts-deny:%{REMOTE_ADDR}|NOT-FOUND}/$1
+RewriteRule   !^NOT-FOUND/.* - [F]
+RewriteRule   ^NOT-FOUND/(.*)$ /$1
+
+ +
+##
+##  hosts.deny
+##
+##  ATTENTION! This is a map, not a list, even when we treat it as such.
+##             mod_rewrite parses it for key/value pairs, so at least a
+##             dummy value "-" must be present for each entry.
+##
+
+193.102.180.41 -
+bsdti1.sdm.de  -
+192.76.162.40  -
+
+
+
+ +
+ +
+ + Proxy Deny + +
+
Description:
+ +
+

How can we forbid a certain host or even a user of a + special host from using the Apache proxy?

+
+ +
Solution:
+ +
+

We first have to make sure mod_rewrite + is below(!) mod_proxy in the Configuration + file when compiling the Apache webserver. This way it gets + called before mod_proxy. Then we + configure the following for a host-dependent deny...

+ +
+RewriteCond %{REMOTE_HOST} ^badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
+ +

...and this one for a user@host-dependent deny:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST}  ^badguy@badhost\.mydomain\.com$
+RewriteRule !^http://[^/.]+\.mydomain\.com.*  - [F]
+
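For reference, the build-time module order mentioned above might look like this in an Apache 1.3 src/Configuration fragment (a sketch; remember that modules listed later are invoked earlier at run time):

    AddModule modules/proxy/libproxy.a
    AddModule modules/standard/mod_rewrite.o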
+
+
+ +
+ +
+ + Special Authentication Variant + +
+
Description:
+ +
+

Sometimes a very special kind of authentication is needed, for instance an authentication which checks for a set of explicitly configured users. Only these should receive access, and without explicit prompting (which would occur when using Basic Auth via mod_auth).

+
+ +
Solution:
+ +
+

We use a list of rewrite conditions to exclude all except + our friends:

+ +
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend1@client1\.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend2@client2\.quux-corp\.com$
+RewriteCond %{REMOTE_IDENT}@%{REMOTE_HOST} !^friend3@client3\.quux-corp\.com$
+RewriteRule ^/~quux/only-for-friends/      -                                 [F]
+
+
+
+ +
+ +
+ + Referer-based Deflector + +
+
Description:
+ +
+

How can we program a flexible URL Deflector which acts + on the "Referer" HTTP header and can be configured with as + many referring pages as we like?

+
+ +
Solution:
+ +
+

Use the following really tricky ruleset...

+ +
+RewriteMap  deflector txt:/path/to/deflector.map
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}} ^-$
+RewriteRule ^.* %{HTTP_REFERER} [R,L]
+
+RewriteCond %{HTTP_REFERER} !=""
+RewriteCond ${deflector:%{HTTP_REFERER}|NOT-FOUND} !=NOT-FOUND
+RewriteRule ^.* ${deflector:%{HTTP_REFERER}} [R,L]
+
+ +

... in conjunction with a corresponding rewrite + map:

+ +
+##
+##  deflector.map
+##
+
+http://www.badguys.com/bad/index.html    -
+http://www.badguys.com/bad/index2.html   -
+http://www.badguys.com/bad/index3.html   http://somewhere.com/
+
+ +

This automatically redirects the request back to the referring page (when "-" is used as the value in the map) or to a specific URL (when a URL is specified in the map as the second argument).

+
+
+ +
+ +
+ +
+ diff --git a/docs/manual/rewrite/rewrite_guide_advanced.xml.meta b/docs/manual/rewrite/rewrite_guide_advanced.xml.meta new file mode 100644 index 00000000000..f0c7bf26866 --- /dev/null +++ b/docs/manual/rewrite/rewrite_guide_advanced.xml.meta @@ -0,0 +1,11 @@ + + + + rewrite_guide_advanced + /rewrite/ + .. + + + en + + diff --git a/docs/manual/rewrite/rewrite_intro.html b/docs/manual/rewrite/rewrite_intro.html new file mode 100644 index 00000000000..607f1c18aba --- /dev/null +++ b/docs/manual/rewrite/rewrite_intro.html @@ -0,0 +1,3 @@ +URI: rewrite_intro.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/docs/manual/rewrite/rewrite_intro.html.en b/docs/manual/rewrite/rewrite_intro.html.en new file mode 100644 index 00000000000..22d0a834c56 --- /dev/null +++ b/docs/manual/rewrite/rewrite_intro.html.en @@ -0,0 +1,117 @@ + + + +Apache mod_rewrite Introduction - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.0

Apache mod_rewrite Introduction

+
+

Available Languages:  en 

+
+ +

This document supplements the mod_rewrite +reference documentation. It +describes the basic concepts necessary for use of +mod_rewrite. Other documents go into greater detail, +but this doc should help the beginner get their feet wet. +

+
+ +
top
+
+

Introduction

+

The Apache module mod_rewrite is a very powerful and sophisticated module which provides a way to do URL manipulations. With it, you can do nearly all types of URL rewriting that you may need. It is, however, somewhat complex, and may be intimidating to the beginner. There is also a tendency to treat rewrite rules as magic incantations, using them without actually understanding what they do.

+ +

This document attempts to give sufficient background so that what +follows is understood, rather than just copied blindly. +

+
top
+
+

Regular Expressions

+

Basic regex building blocks
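A few of the building blocks used throughout these documents (illustrative, not exhaustive):

    .          matches any single character
    ^  $       anchor the match at the start / end of the string
    [abc]      character class: matches a, b or c
    (...)      grouping; the captured text is available as $1..$9 in a
               RewriteRule and as %1..%9 from the last matched RewriteCond
    ?  *  +    zero-or-one, zero-or-more, one-or-more of the preceding
    !pattern   (mod_rewrite only) negates the whole pattern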

+
top
+
+

RewriteRule basics

+

+Basic anatomy of a RewriteRule, with exhaustively annotated simple +examples. +

+
top
+
+

Rewrite Flags

+

Discussion of the flags to RewriteRule, and when and why one might +use them.

+
top
+
+

Rewrite conditions

+

Discussion of RewriteCond, looping, and other related concepts. +

+
top
+
+

Rewrite maps

+

Discussion of RewriteMap, including simple, but heavily annotated, +examples.

+
top
+
+

.htaccess files

+

Discussion of the differences between rewrite rules in httpd.conf and +in .htaccess files.

+
top
+
+

Environment Variables

+ +

This module keeps track of two additional (non-standard) +CGI/SSI environment variables named SCRIPT_URL +and SCRIPT_URI. These contain the +logical Web-view to the current resource, while the +standard CGI/SSI variables SCRIPT_NAME and +SCRIPT_FILENAME contain the physical +System-view.

+ +

Notice: These variables hold the URI/URL as they were +initially requested, i.e., before any +rewriting. This is important because the rewriting process is +primarily used to rewrite logical URLs to physical +pathnames.

+ +

Example

+SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
+SCRIPT_FILENAME=/u/rse/.www/index.html
+SCRIPT_URL=/u/rse/
+SCRIPT_URI=http://en1.engelschall.com/u/rse/
+
+ +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/docs/manual/rewrite/rewrite_intro.xml b/docs/manual/rewrite/rewrite_intro.xml new file mode 100644 index 00000000000..b88e4eff286 --- /dev/null +++ b/docs/manual/rewrite/rewrite_intro.xml @@ -0,0 +1,115 @@ + + + + + + + + + + + Apache mod_rewrite Introduction + + +

This document supplements the mod_rewrite +reference documentation. It +describes the basic concepts necessary for use of +mod_rewrite. Other documents go into greater detail, +but this doc should help the beginner get their feet wet. +

+
+ +Module +documentation +Technical details +Practical solutions to common +problems + +
Introduction +

The Apache module mod_rewrite is a very powerful and sophisticated module which provides a way to do URL manipulations. With it, you can do nearly all types of URL rewriting that you may need. It is, however, somewhat complex, and may be intimidating to the beginner. There is also a tendency to treat rewrite rules as magic incantations, using them without actually understanding what they do.

+ +

This document attempts to give sufficient background so that what +follows is understood, rather than just copied blindly. +

+
+ +
Regular Expressions +

Basic regex building blocks
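A few of the building blocks used throughout these documents (illustrative, not exhaustive):

    .          matches any single character
    ^  $       anchor the match at the start / end of the string
    [abc]      character class: matches a, b or c
    (...)      grouping; the captured text is available as $1..$9 in a
               RewriteRule and as %1..%9 from the last matched RewriteCond
    ?  *  +    zero-or-one, zero-or-more, one-or-more of the preceding
    !pattern   (mod_rewrite only) negates the whole pattern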

+
+ +
RewriteRule basics +

+Basic anatomy of a RewriteRule, with exhaustively annotated simple +examples. +

+
+ +
Rewrite Flags +

Discussion of the flags to RewriteRule, and when and why one might +use them.

+
+ +
Rewrite conditions +

Discussion of RewriteCond, looping, and other related concepts. +

+
+ +
Rewrite maps +

Discussion of RewriteMap, including simple, but heavily annotated, +examples.

+
+ +
.htaccess files +

Discussion of the differences between rewrite rules in httpd.conf and +in .htaccess files.

+
+ +
Environment Variables + +

This module keeps track of two additional (non-standard) +CGI/SSI environment variables named SCRIPT_URL +and SCRIPT_URI. These contain the +logical Web-view to the current resource, while the +standard CGI/SSI variables SCRIPT_NAME and +SCRIPT_FILENAME contain the physical +System-view.

+ +

Notice: These variables hold the URI/URL as they were +initially requested, i.e., before any +rewriting. This is important because the rewriting process is +primarily used to rewrite logical URLs to physical +pathnames.

+ +Example +
+SCRIPT_NAME=/sw/lib/w3s/tree/global/u/rse/.www/index.html
+SCRIPT_FILENAME=/u/rse/.www/index.html
+SCRIPT_URL=/u/rse/
+SCRIPT_URI=http://en1.engelschall.com/u/rse/
+
+
+ +
+ +
+ diff --git a/docs/manual/rewrite/rewrite_intro.xml.meta b/docs/manual/rewrite/rewrite_intro.xml.meta new file mode 100644 index 00000000000..fc07817eff3 --- /dev/null +++ b/docs/manual/rewrite/rewrite_intro.xml.meta @@ -0,0 +1,11 @@ + + + + rewrite_intro + /rewrite/ + .. + + + en + + diff --git a/docs/manual/rewrite/rewrite_tech.html b/docs/manual/rewrite/rewrite_tech.html new file mode 100644 index 00000000000..4e06869fbe0 --- /dev/null +++ b/docs/manual/rewrite/rewrite_tech.html @@ -0,0 +1,3 @@ +URI: rewrite_tech.html.en +Content-Language: en +Content-type: text/html; charset=ISO-8859-1 diff --git a/docs/manual/rewrite/rewrite_tech.html.en b/docs/manual/rewrite/rewrite_tech.html.en new file mode 100644 index 00000000000..6888f4f05f4 --- /dev/null +++ b/docs/manual/rewrite/rewrite_tech.html.en @@ -0,0 +1,166 @@ + + + +Apache mod_rewrite Technical Details - Apache HTTP Server + + + + + +
<-
+
+Apache > HTTP Server > Documentation > Version 2.0

Apache mod_rewrite Technical Details

+
+

Available Languages:  en 

+
+ +

This document discusses some of the technical details of mod_rewrite +and URL matching.

+
+ +
top
+
+

Internal Processing

+ +

The internal processing of this module is very complex but + needs to be explained once even to the average user to avoid + common mistakes and to let you exploit its full + functionality.

+
top
+
+

API Phases

+ +

First you have to understand that when Apache processes an HTTP request it does this in phases. A hook for each of these phases is provided by the Apache API. mod_rewrite uses two of these hooks: the URL-to-filename translation hook, which is used after the HTTP request has been read but before any authorization starts, and the Fixup hook, which is triggered after the authorization phases and after the per-directory config files (.htaccess) have been read, but before the content handler is activated.

+ +

So, after a request comes in and Apache has determined the corresponding server (or virtual server), the rewriting engine starts processing all mod_rewrite directives from the per-server configuration in the URL-to-filename phase. A few steps later, when the final data directories are found, the per-directory configuration directives of mod_rewrite are triggered in the Fixup phase. In both situations mod_rewrite rewrites URLs either to new URLs or to filenames, although there is no obvious distinction between them. This is a use of the API that was not intended when the API was designed, but as of Apache 1.x this is the only way mod_rewrite can operate. To make this point clearer, remember the following two points:

+ +
    +
  1. Although mod_rewrite rewrites URLs to URLs, URLs to filenames and even filenames to filenames, the API currently provides only a URL-to-filename hook. In Apache 2.0 the two missing hooks will be added to make the processing clearer. But this point has no drawbacks for the user; it is just a fact which should be remembered: Apache does more in the URL-to-filename hook than the API intends for it.
  2. + +
  3. Unbelievably, mod_rewrite provides URL manipulations in per-directory context, i.e., within .htaccess files, although these are reached a very long time after the URLs have been translated to filenames. It has to be this way because .htaccess files live in the filesystem, so processing has already reached this stage. In other words: according to the API phases, at this point it is too late for any URL manipulations. To overcome this chicken-and-egg problem mod_rewrite uses a trick: when you manipulate a URL/filename in per-directory context, mod_rewrite first rewrites the filename back to its corresponding URL (which is usually impossible, but see the RewriteBase directive below for the trick used to achieve this) and then initiates a new internal sub-request with the new URL. This restarts processing of the API phases.

    Again mod_rewrite tries hard to make this complicated + step totally transparent to the user, but you should + remember here: While URL manipulations in per-server + context are really fast and efficient, per-directory + rewrites are slow and inefficient due to this chicken and + egg problem. But on the other hand this is the only way + mod_rewrite can provide (locally restricted) URL + manipulations to the average user.

    +
  4. +
+ +

Don't forget these two points!

+
top
+
+

Ruleset Processing

+ +

Now when mod_rewrite is triggered in these two API phases, it + reads the configured rulesets from its configuration + structure (which itself was either created on startup for + per-server context or during the directory walk of the Apache + kernel for per-directory context). Then the URL rewriting + engine is started with the contained ruleset (one or more + rules together with their conditions). The operation of the + URL rewriting engine itself is exactly the same for both + configuration contexts. Only the final result processing is + different.

+ +

The order of rules in the ruleset is important because the + rewriting engine processes them in a special (and not very + obvious) order. The rule is this: The rewriting engine loops + through the ruleset rule by rule (RewriteRule directives) and + when a particular rule matches it optionally loops through + existing corresponding conditions (RewriteCond + directives). For historical reasons the conditions are given + first, and so the control flow is a little bit long-winded. See + Figure 1 for more details.

+

[Figure: needs graphics capability to display]
  Figure 1: The control flow through the rewriting ruleset

+

As you can see, the URL is first matched against the
  Pattern of each rule. If the match fails, mod_rewrite
  immediately stops processing that rule and continues with the
  next one. If the Pattern matches, mod_rewrite looks
  for corresponding rule conditions. If none are present, it
  simply substitutes the URL with a new value constructed from
  the string Substitution and goes on with its
  rule-looping. But if conditions exist, it starts an inner
  loop, processing them in the order they are listed. For
  conditions the logic is different: a pattern is not matched
  against the current URL. Instead, a string
  TestString is first constructed by expanding
  variables, back-references, map lookups, etc., and then
  CondPattern is matched against it. If the pattern
  does not match, the complete set of conditions, and thus the
  corresponding rule, fails. If the pattern matches, the next
  condition is processed, until no more conditions are
  available. If all conditions match, processing continues
  with the substitution of the URL with
  Substitution.
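For example, in the following hypothetical ruleset (all names
invented for illustration) the two RewriteCond directives are
written before the rule, but they are evaluated, top to bottom,
only after the RewriteRule pattern has matched; both must
succeed before the substitution is applied:

    RewriteEngine on
    # Evaluated second, and only if the pattern below matched:
    RewriteCond %{REQUEST_METHOD} =POST
    # Evaluated third; both conditions must match (implicit AND):
    RewriteCond %{REMOTE_ADDR}    ^10\.
    # The pattern here is tried first:
    RewriteRule ^/upload/(.*)$    /internal/upload/$1 [L]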

+ +
+
+

Available Languages:  en 

+
+ \ No newline at end of file diff --git a/docs/manual/rewrite/rewrite_tech.xml b/docs/manual/rewrite/rewrite_tech.xml new file mode 100644 index 00000000000..f5b1d6a82be --- /dev/null +++ b/docs/manual/rewrite/rewrite_tech.xml @@ -0,0 +1,169 @@ + + + + + + + + + + + Apache mod_rewrite Technical Details + + +

This document discusses some of the technical details of
mod_rewrite and URL matching.

+
+Module +documentation +mod_rewrite +introduction +Practical solutions to common +problems + +
Internal Processing + +

The internal processing of this module is very complex, but it
  needs to be explained once, even to the average user, to
  avoid common mistakes and to let you exploit its full
  functionality.

+
+ +
API Phases + +

First you have to understand that when Apache processes an
  HTTP request, it does so in phases. The Apache API provides a
  hook for each of these phases. mod_rewrite uses two of
  these hooks: the URL-to-filename translation hook, which is
  used after the HTTP request has been read but before any
  authorization starts, and the Fixup hook, which is triggered
  after the authorization phases and after the per-directory
  configuration files (.htaccess) have been read, but
  before the content handler is activated.
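As a rough sketch of where the two hooks come into play (all
paths and names here are hypothetical): directives in the
server-wide configuration run in the URL-to-filename
translation hook, while directives in a per-directory file run
in the Fixup hook, long after translation:

    # httpd.conf (server or virtual-host scope):
    # applied in the URL-to-filename translation phase,
    # before authorization runs
    RewriteEngine on
    RewriteRule ^/old-area/(.*)$ /new-area/$1 [R]

    # /var/www/site/dir/.htaccess:
    # applied in the Fixup phase, after authorization and
    # after this file has been read
    RewriteEngine on
    RewriteBase /dir
    RewriteRule ^old\.html$ new.html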

+ +

So, after a request comes in and Apache has determined the
  corresponding server (or virtual server), the rewriting engine
  starts processing all mod_rewrite directives from the
  per-server configuration in the URL-to-filename phase. A few
  steps later, when the final data directories are found, the
  per-directory configuration directives of mod_rewrite are
  triggered in the Fixup phase. In both situations mod_rewrite
  rewrites URLs, either to new URLs or to filenames, although
  there is no obvious distinction between the two. This is a use
  of the API that was never intended when the API was designed,
  but as of Apache 1.x it is the only way mod_rewrite can
  operate. To make this clearer, remember the following two
  points:

+ +
    +
  1. Although mod_rewrite rewrites URLs to URLs, URLs to
     filenames, and even filenames to filenames, the API
     currently provides only a URL-to-filename hook. In Apache
     2.0 the two missing hooks will be added to make the
     processing clearer. This point has no drawbacks for the
     user; it is just a fact to remember: Apache does more in
     the URL-to-filename hook than the API intends for it.

  2. Unbelievably, mod_rewrite provides URL manipulations in
     per-directory context, i.e., within .htaccess files,
     although these are reached long after the URLs have been
     translated to filenames. It has to be this way, because
     .htaccess files live in the filesystem, so processing has
     already reached that stage. In other words: according to
     the API phases, it is too late at this point for any URL
     manipulations. To overcome this chicken-and-egg problem,
     mod_rewrite uses a trick: when you manipulate a
     URL/filename in per-directory context, mod_rewrite first
     rewrites the filename back to its corresponding URL (which
     is usually impossible, but see the RewriteBase directive
     for the trick used to achieve this) and then initiates a
     new internal sub-request with the new URL. This restarts
     processing of the API phases (a sketch of this trick
     follows this list).

     Again, mod_rewrite tries hard to make this complicated
     step totally transparent to the user. But remember: while
     URL manipulations in per-server context are really fast
     and efficient, per-directory rewrites are slow and
     inefficient, due to this chicken-and-egg problem. On the
     other hand, this is the only way mod_rewrite can provide
     (locally restricted) URL manipulations to the average
     user.

Don't forget these two points!
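A minimal sketch of the trick described in the second point
(the directory layout and names are hypothetical): in
per-directory context the rule sees a path stripped of the
local directory prefix, and RewriteBase supplies the URL prefix
used to turn the rewritten result back into a URL for the
internal sub-request:

    # /var/www/example/subdir/.htaccess -- hypothetical layout,
    # where /var/www/example is the document root
    RewriteEngine on
    # Tell mod_rewrite which URL prefix corresponds to this
    # directory, so the result can be mapped back to a URL:
    RewriteBase /subdir
    # The pattern matches the path relative to this directory:
    RewriteRule ^oldpage\.html$ newpage.html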

+
+ +
Ruleset Processing + +

Now, when mod_rewrite is triggered in these two API phases, it
  reads the configured rulesets from its configuration
  structure (which was either created on startup, for
  per-server context, or during the directory walk of the
  Apache kernel, for per-directory context). The URL rewriting
  engine is then started with the contained ruleset (one or
  more rules together with their conditions). The operation of
  the URL rewriting engine itself is exactly the same in both
  configuration contexts; only the final result processing
  differs.

+ +

The order of rules in the ruleset is important, because the
  rewriting engine processes them in a special (and not very
  obvious) order. The rule is this: the rewriting engine loops
  through the ruleset rule by rule (RewriteRule directives),
  and when a particular rule matches, it optionally loops
  through the corresponding conditions (RewriteCond
  directives). For historical reasons the conditions are given
  first, so the control flow is a little bit long-winded. See
  Figure 1 for details.

+

[Figure: needs graphics capability to display]
  Figure 1: The control flow through the rewriting ruleset

+

As you can see, the URL is first matched against the
  Pattern of each rule. If the match fails, mod_rewrite
  immediately stops processing that rule and continues with the
  next one. If the Pattern matches, mod_rewrite looks
  for corresponding rule conditions. If none are present, it
  simply substitutes the URL with a new value constructed from
  the string Substitution and goes on with its
  rule-looping. But if conditions exist, it starts an inner
  loop, processing them in the order they are listed. For
  conditions the logic is different: a pattern is not matched
  against the current URL. Instead, a string
  TestString is first constructed by expanding
  variables, back-references, map lookups, etc., and then
  CondPattern is matched against it. If the pattern
  does not match, the complete set of conditions, and thus the
  corresponding rule, fails. If the pattern matches, the next
  condition is processed, until no more conditions are
  available. If all conditions match, processing continues
  with the substitution of the URL with
  Substitution.
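To make the TestString expansion concrete, consider this
hypothetical canonical-hostname ruleset (hostname and flags
invented for illustration): the TestString %{HTTP_HOST} is
expanded first, the CondPattern is then matched against it, and
the condition's capture is carried into the substitution as %1:

    RewriteEngine on
    # Expand %{HTTP_HOST} into the TestString, then match the
    # CondPattern against it; the capture becomes %1 below.
    RewriteCond %{HTTP_HOST} ^www\.(.+)$
    # $1 is the back-reference from the rule's own pattern
    # (in per-server context it keeps its leading slash):
    RewriteRule ^(.*)$ http://%1$1 [R=301,L]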

+ +
+ + +
+ diff --git a/docs/manual/rewrite/rewrite_tech.xml.meta b/docs/manual/rewrite/rewrite_tech.xml.meta new file mode 100644 index 00000000000..afe8c36e2b7 --- /dev/null +++ b/docs/manual/rewrite/rewrite_tech.xml.meta @@ -0,0 +1,11 @@ + + + + rewrite_tech + /rewrite/ + .. + + + en + +