From: Chris Pepper
- [PT] flag when
- additionally using
+ [PT] flag if
+ using
- .htaccess context instead
+ to work in .htaccess context instead
  of per-server context. Always try to understand what a
- particular ruleset really does before you use it. This
+ particular ruleset really does before you use it; this
  avoids many problems.

  We want to create a homogeneous and consistent URL
- layout over all WWW servers on a Intranet webcluster, i.e.
- all URLs (per definition server local and thus server
- dependent!) become actually server independent!
- What we want is to give the WWW namespace a consistent
- server-independent layout: no URL should have to include
- any physically correct target server. The cluster itself
- should drive us automatically to the physical target
- host.
+ layout across all WWW servers on an Intranet web cluster, i.e.,
+ all URLs (by definition server-local and thus
+ server-dependent!) become server independent!
+ What we want is to give the WWW namespace a single consistent
+ layout: no URL should refer to
+ any particular target server. The cluster itself
+ should connect users automatically to a physical target
+ host as needed, invisibly.

-First, the knowledge of the target servers come from
- (distributed) external maps which contain information
- where our users, groups and entities stay. The have the
- form
+First, the knowledge of the target servers comes from
+ (distributed) external maps which contain information on
+ where our users, groups, and entities reside. They have the
+ form:
 user1  server_of_user1
@@ -87,7 +87,7 @@ user2  server_of_user2
  We put them into files map.xxx-to-host. Second we need to
  instruct all servers to redirect URLs
- of the forms
+ of the forms:
  /u/user/anypath
@@ -103,8 +103,8 @@ http://physical-host/g/group/anypath
  http://physical-host/e/entity/anypath
- when the URL is not locally valid to a server. The
- following ruleset does this for us by the help of the map
+ when any URL path need not be valid on every server. The
+ following ruleset does this for us with the help of the map
  files (assuming that server0 is a default server which
  will be used if a user has no entry in the map):
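The ruleset itself falls outside the hunks shown here. As a rough sketch of the decision it has to make, the following Python emulates the lookup: it parses `key server` lines like the map entries above, matches the /u/, /g/ and /e/ prefixes, and falls back to server0 for unknown names. The function names, the single combined pattern, and holding the maps in a dict are illustrative assumptions, not mod_rewrite's interface.

```python
import re

# Matches /u/user/anypath, /g/group/anypath and /e/entity/anypath.
# One combined pattern is an illustrative simplification (assumption).
LAYOUT = re.compile(r"^/([uge])/([^/]+)(/.*)?$")


def parse_map(text):
    """Parse 'key server' lines (the map file format above) into a dict.

    Blank lines, '#' comments and lines with fewer than two fields are skipped.
    """
    entries = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2 or parts[0].startswith("#"):
            continue
        entries[parts[0]] = parts[1]
    return entries


def physical_url(path, maps, default="server0"):
    """Return the redirect target for a cluster-wide path, or None.

    maps is keyed by 'u', 'g' or 'e'; names without an entry go to
    the default server, as the text describes.
    """
    m = LAYOUT.match(path)
    if m is None:
        return None  # not part of the homogeneous layout
    kind, name, rest = m.group(1), m.group(2), m.group(3) or "/"
    host = maps.get(kind, {}).get(name, default)
    return f"http://{host}/{kind}/{name}{rest}"


if __name__ == "__main__":
    maps = {"u": parse_map("user1 server_of_user1\nuser2 server_of_user2\n")}
    print(physical_url("/u/user1/anypath", maps))
    # http://server_of_user1/u/user1/anypath
    print(physical_url("/g/group/anypath", maps))
    # http://server0/g/group/anypath
```

Keeping the default in the lookup (rather than in the map files) mirrors the text's assumption that server0 handles anyone without an entry.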
@@ -135,9 +135,9 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\
- Some sites with thousands of users usually use a
- structured homedir layout, i.e. each homedir is in a
- subdirectory which begins for instance with the first
+ Some sites with thousands of users use a
+ structured homedir layout, i.e. each homedir is in a
+ subdirectory which begins (for instance) with the first
  character of the username. So, /~foo/anypath
  is /home/f/foo/.www/anypath while /~bar/anypath
  is
@@ -148,7 +148,7 @@ RewriteRule ^/([uge])/([^/]+)/([^.]+.+) /$1/$2/.www/$3\
  We use the following ruleset to expand the tilde URLs
- into exactly the above layout.
+ into the above layout.

 RewriteEngine on
@@ -174,7 +174,7 @@ RewriteRule ^/~(([a-z])[a-z0-9]+)(.*) /home/$2
  net.sw is my archive of freely available Unix
  software packages, which I started to collect in 1992. It is both my hobby
- and job to to this, because while I'm studying computer
+ and job to do this, because while I'm studying computer
  science I have also worked for many years as a system
  and network administrator in my spare time. Every week I
  need some sort of software so I created a deep hierarchy of
@@ -203,11 +203,11 @@ drwxrwxr-x 10 netsw users 512 Jul 9 14:08 X11/
  the world via a nice Web interface. "Nice" means that I
  wanted to offer an interface where you can browse
  directly through the archive hierarchy. And "nice" means
- that I didn't wanted to change anything inside this
+ that I didn't want to change anything inside this
  hierarchy - not even by putting some CGI scripts at the
- top of it. Why? Because the above structure should be
- later accessible via FTP as well, and I didn't want any
- Web or CGI stuff to be there.
+ top of it. Why? Because the above structure should later be
+ accessible via FTP as well, and I didn't want any
+ Web or CGI stuff mixed in there.
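Returning to the structured homedir rule: its match pattern survives in the hunk header above, but the substitution is cut off there. The Python sketch below reuses that pattern and builds the target from the layout the text describes (/~foo/anypath becomes /home/f/foo/.www/anypath). The function name is hypothetical. Note that the pattern only accepts lowercase usernames of at least two characters.

```python
import re

# Match pattern copied from the RewriteRule in the hunk header:
# group 1 = username, group 2 = its first letter, group 3 = the rest.
TILDE = re.compile(r"^/~(([a-z])[a-z0-9]+)(.*)")


def expand_tilde(url_path):
    """Map /~user/anypath to /home/<first letter>/user/.www/anypath."""
    m = TILDE.match(url_path)
    if m is None:
        return url_path  # not a tilde URL: leave it unchanged
    user, first, rest = m.group(1), m.group(2), m.group(3)
    return f"/home/{first}/{user}/.www{rest}"


if __name__ == "__main__":
    print(expand_tilde("/~foo/anypath"))  # /home/f/foo/.www/anypath
    print(expand_tilde("/~bar/anypath"))  # /home/b/bar/.www/anypath
```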
  The DATA/ subdirectory holds the above
- directory structure, i.e. the real
- net.sw stuff and gets
+ directory structure, i.e. the real
+ net.sw stuff, and gets
  automatically updated via rdist from time to
  time. The second part of the problem remains: how to link
  these two structures together into one smooth-looking URL
@@ -245,7 +245,7 @@ drwxr-xr-x 2 netsw users 512 Jul 8 23:47 netsw-img/
for the various URLs. Here is the solution: first I put
the following into the per-directory configuration file
in the /net.sw/
  to the internal path /e/netsw:
  L (last) flag and no
- substitution field ('-') in the forth part
+ substitution field ('-') in the fourth part
  ! (not) character and
  the C (chain) flag at the first rule
@@ -310,7 +310,7 @@ RewriteRule (.*) netsw-lsdir.cgi/$1
A typical FAQ about URL rewriting is how to redirect
failing requests on webserver A to webserver B. Usually
this is done via
  The first solution has the best performance but less
- flexibility, and is less error safe:
+ flexibility, and is less safe:

 RewriteEngine on
@@ -341,7 +341,7 @@ RewriteRule ^(.+) http://webserverB
  The problem here is that this will only work for pages
  inside the DocumentRoot. While you can add more
  Conditions (for instance to also handle homedirs, etc.)
- there is better variant:
+ there is a better variant:

 RewriteEngine on
@@ -351,12 +351,12 @@ RewriteRule ^(.+) http://webserverB.dom/$1
  This uses the URL look-ahead feature of mod_rewrite.
  The result is that this will work for all types of URLs
- and is a safe way. But it does a performance impact on
- the webserver, because for every request there is one
+ and is safe. But it does have a performance impact on
+ the web server, because for every request there is one
  more internal subrequest. So, if your webserver runs on a
  powerful CPU, use this one. If it is a slow machine, use
- the first approach or better a ErrorDocument CGI-script.
+ the first approach or better an ErrorDocument CGI script.
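To make the trade-off concrete, here is a Python sketch of the check the first solution performs: serve the file if it exists under the local DocumentRoot, otherwise redirect to webserver B. It inherits the limitation the text points out, because only DocumentRoot is consulted. The function name, the 302 status and the http://webserverB.dom base are assumptions for illustration, and a real server would also normalize the path before touching the filesystem.

```python
import os
import tempfile


def redirect_if_missing(docroot, url_path, fallback="http://webserverB.dom"):
    """Return (status, target) for a request path.

    (200, local file) if the file exists under docroot, otherwise
    (302, URL on webserver B). No path normalization is done here.
    """
    local = os.path.join(docroot, url_path.lstrip("/"))
    if os.path.isfile(local):
        return 200, local
    return 302, fallback + url_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as docroot:
        open(os.path.join(docroot, "here.html"), "w").close()
        print(redirect_if_missing(docroot, "/here.html")[0])  # 200
        print(redirect_if_missing(docroot, "/gone.html"))
        # (302, 'http://webserverB.dom/gone.html')
```

The look-ahead variant cannot be reduced to a single filesystem test like this, which is exactly why it costs an extra internal subrequest.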
- First we notice that from version 3.0.0
+ First we notice that as of version 3.0.0,
  "ftp:" scheme on redirects.
  And second, the location approximation can be done by a
@@ -430,9 +430,9 @@ com ftp://ftp.cxan.com/CxAN/
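The hunk header above shows one entry of the map used for location approximation, pairing the top-level domain com with ftp://ftp.cxan.com/CxAN/. The Python sketch below assumes the map is keyed by the last label of the client's hostname and falls back to a default mirror. The de entry, the default, and all function names are hypothetical.

```python
# Map text in the same "key value" format as the entry in the hunk header.
# Only the "com" line comes from the document; "de" is a made-up example.
MIRROR_MAP = """
com  ftp://ftp.cxan.com/CxAN/
de   ftp://ftp.example.de/CxAN/
"""
DEFAULT_MIRROR = "ftp://ftp.cxan.com/CxAN/"  # assumed fallback


def load_map(text):
    """Parse 'key value' lines into a dict, ignoring blank lines."""
    return dict(line.split()[:2] for line in text.splitlines() if line.strip())


def mirror_for(client_host, path, mirrors, default=DEFAULT_MIRROR):
    """Pick a mirror by the client's top-level domain and append the path."""
    tld = client_host.rsplit(".", 1)[-1].lower()
    return mirrors.get(tld, default) + path.lstrip("/")


if __name__ == "__main__":
    mirrors = load_map(MIRROR_MAP)
    print(mirror_for("host.foo.de", "/pkg/x.tar.gz", mirrors))
    # ftp://ftp.example.de/CxAN/pkg/x.tar.gz
    print(mirror_for("192.0.2.7", "/pkg/x.tar.gz", mirrors))
    # ftp://ftp.cxan.com/CxAN/pkg/x.tar.gz
```

A client known only by its numeric address has no meaningful top-level domain, so it falls through to the default, which is one reason such a scheme can only approximate location.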
  At least for important top-level pages it is sometimes
  necessary to provide the optimum of browser dependent
- content, i.e. one has to provide a maximum version for the
- latest Netscape variants, a minimum version for the Lynx
- browsers and a average feature version for all others.
+ content, i.e., one has to provide one version for
+ current browsers, a different version for the Lynx and text-mode
+ browsers, and another for other browsers.

  We cannot use content negotiation because the browsers do
  not provide their type in that form. Instead we have to
- act on the HTTP header "User-Agent". The following condig
+ act on the HTTP header "User-Agent". The following config
  does the following: If the HTTP header "User-Agent"
  begins with "Mozilla/3", the page foo.html
- is rewritten to foo.NS.html and and the
+ is rewritten to foo.NS.html and the
  rewriting stops. If the browser is "Lynx" or "Mozilla" of
- version 1 or 2 the URL becomes foo.20.html.
+ version 1 or 2, the URL becomes foo.20.html.
  All other browsers receive page foo.32.html.
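The selection logic described above can be sketched in Python as an ordered series of prefix checks where the first match wins, much as the [L] flag stops further rewriting. The prefixes follow the prose (Mozilla/3, Lynx, Mozilla versions 1 and 2); the function name is hypothetical and this is an emulation, not the ruleset itself.

```python
def page_for(user_agent):
    """Return the variant of foo.html to serve for a User-Agent string."""
    if user_agent.startswith("Mozilla/3"):
        return "foo.NS.html"
    if user_agent.startswith(("Lynx", "Mozilla/1", "Mozilla/2")):
        return "foo.20.html"
    return "foo.32.html"  # every other browser


if __name__ == "__main__":
    for ua in ("Mozilla/3.01 (X11; I; Linux)", "Lynx/2.8.9rel.1", "Mozilla/5.0"):
        print(ua, "->", page_for(ua))
```

Since practically every current graphical browser sends a User-Agent beginning with Mozilla/5.0, all of them would receive foo.32.html under these rules.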
- This is done by the following ruleset:
 RewriteCond %{HTTP_USER_AGENT} ^Mozilla/3.*
@@ -477,13 +477,13 @@ RewriteRule ^foo\.html$ foo.32.html [L
  the mirror program which actually maintains an explicit
  up-to-date copy of the remote data on the local machine.
  For a webserver we could use the program
- webcopy which acts similar via HTTP. But both
+ webcopy which runs via HTTP. But both
  techniques have one major drawback: The local copy is
- always just as up-to-date as often we run the program. It
+ always just as up-to-date as the last time we ran the program. It
  would be much better if the mirror is not a static one we
  have to establish explicitly. Instead we want a dynamic
  mirror with data which gets updated automatically when
- there is need (updated data on the remote host).
+ there is need (updated on the remote host).
  The simplest method for load-balancing is to use
  the DNS round-robin feature of BIND.
  Here you just configure www[0-9].foo.com
- as usual in your DNS with A(address) records, e.g.

 www0   IN  A       1.2.3.1
@@ -621,29 +621,25 @@ www5   IN  A       1.2.3.6
  Then you additionally add the following entry:

-www   IN  CNAME   www0.foo.com.
-      IN  CNAME   www1.foo.com.
-      IN  CNAME   www2.foo.com.
-      IN  CNAME   www3.foo.com.
-      IN  CNAME   www4.foo.com.
-      IN  CNAME   www5.foo.com.
-      IN  CNAME   www6.foo.com.
+www   IN  A       1.2.3.1
+www   IN  A       1.2.3.2
+www   IN  A       1.2.3.3
+www   IN  A       1.2.3.4
+www   IN  A       1.2.3.5

-Notice that this seems wrong, but is actually an
- intended feature of BIND and can be used
- in this way. However, now when www.foo.com gets
- resolved, BIND gives out www0-www6
+Now when www.foo.com gets
+ resolved, BIND gives out www0-www5
@@ -674,7 +670,7 @@ www   IN  CNAME   www0.foo.com.
  - but in a slightly permutated/rotated order every time.
  This way the clients are spread over the various
- servers. But notice that this not a perfect load
- balancing scheme, because DNS resolve information
+ servers. But notice that this is not a perfect load
+ balancing scheme, because DNS resolution information
  gets cached by the other nameservers on the net, so
  once a client has resolved www.foo.com
- to a particular wwwN.foo.com, all
+ to a particular wwwN.foo.com, all
  its subsequent requests also go to this particular name
  wwwN.foo.com. But the final result is
- ok, because the total sum of the requests are really
+ okay, because the requests are collectively
  spread over the various webservers.

  entry in the DNS. Then we convert
  www0.foo.com to a proxy-only server,
- i.e. we configure this machine so all arriving URLs
+ i.e., we configure this machine so all arriving URLs
  are just pushed through the internal proxy to one of
  the 5 other servers (www1-www5). To accomplish this we
  first establish a ruleset which
@@ -753,7 +749,7 @@ while (<STDIN>) {
  let us configure a new file type with extension .scgi
  (for secure CGI) which will be processed by the popular
  cgiwrap program. The problem
- here is that for instance we use a Homogeneous URL Layout
+ here is that for instance if we use a Homogeneous URL Layout
  (see above) a file inside the user homedirs has the URL
  /u/user/foo/bar.scgi. But cgiwrap
  needs the URL in the form
@@ -767,12 +763,12 @@ RewriteRule ^/[uge]/([^/]+)/\.www/(.+)\.scgi(.*) ...
  Or assume we have some more nifty programs:
@@ -829,7 +825,7 @@ HREF="*"
  wwwlog
  (which displays the
- access.log for a URL subtree and
+ access.log for a URL subtree) and
  wwwidx
  (which runs Glimpse on a URL subtree). We have to
  provide the URL area to these programs so they know on
  which area they have to act on.
- But usually this ugly, because they are all the times
- still requested from that areas, i.e. typically we would
+ But usually this is ugly, because they are all still
+ requested from within those areas, i.e., typically we would
  run the swwidx program from within /u/user/foo/
  via hyperlink to
  Here comes a really esoteric feature: Dynamically
- generated but statically served pages, i.e. pages should be
+ generated but statically served pages, i.e., pages should be
  delivered as pure static pages (read from the filesystem
  and just passed through), but they have to be generated
  dynamically by the webserver if missing. This way you can
@@ -1093,7 +1089,7 @@ RewriteCond ${lowercase:%{HTTP_HOST}|NONE} ^(.+)$
 RewriteCond ${vhost:%1} ^(/.*)$
 #
 # 5. finally we can map the URL to its docroot location
-# and remember the virtual host for logging puposes
+# and remember the virtual host for logging purposes
 RewriteRule ^/(.*)$ %1/$1 [E=VHOST:${lowercase:%{HTTP_HOST}}]
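The three directives above chain lookups: lowercase the Host header, look the name up in a vhost map, then prefix the URL path with the resulting document root while recording the host in the VHOST environment variable. The Python sketch below emulates that chain, assuming lowercase behaves like a lowercasing function and vhost maps hostnames to absolute document roots, as the comments suggest. The map definitions are not part of this excerpt, and the function name and sample entry are hypothetical.

```python
import re


def map_to_docroot(http_host, url_path, vhost_map):
    """Return (filesystem path, env vars) or None if the rules do not apply."""
    host = (http_host or "").lower()      # ${lowercase:%{HTTP_HOST}}
    if not host:
        return None                       # no usable Host header
    docroot = vhost_map.get(host, "")     # ${vhost:%1}
    if not docroot.startswith("/"):
        return None                       # ^(/.*)$ needs an absolute path
    m = re.match(r"^/(.*)$", url_path)    # RewriteRule ^/(.*)$
    if m is None:
        return None
    # %1 is the docroot captured by the last RewriteCond, $1 the URL rest.
    return f"{docroot}/{m.group(1)}", {"VHOST": host}


if __name__ == "__main__":
    vhosts = {"www.foo.com": "/www/foo"}
    print(map_to_docroot("WWW.Foo.COM", "/index.html", vhosts))
    # ('/www/foo/index.html', {'VHOST': 'www.foo.com'})
```

Lowercasing before the lookup means a single map entry serves every capitalization of the hostname a browser might send.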