- The -configdir parameter in awstats_updateall.pl is now working correctly.
- Fix failing call to ipv6 plugin.
- Problem with some regex values used in the new REGEX fields added in 5.6.
+- Better support for WebStar log files.
- Count for add to favourites is done on hits to favicon.ico for IE only. This
avoids counting wrong "Add" done by browsers that hit the file even when no
add is done. Value reported is the (count for IE) / (ratio of IE among all
New features/improvements:
- Added 'rawlog' plugin to add a form to show raw log content with filter
capabilities.
+- Added a dynamic exclude filter on CGI full list report pages.
- Added maillogconvert.pl for analyzing mail log files (better support
for sendmail, postfix and qmail log files).
- Added -addfilenum option in logresolvemerge.pl
<br># Note: Use space between each value.
<br># Note: You can use regular expression values writing value with REGEX[value].
<br># Change : Effective for new updates only
-<br># Example: "127.0.0.1 REGEX[^192\.168\.] REGEX[^10\.0\.]"
+<br># Example: "127.0.0.1 REGEX[^192\.168\.] REGEX[^10\.0\.0\.]"
<br># Example: "localhost REGEX[^.*\.localdomain$]"
<br># Default: ""
<br>#
<br># Use SkipFiles to ignore access to URLs that match one of following entries.
<br># You can enter a list of not important URLs (like framed menus, hidden pages,
<br># etc...) to exclude them from statistics. You must enter here exact relative
-<br># URL as found in log file or some REGEX values.
-<br># For example, to ignore /badpage.html, just add "/badpage.html", to ignore
+<br># URL as found in log file, or a matching REGEX value.
+<br># For example, to ignore /badpage.html, just add "/badpage.html". To ignore
<br># all pages in a particular directory, add "REGEX[^\/directorytoexclude]".
<br># The opposite parameter of "SkipFiles" is "OnlyFiles".
-<br># Note: This parameter is not case sensitive.
-<br># Note: Use space between each value and do not remove default values.
+<br># Note: Use space between each value. This parameter is not case sensitive.
<br># Note: You can use regular expression values writing value with REGEX[value].
<br># Change : Effective for new updates only
<br># Example: "/badpage.html REGEX[^\/excludedirectory]"
<br># If DNS lookup is already done in your log file, you must enter here hostname
<br># criteria, else enter ip address criteria.
<br># The opposite parameter of "OnlyHosts" is "SkipHosts".
-<br># Note: This parameter is not case sensitive.
-<br># Note: Use space between each value.
+<br># Note: Use space between each value. This parameter is not case sensitive.
<br># Note: You can use regular expression values writing value with REGEX[value].
<br># Change : Effective for new updates only
-<br># Example: "127.0.0.1 REGEX[^192\.168\.] REGEX[^10\.0\.]"
+<br># Example: "127.0.0.1 REGEX[^192\.168\.] REGEX[^10\.0\.0\.]"
<br># Default: ""
<br>#
<br>OnlyHosts=""
<br># match a particular string, like a particular directory, you can add this
<br># directory name in this parameter.
<br># The opposite parameter of "OnlyFiles" is "SkipFiles".
-<br># Note: This parameter is not case sensitive.
-<br># Note: Use space between each value and do not remove default values
+<br># Note: Use space between each value. This parameter is not case sensitive.
<br># Note: You can use regular expression values writing value with REGEX[value].
<br># Change : Effective for new updates only
<br># Example: "REGEX[marketing_directory] REGEX[office\/.*\.(csv|sxw)$]"
<br># images extensions as they are hit downloaded that must be counted but they
<br># are not viewed pages. URLs with such extensions are not included in the TOP
<br># Pages/URL report.
-<br># Note: If you want to exclude your own URLs from stats (No Pages and no Hits
-<br># reported), you should use SkipFiles parameter instead.
-<br># Example: ""
+<br># Note: If you want to exclude particular URLs from stats (No Pages and no
+<br># Hits reported), you must use SkipFiles parameter.
<br># Example: "css js class gif jpg jpeg png bmp zip arj gz z wav mp3 wma mpg"
+<br># Example: ""
<br># Default: "css js class gif jpg jpeg png bmp"
<br>#
<br>NotPageList="css js class gif jpg jpeg png bmp"
<br>\r
* Step 1-6<br>\r
Edit this new config file with your own setup :<br>\r
-- Change <a href="awstats_config.html#LogType">LogType</a> value with "W" for analyzing\r
-web server log files, "M" for mail log files, "F" for ftp log files, "O" otherwise.<br>\r
- Change <a href="awstats_config.html#LogFile">LogFile</a> value with full path of your web server log file (You\r
can also use a relative path from your awstats.pl directory).<br>\r
+- Change <a href="awstats_config.html#LogType">LogType</a> value with "W" for analyzing\r
+web server log files, "M" for mail log files, "F" for ftp log files, "O" otherwise.<br>\r
- Check if <a href="awstats_config.html#LogFormat">LogFormat</a> has the value "1" (it means "NCSA apache combined/ELF/XLF log format").<br>\r
- Change <a href="awstats_config.html#DirIcons">DirIcons</a> parameter to reflect relative path of icon directory.<br>\r
- Edit <a href="awstats_config.html#SiteDomain">SiteDomain</a> parameter with the main domain name or the intranet \r
<br>\r
* Step 1-5<br>\r
Edit this new config file with your own setup :<br>\r
-- Change <a href="awstats_config.html#LogType">LogType</a> value with "W" for analyzing\r
-web server log files, "M" for mail log files, "F" for ftp log files, "O" otherwise.<br>\r
- Change <a href="awstats_config.html#LogFile">LogFile</a> value with full path of your web server log file (You\r
can also use a relative path from your awstats.pl directory).<br>\r
+- Change <a href="awstats_config.html#LogType">LogType</a> value with "W" for analyzing\r
+web server log files, "M" for mail log files, "F" for ftp log files, "O" otherwise.<br>\r
- Change <a href="awstats_config.html#LogFormat">LogFormat</a> to value "2" (it means "IIS Extended W3C log format").<br>\r
- Change <a href="awstats_config.html#DirIcons">DirIcons</a> parameter to reflect relative path of icon directory.<br>\r
- Edit <a href="awstats_config.html#SiteDomain">SiteDomain</a> parameter with the main domain name or the intranet\r
section <a href="#READ">Read Statistics</a>), you should run an update process from a scheduler (the command is the same as for the\r
first process) frequently.<br>\r
<br>\r
-You can add instructions in your <b>crontab</b> (Unix/Linux) or your <b>task scheduler</b> (for\r
-Windows), to launch frequently this Awstats update process.<br>\r
-For sites with:<br>\r
-- 10,000 visitors a month Launch AWStats once a day<br>\r
-- 50,000 visitors a month Launch AWStats once every 4 hours<br>\r
-- 250,000 visitors a month Launch AWStats once an hour<br>\r
-- 1,000,000 visitors a month Launch AWStats once an hour<br>\r
-This is ABSOLUTELY necessary to keep good performances.<br>\r
-See AWStats <a href="awstats_benchmark.html">Benchmark page</a> for more accurate information.<br>\r
-<br>\r
-!!! Warning, if you don't use (or can't use with IIS) the <a href="awstats_config.html#PurgeLogFile">PurgeLogFile</a> parameter,\r
-it's very important that you don't forget to purge/rotate your log file yourself (or setup your web server to do it)\r
-frequently (You can find help for this on <a href="awstats_faq.html#CRONTAB">FAQ-SET550</a>).\r
-Even if AWStats never analyzes twice the same log record, the more often you clean your log file, the\r
-faster AWStats will be.<br>\r
+You have two choices:<br>\r
+- Include the update in your <b>logrotate</b> process. See <a href="awstats_faq.html#ROTATE">FAQ-COM120</a> for this.<br>\r
+- Or add instructions in your <b>crontab</b> (Unix/Linux) or your <b>task scheduler</b> (for\r
+Windows), to launch this AWStats update process frequently. See <a href="awstats_faq.html#CRONTAB">FAQ-COM130</a> for this.<br><br>\r
+See AWStats <a href="awstats_benchmark.html">Benchmark page</a> for recommended update/logrotate frequency.<br>\r
<br>\r
\r
<br>\r
\r
<br>\r
<br><a name="awstats_updateall"><H2 style="font: 22px arial,helvetica,sanserif color: #606060"><u>awstats_updateall.pl</u></H2></a>\r
-<br>awstats_updateall launches update process for all AWStats config files found in\r
-<br>a particular directory, so you can easily setup a cron/scheduler job.\r
-<br>This directory is by default /etc/awstats.\r
+<br>awstats_updateall launches update process for all AWStats config files (except\r
+<br>awstats.model.conf) found in a particular directory, so you can easily setup a\r
+<br>cron/scheduler job. The scanned directory is by default /etc/awstats.\r
<br>\r
<br>Usage: awstats_updateall.pl now [options]\r
<br>\r
<br>Where options are:\r
<br> -awstatsprog=pathtoawstatspl\r
-<br> -confdir=confdirtoscan\r
+<br> -configdir=confdirtoscan\r
\r
<br>\r
<br>\r
<br> and awstatsbuildstaticpages_options can be\r
<br> -awstatsprog=pathtoawstatspl gives AWStats software (awstats.pl) path\r
<br> -dir=outputdir to set output directory for generated pages\r
-<br> -date used to add build date in built pages file name\r
+<br> -date Used to add build date in built pages file name\r
+<br> -staticlinksext=xxx For pages with .xxx extension instead of .html\r
+<br> -buildpdf[=pathtohtmldoc] Build a PDF file after building HTML pages.\r
+<br> Output directory must contain icon directory\r
+<br> when this option is used (need 'htmldoc').\r
<br>\r
<br>New versions and FAQ at http://awstats.sourceforge.net\r
<br>\r
<br> If this file is an AWStats history file then urlaliasbuilder will use the\r
<br> SIDER section of this file as its input URL's list.\r
<br> -urlaliasfile=Output urlalias file to build\r
-<br> -overwrite\r
-<br> -secure\r
+<br> -overwrite Overwrite output file if exists\r
+<br> -secure Use https protocol\r
<br>\r
<br>Example: urlaliasbuilder.pl -site=www.someotherhost.com\r
<br>\r
<br> logresolvemerge.pl [options] file1 ... filen\r
<br> logresolvemerge.pl [options] *.*\r
<br>Options:\r
-<br> -dnslookup make a reverse DNS lookup on IP adresses (not done by default)\r
-<br> -showsteps to add benchmark informations every 5000 lines processed\r
+<br> -dnslookup make a reverse DNS lookup on IP addresses\r
+<br> -dnscache=file make DNS lookup from cache file first before network lookup\r
+<br> -showsteps print on stderr benchmark information every 8192 lines\r
+<br> -addfilenum if used with several files, file number can be added in first\r
+<br> field of output file.\r
<br>\r
<br>This runs logresolvemerge in command line to open one or several web\r
<br>server log files to merge them (sorted on date) and/or to make a reverse\r
<br>(but that is the case in all web server log files).\r
<br>logresolvemerge is particularly useful when you want to merge large log\r
<br>files in a fast process and with a low use of memory getting records in a\r
-<br>chronological order from a pipe (for use by a log analyzer).\r
+<br>chronological order through a pipe (for use by a third tool, like a log analyzer).\r
<br>\r
<br>Now supports/detects:\r
<br> Automatic detection of log format\r
while ($LogFile =~ /%([ymdhwYMDHWNSO]+)-(\d+)/) {
my $timetag=$1;
my $timephase=$2;
- if ($Debug) { debug(" Found a time phase of $timephase hour in log file name",1); }
+ if ($Debug) { debug(" Found a time tag '$timetag' with a phase of '$timephase' hour in log file name",1); }
# Get older time
my ($oldersec,$oldermin,$olderhour,$olderday,$oldermonth,$olderyear,$olderwday,$olderyday) = localtime($starttime-($timephase*3600));
my $olderweekofmonth=int($olderday/7);
if ($oldermin < 10) { $oldermin = "0$oldermin"; }
if ($oldersec < 10) { $oldersec = "0$oldersec"; }
# Replace tag with new value
- if ($timetag =~ /YYYY/i) { $LogFile =~ s/%YYYY-$timephase/$olderyear/ig; next; }
- if ($timetag =~ /YY/i) { $LogFile =~ s/%YY-$timephase/$oldersmallyear/ig; next; }
- if ($timetag =~ /MM/i) { $LogFile =~ s/%MM-$timephase/$oldermonth/ig; next; }
- if ($timetag =~ /MO/i) { $LogFile =~ s/%MO-$timephase/$MonthNumLibEn{$oldermonth}/ig; next; }
- if ($timetag =~ /DD/i) { $LogFile =~ s/%DD-$timephase/$olderday/ig; next; }
- if ($timetag =~ /HH/i) { $LogFile =~ s/%HH-$timephase/$olderhour/ig; next; }
- if ($timetag =~ /NS/i) { $LogFile =~ s/%NS-$timephase/$olderns/ig; next; }
- if ($timetag =~ /WM/) { $LogFile =~ s/%WM-$timephase/$olderweekofmonth/g; next; }
- if ($timetag =~ /Wm/) { my $olderweekofmonth0=$olderweekofmonth-1; $LogFile =~ s/%Wm-$timephase/$olderweekofmonth0/g; next; }
- if ($timetag =~ /WY/) { $LogFile =~ s/%WY-$timephase/$olderweekofyear/g; next; }
- if ($timetag =~ /Wy/) { my $olderweekofyear0=sprintf("%02d",$olderweekofyear-1); $LogFile =~ s/%Wy-$timephase/$olderweekofyear0/g; next; }
- if ($timetag =~ /DW/) { $LogFile =~ s/%DW-$timephase/$olderwday/g; next; }
- if ($timetag =~ /Dw/) { my $olderwday0=$olderwday-1; $LogFile =~ s/%Dw-$timephase/$olderwday0/g; next; }
+ if ($timetag eq 'YYYY') { $LogFile =~ s/%YYYY-$timephase/$olderyear/ig; next; }
+ if ($timetag eq 'YY') { $LogFile =~ s/%YY-$timephase/$oldersmallyear/ig; next; }
+ if ($timetag eq 'MM') { $LogFile =~ s/%MM-$timephase/$oldermonth/ig; next; }
+ if ($timetag eq 'MO') { $LogFile =~ s/%MO-$timephase/$MonthNumLibEn{$oldermonth}/ig; next; }
+ if ($timetag eq 'DD') { $LogFile =~ s/%DD-$timephase/$olderday/ig; next; }
+ if ($timetag eq 'HH') { $LogFile =~ s/%HH-$timephase/$olderhour/ig; next; }
+ if ($timetag eq 'NS') { $LogFile =~ s/%NS-$timephase/$olderns/ig; next; }
+ if ($timetag eq 'WM') { $LogFile =~ s/%WM-$timephase/$olderweekofmonth/g; next; }
+ if ($timetag eq 'Wm') { my $olderweekofmonth0=$olderweekofmonth-1; $LogFile =~ s/%Wm-$timephase/$olderweekofmonth0/g; next; }
+ if ($timetag eq 'WY') { $LogFile =~ s/%WY-$timephase/$olderweekofyear/g; next; }
+ if ($timetag eq 'Wy') { my $olderweekofyear0=sprintf("%02d",$olderweekofyear-1); $LogFile =~ s/%Wy-$timephase/$olderweekofyear0/g; next; }
+ if ($timetag eq 'DW') { $LogFile =~ s/%DW-$timephase/$olderwday/g; next; }
+ if ($timetag eq 'Dw') { my $olderwday0=$olderwday-1; $LogFile =~ s/%Dw-$timephase/$olderwday0/g; next; }
# If unknown tag
error("Unknown tag '\%$timetag' in LogFile parameter.");
}
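The hunk above replaces pattern matches on `$timetag` with `eq` comparisons, so each tag is resolved by its exact name. A minimal standalone sketch of that idea follows. It assumes a hypothetical helper `expand_time_tags` that is not part of awstats.pl, uses a lookup table instead of a chain of `if` tests, and handles only a few of the tags shown in the hunk.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of awstats.pl): expand %TAG-N markers in a
# log file name, where N is the number of hours to go back from $now.
sub expand_time_tags {
    my ($name, $now) = @_;
    while ($name =~ /%([A-Za-z]+)-(\d+)/) {
        my ($tag, $phase) = ($1, $2);
        my (undef, undef, $hour, $day, $mon, $year) = localtime($now - $phase * 3600);
        my %value = (
            YYYY => sprintf('%04d', $year + 1900),
            YY   => sprintf('%02d', ($year + 1900) % 100),
            MM   => sprintf('%02d', $mon + 1),
            DD   => sprintf('%02d', $day),
            HH   => sprintf('%02d', $hour),
        );
        # Exact lookup by tag name: 'YYYY' can never be mistaken for 'YY'.
        die "Unknown tag '%$tag' in log file name\n" unless exists $value{$tag};
        $name =~ s/%\Q$tag\E-$phase/$value{$tag}/g;
    }
    return $name;
}

# Example call: name of the log file written 24 hours ago.
print expand_time_tags('access_%YYYY-24%MM-24%DD-24.log', time), "\n";
```

An unknown tag stops the expansion with an error, as in the loop above, rather than leaving the marker in the file name.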
# Do not load "output plugins" if update only
if ($UpdateStats && ! scalar keys %HTMLOutput && $pluginisfor{$pluginname} !~ /u/) { $PluginsLoaded{'init'}{"$pluginname"}=1; next; }
}
- else { $PluginsLoaded{'init'}{"$pluginname"}=1; } # Unkown plugins always loaded
+ else { $PluginsLoaded{'init'}{"$pluginname"}=1; } # Unknown plugins always loaded
# Load plugin
foreach my $dir (@PossiblePluginsDir) {
my $searchdir=$dir;
}
# If migrate and version < 4.x we need to include BEGIN_UNKNOWNIP into BEGIN_VISITOR for backward compatibility
if ($MigrateStats && $versionnum < 4000) {
- debug("File is version < 4000. We add UNKOWNIP in sections to load",1);
+ debug("File is version < 4000. We add UNKNOWNIP in sections to load",1);
$SectionsToLoad{'unknownip'}=99;
}
if (! scalar %SectionsToLoad) { debug(" Stop reading history file. Got all we need."); last; }
next;
}
- # BEGIN_UNKOWNIP for backward compatibility
+ # BEGIN_UNKNOWNIP for backward compatibility
if ($field[0] eq 'BEGIN_UNKNOWNIP') {
if ($Debug) { debug(" Begin of UNKNOWNIP section"); }
$_=<HISTORY>;
if (length($nompage)>$MaxLengthOfURL) { $nompage=substr($nompage,0,$MaxLengthOfURL)."..."; }
if ($ShowLinksOnUrl) {
my $newkey=CleanFromCSSA($url);
- if ($newkey =~ /^http(s|):/i) { # URL seems to be extracted from a ftp or proxy log file
- print "<A HREF=\"$newkey\" target=\"url\">$nompage</A>";
+ if ($LogType eq 'W') { # Web log file
+ if ($newkey =~ /^http(s|):/i) { # URL seems to be extracted from a proxy log file
+ print "<A HREF=\"$newkey\" target=\"url\">$nompage</A>";
+ }
+ elsif ($newkey =~ /^\//) { # URL seems to be an url extracted from a web or wap server log file
+ $newkey =~ s/^\/$SiteDomain//;
+ # Define urlprot
+ my $urlprot='http';
+ if ($UseHTTPSLinkForUrl && $newkey =~ /^$UseHTTPSLinkForUrl/) { $urlprot='https'; }
+ print "<A HREF=\"$urlprot://$SiteDomain$newkey\" target=\"url\">$nompage</A>";
+ }
+ else {
+ print "$nompage";
+ }
+ }
+ elsif ($LogType eq 'F') { # Ftp log file
+ print "$nompage";
}
- elsif ($newkey =~ /^\//) { # URL seems to be an url extracted from a web or wap server log file
- $newkey =~ s/^\/$SiteDomain//;
- # Define http or https
- my $httplink='http';
- if ($UseHTTPSLinkForUrl && $newkey =~ /^$UseHTTPSLinkForUrl/) { $httplink='https'; }
- print "<A HREF=\"$httplink://$SiteDomain$newkey\" target=\"url\">$nompage</A>";
+ elsif ($LogType eq 'M') { # Smtp log file
+ print "$nompage";
}
- else {
+ else { # Other type log file
print "$nompage";
}
}
@fieldlib=('date','host','logname','method','url','code','size','ua','referer');
}
elsif ($LogFormat eq '3') {
- $PerlParsingFormat="([^\\t]*\\t[^\\t]*)\\t([^\\t]*)\\t([\\d]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t[^\\t]*\\t.*:([^\\t]*)\\t([\\d]*)";
+ $PerlParsingFormat="([^\\t]*\\t[^\\t]*)\\t([^\\t]*)\\t([\\d|-]*)\\t([^\\t]*)\\t([^\\t]*)\\t([^\\t]*)\\t[^\\t]*\\t([^\\t]*)\\t([\\d]*)";\r
$pos_date=0;$pos_method=1;$pos_code=2;$pos_host=3;$pos_agent=4;$pos_referer=5;$pos_url=6;$pos_size=7;
@fieldlib=('date','method','code','host','ua','referer','url','size');
}
# Check protocol (Note: Use of TmpProtocol does not increase speed)
#----------------------------------------------------------------------
my $protocol=0;
- if ($field[$pos_method] eq 'GET' || $field[$pos_method] eq 'POST' || $field[$pos_method] eq 'HEAD' || $field[$pos_method] =~ /OK/i) {
- # HTTP request. Keep only GET, POST, HEAD, *OK* with Webstar but not OPTIONS
+ if ($field[$pos_method] eq 'GET' || $field[$pos_method] eq 'POST' || $field[$pos_method] eq 'HEAD' || $field[$pos_method] =~ /OK/i || $field[$pos_method] =~ /ERR\!/i) {\r
+ # HTTP request. Keep only GET, POST, HEAD, *OK* and ERR! for Webstar. Do not keep OPTIONS\r
$protocol=1;
}
elsif ($field[$pos_method] eq 'SMTP') {
# Mail request ('SMTP' for mail log with maillogconvert.pl preprocessor)
$protocol=3;
}
- elsif ($field[$pos_method] eq 'RETR' || $field[$pos_method] =~ /get/i) {
+ elsif ($field[$pos_method] eq 'RETR' || $field[$pos_method] eq 'o' || $field[$pos_method] =~ /get/i) {
# FTP GET request
$protocol=2;
}
- elsif ($field[$pos_method] eq 'STOR' || $field[$pos_method] =~ /sent/i) {
+ elsif ($field[$pos_method] eq 'STOR' || $field[$pos_method] eq 'i' || $field[$pos_method] =~ /sent/i) {
# FTP SENT request
$protocol=2;
}
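The protocol test in the hunk above can be read as a small classification function. The sketch below restates the visible branches in a hypothetical helper `classify_method` (the name and the return value 0 for unmatched methods are assumptions; awstats.pl sets `$protocol` inline and may have further branches not shown here).

```perl
use strict;
use warnings;

# Hypothetical helper: map the method field of a log record to a protocol
# code, following the conditions visible in the hunk
# (1 = HTTP, 2 = FTP, 3 = mail, 0 = not matched by these branches).
sub classify_method {
    my ($method) = @_;
    # HTTP: GET, POST, HEAD, plus *OK* and ERR! as found in WebStar logs.
    return 1 if $method eq 'GET' || $method eq 'POST' || $method eq 'HEAD'
             || $method =~ /OK/i || $method =~ /ERR!/i;
    # Mail record produced by the maillogconvert.pl preprocessor.
    return 3 if $method eq 'SMTP';
    # FTP download ('o' is the value added by this change).
    return 2 if $method eq 'RETR' || $method eq 'o' || $method =~ /get/i;
    # FTP upload ('i' is the value added by this change).
    return 2 if $method eq 'STOR' || $method eq 'i' || $method =~ /sent/i;
    return 0;
}

printf "%-8s => %d\n", $_, classify_method($_) for qw(GET OPTIONS ERR! SMTP o i);
```

The HTTP test has to come first, because the case-insensitive `/get/i` pattern in the FTP branch would otherwise also match `GET`.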
}
}
+ # Convert URL for Webstar to common URL
+ if ($LogFormat eq '3') {
+ $field[$pos_url]=~s/:/\//g;
+ if ($field[$pos_code] eq '-') { $field[$pos_code]='200'; }
+ }
+
# Here, field array, timerecord and yearmonthdayrecord are initialized for log record
if ($Debug) { debug(" This is a not already processed record ($timerecord)",4); }
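The WebStar conversion added in this hunk does two things for `LogFormat` 3: it turns colons in the URL field into slashes, and it treats a `-` status code as `200`. A minimal sketch of the same normalization, assuming a hypothetical helper `normalize_webstar` and an invented sample URL (awstats.pl applies the change directly to `@field`):

```perl
use strict;
use warnings;

# Hypothetical helper: normalize the URL and status code of one WebStar
# record, mirroring the two statements added in the hunk.
sub normalize_webstar {
    my ($url, $code) = @_;
    $url =~ s/:/\//g;                 # ':docs:index.html' style to '/docs/index.html'
    $code = '200' if $code eq '-';    # missing status code is counted as a success
    return ($url, $code);
}

# Sample values are invented for illustration.
my ($url, $code) = normalize_webstar(':docs:index.html', '-');
print "$url $code\n";
```

A status code other than `-` is returned unchanged, so only records without a code are counted as successful hits.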
&html_end;
}
if ($HTMLOutput{'unknownip'}) {
- print "$Center<a name=\"UNKOWNIP\"> </a><BR>\n";
+ print "$Center<a name=\"UNKNOWNIP\"> </a><BR>\n";
&tab_head("$Message[45]",19);
print "<TR bgcolor=\"#$color_TableBGRowTitle\"><TH>".(scalar keys %_host_h)." $Message[1]</TH>";
&ShowHostInfo('__title__');
&html_end;
}
if ($HTMLOutput{'unknownos'}) {
- print "$Center<a name=\"UNKOWNOS\"> </a><BR>\n";
+ print "$Center<a name=\"UNKNOWNOS\"> </a><BR>\n";
my $title="$Message[46]";
&tab_head("$title",19);
print "<TR bgcolor=\"#$color_TableBGRowTitle\"><TH>User agent (".(scalar keys %_unknownreferer_l).")</TH><TH>$Message[9]</TH></TR>\n";
&html_end;
}
if ($HTMLOutput{'unknownbrowser'}) {
- print "$Center<a name=\"UNKOWNBROWSER\"> </a><BR>\n";
+ print "$Center<a name=\"UNKNOWNBROWSER\"> </a><BR>\n";
my $title="$Message[50]";
&tab_head("$title",19);
print "<TR bgcolor=\"#$color_TableBGRowTitle\"><TH>User agent (".(scalar keys %_unknownrefererbrowser_l).")</TH><TH>$Message[9]</TH></TR>\n";
if ($ShowOriginStats =~ /P/i) { print "<TD>".($_from_p[4]?$_from_p[4]:" ")."</TD><TD>".($_from_p[4]?"$p_p[4] %":" ")."</TD>"; }
if ($ShowOriginStats =~ /H/i) { print "<TD>".($_from_h[4]?$_from_h[4]:" ")."</TD><TD>".($_from_h[4]?"$p_h[4] %":" ")."</TD>"; }
print "</TR>\n";
- #------- Unkown origin
+ #------- Unknown origin
print "<TR><TD CLASS=AWS><b>$Message[39]</b></TD>";
if ($ShowOriginStats =~ /P/i) { print "<TD>".($_from_p[1]?$_from_p[1]:" ")."</TD><TD>".($_from_p[1]?"$p_p[1] %":" ")."</TD>"; }
if ($ShowOriginStats =~ /H/i) { print "<TD>".($_from_h[1]?$_from_h[1]:" ")."</TD><TD>".($_from_h[1]?"$p_h[1] %":" ")."</TD>"; }