If sarg is compiled with cc, the old configure script would add the -Aa
option to the CFLAGS, but that option produces an error with modern
versions of cc. The error is about a missing argument. I could not find
the new syntax nor the purpose of that option, but it was reported that
removing the whole condition solved the problem.
Speed up the second stage of the report generation
A lot of temporary files are produced after the log is split into several
files, each containing the data of one user, but those temporary files
were constantly opened and closed for each line written into them. It
was a small waste of time.
The new processing opens each user's temporary file once per user,
processes the data of that user and generates the temporary file with
those data. The gain is roughly 10% on my data set.
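A minimal sketch of the "open once per user" idea, assuming the lines of one user are already collected in memory. The function name write_user_lines and its signature are hypothetical, not sarg's actual code; it only shows that a single fopen/fclose pair now brackets all the writes for a user.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write every line collected for one user into that
 * user's temporary file. The file is opened once per user instead of once
 * per line. Returns 0 on success, -1 on any I/O error. */
int write_user_lines(const char *path, const char *const *lines, size_t count)
{
    FILE *fp = fopen(path, "w");
    size_t i;

    if (fp == NULL)
        return -1;
    for (i = 0; i < count; i++) {
        size_t len = strlen(lines[i]);
        if (fwrite(lines[i], 1, len, fp) != len || fputc('\n', fp) == EOF) {
            fclose(fp);
            return -1;
        }
    }
    /* fclose() flushes the buffered data, so its result matters too. */
    return fclose(fp) == 0 ? 0 : -1;
}
```

The stdio buffer then absorbs the many small writes, which is where the time was going when the file was reopened for every line.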
Add a double check on the data at the user's level
There already was a check on the global data collected from the access log
to ensure no data was lost.
There is now also a check on the number of bytes downloaded by each user
and on the elapsed time. The detection of a failure is therefore more
accurate and makes it easier to locate the leak.
During the creation of the topusers report, if the report consists
entirely of failed connections without any byte downloaded, the incache
and outcache percentages are both zero and don't sum up to 100%. This is
expected and should not output a warning about corrupted data.
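The rule above can be sketched as a small predicate. The function name and the 0.02 rounding tolerance are assumptions made for illustration; the point is only that a zero byte total short-circuits the 100% check.

```c
#include <stdbool.h>

/* Hypothetical sanity check on a topusers line. Returns true when the
 * in-cache and out-of-cache percentages should trigger a "corrupted data"
 * warning. A user with only failed connections downloaded zero bytes, so
 * both percentages are legitimately 0 and no warning must be issued. */
bool cache_split_is_suspicious(long long total_bytes, double incache_pct,
                               double outcache_pct)
{
    double diff;

    if (total_bytes == 0)
        return false;            /* 0% + 0% is expected here */
    diff = incache_pct + outcache_pct - 100.0;
    if (diff < 0.0)
        diff = -diff;
    return diff > 0.02;          /* assumed tolerance for printed rounding */
}
```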
Report any error while reading the day summary file
Sarg generates a temporary file containing the downloaded bytes and the
elapsed time to produce the user's hourly report. While that temporary
file is read and processed into yet another temporary file, any error
found is reported and the processing is aborted instead of being silently
ignored. It ensures no data is missing from the reports.
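A sketch of "report and abort instead of silently ignoring", assuming a made-up line layout of "<hour> <bytes> <elapsed>" for the day summary file (sarg's real layout may differ). Both a malformed line and a read error reported by ferror() make the function fail.

```c
#include <stdio.h>

/* Hypothetical reader for a day summary file where each line is assumed
 * to be "<hour> <bytes> <elapsed>". Any malformed line or read error
 * aborts the processing instead of being silently skipped.
 * Returns 0 on success, -1 on error. */
int sum_day_file(FILE *fp, long long *total_bytes, long long *total_elapsed)
{
    char line[256];
    unsigned long lineno = 0;
    int hour;
    long long bytes, elapsed;

    *total_bytes = 0;
    *total_elapsed = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        lineno++;
        if (sscanf(line, "%d %lld %lld", &hour, &bytes, &elapsed) != 3 ||
            hour < 0 || hour > 23) {
            fprintf(stderr, "Invalid data on line %lu of the day summary file\n",
                    lineno);
            return -1;
        }
        *total_bytes += bytes;
        *total_elapsed += elapsed;
    }
    if (ferror(fp)) {   /* fgets() returns NULL on EOF and on error alike */
        fprintf(stderr, "Read error in the day summary file\n");
        return -1;
    }
    return 0;
}
```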
The date in the subject line of the e-mail was produced with asctime.
But that function appends a \n at the end of the date. As far as mailx
was concerned, the subject line was therefore followed by a blank line
and that would mess with the header of the e-mail.
This patch renders the subject line using strftime. It avoids the line
break and allows mailx to add additional headers from the command line.
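A minimal sketch of the strftime approach. build_subject and its prefix argument are hypothetical; the format string reproduces the asctime() layout without its trailing newline. %a and %b follow the current locale, so the English names appear only in the default "C" locale.

```c
#include <string.h>
#include <time.h>

/* Hypothetical helper building "<prefix> <date>" for the mail subject.
 * asctime() always ends its result with '\n', which terminated the header
 * early for mailx; strftime() only writes what the format asks for. */
int build_subject(const char *prefix, const struct tm *tm, char *buf, size_t size)
{
    size_t plen = strlen(prefix);

    if (plen + 1 >= size)
        return -1;
    memcpy(buf, prefix, plen);
    buf[plen] = ' ';
    /* Same layout as asctime(), e.g. "Thu Dec 29 15:05:52 2011", minus '\n'. */
    if (strftime(buf + plen + 1, size - plen - 1, "%a %b %e %H:%M:%S %Y", tm) == 0)
        return -1;
    return 0;
}
```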
Print a debug message when purging the temporary directory
The temporary directory used by sarg is purged before the report is
generated and it can take some time if a previous directory was left
from a previous run.
A debug message is printed to tell the user about the delay that's
occurring.
Some URLs are stripped of the scheme while others are not. Yet they were
all printed using the same function that prefixed http:// to the URL
even if the original scheme was still present.
To make the links in the report clickable without error, the caller may
provide the scheme when it requests the printing of the URL in an HTML
link.
The affected reports were the authentication failures, the denied
accesses and the downloaded files.
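A sketch of a link printer where the caller supplies the scheme it stripped. The name format_url_link and the fallback rule (add http:// only when the URL has no "://" of its own) are assumptions; the sketch does not HTML-escape the URL.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper writing an HTML link for a URL into buf.
 * scheme: the scheme the caller stripped from url (e.g. "https://"), or
 * NULL if url was not stripped. In the NULL case, "http://" is added only
 * when url has no scheme of its own, so "ftp://..." is not turned into
 * "http://ftp://...". The link text shows url as it was given.
 * Returns 0 on success, -1 if buf is too small. */
int format_url_link(char *buf, size_t size, const char *url, const char *scheme)
{
    const char *prefix = "";
    int n;

    if (scheme != NULL)
        prefix = scheme;
    else if (strstr(url, "://") == NULL)
        prefix = "http://";
    n = snprintf(buf, size, "<a href=\"%s%s\">%s</a>", prefix, url, url);
    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```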
Frédéric Marchal [Thu, 29 Dec 2011 15:05:52 +0000 (15:05 +0000)]
Support for gd, ldap and iconv can be disabled during configuration
The configuration script supports the three command line switches
--without-gd, --without-ldap and --without-iconv to build sarg without
the corresponding external module.
Frédéric Marchal [Thu, 29 Dec 2011 15:05:34 +0000 (15:05 +0000)]
Don't abort on squidGuard log errors
squidGuard sometimes wraps the URL in the log file and sarg used to abort
the whole report generation. This patch merely issues a warning and keeps
producing the report.
The report contains a warning indicating how many lines were ignored
from the original log file.
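A sketch of "count and skip" instead of "abort". It assumes, for illustration only, that every valid squidGuard line starts with a YYYY-MM-DD date and that a wrapped URL continues on a line without it. The function name is hypothetical, and a line longer than the buffer would be split by fgets and miscounted.

```c
#include <stdio.h>

/* Hypothetical scan of a squidGuard log: a line not starting with a
 * "YYYY-MM-DD" date is assumed to be the tail of a wrapped URL. It is
 * counted and skipped instead of aborting the report.
 * Returns the number of ignored lines; *accepted gets the valid ones. */
long scan_squidguard_log(FILE *fp, long *accepted)
{
    char line[4096];
    int year, month, day;
    long ignored = 0;

    *accepted = 0;
    while (fgets(line, sizeof line, fp) != NULL) {
        if (sscanf(line, "%4d-%2d-%2d", &year, &month, &day) == 3)
            (*accepted)++;   /* a real parser would process the entry here */
        else
            ignored++;
    }
    if (ignored > 0)
        fprintf(stderr, "Warning: %ld line(s) ignored in the squidGuard log\n",
                ignored);
    return ignored;
}
```

The returned count is what the report would then print as the number of ignored lines.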
Frédéric Marchal [Thu, 10 Nov 2011 19:59:17 +0000 (19:59 +0000)]
Add missing \n on several debug messages
Some messages were missing the trailing newline. The consequence was that
the following message was concatenated with the previous one, potentially
letting it go unnoticed.
Some directories are created to store the user's data but, if they end up
not being used, they are deleted along with their content. This saves
disk space.
A nicer fix would be not to create the directories and their content in
the first place but I'll keep that for the next release.
Frédéric Marchal [Sun, 30 Oct 2011 19:04:33 +0000 (19:04 +0000)]
Add links on the Sites & Users page to the user's page
The page listing the visited sites and who visited them contains links
to jump directly to the page of each user, provided the user is on the
top users page.
Frédéric Marchal [Sun, 30 Oct 2011 19:04:12 +0000 (19:04 +0000)]
Use a function to safely copy the strings
Some calls already used strncpy to copy strings but we now use a
function to encapsulate that code, and it is used to copy the arguments
passed to sarg by the user.
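A sketch of such a wrapper. The name safe_copy and its return convention are hypothetical. The key difference from a bare strncpy is that the destination is always NUL-terminated, even when the source is truncated.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical wrapper: copy src into dest (dest_size bytes), always
 * NUL-terminating, which strncpy() does not do when it truncates.
 * Returns 0 if the whole string fit, -1 if it was truncated. */
int safe_copy(char *dest, const char *src, size_t dest_size)
{
    size_t len;

    if (dest_size == 0)
        return -1;
    len = strlen(src);
    if (len >= dest_size) {
        memcpy(dest, src, dest_size - 1);
        dest[dest_size - 1] = '\0';
        return -1;
    }
    memcpy(dest, src, len + 1);   /* includes the terminating NUL */
    return 0;
}
```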
Frédéric Marchal [Sun, 30 Oct 2011 14:42:42 +0000 (14:42 +0000)]
Display some messages to understand why sarg isn't doing something
One common class of questions from users is to ask why sarg isn't
producing some kind of report. Considering the number of configuration
parameters, it is not surprising that some users get lost.
To help the users help themselves, the -z command line option has been
enhanced to print messages indicating why sarg doesn't produce a report.
Times with a one-digit hour must be sorted before the times with two
digits. To fix this issue, the hour is padded with a leading zero if it
contains only one digit.
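A sketch of the padding, using a hypothetical format_time_hms helper. Without padding, a text sort compares "9:15:00" with "10:00:00" character by character and puts it last because '9' > '1'. With "%02d" the string becomes "09:15:00" and sorts correctly.

```c
#include <stdio.h>

/* Hypothetical formatter: HH:MM:SS with a leading zero on one-digit hours
 * so that a plain string sort matches the chronological order.
 * Returns 0 on success, -1 if buf is too small. */
int format_time_hms(char *buf, size_t size, int hour, int min, int sec)
{
    int n = snprintf(buf, size, "%02d:%02d:%02d", hour, min, sec);

    return (n < 0 || (size_t)n >= size) ? -1 : 0;
}
```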
Write a note about the ignored items in the reports
Several parameters of sarg.conf can limit the number of lines written in the
reports but, so far, only the denied report stated how many entries were left
out.
Now, the authentication failure, the dansguardian and the redirector reports
also write the number of ignored lines.
That change should spare users some headaches when they try to understand the
reports.
Correctly append a suffix to the mangled temporary file name
When two users end up with the same mangled temporary file name, a suffix is
supposed to be appended to the new file name to make it distinct from the
previous one, but the suffix was written one byte too far, making it useless.
The result was that the log entries of two or more users were written into the
same file, overwriting each other's data and corrupting the report.
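A sketch of appending the suffix at the right offset. The "-N" suffix format and the function name are assumptions. Writing at strlen(name) overwrites the terminating NUL. Writing one byte further leaves that NUL in place, so the name reads unchanged and two users end up sharing the file.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: append "-<counter>" to a mangled file name to make
 * it unique. The suffix must start exactly at strlen(name), over the old
 * terminating NUL; starting at strlen(name) + 1 would hide it.
 * Returns 0 on success, -1 if the buffer is too small. */
int add_unique_suffix(char *name, size_t size, int counter)
{
    size_t len = strlen(name);
    int n;

    if (len >= size)
        return -1;
    n = snprintf(name + len, size - len, "-%d", counter);
    return (n < 0 || (size_t)n >= size - len) ? -1 : 0;
}
```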
Protect the columns sorting against missing or invalid dates
Sorttable.js fails on columns containing a date in the first row but not in
some subsequent rows. This patch assumes a zero date for any missing or invalid
date, sorting those rows at the top of the table.
Add the javascript to dynamically sort the tables in the reports
This is the original script written by Stuart Langridge and downloaded from
http://www.kryogenix.org/code/browser/sorttable/. It is included here because
the script contains some bugs and the author no longer seems to support it.
Frédéric Marchal [Sat, 25 Jun 2011 12:49:04 +0000 (12:49 +0000)]
Add support for IPv6 in the aliasing of host names
IPv6 addresses can be defined in the hostalias file and the CIDR notation
is accepted.
Square brackets are not required around the IP address in the hostalias
file but the log file should enclose the IPv6 address between square
brackets to avoid confusion with the port number.
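A sketch of matching an IPv6 address against a CIDR entry with POSIX inet_pton(). The function ipv6_in_cidr is hypothetical, not sarg's implementation. It strips the square brackets used in the log file, compares the whole bytes covered by the prefix, and then masks the remaining bits.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical matcher: is addr (optionally written "[addr]" as in the
 * log file) inside network/prefix_len? Returns 1 if so, 0 otherwise. */
int ipv6_in_cidr(const char *addr, const char *network, int prefix_len)
{
    char tmp[INET6_ADDRSTRLEN];
    struct in6_addr a, n;
    size_t len = strlen(addr);
    int full, bits;
    unsigned char mask;

    if (prefix_len < 0 || prefix_len > 128)
        return 0;
    /* Brackets keep the address apart from a ":port" suffix in the log. */
    if (len >= 2 && addr[0] == '[' && addr[len - 1] == ']') {
        if (len - 2 >= sizeof tmp)
            return 0;
        memcpy(tmp, addr + 1, len - 2);
        tmp[len - 2] = '\0';
        addr = tmp;
    }
    if (inet_pton(AF_INET6, addr, &a) != 1 || inet_pton(AF_INET6, network, &n) != 1)
        return 0;
    full = prefix_len / 8;       /* bytes fully covered by the prefix */
    bits = prefix_len % 8;       /* leftover bits in the next byte */
    if (memcmp(a.s6_addr, n.s6_addr, (size_t)full) != 0)
        return 0;
    if (bits == 0)
        return 1;
    mask = (unsigned char)(0xFF << (8 - bits));
    return (a.s6_addr[full] & mask) == (n.s6_addr[full] & mask);
}
```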
Frédéric Marchal [Thu, 23 Jun 2011 13:57:33 +0000 (13:57 +0000)]
Increase the limit on the number of days that can be processed
If a log file (possibly restricted to a date range) ends up with more than 90
different dates, sarg aborts and complains that there are too many dates.
That restriction is just a safety limit and isn't critical, so it has been
increased to a more reasonable value.
Frédéric Marchal [Sun, 19 Jun 2011 19:33:48 +0000 (19:33 +0000)]
Alias the host names in the redirector report
The scheme is removed from the URL even for a custom log format and the
URL is always truncated to keep only the host name. Previously, the full
URL was always reported for a custom log format.
In addition, the reported host name is replaced by its alias if one is
defined.
Identical host names are not grouped because the report lists one
access per line along with the access time, so no grouping is possible
anyway.
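A sketch of reducing a URL to its host name. url_to_host is hypothetical. It treats "://" as a scheme separator only when it appears before the first '/', so a URL embedded in a query string is not mistaken for the scheme. IPv6 literals and user:password@ prefixes are deliberately not handled.

```c
#include <string.h>

/* Hypothetical helper: copy the host part of url into host (size bytes).
 * The scheme is dropped when present, then the host ends at the first
 * '/', ':' (port) or '?'. */
void url_to_host(const char *url, char *host, size_t size)
{
    const char *sep = strstr(url, "://");
    const char *start = url;
    size_t len;

    if (size == 0)
        return;
    if (sep != NULL && sep < url + strcspn(url, "/"))
        start = sep + 3;
    len = strcspn(start, "/:?");
    if (len >= size)
        len = size - 1;
    memcpy(host, start, len);
    host[len] = '\0';
}
```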
Frédéric Marchal [Sun, 19 Jun 2011 18:32:18 +0000 (18:32 +0000)]
Don't report clickable link for aliased url
The HTML reports contain A tags to link to the page visited by the users
but if the host name is aliased, the link is meaningless and must not be
reported.
Frédéric Marchal [Sat, 18 Jun 2011 10:32:16 +0000 (10:32 +0000)]
External sort command delimits the columns only on a tabulation
The default sort command splits the columns on a blank to non-blank
transition but our files only use a tabulation as the column separator.
Therefore, the calls to the external sort command explicitly require
that the columns be delimited by a tabulation. It prevents problems
when the fields contain spaces.
Frédéric Marchal [Sat, 18 Jun 2011 10:31:49 +0000 (10:31 +0000)]
Alias host names in URL and group identical names
The user can write a file providing rules to replace the host names
extracted from the URL and displayed in the reports. The rules allow
for one wildcard in the host names to be matched.
Identical aliased host names are grouped together in the reports.
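A sketch of matching a host name against a rule with at most one wildcard. match_one_wildcard is hypothetical. The '*' may match an empty string, and the comparison is case-sensitive. Since host names are case-insensitive, a real implementation might prefer strcasecmp/strncasecmp from POSIX <strings.h>.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical matcher: pattern may contain at most one '*', which matches
 * any run of characters (possibly empty); the rest must match literally. */
bool match_one_wildcard(const char *pattern, const char *host)
{
    const char *star = strchr(pattern, '*');
    size_t hlen = strlen(host);
    size_t pre, suf;

    if (star == NULL)
        return strcmp(pattern, host) == 0;
    pre = (size_t)(star - pattern);   /* literal part before '*' */
    suf = strlen(star + 1);           /* literal part after '*' */
    if (hlen < pre + suf)
        return false;
    return strncmp(host, pattern, pre) == 0 &&
           strcmp(host + hlen - suf, star + 1) == 0;
}
```

For instance, a rule "*.example.com" would match "www.example.com" but not "example.com" itself, so all the subdomains can be given one alias.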
Fix the error messages when parsing a redirector log with custom format
If redirector_log_format is set in sarg.conf, the error messages displayed for
any error encountered while parsing the format string were unclear or wrong.
This patch fixes the messages and explains why the format string could not
be used.
The files and directories are named after the user the report is
about. Therefore, even if the administrator tries to hide the user's
identity with a usertab file, the real identity is still visible in the
URL.
To solve this problem, the option anonymous_output_files was added to
sarg.conf. When it is on, each user's file is named using a unique
number that can't be traced back to the real user.
This patch also makes it possible to shorten the URL of the report.
The man page says that the program should try again when getnameinfo
returns EAI_AGAIN but it doesn't say whether the program should wait nor
how many attempts it should perform. Therefore, we assume the
implementation just wants us to call it again, but we won't waste more
time than that.
The number of IP addresses to resolve is potentially very big and it
doesn't matter much if a few addresses are not resolved.
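A sketch of the "retry exactly once, without waiting" policy using the POSIX getnameinfo() call. resolve_ipv4 is a hypothetical wrapper limited to IPv4 for brevity. The flags argument is passed straight through, for instance NI_NAMEREQD to require a name or NI_NUMERICHOST to skip the lookup.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netdb.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical wrapper: resolve an IPv4 address, retrying exactly once
 * (and without sleeping) if getnameinfo() reports EAI_AGAIN.
 * Returns 0 on success or the getnameinfo() error code. */
int resolve_ipv4(const char *ip, char *host, size_t size, int flags)
{
    struct sockaddr_in sa;
    int rc;

    memset(&sa, 0, sizeof sa);
    sa.sin_family = AF_INET;
    if (inet_pton(AF_INET, ip, &sa.sin_addr) != 1)
        return EAI_NONAME;
    rc = getnameinfo((struct sockaddr *)&sa, sizeof sa, host, (socklen_t)size,
                     NULL, 0, flags);
    if (rc == EAI_AGAIN)   /* temporary failure: one more attempt only */
        rc = getnameinfo((struct sockaddr *)&sa, sizeof sa, host, (socklen_t)size,
                         NULL, 0, flags);
    return rc;
}
```

Giving up after the second failure keeps a slow resolver from stalling a run that may have thousands of addresses to process.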
The time was ignored when parsing a squid log file written with the common
log format. The consequence was that all the accesses were reported as
occurring at 00:00.
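A sketch of parsing the whole timestamp field of a common log format line, e.g. "[10/Oct/2000:13:55:36 -0700]", so that the hour, minute and second are kept along with the date. parse_clf_time is hypothetical, and it ignores the time zone offset.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical parser for a common log format timestamp field.
 * The month is returned as 1..12. Returns 0 on success, -1 otherwise. */
int parse_clf_time(const char *field, int *day, int *month, int *year,
                   int *hour, int *min, int *sec)
{
    static const char months[] = "JanFebMarAprMayJunJulAugSepOctNovDec";
    char mon[4];
    const char *p;

    if (sscanf(field, "[%d/%3[A-Za-z]/%d:%d:%d:%d",
               day, mon, year, hour, min, sec) != 6)
        return -1;
    p = strstr(months, mon);
    if (strlen(mon) != 3 || p == NULL || (p - months) % 3 != 0)
        return -1;
    if (*hour < 0 || *hour > 23 || *min < 0 || *min > 59 || *sec < 0 || *sec > 60)
        return -1;
    *month = (int)((p - months) / 3) + 1;
    return 0;
}
```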
Frédéric Marchal [Fri, 25 Feb 2011 20:32:09 +0000 (20:32 +0000)]
Don't abort for an empty report directory
If sarg fails and leaves an empty report directory, one without a
sarg-date file in it, any subsequent execution of sarg will fail due to
that empty report directory.
This change ignores such an empty directory and issues a simple warning.
Frédéric Marchal [Fri, 25 Feb 2011 08:05:14 +0000 (08:05 +0000)]
Take the date_format into account when converting a file
The date_format parameter read from sarg.conf was taken into account too late
in the program flow and was ignored during the conversion or the splitting of a
file. Only the command line option -g was effective.
Don't delete a file twice if -i is given on the command line
If sarg is run with the command line option -i, in some circumstances I
have yet to clarify, the ip file is not produced. In that case, the
previously created file (whose name is still in the string buffer) is
deleted a second time. The result is a failure as the file doesn't
exist any more.
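A sketch of one way to make such a double deletion impossible: clear the reusable name buffer right after deleting the file. delete_tmp_file is hypothetical, and the changelog does not say this is how sarg fixed it.

```c
#include <stdio.h>

/* Hypothetical helper: delete the file whose name is held in a reusable
 * buffer, then clear the buffer so a later cleanup pass finds nothing to
 * delete. Returns the result of remove(), or 0 if the name was empty. */
int delete_tmp_file(char *name)
{
    int rc = 0;

    if (name[0] != '\0') {
        rc = remove(name);
        name[0] = '\0';   /* the stale name can no longer be reused */
    }
    return rc;
}
```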
Enable a warning in gcc to stop the compilation if an empty body is found after
some control structures. It should detect stray semicolons at the end of
control structures such as if, for, while, etc.
Fix a problem with the attributes passed to ldap_search
The attributes list passed to ldap_search must be terminated by a NULL
pointer. That wasn't the case in sarg and was likely responsible for a
segfault. It should be fixed now.
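The attribute list given to ldap_search is a char ** walked until a NULL entry, so the array must end with one. The sketch below does not call libldap. It uses a hypothetical count_attrs function that walks a list the same way, and the attribute names are only examples. Without the trailing NULL, the loop would read past the end of the array, which is the kind of bug that ends in a segfault.

```c
#include <stddef.h>

/* Example attribute list: the final NULL is what tells the consumer
 * (ldap_search or this sketch) where the list ends. */
char *user_attrs[] = { "cn", "uid", NULL };

/* Hypothetical helper walking a NULL-terminated attribute list. */
size_t count_attrs(char *const *attrs)
{
    size_t n = 0;

    while (attrs[n] != NULL)
        n++;
    return n;
}
```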
According to the gettext manual, AM_GNU_GETTEXT_VERSION sets the
minimum gettext version required to build the package but it doesn't
look quite right as my system insists on using that exact version of
gettext to install the po files.