Sort the top sites report by number of users connecting to the sites
The top sites report can be sorted according to the number of users
connecting to the visited sites. It shows how popular sites are within your
network.
If sarg is configured with a wrong /tmp path or the path given points to a
directory the user doesn't intend to use as a temporary directory, we must
not delete its content (think about a link going to /usr/bin).
To protect against that situation, sarg only deletes its own files, and
only after making sure the directory contains nothing but files created by
sarg.
During the creation of the user's reports, if the report showing the details
by date and hour is not requested, the unnecessary file is deleted but the
deletion overwrites the buffer containing the name of another temporary file
to delete. As the file name is overwritten, that other file cannot be deleted
when the function completes.
To prevent sarg from filling up the memory and waking up the OOM killer
while reading an invalid or corrupted log file, the longest line sarg
will accept before aborting is 10MB long. The limit is arbitrary.
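A minimal sketch of such a bounded line reader, for illustration only. The
names read_line_limited, line_status and MAX_LINE_LENGTH are hypothetical
and are not taken from the sarg sources; only the 10MB figure comes from
the commit message.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* 10MB cap taken from the commit message; the value itself is arbitrary. */
#define MAX_LINE_LENGTH ((size_t)10 * 1024 * 1024)

/* Hypothetical status codes for this sketch. */
enum line_status { LINE_OK, LINE_EOF, LINE_TOO_LONG, LINE_NO_MEMORY };

/*
 * Read one line into a buffer that grows on demand, but never accept more
 * than max_len characters. On the first call, *buf must be NULL and *cap 0.
 * The newline is not stored.
 */
static enum line_status read_line_limited(FILE *fp, char **buf, size_t *cap,
                                          size_t max_len)
{
    size_t len = 0;
    int c;

    assert(fp != NULL && buf != NULL && cap != NULL);
    if (*buf == NULL) {
        *buf = malloc(256);
        if (*buf == NULL)
            return LINE_NO_MEMORY;
        *cap = 256;
    }
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len >= max_len)
            return LINE_TOO_LONG;   /* abort instead of exhausting memory */
        if (len + 1 >= *cap) {      /* keep room for the terminating NUL */
            char *tmp = realloc(*buf, *cap * 2);
            if (tmp == NULL)
                return LINE_NO_MEMORY;
            *buf = tmp;
            *cap *= 2;
        }
        (*buf)[len++] = (char)c;
    }
    if (c == EOF && len == 0)
        return LINE_EOF;
    (*buf)[len] = '\0';
    return LINE_OK;
}
```

A caller would pass MAX_LINE_LENGTH as max_len and stop processing the file
when LINE_TOO_LONG is returned.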
Update the messages when an error is detected while reading a line
The module to read long text lines may read any file. It is not restricted
to reading the input log file. Therefore, the error messages must not claim
that the error is in the input log file.
Frédéric Marchal [Thu, 14 Jun 2012 08:04:25 +0000 (10:04 +0200)]
Allow backslash as the domain/user separator
For NTLM users, the domain and user names may be separated by a + or a \\
as pointed out by mrac33:
(http://sourceforge.net/tracker/index.php?func=detail&aid=3532108&group_id=68910&atid=522791)
For compatibility reasons, the _ separator is still retained.
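As an illustration, the three separators could be handled by a single
search. This is a sketch only: the function name user_without_domain is
hypothetical, and it assumes the goal is to isolate the user part of the
identifier, which the commit message does not state explicitly.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the part of an NTLM identifier that follows the first domain/user
 * separator. The separators are the ones named in the commit message:
 * '+', '\' and the legacy '_'. The identifier is returned unchanged if no
 * separator is found or if nothing follows it.
 */
static const char *user_without_domain(const char *id)
{
    const char *sep;

    assert(id != NULL);
    sep = strpbrk(id, "+\\_");
    if (sep == NULL || sep[1] == '\0')
        return id;
    return sep + 1;
}
```

Note that keeping _ as a separator means a plain user name containing an
underscore is cut at that character in this sketch.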
Thanks to mrac33 for reporting and fixing this bug.
Frédéric Marchal [Mon, 21 May 2012 19:55:47 +0000 (21:55 +0200)]
IP address resolution using one external program
It is now possible to resolve an IP address using an external program.
Only one external program can be configured but it may do anything
including attempting several strategies to resolve the IP address.
The module may be chained after the standard dns module to get the name of
a computer not registered with the DNS.
Executing an external program is exceedingly slow so it is best to try
the DNS first!
Frédéric Marchal [Mon, 21 May 2012 08:10:37 +0000 (10:10 +0200)]
Take the port number into account when processing IPv4 addresses
The port number is now stripped from IPv4 addresses read from the log file.
This allows IPv4 addresses to be compared against the host exclusion list.
Prior to that change, it was not possible to filter out IPv4 ranges if a
port number was reported in the log file as the address was not recognized
as an IPv4 address and therefore was not compared to the correct exclusion
list.
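The recognition step can be sketched as follows. The function name
parse_ipv4_ignoring_port is hypothetical and the parsing rules are an
assumption made for the example, not the code actually used by sarg.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>

/*
 * Return 1 and fill addr[] if text is "a.b.c.d" or "a.b.c.d:port".
 * Return 0 for anything else, such as a host name. The port is checked
 * for syntax and then dropped.
 */
static int parse_ipv4_ignoring_port(const char *text, unsigned char addr[4])
{
    int i;

    assert(text != NULL && addr != NULL);
    for (i = 0; i < 4; i++) {
        char *end;
        unsigned long octet;

        if (!isdigit((unsigned char)*text))
            return 0;
        octet = strtoul(text, &end, 10);
        if (octet > 255)
            return 0;
        addr[i] = (unsigned char)octet;
        text = end;
        if (i < 3) {
            if (*text != '.')
                return 0;
            text++;
        }
    }
    if (*text == '\0')
        return 1;               /* bare address, no port */
    if (*text != ':')
        return 0;
    text++;
    if (!isdigit((unsigned char)*text))
        return 0;               /* a ':' must be followed by digits */
    while (isdigit((unsigned char)*text))
        text++;
    return *text == '\0';
}
```

Once the four bytes are extracted, the address can be compared with the
exclusion list whether or not the log entry carried a port.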
Frédéric Marchal [Mon, 12 Mar 2012 09:14:25 +0000 (10:14 +0100)]
Display the offending regular expression if an error is detected
If a regular expression is invalid, the actual regular expression is displayed
in the error message in addition to the error message from libpcre. The user
will know what regular expression failed.
Frédéric Marchal [Sat, 10 Mar 2012 14:37:11 +0000 (15:37 +0100)]
Deal with URLs without scheme or path in a squidGuard log
Some URLs in a squidGuard log don't start with a scheme:// and may not
even contain a path. Those bare minimum URLs are not parsed correctly
by the redirector_log_format suggested in sarg.conf.
To parse those log entries correctly, we grab the whole URL in the
buffer and strip it down to keep the host name.
If an empty user name reaches the name manufacturing function, the name
generated to store the user's files is empty and it leads to the deletion of
the whole report directory during the process. The visible result is that sarg
ends up with an error because its output directory is missing.
This patch makes sure no empty file name is used. It is still necessary to
avoid empty user names in the first place.
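The kind of guard described above might look like the sketch below. The
function name build_user_path and its arguments are hypothetical. It only
shows why an empty name is dangerous: the resulting path would designate the
report directory itself.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build "report_dir/user_file" into out. Return -1 if the user file name
 * is missing or empty, or if the result does not fit into out.
 */
static int build_user_path(char *out, size_t size, const char *report_dir,
                           const char *user_file)
{
    int n;

    assert(out != NULL && report_dir != NULL);
    if (user_file == NULL || strlen(user_file) == 0)
        return -1;  /* an empty name would target report_dir itself */
    n = snprintf(out, size, "%s/%s", report_dir, user_file);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return 0;
}
```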
Frédéric Marchal [Sat, 18 Feb 2012 08:19:45 +0000 (09:19 +0100)]
Make a module out of the DNS IP resolving
The code was changed to accommodate module names in resolve_ip instead of
just yes or no. The named modules are tried in sequence until one returns
a positive result.
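The idea of trying the modules in sequence can be illustrated with an array
of function pointers. Everything in this sketch is hypothetical (the
resolver_fn type, the two stub modules and resolve_ip_chain); it is not the
interface used by sarg.

```c
#include <assert.h>
#include <stdio.h>

/* A module returns 1 and fills name on success, 0 otherwise. */
typedef int (*resolver_fn)(const char *ip, char *name, size_t size);

/* Stand-in for a DNS module that does not know the address. */
static int resolve_dns_stub(const char *ip, char *name, size_t size)
{
    (void)ip;
    (void)name;
    (void)size;
    return 0;
}

/* Stand-in for a second module, for instance an external program. */
static int resolve_exec_stub(const char *ip, char *name, size_t size)
{
    int n = snprintf(name, size, "host-%s", ip);

    return n > 0 && (size_t)n < size;
}

/*
 * Try every module in order and stop at the first positive result.
 * If no module succeeds, name receives the address itself and 0 is
 * returned.
 */
static int resolve_ip_chain(const resolver_fn *chain, size_t count,
                            const char *ip, char *name, size_t size)
{
    size_t i;

    assert(chain != NULL && ip != NULL && name != NULL);
    for (i = 0; i < count; i++) {
        if (chain[i](ip, name, size))
            return 1;
    }
    snprintf(name, size, "%s", ip);
    return 0;
}
```

Placing the cheapest module first in the array keeps the slower ones from
being called unless the earlier ones fail.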
Frédéric Marchal [Sun, 12 Feb 2012 15:42:35 +0000 (16:42 +0100)]
Fix the permissions in the archive file
The archive to distribute a release had the wrong permissions. Every
directory was missing the x permission preventing the user from entering
into the directory.
Avoid a possible name clash in the temporary directory
As all the temporary files are generated in the same directory and some of
them may be named after the user's ID found in the log file, it is possible
that a user's file ends up with the same name as an internal file such as
the downloads.
To avoid that name clash, the temporary files created for any auxiliary
report are suffixed with a distinct extension.
No links in the denied page if the user is not on the topusers list
The report with the denied accesses contains links to the user report page
but the user report page is not generated if the user is not on the
topusers list.
This patch hides the link if the user's page doesn't exist.
Produce the time and graphs reports if users_sites is disabled
If the topusers report is requested but not the users_sites report, the
links to the time and graphs reports are included in the topusers list but
the files were not generated. Now, they are.
The LLONG_MAX constant is only declared on some systems if gcc is run in
a mode compatible with C99.
This commit requires C99 (-std=gnu99 in fact) and checks that LLONG_MAX is
properly defined. At least, the user will be informed about the problem as
soon as possible if such a problem occurs again.
The bug describing this problem is here:
http://sourceforge.net/tracker/index.php?func=detail&aid=3482261&group_id=68910&atid=522791
If sarg is compiled with cc, the old configure script would add the -Aa
option to the CFLAGS but that option produces an error on modern cc. The
error is about a missing argument. I could not find the new syntax nor the
purpose of that option but it was reported that removing the whole
condition solves the problem.
Speed up the second stage of the report generation
A lot of temporary files are produced after the log is split into several
files each containing the data of the users but those temporary files were
constantly being opened and closed for each line to be written into. It
was a small waste of time.
The new processing opens each user's temporary file once, processes
the data of the corresponding user and generates the temporary file with
those data. The gain is roughly 10% on my data set.
Add a double check on the data at the user's level
There already was a check on the global data collected from the access log
to ensure no data was lost.
There is now a check on the number of bytes downloaded by the user and the
time elapsed. The detection of a failure is therefore more accurate and
makes it easier to locate the leak.
During the creation of the topusers report, if the report consists
entirely of failed connections without any bytes downloaded, the incache
and outcache are both zero and don't sum up to 100%. This is expected and
should not output a warning about corrupted data.
Report any error while reading the day summary file
Sarg generates a temporary file containing the downloaded bytes and the
elapsed time to produce the user's hourly report. While that temporary
file is read and processed into yet another temporary file, any error
found is now reported and the processing is aborted instead of the error
being silently ignored. This ensures no data are missing from the reports.
The date in the subject line of the e-mail was produced with asctime.
But that function appends a \n at the end of the date. The subject line
was therefore followed by a blank line as far as mailx was concerned and
that would mess with the header of the e-mail.
This patch renders the subject line using strftime. It avoids the line
break and allows mailx to add additional headers from the command line.
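A short sketch of the strftime approach. The function name format_subject
and the "SARG report, " prefix are made up for the example; the format
string merely reproduces the fields printed by asctime without the trailing
line break.

```c
#include <assert.h>
#include <time.h>

/*
 * Write a one-line subject into out. Unlike asctime(), strftime() only
 * emits what the format string asks for, so no '\n' is appended.
 * Return the number of characters written, or 0 if out is too small.
 */
static size_t format_subject(char *out, size_t size, const struct tm *when)
{
    assert(out != NULL && when != NULL);
    return strftime(out, size, "SARG report, %a %b %d %H:%M:%S %Y", when);
}
```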
Print a debug message when purging the temporary directory
The temporary directory used by sarg is purged before the report is
generated and it can take some time if a previous directory was left
from a previous run.
A debug message is printed to tell the user about the delay that's
occurring.
Some URLs are stripped of the scheme while others are not. Yet they were
all printed using the same function that prefixed a http:// in front of
the URL even if the original scheme was still present.
To make the links clickable without error in the report, the scheme may
be provided by the caller when it requests the printing of the URL in an
HTML link.
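The caller-provided scheme can be sketched like this. The function name
format_link and its signature are hypothetical, and the escaping of special
HTML characters, which real code needs, is left out to keep the example
short.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Write an HTML link for url into out. scheme is the prefix to add in the
 * href (for instance "http://") or NULL if the URL still carries its own
 * scheme. Return the length of the link, or -1 if it does not fit.
 */
static int format_link(char *out, size_t size, const char *scheme,
                       const char *url)
{
    int n;

    assert(out != NULL && url != NULL);
    if (scheme == NULL)
        scheme = "";    /* nothing to add in front of the URL */
    n = snprintf(out, size, "<a href=\"%s%s\">%s</a>", scheme, url, url);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

A report whose URLs were stripped of their scheme would pass "http://",
while a report that kept the original URL would pass NULL.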
The affected reports were the authentication failures, the denied
accesses and the downloaded files.
Frédéric Marchal [Thu, 29 Dec 2011 15:05:52 +0000 (15:05 +0000)]
Support for gd, ldap and iconv can be disabled during configuration
The configuration script supports the three command line switches
--without-gd, --without-ldap and --without-iconv to build sarg without
one of those external modules.
Frédéric Marchal [Thu, 29 Dec 2011 15:05:34 +0000 (15:05 +0000)]
Don't abort on squidGuard log errors
squidGuard sometimes wraps the URL in the log file and sarg used to abort
the whole report generation. This patch merely issues a warning but keeps
producing the report.
The report contains a warning indicating how many lines were ignored
from the original log file.