Frédéric Marchal [Sun, 16 Dec 2012 17:03:51 +0000 (18:03 +0100)]
Make a translator friendly message out of pieced together words
The "generated by" message written at the page bottom was made out of
words assembled together during the page generation. It was not possible
to translate that message.
Be more thorough when ensuring a file is correctly written
The return status of every written file is checked when the file is closed.
Any incorrectly written file should be detected early and a proper message
should be reported.
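A minimal sketch of that kind of check, assuming a hypothetical helper named close_written_file (not taken from the sarg source). It relies on the fact that buffered data may only reach the disk when the stream is closed, so both ferror() and the result of fclose() are inspected:

```c
#include <stdio.h>

/* Hypothetical helper: close a written file and report any deferred error.
 * Returns 0 on success, -1 if an earlier write or the close itself failed. */
static int close_written_file(FILE *fp, const char *name)
{
    int failed = ferror(fp);      /* an earlier fputs/fprintf may have failed */

    if (fclose(fp) == EOF)        /* remaining buffered data is flushed here */
        failed = 1;
    if (failed) {
        fprintf(stderr, "Write error in %s\n", name);
        return -1;
    }
    return 0;
}
```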
Overwrite any existing dansguardian temporary file
When sarg parses a dansguardian log, the temporary file that stores the
parsed values is now overwritten if it exists. Previously, the entries were
appended to whatever was left over from a previous run.
Frédéric Marchal [Fri, 31 Aug 2012 19:46:41 +0000 (21:46 +0200)]
Remove a message about the redirector log that can't be deleted
If no redirector log was provided, a message was displayed in debug mode to
inform the user about a temporary file that can't be deleted. That message
was unnecessary and misleading. It is now displayed only when appropriate.
Frédéric Marchal [Fri, 31 Aug 2012 16:58:54 +0000 (18:58 +0200)]
Add a safety to prevent the deletion of files that haven't been created
There was a path in the source code where sarg could try to delete the
temporary unsorted files of the denied and authfail reports without
checking that the file names were not empty.
The functions where the guard was added are not supposed to be called if
no reports are to be generated, but that check relies on the caller. If the
caller fails to check and calls the function to generate the reports, it
will try to delete a file whose name is empty.
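The guard can be sketched as follows. The helper name remove_temp_file is an assumption made for illustration, not the name used in sarg:

```c
#include <stdio.h>

/* Hypothetical helper: delete a temporary file only if a name was set.
 * Returns 0 if there was nothing to do or the file was removed, -1 on error. */
static int remove_temp_file(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return 0;                 /* the file was never created */
    if (remove(name) != 0) {
        perror(name);
        return -1;
    }
    return 0;
}
```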
Frédéric Marchal [Sun, 26 Aug 2012 17:40:10 +0000 (19:40 +0200)]
Don't keep the % in the URL when converted into a file name
When a file name is manufactured from a URL, the percent sign is removed to
prevent the web server or the browser from requesting a file with a % in
it.
The server or the browser would decode the percent sign if the two
subsequent characters happened to be valid hexadecimal digits and would
request the wrong file.
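A minimal sketch of such a conversion, assuming a hypothetical function url_to_file_name. Only the removal of the percent sign comes from the description above. Replacing every other character that is not alphanumeric, a dot or a dash with an underscore is an assumption of this example:

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical conversion of a URL into a file name. */
static void url_to_file_name(const char *url, char *out, size_t out_size)
{
    size_t j = 0;

    if (out_size == 0)
        return;
    for (size_t i = 0; url[i] != '\0' && j + 1 < out_size; i++) {
        unsigned char c = (unsigned char)url[i];

        if (c == '%')
            continue;             /* dropped so nothing can decode "%41" into "A" */
        /* Assumption: any other unsafe character becomes an underscore. */
        out[j++] = (isalnum(c) || c == '.' || c == '-') ? (char)c : '_';
    }
    out[j] = '\0';
}
```

With this sketch, "www.example.com/a%41b" becomes "www.example.com_a41b", so a later request for the file cannot be turned into "www.example.com_aAb".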
Frédéric Marchal [Sun, 26 Aug 2012 16:47:25 +0000 (18:47 +0200)]
Keep reading the log files even with a small number of errors
Sarg tolerates a few errors in the input log files. The number of errors
can be configured. Sarg can stop after a given number of consecutive errors
or once the total number of errors reaches a limit.
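The two limits can be sketched like this. The structure and function names are hypothetical, and the choice to stop as soon as a counter reaches its limit is an assumption of this example:

```c
#include <stdbool.h>

/* Hypothetical error accounting for the log reader. */
struct error_budget {
    int max_consecutive;   /* stop after this many bad lines in a row */
    int max_total;         /* stop after this many bad lines overall */
    int consecutive;       /* bad lines seen since the last good one */
    int total;             /* bad lines seen so far */
};

/* Record the outcome of one parsed line.
 * Returns false when reading must stop. */
static bool record_line(struct error_budget *b, bool line_ok)
{
    if (line_ok) {
        b->consecutive = 0;    /* a good line ends the current streak */
        return true;
    }
    b->consecutive++;
    b->total++;
    return b->consecutive < b->max_consecutive && b->total < b->max_total;
}
```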
Frédéric Marchal [Sun, 26 Aug 2012 15:37:23 +0000 (17:37 +0200)]
Allow an empty data size in a common log
Any column of the common log format may be - to denote missing or
not applicable data. In particular, apache writes a - when the URL is a
redirection. That case is taken into account with this patch.
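A minimal sketch of a size parser that accepts the dash. The function name parse_size_field is hypothetical, and counting a dash as zero bytes is an assumption of this example:

```c
#include <errno.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parser for the size column of a common log line.
 * Returns 0 and stores the size, or -1 if the field is not valid. */
static int parse_size_field(const char *field, long long *size)
{
    char *end;

    if (strcmp(field, "-") == 0) {
        *size = 0;                /* assumption: no data means zero bytes */
        return 0;
    }
    errno = 0;
    *size = strtoll(field, &end, 10);
    if (errno != 0 || end == field || *end != '\0' || *size < 0)
        return -1;                /* overflow, empty, trailing junk or negative */
    return 0;
}
```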
Frédéric Marchal [Sun, 26 Aug 2012 15:25:49 +0000 (17:25 +0200)]
Accept common log files without extension column
The sample common log file used to test the program contained additional
columns. The standard common log format doesn't have those columns. Sarg
failed to parse the standard format because it expected at least one
supernumerary column.
Frédéric Marchal [Sun, 26 Aug 2012 13:46:51 +0000 (15:46 +0200)]
Decode extended log formats
Microsoft ISA produces such a log. This change is supposed to handle more
general cases than the previous routine.
The current code successfully decodes the one-line log I have to test
it. The decoding procedure may not be compatible with *every*
compliant extended log implementation. Sample logs are necessary to improve
the code.
Frédéric Marchal [Sun, 26 Aug 2012 09:18:52 +0000 (11:18 +0200)]
Store the entry time in a structure instead of a pointer
Instead of requiring that the module keeps track of the entry time on
behalf of the main loop, the entry time is stored in the entry structure.
Therefore, there is no need to keep a static variable inside the module
and pass its pointer to the caller.
Frédéric Marchal [Tue, 21 Aug 2012 19:02:33 +0000 (21:02 +0200)]
Don't use strcmp to check strings one or zero characters long
As a side effect, the date format is stored in a single character instead
of a string and df is now the only variable used globally to set the
date format.
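With the format held in one character, a plain comparison is enough. The sketch below assumes a hypothetical format_date function, and the letters 'e' (day first) and 'u' (month first) are assumptions of this example:

```c
#include <stdio.h>

/* Hypothetical date formatter driven by a single-character format.
 * df is compared directly, so no strcmp() call is needed. */
static void format_date(char df, int day, int month, int year,
                        char *out, size_t size)
{
    if (df == 'e')
        snprintf(out, size, "%02d/%02d/%04d", day, month, year);
    else if (df == 'u')
        snprintf(out, size, "%02d/%02d/%04d", month, day, year);
    else
        snprintf(out, size, "%04d-%02d-%02d", year, month, day);
}
```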
Frédéric Marchal [Fri, 10 Aug 2012 18:10:31 +0000 (20:10 +0200)]
Don't show the input log reading percentage
The show_read_percent option shows the percentage of the input log file
read so far, independently of show_read_statistics. It allows for a progress
indicator without having to read the input log file twice.
Sort the top sites report by number of users connecting to the sites
The top sites report can be sorted according to the number of users
connecting to the visited sites. It shows how popular sites are within your
network.
If sarg is configured with a wrong /tmp path or the path given points to a
directory the user doesn't intend to use as a temporary directory, we must
not delete its content (think about a link going to /usr/bin).
To protect against that situation, sarg only deletes its own files, and
only after making sure the directory contains nothing but files created by
sarg.
During the creation of the user's reports, if the report showing the details
by date and hour is not requested, the unnecessary file is deleted, but
doing so overwrites the buffer containing the name of another temporary file
to delete. As that file name is overwritten, the file cannot be deleted when
the function completes.
To prevent sarg from filling up the memory and waking up the OOM killer
while reading an invalid or corrupted log file, the longest line sarg
will accept before aborting is 10MB long. The limit is arbitrary.
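A minimal sketch of a line reader with such a cap. The function read_long_line and the macro MAX_LINE_LENGTH are hypothetical names, and the buffer growth strategy is an assumption of this example. Only the 10MB figure comes from the text above:

```c
#include <stdio.h>
#include <stdlib.h>

#define MAX_LINE_LENGTH (10u * 1024u * 1024u)  /* arbitrary 10MB cap */

/* Hypothetical reader: returns a malloc'ed line without its newline.
 * Returns NULL at end of file, on allocation failure, or when the line
 * does not fit in max_len bytes (*too_long is set to 1 in that case). */
static char *read_long_line(FILE *fp, size_t max_len, int *too_long)
{
    size_t cap = 128, len = 0;
    char *buf = malloc(cap);
    int c;

    *too_long = 0;
    if (buf == NULL)
        return NULL;
    while ((c = getc(fp)) != EOF && c != '\n') {
        if (len + 1 >= max_len) {       /* refuse to grow past the cap */
            *too_long = 1;
            free(buf);
            return NULL;
        }
        if (len + 1 >= cap) {           /* double the buffer when full */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }
        buf[len++] = (char)c;
    }
    if (c == EOF && len == 0) {         /* nothing left to read */
        free(buf);
        return NULL;
    }
    buf[len] = '\0';
    return buf;
}
```

A caller would pass MAX_LINE_LENGTH as max_len and abort with an error message when too_long is set, instead of letting the buffer grow without bound.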
Update the messages when an error is detected while reading a line
The module to read long text lines may read any file. It is not restricted
to reading the input log file. Therefore, the error messages must not claim
that the error is in the input log file.
That change was first made on branch v2.3 and merged with v2.4. The new
functions introduced in version 2.4 have been changed to reflect the same
change.
The structure is to be shared with several modules.
As a side effect, the denied and authentication failure reports now display
the full URL even when the short URL is requested. It has to be defined
whether this is desirable or not.
Frédéric Marchal [Thu, 14 Jun 2012 08:04:25 +0000 (10:04 +0200)]
Allow backslash as the domain/user separator
For NTLM users, the domain and user names may be separated by a + or a \\
as pointed out by mrac33:
(http://sourceforge.net/tracker/index.php?func=detail&aid=3532108&group_id=68910&atid=522791)
For compatibility reasons, the _ separator is still retained.
Thanks to mrac33 for reporting and fixing this bug.
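The separator handling can be sketched as follows. The helper name strip_ntlm_domain is hypothetical, and splitting at the first separator found is an assumption of this example, not a statement about the actual patch:

```c
#include <string.h>

/* Hypothetical helper: return the user part of an NTLM name.
 * The domain and user may be separated by '+', '\' or '_'.
 * If no separator is found, the whole name is returned. */
static const char *strip_ntlm_domain(const char *name)
{
    const char *sep = strpbrk(name, "+\\_");

    return (sep != NULL && sep[1] != '\0') ? sep + 1 : name;
}
```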