Timo Sirainen [Tue, 9 Jun 2015 16:32:09 +0000 (19:32 +0300)]
dict-sql: Don't try to optimize finding a matching map by using the previous match.
In some setups multiple maps can match and it's important that the matching
is done in the same order always, otherwise the results could become
somewhat random.
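A minimal sketch of the fixed-order lookup, written in Python rather than the project's C. The map names and the prefix-style patterns are hypothetical simplifications; the point is only that the search always starts from the first map, so an earlier lookup cannot influence which of several matching maps is picked.

```python
# Hypothetical maps in configuration order: (key prefix, map name).
MAPS = [
    ("priv/quota/", "quota_map"),
    ("priv/", "generic_map"),
]

def find_map(key, maps=MAPS):
    """Return the first map that matches key, always scanning from the start."""
    for prefix, name in maps:
        if key.startswith(prefix):
            return name
    return None
```

Both maps match "priv/quota/storage", but the first one in the list always wins, no matter which key was looked up before.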
Timo Sirainen [Wed, 3 Jun 2015 20:56:32 +0000 (23:56 +0300)]
fts: Added "doveadm fts lookup" command.
This is mainly useful for debugging lib-fts. It doesn't perform any of the
lib-fts tokenization / filtering so you can do raw lookups.
Timo Sirainen [Wed, 3 Jun 2015 19:30:05 +0000 (22:30 +0300)]
indexer: Improved handling multiple indexing requests to the same mailbox.
If a request arrived for a mailbox that we were already indexing, the
previous code simply sent the indexing request to the existing worker
process. If the indexing took a long time, many requests could be buffered
for the same mailbox, and they could then take a while to process even
though they weren't really doing any indexing work.
The new code instead just remembers in memory that once the earlier
indexing has finished, it's run one more time to pick up any pending changes.
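The bookkeeping described above can be sketched roughly as follows, in Python rather than the indexer's C. The class and attribute names are illustrative assumptions, not the indexer's real structures: one flag records that more work arrived while a run was in progress, so any number of extra requests collapse into a single follow-up run.

```python
class MailboxIndexState:
    """Per-mailbox state; all names here are hypothetical."""

    def __init__(self):
        self.working = False   # a worker is currently indexing this mailbox
        self.reindex = False   # a request arrived while the worker was busy
        self.runs = 0          # number of indexing runs started so far

    def request(self):
        if self.working:
            # Don't queue another request; just remember one more pass is needed.
            self.reindex = True
        else:
            self._start()

    def finished(self):
        self.working = False
        if self.reindex:
            self.reindex = False
            self._start()

    def _start(self):
        self.working = True
        self.runs += 1
```

Four requests in a row start one run and, once it finishes, exactly one more.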
Teemu Huovila [Wed, 3 Jun 2015 13:47:25 +0000 (16:47 +0300)]
lib-lda: Fixed crash in mail_deliver_get_log_var_expand_table().
Discovered by clang static analyzer. This caused crashes with older versions
of Pigeonhole.
Timo Sirainen [Tue, 2 Jun 2015 21:46:23 +0000 (00:46 +0300)]
lib-fts: fts-filter API changed to have a non-pointer vfuncs variable.
The main benefit being that the fts-filter implementations can save a few
lines of code.
Timo Sirainen [Tue, 2 Jun 2015 18:56:03 +0000 (21:56 +0300)]
lib-fts: Added string_t *token to struct fts_filter
This makes the work a bit easier for simple filters that don't need any
state but want to use a string_t.
Timo Sirainen [Tue, 2 Jun 2015 16:52:15 +0000 (19:52 +0300)]
fts: Error logging fix.
1) We were logging the error after it was already freed from data stack.
2) We were logging uninitialized error string when fts indexing was the one
that failed.
Phil Carmody [Mon, 1 Jun 2015 19:08:43 +0000 (22:08 +0300)]
lib: API change - have uni_utf8_get_char*() return _char_bytes
Often the two functions are called in close proximity (both ways round). As
_get_char*() calls _char_bytes() early on the success path, we may as well
return that value to the caller for immediate use.
The callers which call _char_bytes() first are simply rejecting the truncated
case quickly - all other invalid cases still call both functions, and all
other valid cases (which should be the fast path) likewise call both.
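A rough Python sketch of the calling convention, not the lib code itself. The function names are stand-ins, and the return values are assumptions: a positive value is the byte count of the decoded character, 0 means truncated input, and -1 means invalid input. Python's own strict UTF-8 decoder does the validation here.

```python
def utf8_char_bytes(first):
    """Sequence length implied by the first byte (1 for ASCII or an invalid lead)."""
    if first < 0x80:
        return 1
    if 0xC2 <= first <= 0xDF:
        return 2
    if 0xE0 <= first <= 0xEF:
        return 3
    if 0xF0 <= first <= 0xF4:
        return 4
    return 1

def utf8_get_char(data):
    """Return (ret, codepoint): ret > 0 is the byte count, 0 truncated, -1 invalid."""
    if not data:
        return 0, None
    length = utf8_char_bytes(data[0])
    if length == 1 and data[0] >= 0x80:
        return -1, None
    if len(data) < length:
        # Simplification: truncation is reported before the bytes present are validated.
        return 0, None
    try:
        text = data[:length].decode("utf-8")
    except UnicodeDecodeError:
        return -1, None
    return length, ord(text)
```

A caller can then advance by the returned count without asking for the length a second time.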
Phil Carmody [Mon, 1 Jun 2015 19:08:27 +0000 (22:08 +0300)]
fts-solr: laxer check of uni_utf8_get_char_n() return value
If uni_utf8_get_char*() were changed to return the number of bytes in the
character on success, then all we care about is it being > 0 (i.e. not
error, not truncated).
Phil Carmody [Mon, 1 Jun 2015 19:07:44 +0000 (22:07 +0300)]
lib: test-unichar - test invalid utf8 encodings
Chop trailing characters off valid encodings, and watch them fail.
(There's no need to do this on most of the test characters, as they're
truncated to the same byte sequence - only do 1 in 64.)
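The idea can be sketched like this, using Python's strict UTF-8 decoder as a stand-in for the functions under test (the helper names are hypothetical). Code points that differ only in their lowest 6 bits differ only in the last byte, so truncating them gives the same byte sequences and sampling every 64th one is enough.

```python
def truncations_fail(codepoint):
    """True if every proper prefix of the UTF-8 encoding is rejected."""
    encoded = chr(codepoint).encode("utf-8")
    for cut in range(1, len(encoded)):
        try:
            encoded[:cut].decode("utf-8")
        except UnicodeDecodeError:
            continue
        return False
    return True

def sampled_codepoints(start, stop, step=64):
    """Every step-th code point, skipping surrogates (they have no UTF-8 form)."""
    return (cp for cp in range(start, stop, step)
            if not 0xD800 <= cp <= 0xDFFF)
```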
Phil Carmody [Mon, 1 Jun 2015 19:06:44 +0000 (22:06 +0300)]
lib: test-unichar - streamline the unichars test
It's doing 2 kinds of tests, so split them into separate test cases.
And the first part has started to get expensive, so just make sure
all code paths are tested by skipping most values. Only 3 from each
set of 64 (lowest 6 bits) are tested.
Timo Sirainen [Mon, 1 Jun 2015 18:58:30 +0000 (21:58 +0300)]
lib-fts: tokenizers - Fixed removal of trailing character in truncated tokens.
If the token is truncated, we don't want to remove the trailing character
since it's not actually there.
Also we don't want to remove trailing apostrophes from a truncated word,
because they're not actually at the end of the (untruncated) token.
This doesn't make a big difference, but it's slightly more correct.
Timo Sirainen [Mon, 1 Jun 2015 18:35:39 +0000 (21:35 +0300)]
lib-fts: simple tokenizer minor cleanup - removed unnecessary token length > 0 check
fts_tokenizer_generic_simple_current_token() will check it in any case.
Timo Sirainen [Mon, 1 Jun 2015 18:27:09 +0000 (21:27 +0300)]
lib-fts: simple tokenizer cleanup - make prev_letter updating more explicit.
Until now it was hidden inside one of the functions, which didn't make the
prev_letter very consistent when a word break was found. It didn't actually
matter what the prev_letter was at that point, but now the behavior is more
consistent.
Teemu Huovila [Mon, 1 Jun 2015 15:35:58 +0000 (18:35 +0300)]
lib-fts: Change TR29 tokenizer to break at full stop (and others).
Diverge from the TR29 rules and always break at MidNumLet characters.
This fixes tokenizing first.last@domain.tld email addresses.
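A very rough illustration of the effect in Python. This is not the TR29 algorithm and not the tokenizer's code; it is an ASCII-only regular expression stand-in with a hypothetical switch that either keeps a full stop between letters inside a token or always breaks there.

```python
import re

def tokenize(text, break_at_full_stop=True):
    """Split text into tokens; optionally keep 'a.b' together as one token."""
    if break_at_full_stop:
        return re.findall(r"[A-Za-z0-9]+", text)
    return re.findall(r"[A-Za-z0-9]+(?:\.[A-Za-z0-9]+)*", text)

print(tokenize("first.last@domain.tld"))
# ['first', 'last', 'domain', 'tld']
print(tokenize("first.last@domain.tld", break_at_full_stop=False))
# ['first.last', 'domain.tld']
```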
Timo Sirainen [Fri, 29 May 2015 18:39:33 +0000 (21:39 +0300)]
auth: Added %{passdb:field} and %{userdb:field} variables
The field expands to either the passdb or userdb extra field.
You can also use %{passdb:field:defaultvalue} where if field doesn't exist,
it's expanded to defaultvalue. Note that an empty value means that the field
still exists and it's not expanded to defaultvalue.
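The default-value rule can be sketched as below, in Python rather than the auth code's C. The function name is hypothetical, and expanding a missing field without a default to an empty string is an assumption. What matters is that the check is for the field's existence, not for a non-empty value.

```python
def expand_db_field(var, fields):
    """Expand 'passdb:field' or 'passdb:field:defaultvalue' against extra fields."""
    parts = var.split(":", 2)
    name = parts[1]
    # Assumption: a missing field with no default expands to an empty string.
    default = parts[2] if len(parts) > 2 else ""
    # An empty value still counts as existing, so test membership, not truthiness.
    if name in fields:
        return fields[name]
    return default
```

With fields {"home": ""}, "passdb:home:/var/mail" expands to an empty string, and with no "home" field at all it expands to "/var/mail".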
Timo Sirainen [Fri, 29 May 2015 17:55:58 +0000 (20:55 +0300)]
auth: Make sure %{mech} and %{session} are escaped in %var expansion.
%{mech} is already very trusted and %{session} should be only from trusted
sources as well, so this doesn't fix any actual security holes. They are
also unlikely to have ever even been used in anything that requires
escaping.
Timo Sirainen [Fri, 29 May 2015 08:50:18 +0000 (11:50 +0300)]
fts: If precaching fails, stop precaching the rest of the mails.
If there are a lot of mails to be precached, continuing could mean that
precaching is attempted for a long time while every one of the mails fails
the same way.
Timo Sirainen [Mon, 25 May 2015 15:50:48 +0000 (11:50 -0400)]
lib: Avoid race conditions in mkdir*() if directory is being deleted at the same time.
Mainly this allows the call to return failure silently without logging
unnecessary errors.
Pascal Volk [Mon, 25 May 2015 14:27:22 +0000 (14:27 +0000)]
systemd service: Fixed typos in the comment section.
The setting for the file descriptor limit is LimitNOFILE.
Removed quotes around the value infinity. Otherwise systemd will
fail to parse that resource value.
Timo Sirainen [Sun, 24 May 2015 21:40:53 +0000 (17:40 -0400)]
lib-storage: If session_id isn't given, generate a new one.
This is useful for tracking logs written by services that aren't directly
related to any specific user session.
Timo Sirainen [Fri, 22 May 2015 23:07:56 +0000 (19:07 -0400)]
auth: Don't crash if trying to add password with TAB or LF to auth cache.
This would happen only if the passwords were stored as plaintext in passdb
and the valid password actually contained TAB or LF.
Timo Sirainen [Fri, 22 May 2015 02:03:10 +0000 (22:03 -0400)]
lib-fts: ICU normalization changes some characters to spaces - remove them.
We don't really want to add spaces to our index. It would be nice if the
words between spaces were actually split to different tokens, but that's
more of the fts-tokenizer's job and at filter stage that's probably not
wanted anymore.
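A small sketch of the same idea in Python, with the standard unicodedata module standing in for ICU (an assumption; the filter's actual ICU transform is not shown here). NFKC compatibility normalization turns a character such as U+00A8 DIAERESIS into a space followed by a combining mark, and the filter step simply drops the space.

```python
import unicodedata

def normalize_token(token):
    """Normalize a token and remove any spaces the normalization introduced."""
    normalized = unicodedata.normalize("NFKC", token)
    return normalized.replace(" ", "")
```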