Timo Sirainen [Tue, 2 Jun 2015 21:46:23 +0000 (00:46 +0300)]
lib-fts: fts-filter API changed to have a non-pointer vfuncs variable.
The main benefit is that the fts-filter implementations can save a few
lines of code.
Timo Sirainen [Tue, 2 Jun 2015 18:56:03 +0000 (21:56 +0300)]
lib-fts: Added string_t *token to struct fts_filter
This makes the work a bit easier for simple filters that don't need any
state but want to use a string_t.
Timo Sirainen [Tue, 2 Jun 2015 16:52:15 +0000 (19:52 +0300)]
fts: Error logging fix.
1) We were logging the error after it was already freed from the data stack.
2) We were logging an uninitialized error string when fts indexing was the
one that failed.
Phil Carmody [Mon, 1 Jun 2015 19:08:43 +0000 (22:08 +0300)]
lib: API change - have uni_utf8_get_char*() return _char_bytes
Often the two functions are called in close proximity (both ways round). As
_get_char*() calls _char_bytes() early on the success path, we may as well
return that value to the caller for immediate use.
The callers which call _char_bytes() first are simply rejecting the truncated
case quickly - all other invalid cases still call both functions, and all
other valid cases (which should be the fast path) likewise call both.
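
A minimal sketch of the return convention described above, not Dovecot's actual code. sketch_utf8_get_char_n() is a hypothetical stand-in for uni_utf8_get_char_n(). The exact non-positive values (0 for truncated input, -1 for invalid input) are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for uni_utf8_get_char_n(), for illustration only.
   Returns the number of bytes in the character on success (> 0),
   0 if the input is truncated, -1 if the encoding is invalid. */
static int sketch_utf8_get_char_n(const unsigned char *input, size_t max_len,
                                  uint32_t *chr_r)
{
    /* Smallest code point that legitimately needs a sequence of each length. */
    static const uint32_t lowest_valid[] = { 0, 0, 0x80, 0x800, 0x10000 };
    uint32_t chr;
    int i, len;

    assert(input != NULL && chr_r != NULL);
    if (max_len == 0)
        return 0;

    if (input[0] < 0x80) {
        *chr_r = input[0];
        return 1;
    }
    if ((input[0] & 0xe0) == 0xc0) {
        len = 2;
        chr = input[0] & 0x1f;
    } else if ((input[0] & 0xf0) == 0xe0) {
        len = 3;
        chr = input[0] & 0x0f;
    } else if ((input[0] & 0xf8) == 0xf0) {
        len = 4;
        chr = input[0] & 0x07;
    } else {
        return -1;
    }

    if ((size_t)len > max_len)
        return 0; /* truncated: the caller needs more input */

    for (i = 1; i < len; i++) {
        if ((input[i] & 0xc0) != 0x80)
            return -1; /* not a continuation byte */
        chr = (chr << 6) | (uint32_t)(input[i] & 0x3f);
    }
    if (chr < lowest_valid[len])
        return -1; /* overlong encoding */
    if (chr > 0x10ffff || (chr >= 0xd800 && chr <= 0xdfff))
        return -1; /* outside Unicode range, or a surrogate */

    *chr_r = chr;
    return len; /* the byte count is handed back for immediate use */
}
```

With this shape a caller can advance its input pointer by the return value whenever it is > 0, without a second call to compute the character length.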
Phil Carmody [Mon, 1 Jun 2015 19:08:27 +0000 (22:08 +0300)]
fts-solr: laxer check of uni_utf8_get_char_n() return value
If uni_utf8_get_char*() were changed to return the number of bytes in the
character on success, then all we care about is it being > 0 (i.e. not
error, not truncated).
Phil Carmody [Mon, 1 Jun 2015 19:07:44 +0000 (22:07 +0300)]
lib: test-unichar - test invalid utf8 encodings
Chop trailing characters off valid encodings, and watch them fail.
(There's no need to do this on most of the test characters, as they're
truncated to the same byte sequence - only do 1 in 64.)
Phil Carmody [Mon, 1 Jun 2015 19:06:44 +0000 (22:06 +0300)]
lib: test-unichar - streamline the unichars test
It's doing 2 kinds of tests, so split them into separate test cases.
And the first part has started to get expensive, so just make sure
all code paths are tested by skipping most values. Only 3 from each
set of 64 (lowest 6 bits) are tested.
Timo Sirainen [Mon, 1 Jun 2015 18:58:30 +0000 (21:58 +0300)]
lib-fts: tokenizers - Fixed removal of trailing character in truncated tokens.
If the token is truncated, we don't want to remove the trailing character
since it's not actually there.
Also we don't want to remove trailing apostrophes from a truncated word,
because they're not actually at the end of the (untruncated) token there.
This doesn't make a big difference, but it's slightly more correct.
Timo Sirainen [Mon, 1 Jun 2015 18:35:39 +0000 (21:35 +0300)]
lib-fts: simple tokenizer minor cleanup - removed unnecessary token length > 0 check
fts_tokenizer_generic_simple_current_token() will check it in any case.
Timo Sirainen [Mon, 1 Jun 2015 18:27:09 +0000 (21:27 +0300)]
lib-fts: simple tokenizer cleanup - make prev_letter updating more explicit.
Until now it was hidden inside one of the functions, which made
prev_letter inconsistent when a word break was found. It didn't actually
matter what the prev_letter was at that point, but now the behavior is more
consistent.
Teemu Huovila [Mon, 1 Jun 2015 15:35:58 +0000 (18:35 +0300)]
lib-fts: Change TR29 tokenizer to break at full stop (and others).
Diverge from the TR29 rules and always break at MidNumLet letters.
This fixes tokenizing first.last@domain.tld email addresses.
Timo Sirainen [Fri, 29 May 2015 18:39:33 +0000 (21:39 +0300)]
auth: Added %{passdb:field} and %{userdb:field} variables
The field expands to either the passdb or userdb extra field.
You can also use %{passdb:field:defaultvalue} where if field doesn't exist,
it's expanded to defaultvalue. Note that an empty value means that the field
still exists and it's not expanded to defaultvalue.
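
The lookup semantics described above could be sketched roughly as follows. This is not the auth code itself: struct sketch_field and sketch_expand_field() are hypothetical names. Expanding a missing field without a default to an empty string is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical representation of a passdb/userdb extra field. */
struct sketch_field {
    const char *name;
    const char *value;
};

/* spec is "field" or "field:defaultvalue". Returns the field's value when
   the field exists (even if the value is empty), otherwise the default.
   Assumption: with no default given, a missing field expands to "". */
static const char *sketch_expand_field(const struct sketch_field *fields,
                                       size_t count, const char *spec)
{
    const char *colon, *default_value;
    size_t name_len, i;

    assert(fields != NULL || count == 0);
    assert(spec != NULL);

    colon = strchr(spec, ':');
    name_len = colon != NULL ? (size_t)(colon - spec) : strlen(spec);
    default_value = colon != NULL ? colon + 1 : "";

    for (i = 0; i < count; i++) {
        if (strlen(fields[i].name) == name_len &&
            strncmp(fields[i].name, spec, name_len) == 0)
            return fields[i].value; /* may be "": the field still exists */
    }
    return default_value;
}
```

The point of the sketch is the distinction in the last sentence of the commit message: an existing field with an empty value wins over the default.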
Timo Sirainen [Fri, 29 May 2015 17:55:58 +0000 (20:55 +0300)]
auth: Make sure %{mech} and %{session} are escaped in %var expansion.
%{mech} is already very trusted and %{session} should be only from trusted
sources as well, so this doesn't fix any actual security holes. They are
also unlikely to have ever even been used in anything that requires
escaping.
Timo Sirainen [Fri, 29 May 2015 08:50:18 +0000 (11:50 +0300)]
fts: If precaching fails, stop precaching the rest of the mails.
Otherwise, if there are a lot of mails to be precached, the precaching
could be attempted for a long time with every one of them failing the same
way.
Timo Sirainen [Mon, 25 May 2015 15:50:48 +0000 (11:50 -0400)]
lib: Avoid race conditions in mkdir*() if directory is being deleted at the same time.
Mainly this allows the call to return failure silently without logging
unnecessary errors.
Pascal Volk [Mon, 25 May 2015 14:27:22 +0000 (14:27 +0000)]
systemd service: Fixed typos in the comment section.
The setting for the file descriptor limit is LimitNOFILE.
Removed quotes around the value infinity. Otherwise systemd will
fail to parse that resource value.
Timo Sirainen [Sun, 24 May 2015 21:40:53 +0000 (17:40 -0400)]
lib-storage: If session_id isn't given, generate a new one.
This is useful for tracking logs written by services that aren't directly
related to any specific user session.
Timo Sirainen [Fri, 22 May 2015 23:07:56 +0000 (19:07 -0400)]
auth: Don't crash if trying to add password with TAB or LF to auth cache.
This would happen only if the passwords were stored as plaintext in passdb
and the valid password actually contained TAB or LF.
Timo Sirainen [Fri, 22 May 2015 02:03:10 +0000 (22:03 -0400)]
lib-fts: ICU normalization changes some characters to spaces - remove them.
We don't really want to add spaces to our index. It would be nice if the
words between spaces were actually split to different tokens, but that's
more of the fts-tokenizer's job and at filter stage that's probably not
wanted anymore.
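
As a rough illustration of the space removal described above, here is an in-place compaction over a NUL-terminated token. sketch_remove_spaces() is a hypothetical helper, and handling only U+0020 is an assumption. The real filter operates on its own string type.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: drop every ASCII space from token in place and
   return the new length. Only U+0020 is handled in this sketch. */
static size_t sketch_remove_spaces(char *token)
{
    size_t r, w = 0;

    assert(token != NULL);
    for (r = 0; token[r] != '\0'; r++) {
        if (token[r] != ' ')
            token[w++] = token[r];
    }
    token[w] = '\0';
    return w;
}
```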
Teemu Huovila [Thu, 21 May 2015 10:29:15 +0000 (06:29 -0400)]
lib-fts: Fix simple tokenizer apostrophe handling.
Apostrophes and quotation marks are now treated as word breaks,
except U+0027 between non-wordbreak characters. The characters
U+2019 and U+FF07 are transformed to U+0027 before processing.
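
The rule above can be sketched over an array of code points as follows. This is not the tokenizer's code: all sketch_* names are hypothetical, and the set of "plain" word break characters is reduced to a few ASCII characters for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Map the apostrophe variants named in the commit message to U+0027. */
static uint32_t sketch_normalize_apostrophe(uint32_t c)
{
    return (c == 0x2019 || c == 0xFF07) ? 0x0027 : c;
}

/* Assumption: a tiny stand-in for the tokenizer's real word break set. */
static bool sketch_is_plain_break(uint32_t c)
{
    return c == ' ' || c == '\t' || c == '\n' || c == '"';
}

/* Returns true if text[i] acts as a word break. An apostrophe breaks
   unless both of its neighbors exist and are non-wordbreak characters. */
static bool sketch_is_break_at(const uint32_t *text, size_t len, size_t i)
{
    uint32_t c, prev, next;

    assert(text != NULL && i < len);
    c = sketch_normalize_apostrophe(text[i]);
    if (c != 0x0027)
        return sketch_is_plain_break(c);
    if (i == 0 || i + 1 == len)
        return true; /* leading or trailing apostrophe */

    prev = sketch_normalize_apostrophe(text[i - 1]);
    next = sketch_normalize_apostrophe(text[i + 1]);
    return prev == 0x0027 || next == 0x0027 ||
           sketch_is_plain_break(prev) || sketch_is_plain_break(next);
}
```

Under these assumptions the apostrophe in "don’t" stays inside the token, while a leading, trailing, or doubled apostrophe splits it.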
Timo Sirainen [Mon, 18 May 2015 11:53:52 +0000 (14:53 +0300)]
lib-fts: Partially reverted d097a9779c37 - don't use lib_atexit()
Because fts is loaded as a plugin, lib_atexit() is called after the plugin
is already unloaded, so it crashes.
Timo Sirainen [Mon, 18 May 2015 11:49:15 +0000 (07:49 -0400)]
director: Added "up" vs "down" states and doveadm director up/down commands.
These commands are intended to be used by automated watchdogs that detect if
backends are up or down. This way the vhost count doesn't get forgotten
after a server goes down. It also means that an admin can manually take down a
server by setting its vhost count to 0 without the watchdog automatically
bringing it back up.
Timo Sirainen [Sat, 16 May 2015 15:47:20 +0000 (18:47 +0300)]
lib-fts: Rewrite ICU handling functions.
Some of the changes:
- Use buffers instead of allocating everything from data stack.
- Optimistically attempt to write the data directly to the buffers without
first calculating their size. Grow the buffer if the data doesn't fit on
the first attempt.
- Use u_strFromUTF8Lenient() instead of u_strFromUTF8(). Our input is
already supposed to be valid UTF-8, although we don't check if all code
points are valid, while u_strFromUTF8() does check them and returns failures.
We don't really care whether code points are valid or not, and
u_strFromUTF8Lenient() passes through everything.
Added unit tests to make sure all the functions work as intended and all the
UTF-8 input passes through them successfully.
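
The optimistic write-then-grow approach from the list above can be sketched with snprintf() standing in for the ICU conversion call, since both report the size they needed when the destination is too small. struct sketch_buf and sketch_format_grow() are hypothetical names, not part of lib-fts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical growable buffer, for illustration only. */
struct sketch_buf {
    char *data;
    size_t size;
};

/* Try to write straight into the existing buffer. Only if the output
   did not fit, grow the buffer to the reported size and write again.
   Returns the number of characters written, or -1 on failure. */
static int sketch_format_grow(struct sketch_buf *buf, const char *prefix,
                              const char *word)
{
    int needed;

    assert(buf != NULL && buf->data != NULL && buf->size > 0);
    assert(prefix != NULL && word != NULL);

    /* First attempt: no size calculation up front. */
    needed = snprintf(buf->data, buf->size, "%s%s", prefix, word);
    if (needed < 0)
        return -1;

    if ((size_t)needed >= buf->size) {
        /* Did not fit: grow to exactly what was reported, then retry. */
        size_t new_size = (size_t)needed + 1;
        char *new_data = realloc(buf->data, new_size);

        if (new_data == NULL)
            return -1;
        buf->data = new_data;
        buf->size = new_size;

        needed = snprintf(buf->data, buf->size, "%s%s", prefix, word);
        if (needed < 0 || (size_t)needed >= buf->size)
            return -1;
    }
    return needed;
}
```

In the common case the buffer is already large enough and the conversion runs once. The second pass is only paid when the buffer has to grow.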