git.ipfire.org Git - thirdparty/dovecot/core.git/commit

author	Timo Sirainen <timo.sirainen@open-xchange.com>
	Tue, 26 Oct 2021 13:59:29 +0000 (16:59 +0300)
committer	aki.tuomi <aki.tuomi@open-xchange.com>
	Mon, 8 Nov 2021 10:31:23 +0000 (10:31 +0000)
commit	e18843502604d9f4317000923a7493e8f6c8b132
tree	aafe5fc395181c5c08a67d9cbc55d628f9d354c7
parent	2af8437d1d19f1fba76a835c05878f19d64e9b72

lib-fts: Fix address tokenizer to handle large input properly

Previously it could use excessive amounts of memory if the input
contained no separator characters.

The fix slightly changes how the address tokenizer works. Previously,
overly long email addresses were saved as truncated tokens; now the
address tokenizer skips them entirely. Similarly, when searching, long
email addresses are no longer searched as truncated tokens, but are
instead simply fed to the parent tokenizer, which (likely) searches them
in smaller pieces.
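
The skip-instead-of-truncate behavior described above can be sketched
roughly as follows. This is a minimal illustration, not the code from
fts-tokenizer-address.c: the names addr_scanner, addr_scanner_feed,
scan_addresses and is_separator, the MAX_ADDR_LEN limit and the
separator set are all hypothetical. It only shows how a fixed-size
candidate buffer keeps memory bounded when the input has no separators,
and how an oversized candidate is dropped rather than truncated. Handing
the skipped text to a parent tokenizer is not modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical limit, chosen small for illustration only. */
#define MAX_ADDR_LEN 16

struct addr_scanner {
	char buf[MAX_ADDR_LEN + 1]; /* fixed-size candidate buffer */
	size_t len;                 /* bytes currently buffered */
	bool overflowed;            /* candidate too long: skip to next separator */
};

/* Assumed separator set; the real tokenizer's rules differ. */
static bool is_separator(char c)
{
	return c == ' ' || c == '\t' || c == '\n' || c == ',';
}

/* Feed one character. Returns true when a complete candidate that fits
   within MAX_ADDR_LEN is available, NUL-terminated, in s->buf. */
static bool addr_scanner_feed(struct addr_scanner *s, char c)
{
	if (is_separator(c)) {
		bool ready = !s->overflowed && s->len > 0;
		s->buf[s->len] = '\0';
		s->len = 0;
		s->overflowed = false;
		return ready;
	}
	if (s->overflowed)
		return false; /* still inside an oversized candidate */
	if (s->len == MAX_ADDR_LEN) {
		/* Too long: drop what was buffered instead of keeping a
		   truncated token, and ignore the rest of the candidate. */
		s->overflowed = true;
		s->len = 0;
		return false;
	}
	s->buf[s->len++] = c;
	return false;
}

/* Copy every candidate that fits and contains '@' into out[].
   Returns the number of candidates stored (at most max_out). */
static size_t scan_addresses(const char *input,
			     char out[][MAX_ADDR_LEN + 1], size_t max_out)
{
	struct addr_scanner s = { .len = 0, .overflowed = false };
	size_t input_len = strlen(input);
	size_t n = 0;

	assert(max_out > 0);
	/* The extra iteration at i == input_len feeds a final separator
	   so that the last candidate is flushed. */
	for (size_t i = 0; i <= input_len; i++) {
		char c = i < input_len ? input[i] : ' ';

		if (addr_scanner_feed(&s, c) &&
		    strchr(s.buf, '@') != NULL && n < max_out)
			strcpy(out[n++], s.buf);
	}
	return n;
}
```

With this sketch, scanning an input that mixes short addresses with one
address longer than MAX_ADDR_LEN stores only the short ones, and the
scanner never buffers more than MAX_ADDR_LEN bytes regardless of how
long the input runs without a separator.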

Note that this also sometimes changes the order in which tokens are
returned, e.g. "foo", "example", "foo@example.com", "com" instead of
returning "com" before the email address. This isn't ideal, but fixing it
seems annoyingly complicated and practically it doesn't matter right now.

src/lib-fts/fts-tokenizer-address.c
src/lib-fts/test-fts-tokenizer.c