From: Tom Lane
Date: Wed, 14 Nov 2007 03:26:24 +0000 (+0000)
Subject: Update discussion of tsearch2 migration. I'm not entirely sure about
X-Git-Tag: REL8_3_BETA3~42
X-Git-Url: http://git.ipfire.org/?a=commitdiff_plain;h=de085820bf7f9dbff4b6c427a7a3689b7909c690;p=thirdparty%2Fpostgresql.git

Update discussion of tsearch2 migration. I'm not entirely sure about
the division of material between here and the tsearch2 contrib page,
but at least it's not obviously unfinished any more.
---

diff --git a/doc/src/sgml/textsearch.sgml b/doc/src/sgml/textsearch.sgml
index e556c6dd78a..0ba401c2a43 100644
--- a/doc/src/sgml/textsearch.sgml
+++ b/doc/src/sgml/textsearch.sgml
@@ -1,4 +1,4 @@
-
+
 Full Text Search
@@ -3489,99 +3489,77 @@ Parser: "pg_catalog.default"
 Migration from Pre-8.3 Text Search
- This area needs lots of work. Here is a quick list of known issues:
+ Applications that used the contrib/tsearch2 add-on module
+ for text searching will need some adjustments to work with the
+ built-in features:
-
+
- The old contrib/tsearch2 objects must be removed from
- the pg_dump output from a pre-8.3 database. While many of them won't
- load for lack of a tsearch2.so library, some do and cause problems.
- We have a working perl script for doing this with a custom- or tar-format
- backup, but there is a proposal to incorporate the functionality directly
- into pg_restore. Neither approach will help for pg_dumpall output.
+ Some functions have been renamed or had small adjustments in their
+ argument lists, and all of them are now in the pg_catalog
+ schema, whereas in a previous installation they would have been in
+ public or another non-system schema. There is a new
+ version of contrib/tsearch2 (see )
+ that provides a compatibility layer to solve most problems in this
+ area.
- The old dump may include schema-qualified references to the old
- contrib/tsearch2 objects; for example public.tsvector
- columns in table definitions. These will fail since the objects
- are now in the pg_catalog schema. Given current pg_dump behavior
- this will happen only for tables that are in a different schema
- from the tsearch2 objects; which makes it more likely to bite
- people who carefully put their tsearch2 objects in a
- non-public schema.
-
-
-
- Question: will restore-time failures of this type happen for
- any objects other than the tsvector and tsquery datatypes?
-
-
-
- The basic alternatives for fixing this seem to involve creating
- a dummy linkage, such as a public.tsvector domain linking to the
- base pg_catalog.tsvector type (which only helps for the datatypes);
- or stripping the schema references out of the dump. We could
- just recommend that users do this manually, or try to provide
- some tools to help.
-
-
-
-
-
- We have renamed the built-in tsvector update triggers, and changed
- their arguments too. This will result in CREATE TRIGGER commands
- failing during load, which can be ignored, but users will need to
- re-issue them with suitable argument adjustment. We probably
- can't automate that for them. Also, the old tsearch2 trigger
- function offered an option to invoke functions, which was removed
- as being a security hole. Users who were relying on that will need to
- write custom trigger functions as a substitute. I think all we
- can do here is document what to do to fix it.
+ The old contrib/tsearch2 functions and other objects
+ must be suppressed when loading pg_dump
+ output from a pre-8.3 database. While many of them won't load anyway,
+ a few will and then cause problems. One simple way to deal with this
+ is to load the new contrib/tsearch2 module before restoring
+ the dump; then it will block the old objects from being loaded.
- We have renamed a number of other functions besides the triggers,
- compared to the tsearch2 versions. This seems unlikely to cause
- any problems during dump/reload but it will require adjustments in
- the bodies of stored procedures and in client application code.
- Again, not much to do except document it.
+ Text search configuration setup is completely different now.
+ Instead of manually inserting rows into configuration tables,
+ search is configured through the specialized SQL commands shown
+ earlier in this chapter. There is not currently any automated
+ support for converting an existing custom configuration for 8.3;
+ you're on your own here.
- Configuration setup is completely different now. Can we provide
- any automated assistance for translating an old custom setup?
- It probably can't be 100% automatic in any case, so maybe documentation
- is the best we can do here too. Aside from the inside-the-database
- differences, outside-the-database configuration files now have
- prescribed location and extensions, which was not true before.
-
-
+ Most types of dictionaries rely on some outside-the-database
+ configuration files. These are largely compatible with pre-8.3
+ usage, but note the following differences:
-
-
- Relocation of configuration from add-on tables into core system catalogs
- will break client queries that looked at the add-on tables.
-
-
+
+
+
+ Configuration files now must be placed in a single specified
+ directory ($SHAREDIR/tsearch_data), and must have
+ a specific extension depending on the type of file, as noted
+ previously in the descriptions of the various dictionary types.
+ This restriction was added to forestall security problems.
+
+
-
-
- Thesaurus files now use ? for stop words.
-
-
+
+
+ Configuration files must be encoded in UTF-8 encoding,
+ regardless of what database encoding is used.
+
+
-
-
- What else?
+
+
+ In thesaurus configuration files, stop words must be marked with
+ ?.
+
+
+
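
Note on the UTF-8 requirement added above: a pre-8.3 dictionary file saved in a legacy encoding would need re-encoding before being placed in $SHAREDIR/tsearch_data. The following is a minimal sketch of one way to do that, not part of this commit. The helper name `reencode_to_utf8`, its `legacy_encoding` parameter, and the `latin-1` default are all assumptions made for illustration; the real legacy encoding depends on how the old file was written.

```python
from pathlib import Path


def reencode_to_utf8(src: Path, dest: Path, legacy_encoding: str = "latin-1") -> bool:
    """Write the contents of src to dest as UTF-8.

    Returns True if the file had to be converted from legacy_encoding,
    False if it was already valid UTF-8 and was copied unchanged.
    Hypothetical helper; the default encoding is only a guess.
    """
    raw = src.read_bytes()
    try:
        # If the bytes already decode as UTF-8, no conversion is needed.
        text = raw.decode("utf-8")
        converted = False
    except UnicodeDecodeError:
        # Otherwise interpret the bytes in the assumed legacy encoding.
        text = raw.decode(legacy_encoding)
        converted = True
    dest.write_bytes(text.encode("utf-8"))
    return converted
```

Writing to a separate destination keeps the original file untouched, so the result can be inspected before it is copied into the tsearch_data directory with the extension required for its file type.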