From: Tom Lane
Date: Fri, 27 Mar 2026 21:41:00 +0000 (-0400)
Subject: Doc: split functions-posix-regexp section into multiple subsections.
X-Git-Url: http://git.ipfire.org/cgi-bin/gitweb.cgi?a=commitdiff_plain;h=00c025a001170979e99706ce746f75fcc615761d;p=thirdparty%2Fpostgresql.git

Doc: split functions-posix-regexp section into multiple subsections.

Create a section for each function that the previous text described in one long series of paragraphs. Also split the functions' previously in-line syntax summaries into clauses, which is more readable and allows us to sneak in an explicit mention of the result data type. This change gives us an opportunity to make cross-reference links more specific, too, so do that.

Author: jian he
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CACJufxFuk9P=P4=BZ=qCkgvo6im8aL8NnCkjxx2S2MQDWNdouw@mail.gmail.com
---

diff --git a/doc/src/sgml/func/func-json.sgml b/doc/src/sgml/func/func-json.sgml
index 1ec73cff464..839208c9c83 100644
--- a/doc/src/sgml/func/func-json.sgml
+++ b/doc/src/sgml/func/func-json.sgml
@@ -3149,7 +3149,7 @@ $[*] ? (@ like_regex "^[aeiou]" flag "i") LIKE_REGEX operator. Therefore, the like_regex filter is implemented using the POSIX regular expression engine described in - . This leads to various minor + . This leads to various minor discrepancies from standard SQL/JSON behavior, which are cataloged in . 
Note, however, that the flag-letter incompatibilities described there
diff --git a/doc/src/sgml/func/func-matching.sgml b/doc/src/sgml/func/func-matching.sgml
index b159137f93a..af60e9898de 100644
--- a/doc/src/sgml/func/func-matching.sgml
+++ b/doc/src/sgml/func/func-matching.sgml
@@ -417,36 +417,6 @@ substring('foobar' SIMILAR '#"o_b#"%' ESCAPE '#') NULL regular expression pattern matching - - substring - - - regexp_count - - - regexp_instr - - - regexp_like - - - regexp_match - - - regexp_matches - - - regexp_replace - - - regexp_split_to_table - - - regexp_split_to_array - - - regexp_substr - lists the available
@@ -569,15 +539,34 @@ substring('foobar' SIMILAR '#"o_b#"%' ESCAPE '#') NULL The POSIX pattern language is described in much - greater detail below. + greater detail in . + + POSIX Regular Expression Functions + - The substring function with two parameters, - substring(string from - pattern), provides extraction of a - substring - that matches a POSIX regular expression pattern. It returns null if + This section describes the available functions for pattern matching + using POSIX regular expressions. + + + + <function>substring</function> + + substring + + + + The substring function with two parameters + provides extraction of a substring that matches a POSIX regular + expression pattern. It has the syntax: + +substring(string from pattern) text +substring(string, pattern) text + + (The syntax with from is SQL-standard, but + PostgreSQL also accepts a comma.) + It returns null if there is no match, otherwise the first portion of the text that matched the pattern. But if the pattern contains any parentheses, the portion of the text that matched the first parenthesized subexpression (the
@@ -586,7 +575,7 @@ substring('foobar' SIMILAR '#"o_b#"%' ESCAPE '#') NULL. 
@@ -596,16 +585,21 @@ substring('foobar' FROM 'o.b') oob substring('foobar' FROM 'o(.)b') o + + + + <function>regexp_count</function> + + regexp_count + The regexp_count function counts the number of places where a POSIX regular expression pattern matches a string. - It has the syntax - regexp_count(string, - pattern - , start - , flags - ). + It has the syntax: + +regexp_count(string, pattern , start , flags ) integer + pattern is searched for in string, normally from the beginning of the string, but if the start parameter is
@@ -625,20 +619,22 @@ regexp_count('ABCABCAXYaxy', 'A.') 3 regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') 4 + + + + <function>regexp_instr</function> + + regexp_instr + The regexp_instr function returns the starting or ending position of the N'th match of a POSIX regular expression pattern to a string, or zero if there is no - such match. It has the syntax - regexp_instr(string, - pattern - , start - , N - , endoption - , flags - , subexpr - ). + such match. It has the syntax: + +regexp_instr(string, pattern , start , N , endoption , flags , subexpr ) integer + pattern is searched for in string, normally from the beginning of the string, but if the start parameter is
@@ -674,14 +670,21 @@ regexp_instr(string=>'ABCDEFGHI', pattern=>'(c..)(...)', start=>1, "N"=>1, endop 6 + + + + <function>regexp_like</function> + + regexp_like + The regexp_like function checks whether a match of a POSIX regular expression pattern occurs within a string, - returning boolean true or false. It has the syntax - regexp_like(string, - pattern - , flags ). + returning boolean true or false. It has the syntax: + +regexp_like(string, pattern , flags ) boolean + The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. 
Supported flags are described
@@ -699,13 +702,21 @@ regexp_like('Hello World', 'world') false regexp_like('Hello World', 'world', 'i') true + + + + <function>regexp_match</function> + + regexp_match + The regexp_match function returns a text array of matching substring(s) within the first match of a POSIX - regular expression pattern to a string. It has the syntax - regexp_match(string, - pattern , flags ). + regular expression pattern to a string. It has the syntax: + +regexp_match(string, pattern , flags ) text[] + If there is no match, the result is NULL. If a match is found, and the pattern contains no parenthesized subexpressions, then the result is a single-element text
@@ -715,7 +726,7 @@ regexp_like('Hello World', 'world', 'i') true whose n'th element is the substring matching the n'th parenthesized subexpression of the pattern (not counting non-capturing - parentheses; see below for details). + parentheses; see for details). The flags parameter is an optional text string containing zero or more single-letter flags that change the function's behavior. Supported flags are described
@@ -757,12 +768,23 @@ SELECT (regexp_match('foobarbequebaz', 'bar.*que'))[1]; + + + + <function>regexp_matches</function> + + regexp_matches + The regexp_matches function returns a set of text arrays of matching substring(s) within matches of a POSIX regular - expression pattern to a string. It has the same syntax as - regexp_match. + expression pattern to a string. It has the syntax: + +regexp_matches(string, pattern , flags ) setof text[] + + The parameters are the same as + for regexp_match. This function returns no rows if there is no match, one row if there is a match and the g flag is not given, or N rows if there are N matches and the g flag
@@ -811,20 +833,22 @@ SELECT col1, (SELECT regexp_matches(col2, '(bar)(beque)')) FROM tab; without a match, which is typically not the desired behavior. 
+ + + + <function>regexp_replace</function> + + regexp_replace + The regexp_replace function provides substitution of new text for substrings that match POSIX regular expression patterns. - It has the syntax - regexp_replace(string, - pattern, replacement - , flags ) - or - regexp_replace(string, - pattern, replacement, - start - , N - , flags ). + It has the syntax: + +regexp_replace(string, pattern, replacement , flags ) text +regexp_replace(string, pattern, replacement, start , N , flags ) text + The source string is returned unchanged if there is no match to the pattern. If there is a match, the string is returned with the
@@ -872,12 +896,20 @@ regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacemen A PostgrXSQL function + + + + <function>regexp_split_to_table</function> + + regexp_split_to_table + The regexp_split_to_table function splits a string using a POSIX - regular expression pattern as a delimiter. It has the syntax - regexp_split_to_table(string, pattern - , flags ). + regular expression pattern as a delimiter. It has the syntax: + +regexp_split_to_table(string, pattern , flags ) setof text + If there is no match to the pattern, the function returns the string. If there is at least one match, for each match it returns the text from the end of the last match (or the beginning of the string)
@@ -889,15 +921,6 @@ regexp_replace(string=>'A PostgreSQL function', pattern=>'a|e|i|o|u', replacemen . - - The regexp_split_to_array function behaves the same as - regexp_split_to_table, except that regexp_split_to_array - returns its result as an array of text. It has the syntax - regexp_split_to_array(string, pattern - , flags ). - The parameters are the same as for regexp_split_to_table. 
- - Some examples:
@@ -915,12 +938,6 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox jumps over the lazy d dog (9 rows) -SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+'); - regexp_split_to_array ------------------------------------------------ - {the,quick,brown,fox,jumps,over,the,lazy,dog} -(1 row) - SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo; foo -----
@@ -945,25 +962,61 @@ SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo; - As the last example demonstrates, the regexp split functions ignore + As the last example demonstrates, + regexp_split_to_table ignores zero-length matches that occur at the start or end of the string or immediately after a previous match. This is contrary to the strict definition of regexp matching that is implemented by the other regexp functions, but is usually the most convenient behavior in practice. Other software systems such as Perl use similar definitions. + + + + <function>regexp_split_to_array</function> + + regexp_split_to_array + + + + The regexp_split_to_array function behaves the + same as + regexp_split_to_table, + except that regexp_split_to_array returns its + result as an array of text rather than a set. It has + the syntax: + +regexp_split_to_array(string, pattern , flags ) text[] + + The parameters are the same as + for regexp_split_to_table. + + + + An example: + +SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+'); + regexp_split_to_array +----------------------------------------------- + {the,quick,brown,fox,jumps,over,the,lazy,dog} +(1 row) + + + + + + <function>regexp_substr</function> + + regexp_substr + The regexp_substr function returns the substring that matches a POSIX regular expression pattern, - or NULL if there is no match. It has the syntax - regexp_substr(string, - pattern - , start - , N - , flags - , subexpr - ). + or NULL if there is no match. 
It has the syntax: + +regexp_substr(string, pattern , start , N , flags , subexpr ) text + pattern is searched for in string, normally from the beginning of the string, but if the start parameter is
@@ -993,11 +1046,13 @@ regexp_substr('ABCDEFGHI', '(c..)(...)', 1, 1, 'i', 2) FGH + + - Regular Expression Details + POSIX Regular Expression Details PostgreSQL's regular expressions are implemented
diff --git a/doc/src/sgml/func/func-string.sgml b/doc/src/sgml/func/func-string.sgml
index 7ad1436e5f8..0786573d7be 100644
--- a/doc/src/sgml/func/func-string.sgml
+++ b/doc/src/sgml/func/func-string.sgml
@@ -431,7 +431,7 @@ Extracts the first substring matching POSIX regular expression; see - . + . substring('Thomas' FROM '...$')
@@ -961,7 +961,7 @@ Returns the number of times the POSIX regular expression pattern matches in the string; see - . + . regexp_count('123456789012', '\d\d\d', 2)
@@ -986,7 +986,7 @@ Returns the position within string where the N'th match of the POSIX regular expression pattern occurs, or zero if there is - no such match; see . + no such match; see . regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')
@@ -1011,7 +1011,7 @@ Checks whether a match of the POSIX regular expression pattern occurs within string; see - . + . regexp_like('Hello World', 'world$', 'i')
@@ -1031,7 +1031,7 @@ Returns substrings within the first match of the POSIX regular expression pattern to the string; see - . + . regexp_match('foobarbequebaz', '(bar)(beque)')
@@ -1052,7 +1052,7 @@ expression pattern to the string, or substrings within all such matches if the g flag is used; - see . + see . regexp_matches('foobarbequebaz', 'ba.', 'g')
@@ -1077,7 +1077,7 @@ Replaces the substring that is the first match to the POSIX regular expression pattern, or all such matches if the g flag is used; see - . + . regexp_replace('Thomas', '.[mN]a.', 'M')
@@ -1100,7 +1100,7 @@ search beginning at the start'th character of string. If N is omitted, it defaults to 1. See - . + . 
regexp_replace('Thomas', '.', 'X', 3, 2)
@@ -1123,7 +1123,7 @@ Splits string using a POSIX regular expression as the delimiter, producing an array of results; see - . + . regexp_split_to_array('hello world', '\s+')
@@ -1142,7 +1142,7 @@ Splits string using a POSIX regular expression as the delimiter, producing a set of results; see - . + . regexp_split_to_table('hello world', '\s+')
@@ -1171,7 +1171,7 @@ matches the N'th occurrence of the POSIX regular expression pattern, or NULL if there is no such match; see - . + . regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i')
diff --git a/doc/src/sgml/ref/psql-ref.sgml b/doc/src/sgml/ref/psql-ref.sgml
index 18ba22b40d6..7c05afd4719 100644
--- a/doc/src/sgml/ref/psql-ref.sgml
+++ b/doc/src/sgml/ref/psql-ref.sgml
@@ -4131,7 +4131,7 @@ SELECT 1\; SELECT 2\; SELECT 3; Advanced users can use regular-expression notations such as character classes, for example [0-9] to match any digit. All regular expression special characters work as specified in - , except for . which + , except for . which is taken as a separator as mentioned above, * which is translated to the regular-expression notation .*, ? which is translated to ., and