<primary>regular expression</primary>
<seealso>pattern matching</seealso>
</indexterm>
- <indexterm>
- <primary>substring</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_count</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_instr</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_like</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_match</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_matches</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_replace</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_split_to_table</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_split_to_array</primary>
- </indexterm>
- <indexterm>
- <primary>regexp_substr</primary>
- </indexterm>
<para>
<xref linkend="functions-posix-table"/> lists the available
<para>
The <acronym>POSIX</acronym> pattern language is described in much
- greater detail below.
+ greater detail in <xref linkend="posix-syntax-details"/>.
</para>
+ <sect3 id="functions-posix-list">
+ <title>POSIX Regular Expression Functions</title>
+
<para>
- The <function>substring</function> function with two parameters,
- <function>substring(<replaceable>string</replaceable> from
- <replaceable>pattern</replaceable>)</function>, provides extraction of a
- substring
- that matches a POSIX regular expression pattern. It returns null if
+ This section describes the available functions for pattern matching
+ using POSIX regular expressions.
+ </para>
+
+ <sect4 id="functions-posix-substring">
+ <title><function>substring</function></title>
+ <indexterm>
+ <primary>substring</primary>
+ </indexterm>
+
+ <para>
+ The <function>substring</function> function with two parameters
+ extracts a substring that matches a POSIX regular expression
+ pattern. It has the syntax:
+<synopsis>
+substring(<replaceable>string</replaceable> from <replaceable>pattern</replaceable>) <returnvalue>text</returnvalue>
+substring(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>) <returnvalue>text</returnvalue>
+</synopsis>
+ (The syntax with <literal>from</literal> is SQL-standard, but
+ <productname>PostgreSQL</productname> also accepts a comma.)
+ It returns null if
there is no match, otherwise the first portion of the text that matched the
pattern. But if the pattern contains any parentheses, the portion
of the text that matched the first parenthesized subexpression (the
if you want to use parentheses within it without triggering this
exception. If you need parentheses in the pattern before the
subexpression you want to extract, see the non-capturing parentheses
- described below.
+ described in <xref linkend="posix-atoms-table"/>.
</para>
<para>
substring('foobar' FROM 'o(.)b') <lineannotation>o</lineannotation>
</programlisting>
</para>
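+ <para>
+ The comma-separated form takes the same arguments and returns the
+ same results; for example:
+<programlisting>
+substring('foobar', 'o.b')     <lineannotation>oob</lineannotation>
+substring('foobar', 'o(.)b')   <lineannotation>o</lineannotation>
+</programlisting>
+ </para>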
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-count">
+ <title><function>regexp_count</function></title>
+ <indexterm>
+ <primary>regexp_count</primary>
+ </indexterm>
<para>
The <function>regexp_count</function> function counts the number of
places where a POSIX regular expression pattern matches a string.
- It has the syntax
- <function>regexp_count</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>
- <optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>flags</replaceable>
- </optional></optional>).
+ It has the syntax:
+<synopsis>
+regexp_count(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable> <optional>, <replaceable>flags</replaceable> </optional></optional>) <returnvalue>integer</returnvalue>
+</synopsis>
<replaceable>pattern</replaceable> is searched for
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
regexp_count('ABCABCAXYaxy', 'A.', 1, 'i') <lineannotation>4</lineannotation>
</programlisting>
</para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-instr">
+ <title><function>regexp_instr</function></title>
+ <indexterm>
+ <primary>regexp_instr</primary>
+ </indexterm>
<para>
The <function>regexp_instr</function> function returns the starting or
ending position of the <replaceable>N</replaceable>'th match of a
POSIX regular expression pattern to a string, or zero if there is no
- such match. It has the syntax
- <function>regexp_instr</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>
- <optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
- <optional>, <replaceable>endoption</replaceable>
- <optional>, <replaceable>flags</replaceable>
- <optional>, <replaceable>subexpr</replaceable>
- </optional></optional></optional></optional></optional>).
+ such match. It has the syntax:
+<synopsis>
+regexp_instr(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable> <optional>, <replaceable>N</replaceable> <optional>, <replaceable>endoption</replaceable> <optional>, <replaceable>flags</replaceable> <optional>, <replaceable>subexpr</replaceable> </optional></optional></optional></optional></optional>) <returnvalue>integer</returnvalue>
+</synopsis>
<replaceable>pattern</replaceable> is searched for
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
<lineannotation>6</lineannotation>
</programlisting>
</para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-like">
+ <title><function>regexp_like</function></title>
+ <indexterm>
+ <primary>regexp_like</primary>
+ </indexterm>
<para>
The <function>regexp_like</function> function checks whether a match
of a POSIX regular expression pattern occurs within a string,
- returning boolean true or false. It has the syntax
- <function>regexp_like</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>
- <optional>, <replaceable>flags</replaceable> </optional>).
+ returning boolean true or false. It has the syntax:
+<synopsis>
+regexp_like(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>boolean</returnvalue>
+</synopsis>
The <replaceable>flags</replaceable> parameter is an optional text
string containing zero or more single-letter flags that change the
function's behavior. Supported flags are described
regexp_like('Hello World', 'world', 'i') <lineannotation>true</lineannotation>
</programlisting>
</para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-match">
+ <title><function>regexp_match</function></title>
+ <indexterm>
+ <primary>regexp_match</primary>
+ </indexterm>
<para>
The <function>regexp_match</function> function returns a text array of
matching substring(s) within the first match of a POSIX
- regular expression pattern to a string. It has the syntax
- <function>regexp_match</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>).
+ regular expression pattern to a string. It has the syntax:
+<synopsis>
+regexp_match(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>text[]</returnvalue>
+</synopsis>
If there is no match, the result is <literal>NULL</literal>.
If a match is found, and the <replaceable>pattern</replaceable> contains no
parenthesized subexpressions, then the result is a single-element text
whose <replaceable>n</replaceable>'th element is the substring matching
the <replaceable>n</replaceable>'th parenthesized subexpression of
the <replaceable>pattern</replaceable> (not counting <quote>non-capturing</quote>
- parentheses; see below for details).
+ parentheses; see <xref linkend="posix-atoms-table"/> for details).
The <replaceable>flags</replaceable> parameter is an optional text string
containing zero or more single-letter flags that change the function's
behavior. Supported flags are described
</programlisting>
</para>
</tip>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-matches">
+ <title><function>regexp_matches</function></title>
+ <indexterm>
+ <primary>regexp_matches</primary>
+ </indexterm>
<para>
The <function>regexp_matches</function> function returns a set of text arrays
of matching substring(s) within matches of a POSIX regular
- expression pattern to a string. It has the same syntax as
- <function>regexp_match</function>.
+ expression pattern to a string. It has the syntax:
+<synopsis>
+regexp_matches(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>setof text[]</returnvalue>
+</synopsis>
+ The parameters are the same as
+ for <link linkend="functions-posix-regexp-match"><function>regexp_match</function></link>.
This function returns no rows if there is no match, one row if there is
a match and the <literal>g</literal> flag is not given, or <replaceable>N</replaceable>
rows if there are <replaceable>N</replaceable> matches and the <literal>g</literal> flag
without a match, which is typically not the desired behavior.
</para>
</tip>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-replace">
+ <title><function>regexp_replace</function></title>
+ <indexterm>
+ <primary>regexp_replace</primary>
+ </indexterm>
<para>
The <function>regexp_replace</function> function provides substitution of
new text for substrings that match POSIX regular expression patterns.
- It has the syntax
- <function>regexp_replace</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>
- <optional>, <replaceable>flags</replaceable> </optional>)
- or
- <function>regexp_replace</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>,
- <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
- <optional>, <replaceable>flags</replaceable> </optional></optional>).
+ It has the syntax:
+<synopsis>
+regexp_replace(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>text</returnvalue>
+regexp_replace(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>, <replaceable>replacement</replaceable>, <replaceable>start</replaceable> <optional>, <replaceable>N</replaceable> <optional>, <replaceable>flags</replaceable> </optional></optional>) <returnvalue>text</returnvalue>
+</synopsis>
The source <replaceable>string</replaceable> is returned unchanged if
there is no match to the <replaceable>pattern</replaceable>. If there is a
match, the <replaceable>string</replaceable> is returned with the
<lineannotation>A PostgrXSQL function</lineannotation>
</programlisting>
</para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-split-to-table">
+ <title><function>regexp_split_to_table</function></title>
+ <indexterm>
+ <primary>regexp_split_to_table</primary>
+ </indexterm>
<para>
The <function>regexp_split_to_table</function> function splits a string using a POSIX
- regular expression pattern as a delimiter. It has the syntax
- <function>regexp_split_to_table</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>
- <optional>, <replaceable>flags</replaceable> </optional>).
+ regular expression pattern as a delimiter. It has the syntax:
+<synopsis>
+regexp_split_to_table(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>setof text</returnvalue>
+</synopsis>
If there is no match to the <replaceable>pattern</replaceable>, the function returns the
<replaceable>string</replaceable>. If there is at least one match, for each match it returns
the text from the end of the last match (or the beginning of the string)
<xref linkend="posix-embedded-options-table"/>.
</para>
- <para>
- The <function>regexp_split_to_array</function> function behaves the same as
- <function>regexp_split_to_table</function>, except that <function>regexp_split_to_array</function>
- returns its result as an array of <type>text</type>. It has the syntax
- <function>regexp_split_to_array</function>(<replaceable>string</replaceable>, <replaceable>pattern</replaceable>
- <optional>, <replaceable>flags</replaceable> </optional>).
- The parameters are the same as for <function>regexp_split_to_table</function>.
- </para>
-
<para>
Some examples:
<programlisting>
dog
(9 rows)
-SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
- regexp_split_to_array
------------------------------------------------
- {the,quick,brown,fox,jumps,over,the,lazy,dog}
-(1 row)
-
SELECT foo FROM regexp_split_to_table('the quick brown fox', '\s*') AS foo;
foo
-----
</para>
<para>
- As the last example demonstrates, the regexp split functions ignore
+ As the last example demonstrates,
+ <function>regexp_split_to_table</function> ignores
zero-length matches that occur at the start or end of the string
or immediately after a previous match. This is contrary to the strict
definition of regexp matching that is implemented by
the other regexp functions, but is usually the most convenient behavior
in practice. Other software systems such as Perl use similar definitions.
</para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-split-to-array">
+ <title><function>regexp_split_to_array</function></title>
+ <indexterm>
+ <primary>regexp_split_to_array</primary>
+ </indexterm>
+
+ <para>
+ The <function>regexp_split_to_array</function> function behaves the
+ same as
+ <link linkend="functions-posix-regexp-split-to-table"><function>regexp_split_to_table</function></link>,
+ except that <function>regexp_split_to_array</function> returns its
+ result as an array of <type>text</type> rather than a set. It has
+ the syntax:
+<synopsis>
+regexp_split_to_array(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>flags</replaceable> </optional>) <returnvalue>text[]</returnvalue>
+</synopsis>
+ The parameters are the same as
+ for <function>regexp_split_to_table</function>.
+ </para>
+
+ <para>
+ An example:
+<programlisting>
+SELECT regexp_split_to_array('the quick brown fox jumps over the lazy dog', '\s+');
+ regexp_split_to_array
+-----------------------------------------------
+ {the,quick,brown,fox,jumps,over,the,lazy,dog}
+(1 row)
+</programlisting>
+ </para>
+ </sect4>
+
+ <sect4 id="functions-posix-regexp-substr">
+ <title><function>regexp_substr</function></title>
+ <indexterm>
+ <primary>regexp_substr</primary>
+ </indexterm>
<para>
The <function>regexp_substr</function> function returns the substring
that matches a POSIX regular expression pattern,
- or <literal>NULL</literal> if there is no match. It has the syntax
- <function>regexp_substr</function>(<replaceable>string</replaceable>,
- <replaceable>pattern</replaceable>
- <optional>, <replaceable>start</replaceable>
- <optional>, <replaceable>N</replaceable>
- <optional>, <replaceable>flags</replaceable>
- <optional>, <replaceable>subexpr</replaceable>
- </optional></optional></optional></optional>).
+ or <literal>NULL</literal> if there is no match. It has the syntax:
+<synopsis>
+regexp_substr(<replaceable>string</replaceable>, <replaceable>pattern</replaceable> <optional>, <replaceable>start</replaceable> <optional>, <replaceable>N</replaceable> <optional>, <replaceable>flags</replaceable> <optional>, <replaceable>subexpr</replaceable> </optional></optional></optional></optional>) <returnvalue>text</returnvalue>
+</synopsis>
<replaceable>pattern</replaceable> is searched for
in <replaceable>string</replaceable>, normally from the beginning of
the string, but if the <replaceable>start</replaceable> parameter is
<lineannotation>FGH</lineannotation>
</programlisting>
</para>
+ </sect4>
+ </sect3>
<!-- derived from the re_syntax.n man page -->
<sect3 id="posix-syntax-details">
- <title>Regular Expression Details</title>
+ <title>POSIX Regular Expression Details</title>
<para>
<productname>PostgreSQL</productname>'s regular expressions are implemented
</para>
<para>
Extracts the first substring matching POSIX regular expression; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-substring"/>.
</para>
<para>
<literal>substring('Thomas' FROM '...$')</literal>
Returns the number of times the POSIX regular
expression <parameter>pattern</parameter> matches in
the <parameter>string</parameter>; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-count"/>.
</para>
<para>
<literal>regexp_count('123456789012', '\d\d\d', 2)</literal>
Returns the position within <parameter>string</parameter> where
the <parameter>N</parameter>'th match of the POSIX regular
expression <parameter>pattern</parameter> occurs, or zero if there is
- no such match; see <xref linkend="functions-posix-regexp"/>.
+ no such match; see <xref linkend="functions-posix-regexp-instr"/>.
</para>
<para>
<literal>regexp_instr('ABCDEF', 'c(.)(..)', 1, 1, 0, 'i')</literal>
Checks whether a match of the POSIX regular
expression <parameter>pattern</parameter> occurs
within <parameter>string</parameter>; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-like"/>.
</para>
<para>
<literal>regexp_like('Hello World', 'world$', 'i')</literal>
Returns substrings within the first match of the POSIX regular
expression <parameter>pattern</parameter> to
the <parameter>string</parameter>; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-match"/>.
</para>
<para>
<literal>regexp_match('foobarbequebaz', '(bar)(beque)')</literal>
expression <parameter>pattern</parameter> to
the <parameter>string</parameter>, or substrings within all
such matches if the <literal>g</literal> flag is used;
- see <xref linkend="functions-posix-regexp"/>.
+ see <xref linkend="functions-posix-regexp-matches"/>.
</para>
<para>
<literal>regexp_matches('foobarbequebaz', 'ba.', 'g')</literal>
Replaces the substring that is the first match to the POSIX
regular expression <parameter>pattern</parameter>, or all such
matches if the <literal>g</literal> flag is used; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-replace"/>.
</para>
<para>
<literal>regexp_replace('Thomas', '.[mN]a.', 'M')</literal>
search beginning at the <parameter>start</parameter>'th character
of <parameter>string</parameter>. If <parameter>N</parameter> is
omitted, it defaults to 1. See
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-replace"/>.
</para>
<para>
<literal>regexp_replace('Thomas', '.', 'X', 3, 2)</literal>
<para>
Splits <parameter>string</parameter> using a POSIX regular
expression as the delimiter, producing an array of results; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-split-to-array"/>.
</para>
<para>
<literal>regexp_split_to_array('hello world', '\s+')</literal>
<para>
Splits <parameter>string</parameter> using a POSIX regular
expression as the delimiter, producing a set of results; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-split-to-table"/>.
</para>
<para>
<literal>regexp_split_to_table('hello world', '\s+')</literal>
matches the <parameter>N</parameter>'th occurrence of the POSIX
regular expression <parameter>pattern</parameter>,
or <literal>NULL</literal> if there is no such match; see
- <xref linkend="functions-posix-regexp"/>.
+ <xref linkend="functions-posix-regexp-substr"/>.
</para>
<para>
<literal>regexp_substr('ABCDEF', 'c(.)(..)', 1, 1, 'i')</literal>