.. _regex-howto:
****************************
- Regular Expression HOWTO
+ Regular expression HOWTO
****************************
:Author: A.M. Kuchling <amk@amk.ca>
elaborate regular expression, it will also probably be more understandable.
-Simple Patterns
+Simple patterns
===============
We'll start by learning about the simplest possible regular expressions. Since
to almost any textbook on writing compilers.
-Matching Characters
+Matching characters
-------------------
Most letters and characters will simply match themselves. For example, the
character".
-Repeating Things
+Repeating things
----------------
Being able to match varying sets of characters is the first thing regular
| | | ``[bcd]*`` is only matching |
| | | ``bc``. |
+------+-----------+---------------------------------+
-| 6 | ``abcb`` | Try ``b`` again. This time |
+| 7 | ``abcb`` | Try ``b`` again. This time |
| | | the character at the |
| | | current position is ``'b'``, so |
| | | it succeeds. |
to read.
-Using Regular Expressions
+Using regular expressions
=========================
Now that we've looked at some simple regular expressions, how do we actually use
matches with them.
-Compiling Regular Expressions
+Compiling regular expressions
-----------------------------
Regular expressions are compiled into pattern objects, which have
.. _the-backslash-plague:
-The Backslash Plague
+The backslash plague
--------------------
As stated earlier, regular expressions use the backslash character (``'\'``) to
In addition, special escape sequences that are valid in regular expressions,
but not valid as Python string literals, now result in a
-:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`,
+:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`,
which means the sequences will be invalid if raw string notation or escaping
the backslashes isn't used.
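As a minimal sketch of the two spellings that avoid the warning, the following compares an escaped literal with its raw-string equivalent. The word-boundary pattern and the sample text are illustrative assumptions, not taken from this document.

```python
import re

# Both literals denote the same pattern text: backslash, 'b', 'w', 'o', 'r', 'd', ...
escaped = "\\bword\\b"   # regular string literal with each backslash doubled
raw = r"\bword\b"        # raw string literal; backslashes are left untouched

same_text = escaped == raw

# In a regular string, "\b" is a valid Python escape (backspace), so the
# pattern below silently looks for backspace characters instead of word
# boundaries.
backspace_match = re.search("\bword\b", "a word here")

# The raw-string pattern reaches the regex engine intact.
span = re.search(raw, "a word here").span()
```

The ``"\b"`` case produces no warning at all because it is a legal Python escape, which is one reason raw strings are the safer habit for patterns.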
+-------------------+------------------+
-Performing Matches
+Performing matches
------------------
Once you have an object representing a compiled regular expression, what do you
| | location where this RE matches. |
+------------------+-----------------------------------------------+
| ``findall()`` | Find all substrings where the RE matches, and |
-| | returns them as a list. |
+| | return them as a list. |
+------------------+-----------------------------------------------+
| ``finditer()`` | Find all substrings where the RE matches, and |
-| | returns them as an :term:`iterator`. |
+| | return them as an :term:`iterator`. |
+------------------+-----------------------------------------------+
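The difference between the last two methods in the table can be sketched as follows. The pattern and sample text are assumptions chosen for illustration only.

```python
import re

p = re.compile(r"\d+")
text = "12 drummers drumming, 11 pipers piping, 10 lords a-leaping"

# findall() builds the complete list of matching substrings up front.
numbers = p.findall(text)

# finditer() yields match objects one at a time, so each result still
# carries position information such as span().
spans = [m.span() for m in p.finditer(text)]

# match() only tries at the start of the string and reports failure as None.
no_match = p.match("no digits here")
```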
:meth:`~re.Pattern.match` and :meth:`~re.Pattern.search` return ``None`` if no match can be found. If
The ``r`` prefix, making the literal a raw string literal, is needed in this
example because escape sequences in a normal "cooked" string literal that are
not recognized by Python, as opposed to regular expressions, now result in a
-:exc:`DeprecationWarning` and will eventually become a :exc:`SyntaxError`. See
+:exc:`SyntaxWarning` and will eventually become a :exc:`SyntaxError`. See
:ref:`the-backslash-plague`.
:meth:`~re.Pattern.findall` has to create the entire list before it can be returned as the
(29, 31)
-Module-Level Functions
+Module-level functions
----------------------
You don't have to create a pattern object and call its methods; the
cache.
-Compilation Flags
+Compilation flags
-----------------
.. currentmodule:: re
whitespace is in a character class or preceded by an unescaped backslash; this
lets you organize and indent the RE more clearly. This flag also lets you put
comments within a RE that will be ignored by the engine; comments are marked by
- a ``'#'`` that's neither in a character class or preceded by an unescaped
+ a ``'#'`` that's neither in a character class nor preceded by an unescaped
backslash.
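The whitespace and comment rules described above can be sketched with a small pattern. The pattern itself is an assumption made up for this illustration.

```python
import re

verbose = re.compile(r"""
    \d+     # one or more digits
    [ ]     # a space in a character class is significant
    [#]     # so is a '#' in a character class
    \d+     # more digits
""", re.VERBOSE)

# The same pattern written without the flag, for comparison.
compact = re.compile(r"\d+[ ][#]\d+")

matches_verbose = verbose.fullmatch("7 #42") is not None
matches_compact = compact.fullmatch("7 #42") is not None

# Without the space the pattern fails, which shows the bracketed space
# was not discarded.
missing_space = verbose.fullmatch("7#42")
```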
For example, here's a RE that uses :const:`re.VERBOSE`; see how much easier it
to understand than the version using :const:`re.VERBOSE`.
-More Pattern Power
+More pattern power
==================
So far we've only covered a part of the features of regular expressions. In
.. _more-metacharacters:
-More Metacharacters
+More metacharacters
-------------------
There are some metacharacters that we haven't covered yet. Most of them will be
find out that they're *very* useful when performing string substitutions.
-Non-capturing and Named Groups
+Non-capturing and named groups
------------------------------
Elaborate REs may use many groups, both to capture substrings of interest, and
'the the'
-Lookahead Assertions
+Lookahead assertions
--------------------
Another zero-width assertion is the lookahead assertion. Lookahead assertions
``.*[.](?!bat$|exe$)[^.]*$``
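A short check of how this pattern behaves, as a sketch; the filenames are assumed examples, not an exhaustive test.

```python
import re

# Reject filenames whose final extension is "bat" or "exe".
pattern = re.compile(r".*[.](?!bat$|exe$)[^.]*$")

names = ["foo.bar", "autoexec.bat", "sendmail.cf", "setup.exe", "archive.tar.bat"]

# Keep only the names the pattern accepts.
accepted = [name for name in names if pattern.match(name)]
```

``archive.tar.bat`` is rejected as well: the lookahead passes at the first dot, but ``[^.]*$`` cannot cross the second dot, and at the second dot the lookahead itself fails.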
-Modifying Strings
+Modifying strings
=================
Up to this point, we've simply performed searches against a static string.
+------------------+-----------------------------------------------+
-Splitting Strings
+Splitting strings
-----------------
The :meth:`~re.Pattern.split` method of a pattern splits a string apart
['Words', 'words, words.']
-Search and Replace
+Search and replace
------------------
Another common task is to find all the matches for a pattern, and replace them
pattern string, e.g. ``sub("(?i)b+", "x", "bbbb BBBB")`` returns ``'x x'``.
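As a sketch, the inline-flag call from the text can be compared with two equivalent spellings that pass :const:`re.IGNORECASE` explicitly:

```python
import re

text = "bbbb BBBB"

# Flag embedded in the pattern string, as in the example above.
inline = re.sub("(?i)b+", "x", text)

# Flag passed through the flags keyword of the module-level function.
keyword = re.sub("b+", "x", text, flags=re.IGNORECASE)

# Flag supplied when compiling a pattern object.
compiled = re.compile("b+", re.IGNORECASE).sub("x", text)
```

All three produce the same result; the inline form is mainly useful when only the pattern string can be supplied.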
-Common Problems
+Common problems
===============
Regular expressions are a powerful tool for some applications, but in some ways
expect them to. This section will point out some of the most common pitfalls.
-Use String Methods
+Use string methods
------------------
Sometimes using the :mod:`re` module is a mistake. If you're matching a fixed
:func:`re.search` instead.
-Greedy versus Non-Greedy
+Greedy versus non-greedy
------------------------
When repeating a regular expression, as in ``a*``, the resulting action is to
========
Regular expressions are a complicated topic. Did this document help you
-understand them? Were there parts that were unclear, or Problems you
+understand them? Were there parts that were unclear, or problems you
encountered that weren't covered here? If so, please send suggestions for
-improvements to the author.
+improvements to the :ref:`issue tracker <using-the-tracker>`.
The most complete book on regular expressions is almost certainly Jeffrey
Friedl's Mastering Regular Expressions, published by O'Reilly. Unfortunately,