Python RegEx

In this tutorial, you will learn about Python RegEx with the help of examples.

In Python, regular expressions (RegEx) are patterns used to match character combinations in strings. For example,

^s...e$

Here, we have defined a RegEx pattern. The pattern is: any five-character string starting with s and ending with e.

The RegEx pattern ^s...e$ can be used to match against strings:

sense - match
shade - match
seize - match
Sense - no match
science - no match
swift - no match

Example: Python RegEx

To work with RegEx in Python, we first need to import a module named re.

Let's see an example,

# import re module
import re

# regex pattern
pattern = '^s...e$'

# test string
string1 = 'shade'
string2 = 'science'

# use re.match() to match pattern
result1 = re.match(pattern, string1)
result2 = re.match(pattern, string2)

# print boolean value
print('shade:', bool(result1))
print('science:', bool(result2))

# Output:
# shade: True
# science: False

In the above example, we first imported a module named re and used the re.match() function to check whether the pattern matches at the beginning of each string.

Here, re.match() takes two parameters:

pattern - the regular expression to be matched
string1 / string2 - the string in which the pattern is checked

The pattern ^s...e$ means any five-character string starting with s and ending with e. Since,

'shade' - matches the pattern, bool() returns True
'science' - does not match the pattern, bool() returns False

MetaCharacters in Python Regular Expression

The characters that are interpreted in a special way by a RegEx engine are metacharacters.

Here's a list of metacharacters with a short description:

Metacharacter	Description
[ ]	specifies a set of characters we wish to match
.	matches any single character
^	checks if a string starts with a certain character
$	checks if a string ends with a certain character
*	matches zero or more occurrences of the pattern to its left
+	matches one or more occurrences of the pattern to its left
?	matches zero or one occurrence of the pattern to its left
( )	groups sub-patterns
\	used to escape various characters including all metacharacters
|	used for alternation (or operator)

MetaCharacters Examples:

[ ] - Square Brackets

Expression	String	Match?
[xyz]	x	1 match
	hey	1 match
	hello	No match
	proxy	2 matches

Here, [xyz] matches if the string contains any of x, y, or z. Each occurrence counts as one match.

We can also specify a range of characters using - inside square brackets.

For example, [w-z] is the same as [wxyz] and similarly [1-4] is the same as [1234].
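The character sets above can be tried out with a short sketch. It uses re.findall(), which this tutorial covers later, only as a convenient way to list every match; the test string 'a1b5c3' is made up for illustration.

```python
import re

# [xyz] matches any single character that is x, y, or z
set_matches = re.findall(r'[xyz]', 'proxy')
print(set_matches)    # ['x', 'y']

# [w-z] is a range and behaves like [wxyz]
range_matches = re.findall(r'[w-z]', 'proxy')
print(range_matches)  # ['x', 'y']

# [1-4] behaves like [1234], so 5 is skipped
digit_matches = re.findall(r'[1-4]', 'a1b5c3')
print(digit_matches)  # ['1', '3']
```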

. - Period

Expression	String	Match?
...	hey	1 match
	python	2 matches (pyt and hon)
	a	No match
	sa	No match

We can see that . matches any single character (except newline '\n').
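As a quick check of the table above, the following sketch lists the matches for the period. The string 'a\nb' is an assumption added here to show the newline exception.

```python
import re

# '...' consumes three characters at a time
three_chars = re.findall(r'...', 'python')
print(three_chars)   # ['pyt', 'hon']

# fewer than three characters: nothing to match
too_short = re.findall(r'...', 'sa')
print(too_short)     # []

# '.' skips the newline character by default
around_newline = re.findall(r'.', 'a\nb')
print(around_newline)  # ['a', 'b']
```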

^ - Caret

Expression	String	Match?
^s	s	1 match
	swift	1 match
	tsunami	No match
	case	No match

Here, ^ is used to check if a string starts with a certain character.

$ - Dollar

Expression	String	Match?
s$	s	1 match
	kicks	1 match
	sick	No match
	case	No match

Above, $ checks if a string ends with a certain character or not.
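Both anchors can be checked against the strings from the two tables above. This is a minimal sketch; note that the caret is written before the character (^s) while the dollar sign is written after it (s$).

```python
import re

words = ['s', 'swift', 'tsunami', 'case', 'kicks', 'sick']

# ^s: the string must start with s
starts_with_s = [w for w in words if re.search(r'^s', w)]
print(starts_with_s)  # ['s', 'swift', 'sick']

# s$: the string must end with s
ends_with_s = [w for w in words if re.search(r's$', w)]
print(ends_with_s)    # ['s', 'kicks']
```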

* - Star

Expression	String	Match?
hel*o	heo	1 match
	hello	1 match
	hola	No match (h is not followed by e)
	hell	No match

Here, * matches zero or more occurrences of the pattern to its left.

+ - Plus

Expression	String	Match?
hel+o	helo	1 match
	hellllo	1 match
	hola	No match
	heo	No match (zero occurrence)

We can see above that + matches one or more occurrences of the pattern to its left.

? - Question Mark

Expression	String	Match?
hel?o	heo	1 match (zero occurrence)
	helo	1 match (one occurrence)
	sayhelo	1 match
	hello	No match (more than one occurrence)

Here, ? matches zero or one occurrence of the pattern to its left.
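To compare the three quantifiers side by side, here is a small sketch that runs hel*o, hel+o, and hel?o against the same list of strings. The list is assembled from the tables above.

```python
import re

candidates = ['heo', 'helo', 'hello', 'hellllo', 'hola']

# l* allows zero or more l characters
star = [w for w in candidates if re.search(r'hel*o', w)]
print(star)      # ['heo', 'helo', 'hello', 'hellllo']

# l+ requires at least one l
plus = [w for w in candidates if re.search(r'hel+o', w)]
print(plus)      # ['helo', 'hello', 'hellllo']

# l? allows at most one l
optional = [w for w in candidates if re.search(r'hel?o', w)]
print(optional)  # ['heo', 'helo']
```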

| - Alternation

Expression	String	Match?
s|a	cat	1 match (a in cat)
	case	2 matches (a and s both in case)
	lit	No match
	red	No match

Here, s|a matches any string that contains either s or a.

() - Group

Expression	String	Match?
(c|l|t)an	can	1 match
	lan	1 match
	tan	1 match
	caan	No match

In the above example, (c|l|t)an matches any string that contains c, l, or t immediately followed by an.
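Alternation and grouping can be verified with the strings from the two tables above. This is only a sketch; re.search() is used for the grouped pattern so that the result does not depend on how re.findall() reports groups.

```python
import re

# s|a matches either an s or an a, in the order they appear
either = re.findall(r's|a', 'case')
print(either)   # ['a', 's']

# (c|l|t)an needs exactly one of c, l, t directly before 'an'
words = ['can', 'lan', 'tan', 'caan']
grouped = [w for w in words if re.search(r'(c|l|t)an', w)]
print(grouped)  # ['can', 'lan', 'tan']
```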

Python Special Sequences

A special sequence is \ followed by a special character which makes commonly used patterns easier to write.

Here's a list of special sequences with a short description:

Special Sequence	Description
\A	matches if the specified characters are at the start of a string
\b	matches if the specified characters are at the beginning or end of a word
\B	matches if the specified characters are not at the beginning or end of a word
\d	matches any decimal digit
\D	matches any character that is not a decimal digit
\s	matches any whitespace character
\S	matches any non-whitespace character
\w	matches any alphanumeric character or underscore
\W	matches any character that is not alphanumeric or an underscore
\Z	matches if the specified characters are at the end of a string

Special Sequence Examples:

\A

Expression	String	Match?
\Aan	an ocean	Match
	at sea	No match

Here, \A checks whether an is at the start of the string.

\b

Expression	String	Match?
\bdis	diss track	Match
	a disco	Match
	adisco	No Match
nt\b	bent	Match
	aunt	Match
	act	No Match

We can see that \b matches if the specified characters

\bdis - are at the beginning of a word or not
nt\b - are at the end of a word or not

\B

Expression	String	Match?
\Bdis	diss track	No Match
	a disco	No Match
	adisco	Match
nt\B	bent	No Match
	aunt	No Match
	ants	Match

We can see that \B is opposite of \b. That is, it matches if the specified characters are not at the beginning or end of a word.
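The difference between \b and \B can be sketched with the strings used in the tables above. The patterns are written as raw strings (r'...') because '\b' in an ordinary Python string means a backspace character.

```python
import re

strings = ['diss track', 'a disco', 'adisco']

# \bdis: 'dis' must begin a word
at_word_start = [s for s in strings if re.search(r'\bdis', s)]
print(at_word_start)  # ['diss track', 'a disco']

# \Bdis: 'dis' must NOT begin a word
inside_word = [s for s in strings if re.search(r'\Bdis', s)]
print(inside_word)    # ['adisco']

# nt\b: 'nt' must end a word
at_word_end = [s for s in ['bent', 'aunt', 'act'] if re.search(r'nt\b', s)]
print(at_word_end)    # ['bent', 'aunt']
```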

\d

Expression	String	Match?
\d	h3llo	1 Match
	hello	No Match

Here, \d matches any decimal digit [0-9].

\D

Expression	String	Match?
\D	1234	No Match
	h3llo	4 Matches

We can see that \D is opposite of \d. That is, it matches any character that is not a decimal digit.

\s

Expression	String	Match?
\s	hello world	1 Match
	helloworld	No Match

Here, \s matches where a string contains any whitespace character.

\S

Expression	String	Match?
\S	x y	2 Matches
	x	1 Match

Here, \S matches where a string contains any non-whitespace character.

\w

Expression	String	Match?
\w	67%;gt	4 Matches
	!>%"	No Match

Here, \w matches any alphanumeric character (digits and letters) and the underscore.

\W

Expression	String	Match?
\W	!>%"	4 Matches
	hello	No Match

\W is opposite of \w. It matches any character that is not alphanumeric (a digit or a letter) or an underscore.
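The character-class sequences \d, \D, \s, \w, and \W can be compared on one string. The sample text 'a1_b 2!' is an assumption chosen so that every class has something to match, including the underscore that \w accepts.

```python
import re

text = 'a1_b 2!'

digits = re.findall(r'\d', text)
print(digits)       # ['1', '2']

non_digits = re.findall(r'\D', text)
print(non_digits)   # ['a', '_', 'b', ' ', '!']

whitespace = re.findall(r'\s', text)
print(whitespace)   # [' ']

word_chars = re.findall(r'\w', text)
print(word_chars)   # ['a', '1', '_', 'b', '2']

non_word = re.findall(r'\W', text)
print(non_word)     # [' ', '!']
```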

\Z

Expression	String	Match?
coding\Z	I love coding	1 Match
	coding is fun	No Match

Here, \Z matches if 'coding' is at the end of a string or not.
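One place where \Z behaves differently from $ is a string that ends with a newline. The sketch below assumes a sample string with a trailing '\n': $ still matches just before that newline, while \Z matches only at the very end of the string.

```python
import re

text = 'I love coding\n'

# $ also matches just before a trailing newline
dollar_match = re.search(r'coding$', text)
print(bool(dollar_match))  # True

# \Z matches only at the absolute end of the string
z_match = re.search(r'coding\Z', text)
print(bool(z_match))       # False
```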

The re.search() Function

In Python, the re.search() function scans the whole string for the regex pattern and returns a match object for the first occurrence, or None if there is no match.

It is slightly different from re.match(), which checks for a match only at the beginning of the string.

Let's see an example,

import re

# test string
string1 = 'Nepal is beautiful'
string2 = 'Datamentor for beginners'

# check if 'Nepal' is at the beginning of string1
# (the r prefix keeps the backslash literal; see Raw String in Python)
result1 = re.search(r'\ANepal', string1)  # returns a match object

# check if 'beginners' is at the beginning of string2
result2 = re.search(r'\Abeginners', string2)  # returns None

# print boolean value
print('Result for string1:', bool(result1)) # True
print('Result for string2:', bool(result2)) # False

Output

Result for string1: True
Result for string2: False

In the above example, we first imported a module named re and used the re.search() function to search for the pattern.

Here, re.search() takes two parameters:

\ANepal and \Abeginners - \A matches if the given word is at the start of a string
string1 and string2 - the string in which the pattern is checked

Since,

'Nepal' is at the beginning of string1, bool() returns True
'beginners' is not at the beginning of string2, bool() returns False
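To see how re.match() and re.search() differ when the pattern has no \A anchor, consider this minimal sketch. It reuses string2 from the example above: re.match() only tries position 0, while re.search() keeps scanning.

```python
import re

text = 'Datamentor for beginners'

# re.match() checks only the start of the string
match_result = re.match(r'beginners', text)
print(match_result)           # None

# re.search() scans the whole string
search_result = re.search(r'beginners', text)
print(search_result.span())   # (15, 24)
```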

The re.split() Function

The re.split() function in Python splits the string at each match and returns a list. For example,

import re

# test string
string1 = 'Nepal is beautiful'

# split string1 at each whitespace character
result1 = re.split(r'\s', string1)

# print the resulting list
print(result1)

# Output:  ['Nepal', 'is', 'beautiful']

In the above example, we have used the re.split() function to split the string named string1.

Here, re.split('\s', string1) splits string1 at each white-space character.

Note: We can use other special sequences inside re.split() to split the given string.
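As the note says, any pattern can act as the separator. The sketch below splits on runs of digits and also shows the optional maxsplit argument of re.split(), which limits how many splits are made. The string 'one1two22three' is made up for illustration.

```python
import re

# split wherever one or more digits appear
parts = re.split(r'\d+', 'one1two22three')
print(parts)    # ['one', 'two', 'three']

# maxsplit=1 stops after the first split
limited = re.split(r'\s', 'Nepal is beautiful', maxsplit=1)
print(limited)  # ['Nepal', 'is beautiful']
```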

The re.findall() Function

In Python, the re.findall() function returns a list of strings containing all matches. For example,

import re

string1 = 'H3ll0 W0R1D'
pattern = r'\D+'

# extract non-digits from a string
result = re.findall(pattern, string1) 
print(result)

# Output: ['H', 'll', ' W', 'R', 'D']

Here, the re.findall() function returns a list that contains non-digits from the string1 string.

Note: re.findall() returns an empty list if the pattern is not found in the string.
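The empty-list behaviour mentioned in the note makes it safe to loop over the result without checking for None first. A small sketch, with sample strings invented for illustration:

```python
import re

# every run of digits becomes one list item
found = re.findall(r'\d+', 'Order 66 shipped 12 items')
print(found)    # ['66', '12']

# no digits at all: an empty list, not None
missing = re.findall(r'\d+', 'no digits here')
print(missing)  # []
```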

The re.sub() Function

The re.sub() function in Python returns a string after replacing the matched occurrence in a string with a replacement string. For example,

import re

string1 = 'Hello World'

# replacement string 
replace = 'Hola'

# matches 'Hello' only at the start of the string
pattern = r'\AHello'

# replace 'Hello' with 'Hola'
result = re.sub(pattern, replace, string1)
print(result)

# Output: Hola World

In the above example, we have used the re.sub() function to replace 'Hello' with 'Hola' in the string1.

re.sub() returns the original string if the pattern is not found.
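The sketch below illustrates the not-found case and two further uses of re.sub(): replacing every match, and limiting the number of replacements with the optional count argument. The input strings are assumptions made up for this example.

```python
import re

# 'Hello' is not at the start, so nothing is replaced
unchanged = re.sub(r'\AHello', 'Hola', 'World Hello')
print(unchanged)   # World Hello

# by default every match is replaced
collapsed = re.sub(r'\s+', ' ', 'a   b \n c')
print(collapsed)   # a b c

# count=1 replaces only the first match
first_only = re.sub(r'l', 'L', 'Hello World', count=1)
print(first_only)  # HeLlo World
```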

Python Match Object

The match object in Python contains all the information about the search and the result. For example,

import re

# test string
string1 = 'Nepal is beautiful'

# result contains match object
result = re.search(r'\ANepal', string1) 

print(result)

Output

<re.Match object; span=(0, 5), match='Nepal'>

Here, the result variable contains a match object.

Methods and Attributes of Python Match Object

Some of the commonly used methods and attributes of match objects are:

match.group()

The group() function returns the matched substring. For example,

import re

string1 = 'Employee ID 2032 1111'

# Two digit number followed by space followed by three digit number
pattern = r'(\d{2}) (\d{3})'

# match variable contains a Match object.
match = re.search(pattern, string1) 

# get substring
print('Whole Substring:', match.group())

# get first part of substring
print('First part of substring:', match.group(1))

# get second part of substring
print('Second part of substring:', match.group(2))

Output

Whole Substring: 32 111
First part of substring: 32
Second part of substring: 111

In the above example, we have used the group() function to return the matched substring from the string named string1.

Here, the pattern '(\d{2}) (\d{3})' means: two digit number followed by space followed by three digit number.

To get the matched substring we have used

match.group() - to get the whole substring
match.group(1) - to get first part of substring
match.group(2) - to get second part of substring
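Two related details are worth a short sketch: the groups() method returns all captured groups at once as a tuple, and re.search() returns None when nothing matches, so calling group() needs a check first. The second input string is hypothetical.

```python
import re

pattern = r'(\d{2}) (\d{3})'

match = re.search(pattern, 'Employee ID 2032 1111')

# groups() collects every captured group into a tuple
parts = match.groups()
print(parts)  # ('32', '111')

# no match: re.search() returns None, so guard before using group()
missing = re.search(pattern, 'Employee ID unknown')
if missing is None:
    print('No match found')
else:
    print(missing.group())
```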

match.start(), match.end(), and match.span()

The start() method returns the index of the start of the matched substring
The end() method returns the index just past the last character of the matched substring
The span() method returns a tuple containing the start and end index of the matched substring

Let's see an example,

import re

string = 'Employee ID 2032 1111'

# Two digit number followed by space followed by three digit number
pattern = r'(\d{2}) (\d{3})'

# match variable contains a Match object.
match = re.search(pattern, string) 

print('Matched Substring Start Index:', match.start())

print('Matched Substring End Index:', match.end())

print('Tuple of Matched Substring Start and End Index:', match.span())

Output

Matched Substring Start Index: 14
Matched Substring End Index: 20
Tuple of Matched Substring Start and End Index: (14, 20)
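Because the end index points just past the match, the pair returned by span() can be used directly as a slice. The sketch below also passes a group number to span(), which gives the position of that group alone.

```python
import re

string = 'Employee ID 2032 1111'
match = re.search(r'(\d{2}) (\d{3})', string)

# slicing with the span reproduces the matched substring
start, end = match.span()
sliced = string[start:end]
print(sliced)         # 32 111

# span() also accepts a group number
first_span = match.span(1)
second_span = match.span(2)
print(first_span)     # (14, 16)
print(second_span)    # (17, 20)
```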

Raw String in Python

A raw string is useful if we want to treat the backslash (\) as a literal character.

For example, '\n' is a new line whereas r'\n' means two characters: a backslash \ followed by n.

Let's understand raw string with the help of an example,

import re

# \n to get new line
string1 = 'Hello\nWorld'
print("Escape Character:", string1)

# prefix r to treat \n as a normal character
string2 = r'Hello\nWorld'
print('Raw String:', string2)

Output

Escape Character: Hello
World
Raw String: Hello\nWorld

Using r prefix before RegEx

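In Python, we can prefix r before a regular expression so that backslashes reach the RegEx engine unchanged. Without the prefix, a sequence such as '\b' is converted to a backspace character before the pattern is compiled, and sequences such as '\d' are invalid string escapes that recent Python versions warn about. For example,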

import re

# test string
string1 = '\t Programming \n is \r fun.'

pattern = r'[\t\n\r]'

# find \t,\n, and \r in string1
result = re.findall(pattern, string1) 

print(result)

# Output: ['\t', '\n', '\r']

Here, first we have prefixed r before the regular expression pattern as

pattern = r'[\t\n\r]'

And used re.findall() to return a list of strings containing all matches.
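The r prefix matters most for sequences that Python itself interprets. As a minimal sketch, compare '\bdis' with r'\bdis': in the ordinary string, \b becomes a single backspace character, so the pattern no longer means "word boundary".

```python
import re

plain = '\bdis'   # backspace character followed by 'dis'
raw = r'\bdis'    # backslash, 'b', then 'dis'

print(len(plain))  # 4
print(len(raw))    # 5

# the plain pattern looks for a literal backspace, which is not in the text
plain_result = re.search(plain, 'a disco')
print(plain_result)        # None

# the raw pattern is read by the RegEx engine as a word boundary
raw_result = re.search(raw, 'a disco')
print(raw_result.group())  # dis
```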
