Python RegEx
This tutorial introduces Python regular expressions through examples.
A regular expression (RegEx) is a sequence of characters that forms a search pattern.
Using RegEx, you can check if a string contains the search pattern you specify.
Python RegEx Module
Python includes a built-in module called re that handles regular expressions.
Import the re module:
import re
Using Python Regex
Once the re module is imported, you can start using regular expressions.
Search the string to check whether it begins with "Elon" and ends with "Musk":
Example
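A minimal sketch of such a check might look like the following. The sample string and the variable names are assumptions chosen for illustration.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# ^Elon anchors the start, .* spans the middle, Musk$ anchors the end.
match = re.search(r"^Elon.*Musk$", txt)

if match:
    print("The string begins with 'Elon' and ends with 'Musk'.")
else:
    print("No match.")
```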
Python RegEx Functions
The re module provides the following functions for searching a string for matches:
Function | Overview |
---|---|
findall | Provides a list of all matches. |
search | Matches anywhere in the string and returns a Match object. |
split | Splits the string at each match and returns a list. |
sub | String replacement for one or more matches. |
Metacharacters With Python RegEx
Metacharacters are characters with a special meaning:
Character | Overview | Example |
---|---|---|
[] | A set of characters. Inside square brackets, a hyphen (-) specifies a range of characters: [a-e] is equivalent to [abcde], and [1-4] to [1234]. | "[a-m]" |
\ | Signals a special sequence (or escapes a special character). | "\d" |
. | Any character except a newline | "el..on" |
^ | Starts with | "^reeve" |
$ | Ends with | "Musk$" |
* | Zero or more occurrences | "mrx*" |
+ | One or more occurrences | "mrx+" |
{} | Exactly the specified number of occurrences | "mr{2}" |
| | Either/or | "falls|rise" |
() | Grouping and capturing | |
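As a rough illustration of several metacharacters from the table, the sketch below runs a few patterns with findall(). The sample string, the patterns, and the variable names are assumptions picked for this example.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk was born in 1971"

digits = re.findall(r"\d", txt)           # \ introduces a special sequence
any_two = re.findall(r"Mu..", txt)        # . matches any character except a newline
starts = re.findall(r"^Elon", txt)        # ^ anchors the match at the start
ends = re.findall(r"1971$", txt)          # $ anchors the match at the end
zero_or_more = re.findall(r"Rex*", txt)   # * allows the x to be absent
one_or_more = re.findall(r"Re+", txt)     # + needs at least one e after R
exactly_two = re.findall(r"e{2}", txt)    # {2} needs exactly two consecutive e
either = re.findall(r"Musk|Bezos", txt)   # | matches either alternative

print(digits)        # ['1', '9', '7', '1']
print(any_two)       # ['Musk']
print(zero_or_more)  # ['Re']
print(one_or_more)   # ['Ree']
print(exactly_two)   # ['ee']
print(either)        # ['Musk']
```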
Special Sequences
A special sequence is a backslash (\) followed by one of the characters in the list below, and it has a special meaning:
Character | Overview | Example |
---|---|---|
\A | Matches if the specified characters are at the beginning of the string. | "\AMr" |
\b | Matches where the specified characters are at the beginning or at the end of a word. | r"\blon" r"usk\b" |
\B | Matches where the specified characters are present, but NOT at the beginning (or at the end) of a word. | r"\Blon" r"usk\B" |
\d | Matches any digit (0 to 9). | "\d" |
\D | Matches any character that is not a digit. | "\D" |
\s | Matches any white-space character. | "\s" |
\S | Matches any character that is not white space. | "\S" |
\w | Matches any word character (a to z, A to Z, 0-9, and underscore _). | "\w" |
\W | Matches any character that is not a word character. | "\W" |
\Z | Matches if the specified characters are at the end of the string. | "Elon\Z" |
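The special sequences above could be tried out with a short sketch like this one. The sample string and variable names are assumptions made for illustration. Raw strings (the r prefix) keep Python from interpreting the backslashes itself.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Musk was born in 1971"

at_start = re.findall(r"\AElon", txt)   # only at the very start of the string
word_start = re.findall(r"\bMu", txt)   # "Mu" at the beginning of a word
word_end = re.findall(r"sk\b", txt)     # "sk" at the end of a word
inside = re.findall(r"\Blon", txt)      # "lon" not at the beginning of a word
digits = re.findall(r"\d", txt)         # every digit
spaces = re.findall(r"\s", txt)         # every white-space character
at_end = re.findall(r"1971\Z", txt)     # only at the very end of the string

print(at_start)     # ['Elon']
print(inside)       # ['lon']
print(digits)       # ['1', '9', '7', '1']
print(len(spaces))  # 5
```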
Sets
A set is a group of characters enclosed in square brackets [] that has a special meaning:
Set | Overview |
---|---|
[arn] | Matches any one of the specified characters (a, r, or n). |
[a-n] | Matches any lowercase character alphabetically between a and n. |
[^arn] | Matches any character EXCEPT a, r, and n. |
[0123] | Matches any of the specified digits (0, 1, 2, or 3). |
[0-9] | Matches any digit between 0 and 9. |
[0-5][0-9] | Matches any two-digit number between 00 and 59. |
[a-zA-Z] | Matches any character alphabetically between a and z, in either lowercase or uppercase. |
[+] | In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] matches any + character in the string. |
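To see how a few of these sets behave, consider the sketch below. The sample strings and variable names are assumptions chosen for illustration.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Musk, 8:45 am"

any_arn = re.findall(r"[arn]", txt)            # a, r, or n
digits = re.findall(r"[0-9]", txt)             # any single digit
two_digits = re.findall(r"[0-5][0-9]", txt)    # two-digit numbers from 00 to 59
not_letters = re.findall(r"[^a-zA-Z ]", txt)   # anything except letters and spaces
plus_signs = re.findall(r"[+]", "1+1=2")       # + is a literal inside a set

print(any_arn)      # ['n', 'a']
print(digits)       # ['8', '4', '5']
print(two_digits)   # ['45']
print(not_letters)  # [',', '8', ':', '4', '5']
print(plus_signs)   # ['+']
```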
Python findall() Function
The findall() function returns a list containing all matches.
Print a list of all matches:
Example
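One way this could look, assuming a sample string invented for illustration:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Musk founded SpaceX; Musk leads Tesla"

# findall() collects every non-overlapping match, scanning left to right.
matches = re.findall(r"Musk", txt)
print(matches)  # ['Musk', 'Musk']
```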
Matches are listed in order of discovery.
If no matches are found, findall() returns an empty list:
Example
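A short sketch of the no-match case, again with an assumed sample string:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# "Bezos" does not occur in txt, so the result is an empty list.
matches = re.findall(r"Bezos", txt)
print(matches)  # []

# An empty list is falsy, which makes the check straightforward.
if not matches:
    print("No match")
```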
search() Function In Python Regex
The search() function searches the string for a match and returns a Match object if one is found.
If there is more than one match, only the first one is returned.
Search for the first white-space character in the string:
Example
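The following sketch shows one possible version, using an assumed sample string:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# search() stops at the first match; \s matches any white-space character.
x = re.search(r"\s", txt)

print("The first white-space character is at position:", x.start())  # 4
```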
If no match is found, search() returns None.
Run a search that finds no match:
Example
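A minimal sketch of a failed search, with an assumed sample string and search term:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# "Bezos" is not in txt, so search() returns None rather than a Match object.
x = re.search(r"Bezos", txt)
print(x)  # None
```

Because the result may be None, it is usually worth checking it before calling methods such as .group() on it.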
Python split() Function
The split() function returns a list in which the string has been split at each match.
Split the string at each white-space character:
Example
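For instance, with an assumed sample string:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# Every white-space character acts as a separator.
parts = re.split(r"\s", txt)
print(parts)  # ['Elon', 'Reeve', 'Musk']
```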
The maxsplit parameter limits how many splits occur.
Split the string only at the first occurrence:
Example
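A sketch of a limited split, using an assumed sample string. Passing maxsplit as a keyword argument keeps the call unambiguous.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# maxsplit=1 stops after the first split; the rest of the string stays intact.
parts = re.split(r"\s", txt, maxsplit=1)
print(parts)  # ['Elon', 'Reeve Musk']
```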
Python sub() Function
The sub() function replaces the matches with the text of your choice.
Replace every white-space character with a hyphen (-):
Example
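One possible version, with an assumed sample string and a plain hyphen as the replacement:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# Each white-space match is replaced by the second argument.
result = re.sub(r"\s", "-", txt)
print(result)  # Elon-Reeve-Musk
```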
The count parameter limits the number of replacements.
Replace only the first 4 occurrences:
Example
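A sketch with count=4, using an assumed sample string that contains more than four white-space characters so the limit is visible:

```python
import re

# Assumed sample string, used only for illustration (it contains six spaces).
txt = "Elon Reeve Musk was born in 1971"

# Only the first four white-space characters are replaced.
result = re.sub(r"\s", "-", txt, count=4)
print(result)  # Elon-Reeve-Musk-was-born in 1971
```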
Match Object
A Match object is an object containing information about the search and its result.
If there is no match, None is returned instead of a Match object.
Run a search that returns a Match object:
Example
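A minimal sketch, assuming a sample string and pattern chosen for illustration:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

x = re.search(r"Mu", txt)

# Printing the object shows its span and the matched text.
print(x)
```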
The Match object has properties and methods that retrieve information about the search and the result:
- .span() returns a tuple containing the start and end positions of the match.
- .string returns the string passed into the function.
- .group() returns the part of the string where there was a match.
Print the position (start and end) of the first match:
Example
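As a sketch, the pattern below looks for a word starting with an uppercase "M". Both the pattern and the sample string are assumptions made for illustration.

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

# \b marks a word boundary, M is literal, \w+ takes the rest of the word.
x = re.search(r"\bM\w+", txt)

# The end position is exclusive, like a slice.
print(x.span())  # (11, 15)
```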
Print the string passed into the function:
Example
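Continuing with the same assumed sample string and pattern:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

x = re.search(r"\bM\w+", txt)

# .string is a property, not a method, so it takes no parentheses.
print(x.string)  # Elon Reeve Musk
```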
Print the part of the string where there was a match:
Example
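Again a sketch with the assumed sample string and pattern:

```python
import re

# Assumed sample string, used only for illustration.
txt = "Elon Reeve Musk"

x = re.search(r"\bM\w+", txt)

# group() with no arguments returns the whole matched text.
print(x.group())  # Musk
```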
Python RegEx Uses
Here are some common use cases for Python regex:
- Regex can be used to search for specific patterns within strings. For example, you can find all occurrences of a certain word, extract email addresses or URLs, validate phone numbers, etc.
- Regex allows you to extract specific portions of text from a larger string. This can be useful for parsing data, extracting information from logs, or scraping web pages.
- Regex enables you to find and replace specific patterns within a string. You can replace certain words, remove unwanted characters, or perform complex text transformations.
- Regex can be used to validate user input by checking if it matches a specific pattern. For example, you can verify if an input follows a certain format like a valid email address, a strong password, or a correct date format.
- Regex can help clean and normalize data by removing unwanted characters, formatting inconsistencies, or correcting common errors in textual data.
- Regex allows you to split text into tokens based on specific patterns. This can be useful for natural language processing tasks like text classification, sentiment analysis, or information retrieval.
- Regex is often used in programming language compilers and interpreters for tokenizing and parsing source code, identifying keywords, operators, and other language constructs.
- In web development frameworks, regex is commonly used for defining URL patterns and routing incoming requests to appropriate handlers or controllers.