Python RegEx

This tutorial explains Python regular expressions through worked examples.

A regular expression, or RegEx, is a sequence of characters that forms a search pattern.

Using RegEx, you can check if a string contains the search pattern you specify.


Python RegEx Module

Python includes a built-in module called re that handles regular expressions.

Import the re module:

import re

Using Python Regex

Once the re module is imported, you can start using regular expressions:

To find out if the string begins with “Elon” and ends with “Musk”, search the string as follows:

Example

import re

sentence = "Elon Musk is new owner of twitter, His full name is Elon Reeve Musk"

# ^ anchors the pattern to the start of the string, $ to the end
mrx = re.search("^Elon.*Musk$", sentence)

if mrx:
    print("True - Above sentence has Elon in start and Musk in end.")
else:
    print("Sorry no match found.")

Python RegEx Functions

The re module provides the following functions for searching a string for a match:

Function   Overview
findall    Provides a list of all matches.
search     Matches anywhere in the string and returns a Match object.
split      Splits the string at each match and returns a list.
sub        Replaces one or more matches with a string.

Metacharacters With Python RegEx

Metacharacters are characters with a special meaning:

Character  Overview                                                       Example
[]         A set of characters. A hyphen inside the square brackets       "[a-m]"
           specifies a range: [a-e] corresponds to [abcde] and
           [1-4] to [1234].
\          Signals a special sequence (or escapes a special character)    r"\d"
.          Any character other than a newline                             "el..on"
^          Begins with                                                    "^reeve"
$          Ends with                                                      "Musk$"
*          Zero or more occurrences                                       "mrx*"
+          One or more occurrences                                        "mrx+"
{}         Exactly the specified number of occurrences                    "mr{2}"
|          Either/or                                                      "falls|rise"
()         Grouping and capturing
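
The table above can be tried out with a short script. The following is only a sketch: the sample strings and variable names are chosen for illustration and are not part of the original examples.

```python
import re

sentence = "Elon Musk is the new owner of Twitter. His full name is Elon Reeve Musk"

# [] : any single lowercase letter from a to m
in_set = re.findall("[a-m]", "regex")

# .  : "M", then any two characters, then "k"
any_char = re.findall("M..k", sentence)

# ^ and $ : anchor the pattern to the start and the end of the string
starts = re.search("^Elon", sentence) is not None
ends = re.search("Musk$", sentence) is not None

# * allows zero or more "o" after "l"; + requires at least one
zero_or_more = re.findall("lo*", "Elon full")
one_or_more = re.findall("lo+", "Elon full")

# {} : exactly two consecutive "e" characters
exactly_two = re.findall("e{2}", "Reeve")

# |  : either alternative
either = re.findall("Elon|Reeve", sentence)

# () : capture only the part of the match inside the parentheses
captured = re.search("Elon (Reeve) Musk", sentence).group(1)

print(in_set)        # ['e', 'g', 'e']
print(any_char)      # ['Musk', 'Musk']
print(starts, ends)  # True True
print(zero_or_more)  # ['lo', 'l', 'l']
print(one_or_more)   # ['lo']
print(exactly_two)   # ['ee']
print(either)        # ['Elon', 'Elon', 'Reeve']
print(captured)      # Reeve
```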

Special Sequences

A special sequence is a backslash (\) followed by one of the characters in the list below, and it has a special meaning:

Character  Overview                                                       Example
\A         Matches if the specified characters are at the beginning       r"\AMr"
           of the string.
\b         Matches where the specified characters are at the beginning    r"\blon"
           or at the end of a word. The "r" in front of the pattern       r"usk\b"
           makes it a "raw string", so that Python does not interpret
           \b as a backspace character.
\B         Matches where the specified characters are present, but NOT    r"\Blon"
           at the beginning (or at the end) of a word. The "r" in front   r"musk\B"
           of the pattern again makes it a "raw string".
\d         Matches any digit (from 0 to 9).                               r"\d"
\D         Matches any character that is not a digit.                     r"\D"
\s         Matches any white-space character.                             r"\s"
\S         Matches any character that is not white space.                 r"\S"
\w         Matches any word character (a to z, A to Z, 0 to 9, and        r"\w"
           underscore _).
\W         Matches any character that is not a word character.            r"\W"
\Z         Matches if the specified characters are at the end of the      r"Elon\Z"
           string.
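
As a rough illustration of the special sequences listed above, the sketch below runs a few of them against a sample string. The sample text and variable names are made up for this example.

```python
import re

text = "Elon Reeve Musk was born in 1971"

# Without the r prefix, Python turns "\b" into a single backspace character
backspace_is_one_char = len("\b") == 1 and len(r"\b") == 2

# \A : "Elon" must be at the very beginning of the string
starts_with_elon = re.search(r"\AElon", text) is not None

# \b : words that start with a capital "M"
m_words = re.findall(r"\bM\w+", text)

# \B and \b together: "on" at the end of a word, but not at its start
on_endings = re.findall(r"\Bon\b", text)

# \d : runs of one or more digits
numbers = re.findall(r"\d+", text)

# \s : count the white-space characters
space_count = len(re.findall(r"\s", text))

# \Z : "1971" must be at the very end of the string
ends_with_year = re.search(r"1971\Z", text) is not None

print(backspace_is_one_char)  # True
print(starts_with_elon)       # True
print(m_words)                # ['Musk']
print(on_endings)             # ['on']
print(numbers)                # ['1971']
print(space_count)            # 6
print(ends_with_year)         # True
```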

Sets

A set is a group of characters enclosed in square brackets [] that has a special meaning:

Set         Overview
[arn]       Matches if one of the specified characters (a, r, or n) appears in the string.
[a-n]       Matches any lowercase character alphabetically between a and n.
[^arn]      Matches any character EXCEPT a, r, and n.
[0123]      Matches if any of the specified digits (0, 1, 2, or 3) appears in the string.
[0-9]       Matches any digit between 0 and 9.
[0-5][0-9]  Matches any two-digit number between 00 and 59.
[a-zA-Z]    Matches any character alphabetically between a and z, in either lowercase or uppercase.
[+]         In sets, +, *, ., |, (), $, and {} have no special meaning, so [+] matches any + character in the string.
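
To see how the sets above behave, here is a small sketch. The sample strings are assumptions picked for illustration only.

```python
import re

text = "Meeting at 09:45, room B12"

# [arn] : any of the characters a, r, or n
arn_hits = re.findall("[arn]", "room")

# [A-Z] : any uppercase letter
capitals = re.findall("[A-Z]", text)

# [^...] : any character that is NOT a letter, a digit, or a space
punctuation = re.findall("[^a-zA-Z0-9 ]", text)

# [0-5][0-9] : any two-digit number between 00 and 59
two_digit = re.findall("[0-5][0-9]", text)

# [+] : inside a set, + is an ordinary character
plus_signs = re.findall("[+]", "1+1=2")

print(arn_hits)     # ['r']
print(capitals)     # ['M', 'B']
print(punctuation)  # [':', ',']
print(two_digit)    # ['09', '45', '12']
print(plus_signs)   # ['+']
```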

Python findall() Function

The findall() function returns a list that contains all matches.

Print a list of all matches:

Example

import re

# Get a list of all occurrences of "on":
sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"
mrx = re.findall("on", sentence)
print(mrx)

Matches are listed in order of discovery.

If no matches are found, findall() returns an empty list:

Example

import re

sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"

# Check the string for "Billgates":
mrx = re.findall("Billgates", sentence)
print(mrx)

# An empty list is falsy, so it can be tested directly
if mrx:
    print("Yes, there is a match!")
else:
    print("Sorry, No match")

search() Function In Python Regex

The search() function searches the string for a match and returns a Match object if one is found.

If there is more than one match, only the first one is returned.

Search for the first white-space character in the string:

Example

import re

sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"

# The r prefix keeps the backslash in \s intact
mrx = re.search(r"\s", sentence)
print("Position of the first white-space character is:", mrx.start())

The value None is returned if no matches are found:

Run a search that finds no match:

Example

import re

sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"
mrx = re.search("spaceX", sentence)
print(mrx)

Python split() Function

The split() function returns a list in which the string has been split at each match.

Split the string at each white-space character:

Example

import re

# Split at every white-space character in the string:
sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"
mrx = re.split(r"\s", sentence)
print(mrx)

By specifying maxsplit, you can control how many splits occur:

Split the string only at the first occurrence:

Example

import re

# Split the string only at the first white-space character:
sentence = "Elon Musk is the new owner of Twitter, His full name is Elon Reeve Musk"
mrx = re.split(r"\s", sentence, maxsplit=1)
print(mrx)

Python sub() Function

The sub() function replaces the matches with the text of your choice.

Replace every white-space character with the character -:

Example

import re

# Replace every white-space character with "-":
sentence = "Currently, Elon Musk is the owner of Twitter. His full name is Elon Reeve Musk"
mrx = re.sub(r"\s", "-", sentence)
print(mrx)

The count parameter allows you to specify the number of replacements:

Replace only the first 4 occurrences:

Example

import re

# Replace the first four white-space characters with "-":
sentence = "Elon Musk currently owns Twitter and he is in charge of its operation. The full name of Elon Musk is Elon Reeve Musk"
mrx = re.sub(r"\s", "-", sentence, count=4)
print(mrx)

Match Object

A Match object contains information about the search and its result.

If there is no match, None is returned instead of a Match object.

Run a search that returns a Match object:

Example

import re

# A Match object is returned by the search() function:
sentence = "Elon Musk currently owns Twitter and he is in charge of its operation. The full name of Elon Musk is Elon Reeve Musk"
mrx = re.search("on", sentence)
print(mrx)

The Match object has properties and methods for retrieving information about the search and its result:

  • .span() returns a tuple containing the start and end positions of the match.
  • .string returns the string passed into the function.
  • .group() returns the part of the string where there was a match.

Print the position (start and end) of the first match:

Example

import re

# Find a word that begins with an upper case "R" and print its position:
sentence = "Elon Musk currently owns Twitter and he is in charge of its operation. The full name of Elon Musk is Elon Reeve Musk"
mrx = re.search(r"\bR\w+", sentence)
print(mrx.span())

Print the string passed into the function:

Example

import re

# The string property returns the string that was searched:
sentence = "Elon Musk currently owns Twitter and he is in charge of its operation. The full name of Elon Musk is Elon Reeve Musk"
mrx = re.search(r"\bR\w+", sentence)
print(mrx.string)

Print the part of the string where there was a match:

Example

import re

# Print the word that begins with an upper case "R" character:
sentence = "Elon Musk currently owns Twitter and he is in charge of its operation. The full name of Elon Musk is Elon Reeve Musk"
mrx = re.search(r"\bR\w+", sentence)
print(mrx.group())

Note: because search() returns None when there is no match, check the result before calling .span() or .group() on it.
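
One way to handle a missing match is to test the result for None before using it. The helper below is a hypothetical sketch written for this tutorial (the function name and its parameters are not part of the re module):

```python
import re

def first_word_starting_with(letter, text):
    """Return the first word in text that starts with letter, or None."""
    # re.escape() keeps the letter literal even if it is a metacharacter
    match = re.search(r"\b" + re.escape(letter) + r"\w+", text)
    if match is None:
        # No Match object: calling match.group() here would raise an error
        return None
    return match.group()

sentence = "The full name of Elon Musk is Elon Reeve Musk"

found = first_word_starting_with("R", sentence)
missing = first_word_starting_with("Z", sentence)

print(found)    # Reeve
print(missing)  # None
```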


Python RegEx Uses

Here are some common use cases for Python regex:

  1. Regex can be used to search for specific patterns within strings. For example, you can find all occurrences of a certain word, extract email addresses or URLs, validate phone numbers, etc.
  2. Regex allows you to extract specific portions of text from a larger string. This can be useful for parsing data, extracting information from logs, or scraping web pages.
  3. Regex enables you to find and replace specific patterns within a string. You can replace certain words, remove unwanted characters, or perform complex text transformations.
  4. Regex can be used to validate user input by checking if it matches a specific pattern. For example, you can verify if an input follows a certain format like a valid email address, a strong password, or a correct date format.
  5. Regex can help clean and normalize data by removing unwanted characters, formatting inconsistencies, or correcting common errors in textual data.
  6. Regex allows you to split text into tokens based on specific patterns. This can be useful for natural language processing tasks like text classification, sentiment analysis, or information retrieval.
  7. Regex is often used in programming language compilers and interpreters for tokenizing and parsing source code, identifying keywords, operators, and other language constructs.
  8. In web development frameworks, regex is commonly used for defining URL patterns and routing incoming requests to appropriate handlers or controllers.
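
A few of the use cases above (extraction, validation, cleaning, and splitting) can be sketched with the functions covered in this tutorial. The log line and the patterns below are simplified assumptions for illustration; in particular, the email pattern is deliberately loose and is not a complete email validator.

```python
import re

log_line = "2024-01-15 ERROR user=alice@example.com   failed login"

# Extraction: find anything shaped roughly like an email address
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.]+", log_line)

# Validation: does the whole string look like YYYY-MM-DD?
date_pattern = r"^\d{4}-\d{2}-\d{2}$"
valid_date = re.search(date_pattern, "2024-01-15") is not None
invalid_date = re.search(date_pattern, "15/01/2024") is not None

# Cleaning: collapse every run of white space into a single space
cleaned = re.sub(r"\s+", " ", log_line)

# Splitting: break the cleaned line into tokens at white space
tokens = re.split(r"\s", cleaned)

print(emails)        # ['alice@example.com']
print(valid_date)    # True
print(invalid_date)  # False
print(cleaned)       # 2024-01-15 ERROR user=alice@example.com failed login
print(tokens)        # ['2024-01-15', 'ERROR', 'user=alice@example.com', 'failed', 'login']
```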