Regex

What Is a Regular Expression (Regex)?

A regular expression, or regex, is a pattern of characters that describes the text you want to match. It is a powerful tool for searching and manipulating strings.

Regex is widely used in programming languages, including Python, to perform various text processing tasks, such as data cleaning, data extraction, and data validation.

How to Use Regex in Python?

Python provides the built-in re module for working with regex. Here are some of its most commonly used functions:

  • re.search(pattern, string): Scans the string for the first occurrence of a pattern and returns a match object, or None if there is no match.

  • re.findall(pattern, string): Returns all non-overlapping occurrences of a pattern in a string as a list. If the pattern contains capturing groups, the list holds the groups instead of the whole match.

  • re.sub(pattern, replacement, string): Returns a new string in which every non-overlapping occurrence of the pattern is replaced with the replacement string. The original string is not modified.
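
As a minimal sketch of what a match object offers, the snippet below pulls the matched text and its position out of the result of re.search, and shows that a failed search gives None. The sample text and variable names are made up for illustration.

```python
import re

# Hypothetical sample text, used only for illustration
text = "Order 66 shipped on 2024-05-17"

# re.search returns a match object for the first match, or None if nothing matches
match = re.search(r"\d+", text)
if match is not None:
    first_number = match.group()  # the matched text: "66"
    position = match.span()       # (start, end) indexes of the match: (6, 8)
    print(first_number, position)

# A pattern that does not occur in the text yields None
missing = re.search(r"xyz", text)
print(missing)
```

Checking for None before calling group() avoids an AttributeError when the pattern is not found.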

Regex Patterns

Before we dive into using regex in Python, let's first take a look at some of the most commonly used special characters in regex patterns:

  • . (dot): Matches any single character except a newline.

  • ^ (caret): Matches the beginning of a string.

  • $ (dollar): Matches the end of a string (or just before a trailing newline).

  • * (asterisk): Matches zero or more occurrences of the preceding element (a character, character class, or group).

  • + (plus): Matches one or more occurrences of the preceding element.

  • ? (question mark): Matches zero or one occurrence of the preceding element.

  • [] (square brackets): Matches any single character listed inside the brackets, e.g. [abc] or the range [a-z].

  • | (pipe): Matches either the pattern on the left or the pattern on the right.

  • \ (backslash): Escapes a special character so that it is matched literally, e.g. \. matches a period.
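
The special characters listed above can be tried out one by one. The following is a small sketch using re.findall and re.search; the sample strings are made up, and the patterns are written as raw strings (r"...") so that Python does not interpret the backslashes itself.

```python
import re

# . matches any character except a newline
dot = re.findall(r"c.t", "cat cot c\nt")            # ['cat', 'cot']

# ^ and $ anchor the match to the start and end of the string
starts_with_i = re.search(r"^I", "I love Python") is not None
ends_with_python = re.search(r"Python$", "I love Python") is not None

# *, + and ? control how often the preceding element may repeat
star = re.findall(r"ab*", "a ab abb")               # ['a', 'ab', 'abb']
plus = re.findall(r"ab+", "a ab abb")               # ['ab', 'abb']
optional = re.findall(r"colou?r", "color colour")   # ['color', 'colour']

# [] matches one character from a set, | chooses between alternatives
vowels = re.findall(r"[aeiou]", "regex")            # ['e', 'e']
pets = re.findall(r"cat|dog", "cat, dog, bird")     # ['cat', 'dog']

# \ escapes a special character so it is matched literally
periods = re.findall(r"\.", "a.b.c")                # ['.', '.']

print(dot, star, plus, optional, vowels, pets, periods)
```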

Examples

Let's now look at some examples of using regex in Python:



Searching for a Pattern

import re

# Search for the word "Python" in a string
string = "I love Python programming"
pattern = "Python"
result = re.search(pattern, string)

if result:
    print("Pattern found!")
else:
    print("Pattern not found.")

Output:

Pattern found!

Finding All Occurrences of a Pattern

import re

# Find all occurrences of the word "Python" in a string
string = "I love Python programming. Python is my favorite language."
pattern = "Python"
result = re.findall(pattern, string)

print(result)

Output:

['Python', 'Python']
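
re.findall becomes more useful for data extraction when the pattern is more than a fixed word. The sketch below, using a made-up sentence, extracts every run of digits, then uses a capturing group (parentheses) to return only the version number that follows the word "Python".

```python
import re

# Hypothetical sample sentence, used only for illustration
sentence = "Python 2 was released in 2000 and Python 3 in 2008."

# [0-9]+ matches one or more digits in a row
numbers = re.findall(r"[0-9]+", sentence)
print(numbers)   # ['2', '2000', '3', '2008']

# With one capturing group, findall returns the group instead of the whole match
versions = re.findall(r"Python ([0-9]+)", sentence)
print(versions)  # ['2', '3']
```

Note that the results are strings; convert them with int() if numbers are needed.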

Replacing a Pattern

import re

# Replace all occurrences of the word "Python" with "Java" in a string
string = "I love Python programming. Python is my favorite language."
pattern = "Python"
replacement = "Java"
result = re.sub(pattern, replacement, string)

print(result)

Output:

I love Java programming. Java is my favorite language.
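
re.sub can also help with data cleaning. The following sketch, using a made-up messy string, collapses runs of spaces and tabs into a single space, replaces a word regardless of case with the re.IGNORECASE flag, and limits the number of replacements with the count argument.

```python
import re

# Hypothetical messy input, used only for illustration
messy = "I   love  Python.\tpython is   great."

# Collapse every run of spaces or tabs into a single space
cleaned = re.sub(r"[ \t]+", " ", messy)
print(cleaned)      # I love Python. python is great.

# flags=re.IGNORECASE makes the pattern match "Python" and "python" alike
replaced = re.sub("python", "Java", cleaned, flags=re.IGNORECASE)
print(replaced)     # I love Java. Java is great.

# count=1 replaces only the first occurrence
first_only = re.sub("Python", "Java", "Python Python", count=1)
print(first_only)   # Java Python
```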

Conclusion

Regex is a powerful tool for working with text in Python. In this tutorial, we covered the basics of regex patterns and used the re module to search for a pattern, find all of its occurrences, and replace them. These three operations make regex an essential tool for any data scientist or developer who needs to search, extract, or manipulate text.