Introduction to Regular Expressions
In this module, we will explore Regular Expressions (regex), which are sequences of characters used to define search patterns. They are powerful tools for text processing, including searching, matching, and modifying strings. We'll learn how to use Python's re module to work with regular expressions and perform operations such as searching, matching, substituting, and splitting strings based on patterns.
Subtopic 1: Understanding Regular Expressions
A regular expression is a special sequence of characters that helps you match or find other strings or sets of strings using a specialized syntax. The syntax allows you to define patterns that can match specific types of text in a string.
Basic Components of Regular Expressions:
- Literal Characters: Matches exact characters in the text.
- Metacharacters: Characters with special meanings used to define regex patterns. Examples include:
  - .: Matches any single character except a newline.
  - ^: Matches the start of a string.
  - $: Matches the end of a string.
  - []: Matches any one of the characters inside the square brackets.
  - |: Acts as a logical OR.
Example: Simple Regex Match
import re
# Check whether the string starts with "hello" (re.match only tests the beginning)
result = re.match(r"hello", "hello world")
if result:
    print("Match found!")
else:
    print("Match not found!")
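The metacharacters listed above can be tried out one at a time. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not part of the module.

```python
import re

# Illustrative strings only; each line exercises one metacharacter.
dot = re.search(r"c.t", "cat")                   # "." stands in for any one character
start = re.search(r"^Hello", "Hello world")      # "^" anchors the pattern to the start
end = re.search(r"world$", "Hello world")        # "$" anchors the pattern to the end
char_set = re.search(r"gr[ae]y", "grey")         # "[ae]" accepts either "a" or "e"
either = re.search(r"cat|dog", "I have a dog")   # "|" accepts the left or right side
no_match = re.search(r"^world", "Hello world")   # "world" is not at the start

for name, m in [("dot", dot), ("start", start), ("end", end),
                ("char_set", char_set), ("either", either), ("no_match", no_match)]:
    # A failed search returns None instead of a match object.
    print(name, m.group() if m else None)
```

Each call returns a match object on success and None on failure, which is why the loop checks the result before calling group().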
Subtopic 2: Using re Module
Python’s re module provides several functions to work with regular expressions. Some of the most commonly used functions are:
- re.match(): Matches a pattern at the beginning of the string.
- re.search(): Searches the entire string for a pattern.
- re.findall(): Returns all non-overlapping matches of a pattern in a string.
- re.sub(): Substitutes occurrences of a pattern with a specified string.
- re.split(): Splits a string into a list using a pattern as the delimiter.
Example: Using re.match()
import re
# Match pattern at the beginning of the string
result = re.match(r"Hello", "Hello World!")
if result:
    print("Match found!")
else:
    print("Match not found!")
Example: Using re.search()
import re
# Search for pattern anywhere in the string
result = re.search(r"world", "Hello world!")
if result:
    print("Pattern found!")
else:
    print("Pattern not found!")
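The difference between re.match() and re.search() is easiest to see when the same pattern is applied to the same string with both functions. A minimal sketch, with variable names chosen only for illustration:

```python
import re

text = "Hello world!"

# re.match() only looks at the beginning of the string, so this fails.
at_start = re.match(r"world", text)

# re.search() scans the whole string and finds the first occurrence.
anywhere = re.search(r"world", text)

print(at_start)          # None
print(anywhere.group())  # world
print(anywhere.span())   # (6, 11)
```

The span() method reports where the match starts and ends, which can help when the position matters as well as the text.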
Subtopic 3: Searching and Matching Patterns
Regular expressions allow you to search for complex patterns, such as digits, specific words, or any character. You can use special characters to match specific parts of a string.
Common Patterns:
- \d: Matches any digit (0-9).
- \w: Matches any word character (letters, digits, and underscores).
- \s: Matches any whitespace character (spaces, tabs, newlines).
- [a-z]: Matches any lowercase letter.
- [^a-z]: Matches any character that is NOT a lowercase letter.
Example: Using re.findall()
import re
# Find all occurrences of digits in the string
text = "There are 3 apples and 5 bananas."
result = re.findall(r"\d", text)
print(f"Digits found: {result}")  # Output: Digits found: ['3', '5']
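The other common patterns can be explored the same way. The sketch below runs each one against a short sample string; the string and the variable names are assumptions made for illustration.

```python
import re

text = "Id 7_a"  # illustrative sample: letters, a space, a digit, an underscore

digits = re.findall(r"\d", text)         # digits only
word_chars = re.findall(r"\w", text)     # letters, digits, and underscores
spaces = re.findall(r"\s", text)         # whitespace characters
lowercase = re.findall(r"[a-z]", text)   # lowercase letters
not_lower = re.findall(r"[^a-z]", text)  # everything except lowercase letters

print(digits)      # ['7']
print(word_chars)  # ['I', 'd', '7', '_', 'a']
print(spaces)      # [' ']
print(lowercase)   # ['d', 'a']
print(not_lower)   # ['I', ' ', '7', '_']
```

Note that [a-z] and [^a-z] split the string between them: every character lands in exactly one of the two lists.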
Subtopic 4: Substitution and Splitting with Regex
Regular expressions can be used to substitute parts of a string with another string or to split a string into multiple parts based on a pattern.
Using re.sub() for Substitution:
- The re.sub() function replaces occurrences of a pattern in a string with a specified string.
import re
# Substitute digits with asterisks
text = "My phone number is 123-456-7890."
result = re.sub(r"\d", "*", text)
print(result)  # Output: My phone number is ***-***-****.
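By default re.sub() replaces every match. As a small sketch beyond the example above, the optional count argument limits how many matches are replaced, and an empty replacement string removes the matches altogether. The variable names are illustrative.

```python
import re

text = "My phone number is 123-456-7890."

# count=3 stops after the first three matches.
first_three = re.sub(r"\d", "*", text, count=3)

# An empty replacement string deletes every match.
no_digits = re.sub(r"\d", "", text)

print(first_three)  # My phone number is ***-456-7890.
print(no_digits)    # My phone number is --.
```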
Using re.split() to Split a String:
- The re.split() function splits a string at each match of a pattern and returns a list.
import re
# Split string at every space
text = "This is a simple test."
result = re.split(r"\s", text)
print(result) # Output: ['This', 'is', 'a', 'simple', 'test.']
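The delimiter pattern does not have to be a single fixed character. The following sketch, using made-up sample strings, splits on a set of delimiters and also shows what happens when two delimiters sit next to each other.

```python
import re

# A character set lets several different delimiters split the same string.
colors = re.split(r"[,;]", "red,green;blue")
print(colors)  # ['red', 'green', 'blue']

# Two adjacent delimiters produce an empty string between them,
# because re.split() cuts at every single match.
double_space = re.split(r"\s", "a  b")
print(double_space)  # ['a', '', 'b']
```

If empty strings are not wanted in the result, they can be filtered out of the list afterwards.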
Tasks
- Task 1: Validate Email Address
  - Write a regular expression to validate whether a given email address is in the correct format (e.g., user@example.com).
- Task 2: Extract Digits from a String
  - Write a program that extracts all the digits from a given string and returns them as a list.
- Task 3: Replace Vowels in a String
  - Write a program that takes a string and replaces all vowels (a, e, i, o, u) with a specified character (e.g., *).
- Task 4: Find Words of Specific Length
  - Write a program that uses regular expressions to find all words in a string that have exactly 5 letters.
- Task 5: Split String into Sentences
  - Write a program that splits a given text into sentences based on punctuation marks like ., !, and ?.
- Task 6: Extract Dates from a String
  - Write a program that extracts all the dates (in YYYY-MM-DD format) from a given string and prints them.