An In-Depth Look at Python Regular Expressions: A Complete Guide with Practical Examples

Source: original · 2025/4/30 6:14:53

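📌 Introduction

Regular expressions (regex) are a powerful tool for matching, searching, and replacing text. In Python they are provided by the re module. For data cleaning, log analysis, or string parsing, a well-chosen pattern can often replace a good deal of hand-written string handling.

This article walks through regular expression syntax in Python, the common re operations, and a few practical examples, so you can pick up the core skills quickly. 🚀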

1. Importing the `re` module

In Python, all of the regular expression operations below come from the standard-library re module:

import re

2. Basic regular expression syntax

✅ Special characters

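Symbol	Meaning	Example	Matches
`.`	Any single character except a newline	`a.b`	`"acb"`, `"a0b"`
`^`	Start of the string	`^Hello`	`"Hello world"`
`$`	End of the string	`world$`	`"Hello world"`
`*`	Preceding element repeated 0 or more times	`ab*`	`"a"`, `"ab"`, `"abb"`
`+`	Preceding element repeated 1 or more times	`ab+`	`"ab"`, `"abb"`
`?`	Preceding element 0 or 1 times	`ab?`	`"a"`, `"ab"`
`{n}`	Preceding element exactly n times	`a{3}`	`"aaa"`
`{n,}`	Preceding element at least n times	`a{2,}`	`"aa"`, `"aaa"`
`{n,m}`	Preceding element n to m times	`a{1,3}`	`"a"`, `"aa"`, `"aaa"`
`[]`	Character class: any one of the listed characters	`[abc]`	`"a"`, `"b"`, `"c"`
`|`	Logical "or" (alternation)	`apple|…`	
`\d`	Digit (`[0-9]`; other Unicode decimal digits too unless `re.ASCII` is set)	`\d{3}`	`"123"`
`\D`	Non-digit (`[^0-9]`)	`\D`	`"a"`, `"@"`
`\s`	Whitespace (space, tab, newline, etc.)	`\s+`	`" "`
`\w`	Word character (`[a-zA-Z0-9_]` under `re.ASCII`; Unicode letters and digits too by default)	`\w+`	`"hello123"`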
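
As a quick way to check the rows above, here is a minimal sketch that runs a handful of patterns through re.fullmatch(), which succeeds only when the whole string matches. The `apple|banana` pattern and the `checks` list are illustrative assumptions, not taken from the table.

```python
import re

# Each entry: (pattern, candidate string, whether a full match is expected).
checks = [
    (r"a.b", "acb", True),       # '.' matches any single character
    (r"a.b", "a\nb", False),     # ...but not a newline (without re.DOTALL)
    (r"ab*", "a", True),         # '*' allows zero repetitions of 'b'
    (r"ab+", "a", False),        # '+' requires at least one 'b'
    (r"ab?", "abb", False),      # '?' allows at most one 'b'
    (r"a{2,}", "aaaa", True),    # '{2,}' means two or more
    (r"a{1,3}", "aaaa", False),  # '{1,3}' allows at most three
    (r"[abc]", "b", True),       # character class: any one of a, b, c
    (r"apple|banana", "banana", True),  # '|' is alternation (assumed example)
    (r"\d{3}", "123", True),     # exactly three digits
]

results = [bool(re.fullmatch(p, s)) == expected for p, s, expected in checks]
print(all(results))  # True

# For str patterns, \w is Unicode-aware unless re.ASCII is passed.
unicode_word = bool(re.fullmatch(r"\w+", "héllo"))
ascii_word = bool(re.fullmatch(r"\w+", "héllo", re.ASCII))
print(unicode_word, ascii_word)  # True False
```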

3. Common regular expression operations

✅ `re.match()`: matching at the start of the string

import re

pattern = r"hello"
text = "hello world"

match = re.match(pattern, text)
if match:
    print("Match found:", match.group())  # hello
else:
    print("No match")

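Note: re.match() only tries to match at the beginning of the string. If "hello" does not appear at the very start, the match fails and None is returned.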
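
To make the note concrete, the sketch below runs the same pattern against a string where "hello" is not at position 0 and compares re.match() with re.search(). The sample string is an assumption chosen for illustration.

```python
import re

text = "say hello world"

# re.match() is anchored at position 0, so "hello" in the middle is not found.
at_start = re.match(r"hello", text)
print(at_start)  # None

# re.search() scans forward through the string and finds it.
anywhere = re.search(r"hello", text)
print(anywhere.group())  # hello

# match() does not require the pattern to reach the end of the string.
prefix = re.match(r"say", text)
print(prefix.group())  # say
```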

✅ `re.search()`: searching the whole string

import re

pattern = r"world"
text = "hello world"

search = re.search(pattern, text)
if search:
    print("Match found:", search.group())  # world

Use it to find a match anywhere in the string. Only the first match is returned.
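
A small sketch of what the returned match object offers, and of the None result when nothing matches. The sample text is an assumption.

```python
import re

text = "hello world, hello regex"

# Only the first occurrence is reported, even though "hello" appears twice.
first = re.search(r"hello", text)
print(first.span())  # (0, 5)

# start() and end() give the position of the match.
word = re.search(r"world", text)
print(word.start(), word.end())  # 6 11

# When nothing matches, the result is None, so check before calling .group().
missing = re.search(r"python", text)
print(missing)  # None
```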

✅ `re.findall()`: finding all matches

import re

pattern = r"\d+"
text = "Order 123, amount 456 yuan"

matches = re.findall(pattern, text)
print(matches)  # ['123', '456']

Use it to extract multiple matches as a list of strings.
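
One detail worth knowing: when the pattern contains capturing groups, re.findall() returns the groups rather than the whole match. A minimal sketch, with an assumed sample string:

```python
import re

text = "apple=3, banana=12, cherry=7"

# No groups: each item is the full match.
whole = re.findall(r"\w+=\d+", text)
print(whole)  # ['apple=3', 'banana=12', 'cherry=7']

# One group: each item is that group's text.
numbers = re.findall(r"\w+=(\d+)", text)
print(numbers)  # ['3', '12', '7']

# Several groups: each item is a tuple of the groups.
pairs = re.findall(r"(\w+)=(\d+)", text)
print(pairs)  # [('apple', '3'), ('banana', '12'), ('cherry', '7')]
```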

✅ `re.finditer()`: iterating over matches

import re

pattern = r"\d+"
text = "Order 123, amount 456 yuan"

matches = re.finditer(pattern, text)
for match in matches:
    print(match.group())  # prints 123, then 456

Use it when you need the position of each match (match.start() returns where a match begins).
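
The position information mentioned above can be collected like this. It is a short sketch; the `positions` name is just for illustration.

```python
import re

text = "Order 123, amount 456 yuan"

# Each item yielded by finditer() is a match object with start() and end().
positions = [(m.group(), m.start(), m.end()) for m in re.finditer(r"\d+", text)]
print(positions)  # [('123', 6, 9), ('456', 18, 21)]

# The offsets can be used to slice the original string.
for value, start, end in positions:
    assert text[start:end] == value
```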

✅ `re.sub()`: replacing text

import re

pattern = r"\d+"
text = "Order 123, amount 456 yuan"

result = re.sub(pattern, "XXX", text)
print(result)  # Order XXX, amount XXX yuan

Use it to redact sensitive data such as phone numbers or ID numbers.
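
For redaction, the replacement can also be a function that receives each match object, which allows keeping part of the original text. The sketch below assumes 11-digit phone numbers and a hypothetical helper named `mask_phone`; real phone formats vary.

```python
import re

text = "Call 13812345678 or 13987654321"

def mask_phone(m):
    """Keep the first 3 and last 4 digits, hide the middle 4."""
    digits = m.group()
    return digits[:3] + "*" * 4 + digits[-4:]

masked = re.sub(r"\d{11}", mask_phone, text)
print(masked)  # Call 138****5678 or 139****4321

# The count argument limits how many replacements are made.
first_only = re.sub(r"\d{11}", "[hidden]", text, count=1)
print(first_only)  # Call [hidden] or 13987654321
```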

✅ `re.split()`: splitting a string by a pattern

import re

text = "apple,banana;orange|grape"
pattern = r"[,;|]"  # split on a comma, semicolon, or vertical bar

result = re.split(pattern, text)
print(result)  # ['apple', 'banana', 'orange', 'grape']

Use it to split a string on several different delimiters at once.
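
A few variations that often come up with multiple delimiters, sketched with assumed sample strings: trimming whitespace around delimiters, limiting the number of splits, and keeping the delimiters in the result.

```python
import re

text = "apple, banana;orange | grape"

# Optional whitespace around each delimiter, so the pieces come out trimmed.
parts = re.split(r"\s*[,;|]\s*", text)
print(parts)  # ['apple', 'banana', 'orange', 'grape']

# maxsplit stops after the given number of splits.
head_tail = re.split(r"\s*[,;|]\s*", text, maxsplit=1)
print(head_tail)  # ['apple', 'banana;orange | grape']

# A capturing group keeps the delimiters in the output list.
with_delims = re.split(r"([,;|])", "a,b;c")
print(with_delims)  # ['a', ',', 'b', ';', 'c']
```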

4. Practical examples

Case 1: Validating an email address

import re

def is_valid_email(email):
    pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
    return re.match(pattern, email) is not None

print(is_valid_email("test@example.com"))  # True
print(is_valid_email("invalid-email"))     # False
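
One subtlety with the `$` anchor: it also matches just before a trailing newline, so re.match() with `^...$` accepts "test@example.com\n". The sketch below shows a variant built on re.fullmatch(), which requires the entire string to match. The names `EMAIL_RE` and `is_valid_email_strict` are hypothetical, and the pattern remains a loose syntactic check rather than full email validation.

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors.
EMAIL_RE = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def is_valid_email_strict(email):
    """Return True only if the whole string looks like an email address."""
    return EMAIL_RE.fullmatch(email) is not None

print(is_valid_email_strict("test@example.com"))    # True
print(is_valid_email_strict("invalid-email"))       # False
print(is_valid_email_strict("test@example.com\n"))  # False

# For comparison: '$' matches before the final newline, so this is accepted.
anchored = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
loose_result = re.match(anchored, "test@example.com\n") is not None
print(loose_result)  # True
```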

Case 2: Extracting all URLs from a web page

import re

text = "Visit https://www.google.com or http://www.github.com for more information"
pattern = r"https?://[a-zA-Z0-9./-]+"

urls = re.findall(pattern, text)
print(urls)  # ['https://www.google.com', 'http://www.github.com']
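
Because the character class includes `.`, a URL at the end of a sentence picks up the trailing period. Here is a minimal sketch of one way to clean that up, assuming that trailing dots are never part of the URLs you care about. Note also that the class excludes characters such as `?`, `=`, and `&`, so query strings are cut off by this pattern.

```python
import re

text = "See https://www.google.com/search, then http://www.github.com."
pattern = r"https?://[a-zA-Z0-9./-]+"

raw = re.findall(pattern, text)
print(raw)  # ['https://www.google.com/search', 'http://www.github.com.']

# Strip sentence-ending dots, then drop duplicates while keeping order.
cleaned = list(dict.fromkeys(url.rstrip(".") for url in raw))
print(cleaned)  # ['https://www.google.com/search', 'http://www.github.com']
```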

Case 3: Masking an ID number

import re

text = "Zhang San's ID number is 123456199012123456"
pattern = r"(\d{6})\d{8}(\d{4})"

masked_text = re.sub(pattern, r"\1********\2", text)
print(masked_text)  # Zhang San's ID number is 123456********3456
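
The pattern above masks the first 18 digits of any longer digit run as well. A somewhat stricter sketch follows, assuming the 18-character ID format whose last character may be a digit or X. The names `ID_RE` and `mask_id` are hypothetical; the named groups and lookarounds are one possible choice, not the only one.

```python
import re

# (?<!\d) and (?!\d) make sure the 18 characters are not part of a longer number.
ID_RE = re.compile(r"(?<!\d)(?P<head>\d{6})\d{8}(?P<tail>\d{3}[\dXx])(?!\d)")

def mask_id(text):
    """Replace the middle 8 digits of each 18-character ID with asterisks."""
    return ID_RE.sub(r"\g<head>********\g<tail>", text)

print(mask_id("ID: 123456199012123456"))  # ID: 123456********3456
print(mask_id("ID: 12345619901212345X"))  # ID: 123456********345X

# A 20-digit serial number is left untouched.
print(mask_id("Serial 12345619901212345678"))  # Serial 12345619901212345678
```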

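🎯 Summary

Method	Purpose
`re.match()`	Matches only at the start of the string
`re.search()`	Finds the first match anywhere in the string
`re.findall()`	Finds all matches and returns a list
`re.finditer()`	Finds all matches and returns an iterator of match objects
`re.sub()`	Replaces matched text
`re.split()`	Splits a string by a regular expression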