Python 3 in Practice: Regular Expressions, Part 1

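Regular Expressions

Definition

A regular expression is an advanced text-matching pattern that supports operations such as searching and replacing. In essence, it is a string made up of a series of special symbols and ordinary characters.
The expression describes characters and how they repeat, so a single pattern can match a whole class of strings that share the same features.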

Features

Convenient for searching and modifying text
Supported by many programming languages
Flexible and varied
And more

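The Python re module

findall is used as follows:

re.findall(pattern, string)

Here pattern is the matching rule and string is the text to search; findall returns a list of all non-overlapping matches. A simple example: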

import re

s = 'My email is 568233708@qq.com,pilihaotian@163.com'
# Raw string (r'...') so the backslashes reach the regex engine unchanged
print(re.findall(r'\w+@\w+\.com', s))

The program prints the following. The matching rules themselves are analyzed in detail later:

['568233708@qq.com', 'pilihaotian@163.com']

Metacharacters

Symbols that have a special meaning in a regular expression.

Matching ordinary characters

Metacharacter: abc
Rule: matches the corresponding ordinary characters literally

print(re.findall('abc', 'abcdefghij'))

Output:

['abc']

Alternation (or)

Metacharacter: ab|cd
Rule: matches whichever of the expressions on either side of | fits

print(re.findall('ab|cd', 'abcdefghij'))

Output:

['ab', 'cd']

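However, matches never overlap. If the two alternatives share characters, such as ab and bc, and the string contains abc, then ab is matched first and consumes the b, so bc can no longer match:

print(re.findall('ab|bc', 'abcdefghij'))

Output:

['ab']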
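
The behavior above can be sketched a little further. This is a minimal illustration, assuming Python 3.7 or later; the short sample strings are made up for the example. The scan moves left to right, and at each position the alternatives are tried in the order they are written.

```python
import re

# The leftmost starting position wins: 'ab' starts at index 0, 'bc' at index 1.
overlap = re.findall(r'ab|bc', 'abc')

# Swapping the alternatives changes nothing here, because 'bc' cannot match at index 0.
swapped = re.findall(r'bc|ab', 'abc')

# At the same position, the first alternative that matches is used.
first_wins = re.findall(r'a|ab', 'ab')

print(overlap)     # ['ab']
print(swapped)     # ['ab']
print(first_wins)  # ['a']
```

When one alternative is a prefix of another, putting the longer one first is usually what is wanted.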

Matching any single character

Metacharacter: .
Rule: matches any character except a newline

print(re.findall('.oo.', 'it looks like a good book.'))

Output:

['look', 'good', 'book']
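
To make the "except a newline" part concrete, here is a small sketch. The sample text is invented for illustration; re.DOTALL is the standard flag that lets the dot match a newline as well.

```python
import re

text = 'a\nb a-b'

# By default the dot does not match '\n', so only 'a-b' is found.
default_matches = re.findall(r'a.b', text)

# With re.DOTALL the dot matches every character, including '\n'.
dotall_matches = re.findall(r'a.b', text, re.DOTALL)

print(default_matches)  # ['a-b']
print(dotall_matches)   # ['a\nb', 'a-b']
```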

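Matching the start of a string

Metacharacter: ^abc
Rule: checks whether the string starts with abc; findall returns ['abc'] if it does and an empty list otherwise

# Using 'hello world' as the example
# The string starts with hello, so hello is returned
print(re.findall('^hello', 'hello world'))
# No match, so an empty list is returned
print(re.findall('^world', 'hello world'))

Output:

['hello']
[]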

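Matching the end of a string

Metacharacter: abc$
Rule: checks whether the string ends with abc; findall returns ['abc'] if it does and an empty list otherwise

# Using 'hello world' as the example
print(re.findall('hello$', 'hello world'))
print(re.findall('world$', 'hello world'))

Output:

[]
['world']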
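
For multi-line text, ^ and $ only anchor to the whole string unless the re.MULTILINE flag is passed. The following is a short sketch of the difference; the two-line sample string is an assumption made for the example.

```python
import re

text = 'hello world\nfoo bar'

# Default: ^ matches only at the very start, $ only at the very end.
starts_default = re.findall(r'^\w+', text)
ends_default = re.findall(r'\w+$', text)

# re.MULTILINE: ^ and $ also match at the start and end of every line.
starts_multiline = re.findall(r'^\w+', text, re.MULTILINE)
ends_multiline = re.findall(r'\w+$', text, re.MULTILINE)

print(starts_default, ends_default)      # ['hello'] ['bar']
print(starts_multiline, ends_multiline)  # ['hello', 'foo'] ['world', 'bar']
```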

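Repetition: *

Metacharacter: *
Rule: matches zero or more repetitions of the preceding expression; it must be combined with another expression

# The character before * is o, so this matches f followed by zero or more o
print(re.findall('fo*', 'foa fooa foooa fooooa'))

Output:

['fo', 'foo', 'fooo', 'foooo']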

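Repetition: +

Metacharacter: +
Rule: matches one or more repetitions of the preceding expression; it must be combined with another expression
Compared with *:

print(re.findall('l*', 'hello world'))
print(re.findall('l+', 'hello world'))

Output:

['', '', 'll', '', '', '', '', '', 'l', '', '']
['ll', 'l']

With *, the preceding l may repeat zero or more times. The string hello world is scanned from left to right: at h there is no l, so l matches zero times and the result is an empty string; the same happens at e; at the third character l matches, and so does the next character, giving two repetitions and the result ll. The rest of the string follows the same logic, ending with one more empty match at the end of the string.
With +, l must occur at least once, so the result contains no empty strings.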
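
The empty strings produced by l* are easier to follow with their positions attached. This sketch uses re.finditer on the shorter string 'hello' (chosen here for brevity) and assumes Python 3.7 or later, where an empty match may directly follow a non-empty one.

```python
import re

# Record where each match of l* starts and what it contains.
star_matches = [(m.start(), m.group()) for m in re.finditer(r'l*', 'hello')]

# With l+ only the non-empty run of l characters is reported.
plus_matches = [(m.start(), m.group()) for m in re.finditer(r'l+', 'hello')]

print(star_matches)  # [(0, ''), (1, ''), (2, 'll'), (4, ''), (5, '')]
print(plus_matches)  # [(2, 'll')]
```

Index 5 is the end of the string, which is why the list ends with one extra empty match.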

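Repetition: ?

Metacharacter: ?
Rule: matches zero or one repetition of the preceding expression; it must be combined with another expression

print(re.findall('fo?', 'f foa fooa'))

Output:

['f', 'fo', 'fo']

In the first match o occurs zero times; in the second and third matches it occurs once.
Similarly, consider this example:

print(re.findall('f?', 'abc f fo'))

Output:

['', '', '', '', 'f', '', 'f', '', '']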
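
A typical use of ? is an optional character inside a word. The spelling example below is a common illustration and is not taken from the original text.

```python
import re

# u? makes the letter u optional, so both spellings match.
# 'colouur' does not match because ? allows at most one u.
spellings = re.findall(r'colou?r', 'color colour colouur')

print(spellings)  # ['color', 'colour']
```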

Repetition: {n}

Metacharacter: {n}
Rule: matches exactly n repetitions of the preceding expression; it must be combined with another expression

print(re.findall('fo{3}', 'foa fooa foooa fooooa'))

Output:

['fooo', 'fooo']

Repetition: {m,n}

Metacharacter: {m,n}
Rule: matches from m to n repetitions (inclusive) of the preceding expression; it must be combined with another expression

print(re.findall('fo{3,4}', 'foa fooa foooa fooooa'))

Output:

['fooo', 'foooo']
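
Either bound of {m,n} can be left out. A brief sketch on the same sample string: omitting n means "no upper limit", and omitting m means a lower bound of zero.

```python
import re

text = 'foa fooa foooa fooooa'

# {2,} : two or more o characters.
at_least_two = re.findall(r'fo{2,}', text)

# {,2} : between zero and two o characters.
at_most_two = re.findall(r'fo{,2}', text)

print(at_least_two)  # ['foo', 'fooo', 'foooo']
print(at_most_two)   # ['fo', 'foo', 'foo', 'foo']
```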

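Character sets, part 1

Metacharacter: []
Rule: matches any one of the characters inside the brackets
The brackets can contain a sequence of individual characters, such as abcdef45&*(
They can also contain character ranges, such as A-G, 0-7, or *-^. A range runs from the lower code point to the higher one, so *-^ covers every character from * to ^, which includes the digits and the uppercase letters.

print(re.findall('[ab12AB*]', 'abcd1234ABCD.*()&^&'))
print(re.findall('[a-bA-B0-2*-^]', 'abcd1234ABCD.*()&^&'))

Output:

['a', 'b', '1', '2', 'A', 'B', '*']
['a', 'b', '1', '2', '3', '4', 'A', 'B', 'C', 'D', '.', '*', '^']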

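Character sets, part 2

Metacharacter: [^]
Rule: matches any character that is not in the specified set
The brackets can contain a sequence of individual characters, such as abcdef45&*(
They can also contain character ranges, such as A-G, 0-7, or *-^

print(re.findall('[^ab12AB*]', 'abcd1234ABCD.*()&^&'))
print(re.findall('[^a-bA-B0-2*-^]', 'abcd1234ABCD.*()&^&'))

Output:

['c', 'd', '3', '4', 'C', 'D', '.', '(', ')', '&', '^', '&']
['c', 'd', '(', ')', '&', '&']

These results are exactly the complement of the results in the previous example.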
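
Because - and ^ have special roles inside a character set, matching them literally takes a little care. The sketch below shows two common ways to do it; the sample strings are invented for the example.

```python
import re

# Escaping with a backslash makes ^ and - literal inside the set.
escaped = re.findall(r'[\^\-*]', 'a-b^c*d')

# Position works too: - is literal when it comes first, ^ when it does not.
by_position = re.findall(r'[-*^]', 'a-b^c*d')

# Most metacharacters, such as the dot, lose their special meaning inside a set.
literal_dot = re.findall(r'[.]', 'a.b')

print(escaped)      # ['-', '^', '*']
print(by_position)  # ['-', '^', '*']
print(literal_dot)  # ['.']
```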

Matching (non-)digit characters

Metacharacter: \d
Rule: matches any digit character; equivalent to [0-9] for ASCII text
Metacharacter: \D
Rule: matches any non-digit character; equivalent to [^0-9] for ASCII text

print(re.findall(r'\d', 'abcd1234'))
print(re.findall(r'\D', 'abcd1234'))

Output:

['1', '2', '3', '4']
['a', 'b', 'c', 'd']

# Match 11-digit phone numbers
print(re.findall(r'1\d{10}', '18095555555000000abc18296365985$%#'))

Output:

['18095555555', '18296365985']
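
Note that the first result above is cut out of a longer 17-digit run. If only standalone 11-digit numbers should count, one option is to add lookaround assertions, which are not covered in this article; the following is a hedged sketch of that idea rather than a complete phone-number validator.

```python
import re

text = '18095555555000000abc18296365985$%#'

# Loose version: any 1 followed by ten digits, even inside a longer number.
loose = re.findall(r'1\d{10}', text)

# Stricter version (an assumption about the intent): no digit may come
# directly before or after the 11 digits.
strict = re.findall(r'(?<!\d)1\d{10}(?!\d)', text)

print(loose)   # ['18095555555', '18296365985']
print(strict)  # ['18296365985']
```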

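Matching (non-)word characters

Here, word characters are digits, letters, and the underscore.
Metacharacter: \w
Rule: matches a word character; equivalent to [_0-9a-zA-Z] for ASCII text
Metacharacter: \W
Rule: matches a non-word character; equivalent to [^_0-9a-zA-Z] for ASCII text

print(re.findall(r'\w+', 'abcd_1234&*)(%$'))
print(re.findall(r'\W+', 'abcd_1234&*)(%$'))

Output:

['abcd_1234']
['&*)(%$']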
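
In Python 3, \w on a str pattern is Unicode-aware by default, so it matches more than [_0-9a-zA-Z]. The sketch below contrasts the default with the re.ASCII flag; the sample word is made up, and the accented letter is written as an escape to avoid encoding surprises.

```python
import re

text = 'caf\u00e9_1'  # 'café_1' with a precomposed é

# Default (Unicode) mode: é counts as a word character.
unicode_words = re.findall(r'\w+', text)

# re.ASCII restricts \w to [a-zA-Z0-9_], so é splits the word.
ascii_words = re.findall(r'\w+', text, re.ASCII)

print(unicode_words)  # ['café_1']
print(ascii_words)    # ['caf', '_1']
```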

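Matching (non-)whitespace characters

Here, whitespace characters are the space, \t, \n, \r, \f, and \v. The null character \0 is not whitespace, as the second example shows.
Metacharacter: \s
Rule: matches a whitespace character; equivalent to [ \t\n\r\f\v] for ASCII text
Metacharacter: \S
Rule: matches a non-whitespace character; equivalent to [^ \t\n\r\f\v] for ASCII text

print(re.findall(r'\s+', 'hello world \t my \r name \n is leehao'))
print(re.findall(r'\S+', 'hello world \t my \r name \n is \0 leehao'))

Output:

[' ', ' \t ', ' \r ', ' \n ', ' ']
['hello', 'world', 'my', 'name', 'is', '\x00', 'leehao']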

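Matching words that start with an uppercase letter

print(re.findall(r"[A-Z]\S+", 'hello World , I am from China'))

Here [A-Z] matches the uppercase first letter, \S matches a non-whitespace character, and + keeps matching non-whitespace characters until whitespace is reached, which yields a complete word that starts with an uppercase letter. Because + requires at least one more character, the single-letter word I is not matched.
Output:

['World', 'China']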
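
If single-letter words such as I should be included as well, one possible variant (an alternative offered here, not the article's pattern) uses a word boundary and \w* instead of \S+:

```python
import re

sentence = 'hello World , I am from China'

# \b anchors at the start of a word; \w* allows zero or more further
# word characters, so a lone uppercase letter also matches.
capitalized = re.findall(r'\b[A-Z]\w*', sentence)

print(capitalized)  # ['World', 'I', 'China']
```

Using \w* rather than \S+ also keeps trailing punctuation out of the match.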

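Matching the start and end positions

Used together, these are also called exact matching or absolute matching, because the pattern must cover the entire string.
Metacharacter: \A
Rule: matches the start position of the string
Metacharacter: \Z
Rule: matches the end position of the string

print(re.findall(r"\Ahello\Z", 'hello'))
print(re.findall(r"\A/\w+/\w+/\w+\Z", '/D/User/leehao'))
print(re.findall(r"\A/\w+/\w+/\w+\Z", '/D/User/leehao/docs/ppt'))

Output:

['hello']
['/D/User/leehao']
[]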
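
One practical difference between $ and \Z shows up when the string ends with a newline. The sketch below illustrates it and also shows re.fullmatch, a standard function that checks the whole string without explicit anchors; the sample strings are assumptions for the example.

```python
import re

# $ also matches just before a trailing newline; \Z matches only at the very end.
dollar = re.findall(r'hello$', 'hello\n')
strict_end = re.findall(r'hello\Z', 'hello\n')

# re.fullmatch succeeds only if the pattern covers the entire string.
whole = re.fullmatch(r'hello', 'hello') is not None
whole_with_newline = re.fullmatch(r'hello', 'hello\n') is not None

print(dollar)              # ['hello']
print(strict_end)          # []
print(whole)               # True
print(whole_with_newline)  # False
```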

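Matching word (non-)boundary positions

A word boundary is the position where a word character (digit, letter, or underscore) meets any other character, or the start or end of the string.
Metacharacter: \b
Rule: matches a word boundary position
Metacharacter: \B
Rule: matches a position that is not a word boundary

# Matches every is
print(re.findall(r"is", 'this is a world,aaisbb'))
# Matches the second is, the standalone word
print(re.findall(r"\bis\b", 'this is a world,aaisbb'))
# Matches the first is, inside this
print(re.findall(r"\Bis\b", 'this is a world,aaisbb'))
# Matches the third is, inside aaisbb
print(re.findall(r"\Bis\B", 'this is a world,aaisbb'))

Output:

['is', 'is', 'is']
['is']
['is']
['is']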
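
The r prefix on these patterns matters. In an ordinary Python string, '\b' is the backspace character, so the regex engine never sees a word-boundary token. A minimal sketch of the difference, with a shortened sample string:

```python
import re

# Without r'', '\b' is the backspace character (\x08), so nothing matches.
without_raw = re.findall('\bis\b', 'this is')

# With a raw string, \b reaches the regex engine as a word boundary.
with_raw = re.findall(r'\bis\b', 'this is')

print(without_raw)  # []
print(with_raw)     # ['is']
```

Writing every pattern as a raw string avoids this class of mistake.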

Summary

Matching a single character: a . \d \D \w \W \s \S [...] [^...]
Matching repetition: * + ? {n} {m,n}
Matching a position: ^ $ \A \Z \b \B
Other: | () \
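
The last line lists () and \, which are not demonstrated above. As a brief, hedged sketch: \ escapes a metacharacter so that it matches literally, and when a pattern contains groups written with (), findall returns the group contents instead of the whole match. The email string is reused from the first example.

```python
import re

s = 'My email is 568233708@qq.com,pilihaotian@163.com'

# \. matches a literal dot; each () group becomes one element of a tuple.
pairs = re.findall(r'(\w+)@(\w+)\.com', s)

# A backslash turns the dot metacharacter into an ordinary character.
dots = re.findall(r'\.', 'a.b.c')

print(pairs)  # [('568233708', 'qq'), ('pilihaotian', '163')]
print(dots)   # ['.', '.']
```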

https://leehoward.cn/2020/03/07/Python3实战演练-正则表达式1/

Author

lihao

Published

March 7, 2020
