Python – re (Regular Expression) Module


Test environment: CentOS 8 (virtual machine)

In Python, the re module lets you process data with regular expressions. Below is a look at the most commonly used re functions.

[root@localhost ~]# python3
Python 3.6.8 (default, Sep 10 2021, 09:13:53)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re

re.match(pattern, string, flags=0)

Note that this function only matches at the beginning of the string. To match anywhere in the string, use re.search() instead.

>>> text =' , Author is Ben Ben'


>>> re.match('https', text)
<_sre.SRE_Match object; span=(0, 5), match='https'>
>>> re.match('http', text)
<_sre.SRE_Match object; span=(0, 4), match='http'>

The string does not start with ben, so None is returned (the interactive prompt prints nothing).

>>> re.match('ben', text)

Use the group() method to get the matched string.

>>> re.match('http|https', text).group()

Use the span() method to get (start position, end position).

>>> re.match('https', text).span()
(0, 5)
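
Since re.match() returns None when nothing matches, calling group() or span() directly on the result raises AttributeError in that case. The following is a minimal sketch of checking the result first; the helper name describe_match and the sample string (with example.com standing in for a real URL) are assumptions made for illustration.

```python
import re

# Hypothetical sample string; example.com stands in for a real URL.
sample = 'https://example.com , Author is Ben Ben'

def describe_match(pattern, string):
    """Return (matched text, start, end), or None when re.match() finds nothing."""
    m = re.match(pattern, string)
    if m is None:  # no match at the beginning of the string
        return None
    return m.group(), m.start(), m.end()

print(describe_match('https', sample))  # ('https', 0, 5)
print(describe_match('ben', sample))    # None
```

start() and end() return the same two numbers as span(), just separately.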

To search case-insensitively, pass the flags argument. re.I is short for re.IGNORECASE.

>>> re.match('HTTP', text)
>>> re.match('HTTP', text, re.I)
<_sre.SRE_Match object; span=(0, 4), match='http'>

Flag descriptions:

  • re.A (re.ASCII)
    Perform ASCII-only matching instead of full Unicode matching.
  • re.I (re.IGNORECASE)
    Perform case-insensitive matching.
  • re.M (re.MULTILINE)
    Make ^ (caret) and $ (dollar) match at the start and end of each line, not only at the start and end of the whole string.
  • re.S (re.DOTALL)
    Make the dot (.) match any character at all, including a newline. Without this flag, the dot matches anything except a newline.
  • re.X (re.VERBOSE)
    Allow whitespace and comments in the regex, which makes a complex pattern more readable.
  • re.L (re.LOCALE)
    Perform case-insensitive matching dependent on the current locale. Use only with bytes patterns.
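
The effect of re.M, re.S and re.X is easier to see on a small input. The sketch below uses a made-up two-line string (an assumption, not data from this article) and also shows that several flags can be combined with the | operator.

```python
import re

lines = 'first line\nSecond line'  # hypothetical two-line sample

# re.M: ^ matches at the start of every line, not only at the start of the string.
starts_default = re.findall(r'^\w+', lines)
starts_multiline = re.findall(r'^\w+', lines, re.M)

# re.S: the dot also matches the newline character.
dot_default = re.search(r'first.*Second', lines)
dot_all = re.search(r'first.*Second', lines, re.S)

# re.X: whitespace and comments inside the pattern are ignored.
# Flags are combined with |, here re.X together with re.I.
word_before_line = re.compile(r'''
    (\w+)   # a word (captured)
    \s+     # whitespace after the word
    LINE    # the literal text, matched case-insensitively because of re.I
''', re.X | re.I)

print(starts_default)                   # ['first']
print(starts_multiline)                 # ['first', 'Second']
print(dot_default)                      # None
print(repr(dot_all.group()))            # 'first line\nSecond'
print(word_before_line.findall(lines))  # ['first', 'Second']
```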

re.search(pattern, string, flags=0)

re.search() can match anywhere in the string. It is used the same way as re.match().

>>> text ='This is , Author is Ben Ben'
>>> re.match('ben', text)
>>> re.search('ben', text)
<_sre.SRE_Match object; span=(16, 19), match='ben'>
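
To put the difference between the two functions side by side, here is a small sketch. The helper first_position and the sample string (with example.com as a placeholder URL) are assumptions made for this example.

```python
import re

# Hypothetical sample string; example.com is a placeholder URL.
sample = 'This is https://example.com , Author is Ben Ben'

def first_position(pattern, string, flags=0):
    """Return the span of the first match anywhere in the string, or None."""
    m = re.search(pattern, string, flags)
    return m.span() if m else None

print(re.match('Author', sample))           # None: 'Author' is not at the start
print(first_position('Author', sample))     # (30, 36)
print(first_position('ben', sample))        # None: no lowercase 'ben' here
print(first_position('ben', sample, re.I))  # (40, 43)
```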

re.findall(pattern, string, flags=0)

Both re.match and re.search return only the first match. To find every match, use re.findall, which returns a list.

>>> text ='This is , Author is Ben Ben'
>>> re.findall('ben', text)
>>> re.findall('Ben', text)
['Ben', 'Ben']
>>> re.findall('ben', text , re.I)
['ben', 'Ben', 'Ben']
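
Two related details are worth a quick sketch: when the pattern contains a capturing group, re.findall returns the group's text rather than the whole match, and re.finditer yields match objects when positions are needed as well. The sample string with example.com is a placeholder chosen for illustration.

```python
import re

# Hypothetical sample string; example.com is a placeholder URL.
sample = 'This is https://example.com , Author is Ben Ben'

# Without a group, findall() returns the full text of every match.
names = re.findall('Ben', sample)

# With one capturing group, findall() returns only the group's text.
hosts = re.findall(r'https://(\S+)', sample)

# finditer() yields match objects, so positions are available too.
spans = [m.span() for m in re.finditer('ben', sample, re.I)]

print(names)  # ['Ben', 'Ben']
print(hosts)  # ['example.com']
print(spans)  # [(40, 43), (44, 47)]
```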

re.sub(pattern, repl, string, count=0, flags=0)


>>> text = 'This is , Author is Ben Ben'
>>> re.sub('ben', 'ben10', text)
'This is , Author is Ben Ben'
>>> text = 'This is , Author is Ben Ben'
>>> re.sub('Ben', 'Ben10', text)
'This is , Author is Ben10 Ben10'

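This uses the regular expression '[bB]en' to match either ben or Ben. Inside a character class the | is a literal character rather than an alternation operator, so it is left out. For more on the regular expressions supported by the re module, see –

>>> re.sub('[bB]en', 'Ben10', text)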
'This is , Author is Ben10 Ben10'
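
re.sub replaces every match of pattern in string with repl. The sketch below covers the remaining parts of the signature: the count argument, a backreference in repl, a function as repl, and re.subn, which also reports how many replacements were made. The sample string is the one used above, and the replacement values are made up for illustration.

```python
import re

sample = 'This is , Author is Ben Ben'

# count=1 replaces only the first match.
only_first = re.sub('Ben', 'Ben10', sample, count=1)

# \1 in the replacement refers to the text captured by the first group.
bracketed = re.sub(r'(Ben)', r'<\1>', sample)

# repl can be a function that receives the match object and returns a string.
shouted = re.sub('[bB]en', lambda m: m.group().upper(), sample)

# subn() returns a (new_string, number_of_replacements) tuple.
replaced, how_many = re.subn('Ben', 'Ben10', sample)

print(only_first)          # This is , Author is Ben10 Ben
print(bracketed)           # This is , Author is <Ben> <Ben>
print(shouted)             # This is , Author is BEN BEN
print(replaced, how_many)  # This is , Author is Ben10 Ben10 2
```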

re.compile(pattern, flags=0)

If a pattern (regular expression) is used often, it can be defined once with re.compile and then reused through match(), search(), and the other methods.

prog = re.compile(pattern)
result = prog.match(string)

is equivalent to

result = re.match(pattern, string)


>>> text ='This is , Author is Ben Ben'
>>> pattern = re.compile('[bB]en', re.I)
>>> print(pattern.match(text))
None
>>> print(pattern.search(text))
<_sre.SRE_Match object; span=(16, 19), match='ben'>
>>> print(pattern.findall(text))
['ben', 'Ben', 'Ben']
>>> print(pattern.sub('Ben10' ,text))
This is , Author is Ben10 Ben10
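
A compiled pattern is most useful when the same expression is applied to many strings. Here is a minimal sketch of that idea; the names BEN and count_ben and the input list are assumptions made for this example.

```python
import re

# Compiled once; the flags are fixed at compile time.
BEN = re.compile('ben', re.I)

def count_ben(lines):
    """Count case-insensitive occurrences of 'ben' across an iterable of strings."""
    return sum(len(BEN.findall(line)) for line in lines)

lines = ['Author is Ben Ben', 'no match here', 'BEN and ben']  # hypothetical input
print(count_ben(lines))  # 4

# Methods of a compiled pattern also accept a start position,
# which the module-level functions do not.
print(BEN.search('Ben Ben', 1).span())  # (4, 7)
```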

