Test environment: CentOS 8 (virtual machine)
In Python, the re module lets you process data with regular expressions. The sections below walk through the most commonly used re functions.
[root@localhost ~]# python3
Python 3.6.8 (default, Sep 10 2021, 09:13:53)
[GCC 8.5.0 20210514 (Red Hat 8.5.0-3)] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import re
re.match(pattern, string, flags=0)
Note that this function only matches at the beginning of the string. To match anywhere in the string, use re.search() instead.
>>> text ='https://benjr.tw , Author is Ben Ben'
Examples:
>>> re.match('https', text)
<_sre.SRE_Match object; span=(0, 5), match='https'>
>>> re.match('http', text)
<_sre.SRE_Match object; span=(0, 4), match='http'>
The string does not start with ben, so the call returns None (the interpreter prints nothing).
>>> re.match('ben', text)
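Use the group() method to get the matched text. Alternatives separated by | are tried from left to right, so 'http|https' stops at 'http'; put the longer alternative first to get 'https'.
>>> re.match('http|https', text).group()
'http'
>>> re.match('https|http', text).group()
'https'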
Use the span() method to get a (start, end) tuple for the match.
>>> re.match('https', text).span()
(0, 5)
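Because re.match() returns None when nothing matches, calling group() or span() directly on the result raises AttributeError. The following is a minimal sketch of a guard; first_match is a hypothetical helper name used only for illustration and is not part of the re module.

```python
import re

text = 'https://benjr.tw , Author is Ben Ben'

def first_match(pattern, string):
    """Hypothetical helper: return (matched_text, span), or None if no match."""
    m = re.match(pattern, string)
    if m is None:            # nothing matched at the start of the string
        return None
    return m.group(), m.span()

print(first_match('https', text))   # ('https', (0, 5))
print(first_match('ben', text))     # None
```

Checking for None first keeps the calling code from failing on input that does not start with the pattern.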
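For a case-insensitive match, pass the flags argument; re.I is the short form of re.IGNORECASE. Without the flag the first call below returns None; with re.I it matches.
>>> re.match('HTTP', text)
>>> re.match('HTTP', text, re.I)
<_sre.SRE_Match object; span=(0, 4), match='http'>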
Flag reference: https://pynative.com/python-regex-flags/
- re.A (re.ASCII): Perform ASCII-only matching instead of full Unicode matching.
- re.I (re.IGNORECASE): Perform case-insensitive matching.
- re.M (re.MULTILINE): Used with the metacharacters ^ (caret) and $ (dollar) so that they also match at the start and end of each line, not only of the whole string.
- re.S (re.DOTALL): Make the dot (.) match any character at all, including a newline. Without this flag, the dot matches anything except a newline.
- re.X (re.VERBOSE): Allow whitespace and comments inside the regex, which makes long patterns more readable.
- re.L (re.LOCALE): Perform case-insensitive matching dependent on the current locale. Use only with bytes patterns.
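The flags listed above are easier to tell apart with small inputs. The sketch below uses made-up sample strings (they are not from the original session) to show re.M, re.S, re.X, and how two flags can be combined with the | operator.

```python
import re

lines = 'first line\nsecond line'

# re.M: ^ matches at the start of every line, not only the start of the string.
starts = re.findall(r'^\w+', lines, re.M)
print(starts)                      # ['first', 'second']

# re.S: the dot also matches the newline, so the match can span both lines.
no_dotall = re.search(r'first.*second', lines)
across = re.search(r'first.*second', lines, re.S)
print(no_dotall)                   # None
print(repr(across.group()))        # 'first line\nsecond'

# re.X: whitespace and comments inside the pattern are ignored.
url = re.compile(r'''
    (https?)    # scheme
    ://
    ([\w.]+)    # host name
''', re.X)
scheme_host = url.match('https://benjr.tw').groups()
print(scheme_host)                 # ('https', 'benjr.tw')

# Flags are combined with the bitwise OR operator.
combined = re.findall(r'^second', 'Second\nSECOND', re.I | re.M)
print(combined)                    # ['Second', 'SECOND']
```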
re.search(pattern, string, flags=0)
The match can start anywhere in the string; otherwise it is used the same way as re.match().
>>> text ='This is https://benjr.tw , Author is Ben Ben'
>>> re.match('ben', text)
>>> re.search('ben', text)
<_sre.SRE_Match object; span=(16, 19), match='ben'>
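To make the difference concrete, here is a short sketch that runs both functions on the same string and then uses start() and end() of the match object to slice the original text. The variable names are chosen for illustration only.

```python
import re

text = 'This is https://benjr.tw , Author is Ben Ben'

at_start = re.match('ben', text)    # only looks at position 0
anywhere = re.search('ben', text)   # scans forward for the first hit

print(at_start)                             # None
print(anywhere.group(), anywhere.span())    # ben (16, 19)

# start() and end() index into the original string.
found = text[anywhere.start():anywhere.end()]
print(found)                                # ben
```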
re.findall(pattern, string, flags=0)
Both re.match and re.search return only the first match. To find every match, use re.findall, which returns a list.
>>> text ='This is https://benjr.tw , Author is Ben Ben'
>>> re.findall('ben', text)
['ben']
>>> re.findall('Ben', text)
['Ben', 'Ben']
>>> re.findall('ben', text, re.I)
['ben', 'Ben', 'Ben']
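re.findall returns only the matched strings. When the positions are needed as well, re.finditer (also part of the re module) yields one match object per hit. A minimal sketch on the same text:

```python
import re

text = 'This is https://benjr.tw , Author is Ben Ben'

# findall: just the matched strings.
words = re.findall('ben', text, re.I)
print(words)    # ['ben', 'Ben', 'Ben']

# finditer: one match object per hit, so group() and span() are available.
hits = [(m.group(), m.span()) for m in re.finditer('ben', text, re.I)]
print(hits)     # [('ben', (16, 19)), ('Ben', (37, 40)), ('Ben', (41, 44))]
```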
re.sub(pattern, repl, string, count=0, flags=0)
This function replaces every match of the pattern with the given replacement string.
>>> text ='This is https://benjr.tw , Author is Ben Ben'
>>> re.sub('ben', 'ben10', text)
'This is https://ben10jr.tw , Author is Ben Ben'
>>> re.sub('Ben', 'Ben10', text)
'This is https://benjr.tw , Author is Ben10 Ben10'
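The regular expression '[bB]en' matches either ben or Ben. Inside square brackets the | character is a literal, so '[b|B]en' would also match '|en'; list only the letters. For more on regular expressions in the re module, see https://benjr.tw/105379
>>> re.sub('[bB]en', 'Ben10', text)
'This is https://Ben10jr.tw , Author is Ben10 Ben10'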
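The signature above also has a count parameter that the examples do not exercise. The sketch below shows count=1, the related re.subn function, and what | does inside a character class. The sample string 'b|B' is made up for illustration.

```python
import re

text = 'This is https://benjr.tw , Author is Ben Ben'

# count=1 stops after the first replacement (count=0 means replace all).
once = re.sub('Ben', 'Ben10', text, count=1)
print(once)     # This is https://benjr.tw , Author is Ben10 Ben

# re.subn() returns a (new_string, number_of_replacements) tuple.
new_text, n = re.subn('[bB]en', 'Ben10', text)
print(n)        # 3

# Inside [...] the | is an ordinary character, not "or".
literal_bar = re.findall('[b|B]', 'b|B')
print(literal_bar)   # ['b', '|', 'B']
```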
re.compile(pattern, flags=0)
If a pattern (regular expression) is used often, compile it once with re.compile. The compiled object provides match(), search(), and the other functions as methods.
prog = re.compile(pattern)
result = prog.match(string)
is equivalent to
result = re.match(pattern, string)
Here is a concrete example.
>>> text ='This is https://benjr.tw , Author is Ben Ben'
>>> pattern=re.compile('[bB]en',re.I)
>>> print(pattern.match(text))
None
>>> print(pattern.search(text))
<_sre.SRE_Match object; span=(16, 19), match='ben'>
>>> print(pattern.findall(text))
['ben', 'Ben', 'Ben']
>>> print(pattern.sub('Ben10' ,text))
This is https://Ben10jr.tw , Author is Ben10 Ben10
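A compiled pattern is most useful when the same expression is applied to many strings. The following sketch assumes a small made-up list of lines and reuses one compiled object for counting and replacing.

```python
import re

# Compiled once; re.I already covers both "ben" and "Ben".
ben = re.compile('ben', re.I)

# Sample input, invented for this illustration.
lines = [
    'This is https://benjr.tw',
    'Author is Ben Ben',
    'No match here',
]

counts = [len(ben.findall(line)) for line in lines]
replaced = [ben.sub('Ben10', line) for line in lines]

print(counts)     # [1, 2, 0]
print(replaced)   # ['This is https://Ben10jr.tw', 'Author is Ben10 Ben10', 'No match here']
```

The flags are fixed at compile time, so they do not have to be repeated on every call.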