Finding nestable string groups with regular expressions in Python

System 2019-09-27 17:53:36

I came across a small requirement online that calls for regular expressions. The original requirement is as follows:

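Find the sentences in a text that contain "因为…所以" ("because … so"). Print each one aligned on those two words, with the 3 characters before and the 3 characters after them, and everything in between in full. If another "因为" / "所以" pair occurs between the outer "因为" and "所以", find it as well and print it on its own line. The output format is: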

line-number 3-chars-before *因为* everything-in-between &所以& 3-chars-after (a punctuation mark counts as one character)

2 还不是 *因为* 这里好， &所以& 没有人

The implementation is as follows:

            
# -*- coding: utf-8 -*-
# Python 3 (the original listing targeted Python 2)
import re

def getPairStriList(filename):
  pairStrList = []
  # '\u56e0\u4e3a' and '\u6240\u4ee5' are the Unicode code points of "因为" and "所以"
  pattern = re.compile('.{3}\u56e0\u4e3a.*\u6240\u4ee5.{3}')
  with open(filename, 'r', encoding='utf8') as textFile:
    for lineNo, line in enumerate(textFile, 1):
      result = pattern.search(line)
      while result:
        resultStr = result.group()
        pairStrList.append((lineNo, resultStr))
        # Search inside the match (both ends trimmed) for a nested pair
        result = pattern.search(resultStr, 2, len(resultStr) - 2)
  # Format each match: it always starts with 3 chars + "因为" and ends with "所以" + 3 chars
  outputList = []
  for lineNo, pairStr in pairStrList:
    pairStr = pairStr[:3] + ' *\u56e0\u4e3a* ' + pairStr[5:]
    pairStr = pairStr[:-5] + ' &\u6240\u4ee5& ' + pairStr[-3:]
    # Prefix the number of the line the match came from, as the requirement asks
    outputList.append(str(lineNo) + ' ' + pairStr)
  return outputList

if __name__ == '__main__':
  for pairStr in getPairStriList('test.txt'):
    print(pairStr)
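
The nested search in the loop above can be tried without a file. What follows is only a minimal sketch, assuming Python 3: the helper names `find_nested_pairs` and `format_pair` and the sample sentence are made up for illustration and are not part of the original code. It relies on the `pos` and `endpos` arguments of `Pattern.search`, which restrict the region of the string that is searched.

```python
import re

# Same pattern as above: 3 chars, "因为", anything, "所以", 3 chars.
PATTERN = re.compile('.{3}因为.*所以.{3}')

def find_nested_pairs(line):
    """Return the outermost match followed by each match nested inside it."""
    found = []
    result = PATTERN.search(line)
    while result:
        matched = result.group()
        found.append(matched)
        # Search again inside the match, skipping both ends so the
        # same (outermost) match cannot be returned twice.
        result = PATTERN.search(matched, 2, len(matched) - 2)
    return found

def format_pair(matched):
    """Mark the leading "因为" and the trailing "所以" of one match."""
    return (matched[:3] + ' *因为* ' + matched[5:-5]
            + ' &所以& ' + matched[-3:])

# Hypothetical sample sentence with one pair nested inside another.
sample = '他说这是因为大家都因为下雨了所以不出门所以很安静。'
for item in find_nested_pairs(sample):
    print(format_pair(item))
# 说这是 *因为* 大家都因为下雨了所以不出门 &所以& 很安静
# 大家都 *因为* 下雨了 &所以& 不出门
```

Note that a nested pair is only found when at least 3 characters precede its "因为" inside the trimmed region, because the pattern itself requires them.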

PS: next, a look at nesting groups in Python regular expressions

Because a group is itself a complete regular expression, groups can be nested inside other groups to build more complex expressions. The following example nests groups:

            
#python 3.6 
#蔡軍生  
#http://blog.csdn.net/caimouse/article/details/51749579 
# 
import re 
def test_patterns(text, patterns): 
  """Given source text and a list of patterns, look for 
  matches for each pattern within the text and print 
  them to stdout. 
  """ 
  # Look for each pattern in the text and print the results 
  for pattern, desc in patterns: 
    print('{!r} ({})\n'.format(pattern, desc)) 
    print(' {!r}'.format(text)) 
    for match in re.finditer(pattern, text): 
      s = match.start() 
      e = match.end() 
      prefix = ' ' * (s) 
      print( 
        ' {}{!r}{} '.format(prefix, 
                   text[s:e], 
                   ' ' * (len(text) - e)), 
        end=' ', 
      ) 
      print(match.groups()) 
      if match.groupdict(): 
        print('{}{}'.format( 
          ' ' * (len(text) - s), 
          match.groupdict()), 
        ) 
    print() 
  return

Example:

            
#python 3.6 
#蔡軍生  
#http://blog.csdn.net/caimouse/article/details/51749579 
# 
from re_test_patterns_groups import test_patterns 
test_patterns( 
  'abbaabbba', 
  [(r'a((a*)(b*))', 'a followed by 0-n a and 0-n b')], 
)

The output is as follows:

            
'a((a*)(b*))' (a followed by 0-n a and 0-n b)

 'abbaabbba'
 'abb'        ('bb', '', 'bb')
    'aabbb'   ('abbb', 'a', 'bbb')
         'a'  ('', '', '')
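
To read the tuples above: groups are numbered by the position of their opening parenthesis, so in `a((a*)(b*))` group 1 is the outer group and groups 2 and 3 are the ones nested inside it. The following is a minimal sketch using only the standard `re` module. The group names `tail`, `a_run` and `b_run` are hypothetical, added here for illustration only.

```python
import re

text = 'abbaabbba'

# Groups are numbered left to right by their opening parenthesis:
# group 1 = ((a*)(b*)), group 2 = (a*), group 3 = (b*).
match = re.search(r'a((a*)(b*))', text[3:])  # searches 'aabbba'
print(match.group(0))  # whole match -> aabbb
print(match.group(1))  # outer group -> abbb
print(match.group(2))  # first nested group -> a
print(match.group(3))  # second nested group -> bbb

# The same pattern with named groups, so each nested part can be
# looked up by name instead of by number.
named = re.compile(r'a(?P<tail>(?P<a_run>a*)(?P<b_run>b*))')
for m in named.finditer(text):
    print(m.span(), m.groupdict())
```

The outer group always contains exactly the text of the two inner groups joined together, which is why the first element of each tuple in the output equals the concatenation of the other two.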

