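What is pyQuery:
A powerful and flexible web page parsing library. If regular expressions feel like too much trouble (I can't write them myself), if BeautifulSoup's syntax is too hard to remember, and if you already know jQuery's syntax, then PyQuery is your best choice.
Installing pyQuery: run pip3 install pyquery.
Basic usage of pyQuery:
Initialization:
Initializing from a string: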
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
print(doc('a'))
Initializing from a URL:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Initialize from a URL
from pyquery import PyQuery as pq

doc = pq('http://www.baidu.com')
print(doc('input'))
Initializing from a file:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Initialize from a file
from pyquery import PyQuery as pq

doc = pq(filename='baidu.html')
print(doc('title'))
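Selection works the same way as in jQuery: selecting by id, by name, and by class all behave the same, and much else carries over from jQuery as well.
Basic CSS selectors: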
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# CSS selectors
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
print(doc('.title'))
Finding elements:
Child elements:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
items = doc('.title')
print(type(items))
print(items)
p = items.find('b')
print(type(p))
print(p)
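This code looks up the elements whose class is title (the selector .title matches a class, not an id). Two elements carry that class: one is a p tag and the other is an a tag. We then call find to pick out the b elements inside them.
Note that find searches inside each matched element, among its descendants at any depth.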
children:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements with children()
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
items = doc('.title')
print(items.children())
You can also pass a selector to children() to filter the result:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Child elements filtered by a selector
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
items = doc('.title')
print(items.children('b'))
The output is the same as above.
Parent elements:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Parent element
from pyquery import PyQuery as pq

html = """
The Dormouse's story
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Lacie
and
Title
; and they lived at the bottom of a well.
...
"""
doc = pq(html)
items = doc('#link1')
print(items)
print(items.parent())
parent() returns only the direct parent. The parents() method returns all ancestor elements; passing a selector, as below, keeps only the ancestors that match it.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Ancestor elements
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('#link1')
print(items)
print(items.parents('body'))
Sibling elements:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Sibling elements
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('#link1')
print(items)
print(items.siblings('#link2'))
That covers the ways to find elements. Next, let's look at how to iterate over them.
Traversal:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Traversal
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('a')
for k, v in enumerate(items.items()):
    print(k, v)
Getting information:
Getting attributes:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Get attributes
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('a')
print(items)
print(items.attr('href'))
print(items.attr.href)
Getting text:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Get text
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('a')
print(items)
print(items.text())
print(type(items.text()))
Getting HTML:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Get inner HTML
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('a')
print(items.html())
DOM manipulation:
addClass, removeClass:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: addClass, removeClass
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('#link2')
print(items)
items.addClass('addStyle')    # can also be written as add_class
print(items)
items.remove_class('sister')  # can also be written as removeClass
print(items)
attr, css:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: attr, css
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
items = doc('#link2')
items.attr('name', 'addname')
print(items)
items.css('width', '100px')
print(items)
attr() can add a new attribute; if the element already has that attribute, the old value is overwritten.
remove:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# DOM manipulation: remove
from pyquery import PyQuery as pq

html = """
Hello World
This is a paragraph.
"""
doc = pq(html)
wrap = doc('.wrap')
print(wrap.text())
wrap.find('p').remove()
print("Data after remove")
print(wrap)
There are many other DOM methods. If you want to know more, see the official documentation: https://pyquery.readthedocs.io/en/latest/api.html
Pseudo-class selectors:
#!/usr/bin/env python
# -*- coding: utf-8 -*-
# Pseudo-class selectors
from pyquery import PyQuery as pq

html = """
The Dormouse's story
Once upon a time there were three little sisters; and their names were
Elsie
Lacie
and
Title
Title
...
"""
doc = pq(html)
# print(doc)
wrap = doc('a:first-child')    # a elements that are the first child of their parent
print(wrap)
wrap = doc('a:last-child')     # a elements that are the last child of their parent
print(wrap)
wrap = doc('a:nth-child(2)')   # a elements that are the second child of their parent
print(wrap)
wrap = doc('a:gt(2)')          # a elements with index greater than 2; indexes are 0 1 2 3 4, starting at 0, not 1
print(wrap)
wrap = doc('a:nth-child(2n)')  # a elements at child positions that are multiples of 2
print(wrap)
wrap = doc('a:contains(Lacie)')  # a elements whose text contains Lacie
print(wrap)
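I won't list every selector here. To learn more about CSS selectors, see the CSS tutorial on w3school: http://www.w3school.com.cn/css/index.asp
That roughly covers how to use pyQuery. For more detail, read the official documentation: https://pyquery.readthedocs.io/en/latest/
The code above is available at: https://gitee.com/dwyui/pyQuery.git
Thanks for reading. If anything here is incorrect, corrections are very welcome.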

