Author: livehorse (new PTT nerd)
Board: Python
Title: [Question] find() takes no key in beautifulsoup
Time: Wed Dec 14 01:12:01 2022
[Question] How to fix "find() takes no keyword arguments" in beautifulsoup
Hi everyone,
I recently started learning how to scrape articles with Python, and I picked mobile01 as my target.
But I ran into "find() takes no keyword arguments" and don't know how to fix it. I searched online and found some similar posts saying to change it to find_all, but then I get "str" has no attribute find_all instead.
Besides, since I only want to grab one specific element, it shouldn't be find_all anyway.
Here is my code:
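(For context on find vs find_all: on a parsed soup both accept keyword attribute filters; find returns the first matching Tag, find_all returns a list of all matches. A minimal sketch, using a hypothetical inline HTML snippet rather than the real mobile01 page:)

```python
import bs4

# Hypothetical snippet standing in for a parsed page
soup = bs4.BeautifulSoup(
    '<div class="t">first</div><div class="t">second</div>', "html.parser")

# find() returns only the first matching Tag
print(soup.find("div", class_="t").text)  # first

# find_all() returns every match as a list
print([d.text for d in soup.find_all("div", class_="t")])  # ['first', 'second']
```

Both calls only work on a BeautifulSoup/Tag object, not on a plain str.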
url = "https://www.mobile01.com/newtopics.php?mode=newtopic"
mWeb = openpyxl.load_workbook("mobile.xlsx")
ws = mWeb.active
for a in range(1, 6):
    # Build a Request object with the request headers attached, then open the URL with it
    request = req.Request(url, headers={
        "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Mobile Safari/537.36"
    })
    with req.urlopen(request) as response:
        data = response.read().decode("utf-8")
    # Parse the source to get each article's title (grabbing the hot articles of the whole board)
    # data is the page's raw HTML
    # root represents the whole page
    root = bs4.BeautifulSoup(data, "html.parser")  # data is the HTML fetched over the network; bs4 parses it as HTML
    titleLinks = root.find_all("div", class_="c-articleItem__title")
    page = root.find("a", class_="c-pagination c-pagination--next")
    for titleLink in titleLinks:
        titles = titleLink.a.text
        articleLink = "https://www.mobile01.com/" + titleLink.a["href"]
        ws.cell(i, 1, i)
        ws.cell(i, 2, titles)
        ws.cell(i, 3, articleLink)
        mWeb.save("mobile.xlsx")
        request = req.Request(articleLink, headers={
            "User-Agent": "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/106.0.0.0 Mobile Safari/537.36"
        })
        with req.urlopen(request) as response:
            inner = response.read().decode("utf-8")
        body = inner.find("div", itemprop="articleBody")
        article = body.text
        ws.cell(i, 6, article)
        print(article)
        mWeb.save("mobile.xlsx")
        i = i + 1
    n = 1
    # Grab the time and author info
    titleInfos = root.find_all("div", class_="c-articleItemRemark__wAuto")
    for titleInfo in titleInfos:
        author = titleInfo.div.a.text
        timeInfo = titleInfo.div.next_sibling.text
        ws.cell(n, 4, author)
        ws.cell(n, 5, timeInfo)
        mWeb.save("mobile.xlsx")
        n = n + 1
    url = "https://www.mobile01.com/" + page["href"]
But when I pull that one line out into a separate file and test the request on its own, it does grab the article:

    root = bs4.BeautifulSoup(data, "html.parser")  # data is the HTML fetched over the network; bs4 parses it as HTML
    body = root.find("div", itemprop="articleBody")
    article = body.text
    print(article)
I don't really understand why.
Thanks in advance to everyone who answers.
-----
Sent from JPTT on my Samsung SM-N960F.
--
※ Origin: PTT (ptt.cc), from: 163.22.18.74 (Taiwan)
※ Article URL: https://webptt.com/cn.aspx?n=bbs/Python/M.1670951523.A.F7C.html
※ Edited: livehorse (163.22.18.74 Taiwan), 12/14/2022 01:16:19
1F:→ lycantrope: That's not bs4's find, it's str's find  12/14 10:03
2F:→ blc: inner is a str  12/14 11:24
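As the replies point out, `inner` is the decoded str, and `str.find()` only takes a substring argument, so a keyword like `itemprop=` raises TypeError. A minimal sketch of the fix (using a hypothetical inline HTML snippet in place of the live mobile01 page): parse `inner` with BeautifulSoup first, then call find on the resulting soup.

```python
import bs4

# Stand-in for response.read().decode("utf-8") -- a plain str of HTML
# (hypothetical snippet; the real page would come from mobile01)
inner = '<html><body><div itemprop="articleBody">article text</div></body></html>'

# str.find() searches for a substring and takes no keyword arguments,
# which is exactly the error in the original code:
try:
    inner.find("div", itemprop="articleBody")
except TypeError as err:
    print(err)

# Parse the str into a soup first; Tag.find() does accept attribute filters.
root = bs4.BeautifulSoup(inner, "html.parser")
body = root.find("div", itemprop="articleBody")
print(body.text)  # article text
```

This also explains why the standalone test file worked: there the str was passed through `bs4.BeautifulSoup(...)` before calling find.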