Author: justin760204 (华华) Board: R_Language
Title: [Question] Writing a crawler, but it cannot fetch an AJAX POST page
Time: Sat Aug 22 01:52:28 2015
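Question: I would like to write an R program that automatically goes online, fetches the financial statements of listed companies, and analyzes them.
However, I have hit a wall in the data-fetching part and cannot get past it, so I would appreciate any pointers.
The cookie and Referer have both been added, but the server still sends no reply.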
[Question type]:
Programming help (I want to do something with R, but I do not know how to write it in R)
[Software familiarity]:
Beginner (I have written programs in other languages and am just unfamiliar with the syntax)
[Problem description]:
[Code]
library(XML)
library(RCurl)
# Create the curl handle that is meant to carry the cookie
curlHandle = getCurlHandle()
# Visit the Market Observation Post System (MOPS) XBRL comprehensive income
# statement page to pick up the cookie
url = URLencode("http://mops.twse.com.tw/mops/web/t164sb04")
getURL(url, curl = curlHandle, .encoding='utf8')
# Submit the POST form (example: TSMC (2330), ROC year 104, season 01)
ajax_url = URLencode("http://mops.twse.com.tw/mops/web/ajax_t164sb04")
html = postForm(ajax_url,
    encodeURIComponent = "1", step = "1", firstin = "1", off = "1",
    keyword4 = "", code1 = "", TYPEK2 = "", checkbtn = "",
    queryName = "co_id", TYPEK = "all", isnew = "false",
    co_id = "2330", year = "104", season = "01",
    .opts = curlOptions(
        referer = "http://mops.twse.com.tw/mops/web/t164sb04",
        useragent = "Mozilla/5.0 (Windows NT 6.3; WOW64) AppleWebKit/537.36 (KHTML, like Gecko)Chrome/44.0.2403.89 Safari/537.36"),
    curl = curlHandle, .encoding='utf8')
# Save the response to a local file
cat(html, file = "twse.html")
Running it always returns: in function (type, msg, asError = TRUE) : Empty reply from server
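A minimal sketch of two things that might be worth trying here; neither is confirmed as the cause of the error. First, a handle from getCurlHandle() with no cookie option does not keep cookies, so the sketch passes cookiefile = "" to switch on libcurl's in-memory cookie engine. Second, postForm defaults to a multipart/form-data body, while style = "POST" sends application/x-www-form-urlencoded, which is what browser AJAX form submissions typically use. The helper names mops_form and fetch_statement are hypothetical and exist only for illustration; the URLs and field names are the ones from the code above.

```r
# Hypothetical helper: the form fields from the post, for one company/period.
mops_form <- function(co_id, year, season) {
  list(encodeURIComponent = "1", step = "1", firstin = "1", off = "1",
       keyword4 = "", code1 = "", TYPEK2 = "", checkbtn = "",
       queryName = "co_id", TYPEK = "all", isnew = "false",
       co_id = co_id, year = year, season = season)
}

# Hypothetical wrapper around the two requests in the post (needs RCurl).
fetch_statement <- function(co_id, year, season) {
  page_url <- "http://mops.twse.com.tw/mops/web/t164sb04"
  ajax_url <- "http://mops.twse.com.tw/mops/web/ajax_t164sb04"

  # cookiefile = "" enables libcurl's in-memory cookie engine on this handle
  # (assumption: the session cookie matters to the server).
  # verbose = TRUE prints the request/response headers for comparison
  # with what the browser sends.
  handle <- RCurl::getCurlHandle(cookiefile = "", verbose = TRUE)

  # First request: load the query page so the handle receives any cookies.
  RCurl::getURL(page_url, curl = handle)

  # Second request: url-encoded POST (assumption: the endpoint expects this
  # rather than multipart/form-data). Errors are reported and turned into NULL.
  tryCatch(
    RCurl::postForm(ajax_url,
                    .params = mops_form(co_id, year, season),
                    .opts = RCurl::curlOptions(referer = page_url),
                    curl = handle, style = "POST", .encoding = "UTF-8"),
    error = function(e) {
      message("POST failed: ", conditionMessage(e))
      NULL
    }
  )
}
```

Usage would look like html <- fetch_statement("2330", "104", "01"), followed by cat(html, file = "twse.html") when the result is not NULL. If the reply is still empty, the verbose header dump can be compared line by line against the browser headers linked below.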
[Environment]:
sessionInfo()
R version 3.2.1 (2015-06-18)
Platform: i386-w64-mingw32/i386 (32-bit)
Running under: Windows 8 x64 (build 9200)
locale:
[1] LC_COLLATE=Chinese (Traditional)_Taiwan.950
[2] LC_CTYPE=Chinese (Traditional)_Taiwan.950
[3] LC_MONETARY=Chinese (Traditional)_Taiwan.950
[4] LC_NUMERIC=C
[5] LC_TIME=Chinese (Traditional)_Taiwan.950
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] RCurl_1.95-4.7 bitops_1.0-6 XML_3.98-1.3
loaded via a namespace (and not attached):
[1] tools_3.2.1
Attached are my request headers:
https://drive.google.com/open?id=0B_eIxe-HNv0KM0FLRWdMenUxUVU
I have searched for a long time and still cannot solve this. Any pointers would be greatly appreciated, thank you very much <(_ _)>
--
※ Origin: PTT (ptt.cc), From: 140.112.125.138
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1440179550.A.FD0.html
※ Edited: justin760204 (140.112.125.138), 08/22/2015 01:56:28
※ Edited: justin760204 (140.112.125.138), 08/22/2015 01:57:02
※ Edited: justin760204 (140.112.125.138), 08/22/2015 01:58:23
※ Edited: justin760204 (140.112.125.138), 08/22/2015 01:59:30
※ Edited: justin760204 (140.112.125.138), 08/22/2015 02:15:53
※ Edited: justin760204 (140.112.125.138), 08/22/2015 02:18:00