Author: celestialgod (天)
Board: R_Language
Title: Re: [Question] Beginner question on batch-converting XML to CSV
Time: Wed Nov 9 21:24:12 2016
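※ Quoting qq9966pp (神鸡大人):
: [Question type]:
: Programming help (I want to do something in R, but I don't know how to write it in R)
: [Software familiarity]:
: Beginner (have written other programs, just unfamiliar with the syntax)
: [Problem description]:
: Hello everyone, I want to use R to batch-convert XML files to CSV files,
: but I don't know how to do it.
: Any help would be much appreciated. Thanks in advance.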
I grabbed the W3school XML as a demo; adapt it to whatever you need.
library(plyr)
library(xml2)
library(data.table) # 1.9.7; other versions do not have fwrite
library(pipeR)
sink('file1.xml')
cat('<?xml version="1.0"?>
<catalog>
<book id="bk101">
<author>Gambardella, Matthew</author>
<title>XML Developer\'s Guide</title>
<genre>Computer</genre>
<price>44.95</price>
<publish_date>2000-10-01</publish_date>
<description>An in-depth look at creating applications with
XML.</description>
</book>
<book id="bk102">
<author>Ralls, Kim</author>
<title>Midnight Rain</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-12-16</publish_date>
<description>A former architect battles corporate zombies,
an evil sorceress, and her own childhood to become queen
of the world.</description>
</book>
<book id="bk103">
<author>Corets, Eva</author>
<title>Maeve Ascendant</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2000-11-17</publish_date>
<description>After the collapse of a nanotechnology
society in England, the young survivors lay the
foundation for a new society.</description>
</book>
</catalog>')
sink()
sink('file2.xml')
cat('<?xml version="1.0"?>
<catalog>
<book id="bk104">
<author>Corets, Eva</author>
<title>Oberon\'s Legacy</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-03-10</publish_date>
<description>In post-apocalypse England, the mysterious
agent known only as Oberon helps to create a new life
for the inhabitants of London. Sequel to Maeve
Ascendant.</description>
</book>
<book id="bk105">
<author>Corets, Eva</author>
<title>The Sundered Grail</title>
<genre>Fantasy</genre>
<price>5.95</price>
<publish_date>2001-09-10</publish_date>
<description>The two daughters of Maeve, half-sisters,
battle one another for control of England. Sequel to
Oberon\'s Legacy.</description>
</book>
<book id="bk106">
<author>Randall, Cynthia</author>
<title>Lover Birds</title>
<genre>Romance</genre>
<price>4.95</price>
<publish_date>2000-09-02</publish_date>
<description>When Carla meets Paul at an ornithology
conference, tempers fly as feathers get ruffled.</description>
</book>
</catalog>')
sink()
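# Parse every .xml file in the working directory and stack all books into one table
list.files(pattern = "\\.xml$") %>>%
  llply(function(xmlFile){
    read_xml(xmlFile) %>>% xml_children %>>% llply(function(node){
      tmp <- xml_children(node)
      # name each field by its tag, then append the book's attributes (id)
      xml_text(tmp) %>>% `names<-`(xml_name(tmp)) %>>% c(xml_attrs(node))
    }) %>>% do.call(what = rbind) %>>% data.table
  }) %>>% rbindlist %>>% fwrite("output.csv")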
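That said, I guarantee the resulting csv will have problems XD, because the XML data is full of commas.
This is only an example, so use it as a reference and adjust it yourself (shrug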
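One way to get around the comma concern is to quote the text fields when writing and to collapse the line breaks inside them first. The following is only a minimal sketch under a few assumptions: it uses base R's write.csv (which quotes character fields by default) instead of fwrite, it works on a small inline sample rather than the files above, and book_to_row is a hypothetical helper name that is not part of any package.

```r
library(xml2)

# Hypothetical helper (not from the original post): turn one <book> node
# into a one-row data.frame, collapsing whitespace and line breaks in the text.
book_to_row <- function(node) {
  fields <- xml_children(node)
  values <- gsub("\\s+", " ", trimws(xml_text(fields)))
  names(values) <- xml_name(fields)
  # put the node's attributes (here: id) in front of the child fields
  as.data.frame(as.list(c(xml_attrs(node), values)), stringsAsFactors = FALSE)
}

# Small inline sample so the sketch runs without any files on disk.
doc <- read_xml('<catalog>
  <book id="bk101">
    <author>Gambardella, Matthew</author>
    <title>XML Developer\'s Guide</title>
    <price>44.95</price>
  </book>
  <book id="bk102">
    <author>Ralls, Kim</author>
    <title>Midnight Rain</title>
    <price>5.95</price>
  </book>
</catalog>')

books <- do.call(rbind, lapply(xml_children(doc), book_to_row))

# write.csv quotes character fields by default, so "Ralls, Kim" stays one field.
csv_path <- tempfile(fileext = ".csv")
write.csv(books, csv_path, row.names = FALSE)

# Read the file back to check that the commas did not split any column.
check <- read.csv(csv_path, stringsAsFactors = FALSE)
```

If the columns of check line up with id, author, title and price, the commas inside the author names made it through. For real files, the same helper could be applied to each result of read_xml(xmlFile) inside the loop over list.files().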
--
Series of posts on R data-wrangling packages:
magrittr #1LhSWhpH (R_Language) https://goo.gl/OBto1x
data.table #1LhW7Tvj (R_Language) https://goo.gl/QFtp17
dplyr (parts 1 and 2) #1LhpJCfB,#1Lhw8b-s (R_Language) https://goo.gl/GcfNoP
tidyr #1Liqls1R (R_Language) https://goo.gl/pcq5nq
pipeR #1NXESRm5 (R_Language) https://goo.gl/cDIzTh
--
※ Origin: PTT (ptt.cc), From: 36.232.185.200
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1478697856.A.730.html
※ Edited: celestialgod (36.232.185.200), 11/09/2016 21:35:09
1F: push qq9966pp: Thanks a lot~ I'll give it a try 11/10 13:55