Author: celestialgod (天)
Board: R_Language
Title: Re: [Question] Reshaping N*1 data into a*b format
Time: Tue Aug 22 23:04:10 2017
※ Quoting 《playaround (打滚)》:
: [Question type]:
: Converting N*1 data into M*16
: [Software familiarity]:
: R beginner
: [Problem description]:
: The raw data (a csv file) looks roughly like this:
: time1
: a = 5
: b = 70
: c = "rest"
: ...
: ...
: time2
: a = 8
: b = 15
: c = "rest_2"
: ...
: ...
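: I want to reorganize it into an M*16 matrix, taking every 16 rows as one unit
: The first row is the column headers
: and the a, b, c, etc. at the front of each row are the row headers
: Something like this: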
: time a b c ...
: time1 5 70 "rest"
: time2 8 15 "rest_2"
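: The functions I found all seem to group by identical values within the same column,
: so I'm not sure how to handle what I need here
: Posted from my phone, please excuse the formatting
: Thanks, everyone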
: -----
: Sent from JPTT on my Xiaomi MI 5.
Here's another approach for reference, which also shows you how to do automatic type conversion XD
dataStr <- 'time1
a = 5
b = 70
c = "rest"
time2
a = 8
b = 15
c = "rest_2"
time3
a = 1
b = 45
c = "rest_3"'
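# Equivalent to the txt variable that the previous two replies read from the file with readLines
txt <- strsplit(dataStr, "\n")[[1]]
# Rewrite the time lines into the same "name = value" format
txt[grepl("time", txt)] <- paste0("time = ", txt[grepl("time", txt)])
# Split each line into a column name and a value, then cbind all the split pieces together
out <- do.call(cbind, strsplit(txt, "\\s+=\\s+"))
# Get the column names
columnNames <- unique(out[1, ])
# Collect the values for each column into a list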
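columnList <- lapply(columnNames, function(colname) {
  # Pull the values for this column name and convert their type automatically.
  # as.is = FALSE reproduces the factor columns shown below; since R 4.0.0,
  # type.convert() warns and keeps strings as character when as.is is omitted.
  type.convert(out[2, out[1, ] == colname], as.is = FALSE)
})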
# Make sure every column has the same length
if (length(unique(sapply(columnList, length))) != 1)
  stop("Columns have different lengths; please check the data")
# Assign the names
names(columnList) <- columnNames
# Convert to a data.frame
resultDf <- as.data.frame(columnList)
# time a b c
# 1 time1 5 70 "rest"
# 2 time2 8 15 "rest_2"
# 3 time3 1 45 "rest_3"
> str(resultDf)
'data.frame': 3 obs. of 4 variables:
$ time: Factor w/ 3 levels "time1","time2",..: 1 2 3
$ a : int 5 8 1
$ b : int 70 15 45
$ c : Factor w/ 3 levels "\"rest\"","\"rest_2\"",..: 1 2 3
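As the str() output shows, c still carries the literal double-quote characters from the input. If you don't want them, here is a minimal sketch for stripping them before conversion. It assumes each value is quoted at most once at each end; stripQuotes is a made-up helper name, not part of the code above.

```r
# Hypothetical helper: drop one leading and one trailing double quote, if present
stripQuotes <- function(x) gsub('^"|"$', "", x)

values <- c('"rest"', '"rest_2"', '"rest_3"')
cleaned <- stripQuotes(values)

# as.is = TRUE keeps strings as character instead of turning them into factors
converted <- type.convert(cleaned, as.is = TRUE)

# Numeric-looking strings are still converted automatically
numbers <- type.convert(c("5", "8", "1"), as.is = TRUE)
```

In the code above this would be applied to out[2, ...] right before the type.convert call.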
A rare post that uses no packages at all XD
Package version:
library(data.table)
library(stringr)
library(pipeR)
txt <- strsplit(dataStr, "\n")[[1]]
txt[str_detect(txt, "time")] <- str_c("time = ", txt[str_detect(txt, "time")])
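outDf <- txt %>>% str_detect("time") %>>% cumsum %>>%
  cbind(do.call(rbind, str_split(txt, "\\s+=\\s+"))) %>>%
  data.table %>>% setnames(c("id", "var", "value")) %>>%
  # long -> wide; without this step the table keeps only var/value,
  # not the a/b/c/time columns shown below
  dcast(id ~ var, value.var = "value") %>>%
  `[`(j = id := NULL) %>>%
  `[`(j = eval(names(.)) := lapply(.SD, type.convert, as.is = FALSE))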
# a b c time
# 1: 5 70 "rest" time1
# 2: 8 15 "rest_2" time2
# 3: 1 45 "rest_3" time3
> str(outDf)
Classes ‘data.table’ and 'data.frame': 3 obs. of 4 variables:
$ a : int 5 8 1
$ b : int 70 15 45
$ c : Factor w/ 3 levels "\"rest\"","\"rest_2\"",..: 1 2 3
$ time: Factor w/ 3 levels "time1","time2",..: 1 2 3
- attr(*, ".internal.selfref")=<externalptr>
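The key trick in the package version is the cumsum over the "time" line detector: every time a time line appears the counter goes up by one, so each line gets the id of the record it belongs to. A small base-R sketch of just that step, on a shortened copy of the sample data (isHeader, recordId and records are names made up for this illustration):

```r
txt <- c("time1", "a = 5", "b = 70", "time2", "a = 8", "b = 15")

# TRUE on the lines that start a new record
isHeader <- grepl("^time", txt)

# Running count of header lines = record id for every line
recordId <- cumsum(isHeader)

# Group the non-header lines by the record they belong to
records <- split(txt[!isHeader], recordId[!isHeader])
```

Since the id comes from the marker lines rather than from a fixed block size, this should still work when records do not all have the same number of lines.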
--
R data-wrangling package series:
magrittr #1LhSWhpH (R_Language) https://goo.gl/72l1m9
data.table #1LhW7Tvj (R_Language) https://goo.gl/PZa6Ue
dplyr (parts 1 and 2) #1LhpJCfB,#1Lhw8b-s (R_Language) https://goo.gl/I5xX9b
tidyr #1Liqls1R (R_Language) https://goo.gl/i7yzAz
pipeR #1NXESRm5 (R_Language) https://goo.gl/zRUISx
--
※ Origin: 批踢踢实业坊 (ptt.cc), From: 111.253.88.5
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1503414254.A.448.html
※ Edited: celestialgod (111.253.88.5), 08/23/2017 01:04:19