Author: babysian7 (Babysian)
Board: R_Language
Title: [Question] dataframe include date with caret
Time: Tue Nov 3 04:18:09 2015
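Post category hint:
- Question: use this category when you want to ask a question
[Question type]:
Programming help (I want to do something in R, but I don't know how to write it in R)
[Software familiarity]:
Beginner
[Problem description]:
I have a data frame that contains date variables: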
'data.frame': 1000 obs. of 49 variables:
$ estate_Post : int 10069 10065 10044 10044 10044 10045 10044 10045 10044 10045 ...
$ estate_TransType : int 3 1 4 2 4 4 4 4 4 4 ...
$ estate_LandArea : num 15.54 47.3 20.89 1.99 23.98 ...
$ estate_ZoneUse : int 2 2 3 3 3 3 3 3 3 3 ...
$ estate_TransDate : Date, format: "1989-03-01" "1998-01-01" "2015-01-01" "2015-01-01" ...
$ estate_Land : int 1 1 1 0 1 1 1 1 1 1 ...
$ estate_House : int 1 0 1 0 1 1 1 1 1 1 ...
$ estate_ParkingLot : int 0 0 2 2 2 1 3 3 4 3 ...
$ estate_TransFloor : int 5 -99 17 -4 11 6 6 5 15 5 ...
$ estate_TotalFloor : int 5 -99 31 31 31 31 31 31 31 31 ...
$ estate_HouseType : int 1 12 2 12 2 2 2 2 2 2 ...
$ estate_HouseUse : int 1 -99 1 3 1 1 1 1 1 1 ...
$ estate_HouseMaterials: int 5 -99 13 13 13 13 13 13 13 13 ...
$ estate_HouseDate : Date, format: "1967-05-19" NA "2013-11-29" "2013-11-29" ...
$ estate_HouseArea : num 35.1 0 442.7 62.1 507.1 ...
$ estate_HouseRoom_1 : int 1 0 5 0 5 4 4 4 3 4 ...
$ estate_HouseRoom_2 : int 1 0 2 0 2 2 2 2 2 2 ...
$ estate_HouseRoom_3 : int 1 0 6 0 6 3 3 3 3 3 ...
$ estate_HouseRoom_4 : int 1 1 1 1 1 1 1 1 1 1 ...
$ estate_Guards : int 2 2 2 2 2 2 2 2 2 2 ...
$ estate_Price : int 3535 54299 164882 -99 195808 181428 174799 175356 190717 165250 ...
$ estate_ParkingType : int -99 -99 3 4 3 4 4 4 4 4 ...
$ estate_ParkingArea : num 0 0 13.2 32.2 27.5 ...
$ estate_ParkingPrice : int 0 0 0 5600000 0 0 0 0 8400000 0 ...
$ estate_Lng : num 122 122 122 122 122 ...
$ estate_Lat : num 25 25 25 25 25 ...
$ Aport_Distance : num 7.3 6.7 5.3 5.3 5.3 5.3 5.3 5.3 5.3 5.3 ...
$ ParkB_Distance : num 0.29 0.785 0.214 0.217 0.215 ...
$ Univ_Distance : num 1.7 1 1 1 1 1 1 1 1 1 ...
$ ParkR_Distance : num 1.4 2 1.7 1.7 1.7 1.6 1.7 1.7 1.7 1.6 ...
$ MRT_StationDistance : num 0.914 0.327 0.403 0.401 0.402 ...
$ MRT_LineDistance : num 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_EntranceDistance: int 999 999 999 999 999 999 999 999 999 999 ...
$ Fway_LineDistance : int 999 999 999 999 999 999 999 999 999 999 ...
$ TRA_StationDistance : num 1 1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ THSR_StationDistance : num 3.1 2.5 1.1 1.1 1.1 1.1 1.1 1.1 1.1 1.1 ...
$ River_Distance : num 999 1.84 1.49 1.48 1.49 ...
$ Schools_Distance : num 0.2 0.2 0.7 0.7 0.7 0.8 0.7 0.7 0.7 0.8 ...
$ Lib_Distance : num 0.8 0.9 1.2 1.2 1.2 1.2 1.2 1.2 1.2 1.2 ...
$ Sport_Distance : num 2.4 1.8 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ ParkS_Distance : num 0.6 1 0.6 0.6 0.6 0.7 0.6 0.6 0.6 0.7 ...
$ Hyper_Distance : num 1.3 0.6 1.2 1.2 1.2 1.1 1.2 1.2 1.2 1.1 ...
$ Shop_Distance : num 1.7 1 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Post_Distance : num 0.5 0.2 0.5 0.5 0.5 0.4 0.5 0.5 0.5 0.4 ...
$ Hosp_Distance : num 0.7 0.4 0.9 0.9 0.9 0.8 0.9 0.9 0.9 0.8 ...
$ Gas_Distance : num 0.5 0.4 1.4 1.4 1.4 1.4 1.4 1.5 1.4 1.4 ...
$ Incin_Distance : num 10.9 10.2 8.9 8.9 8.9 8.9 8.9 8.9 8.9 8.9 ...
$ Mort_Distance : num 6.3 5.7 4.3 4.3 4.3 4.3 4.3 4.3 4.3 4.3 ...
$ estate_TotalPrice : num 124117 2568347 73000000 5600000 99300000 ...
After I convert the date variables with as.Date, I get the following error message during variable selection:
Error in { :
task 1 failed - "rfe is expecting 48 importance values but only has 46"
In addition: Warning messages:
1: In predict.lm(object, x) :
prediction from a rank-deficient fit may be misleading
How should I change my code to fix this?
[Code example]:
library(mlbench)
library(caret)
library(maps)
library(rgdal)
library(raster)
library(sp)
library(spdep)
library(GWmodel)
library(e1071)
library(plyr)
library(kernlab)
library(zoo)
mydata <- read.csv("E:/SupportVectorRegression/Realestatedata_1000_delete_date.csv", header=TRUE)
mydata$estate_TransDate<-as.Date(paste(mydata$estate_TransDate,1,sep="-"),format="%Y-%m-%d")
mydata$estate_HouseDate<-as.Date(mydata$estate_HouseDate,format="%Y-%m-%d")
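# Predictors are columns 1:48 (the error message reports 48 expected importance values);
# column 49 (estate_TotalPrice) is the outcome.
rfectrl <- rfeControl(functions=lmFuncs, method="cv", number=10,
                      verbose=TRUE, returnResamp="final")
results <- rfe(mydata[,1:48], mydata[,49], sizes=c(1:49),
               rfeControl=rfectrl, method="svmRadial")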
#metric = "Rsquared"
print(results)
predictors(results)
plot(results, type=c("g", "o"))
[Environment]:
R version 3.2.2 (2015-08-14)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 8 x64 (build 9200)
[Keywords]:
caret, dataframe, date
--
※ Posted from: PTT (ptt.cc), from: 60.250.235.236
※ Article URL: https://webptt.com/cn.aspx?n=bbs/R_Language/M.1446495492.A.7CB.html
※ Edited: babysian7 (60.250.235.236), 11/03/2015 04:23:00
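1F:push celestialgod: Compute the correlations and see whether two of the variables have 11/03 08:40
2F:push celestialgod: very high correlation coefficients with the other variables 11/03 08:40
3F:push celestialgod: This really looks like actual-price registration data 11/03 08:43
4F:→ celestialgod: It feels like the date input is what goes wrong; is date one of your variables? 11/03 09:07
5F:→ babysian7: Hello, two of the variables are of Date type, and I want to use them as inputs, 11/03 13:42
6F:→ babysian7: but I don't know where it goes wrong 11/03 13:42
8F:→ celestialgod: Same as what I was thinking XDD 11/03 14:08
9F:→ celestialgod: I generated dates myself and it ran fine; they are treated as integers when it runs 11/03 14:09
10F:→ celestialgod: Most likely part of your data is linearly dependent 11/03 14:09
11F:→ celestialgod: I also tried it with NA and had no problem 11/03 14:09
12F:→ babysian7: Hello, thank you for your answer. While making the changes I ran into a new problem: 11/06 16:58
13F:→ babysian7: I replaced all the NA values, and the error message is "missing value where 11/06 16:58
14F:→ babysian7: TRUE/FALSE needed. In addition: There were 20 warnings ( 11/06 16:58
15F:→ babysian7: use warnings() to see them)" 11/06 16:58
16F:→ babysian7: I don't quite understand, because my data are all continuous numeric values, with no TRUE/ 11/06 17:00
17F:→ babysian7: FALSE... 11/06 17:00
18F:push celestialgod: Without seeing the code I can't diagnose it from a distance. If you can attach the data too, 11/07 11:25
19F:push celestialgod: I can reproduce the error and try to find a solution 11/07 11:25
20F:→ babysian7: Hello, I have put the data together as follows 11/11 13:35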
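The linear-dependence diagnosis in the comments above can be sketched with a small base-R example. It uses hypothetical toy data (the names toy, x1, x2, x3 and d are made up here, not taken from the poster's file) and plain lm(), which is what the "rank-deficient fit" warning in the post comes from. One plausible reading, not verified against caret's source, is that coefficients lm() cannot estimate leave rfe with fewer importance values than predictors.

```r
# Hypothetical toy data, not the poster's file: x3 is an exact linear
# combination of x1 and x2, so the design matrix is rank deficient.
set.seed(1)
n  <- 100
x1 <- rnorm(n)
x2 <- rnorm(n)
d  <- as.Date("2015-01-01") + sample(0:365, n, replace = TRUE)  # Date predictor
toy <- data.frame(x1 = x1, x2 = x2, x3 = x1 + x2, d = d)
toy$y <- 2 * x1 - x2 + rnorm(n)

fit <- lm(y ~ ., data = toy)

# lm() cannot estimate the aliased coefficient and reports it as NA.
coefs   <- coef(fit)
aliased <- names(coefs)[is.na(coefs)]
print(aliased)

# summary() drops aliased terms, so fewer coefficient rows than predictors remain.
n_predictors <- ncol(toy) - 1                        # every column except y
n_estimated  <- nrow(summary(fit)$coefficients) - 1  # minus the intercept
print(c(predictors = n_predictors, estimated = n_estimated))

# A Date column enters the model as a number (days since 1970-01-01).
print(as.numeric(as.Date("2015-01-01")))
```

On the real data, the same check (looking for NA entries in coef() of an lm fit on all 48 predictors) would show which two columns are redundant. The Date predictor itself fits without trouble, which matches the comment that dates are treated as numbers.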
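The error reported above does not require the data to contain TRUE/FALSE values. It is what R raises whenever the condition of an if() evaluates to NA, for example when a computed statistic is NA. The sketch below reproduces the message in isolation; score is a hypothetical stand-in, and where the NA actually arises inside rfe is not established in this thread.

```r
# A condition that evaluates to NA cannot be used in if().
score <- NA_real_  # hypothetical stand-in for a statistic that came out as NA

msg <- tryCatch(
  {
    if (score > 0.5) "keep" else "drop"
  },
  error = function(e) conditionMessage(e)  # capture the error text instead of stopping
)
print(msg)

# Wrapping the comparison in isTRUE() treats NA as "not true" and avoids the error.
decision <- if (isTRUE(score > 0.5)) "keep" else "drop"
print(decision)
```

Running warnings() right after the failure, as the message suggests, is usually the quickest way to see which intermediate value became NA.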
22F:→ babysian7: NN8GKdVqkgOM6OQ-a?dl=0 11/11 13:35
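23F:→ babysian7: Thank you 11/11 13:35
24F:→ celestialgod: I give up ~"~ I don't know what to do qq 11/12 21:45
25F:→ celestialgod: Try writing to the package author to ask QQ 11/12 21:45
26F:→ babysian7: Thank you anyway for taking the time to help :) 11/13 13:00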