
Data Mining Homework

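I. Implementing Exercise 19 (page 56) in R
The R program below uses 19(2) as the example; the programs for the other sub-questions are analogous.
1. Cosine similarity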
> x=c(0,1,0,1)
> y=c(1,0,1,0)
> xy=sum(x*y)
> x1=sqrt(sum(x^2))
> y1=sqrt(sum(y^2))
> c=xy/(x1*y1)
> c
[1] 0
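The same computation can be wrapped in a small function so that it can be reused for the other sub-questions. This is only a sketch: `cosine_sim` is a hypothetical helper name, not a base R function.

```r
# Cosine similarity of two numeric vectors of equal length.
# cosine_sim is a hypothetical helper name, not part of base R.
cosine_sim <- function(x, y) {
  stopifnot(length(x) == length(y))
  sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
}

x <- c(0, 1, 0, 1)
y <- c(1, 0, 1, 0)
cos_xy <- cosine_sim(x, y)
print(cos_xy)
```

Using a name other than `c` for the result also avoids shadowing R's `c()` function in the workspace.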
2. Correlation
> x=c(0,1,0,1)
> y=c(1,0,1,0)
> xbar=mean(x)
> ybar=mean(y)
> len=length(x)
> sx=sqrt((1/(len-1))*sum((x-xbar)^2))
> sy=sqrt((1/(len-1))*sum((y-ybar)^2))
> sxy=(1/(len-1))*sum((x-xbar)*(y-ybar))
> corrxy=sxy/(sx*sy)
> corrxy
[1] -1
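As a cross-check, the manual formula can be compared with R's built-in `sd()`, `cov()` and `cor()` from the stats package, which use the same n-1 denominator. The variable names `manual` and `builtin` are chosen here for illustration.

```r
x <- c(0, 1, 0, 1)
y <- c(1, 0, 1, 0)

# Manual Pearson correlation, same formula as in the transcript
n <- length(x)
sx <- sqrt(sum((x - mean(x))^2) / (n - 1))
sy <- sqrt(sum((y - mean(y))^2) / (n - 1))
sxy <- sum((x - mean(x)) * (y - mean(y))) / (n - 1)
manual <- sxy / (sx * sy)

# Built-in equivalent from the stats package
builtin <- cor(x, y)
print(c(manual = manual, builtin = builtin))
```

When one of the vectors is constant, its standard deviation is zero, so the manual formula divides 0 by 0 and yields NaN.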
3. Euclidean distance
> x=c(0,1,0,1)
> y=c(1,0,1,0)
> dxy=sqrt(sum((x-y)^2))
> dxy
[1] 2
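The manual distance can also be checked against the built-in `dist()` function, whose default method is Euclidean. A minimal sketch, with `d_manual` and `d_builtin` as illustrative names:

```r
x <- c(0, 1, 0, 1)
y <- c(1, 0, 1, 0)

# Manual Euclidean distance, as in the transcript
d_manual <- sqrt(sum((x - y)^2))

# dist() works on the rows of a matrix; method = "euclidean" is the default
d_builtin <- as.numeric(dist(rbind(x, y)))

print(c(manual = d_manual, builtin = d_builtin))
```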
4. Jaccard coefficient
> x=c(0,1,0,1)
> y=c(1,0,1,0)
> f00=f01=f10=f11=0
> len=length(x)
> j=1
> while(j<len+1)
+ {if(x[j]==0&y[j]==0)
+ f00=f00+1
+ if(x[j]==0&y[j]==1)
+ f01=f01+1
+ if(x[j]==1&y[j]==0)
+ f10=f10+1
+ if(x[j]==1&y[j]==1)
+ f11=f11+1
+ j=j+1}
> Jaccard=f11/(f10+f01+f11)
> Jaccard
[1] 0
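The counting loop can also be written without an explicit index by summing logical vectors. This is an alternative sketch rather than the program used for the homework; it assumes both vectors contain only 0 and 1.

```r
x <- c(0, 1, 0, 1)
y <- c(1, 0, 1, 0)

# Count the positions for each combination of values (binary vectors assumed)
f11 <- sum(x == 1 & y == 1)
f10 <- sum(x == 1 & y == 0)
f01 <- sum(x == 0 & y == 1)

# Jaccard ignores the 0-0 matches, so f00 is not needed
jaccard <- f11 / (f10 + f01 + f11)
print(jaccard)
```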
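Results for all sub-questions:
(1) c=1; corr=NaN (0/0, because a standard deviation in the denominator is zero); dxy=2
(2) c=0; corr=-1; dxy=2; Jaccard=0
(3) c=0; corr=0; dxy=2
(4) c=0.75; corr=0.25; Jaccard=0.6
(5) c=0; corr=-1.433292e-17 (zero up to floating-point rounding)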
II. Learning data import methods
1. Importing a text file
> a<-read.table("e:/R/r1.txt")
> a
     V1    V2    V3   V4
1 16.85 12.35 42.32 0.37
2 22.00 15.30 46.51 0.76
3  8.97  7.98 30.36 0.17
4 10.25  8.99 40.44 0.46
5 20.81 20.00 35.87 0.43
2. Importing Excel data (saved as a CSV file)
> b<-read.table("e:/R/r2.csv")
> b
V1
1 16.85,12.35,42.32,0.37
2 22,15.3,46.51,0.76
3 8.97,7.98,30.36,0.17
4 10.25,8.99,40.44,0.46
5 20.81,20,35.87,0.43
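In the output above every row ends up in a single column `V1`, because `read.table()` splits fields on whitespace by default. A possible fix is to pass `sep=","` or to use `read.csv()` with `header=FALSE`. The sketch below recreates the file contents in a temporary file, since `e:/R/r2.csv` itself is not available; `csv_path`, `b_one_col` and `b2` are illustrative names.

```r
# Recreate the CSV contents in a temporary file
# (a stand-in for e:/R/r2.csv, which is assumed to hold these five lines)
csv_path <- tempfile(fileext = ".csv")
writeLines(c("16.85,12.35,42.32,0.37",
             "22,15.3,46.51,0.76",
             "8.97,7.98,30.36,0.17",
             "10.25,8.99,40.44,0.46",
             "20.81,20,35.87,0.43"), csv_path)

# Default separator is whitespace: each line becomes one field
b_one_col <- read.table(csv_path)

# A comma separator yields four numeric columns V1..V4
b <- read.table(csv_path, sep = ",")

# read.csv() uses sep = "," already; header = FALSE because the file has no header row
b2 <- read.csv(csv_path, header = FALSE)

print(dim(b_one_col))
print(b)
```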
3. Importing SPSS data
> library(Hmisc)
> c<-spss.get("e:/R/r3.sav")
Warning message:
In read.spss(file, use.value.labels = use.value.labels, to.data.frame = to.data.frame, :
e:/R/r3.sav: Unrecognized record type 7, subtype 18 encountered in system file
> c
  VAR00001 VAR00002 VAR00003 VAR00004
1    16.85    12.35    42.32     0.37
2    22.00    15.30    46.51     0.76
3     8.97     7.98    30.36     0.17
4    10.25     8.99    40.44     0.46
5    20.81    20.00    35.87     0.43
