R Language Learning Series 06: Renaming Variables, Sorting Data, and Random Sampling

I. Renaming variables

1. Using the interactive editor

To change the variable names of a dataset x, call fix(x) to open the interactive editor.

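> score <- data.frame(student = c("A","B","C","D"), gender = c("M","M","F","F"), math = c(90,70,80,60), Eng = c(88,78,69,98), pl = c(66,59,NA,88))
> fix(score)
> score.list <- as.list(score)   # convert score to a list
> fix(score.list)

(1) If the dataset is a matrix or a data frame, the "Data Editor" opens. Click the variable name you want to change and edit it in the "Variable editor" dialog that pops up.

(2) If the dataset is a list, the interactive editor is a plain text editor (Notepad). Just change the corresponding variable names that follow ".Names".

2. Using the function rename()

The function rename() in the reshape package changes the variable names of data frames and lists, but it cannot change the variable names of a matrix. The basic form is:

rename(x, c(oldname="newname",...))

where oldname is the original variable name and newname is the new one.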

> library(reshape)
> rename(score, c(pl = "chinese"))
  student gender math Eng chinese
1       A      M   90  88      66
2       B      M   70  78      59
3       C      F   80  69      NA
4       D      F   60  98      88
> rename(score.list, c(pl = "chinese"))
$student
[1] A B C D
Levels: A B C D

$gender
[1] M M F F
Levels: F M

$math
[1] 90 70 80 60

$Eng
[1] 88 78 69 98

$chinese
[1] 66 59 NA 88

Note: rename() returns a renamed copy; the variable names in the original dataset are not modified.
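Because the renamed object is only returned, the result has to be assigned if it is to be kept. The following is a minimal base-R sketch of that copy-and-return behaviour; `rename_copy` is a hypothetical helper written for illustration and is not part of the reshape package.

```r
# Hypothetical helper (NOT from the reshape package): returns a renamed copy
# of a data frame or list and leaves the input object untouched.
rename_copy <- function(x, replace) {
  old <- names(replace)
  hit <- match(old, names(x))        # positions of the old names in x
  if (anyNA(hit)) {
    stop("unknown name(s): ", paste(old[is.na(hit)], collapse = ", "))
  }
  names(x)[hit] <- unname(replace)   # changes only the local copy of x
  x
}

score <- data.frame(student = c("A", "B", "C", "D"),
                    pl      = c(66, 59, NA, 88))

renamed <- rename_copy(score, c(pl = "chinese"))
names(score)     # the original still has the column "pl"
names(renamed)   # the copy carries the new name
```

To keep the new names under the old object name, assign the result back, for example `score <- rename_copy(score, c(pl = "chinese"))`.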

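3. Using the function names()

Like rename(), names() can change the variable names of data frames and lists but not of matrices. The difference is that names() changes the variable names in the original dataset.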

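The basic form is:

names(x)[i] <- "newname"

> names(score)[5] <- "chinese"
> score
  student gender math Eng chinese
1       A      M   90  88      66
2       B      M   70  78      59
3       C      F   80  69      NA
4       D      F   60  98      88

4. Using the functions colnames() and rownames()

These change the row and column names of a matrix, and they can also change the row and column names of a data frame.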

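The basic form is:

rownames(x)[i] <- "newname"

> colnames(score)[5] <- "Chinese"
> score
  student gender math Eng Chinese
1       A      M   90  88      66
2       B      M   70  78      59
3       C      F   80  69      NA
4       D      F   60  98      88
> rownames(score) <- letters[1:4]
> score
  student gender math Eng Chinese
a       A      M   90  88      66
b       B      M   70  78      59
c       C      F   80  69      NA
d       D      F   60  98      88

II. Sorting data

1. The function sort()

The basic form is:

sort(x, decreasing = FALSE, na.last = NA, ...)

where x is the object to sort (numeric or character); decreasing defaults to FALSE (ascending order), and TRUE gives descending order; na.last defaults to NA, which removes NA values from the result. If na.last is TRUE, the NA values in the vector are placed at the end of the sorted result; if FALSE, they are placed at the beginning.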
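The effect of the na.last argument of sort() is easiest to see on a small vector. The sketch below uses a made-up vector x chosen only for illustration.

```r
# Illustrative vector containing one missing value
x <- c(3, NA, 1, 2)

dropped <- sort(x)                    # default na.last = NA: the NA is removed
na_end  <- sort(x, na.last = TRUE)    # NA kept and placed at the end
na_head <- sort(x, na.last = FALSE)   # NA kept and placed at the beginning
desc    <- sort(x, decreasing = TRUE, na.last = TRUE)  # descending, NA still last

dropped
na_end
na_head
desc
```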

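> sort(score$math)
[1] 60 70 80 90
> sort(score$math, decreasing = TRUE)
[1] 90 80 70 60
> sort(score$Chinese, na.last = TRUE)
[1] 59 66 88 NA

2. The function rank()

The return value is the rank of each element of the vector. The basic form is:

rank(x, na.last = TRUE, ties.method = ...)

where ties.method specifies how tied (repeated) values are ranked:

"average": the mean of the tied ranks (default)
"first": the tied value that appears earlier gets the smaller rank, increasing in order of appearance
"random": the tied ranks are assigned in random order
"max": the largest of the tied ranks
"min": the smallest of the tied ranks

> x <- c(3,4,2,5,5,3,8,9)
> rank(x)
[1] 2.5 4.0 1.0 5.5 5.5 2.5 7.0 8.0
> rank(x, ties.method = "first")
[1] 2 4 1 5 6 3 7 8
> rank(x, ties.method = "random")
[1] 3 4 1 6 5 2 7 8
> rank(x, ties.method = "max")
[1] 3 4 1 6 6 3 7 8

3. The function order()

order() sorts the data and returns, for each rank, the position in the vector of the element that holds it, that is, the positions of the smallest value, the second smallest value, ..., and the largest value.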

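The basic form is:

order(x, decreasing = FALSE, na.last = TRUE, ...)

Unlike the previous two functions, order() can also be used to sort a data frame:

data_frame[order(data_frame$v1, data_frame$v2, ...), ]

Rows with the same value of v1 are ordered by v2 in ascending order. To change ascending order to descending order, put a minus sign in front of the variable (numeric variables only) or use decreasing = TRUE.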
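The two-key form can be sketched on a small data frame with tied values. The data frame df and its columns are invented for this illustration and are not the score data used elsewhere in this section.

```r
# Illustrative data with ties in the first sort key (math)
df <- data.frame(name = c("A", "B", "C", "D", "E"),
                 math = c(90, 70, 90, 60, 70),
                 eng  = c(88, 78, 69, 98, 85))

# math ascending; rows with equal math are ordered by eng ascending
asc <- order(df$math, df$eng)
df[asc, ]

# math descending via the minus sign (numeric columns only), eng still ascending
mixed <- order(-df$math, df$eng)
df[mixed, ]

# decreasing = TRUE reverses the direction of every key
desc <- order(df$math, df$eng, decreasing = TRUE)
df[desc, ]
```

The minus sign lets each numeric key have its own direction, whereas a single decreasing = TRUE applies to all keys at once.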

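> order(score$math)
[1] 4 2 3 1
> score[order(score$math),]
  student gender math Eng chinese
4       D      F   60  98      88
2       B      M   70  78      59
3       C      F   80  69      NA
1       A      M   90  88      66
> score[order(-score$math),]
  student gender math Eng chinese
1       A      M   90  88      66
3       C      F   80  69      NA
2       B      M   70  78      59
4       D      F   60  98      88

4. The function rev()

rev() reverses a sequence, so that 1,2,3 becomes 3,2,1.

III. Simple random sampling

When a dataset is tested with a small amount of data, random sampling is commonly used to select part of the whole as sample data.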

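Simple random sampling means drawing n units from a population of N units such that every unit has the same probability of being selected. It is either sampling with replacement (a unit can be drawn more than once) or sampling without replacement (a unit can be drawn at most once).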

It is implemented here with the sampling package.

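1. Simple random sampling with replacement

The function srswr() has the basic form:

srswr(n, N)

It draws n units at random with replacement from a population of size N and returns a vector of length N in which each component gives the number of times the corresponding unit was drawn.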
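To make the shape of that return value concrete, the sketch below emulates a length-N vector of selection counts using base R only. It is an illustration of the return format described above, not the sampling package's own implementation, and the seed is set only so the run is repeatable.

```r
set.seed(1)   # assumption: fixed seed, used only for reproducibility
N <- 26       # population size
n <- 10       # number of draws

draws  <- sample.int(N, n, replace = TRUE)   # n unit indices drawn with replacement
counts <- tabulate(draws, nbins = N)         # length-N vector: times each unit was drawn
ind    <- rep(seq_len(N), times = counts)    # expand the counts back into unit indices

counts
LETTERS[ind]   # the drawn letters, repeated as often as they were selected
```

The counts always sum to n, and expanding them with rep() recovers the drawn indices in sorted order.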

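> library(sampling)
> LETTERS
 [1] "A" "B" "C" "D" "E" "F" "G" "H" "I" "J" "K"
[12] "L" "M" "N" "O" "P" "Q" "R" "S" "T" "U" "V"
[23] "W" "X" "Y" "Z"
> s <- srswr(10,26)
> s
 [1] 2 0 1 1 0 0 0 0 1 0 0 2 0 0 0 3 0 0 0 0 0 0 0
[24] 0 0 0
> ind <- (1:26)[s!=0]          # indices of the units that were drawn
> ind
[1]  1  3  4  9 12 16
> n <- s[s!=0]                 # number of times each drawn unit was selected
> n
[1] 2 1 1 1 2 3
> ind <- rep(ind, times=n)     # repeat each index as many times as it was drawn
> ind
 [1]  1  1  3  4  9 12 12 16 16 16
> sample <- LETTERS[ind]       # the letters that were drawn
> sample
 [1] "A" "A" "C" "D" "I" "L" "L" "P" "P" "P"

2. Simple random sampling without replacement

The function srswor() has the same form and return value as srswr(); note that the returned vector contains only 0 and 1.

> s <- srswor(10,26)
> s
 [1] 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 0 1 1 1 0 0 1
[24] 1 0 1
> ind <- (1:26)[s!=0]
> ind
 [1]  1  6  8 11 18 19 20 23 24 26
> sample <- LETTERS[ind]
> sample
 [1] "A" "F" "H" "K" "R" "S" "T" "W" "X" "Z"

3. The function sample()

sample() performs simple random sampling both with and without replacement. The basic form is:

sample(x, size, replace = FALSE)

where x is the dataset; size is the number of units to draw; replace specifies whether sampling is with replacement, and defaults to FALSE (without replacement), with TRUE meaning with replacement.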
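A short sketch of sample() in both modes, plus one common use: drawing rows of a data frame by sampling its row indices. The seed and the small score data frame are assumptions made for this illustration.

```r
set.seed(42)   # assumption: fixed seed, used only for reproducibility

# Without replacement (the default): no letter can appear twice
no_repl <- sample(LETTERS, size = 10)

# With replacement: the same letter may be drawn more than once
with_repl <- sample(LETTERS, size = 10, replace = TRUE)

# Sampling rows of a data frame: draw row indices, then subset
score <- data.frame(student = c("A", "B", "C", "D"),
                    math    = c(90, 70, 80, 60))
picked <- score[sample(nrow(score), size = 2), ]

no_repl
with_repl
picked
```

Unlike srswr() and srswor(), sample() returns the drawn elements themselves rather than a vector of selection counts, so no expansion step is needed.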
