Parallel Coordinates Analysis of Three Classes of Iris

Tianjin University, 3014218071, Wang Hanchao (王汉超)

Abstract:
This paper uses the parallel coordinates representation taught in Professor Lv's mathematical modeling class to visualize the features of three classes of iris (Iris). It organizes the data from Fisher's article, which comprises 3 classes of 50 instances each, plots them on parallel coordinate axes with Matlab, and orders the axes by an intuitive priority. The resulting plot is easy to read and suggests that petal width and petal length (in decreasing order of priority) may be the important features for distinguishing the iris classes.

Keywords: iris; parallel coordinates; feature attributes

Contents:
Parallel Coordinates Analysis of Three Classes of Iris
Abstract
Keywords
Data source
  Source & dataset information
  Attribute information
Procedure
  1. Data import
  2. View conversion
  3. View optimization and evaluation of conclusions
  4. Directions for improvement
References
Appendix
  Final code
  Data list

Data source:

Source & dataset information:
Creator: R.A. Fisher; Donor: Michael Marshall (MARSHALL%PLU '@' ).
This data differs from the data presented in Fisher's article (identified by Steve Chadwick, spchadwick '@').
The dataset contains 3 classes of 50 instances each, where each class refers to one type of iris (Iris).
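One class (Iris setosa) is linearly separable from the other two; the latter two are not linearly separable from each other.
Note: the 35th sample should be 4.9,3.1,1.5,0.2,"Iris setosa"; the error is in the fourth feature.
The 38th sample should be 4.9,3.6,1.4,0.1,"Iris setosa"; the errors are in the second and third features.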
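The two corrections in the note can be applied before plotting. The following is a minimal sketch, assuming the four measurements are held in a numeric matrix whose rows follow the file order. The names `X`, `fixed`, `changed35` and `changed38` are illustrative and not part of the original workflow, and the placeholder matrix fills in only rows 35 and 38, using the values that appear in the data list in the appendix.

```matlab
% Columns: sepal length, sepal width, petal length, petal width (cm).
X = zeros(50, 4);                 % placeholder for the 50 Iris setosa rows
X(35, :) = [4.9 3.1 1.5 0.1];     % row 35 as it appears in the data list
X(38, :) = [4.9 3.1 1.5 0.1];     % row 38 as it appears in the data list

% Apply the corrected values given in the note.
fixed = X;
fixed(35, :) = [4.9 3.1 1.5 0.2];
fixed(38, :) = [4.9 3.6 1.4 0.1];

% Column indices of the features that the corrections change.
changed35 = find(X(35, :) ~= fixed(35, :));
changed38 = find(X(38, :) ~= fixed(38, :));
```

Comparing the old and new rows shows which features each correction touches, which can be checked against the feature numbers named in the note.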
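Attribute information:
1. Sepal length: Ls (cm)
2. Sepal width: Ws (cm)
3. Petal length: Lp (cm)
4. Petal width: Wp (cm)
5. Class: Iris Setosa, Iris Versicolour, Iris Virginica

Procedure:

1. Data import
Download the file "iris.data" from the source website into the folder "D:\Program_Files".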
Read the data directly with Matlab:
>> [speal_length, speal_width, petal_length, petal_width, class] = textread('D:\Program_Files\iris.data', '%n%n%n%n%s', 'delimiter', ',');
Replace the class names with numbers (Iris-setosa = 1; Iris-versicolor = 2; Iris-virginica = 3):
>> classNo = strrep(class, 'Iris-setosa', '1');
>> classNo = strrep(classNo, 'Iris-versicolor', '2');
>> classNo = strrep(classNo, 'Iris-virginica', '3');
>> classNo = str2double(classNo);
This gives five 150x1 double column vectors.

2. View conversion
Merge the five column vectors into the matrix IrisTable in turn:
>> IrisTable = classNo;
>> IrisTable = [IrisTable, speal_length];
>> IrisTable = [IrisTable, speal_width];
>> IrisTable = [IrisTable, petal_length];
>> IrisTable = [IrisTable, petal_width];
Convert to the parallel coordinates view, drawing one polyline per sample across the five axes:
>> x = [1,2,3,4,5];
>> for i = 1:150
y = IrisTable(i,:);
plot(x, y)
hold on
end
>> set(gca, 'XTick', 1:5)
>> set(gca, 'XTickLabel', {'Class', 'sepal length', 'sepal width', 'petal length', 'petal width'})
% Note: the ticks are placed at 1 to 5 so that each label lines up with its axis in column order.
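The replacement of class names by numbers in step 1 can also be written as a single lookup. This is a minimal sketch under stated assumptions: `labels`, `names`, `known` and `features` are illustrative names, and the four rows are a small subset taken from the data list in the appendix rather than the full file.

```matlab
% A few labels in the format used by iris.data (illustrative subset).
labels = {'Iris-setosa'; 'Iris-versicolor'; 'Iris-virginica'; 'Iris-setosa'};
names  = {'Iris-setosa', 'Iris-versicolor', 'Iris-virginica'};

% ismember returns, for each label, its position in names (1, 2 or 3).
[known, classNo] = ismember(labels, names);
if ~all(known)
    error('Unexpected class label in input.');
end

% Matching measurements: sepal length, sepal width, petal length, petal width.
features = [5.1 3.5 1.4 0.2;
            6.4 3.2 4.5 1.5;
            6.3 3.3 6.0 2.5;
            4.9 3.0 1.4 0.2];

% Class number first, then the four features, as in IrisTable.
IrisTable = [classNo, features];
```

Compared with chained `strrep` calls, the lookup fails loudly on an unknown label instead of silently producing a non-numeric string.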
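In the resulting plot, the last two axes (petal length and petal width) look fairly orderly, whereas the lines cross heavily on the first two feature axes (sepal length and sepal width); the second of these, sepal width, shows no clear pattern.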
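The visual impression of "orderly" versus "crossing" axes can be checked roughly by counting, for each feature, how many pairs of classes have overlapping value ranges. The sketch below is only an illustration: it uses the first five rows of each class from the data list in the appendix (not the full dataset), and the names `X`, `c`, `lo`, `hi` and `overlaps` are hypothetical.

```matlab
% Columns: sepal length, sepal width, petal length, petal width (cm).
X = [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; 4.7 3.2 1.3 0.2; 4.6 3.1 1.5 0.2; 5.0 3.6 1.4 0.2;
     6.4 3.2 4.5 1.5; 6.9 3.1 4.9 1.5; 5.5 2.3 4.0 1.3; 6.5 2.8 4.6 1.5; 5.7 2.8 4.5 1.3;
     6.3 3.3 6.0 2.5; 5.8 2.7 5.1 1.9; 7.1 3.0 5.9 2.1; 6.3 2.9 5.6 1.8; 6.5 3.0 5.8 2.2];
c = [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]';   % 1 = setosa, 2 = versicolor, 3 = virginica

% Per-class minimum and maximum of every feature.
lo = zeros(3, 4);
hi = zeros(3, 4);
for k = 1:3
    lo(k, :) = min(X(c == k, :), [], 1);
    hi(k, :) = max(X(c == k, :), [], 1);
end

% For each feature, count the class pairs whose [min, max] intervals intersect.
pairs = [1 2; 1 3; 2 3];
overlaps = zeros(1, 4);
for p = 1:size(pairs, 1)
    a = pairs(p, 1);
    b = pairs(p, 2);
    overlaps = overlaps + (lo(a, :) <= hi(b, :) & lo(b, :) <= hi(a, :));
end
```

A count of 0 means the three classes occupy disjoint ranges on that axis, and 3 means every pair overlaps. With only 15 rows the counts are indicative at best; the full table would give a more reliable picture.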
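3. View optimization and evaluation of conclusions
For the three iris classes in the matrix IrisTable, analyze the between-class difference of the attribute means, M, and the per-class standard deviations Sa, Sb and Sc. M is the sum of the absolute pairwise differences between the three class means.
>> A = IrisTable(1:50,1:5);
>> B = IrisTable(51:100,1:5);
>> C = IrisTable(101:150,1:5);
>> Ma = mean(A, 1);
>> Mb = mean(B, 1);
>> Mc = mean(C, 1);
>> M = abs(Ma - Mb) + abs(Mb - Mc) + abs(Mc - Ma);
>> Sa = std(A, 0, 1);
>> Sb = std(B, 0, 1);
>> Sc = std(C, 0, 1);
Ordering the attributes by increasing overall standard deviation and decreasing M gives the preferred axis order: class, petal width, petal length, sepal length, sepal width.
Rebuild the table in that order:
>> classNo = strrep(class, 'Iris-setosa', '1');
>> classNo = strrep(classNo, 'Iris-versicolor', '2');
>> classNo = strrep(classNo, 'Iris-virginica', '3');
>> IrisTable = str2double(classNo);
>> IrisTable = [IrisTable, petal_width];
>> IrisTable = [IrisTable, petal_length];
>> IrisTable = [IrisTable, speal_length];
>> IrisTable = [IrisTable, speal_width];
Generate the optimized plot, with Iris setosa in red, Iris versicolor in green and Iris virginica in blue:
>> x = [1,2,3,4,5];
>> for i = 1:50
hold on
y = IrisTable(i, :);
plot(x, y, 'r')
end
>> for i = 51:100
hold on
y = IrisTable(i, :);
plot(x, y, 'g')
end
>> for i = 101:150
hold on
y = IrisTable(i, :);
plot(x, y, 'b')
end
>> set(gca, 'XTick', 1:5)
>> set(gca, 'XTickLabel', {'Class', 'petal width', 'petal length', 'sepal length', 'sepal width'})
Draw the vertical axes and mark the unit positions on them:
>> for i = 2:4
hold on
plot([i,i], [0 8], 'k')
end
>> for i = 2:4
for j = 1:7
hold on
plot(i, j, '+k')
end
end
In the adjusted plot, one blue line (Iris virginica) lies almost entirely within the region where the green lines (Iris versicolor) cluster. In fact, two such samples have values on the four feature axes that do not match their class well. The paper relates this to the remarks in the data source, although those remarks concern samples 35 and 38, which are both Iris setosa (red).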
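The text combines M and the standard deviations by inspection and does not state an exact rule. One possible way to turn the two quantities into a single ranking is the ratio of M to the average within-class standard deviation. This ratio is an assumption made for illustration, not the paper's rule, and it need not reproduce the paper's order of the two petal features. The sketch uses the first five rows of each class from the data list in the appendix; `X`, `c`, `S`, `score`, `order` and `ranked` are hypothetical names.

```matlab
% Columns: sepal length, sepal width, petal length, petal width (cm).
X = [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; 4.7 3.2 1.3 0.2; 4.6 3.1 1.5 0.2; 5.0 3.6 1.4 0.2;
     6.4 3.2 4.5 1.5; 6.9 3.1 4.9 1.5; 5.5 2.3 4.0 1.3; 6.5 2.8 4.6 1.5; 5.7 2.8 4.5 1.3;
     6.3 3.3 6.0 2.5; 5.8 2.7 5.1 1.9; 7.1 3.0 5.9 2.1; 6.3 2.9 5.6 1.8; 6.5 3.0 5.8 2.2];
c = [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]';   % 1 = setosa, 2 = versicolor, 3 = virginica

A = X(c == 1, :);
B = X(c == 2, :);
C = X(c == 3, :);

% Between-class spread: sum of absolute pairwise differences of the class means.
Ma = mean(A, 1);
Mb = mean(B, 1);
Mc = mean(C, 1);
M = abs(Ma - Mb) + abs(Mb - Mc) + abs(Mc - Ma);

% Within-class spread: average of the three per-class standard deviations.
S = (std(A, 0, 1) + std(B, 0, 1) + std(C, 0, 1)) / 3;

% Assumed score: large M and small S both push a feature up the ranking.
score = M ./ S;
[~, order] = sort(score, 'descend');

featureNames = {'sepal length', 'sepal width', 'petal length', 'petal width'};
ranked = featureNames(order);
```

On this subset the two petal features come out ahead of the two sepal features, in line with the conclusion above, but the relative order within each pair depends on the scoring rule chosen.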
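4. Directions for improvement
A further step would be to take into account the mutual information between the features, as Professor Lv mentioned in class.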
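As a rough starting point for that direction, mutual information can be estimated from binned counts. The sketch below estimates the mutual information between each feature and the class label. It rests on several assumptions that are not in the original: equal-width binning into 3 bins, the class label as the second variable, and a 15-row subset taken from the data list in the appendix. The names `nBins`, `bin`, `joint`, `indep` and `mi` are illustrative. The same estimate could be applied to a pair of binned features.

```matlab
% Columns: sepal length, sepal width, petal length, petal width (cm).
X = [5.1 3.5 1.4 0.2; 4.9 3.0 1.4 0.2; 4.7 3.2 1.3 0.2; 4.6 3.1 1.5 0.2; 5.0 3.6 1.4 0.2;
     6.4 3.2 4.5 1.5; 6.9 3.1 4.9 1.5; 5.5 2.3 4.0 1.3; 6.5 2.8 4.6 1.5; 5.7 2.8 4.5 1.3;
     6.3 3.3 6.0 2.5; 5.8 2.7 5.1 1.9; 7.1 3.0 5.9 2.1; 6.3 2.9 5.6 1.8; 6.5 3.0 5.8 2.2];
c = [1 1 1 1 1 2 2 2 2 2 3 3 3 3 3]';   % 1 = setosa, 2 = versicolor, 3 = virginica

nBins = 3;                      % assumed bin count, chosen for illustration
mi = zeros(1, size(X, 2));      % mutual information in bits, one value per feature

for f = 1:size(X, 2)
    x = X(:, f);
    % Equal-width binning into 1..nBins (assumes x is not constant).
    bin = min(nBins, 1 + floor((x - min(x)) / (max(x) - min(x)) * nBins));
    % Joint distribution of (bin, class) estimated from counts.
    joint = accumarray([bin, c], 1, [nBins, 3]) / numel(x);
    % Product of the marginals, i.e. the joint under independence.
    indep = sum(joint, 2) * sum(joint, 1);
    nz = joint > 0;             % skip empty cells, since 0 * log(0) is taken as 0
    mi(f) = sum(joint(nz) .* log2(joint(nz) ./ indep(nz)));
end
```

With three equally frequent classes the estimate is bounded above by log2(3) bits, which is reached only when the bins of a feature separate the classes completely. Estimates from so few rows are strongly affected by the choice of bins.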
References:
[1] UCI Data Set: https:///ml/datasets/Iris
[2] Fisher, R.A. "The use of multiple measurements in taxonomic problems". Annals of Eugenics, 7, Part II, 179-188 (1936); also in "Contributions to Mathematical Statistics" (John Wiley, NY, 1950). /paper/2fb499aa4d6a7071a6ba53c679ccca7055813114
[3] Duda, R.O., & Hart, P.E. (1973) Pattern Classification and Scene Analysis. (Q327.D83) John Wiley & Sons. ISBN 0-471-22361-1. See page 218. /paper/e6b7a3a8c46efef785a6ab735be07dafa0713ff3
[4] Dasarathy, B.V. (1980) "Nosing Around the Neighborhood: A New System Structure and Classification Rule for Recognition in Partially Exposed Environments". IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. PAMI-2, No. 1, 67-71. /paper/acf9d77f6470a326f784fd50b08b7dd60be5fb9a
[5] Gates, G.W. (1972) "The Reduced Nearest Neighbor Rule". IEEE Transactions on Information Theory, May 1972, 431-433. /paper/876f54b2ebfecb6a796590237abf245cf28d3c74 See also: 1988 MLC Proceedings, 54-64.

Appendix:

Final code:
% Read the four measurements and the class label from the data file.
[speal_length, speal_width, petal_length, petal_width, class] = textread('D:\Program_Files\iris.data', '%n%n%n%n%s', 'delimiter', ',');
% Replace the class names with the numbers 1, 2 and 3.
classNo = strrep(class, 'Iris-setosa', '1');
classNo = strrep(classNo, 'Iris-versicolor', '2');
classNo = strrep(classNo, 'Iris-virginica', '3');
% Build the table in the optimized axis order.
IrisTable = str2double(classNo);
IrisTable = [IrisTable, petal_width];
IrisTable = [IrisTable, petal_length];
IrisTable = [IrisTable, speal_length];
IrisTable = [IrisTable, speal_width];
x = [1,2,3,4,5];
% One polyline per sample: setosa red, versicolor green, virginica blue.
for i = 1:50
    hold on
    y = IrisTable(i, :);
    plot(x, y, 'r')
end
for i = 51:100
    hold on
    y = IrisTable(i, :);
    plot(x, y, 'g')
end
for i = 101:150
    hold on
    y = IrisTable(i, :);
    plot(x, y, 'b')
end
% Label each axis in column order.
set(gca, 'XTick', 1:5)
set(gca, 'XTickLabel', {'Class', 'petal width', 'petal length', 'sepal length', 'sepal width'})
% Draw the vertical axes and mark the unit positions on them.
for i = 2:4
    hold on
    plot([i,i], [0 8], 'k')
end
for i = 2:4
    for j = 1:7
        hold on
        plot(i, j, '+k')
    end
end

Data list:
5.1,3.5,1.4,0.2,Iris-setosa
4.9,3.0,1.4,0.2,Iris-setosa
4.7,3.2,1.3,0.2,Iris-setosa
4.6,3.1,1.5,0.2,Iris-setosa
5.0,3.6,1.4,0.2,Iris-setosa
5.4,3.9,1.7,0.4,Iris-setosa
4.6,3.4,1.4,0.3,Iris-setosa
5.0,3.4,1.5,0.2,Iris-setosa
4.4,2.9,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.4,3.7,1.5,0.2,Iris-setosa
4.8,3.4,1.6,0.2,Iris-setosa
4.8,3.0,1.4,0.1,Iris-setosa
4.3,3.0,1.1,0.1,Iris-setosa
5.8,4.0,1.2,0.2,Iris-setosa
5.7,4.4,1.5,0.4,Iris-setosa
5.4,3.9,1.3,0.4,Iris-setosa
5.1,3.5,1.4,0.3,Iris-setosa
5.7,3.8,1.7,0.3,Iris-setosa
5.1,3.8,1.5,0.3,Iris-setosa
5.4,3.4,1.7,0.2,Iris-setosa
5.1,3.7,1.5,0.4,Iris-setosa
4.6,3.6,1.0,0.2,Iris-setosa
5.1,3.3,1.7,0.5,Iris-setosa
4.8,3.4,1.9,0.2,Iris-setosa
5.0,3.0,1.6,0.2,Iris-setosa
5.0,3.4,1.6,0.4,Iris-setosa
5.2,3.5,1.5,0.2,Iris-setosa
5.2,3.4,1.4,0.2,Iris-setosa
4.7,3.2,1.6,0.2,Iris-setosa
4.8,3.1,1.6,0.2,Iris-setosa
5.4,3.4,1.5,0.4,Iris-setosa
5.2,4.1,1.5,0.1,Iris-setosa
5.5,4.2,1.4,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
5.0,3.2,1.2,0.2,Iris-setosa
5.5,3.5,1.3,0.2,Iris-setosa
4.9,3.1,1.5,0.1,Iris-setosa
4.4,3.0,1.3,0.2,Iris-setosa
5.1,3.4,1.5,0.2,Iris-setosa
5.0,3.5,1.3,0.3,Iris-setosa
4.5,2.3,1.3,0.3,Iris-setosa
4.4,3.2,1.3,0.2,Iris-setosa
5.0,3.5,1.6,0.6,Iris-setosa
5.1,3.8,1.9,0.4,Iris-setosa
4.8,3.0,1.4,0.3,Iris-setosa
5.1,3.8,1.6,0.2,Iris-setosa
4.6,3.2,1.4,0.2,Iris-setosa
5.3,3.7,1.5,0.2,Iris-setosa
5.0,3.3,1.4,0.2,Iris-setosa
6.4,3.2,4.5,1.5,Iris-versicolor
6.9,3.1,4.9,1.5,Iris-versicolor
5.5,2.3,4.0,1.3,Iris-versicolor
6.5,2.8,4.6,1.5,Iris-versicolor
5.7,2.8,4.5,1.3,Iris-versicolor
6.3,3.3,4.7,1.6,Iris-versicolor
4.9,2.4,3.3,1.0,Iris-versicolor
6.6,2.9,4.6,1.3,Iris-versicolor
5.2,2.7,3.9,1.4,Iris-versicolor
5.0,2.0,3.5,1.0,Iris-versicolor
5.9,3.0,4.2,1.5,Iris-versicolor
6.0,2.2,4.0,1.0,Iris-versicolor
6.1,2.9,4.7,1.4,Iris-versicolor
5.6,2.9,3.6,1.3,Iris-versicolor
6.7,3.1,4.4,1.4,Iris-versicolor
5.6,3.0,4.5,1.5,Iris-versicolor
5.8,2.7,4.1,1.0,Iris-versicolor
6.2,2.2,4.5,1.5,Iris-versicolor
5.6,2.5,3.9,1.1,Iris-versicolor
5.9,3.2,4.8,1.8,Iris-versicolor
6.1,2.8,4.0,1.3,Iris-versicolor
6.3,2.5,4.9,1.5,Iris-versicolor
6.1,2.8,4.7,1.2,Iris-versicolor
6.4,2.9,4.3,1.3,Iris-versicolor
6.6,3.0,4.4,1.4,Iris-versicolor
6.8,2.8,4.8,1.4,Iris-versicolor
6.7,3.0,5.0,1.7,Iris-versicolor
6.0,2.9,4.5,1.5,Iris-versicolor
5.7,2.6,3.5,1.0,Iris-versicolor
5.5,2.4,3.8,1.1,Iris-versicolor
5.5,2.4,3.7,1.0,Iris-versicolor
5.8,2.7,3.9,1.2,Iris-versicolor
6.0,2.7,5.1,1.6,Iris-versicolor
5.4,3.0,4.5,1.5,Iris-versicolor
6.0,3.4,4.5,1.6,Iris-versicolor
6.7,3.1,4.7,1.5,Iris-versicolor
6.3,2.3,4.4,1.3,Iris-versicolor
5.6,3.0,4.1,1.3,Iris-versicolor
5.5,2.5,4.0,1.3,Iris-versicolor
5.5,2.6,4.4,1.2,Iris-versicolor
6.1,3.0,4.6,1.4,Iris-versicolor
5.8,2.6,4.0,1.2,Iris-versicolor
5.0,2.3,3.3,1.0,Iris-versicolor
5.7,3.0,4.2,1.2,Iris-versicolor
5.7,2.9,4.2,1.3,Iris-versicolor
6.2,2.9,4.3,1.3,Iris-versicolor
5.1,2.5,3.0,1.1,Iris-versicolor
5.7,2.8,4.1,1.3,Iris-versicolor
6.3,3.3,6.0,2.5,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
7.1,3.0,5.9,2.1,Iris-virginica
6.3,2.9,5.6,1.8,Iris-virginica
6.5,3.0,5.8,2.2,Iris-virginica
7.6,3.0,6.6,2.1,Iris-virginica
4.9,2.5,4.5,1.7,Iris-virginica
7.3,2.9,6.3,1.8,Iris-virginica
6.7,2.5,5.8,1.8,Iris-virginica
7.2,3.6,6.1,2.5,Iris-virginica
6.5,3.2,5.1,2.0,Iris-virginica
6.4,2.7,5.3,1.9,Iris-virginica
6.8,3.0,5.5,2.1,Iris-virginica
5.7,2.5,5.0,2.0,Iris-virginica
5.8,2.8,5.1,2.4,Iris-virginica
6.4,3.2,5.3,2.3,Iris-virginica
6.5,3.0,5.5,1.8,Iris-virginica
7.7,3.8,6.7,2.2,Iris-virginica
7.7,2.6,6.9,2.3,Iris-virginica
6.0,2.2,5.0,1.5,Iris-virginica
6.9,3.2,5.7,2.3,Iris-virginica
5.6,2.8,4.9,2.0,Iris-virginica
7.7,2.8,6.7,2.0,Iris-virginica
6.3,2.7,4.9,1.8,Iris-virginica
6.7,3.3,5.7,2.1,Iris-virginica
7.2,3.2,6.0,1.8,Iris-virginica
6.2,2.8,4.8,1.8,Iris-virginica
6.1,3.0,4.9,1.8,Iris-virginica
6.4,2.8,5.6,2.1,Iris-virginica
7.2,3.0,5.8,1.6,Iris-virginica
7.4,2.8,6.1,1.9,Iris-virginica
7.9,3.8,6.4,2.0,Iris-virginica
6.4,2.8,5.6,2.2,Iris-virginica
6.3,2.8,5.1,1.5,Iris-virginica
6.1,2.6,5.6,1.4,Iris-virginica
7.7,3.0,6.1,2.3,Iris-virginica
6.3,3.4,5.6,2.4,Iris-virginica
6.4,3.1,5.5,1.8,Iris-virginica
6.9,3.1,5.4,2.1,Iris-virginica
6.7,3.1,5.6,2.4,Iris-virginica
6.9,3.1,5.1,2.3,Iris-virginica
5.8,2.7,5.1,1.9,Iris-virginica
6.8,3.2,5.9,2.3,Iris-virginica
6.7,3.3,5.7,2.5,Iris-virginica
6.7,3.0,5.2,2.3,Iris-virginica
6.3,2.5,5.0,1.9,Iris-virginica
6.5,3.0,5.2,2.0,Iris-virginica
6.2,3.4,5.4,2.3,Iris-virginica
5.9,3.0,5.1,1.8,Iris-virginica