Neural Network Theory and Applications
Homework Assignment 3
oxstar@SJTU
January 19, 2012

1 Data Preprocessing

First we used 'svm-scale' of LibSVM to scale the data. Scaling has two main advantages: it prevents attributes in greater numeric ranges from dominating those in smaller numeric ranges, and it avoids numerical difficulties during the calculation [1]. We linearly scaled each attribute to the range [-1, +1].

2 Model Selection

We tried three different kernel functions, namely linear, polynomial and RBF:

- linear: K(x_i, x_j) = x_i^T x_j
- polynomial: K(x_i, x_j) = (gamma * x_i^T x_j + r)^d, gamma > 0
- radial basis function (RBF): K(x_i, x_j) = exp(-gamma * ||x_i - x_j||^2), gamma > 0

The penalty parameter C and the kernel parameters (gamma, r, d) must be chosen. We used 'grid-search' [1] on C and gamma, while r and d were left at their default values, 0 and 3. Figure 1 presents the contour maps used to choose suitable parameters. We searched only for local maxima: the global maximum is usually difficult to find, and the running time grows dramatically as the parameter values increase. Note that 'ovr' stands for the one-versus-rest task decomposition method, 'ovo' is short for one-versus-one, and 'pvp' is short for part-versus-part.

The linear kernel has no kernel parameters of its own, so we only had to search for the penalty parameter C. The results are shown in Figure 2. The final selection for each setting is presented in Table 1.

Table 1: Selected Parameters for Each Decomposition Method and Kernel

Decomposition      Kernel        C      gamma
one-versus-rest    RBF           10     1.0
                   Polynomial    0.1    0.7
                   Linear        1      -
one-versus-one     RBF           1      1.5
                   Polynomial    0.01   0.2
                   Linear        0.1    -
part-versus-part   RBF           1      0.1
                   Polynomial    0.01   0.4
                   Linear        1      -

[Figure 1: Grid search for the RBF and polynomial kernels (gamma vs. lg(cost)): (a) RBF (ovr); (b) Polynomial (ovr); (c) RBF (ovo); (d) Polynomial (ovo); (e) RBF (pvp); (f) Polynomial (pvp).]

[Figure 2: Parameter search for the linear kernel (accuracy vs. lg(cost)): (a) ovr; (b) ovo; (c) pvp.]

3 Experiments

3.1 Task Decomposition Methods

Several multi-class classification techniques have been proposed for SVM models. The most typical approach to decomposing the task is the so-called one-versus-rest method, which separates one class from all the others. Assume we construct N two-class classifiers; a test datum is assigned to C_i iff the i-th classifier recognizes it. However, more than one classifier may recognize it; in that case we assign the datum to the C_i whose classifier gives the largest decision value. Conversely, if no classifier recognizes it, we assign it to the C_i whose classifier gives the smallest decision value for the 'rest' class.

One-versus-one, which combines all possible two-class classifiers, is another methodology for dealing with multi-class problems [3]. The number of classifiers grows super-linearly with the number of classes, but the running time may not, because each divided problem is much smaller. We used an election strategy to make the final decision: a datum is assigned to C_i if the classifiers involving class i cast the largest number of votes for the i-th class.

The part-versus-part method is another choice [4]. Any two-class problem can be further decomposed into a number of two-class sub-problems, as small as needed, which makes the method good at handling imbalanced classification problems. As shown in Table 2, the number of training data in each class of our dataset is indeed imbalanced.

Table 2: Number of Training Data in Each Class

Class   Training Data   Class   Training Data
0       537             6       741
1       994             7       584
2       328             8       154
3       919             9       100
4       689             10      133
5       853             11      43

We used the MAX-MIN strategy to make the final decisions. We also had to determine the size of the smallest parts, which strongly affects the classification performance. From Figure 3, we chose 200 as the size of each sub-class because the 
accuracy reaches a local maximum there, while choosing 1600 would render the decomposition pointless.

3.2 Results

In our experiments we used the Java version of LibSVM [2].

[Figure 3: Relationship between sub-class size and accuracy, for sub-class sizes from 25 to 1600.]

[Figure 4: Performance of each task decomposition method and each kernel: (a) accuracy; (b) running time.]

The accuracy and running time are shown in Figures 4a and 4b, respectively.

3.3 Discussion

Compared with ovo and pvp, the one-versus-rest decomposition method always has the worst accuracy, no matter which kernel is used. However, thanks to its simple procedure, only N classifiers are required for an N-class problem, so its scalability is better than that of the other methods.

The one-versus-one decomposition method performed best in our experiments. Although it needs N(N-1)/2 classifiers for an N-class problem, each sub-problem is really small, and its running time was actually the shortest. However, as the number of classes increases, the number of classifiers grows super-linearly.

The last one, the part-versus-part decomposition method, did not perform well in our experiments because its running time is much longer than the others'. Its accuracy lies between that of ovr and that of ovo. Massively parallel learning can easily be implemented with this method, but that advantage could not be exploited here.

The RBF kernel is the best choice if we only take accuracy into account, but it usually costs the most time. The accuracies of the polynomial and linear kernels are very close, while the running time of the linear kernel is more satisfactory.

References

[1] Chih-Wei Hsu, Chih-Chung Chang, and Chih-Jen Lin: A Practical Guide to Support Vector Classification, 2003.

[2] C.-C. Chang and C.-J. Lin: LIBSVM: A Library for Support Vector Machines, 2001.

[3] Johannes Fürnkranz: Round Robin Classification, 2002.

[4] Bao-Liang Lu, Kai-An Wang, Masao Utiyama, and Hitoshi Isahara: A Part-versus-Part Method for Massively Parallel Training of Support Vector Machines, 2004.
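
Appendix: a minimal sketch of the pipeline described in the report (linear scaling to [-1, +1], grid search over C and gamma, and a comparison of decomposition strategies). Note this uses scikit-learn's SVC, which wraps LibSVM, rather than the Java version used in the experiments, and the synthetic dataset and parameter grid are placeholders, not the assignment's actual data or selected values.

```python
# Sketch of the report's pipeline: scale each attribute to [-1, +1]
# (cf. 'svm-scale'), grid-search C and gamma for an RBF kernel, then
# compare one-versus-rest and one-versus-one decomposition. Data and
# grids are illustrative placeholders only.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.multiclass import OneVsOneClassifier, OneVsRestClassifier
from sklearn.preprocessing import MinMaxScaler
from sklearn.svm import SVC

X, y = make_classification(n_samples=600, n_features=16, n_informative=8,
                           n_classes=4, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Step 1: linearly scale each attribute to [-1, +1], fitting on the
# training data only and applying the same transform to the test data.
scaler = MinMaxScaler(feature_range=(-1, 1)).fit(X_train)
X_train, X_test = scaler.transform(X_train), scaler.transform(X_test)

# Step 2: grid search over C and gamma for the RBF kernel (Section 2).
grid = GridSearchCV(SVC(kernel="rbf"),
                    {"C": [0.01, 0.1, 1, 10],
                     "gamma": [0.1, 0.5, 1.0, 1.5]},
                    cv=3)
grid.fit(X_train, y_train)
best = grid.best_params_

# Step 3: compare task decomposition strategies with the chosen
# parameters (Section 3.1).
for name, clf in [("ovr", OneVsRestClassifier(SVC(kernel="rbf", **best))),
                  ("ovo", OneVsOneClassifier(SVC(kernel="rbf", **best)))]:
    clf.fit(X_train, y_train)
    print(name, round(clf.score(X_test, y_test), 3))
```

The GridSearchCV call plays the role of the 'grid-search' of Section 2, and OneVsRestClassifier/OneVsOneClassifier correspond to the ovr and ovo decompositions of Section 3.1; LibSVM's own built-in multi-class mode is one-versus-one by default.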