
South China University of Technology, "Pattern Recognition" Course Project Report

Topic: Introduction to Pattern Recognition Experiments
School: Computer Science and Engineering
Major: Computer Science and Technology (All-English Innovation Class)
Student name: Huang Weijie
Student ID: 201230590051
Advisor: Wu Si
Course number: 145143
Credits: 2
Start date: May 18, 2015

Overview of the Experiment

【Purpose and Requirements】
Purpose: Develop classifiers that take input features and predict the labels.
Requirements:
• Include explanations of why you chose the specific approaches.
• If your classifier includes any adjustable parameter, report the effect of that parameter on the final classification result.
• In evaluating your classifiers, compute their precision and recall values.
• Partition the dataset into 2 folds and conduct a cross-validation procedure to measure the performance.
• Use figures and tables to summarize your results and clarify your presentation.

【Environment】
Operating system: Windows 8 (64-bit)
IDE: Matlab R2012b
Programming language: Matlab

Contents of the Experiment

【Experimental Design】
The main steps of the project are:
1. To make the task more challenging, I chose the larger dataset, Pedestrian, rather than the smaller one. Since it may be unwise to learn on such a large dataset directly, I first normalize the features to the range [0, 1] and perform k-means sampling to select the most representative samples. Feature selection is then applied to reduce the number of features, and finally PCA dimension reduction shrinks the dataset further.
2. Six learning algorithms, namely K-Nearest Neighbor, perceptron, decision tree, support vector machine, multi-layer perceptron, and Naive Bayes, are used to learn the patterns in the dataset.
3. Each of the six learning algorithms is combined into its own multi-classifier system using the bagging algorithm.

Experimental Procedure:

The input dataset is normalized to the range [0, 1], which makes it suitable for k-means clustering and also speeds up the learning algorithms. There are too many samples in the dataset; a small fraction of them is enough to learn a good classifier. To select the most representative samples, k-means clustering groups the samples into c clusters, from which r% are selected. There
are 14,596 samples initially, but 1,460 may be enough, so r = 10. The choice of c should follow three criteria:
a) Little drop in accuracy
b) Little change in the ratio of the two classes
c) A smaller c, and thus lower time complexity
So I designed two experiments to find the best parameter c:

Experiment 1: Find the training accuracy for different numbers of clusters. The result is shown in the figure on the left. The x-axis is the number of clusters and the y-axis is accuracy; the red line denotes accuracy before sampling and the blue line accuracy after sampling. As shown in the figure, c = 2, 5, 7, 9, 13 may be good choices since they give relatively higher accuracy.

Experiment 2: Find the ratio between the sample counts of the two classes. The result is shown in the figure on the right. The x-axis is the number of clusters and the y-axis is the ratio; the red line denotes the ratio before sampling and the blue line the ratio after sampling. As shown in the figure, c = 2, 5, 9 may be good choices since the ratio does not change much. As a result, c = 5 is selected to satisfy all three criteria.

The 3,780 features are far more than needed to train a classifier, so I select a small subset of them before learning. The goal is to select the most discriminative features, that is, the features that yield the highest accuracy at each step. But with six learning algorithms in this project, it is hard to decide which algorithm such a selection process should depend on, and it would also have high time complexity. Instead, relevance, i.e. the correlation between a feature and the class label, is used as a discrimination measure to select the best feature set. However, selecting only the most relevant features may introduce heavy redundancy, so a tradeoff between relevance and redundancy has to be made. An experiment on how to make the best tradeoff was carried out:

Experiment 3: This experiment is a filter-style forward feature selection process. The goal is to select, at each step, the feature with the maximum value of (relevance + λ * redundancy), where relevance denotes the correlation between the feature and the class, and
redundancy denotes the mean of the pairwise feature correlations. λ is varied from -1 to 1. The result is shown to the right. The x-axis denotes the number of selected features and the y-axis denotes accuracy; each line represents one value of λ. Clearly, the higher λ is, the lower the accuracy; that is, the higher the redundancy, the worse the classifier performs. So I select λ = -1, and the heuristic function becomes:

max(relevance - redundancy)

The heuristic function is now fixed, but the best number of features is still unknown; it is determined in Experiment 4:

Experiment 4: Find the training accuracy for different numbers of features. The result is shown below. The x-axis is the number of features and the y-axis is accuracy; the red line denotes accuracy before feature selection and the blue line accuracy after feature selection. As shown in the figure, the accuracy becomes stable once the number of features reaches 50, so only 50 features are selected.

To make the dataset smaller still, the principal components whose cumulative contribution rate in PCA reaches 85% are kept. We finally obtain a dataset with 1,460 samples and 32 features. The size of the dataset drops by 92.16% while the accuracy decreases by only 0.61%, so these preprocessing steps successfully reduce the size of the dataset.

Six models are used in the learning step: K-Nearest Neighbor, perceptron, decision tree, support vector machine, multi-layer perceptron, and Naive Bayes. I designed an RBF classifier and an MLP classifier myself at first, but they were too slow because their matrix manipulation had not been designed carefully, so I use the library functions instead. Parameter determination for these classifiers:
1. K-NN: when k ≥ 5 the accuracy becomes stable, so k = 5.
2. Decision tree: maxcrit is used as the binary splitting criterion.
3. MLP: 5 hidden units are enough.

The six learning algorithms can each be combined into a multi-classifier system to increase their accuracy. The most popular models are boosting and bagging:
1. Boosting: each classifier depends on the previous one, and each has its own weight.
Misclassified samples receive higher weights. Boosting usually outperforms bagging, but may cause overfitting.
2. Bagging: each classifier is independent and all samples are treated equally; the final result is obtained by a vote among the classifiers. Bagging is more suitable for unstable classifiers such as ANNs (a small change in the input may cause a large difference in the learning result).

I was curious whether bagging truly helps increase the accuracy of unstable classifiers such as MLP and decision trees, and what happens for stable classifiers like K-NN, Naive Bayes, perceptron, and SVM. There is also the question of how many classifiers are needed. Experiment 5 provides the answer:

Experiment 5: The six classifiers are investigated individually; the accuracy under different numbers of classifiers is shown in the figure below. Each subfigure stands for one kind of classifier. The x-axis denotes the number of classifiers and the y-axis denotes the accuracy; the black line shows the highest accuracy among the individual classifiers, the green line the worst, the blue line their mean, and the red line the bagging ensemble. We can learn from the figure that bagging does help increase the accuracy of every classifier, and that for the decision tree and the MLP, bagging improves the accuracy to a great degree, which is consistent with the assumption.
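The preprocessing steps described above, min-max normalization followed by k-means-based representative sampling, can be sketched as follows. This is an illustrative Python/NumPy sketch, not the original Matlab code; the plain Lloyd-style k-means loop and the choice of keeping the samples nearest each cluster centre are my assumptions about how the sampling step could be realized.

```python
import numpy as np

def min_max_normalize(X):
    """Scale each feature column of X into [0, 1]."""
    X = np.asarray(X, dtype=float)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    ranges = np.where(maxs > mins, maxs - mins, 1.0)  # avoid division by zero on constant columns
    return (X - mins) / ranges

def kmeans_sample(X, c, ratio, n_iter=50, seed=0):
    """Cluster X into c groups with a basic k-means, then keep roughly a
    `ratio` fraction of each cluster (the samples nearest its centre)."""
    rng = np.random.default_rng(seed)
    centres = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(n_iter):
        dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        for k in range(c):
            if np.any(labels == k):
                centres[k] = X[labels == k].mean(axis=0)
    # final assignment with the converged centres
    dist = np.linalg.norm(X[:, None, :] - centres[None, :, :], axis=2)
    labels = dist.argmin(axis=1)
    kept = []
    for k in range(c):
        members = np.flatnonzero(labels == k)
        n_keep = int(round(ratio * len(members)))
        order = members[np.argsort(dist[members, k])]  # nearest to the centre first
        kept.extend(order[:n_keep].tolist())
    return np.sort(np.array(kept))
```

Sampling per cluster (rather than globally) keeps the class ratio roughly stable, which is exactly criterion b) above.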
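The λ = -1 heuristic selected in Experiment 3, max(relevance - redundancy), can be sketched as a greedy filter. Measuring both relevance and redundancy as absolute Pearson correlations is my assumption; the report does not specify the exact correlation measure.

```python
import numpy as np

def forward_select(X, y, n_select):
    """Greedy filter selection: at each step add the unselected feature that
    maximises relevance(f, y) - redundancy(f, selected), where relevance is
    |corr(f, y)| and redundancy is the mean |corr| with already-chosen features
    (the heuristic with lambda = -1)."""
    n_features = X.shape[1]
    corr_y = np.array([abs(np.corrcoef(X[:, j], y)[0, 1]) for j in range(n_features)])
    selected = [int(corr_y.argmax())]  # start from the single most relevant feature
    while len(selected) < n_select:
        best_j, best_score = None, -np.inf
        for j in range(n_features):
            if j in selected:
                continue
            redundancy = np.mean([abs(np.corrcoef(X[:, j], X[:, s])[0, 1])
                                  for s in selected])
            score = corr_y[j] - redundancy
            if score > best_score:
                best_j, best_score = j, score
        selected.append(best_j)
    return selected
```

For example, if feature 1 is an exact duplicate of the already-selected feature 0, its redundancy term is 1 and it is passed over in favour of a less relevant but less redundant feature.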
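The PCA step, keeping components up to a cumulative contribution rate of 85%, can be sketched via the SVD of the centred data. Again this is a NumPy sketch rather than the Matlab code actually used.

```python
import numpy as np

def pca_reduce(X, contribution=0.85):
    """Project X onto the fewest leading principal components whose cumulative
    explained-variance ratio (contribution rate) reaches `contribution`."""
    Xc = X - X.mean(axis=0)
    # SVD of the centred data: rows of Vt are the principal directions,
    # and the squared singular values are proportional to component variances
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    var_ratio = (S ** 2) / np.sum(S ** 2)
    k = int(np.searchsorted(np.cumsum(var_ratio), contribution) + 1)
    return Xc @ Vt[:k].T, k
```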
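Finally, the bagging scheme of Experiment 5, bootstrap resampling plus majority voting, can be sketched as below. The nearest-mean base learner is only an illustrative stand-in for the six classifiers actually used in the report, and labels are assumed to be coded as +1/-1.

```python
import numpy as np

def bagging_predict(X_train, y_train, X_test, base_fit_predict,
                    n_classifiers=25, seed=0):
    """Bagging: train each base classifier on a bootstrap resample of the
    training set, then combine the +1/-1 predictions by majority vote."""
    rng = np.random.default_rng(seed)
    n = len(X_train)
    votes = np.zeros(len(X_test))
    for _ in range(n_classifiers):
        idx = rng.integers(0, n, size=n)  # bootstrap sample, drawn with replacement
        votes += base_fit_predict(X_train[idx], y_train[idx], X_test)
    return np.where(votes >= 0, 1, -1)

def nearest_mean_classifier(Xtr, ytr, Xte):
    """Deliberately simple base learner: label each test point by the
    nearer of the two class means."""
    m_pos = Xtr[ytr == 1].mean(axis=0)
    m_neg = Xtr[ytr == -1].mean(axis=0)
    d_pos = np.linalg.norm(Xte - m_pos, axis=1)
    d_neg = np.linalg.norm(Xte - m_neg, axis=1)
    return np.where(d_pos <= d_neg, 1, -1)
```

Because each classifier sees a different bootstrap sample, an unstable base learner produces diverse votes, which is why the averaging effect helps the decision tree and MLP most in Experiment 5.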
