
Harbin Institute of Technology (HIT) Machine Learning Past Exam Questions

Exam I

1 Give the definitions or your comprehension of the following terms. (12')
1.1 The inductive learning hypothesis (P17)
1.2 Overfitting (P49)
1.4 Consistent learner (P148)

2 Give brief answers to the following questions. (15')
2.2 If the size of a version space is |VS|, in general what is the smallest number of queries that may be required by a concept learner using the optimal query strategy to perfectly learn the target concept? (P27)
2.3 In general, decision trees represent a disjunction of conjunctions of constraints on the attribute values of instances. What expression does the following decision tree correspond to? (The decision tree figure is not reproduced in this copy.)

3 Give the explanation of inductive bias, and list the inductive bias of the CANDIDATE-ELIMINATION algorithm, of decision tree learning (ID3), and of the BACKPROPAGATION algorithm. (10')

4 How can overfitting be handled in decision trees and neural networks? (10')
Solution:
Decision tree: stop growing the tree earlier, or apply post-pruning.
Neural network: weight decay, or early stopping based on a validation set.

5 Prove that the LMS weight update rule

  w_i ← w_i + η (V_train(b) − V̂(b)) x_i

performs gradient descent to minimize the squared error. In particular, define the squared error E as in the text. Now calculate the derivative of E with respect to the weight w_i, assuming that V̂(b) is a linear function as defined in the text. Gradient descent is achieved by updating each weight in proportion to −∂E/∂w_i; therefore, you must show that the LMS training rule alters weights in this proportion for each training example it encounters. Here

  E = Σ_{⟨b, V_train(b)⟩ ∈ training examples} (V_train(b) − V̂(b))²   (8')

Solution:
For a single training example, V_train(b) ≡ V̂(Successor(b)) is treated as a constant with respect to w_i, and V̂(b) is linear in the weights, so ∂V̂(b)/∂w_i = x_i. Therefore

  ∂E/∂w_i = ∂/∂w_i (V_train(b) − V̂(b))² = −2 (V_train(b) − V̂(b)) x_i.

As stated, the LMS rule updates each w_i by η (V_train(b) − V̂(b)) x_i, i.e. in proportion to −∂E/∂w_i. Therefore gradient descent is achieved by this update, and the LMS rule alters weights in this proportion for each training example it encounters.

6 True or false: if decision tree D2 is an elaboration of tree D1, then D1 is more_general_than D2. Assume D1 and D2 are decision trees representing arbitrary boolean functions, and that D2 is an elaboration of D1 if ID3 could extend D1 to D2. If true, give a proof; if false, a counterexample. (Definition: let h_j and h_k be boolean-valued functions defined over X. Then h_j is more_general_than_or_equal_to h_k (written h_j ≥_g h_k) if and only if (∀x ∈ X)[(h_k(x) = 1) → (h_j(x) = 1)], and h_j is more_general_than h_k (h_j >_g h_k) if and only if (h_j ≥_g h_k) ∧ ¬(h_k ≥_g h_j).) (10')

Solution:
The hypothesis is false. One counterexample is A XOR B: suppose the training examples are all positive when A ≠ B and all negative when A = B. Then, using ID3 to extend D1, the new tree D2 will be equivalent to D1, i.e. D2 classifies every instance exactly as D1 does, so D1 is not strictly more general than D2.

7 Design a two-input perceptron that implements the boolean function A ∧ ¬B. Design a two-layer network of perceptrons that implements A XOR B. (10')
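One concrete answer to question 7 can be verified mechanically. The sketch below is illustrative only (the specific weights are one of many valid choices, not values given in the exam): it encodes A ∧ ¬B as a single perceptron, and A XOR B as (A ∧ ¬B) ∨ (¬A ∧ B) using two hidden perceptrons feeding an OR perceptron.

def perceptron(w0, w1, w2):
    # Two-input perceptron with bias w0: output 1 if w0 + w1*a + w2*b > 0, else 0.
    return lambda a, b: 1 if w0 + w1 * a + w2 * b > 0 else 0

a_and_not_b = perceptron(-0.5, 1.0, -1.0)   # fires only for (A, B) = (1, 0)

h1 = perceptron(-0.5, 1.0, -1.0)            # hidden unit: A AND (NOT B)
h2 = perceptron(-0.5, -1.0, 1.0)            # hidden unit: (NOT A) AND B
out = perceptron(-0.5, 1.0, 1.0)            # output unit: OR of the hidden units

def xor(a, b):
    return out(h1(a, b), h2(a, b))

for a in (0, 1):
    for b in (0, 1):
        assert a_and_not_b(a, b) == (1 if a == 1 and b == 0 else 0)
        assert xor(a, b) == a ^ b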
8 Suppose a hypothesis space contains three hypotheses h1, h2 and h3, and the posterior probabilities of these hypotheses given the training data are 0.4, 0.3 and 0.3 respectively. A new instance x is encountered which is classified positive by h1 but negative by h2 and h3. Give the result and the detailed classification procedure of the Bayes optimal classifier. (10') (P125)

9 Suppose S is a collection of training-example days described by attributes including Humidity, which can have the values High or Normal. Assume S is a collection containing 10 examples, [7+, 3−]. Of these 10 examples, suppose 3 of the positive and 2 of the negative examples have Humidity = High, and the remainder have Humidity = Normal. Please calculate the information gain due to sorting the original 10 examples by the attribute Humidity. (log2 1 = 0, log2 2 = 1, log2 3 = 1.58, log2 4 = 2, log2 5 = 2.32, log2 6 = 2.58, log2 7 = 2.8, log2 8 = 3, log2 9 = 3.16, log2 10 = 3.32) (5')

Solution:
(a) Here we denote S = [7+, 3−]; then Entropy([7+, 3−]) = −(7/10) log2(7/10) − (3/10) log2(3/10) = 0.886.
(b) Gain(S, Humidity) = Entropy(S) − Σ_{v ∈ Values(Humidity)} (|S_v| / |S|) Entropy(S_v), where Values(Humidity) = {High, Normal} and S_High = {s ∈ S | Humidity(s) = High}.
Entropy(S_High) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.972, with |S_High| = 5;
Entropy(S_Normal) = −(4/5) log2(4/5) − (1/5) log2(1/5) = 0.72, with |S_Normal| = 5.
Thus Gain(S, Humidity) = 0.886 − ((5/10)·0.972 + (5/10)·0.72) = 0.04.

10 Finish the following algorithms. (10')

(1) GRADIENT-DESCENT(training_examples, η)
Each training example is a pair of the form ⟨x, t⟩, where x is the vector of input values and t is the target output value; η is the learning rate (e.g., 0.05).
  Initialize each w_i to some small random value
  Until the termination condition is met, do
    Initialize each Δw_i to zero
    For each ⟨x, t⟩ in training_examples, do
      Input the instance x to the unit and compute the output o
      For each linear unit weight w_i, do
        Δw_i ← Δw_i + η (t − o) x_i
    For each linear unit weight w_i, do
      w_i ← w_i + Δw_i

(2) FIND-S Algorithm
  Initialize h to the most specific hypothesis in H
  For each positive training instance x
    For each attribute constraint a_i in h
      If the constraint a_i is satisfied by x
      Then do nothing
      Else replace a_i in h by the next more general constraint that is satisfied by x
  Output hypothesis h
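The completed GRADIENT-DESCENT procedure can be checked with a short runnable sketch. This is a minimal illustration, not exam material: the toy linear target, learning rate and epoch count below are assumptions chosen so the batch rule visibly converges.

import random

def gradient_descent(training_examples, eta=0.01, epochs=2000):
    # Batch gradient descent for a linear unit o = w0 + w1*x1 + ... + wn*xn.
    n = len(training_examples[0][0])
    w = [random.uniform(-0.05, 0.05) for _ in range(n + 1)]  # w[0] is the bias weight
    for _ in range(epochs):
        delta = [0.0] * (n + 1)
        for x, t in training_examples:
            o = w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))
            delta[0] += eta * (t - o)            # bias input x0 = 1
            for i, xi in enumerate(x, start=1):
                delta[i] += eta * (t - o) * xi
        for i in range(n + 1):
            w[i] += delta[i]
    return w

# Illustrative target function t = 1 + 2*x1 - 3*x2 on a small grid of inputs.
data = [((x1, x2), 1 + 2 * x1 - 3 * x2) for x1 in (0, 1, 2) for x2 in (0, 1, 2)]
print([round(wi, 3) for wi in gradient_descent(data)])  # approaches [1.0, 2.0, -3.0]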
Exam II

1. What is the definition of a learning problem? (5') Use the "checkers learning problem" as an example to state how to design a learning system. (15')

Answer:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience. (5')
Example, a checkers learning problem:
T: play checkers (1')
P: percentage of games won in a tournament (1')
E: opportunity to play against itself (1')

To design a learning system:

Step 1: Choosing the Training Experience. (4')
A checkers learning problem:
Task T: playing checkers
Performance measure P: percent of games won in the world tournament
Training experience E: games played against itself
In order to complete the design of the learning system, we must now choose:
1. the exact type of knowledge to be learned
2. a representation for this target knowledge
3. a learning mechanism

Step 2: Choosing the Target Function. (4')
1. if b is a final board state that is won, then V(b) = 100
2. if b is a final board state that is lost, then V(b) = −100
3. if b is a final board state that is drawn, then V(b) = 0
4. if b is not a final state in the game, then V(b) = V(b'), where b' is the best final board state that can be achieved starting from b and playing optimally until the end of the game (assuming the opponent plays optimally, as well).

Step 3: Choosing a Representation for the Target Function. (4')
x1: the number of black pieces on the board
x2: the number of red pieces on the board
x3: the number of black kings on the board
x4: the number of red kings on the board
x5: the number of black pieces threatened by red (i.e., which can be captured on red's next turn)
x6: the number of red pieces threatened by black
Thus, our learning program will represent V̂(b) as a linear function of the form
  V̂(b) = w0 + w1 x1 + w2 x2 + w3 x3 + w4 x4 + w5 x5 + w6 x6
where w0 through w6 are numerical coefficients, or weights, to be chosen by the learning algorithm. Learned values for the weights w1 through w6 will determine the relative importance of the various board features in determining the value of the board, whereas the weight w0 will provide an additive constant to the board value.

2. Answer: Find-S & Find-G (Candidate-Elimination on EnjoySport):
Step 1: Initialize S to the most specific hypothesis in H, and G to the most general hypothesis in H. (1')
S0: {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}    G0: {⟨?, ?, ?, ?, ?, ?⟩}
Step 2: The first example is ⟨Sunny, Warm, Normal, Strong, Warm, Same, +⟩. (3')
S1: {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}    G1: {⟨?, ?, ?, ?, ?, ?⟩}
Step 3: The second example is ⟨Sunny, Warm, High, Strong, Warm, Same, +⟩. (3')
S2: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}    G2: {⟨?, ?, ?, ?, ?, ?⟩}
Step 4: The third example is ⟨Rainy, Cold, High, Strong, Warm, Change, −⟩. (3')
S3: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G3: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
Step 5: The fourth example is ⟨Sunny, Warm, High, Strong, Cool, Change, +⟩. (3')
S4: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G4: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}
Finally, all the hypotheses in the version space are: (2')
{⟨Sunny, Warm, ?, Strong, ?, ?⟩, ⟨Sunny, ?, ?, Strong, ?, ?⟩, ⟨Sunny, Warm, ?, ?, ?, ?⟩, ⟨?, Warm, ?, Strong, ?, ?⟩, ⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

3. Answer: Let flog(x) = −x·log2(x) − (1−x)·log2(1−x).
Step 1: choose the root node.
entropy_all = flog(4/10) = 0.971 (2')
gain_outlook = entropy_all − 0.3·flog(1/3) − 0.3·flog(1) − 0.4·flog(1/2) = 0.296 (1')
gain_temperature = entropy_all − 0.3·flog(1/3) − 0.3·flog(1/3) − 0.4·flog(1/2) = 0.02 (1')
gain_humidity = entropy_all − 0.5·flog(2/5) − 0.5·flog(1/5) = 0.125 (1')
gain_wind = entropy_all − 0.6·flog(5/6) − 0.4·flog(1/4) = 0.256 (1')
The root node is "outlook". (2')
Step 2: choose the second node.
For sunny (humidity or temperature):
entropy_sunny = flog(1/3) = 0.918 (1')
sunny_gain_wind = entropy_sunny − (2/3)·flog(1/2) − (1/3)·flog(1) = 0.252 (1')
sunny_gain_humidity = entropy_sunny − (2/3)·flog(1) − (1/3)·flog(1) = 0.918 (1')
sunny_gain_temperature = entropy_sunny − (2/3)·flog(1) − (1/3)·flog(1) = 0.918 (1')
Choose humidity or temperature. (1')
For rain (wind):
entropy_rain = flog(1/2) = 1 (1')
rain_gain_wind = entropy_rain − (1/2)·flog(1) − (1/2)·flog(1) = 1 (1')
rain_gain_humidity = entropy_rain − (1/2)·flog(1/2) − (1/2)·flog(1/2) = 0 (1')
rain_gain_temperature = entropy_rain − (1/4)·flog(1) − (3/4)·flog(1/3) = 0.311 (1')
Choose wind. (1')
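The flog computations in answer 3 are easy to verify numerically. The sketch below reproduces them; the split proportions are those stated in the answer (the underlying training table is not reproduced in this copy), and discrepancies in the last digit come from the exam's rounded intermediate values:

from math import log2

def flog(x):
    # Entropy of a Boolean variable with P(positive) = x; 0*log2(0) is taken as 0.
    if x in (0.0, 1.0):
        return 0.0
    return -x * log2(x) - (1 - x) * log2(1 - x)

entropy_all = flog(4 / 10)
print(round(entropy_all, 3))                                                 # 0.971
print(entropy_all - 0.3 * flog(1 / 3) - 0.3 * flog(1.0) - 0.4 * flog(0.5))   # ~0.295 (gain_outlook; exam's 0.296 uses rounded intermediates)
print(entropy_all - 0.6 * flog(5 / 6) - 0.4 * flog(1 / 4))                   # ~0.256 (gain_wind)
print(entropy_all - 0.5 * flog(2 / 5) - 0.5 * flog(1 / 5))                   # ~0.125 (gain_humidity)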
4. Answer:
A: The primitive neural units are: the perceptron, the linear unit and the sigmoid unit. (3')

Perceptron: (2')
A perceptron takes a vector of real-valued inputs, calculates a linear combination of these inputs, then outputs 1 if the result is greater than some threshold and −1 otherwise. More precisely, given inputs x1 through xn, the output o(x1, ..., xn) computed by the perceptron is
  o(x1, ..., xn) = 1 if w0 + w1 x1 + ... + wn xn > 0, and −1 otherwise.
Sometimes the perceptron function is written as o(x) = sgn(w · x).

Linear unit: (2')
A linear unit is a unit for which the output o is given by o(x) = w · x. Thus, a linear unit corresponds to the first stage of a perceptron, without the threshold.

Sigmoid unit: (2')
Like the perceptron, the sigmoid unit first computes a linear combination of its inputs, then applies a threshold to the result. In the case of the sigmoid unit, however, the threshold output is a continuous function of its input. More precisely, the sigmoid unit computes its output o as
  o = σ(w · x), where σ(y) = 1 / (1 + e^(−y)).

B: (Because of a printing error in the question, either the perceptron training rule or the delta rule is acceptable; the delta rule is given here.)
Derivation of the delta rule: (6')
Define the training error E(w) = (1/2) Σ_{d∈D} (t_d − o_d)², where o_d = w · x_d. Then
  ∂E/∂w_i = Σ_{d∈D} (t_d − o_d)(−x_{i,d}),
so updating each weight in proportion to the negative gradient gives the delta rule Δw_i = η Σ_{d∈D} (t_d − o_d) x_{i,d}.

5. Answer:
P(no) = 5/14, P(yes) = 9/14 (1')
P(sunny|no) = 3/5 (1')
P(cool|no) = 1/5 (1')
P(high|no) = 4/5 (1')
P(strong|no) = 3/5 (1')
P(no)·P(sunny|no)·P(cool|no)·P(high|no)·P(strong|no) = (5/14)·(3/5)·(1/5)·(4/5)·(3/5) = 0.02057 = 2.057×10^−2 (2')
P(sunny|yes) = 2/9 (1')
P(cool|yes) = 3/9 (1')
P(high|yes) = 3/9 (1')
P(strong|yes) = 3/9 (1')
P(yes)·P(sunny|yes)·P(cool|yes)·P(high|yes)·P(strong|yes) = (9/14)·(2/9)·(3/9)·(3/9)·(3/9) = 0.00529 = 5.291×10^−3 (2')
ANSWER: NO. (2')

6. Answer:
Inductive bias: (8')
Consider a concept learning algorithm L for the set of instances X. Let c be an arbitrary concept defined over X, and let D_c = {⟨x, c(x)⟩} be an arbitrary set of training examples of c. Let L(x_i, D_c) denote the classification assigned to the instance x_i by L after training on the data D_c. The inductive bias of L is any minimal set of assertions B such that for any target concept c and corresponding training examples D_c:
  (∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]

The futility of bias-free learning: (7')
A learner that makes no a priori assumptions regarding the identity of the target concept has no rational basis for classifying any unseen instances. In fact, the only reason the learner is able to generalize beyond the observed training examples is that it is biased by the inductive bias. Unfortunately, the only instances that will produce a unanimous vote of the version space are the previously observed training examples. For all other instances, taking a vote is futile: each unobserved instance will be classified positive by precisely half the hypotheses in the version space and negative by the other half.
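The arithmetic in answer 5 can be reproduced exactly with rational numbers. This is a minimal check; the conditional probabilities are the counts stated above, which come from Mitchell's PlayTennis table:

from fractions import Fraction as F

# Scores proportional to the posterior for the new instance
# <Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong>.
score_no = F(5, 14) * F(3, 5) * F(1, 5) * F(4, 5) * F(3, 5)
score_yes = F(9, 14) * F(2, 9) * F(3, 9) * F(3, 9) * F(3, 9)

print(float(score_no))    # 0.020571... = 2.057e-2
print(float(score_yes))   # 0.005291... = 5.291e-3
print("no" if score_no > score_yes else "yes")   # no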
Exam III

1 In the EnjoySport learning task, every example day is represented by 6 attributes. Given that the attribute Sky has three possible values, and that AirTemp, Humidity, Wind, Water and Forecast each have two possible values, explain why the size of the hypothesis space is 973. How would the number of possible instances and possible hypotheses increase with the addition of one attribute A that takes on K possible values?

2 Write the algorithm of CANDIDATE-ELIMINATION using the version space. Assume G is the set of maximally general hypotheses in hypothesis space H, and S is the set of maximally specific hypotheses.

3 (a) What is the entropy of the collection of training examples with respect to the target function classification? (b) According to the 5 training examples (the table is not reproduced in this copy), compute the decision tree that would be learned by ID3, and show the decision tree. (log2 3 = 1.585, log2 5 = 2.322)

4 Give several approaches to avoid overfitting in decision tree learning. How can the correct final tree size be determined?

5 Write the BACKPROPAGATION algorithm for a feedforward network containing two layers of sigmoid units.

6 Explain the maximum a posteriori (MAP) hypothesis.

7 Using a naive Bayes classifier, classify the new instance ⟨Outlook=sunny, Temperature=cool, Humidity=high, Wind=strong⟩. Our task is to predict the target value (yes or no) of the target concept PlayTennis for this new instance.

8 Give the definition of the three types of fitness functions in genetic algorithms.

Answers:

Question one: (give an example, e.g., a navigation device or checkers)

Question two:
Initialize: G0: {⟨?, ?, ?, ?, ?, ?⟩}    S0: {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
Step 1: G1: {⟨?, ?, ?, ?, ?, ?⟩}    S1: {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
Step 2: one positive instance 2 arrives:
G2: {⟨?, ?, ?, ?, ?, ?⟩}    S2: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
Step 3: one negative instance 3 arrives:
G3: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩, ⟨?, ?, ?, ?, ?, Same⟩}
S3: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
Step 4: one positive instance 4 arrives:
S4: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G4: {⟨Sunny, ?, ?, ?, ?, ?⟩, ⟨?, Warm, ?, ?, ?, ?⟩}

Question three:
(a) Entropy(S) = −(3/5) log2(3/5) − (2/5) log2(2/5) = 0.971
(b) Gain(S, sky) = Entropy(S) − [(4/5) Entropy(S_sunny) + (1/5) Entropy(S_rainy)] = 0.322
Gain(S, AirTemp) = Gain(S, wind) = Gain(S, sky) = 0.322
Gain(S, Humidity) = Gain(S, Forecast) = 0.02
Gain(S, water) = 0.171
Choose any one of AirTemp, wind and sky as the top node. (The resulting decision tree, drawn with sky as the top node, is not reproduced in this copy.)

Question Four:
Answer: Inductive bias: some prior assumptions made by the learner regarding the identity of the target concept, in order to have a basis for classifying unseen instances.
Suppose L is a machine learning algorithm and D_c is a set of training examples; L(x_i, D_c) denotes the classification assigned to x_i by L after training on D_c. Then the inductive bias is a minimal set of assertions B such that, for an arbitrary target concept c and set of training examples D_c:
  (∀x_i ∈ X)[(B ∧ D_c ∧ x_i) ⊢ L(x_i, D_c)]
CANDIDATE-ELIMINATION: the target concept is contained in the given hypothesis space H, and the training examples are all correct.
ID3: (a) smaller trees are preferred over larger trees; (b) trees that place high information gain attributes close to the root are preferred over those that do not.
BACKPROPAGATION: smooth interpolation between data points.

Question Five:
Answer: In naive Bayes classification, we assume that all attributes are independent given the target value, while a Bayesian belief network specifies a set of conditional independence assumptions together with a set of conditional probability distributions.

Question Six: (the stochastic gradient descent algorithm)

Question Seven: (a worked naive Bayes example; see answer 5 of Exam II above)

Question Eight:
Answer: In order to select hypotheses according to the fitness function, three methods are commonly used: roulette wheel selection, tournament selection and rank selection.

Question nine:
Single-point crossover, two-point crossover, uniform crossover, and point mutation (the example bit strings and offspring are not reproduced in this copy; any valid mutation is acceptable).
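The operators listed in Question nine can be illustrated on bit strings. This sketch is illustrative only: the parent strings (taken from Mitchell's genetic algorithm example) and the crossover points are assumptions, since the exam's own examples are not reproduced.

def single_point(p1, p2, i):
    # Swap the tails after position i.
    return p1[:i] + p2[i:], p2[:i] + p1[i:]

def two_point(p1, p2, i, j):
    # Swap the middle segment between positions i and j.
    return p1[:i] + p2[i:j] + p1[j:], p2[:i] + p1[i:j] + p2[j:]

def uniform(p1, p2, mask):
    # Take each bit from p1 or p2 according to a bit mask.
    c1 = "".join(a if m == "1" else b for a, b, m in zip(p1, p2, mask))
    c2 = "".join(b if m == "1" else a for a, b, m in zip(p1, p2, mask))
    return c1, c2

def point_mutation(p, i):
    # Flip the single bit at position i.
    return p[:i] + ("1" if p[i] == "0" else "0") + p[i + 1:]

p1, p2 = "11101001000", "00001010101"
print(single_point(p1, p2, 5))
print(two_point(p1, p2, 1, 8))
print(uniform(p1, p2, "10011010011"))
print(point_mutation(p1, 3))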
Exam IV

1 Solution:
A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P, if its performance at tasks in T, as measured by P, improves with experience E.
Example (point out the T, P and E of the example): a checkers learning problem, a handwriting recognition learning problem, or a robot driving learning problem.

2 Solution:
S0: {⟨∅, ∅, ∅, ∅, ∅, ∅⟩}
S1: {⟨Sunny, Warm, Normal, Strong, Warm, Same⟩}
S2: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G0, G1, G2: {⟨?, ?, ?, ?, ?, ?⟩}
S3: {⟨Sunny, Warm, ?, Strong, Warm, Same⟩}
G3: {⟨Sunny, ?, ?, ?, ?, ?⟩} ∪ {⟨?, Warm, ?, ?, ?, ?⟩} ∪ {⟨?, ?, ?, ?, ?, Same⟩}
S4: {⟨Sunny, Warm, ?, Strong, ?, ?⟩}
G4: {⟨Sunny, ?, ?, ?, ?, ?⟩} ∪ {⟨?, Warm, ?, ?, ?, ?⟩}

3 Solution: (the answer, a decision tree figure, is not reproduced in this copy)

4 Solution:
In general, inductive inference requires some form of prior assumptions regarding the identity of the target concept, made by a learner in order to have a rational basis for classifying unseen instances. Formally:
CANDIDATE-ELIMINATION: the target concept c is contained in the given hypothesis space H.
Decision tree learning (ID3): shorter trees are preferred over larger trees, and trees that place high information gain attributes close to the root are preferred over those that do not.
BACKPROPAGATION algorithm: smooth interpolation between data points.

5 Solution: (parts (1) and (2) are not reproduced in this copy)

6 Solution: (3) GRADIENT-DESCENT(training_examples, η) and the information-gain calculation for Humidity are identical to Exam I, questions 10 and 9 above: Gain(S, Humidity) = 0.886 − ((5/10)·0.972 + (5/10)·0.72) = 0.04.

7 Solution: (a) n + 1

8 Definition: Consider a concept class C defined over a set of instances X of length n and a learner L using hypothesis space H. C is PAC-learnable by L using H if for all c ∈ C, distributions D over X, ε such that 0 < ε < 1/2, and δ such that 0 < δ < 1/2, learner L will with probability at least (1 − δ) output a hypothesis h ∈ H such that error_D(h) ≤ ε, in time that is polynomial in 1/ε, 1/δ, n, and size(c).
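A small numeric companion to the PAC definition above. The bound m ≥ (1/ε)(ln|H| + ln(1/δ)) is Mitchell's sample-complexity bound for any consistent learner over a finite hypothesis space; the ε and δ values below are illustrative assumptions:

from math import ceil, log

def sample_complexity(h_size, epsilon, delta):
    # Number of training examples sufficient for any consistent learner over a
    # finite hypothesis space of size h_size to output, with probability at
    # least 1 - delta, a hypothesis with true error at most epsilon.
    return ceil((log(h_size) + log(1 / delta)) / epsilon)

# EnjoySport: |H| = 973 semantically distinct hypotheses (Exam III, question 1).
print(sample_complexity(973, epsilon=0.1, delta=0.05))  # 99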
