
Neural Network Control 01 (English)

Neural Network & Fuzzy Control Systems
Notes #1: Neural Networks -- the Back Propagation Learning Algorithm
(Notes in English, compiled by 陈恳)

BACK PROPAGATION LEARNING ALGORITHM

N(x) = S(y) = ( S_1(y_1), S_2(y_2), ..., S_p(y_p) );  S(.): non-linear function.
x and y are (1 x n) and (1 x p) vectors.
d is the desired output, a (1 x p) vector.
e is the error signal, a (1 x p) vector.

At iteration k,

    e_k = d_k - N(x_k) = d_k - S(y_k) = [ (d_1^k - S(y_1^k)), ..., (d_p^k - S(y_p^k)) ]

Instantaneous summed squared error:

    E_k = \frac{1}{2} \sum_{j=1}^{p} (d_j^k - S_j(y_j^k))^2 = \frac{1}{2} e_k e_k^T

The error is observed at iteration k. Total error:

    E = \sum_{k=1}^{n_T} E_k

where n_T is the total number of data pairs (x_1, d_1; ...; x_{n_T}, d_{n_T}).

The back propagation learning algorithm minimizes E_k at each iteration. Does this mean it also minimizes E? If each term of E is minimized, we expect that E is also minimized.

Example: n = 2, p = 2, i.e., two inputs and two outputs.

    y_1 = m_{11} S_1(x_1) + m_{21} S_2(x_2)
    y_2 = m_{12} S_1(x_1) + m_{22} S_2(x_2)

Under the assumption S_i(x) = S(x), i.e., the nonlinearities are the same,

    [y_1 \; y_2] = [S(x_1) \; S(x_2)] \begin{bmatrix} m_{11} & m_{12} \\ m_{21} & m_{22} \end{bmatrix} = [S(x_1) \; S(x_2)] \cdot \tilde{N}

Actual network output: [S(y_1) \; S(y_2)] = S(y) = N(x)
Error: e_1 = d_1 - S(y_1),  e_2 = d_2 - S(y_2)

Since the entries of \tilde{N} are the only free variables, we have to minimize E_k with respect to these variables. This minimization is also known as the training of the neural network.

GRADIENT DESCENT ALGORITHM

    \Delta m_{ij}(k) = -c \, \partial E_k / \partial m_{ij}    or    m_{ij}(k+1) - m_{ij}(k) = -c \, \partial E_k / \partial m_{ij}

We will consider two different networks: A) a network with one hidden layer, and B) a network with an additional layer G.

A) Let us look at the j-th neuron at the output layer. At the k-th iteration,

    \Delta m_{qj} = -c \, \partial E_k / \partial m_{qj}    (from the learning algorithm)
                  = -c \, (\partial E_k / \partial y_j^k)(\partial y_j^k / \partial m_{qj});
                    but y_j^k = \sum_{q} m_{qj} S_q(h_q^k), where n_q is the number of neurons
                    in the hidden layer (for convenience we take n_q = p)
                  = -c \, (\partial E_k / \partial y_j^k) S_q(h_q^k)
                  = -c \, (\partial E_k / \partial S_j(y_j^k)) (\partial S_j(y_j^k) / \partial y_j^k) S_q(h_q^k)
                  = -c \, (\partial E_k / \partial S_j(y_j^k)) S_j'(y_j^k) S_q(h_q^k);
                    but E_k = \frac{1}{2} \sum_{j=1}^{p} (d_j^k - S_j(y_j^k))^2
                  = c \, (d_j^k - S_j(y_j^k)) S_j'(y_j^k) S_q(h_q^k)

Now consider the q-th neuron at the hidden layer:

    \Delta m_{iq} = -c \, \partial E_k / \partial m_{iq}    (from the learning algorithm)
                  = -c \, (\partial E_k / \partial h_q^k)(\partial h_q^k / \partial m_{iq});
                    but h_q^k = \sum_{i=1}^{n_i} m_{iq} x_i^k, where n_i is the number of neurons in the input layer
                  = -c \, (\partial E_k / \partial h_q^k) x_i^k
                  = -c \, (\partial E_k / \partial S_q(h_q^k)) (\partial S_q(h_q^k) / \partial h_q^k) x_i^k
                  = -c \, (\partial E_k / \partial S_q(h_q^k)) S_q'(h_q^k) x_i^k
                  = -c \left[ \sum_{j=1}^{p} (\partial E_k / \partial y_j^k)(\partial y_j^k / \partial S_q(h_q^k)) \right] S_q'(h_q^k) x_i^k
                    (note: the sum runs over all p outputs)
                  = -c \left[ \sum_{j=1}^{p} (\partial E_k / \partial y_j^k) m_{qj} \right] S_q'(h_q^k) x_i^k
                  = -c \left[ \sum_{j=1}^{p} (\partial E_k / \partial S_j(y_j^k)) (\partial S_j(y_j^k) / \partial y_j^k) m_{qj} \right] S_q'(h_q^k) x_i^k;
                    again E_k = \frac{1}{2} \sum_{j=1}^{p} (d_j^k - S_j(y_j^k))^2
                  = c \left[ \sum_{j=1}^{p} (d_j^k - S_j(y_j^k)) S_j'(y_j^k) m_{qj} \right] S_q'(h_q^k) x_i^k

Here we used the chain rule of differentiation: if f = f(x_1, x_2, ..., x_n), then

    df = \sum_{i=1}^{n} (\partial f / \partial x_i) \, dx_i

The way we used this relation in the derivation is by regarding E_k as a function of S_q(h_q^k) through f(y_1^k, y_2^k, ..., y_p^k), so that

    \partial E_k / \partial S_q(h_q^k) = \sum_{j=1}^{p} (\partial E_k / \partial y_j^k)(\partial y_j^k / \partial S_q(h_q^k))

The reason we do this is that it is difficult to evaluate \partial E_k / \partial S_q(h_q^k) directly, because it is not easy to see how much E_k will change if we change S_q(h_q^k).
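The two update rules just derived map almost line-for-line into code. The following NumPy sketch is not part of the original notes: the function name `train_step`, the learning-rate value, and a unit sigmoid gain c = 1 are illustrative assumptions, and the raw inputs x_i are fed straight into the hidden layer, as in the derivation above.

```python
import numpy as np

# Sigmoid nonlinearity S(x) = 1 / (1 + exp(-c*x)) and its derivative
# S'(x) = c * S(x) * (1 - S(x)), as discussed later in these notes.
def S(x, c=1.0):
    return 1.0 / (1.0 + np.exp(-c * x))

def S_prime(x, c=1.0):
    s = S(x, c)
    return c * s * (1.0 - s)

def train_step(M_iq, M_qj, x, d, lr=0.1):
    """One gradient-descent update for a single pattern (x, d).

    M_iq: (n_i, n_q) input-to-hidden weights m_iq
    M_qj: (n_q, p)   hidden-to-output weights m_qj
    Returns the updated weights and the instantaneous error E_k.
    """
    # Forward pass
    h = x @ M_iq              # h_q = sum_i m_iq * x_i
    y = S(h) @ M_qj           # y_j = sum_q m_qj * S(h_q)
    out = S(y)                # network output S_j(y_j)

    # Output-layer update: dm_qj = c (d_j - S(y_j)) S'(y_j) S(h_q)
    delta_out = (d - out) * S_prime(y)             # shape (p,)
    dM_qj = lr * np.outer(S(h), delta_out)

    # Hidden-layer update:
    # dm_iq = c [ sum_j (d_j - S(y_j)) S'(y_j) m_qj ] S'(h_q) x_i
    delta_hid = (M_qj @ delta_out) * S_prime(h)    # shape (n_q,)
    dM_iq = lr * np.outer(x, delta_hid)

    E_k = 0.5 * np.sum((d - out) ** 2)
    return M_iq + dM_iq, M_qj + dM_qj, E_k
```

Calling `train_step` repeatedly over the training pairs (x_1, d_1), ..., (x_{n_T}, d_{n_T}) minimizes each E_k in turn, which is exactly the pattern-by-pattern minimization discussed at the beginning of this section.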
B) Let us now consider a neuron that belongs to layer G. (The corresponding figure also shows the neurons considered before, namely neurons j and q.) How do we train the weight m_{sr}?

    \Delta m_{sr} = -c \, \partial E_k / \partial m_{sr}    (from the learning algorithm)
                  = -c \, (\partial E_k / \partial g_r^k)(\partial g_r^k / \partial m_{sr});
                    but g_r^k = \sum_{s=1}^{n_s} m_{sr} S_s(f_s^k), where n_s is the number of neurons in layer F
                  = -c \, (\partial E_k / \partial g_r^k) S_s(f_s^k)
                  = -c \, (\partial E_k / \partial S_r(g_r^k)) (\partial S_r(g_r^k) / \partial g_r^k) S_s(f_s^k)
                  = -c \, (\partial E_k / \partial S_r(g_r^k)) S_r'(g_r^k) S_s(f_s^k)
                  = -c \left[ \sum_{q=1}^{n_q} (\partial E_k / \partial h_q^k)(\partial h_q^k / \partial S_r(g_r^k)) \right] S_r'(g_r^k) S_s(f_s^k)
                  = -c \left[ \sum_{q=1}^{n_q} (\partial E_k / \partial h_q^k) m_{rq} \right] S_r'(g_r^k) S_s(f_s^k)
                  = -c \left[ \sum_{q=1}^{n_q} (\partial E_k / \partial S_q(h_q^k)) (\partial S_q(h_q^k) / \partial h_q^k) m_{rq} \right] S_r'(g_r^k) S_s(f_s^k)
                  = -c \left[ \sum_{q=1}^{n_q} (\partial E_k / \partial S_q(h_q^k)) S_q'(h_q^k) m_{rq} \right] S_r'(g_r^k) S_s(f_s^k)
                  = -c \left\{ \sum_{q=1}^{n_q} \left[ \sum_{j=1}^{p} (\partial E_k / \partial y_j^k)(\partial y_j^k / \partial S_q(h_q^k)) \right] S_q'(h_q^k) m_{rq} \right\} S_r'(g_r^k) S_s(f_s^k)
                  = -c \left\{ \sum_{q=1}^{n_q} \left[ \sum_{j=1}^{p} (\partial E_k / \partial y_j^k) m_{qj} \right] S_q'(h_q^k) m_{rq} \right\} S_r'(g_r^k) S_s(f_s^k)
                  = c \left\{ \sum_{q=1}^{n_q} \left[ \sum_{j=1}^{p} (d_j^k - S_j(y_j^k)) S_j'(y_j^k) m_{qj} \right] S_q'(h_q^k) m_{rq} \right\} S_r'(g_r^k) S_s(f_s^k)

The weight m_{sr} can now be updated as

    m_{sr}(k+1) = m_{sr}(k) - c \, \partial E_k / \partial m_{sr}

The detailed weight connections can be summarized as

    \Delta m_{iq} = c \left[ \sum_{j=1}^{p} (d_j^k - S_j(y_j^k)) S_j'(y_j^k) m_{qj} \right] S_q'(h_q^k) x_i^k
    \Delta m_{qj} = c \, (d_j^k - S_j(y_j^k)) S_j'(y_j^k) S_q(h_q^k)
    h_q^k = \sum_{i=1}^{n_i} m_{iq} S(x_i^k)
    y_j^k = \sum_{q=1}^{p} m_{qj} S(h_q^k)

THE NON-LINEAR FUNCTION S

    S(x) = \frac{1}{1 + e^{-cx}}

    S'(x) = \frac{c e^{-cx}}{(1 + e^{-cx})^2}
          = c \cdot \frac{1}{1 + e^{-cx}} \cdot \frac{e^{-cx}}{1 + e^{-cx}}
          = c \cdot \frac{1}{1 + e^{-cx}} \left( 1 - \frac{1}{1 + e^{-cx}} \right)
          = c \, S(x)(1 - S(x))

Note: the derivative is always positive. The advantage of the Sigmoid function lies in the easy evaluation of its derivative.

If the network has to handle negative as well as positive numbers, the Sigmoid function can be shifted as

    S(x) = K \left( \frac{2}{1 + e^{-cx}} - 1 \right) = K \, \frac{1 - e^{-cx}}{1 + e^{-cx}}

Note that we do not have to use the Sigmoid function; what we need is a monotone non-decreasing differentiable function to represent the nonlinearity of the neuron.

Shifted Sigmoid function:

    S(x) = e + \frac{h - e}{1 + e^{-g(x - c)}},    g: gain
    S(c) = \frac{h + e}{2}
    S'(c) = \frac{g(h - e)}{4}

Recommended for most networks.

Step function:

    S(x) = e  if x < c
    S(x) = h  if x > c

Used in earlier networks such as the perceptron, Hopfield networks, etc.; it is not differentiable.
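The derivative identity S'(x) = c S(x)(1 - S(x)) and the centre values of the shifted Sigmoid are easy to verify numerically. The short sketch below is illustrative only and not from the original notes; the names `sigmoid` and `shifted_sigmoid`, the chosen parameter values, and the bounds `lo`/`hi` (standing for the e and h above, renamed to avoid clashing with Euler's e) are assumptions of this example.

```python
import numpy as np

# Standard sigmoid S(x) = 1/(1 + exp(-c x)) and the identity S'(x) = c S(x)(1 - S(x)).
def sigmoid(x, c=2.0):
    return 1.0 / (1.0 + np.exp(-c * x))

def sigmoid_prime(x, c=2.0):
    s = sigmoid(x, c)
    return c * s * (1.0 - s)

# Shifted sigmoid S(x) = lo + (hi - lo)/(1 + exp(-g (x - center))).
def shifted_sigmoid(x, lo=-1.0, hi=1.0, g=1.5, center=0.0):
    return lo + (hi - lo) / (1.0 + np.exp(-g * (x - center)))

x, eps = 0.3, 1e-6

# Finite-difference check of S'(x) = c S(x)(1 - S(x)).
numeric = (sigmoid(x + eps) - sigmoid(x - eps)) / (2 * eps)
print(numeric, sigmoid_prime(x))            # both ~ 0.4576

# Centre values of the shifted sigmoid: S(center) = (hi + lo)/2, S'(center) = g (hi - lo)/4.
print(shifted_sigmoid(0.0))                 # -> 0.0 = (1 + (-1))/2
g, lo, hi = 1.5, -1.0, 1.0
numeric_slope = (shifted_sigmoid(eps) - shifted_sigmoid(-eps)) / (2 * eps)
print(numeric_slope, g * (hi - lo) / 4)     # both ~ 0.75
```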
DISCUSSION ON GRADIENT DESCENT ALGORITHM

    f(x) = f(x_0) + \frac{f'(x_0)}{1!}(x - x_0) + \frac{f''(x_0)}{2!}(x - x_0)^2 + ...

Given the Taylor expansion above, suppose we want to find the minimum or maximum of f(x); then we set f'(x) = 0:

    0 = f'(x_0) + \frac{2 f''(x_0)}{2!}(x - x_0) + ...

This gives

    x - x_0 = -[f''(x_0)]^{-1} f'(x_0)

Replacing x_0 = x(k), x = x(k+1), and c = [f''(x)|_{x = x(k)}]^{-1}, we obtain

    x(k+1) - x(k) = -c \, \frac{\partial f}{\partial x}\Big|_{x = x(k)}

which corresponds to the training algorithm

    m_{ij}(k+1) - m_{ij}(k) = -c \, \frac{\partial f}{\partial m_{ij}}\Big|_{m_{ij} = m_{ij}(k)}

SIMPLE EXAMPLE ON BACK PROPAGATION LEARNING

Input: x = [-3.0  2.0]
Desired output: d = [0.4  0.8]

Let us see this on the diagram. Now we will create a neural network that will learn this input/output behavior; the network is shown in the figure. Neuron 2 receives a constant input, the so-called bias. The input/output relation may require a non-zero output for a zero input, which is why we use a bias.

We want to minimize

    E = \sum_k E_k = \frac{1}{2} \sum_{j=1}^{2} (d_j - S(y_j))^2 = f(m_1, m_2, x, d),    d_1 = 0.4,  d_2 = 0.8

Since we have only two parameters (m_1, m_2), it is easy to construct the error surface graph. Starting from an initial condition (marked by +), the graph demonstrates how the minimum is reached by the gradient descent technique. Think of a marble with no inertia sliding down to one of the lowest points of the error surface.

MOMENTUM ALGORITHM FOR BACK PROPAGATION

We demonstrated that the weight adjustments in the back propagation algorithm are

    m_{ij}(k+1) = m_{ij}(k) + c_k \Delta m_{ij}(k),    where    \Delta m_{ij}(k) = -\frac{\partial E_k}{\partial m_{ij}(k)}

so m_{ij}(.) follows a first-order difference equation. A more general update can be accomplished by

    m_{ij}(k+1) = m_{ij}(k) + c_k \Delta m_{ij}(k) + b_k \Delta m_{ij}(k-1)

The third term is called the momentum term. The idea here is not to "forget" the previous gradient term, i.e.,

    \Delta m_{ij}(k-1) = -\frac{\partial E_{k-1}}{\partial m_{ij}(k-1)}

so if there are sudden random changes in \Delta m_{ij}(k), we will not be immediately affected by them.

To see how this works, consider the momentum update equation given above, i.e.,

    m_{ij}(k+1) - m_{ij}(k) = c_k \Delta m_{ij}(k) + b_k [ m_{ij}(k) - m_{ij}(k-1) ]

or

    \delta m_{ij}(k+1) - b_k \, \delta m_{ij}(k) = c_k \Delta m_{ij}(k),    where    \delta m_{ij}(k+1) = m_{ij}(k+1) - m_{ij}(k)

(a slight change of notation here). Applying the z-transform,

    [z - b_k] \, \delta m_{ij}(k) = c_k \Delta m_{ij}(k)

    \delta m_{ij}(k) = \frac{c_k z^{-1}}{1 - b_k z^{-1}} \left[ -\frac{\partial E_k}{\partial m_{ij}} \right]

We see that the weight change is not driven directly by the current gradient; it is driven by a low-pass-filtered version of the current gradient.

EXAMPLE: IDENTIFICATION BY NEURAL NETWORK

x_i: i-th neuron of the input layer X
h_i: i-th neuron of the hidden layer H
y_i: i-th neuron of the output layer Y

Error at iteration k:

    e(k) = y(k) - \hat{y}(k),    \hat{y}(k) = S(y_1)

    E_k = \frac{1}{2} e^2(k),    E = \sum_{k=1}^{n_T} E_k

A) Neuron at the output layer Y:

    \Delta m_1 = -c \, \partial E_k / \partial m_1 = -c \, (\partial E_k / \partial y_1)(\partial y_1 / \partial m_1) = -c \, (\partial E_k / \partial y_1) S(h_1)
               = -c \, (\partial E_k / \partial S(y_1)) (\partial S(y_1) / \partial y_1) S(h_1)
               = -c \, (\partial E_k / \partial S(y_1)) S'(y_1) S(h_1);    E_k = \frac{1}{2}(y(k) - S(y_1))^2
               = c \, (y(k) - S(y_1)) S'(y_1) S(h_1)

    m_1(k+1) = m_1(k) - c \, \partial E_k / \partial m_1(k)

with

    S(y_1) = \frac{2}{1 + e^{-c_1 y_1}} - 1,    S'(y_1) = \frac{2 c_1 e^{-c_1 y_1}}{(1 + e^{-c_1 y_1})^2}

For the second weight, the learning equation becomes

    \Delta m_2 = -c \, \partial E_k / \partial m_2 = -c \, (\partial E_k / \partial y_1)(\partial y_1 / \partial m_2) = -c \, (\partial E_k / \partial y_1) S(h_2)
               = -c \, (\partial E_k / \partial S(y_1)) S'(y_1) S(h_2);    E_k = \frac{1}{2}(y(k) - S(y_1))^2
               = c \, (y(k) - S(y_1)) S'(y_1) S(h_2)

and then

    m_2(k+1) = m_2(k) - c \, \partial E_k / \partial m_2(k)

B) Neuron in the hidden layer:

    \Delta m_3 = -c \, \partial E_k / \partial m_3 = -c \, (\partial E_k / \partial h_1)(\partial h_1 / \partial m_3) = -c \, (\partial E_k / \partial h_1) S(x_1)
               = -c \, (\partial E_k / \partial S(h_1)) (\partial S(h_1) / \partial h_1) S(x_1)
               = -c \, (\partial E_k / \partial S(h_1)) S'(h_1) S(x_1)
               = -c \, (\partial E_k / \partial y_1)(\partial y_1 / \partial S(h_1)) S'(h_1) S(x_1)
               = -c \, (\partial E_k / \partial y_1) m_1 S'(h_1) S(x_1)
               = -c \, (\partial E_k / \partial S(y_1)) S'(y_1) m_1 S'(h_1) S(x_1)
               = c \, (y(k) - S(y_1)) S'(y_1) m_1 S'(h_1) S(x_1)

Similarly,

    \Delta m_4 = -c \, \partial E_k / \partial m_4 = -c \, (\partial E_k / \partial h_2)(\partial h_2 / \partial m_4) = -c \, (\partial E_k / \partial h_2) S(x_2)
               = -c \, (\partial E_k / \partial S(h_2)) S'(h_2) S(x_2)
               = -c \, (\partial E_k / \partial y_1)(\partial y_1 / \partial S(h_2)) S'(h_2) S(x_2)
               = -c \, (\partial E_k / \partial y_1) m_2 S'(h_2) S(x_2)
               = -c \, (\partial E_k / \partial S(y_1)) S'(y_1) m_2 S'(h_2) S(x_2)
               = c \, (y(k) - S(y_1)) S'(y_1) m_2 S'(h_2) S(x_2)

\Delta m_5 and \Delta m_6 are left to you as an exercise.

Note also that S(x_1) = x_1 = y(k-1) and S(x_2) = x_2 = u(k-1) if the input-layer nonlinearity is the identity, S(.) = 1.

EXAMPLE

What does the neural network learn?
Hint: e -> 0, so \hat{y} converges to the plant output.

NEURAL NETWORKS AS PREDICTORS OR SIMULATORS

Training: after training, \hat{y}(t) = y(t), and the neural network can then be used as a predictor.
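To make the identification example concrete, here is a hypothetical end-to-end sketch. It is not from the original notes: the plant model, the random excitation, the learning rate, and the wiring assumed for the exercise weights m_5 and m_6 are all illustrative assumptions; only the weight updates themselves follow the Δm_1 ... Δm_4 expressions derived above.

```python
import numpy as np

# Shifted sigmoid S(v) = 2/(1 + exp(-c v)) - 1 and its derivative, as used above.
def S(v, c=1.0):
    return 2.0 / (1.0 + np.exp(-c * v)) - 1.0

def S_prime(v, c=1.0):
    return 2.0 * c * np.exp(-c * v) / (1.0 + np.exp(-c * v)) ** 2

def plant(y_prev, u_prev):
    # Assumed toy "unknown" system to identify (not from the notes).
    return 0.6 * np.tanh(y_prev) + 0.3 * u_prev

rng = np.random.default_rng(0)
m = rng.normal(scale=0.5, size=6)    # m1, m2 (hidden->output), m3..m6 (input->hidden)
c_lr, y_prev, u_prev = 0.2, 0.0, 0.0

for k in range(2000):
    u = rng.uniform(-1.0, 1.0)                   # current excitation input
    x1, x2 = y_prev, u_prev                      # x1 = y(k-1), x2 = u(k-1), identity input layer
    h1 = m[2] * x1 + m[4] * x2                   # hidden neuron 1 (m3: x1->h1, m5 assumed: x2->h1)
    h2 = m[5] * x1 + m[3] * x2                   # hidden neuron 2 (m6 assumed: x1->h2, m4: x2->h2)
    y1 = m[0] * S(h1) + m[1] * S(h2)             # output neuron
    y_hat = S(y1)
    y = plant(y_prev, u_prev)                    # plant output y(k)
    e = y - y_hat

    # Gradient-descent updates derived in the notes (m5, m6 by analogy).
    delta = e * S_prime(y1)                      # common factor (y(k) - S(y1)) S'(y1)
    dm = np.array([
        delta * S(h1),                           # delta m1
        delta * S(h2),                           # delta m2
        delta * m[0] * S_prime(h1) * x1,         # delta m3
        delta * m[1] * S_prime(h2) * x2,         # delta m4
        delta * m[0] * S_prime(h1) * x2,         # delta m5 (exercise; assumed wiring)
        delta * m[1] * S_prime(h2) * x1,         # delta m6 (exercise; assumed wiring)
    ])
    m += c_lr * dm
    y_prev, u_prev = y, u

print("final |e| =", abs(e))                     # e -> 0 as the network learns the plant
```

After training, feeding the network its own past output instead of the measured y(k-1) turns the identified model into the simulator/predictor described in the last section.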
