人脸识别论文中英文附录(原文及译文)翻译原文来自Thomas David Heselt ine BSc. Hons. The Un iversity of YorkDepartme nt of Computer Scie neeFor the Qualification of PhD. -- September 2005 -《Face Recog niti on: Two-Dime nsio nal and Three-Dime nsional Tech nique》4 Two-dimensional Face Recognition4.1 Feature LocalizationBefore discuss ing the methods of compari ng two facial images we now take a brief look at some at the prelimi nary processes of facial feature alig nment. This process typically con sists of two stages: face detect ion and eye localisati on. Depe nding on the applicati on, if the positi on of the face with in the image is known beforeha nd (for a cooperative subject in a door access system for example) the n the face detect ion stage can ofte n be skipped, as the regi on of in terest is already known. Therefore, we discuss eye localisati on here, with a brief discussi on of face detect ion in the literature review(sect ion 3.1.1).The eye localisati on method is used to alig n the 2D face images of the various test sets used throughout this section. However, to ensure that all results presented are represe ntative of the face recog niti on accuracy and not a product of the performa nee of the eye localisati on rout ine, all image alig nments are manu ally checked and any errors corrected, prior to testi ng and evaluati on.We detect the position of the eyes within an image using a simple template based method. A training set of manually pre-aligned images of faces is taken, and each image cropped to an area around both eyes. The average image is calculated and used as a template.Figure 4-1 - The average eyes. Used as a template for eye detection.Both eyes are in cluded in a sin gle template, rather tha n in dividually search ing for each eye in turn, as the characteristic symmetry of the eyes either side of the no se, provides a useful feature that helps disti nguish betwee n the eyes and other false positives that may be picked up in the background. Although this method is highly susceptible to scale(i.e. subject distance from the camera) and also in troduces the assumpti on that eyes in the image appear n ear horiz on tai. Some preliminary experimentation also reveals that it is advantageous to include the area of skin just ben eath the eyes. The reas on being that in some cases the eyebrows can closely match the template, particularly if thereare shadows in the eye-sockets, but the area of skin below the eyes helps to disti nguish the eyes from eyebrows (the area just below the eyebrows con tai n eyes, whereas the area below the eyes contains only plain skin).A window is passed over the test images and the absolute difference taken to that of the average eye image shown above. The area of the image with the lowest difference is taken as the region of interest containing the eyes. Applying the same procedure using a smaller template of the in dividual left and right eyes the n refi nes each eye positi on.This basic template-based method of eye localisati on, although provid ing fairly preciselocalisati ons, ofte n fails to locate the eyes completely. However, we are able to improve performa nce by in cludi ng a weighti ng scheme.Eye localisati on is performed on the set of training images, which is the n separated in to two sets: those in which eye detect ion was successful; and those in which eye detect ion failed. Taking the set of successful localisatio ns we compute the average dista nce from the eye template (Figure 4-2 top). Note that the image is quite dark, indicating that the detected eyes correlate closely to the eye template, as we would expect. However, bright points do occur near the whites of the eye, suggesting that this area is often inconsistent, varying greatly from the average eye template.Figure 4-2 -Distance to the eye template for successful detections (top) indicating variance due to noise and failed detections (bottom) showing credible variance due to miss-detected features.In the lower image (Figure 4-2 bottom), we have take n the set of failed localisati on s(images of the forehead, no se, cheeks, backgro und etc. falsely detected by the localisati on routi ne) and once aga in computed the average dista nce from the eye template. The bright pupils surr oun ded by darker areas in dicate that a failed match is ofte n due to the high correlati on of the nose and cheekb one regi ons overwhel ming the poorly correlated pupils. Wanting to emphasise the differenee of the pupil regions for these failed matches and minimise the varianee of the whites of the eyes for successful matches, we divide the lower image values by the upper image to produce a weights vector as show n in Figure 4-3. When applied to the differe nee image before summi ng a total error, this weight ing scheme provides a much improved detect ion rate.Figure 4-3 - Eye template weights used to give higher priority to those pixels that best represent the eyes.4.2 The Direct Correlation ApproachWe begi n our inv estigatio n into face recog niti on with perhaps the simplest approach,k nown as the direct correlation method (also referred to as template matching by Brunelli and Poggio [29 ]) inv olvi ng the direct comparis on of pixel inten sity values take n from facial images. We use the term ‘ Direct Correlation ' to encompass all techniques in which face images are compareddirectly, without any form of image space an alysis, weight ing schemes or feature extracti on, regardless of the dsta nee metric used. Therefore, we do not infer that Pears on ' s correlat applied as the similarity fun cti on (although such an approach would obviously come un der our definition of direct correlation). We typically use the Euclidean distance as our metric in these inv estigati ons (in versely related to Pears on ' s correlati on and can be con sidered as a scale tran slati on sen sitive form of image correlati on), as this persists with the con trast made betwee n image space and subspace approaches in later sect ions.Firstly, all facial images must be alig ned such that the eye cen tres are located at two specified pixel coord in ates and the image cropped to remove any backgro und in formati on. These images are stored as greyscale bitmaps of 65 by 82 pixels and prior to recog niti on con verted into a vector of 5330 eleme nts (each eleme nt containing the corresp onding pixel inten sity value). Each corresp onding vector can be thought of as describ ing a point with in a 5330 dime nsional image space. This simple prin ciple can easily be exte nded to much larger images: a 256 by 256 pixel image occupies a si ngle point in 65,536-dime nsional image space and again, similar images occupy close points within that space. Likewise, similar faces are located close together within the image space, while dissimilar faces are spaced far apart. Calculati ng the Euclidea n dista need, betwee n two facial image vectors (ofte n referred to as the query image q, and gallery imageg), we get an indication of similarity. A threshold is then applied to make the final verification decision.d q g (d threshold ? accept) d threshold ? reject ) . Equ. 4-14.2.1 Verification TestsThe primary concern in any face recognition system is its ability to correctly verify aclaimed identity or determine a person's most likely identity from a set of potential matches in a database. In order to assess a given system ' s ability to perform these tasks, a variety of evaluati on methodologies have arise n. Some of these an alysis methods simulate a specific mode of operatio n (i.e. secure site access or surveilla nee), while others provide a more mathematical description of data distribution in some classificatio n space. In additi on, the results gen erated from each an alysis method may be prese nted in a variety of formats. Throughout the experime ntatio ns in this thesis, weprimarily use the verification test as our method of analysis and comparison, although we also use Fisher Lin ear Discrim inant to an alyse in dividual subspace comp onents in secti on 7 and the iden tificati on test for the final evaluatio ns described in sect ion 8. The verificati on test measures a system ' s ability to correctly accept or reject the proposed ide ntity of an in dividual. At a fun cti on al level, this reduces to two images being prese nted for comparis on, for which the system must return either an accepta nee (the two images are of the same pers on) or rejectio n (the two images are of differe nt people). The test is desig ned to simulate the applicati on area of secure site access. In this scenario, a subject will present some form of identification at a point of en try, perhaps as a swipe card, proximity chip or PIN nu mber. This nu mber is the n used to retrieve a stored image from a database of known subjects (ofte n referred to as the target or gallery image) and compared with a live image captured at the point of entry (the query image). Access is the n gran ted depe nding on the accepta nce/rejecti on decisi on.The results of the test are calculated accord ing to how many times the accept/reject decisi on is made correctly. In order to execute this test we must first define our test set of face images. Although the nu mber of images in the test set does not affect the results produced (as the error rates are specified as percentages of image comparisons), it is important to ensure that the test set is sufficie ntly large such that statistical ano malies become in sig ni fica nt (for example, a couple of badly aligned images matching well). Also, the type of images (high variation in lighting, partial occlusions etc.) will significantly alter the results of the test. Therefore, in order to compare multiple face recog niti on systems, they must be applied to the same test set.However, it should also be no ted that if the results are to be represe ntative of system performance in a real world situation, then the test data should be captured under precisely the same circumsta nces as in the applicati on en vir onmen t. On the other han d, if the purpose of the experime ntati on is to evaluate and improve a method of face recog niti on, which may be applied to a range of applicati on en vir onmen ts, the n the test data should prese nt the range of difficulties that are to be overcome. This may mea n in cludi ng a greater perce ntage of ‘ difficult would be expected in the perceived operati ng con diti ons and hence higher error rates in the results produced. Below we provide the algorithm for execut ing the verificati on test. The algorithm is applied to a sin gle test set of face images, using a sin gle fun cti on call to the face recog niti on algorithm: CompareFaces(FaceA, FaceB). This call is used to compare two facial images, returni ng a dista nce score in dicat ing how dissimilar the two face images are: the lower the score the more similar the two face images. Ideally, images of the same face should produce low scores, while images of differe nt faces should produce high scores.Every image is compared with every other image, no image is compared with itself and no pair is compared more tha n once (we assume that the relati on ship is symmetrical). Once two images have been compared, producing a similarity score, the ground-truth is used to determine if the images are ofthe same person or different people. In practical tests this information is ofte n en capsulated as part of the image file name (by means of a unique pers on ide ntifier). Scores are the n stored in one of two lists: a list containing scores produced by compari ng images of differe nt people and a list containing scores produced by compari ng images of the same pers on. The final accepta nce/reject ion decisi on is made by applicati on of a threshold. Any in correct decision is recorded as either a false acceptance or false rejection. The false rejection rate (FRR) is calculated as the perce ntage of scores from the same people that were classified as rejectio ns. The false accepta nce rate (FAR) is calculated as the perce ntage of scores from differe nt people that were classified as accepta nces.For IndexA = 0 to length (TestSet)For IndexB = lndexA+1 to length (T estSet)Score = CompareFaces (T estSet[IndexA], TestSet[IndexB]) If IndexA and IndexB are the same person Append Score to AcceptScoresListElseAppend Score to RejectScoresListFor Threshold = Minimum Score to Maximum Score:FalseAcceptCount, FalseRejectCount = 0For each Score in RejectScoresListIf Score <= ThresholdIncrease FalseAcceptCountFor each Score in AcceptScoresListIf Score > ThresholdIncrease FalseRejectCountFalseAcceptRate = FalseAcceptCount / Length(AcceptScoresList) FalseRejectRate = FalseRejectCount / length(RejectScoresList) Add plot to error curve at (FalseRejectRate, FalseAcceptRate)These two error rates express the in adequacies of the system whe n operat ing at aspecific threshold value. Ideally, both these figures should be zero, but in reality reducing either the FAR or FRR (by alteri ng the threshold value) will in evitably resultin increasing the other. Therefore, in order to describe the full operating range of a particular system, we vary the threshold value through the en tire range of scores produced. The applicati on of each threshold value produces an additi onal FAR, FRR pair, which when plotted on a graph produces the error rate curve shown below.Figure 4-5 - Example Error Rate Curve produced by the verification test.The equal error rate (EER) can be see n as the point at which FAR is equal to FRR. This EER value is often used as a single figure representing the general recognition performa nee of a biometric system and allows for easy visual comparis on of multiple methods. However, it is important to note that the EER does not indicate the level of error that would be expected in a real world applicati on .It is un likely that any real system would use a threshold value such that the perce ntage of false accepta nces were equal to the perce ntage of false rejecti ons. Secure site access systems would typically set the threshold such that false accepta nces were sig nifica ntly lower tha n false rejecti ons: unwilling to tolerate intruders at the cost of inconvenient access denials.Surveilla nee systems on the other hand would require low false rejectio n rates to successfully ide ntify people in a less con trolled en vir onment. Therefore we should bear in mind that a system with a lower EER might not n ecessarily be the better performer towards the extremes of its operating capability.There is a strong conn ecti on betwee n the above graph and the receiver operat ing characteristic (ROC) curves, also used in such experime nts. Both graphs are simply two visualisati ons of the same results, in that the ROC format uses the True Accepta nee Rate(TAR), where TAR = 1.0 -FRR in place of the FRR, effectively flipping the graph vertically. Another visualisation of the verification test results is to display both the FRR and FAR as functions of the threshold value. This prese ntati on format provides a refere nee to determ ine the threshold value necessary to achieve a specific FRR and FAR. The EER can be seen as the point where the two curves in tersect.ThrasholdFigure 4-6 - Example error rate curve as a function of the score thresholdThe fluctuati on of these error curves due to no ise and other errors is depe ndant on the nu mber of face image comparis ons made to gen erate the data. A small dataset that on ly allows for a small nu mber of comparis ons will results in a jagged curve, in which large steps corresp ond to the in flue nce of a si ngle image on a high proporti on of thecomparis ons made. A typical dataset of 720 images (as used in sect ion 422) provides 258,840 verificatio n operati ons, hence a drop of 1% EER represe nts an additi onal 2588 correct decisions, whereas the quality of a single image could cause the EER tofluctuate by up to 0.28.4.2.2 ResultsAs a simple experiment to test the direct correlation method, we apply the technique described above to a test set of 720 images of 60 different people, taken from the AR Face Database [ 39 ]. Every image is compared with every other image in the test set to produce a like ness score, provid ing 258,840 verificati on operati ons from which to calculate false accepta nce rates and false rejecti on rates. The error curve produced is show n in Figure 4-7.Figure 4-7 - Error rate curve produced by the direct correlation method using no image preprocessing.We see that an EER of 25.1% is produced, meaning that at the EER thresholdapproximately one quarter of all verification operations carried out resulted in anin correct classificati on. There are a nu mber of well-k nown reas ons for this poor levelof accuracy. Tiny changes in lighting, expression or head orientation cause the location in image space to cha nge dramatically. Images in face space are moved far apart due to these image capture conditions, despite being of the same person ' s face. The distanee between images differe nt people becomes smaller tha n the area of face space covered by images of the same pers on and hence false accepta nces and false rejecti ons occur freque ntly. Other disadva ntages in clude the large amount of storage n ecessary for holdi ng many face images and the inten sive process ing required for each comparis on, making this method un suitable for applicati ons applied to a large database. In secti on 4.3 we explore the eige nface method, which attempts to address some of these issues.4二维人脸识别4.1功能定位在讨论比较两个人脸图像,我们现在就简要介绍的方法一些在人脸特征的初步调整过程。