当前位置:文档之家› Greedy Algorithm贪心算法

Greedy Algorithm贪心算法

Greedy Algorithms Greedy AlgorithmsDr. Zhenyu HeAutumn 2008116Greedy Algorithms z16 Greedy AlgorithmsSimilar to dynamic programming. Used for optimization problems.2zzOptimization problems typically go through a sequence of steps, with a set of choices at each step.For many optimization problems, using dynamic programming to determine z y p p ,g y p g g the best choices is overkill (过度的杀伤威力).Greedy Algorithm: Simpler, more efficient16Greedy Algorithms z 16 Greedy AlgorithmsGreedy algorithms (GA) do not always yield 3y g ()y yoptimal solutions, but for many problems they do.161the activity selection problem 16.1, the activity-selection problem (活动安排)16.2, basic elements of the GA; knapsack prob. (贪婪算法的基本特征;背包问题)16.3, an important application: the design of data compression (Huffman) codes. (哈夫曼编码)16.5, unit-time tasks scheduling 6.5,u t t e tas s sc edu g (有限期作业调度)16Greedy Algorithms z 16 Greedy AlgorithmsThe greedy method is quite powerful and works 4g y q pwell for a wide range of problems:minimum-spanning-tree algorithms (Chap 23) minimum-spanning-tree algorithms (Chap 23)(最小生成图)h t t th f i l (Ch 24) shortest paths from a single source (Chap 24)(最短路径)set-covering heuristic (Chap 35).(集合覆盖)…First example:Activity Selection First example: Activity Selection Horseback Riding (骑马)5Canoeing (独木舟)Surfing (冲浪)Driving Swimming Rock Climbing (攀岩)z 91011121314151617How to make an arrangement to have the more activities?S1Sh i i fi 短优原S1. Shortest activity first (最短活动优先原则)Swimming , DrivingS2First starting activity first S2. First starting activity first (最早开始活动优先原则)Horseback Riding , DrivingS3. First finishing activity first (最早结束活动优先原则)Swimming , Rock Climbing , Surfing161An activity selection problem 16.1 An activity-selection problem 6i i i iz n activities require exclusive use of a common resource.Example, scheduling the use of a classroom.playground p e,sc edu g e use o c ss oo (n 个活动,1 项资源,任一活动进行时需唯一占用该资源)Set of activities S = {a 1, a 2, . . . , a n }. a i needs resource during period [s i , f i ), which is a half-open interval, where s i is start time and f i is finish time.Goal:Select the largest possible set of nonoverlapping (mutually Goal: Select the largest possible set of nonoverlapping (mutually compatible) activities. (安排一个活动计划,使得相容的活动数目最多)Other objectives: Maximize income rental fees , …j ,161An activity selection problemz16.1 An activity-selection problemn activities require exclusive use of a common resource.7Set of activities S = {a1, a2, . . . , a n}a i needs resource during period [s i , f i )l S fi i iz Example: S sorted by finish time:Maximum-size mutuallyMaximum size mutuallycompatible set:{a1 , a3 , a6 , a,a3,a6,a8}.Not unique: also{a2 , a5 , a7 , a9}.16.1.1 Optimal substructure of activity selection Space of subproblems8z S ij = {a k ∈S : f i ≤ s k < f k ≤ s j }= activities that start after a i finishes & finish before a j starts zActivities in S ij are compatible with all activities that finish by f i (完成时间早于f i 的活动), and ll ti iti th t t t li th z all activities that start no earlier than s j .To represent the entire problem, add fictitious activities:a 0=[0);1=[“ z0 = [﹣∞, 0); a n+1 = [∞, “∞+1”)We don’t care about ﹣∞ in a 0 or “∞+1” in a n+1.Then S = S 0,n+1. Range for S ij is 0 i, j n + 1.,g j ≤,j ≤16.1.1 Optimal substructure of activity selection Space of subproblems 9zz p pS ij = {a k ∈S : f i ≤ s k < f k ≤ s j }Assume that activities are sorted by monotonically increasing finish time Assume that activities are sorted by monotonically increasing finish time (以结束时间单调增的方式对活动进行排序)f 0 ≤ f 1 ≤ f 2 ≤ · · · ≤ f n < f n+1 (if i ≤ j, then f i ≤ f j )(16.1)11109876543210123456789101112131416.1.1 Optimal substructure of activity selectionif 012···n <f n+1(if i j then i )(161)10Then i ≥ j S ij =i k <f k j <f j i <f j z if f 0 ≤ f 1 ≤ f 2 ≤ ≤ f n < f n+1 (if i ≤ j, then f i ≤ f j ) (16.1)Proof If there exists a k ∈S ij , thenf i ≤ s k < f k ≤ s j < f j f i < f j .z But i ≥ j f i ≥ f j . Contradiction.So only need to worry about with 0 ≤i <j ≤ n + 1.S All other are ij S ij16.1.1 Optimal substructure of activity selection11S th t l ti t i l d H 2b b zSuppose that a solution to S ij includes a k . Have 2 sub-prob S ik (start after a i finishes, finish before a k starts)S kj (start after a k finishes, finish before a j starts)zSolution to S ij = (solution to S ik )∪{a k }∪(solution to S kj )Since a k is in neither of the subproblems, and the subproblems are disjoint |solution to S |=|solution to ik |+1+|solution to kj |zdisjoint, | solution to S | = | solution to S ik | + 1 + | solution to S kj | .Optimal substructure: If an optimal solution to S ij includes a k , then the solutions to S ik and S kj used within this solution must be optimal ll (l t d t t)zas well. (use usual cut-and-paste argument).Let A ij = optimal solution to S ij ,so A ij = A ik ∪{a k ∪A kj (16.2)so j {}j ,assuming: S ij is nonempty; and we know a k .()A il iz16.1.2 A recursive solution Let c[i, j ] = size of maximum-size subset of mutually compatible activities inij c[i j]ij 12i ≥ j S ij = c[i, j ] = 0.z S ij .(c[i,j] 表示S ij 相容的最大活动数)If S ij ≠ , suppose that a k is used in a maximum-size subsets of mutually S ij .Then c[i j]=c[i k]+1+c[k j ]zThen c[i, j] = c[i, k] + 1 + c[k, j ] .But of course we don’t know which k to use, and so0if Sij = (16.3)0 , if Sij ,c [i , j ] = max{c [i , k ] + c [k , j ] + 1} , if Sij ⎯ .i <k<j Why this range of k? Because S ij = {a k ∈S: f i ≤ s k < f k ≤ s j } a k can’t be a i or a j .k ija S ∈16.1.3 Converting a DP solution to a greedy solution 0 , if S ij = ,] 13 i < k < j c [i , j ]= max{c [i , k ] + c [k , j ] + 1} , if S ij ⎯ .(16.3)z a k S ijIt may be easy to design an algorithm to the problem based on recurrence(16.3).Di t i l ith (l it ?)zDirect recursion algorithm (complexity?)Dynamic programming algorithm (complexity?)Can we simplify our solution?16.1.3 Converting a DP solution to a greedy solutionTheorem 16.1L t d l t b th ti it i ith th li t fi i h ti 14Let S ij ≠ , and let a m be the activity in S ij with the earliest finish time: f m =min {f k : a k ∈S ij }. Then1. a m is used in some maximum-size subset of mutually compatible activities of S i j .(a m 包含在某个最大相容活动子集中)2. S im = , so that choosing a m leaves S mj as the only nonemptysubproblem.(仅剩下一个非空子问题S mj )p j 1110987654321123456789101112131416.1.3 Converting a DP solution to a greedy solutionTheorem 16.1L t d l t b th ti it i ith th li t fi i h ti 15Let S ij ≠ , and let a m be the activity in S ij with the earliest finish time: f m =min {f k : a k ∈S ij }. Then1. a m is used in some maximum-size subset of mutuallycompatible activities of S i j .(a m 包含在某个最大相容活动子集中)2. S im = , so that choosing a m leaves S mj as the onlynonempty subproblem. (仅剩下一个非空子问题S mj )Proof2. Suppose there is some .Then Then a S ∈i l l m m l m f s f s f f f <<≤<⇒<l ij a s ∈a m . Therefore, there is no .and it has an earlier finish time than , which contradicts our choice ofl imm f limima S S ∈⇒=∅16.1.3 Converting a DP solution to a greedy solutionTheorem 16.1 Let S ij ≠ , and let a m be the activity in S ij with the li t fi i h ti i {f }Th16earliest finish time: f m = min {f k : a k ∈S ij }. Then 1. a m is used in some maximum-size subset of mutuallycompatible activities of S i j .(a m 包含在某个最大相容活动子集中)p j Proof 1. Let A ij be a maximum-size subset of mutually Compatibleactivities in ij Order activities in ij in monotonically increasing activities in S ij . Order activities in A ij in monotonically increasing order of finish time. Let a k be the first activity in A ij .If a k = a m , done (a m is used in a maximum-size subset).a k a mOtherwise construct =A ij }(replace k by )first activity in A ij to finish. f m ≤f k a m doesn’t overlap anything Otherwise, construct B ij A ij -{a k }∪{a m } (replace a k by a m ).Activities in B ij are disjoint. (Activities in A ij are disjoint, a k is the else in ij ).Since |B ij |=ij |and ij is a maximum-size subset,else in B ij ). Since |B ij | |A ij | and A ij is a maximum size subset,so is B ij .16.1.3 Converting a DP solution to a greedy solution 0 , if S ij = ,,j ]= 17 i k k Sc [i , j ]max{c [i , k ] + c [k , j ] + 1} , if S ij ⎯ . a < < ij(16.3)Th 161L t d l t b th ti it iTheorem 16.1 Let S ij ≠ , and let a m be the activity in S ij with the earliest finish time: f m = min {f k : a k ∈S ij }. Then 1. a m is used in some maximum-size subset of mutually ycompatible activities of S i j .(a m 包含在某个最大相容活动子集中)2. S im = , so that choosing a m leaves S mj as the only nonempty b bl 仅剩下个非空子问题zsubproblem. (仅剩下一个非空子问题S mj )This theorem is great:before theoremafter theorem # of sub-prob in optimal solution #of choices to consider 2O (j i 1)11# of choices to considerO (j –i –1)16.1.3 Converting a DP solution to a greedy solutionTheorem 16.1 Let S ij ≠ , and let a m be the activity in S ij with the earliest finish time:=min {f k :k ij }Then18earliest finish time: f m = min {f k : a k ∈S ij }. Then 1. a m is used in some maximum-size subset of mutually compatible activitiesof S i j .(a m 包含在某个最大相容活动子集中)2i = so that choosing leaves j as the only nonempty subproblemz2. S im = , so that choosing a m leaves S mj as the only nonempty subproblem.(仅剩下一个非空子问题S mj )Now we can solve a problem ij in a top-down fashion Now we can solve a problem S ij in a top down fashion Choose a m ∈S ij with earliest finish time: the greedy choice. ( it leaves asmuch opportunity as possible for the remaining activities to be scheduled )(留下尽可能的时间来安排活动,贪心选择)Then solve S mj .z16.1.3 Converting a DP solution to a greedy solutionWhat are the subproblems?19Original problem is S 0,n+1 〔a 0 = [﹣∞, 0); a n+1 = [∞, “∞+1”) 〕Suppose our first choice is a m1 (in fact, it is a 1)Then ne t s bproblem is Then next subproblem isS m1,n+1Suppose next choice is a m2 (it must be a 2?)Next subproblem isS m2,n+1p ,And so on11107893546121234567891011121314z16.1.3 Converting a DP solution to a greedy solutionWhat are the subproblems?20Original problem is S 0,n+1Suppose our first choice is a m1Then next subproblem is m1n+1 Then next subproblem is S m1,n+1Suppose next choice is a m2Next subproblem is S m2,n+1zAnd so onEach subproblem is S mi ,n+1.And the subproblems chosen have finish times that increase zzAnd the subproblems chosen have finish times that increase. (所选的子问题,其完成时间是增序排列)Therefore, we can consider each activity just once, in monotonically increasing order of finish time increasing order of finish time.z 16.1.4 A recursive greedy algorithmOriginal problem is S 0,n+1E h b bl i 21z zEach subproblem is S mi ,n+1Assumes activites already sorted by monotonically increasing finish time. (If not, then sort in O(n lg n) time.) Return an optimal solution for S i,n+1:×a m√a ma ia iREC-ACTIVITY-SELECTOR(s,f,i,n)REC ACTIVITY SELECTOR(s, f, i, n)1 m ← i+12 while m ≤n and s m < f i // Find first activity in S i,n+1.3d 13 do m ← m+14 if m ≤ n5 then return {a m ∪REC-ACTIVITY-SELECTOR(s, f, m, n){}(,f,,)6 else returna 2a 116.1.4 A recursive greedy algorithmREC-ACTIVITY-SELECTOR(s, f, i, n)22(,f,,)1 m ← i+12 while m ≤n and s m < f i // Find first activity in S i,n+1.3do m3 do m ← m+14 if m ≤ n5 then return {a m }∪REC-ACTIVITY-SELECTOR(s, f, m, n)6else return6 else returnz √Initial call: REC-ACTIVITY-SELECTOR(s, f, 0, n).a mza iIdea: The while loop checks a i+1 , a i+2 , . . . , a n untilit finds an activity a m that is compatible with a i (need s m ≥ f i ).If the loop terminates because a m is found (m ≤n), thenrecursively solve S m,n+1, and return this solution, along with a m .If the loop never finds a compatible (m >n)then just returnIf the loop never finds a compatible a m (m > n), then just return empty set.a 2a 116.1.4 A recursive greedy algorithmREC-ACTIVITY-SELECTOR(s, f, i, n)23(,f,,)1 m ← i+12 while m ≤n and s m < f i // Find first activity in S i,n+1.3do m3 do m ← m+14 if m ≤ n5 then return {a m }∪REC-ACTIVITY-SELECTOR(s, f, m, n)6else return z6 else returnTime: Θ(n)—each activity examined exactly once.T (n ) =m 1 + T (n m 1 ) =m 1 + m 2 + T (n m 1 m 2 )123T n 123)= m 1 + m 2 + m 3 + T (n m 1 m 2 m 3 ) = =mk+ T (nm k )b 1 th 1∑k T (1)Θ(…because: nm k = 1, thenm k = n −1, mk + T (1) = n )z 16.1.4 A recursive greedy algorithmInitial call: REC-ACTIVITY-SELECTOR(s, f, 0, n).24zIdea: The while loop checksa i+1 , a i+2 , . . . , a n until it finds an activity that is finds an activity a m that is compatible with a i (need s m ≥ f i ). If the loop terminates because a m is found n)then rec rsi el (m ≤n), then recursively solve S m,n+1, and return thissolution, along witha m .,g If the loop never finds a compatible a m (m > n),h jthen just return empty set.z 16.1.5 An iterative greedy algorithmREC-ACTIVITY-SELECTOR is almost "tail recursive“.We easily can convert the recursive procedure to an iterative one (Some 25zWe easily can convert the recursive procedure to an iterative one. (Some compilers perform this task automatically)a 2a 1GREEDY-ACTIVITY-SELECTOR(s, f, n)1 A ← {a 1}2 i ← 1×a m √a m53 for m ← 2 to n 4 do if s m ≥ f i a i a ithen A ←A ∪{a m }6i ← m // a i is most recent addition to A 7 return AzReviewGreedy Algorithm Idea: When we have a choice to make, make the one that looks best right now Make a locally optimal choice in hope of getting a 26looks best right now. Make a locally optimal choice in hope of getting a globally optimal solution.(希望当前选择是最好的,每一个局部最优选择能产生全局最优选择)zGreedy Algorithm: Simpler, more efficient16Greedy Algorithms z 16 Greedy Algorithms16.1, the activity-selection problem (活动安排)27z ,y p 16.2, basic elements of the GA; knapsack probz(贪婪算法的基本特征;背包问题)16.3, an important application: the design of data compression (Huffman)codes compression (Huffman) codes(哈夫曼编码)162Elements of the greedy strategy z16.2 Elements of the greedy strategyThe choice that seems best at the moment is chosen(每次决策时当前所做的选择看起来是“最好”的)28z(每次决策时,当前所做的选择看起来是“最好”的)What did we do for activity selection?1Determine the optimal substructure 1. Determine the optimal substructure.2. Develop a recursive solution.3. Prove that at any stage of recursion, one of the optimal choices is y g ,pthe greedy choice.4. Show that all but one of the subproblems resulting from the greedy choice are empty. (通过贪婪选择,只有一个子问题非空)5. Develop a recursive greedy algorithm.6C t it t it ti l ith6. Convert it to an iterative algorithm.162Elements of the greedy strategy z 16.2 Elements of the greedy strategyThese steps looked like dynamic programming.29z zTypically, we streamline these steps (简化这些步骤)Develop the substructure with an eye towardmaking the greedy choice,leaving just one subproblem.F ti it l ti h d th t th d h i i li d zFor activity selection, we showed that the greedy choice implied that in S ij , only i varied, and j was fixed at n+1,So we could have started out with a greedy algorithm in mind:zSo, we could have started out with a greedy algorithm in mind:define S i = {a k ∈S : f i ≤ s k }, (所有在a i 结束之后开始的活动)show the greedy choice,first m to finish in show the greedy choice, first a m to finish in S icombined with optimal solution to S moptimal solution to S i .162Elements of the greedy strategy z16.2 Elements of the greedy strategyTypical streamlined steps (简化这些步骤)301. Cast the optimization problem as one in which we make a choice and areleft with one subproblem to solve.(最优问题为:做一个选择,留下一个待解的子问题)(最优问题为:做个选择,留下个待解的子问题)2. Prove that there’s always an optimal solution that makes the greedychoice, so that the greedy choice is always safe.3. Show that greedy choice and optimal solution to subproblem optimal solution to the problem (证明存在一个基于贪婪选择的最优解,因此贪婪选择是安全的)solution to the problem.(说明由贪婪选择和子问题的最优解 原问题的最优解)162Elements of the greedy strategy z 16.2 Elements of the greedy strategyNo general way to tell if a greedy algorithm is 31g y g y g optimal, but two key ingredients are(没有一般化的规则来说明贪婪算法是否最优,但有两个基本要点)1. greedy-choice property (贪婪选择属性)2. optimal substructurez 16.2.1 Greedy-choice propertyA globally optimal solution can be arrived at by making a locally optimal (greedy)choice 32(greedy) choice. (通过局部最优解可导出全局最优解)S i,r-1k i ..k r k r ..k j Dynamic programming k r i j i zDynamic programming S i,j .... Make a choice at each step.S r-1,j k k r-1k r+1k j Choice depends on knowing optimal solutions to subproblems.Solve subproblems first. (依赖于已知子问题的最优解再作出选择) Solve bottom-up.S 0,n+110911z Greedy a m1S m1,n+154678 Make a choice at each step.132Make the choice before solving the subproblems. (先作选择,再解子问题)Solve top-down.pz16.2.1 Greedy-choice property We must prove that a greedy choice at each step yields a globally optimal solution Difficulty!Cleverness may be required!33zsolution. Difficulty! Cleverness may be required!Typically, Theorem 16.1, shows that the solution (A ij ) can be modified to use the greedy choice (a m ), resulting in one similar but smaller subproblem (A mj ).z We can get efficiency gains from greedy-choice property. (For example, inactivity-selection, sorted the activities in monotonically increasing order of finish times needed to examine each activity just once )finish times, needed to examine each activity just once.)Preprocess input to put it into greedy orderz 16.2.2 Optimal substructureoptimal substructure: an optimal solution to the 34problem contains within it optimal solutions to subproblems.Just show that optimal solution to subproblem z p p (说明子问题的最优解和贪婪选择andgreedy choice optimal solution to problem.)原问题的最优解16.2.3 Greedy vs. dynamic programming bread cake 35pizza cellphone camera laptop 0-1knapsack problem 背包问题小偷问题)z0-1 knapsack problem (0-1背包问题,小偷问题)n items (n 个物品)Item i is worth $v i , weighs w i P (物品i 价值v i ,重w i ) ,g Find a most valuable subset of items with total weight ≤W.(背包的最大负载量为W ,如何选取物品,使得背包装的物品价值最大)Have to either take an item or not take it can’t take part of it z Have to either take an item or not take it—can t take part of it.(每个物品是一个整体,不能分割,因此,要么选取该物品,要么不选取)Fractional knapsack problem (分数背包问题,小偷问题)Lik h 01k k bl b k f i f iLike the 0-1 knapsack problem, but can take fraction of an item.z 16.2.3 Greedy vs. dynamic programming 0-1 knapsack problem (0-1背包问题,小偷问题)F ti l k k bl (分数背包问题小偷问题)36zzFractional knapsack problem (分数背包问题,小偷问题)Both have optimal substructure property.0-1 : choose the most valuable load j that weighs w j ≤ W,j g remove j , choose the most valuable load i that weighs w i ≤W-w j fractional:choose a weight w from item j (part of j )then fractional: choose a weight w from item j ( part of j ), then remove the part, the remaining load is the most valuable load weighing at most W-w that the thief can take from the n-1 original items plus w j -w pounds from item j.(若先选item j 的一部分,其重w ,则接下来的最优挑选方案为从余n-1j j -w z 下的n 1 个物品〔除j 外〕和j 的另外重w j w 的部分中挑选,其重量不超过W-w )But the fractional problem has the greedy-choice property, and the 0-1problem does not.p16.2.3 Greedy vs. dynamic programming37z Fractional knapsack problem has the greedy-choice property,FRACTIONAL-KNAPSACK(v,w,W) 1 load ← 0has the greedy choice property, and the 0-1 knapsack problem does not.2 i ← 13 while load < W and i ≤n4 do if w i ≤W -loadz To solve the fractional problem, rank decreasingly items by v i /w iL t//5 then take all of item i6 else take W-load of w i from item i7 add what was taken to load8i1z Let v i /w i ≥v i+1 /w i+18 i ←i + 1for all iz Time: O(nlg n) to sort, O(n) to greedy choice thereafter.z 16.2.3 Greedy vs. dynamic programming 0-1 knapsack problem 38has not the greedy-choice property zW = 50.G d l ti Optimal solution:z z Greedy solution:take items 1 and 2value =160weight =30Optimal solution:Take items 2 and 3value=220, weight=50 value 160, weight 3020 pounds of capacity leftover.No leftover capacity.(没有剩余空间)breadcakepizzacellphone camera laptop16Greedy Algorithms z16 Greedy Algorithms16.1, the activity-selection problem (活动安排)39z,y p 16.2, basic elements of the GA; knapsack prob z (贪婪算法的基本特征;背包问题)16.3, an important application: the design of data compression (Huffman)codes compression (Huffman) codes (哈夫曼编码)163Huffman codes z16.3 Huffman codesHuffman codes: widely used and very effective technique for compressing data 40 zdata.savings of 20% to 90%Consider the data to be a sequence of characters q Abaaaabbbdcffeaaeaeec z Huffman's greedy algorithm:uses a table of the frequencies of occurrence of the characters to build up an optimal way of representing each character as a binary string.p y p g y g(依据字符出现的频率表,使用二进串来建立一种表示字符的最佳方法)163Huffman codes z16.3 Huffman codesWish to store compactly 100,000-character data file 41only six different characters appear. frequency table a b c d e f Frequency (in thousands)4513121695Frequency (in thousands)Fixed-length codeword Variable-length codeword 45 13 12 16000 001 010 011 100 1010 101 100 111 1101 1100zz Many ways (encodes) to represent such a file of information binary character code (or code for short): each character is represented by a unique binary stringunique binary string.fixed-length code: if use 3-bit codeword, the file can be encoded in 300,000bits. Can we do better?163Huffman codes z16.3 Huffman codes100,000-character data file 4295a b c d e f Frequency (in thousands)Fixed-length codeword 45 13 12 16000 001 010 011 100 101z Variable-length codeword 0 101 100 111 1101 1100binary character code (or code for short)variable-length code: by giving frequent characters short codewords and infrequent characters long codewords, here the 1-bit string 0 represents a,and the 4-bit string 1100 represents f. (高频出现的字符以短字码表示;g p 低频→长字码)(45·1 + 13·3 + 12·3 + 16·3 + 9·4 + 5·4) · 1,000 = 224,000 bits163Huffman codes z16.3 Huffman codes100,000-character data file 4395a b c d e f Frequency (in thousands)45 13 12 16Fixed-length codeword Variable-length codeword 000 001 010 011 100 1010 101 100 111 1101 1100z binary character code (or code for short)fixed-length code: 300,000 bitsvariable length code:224000bits a savings of approximately 25%In variable-length code: 224,000 bits, a savings of approximately 25%. In fact, this is an optimal character code for this file.z 16.3.1 Prefix codes prefix codes (prefix-free codes): no codeword is a prefix of some other codeword (字首码前缀代码前置代码〔44codeword. (字首码,前缀代码,前置代码〔前缀无关码〕:没有字码是其他字码的前缀)95a b c d e f Frequency (in thousands)Fixed-length codeword 45 13 12 16000001010011100101zFixed length codeword Variable-length codeword 000 001 010 011 100 1010 101 100 111 1101 1100Encoding is always simple for any binary character code Encoding (编码)is always simple for any binary character code Concatenate (连接)the codewords representing each character. For example, “abc”, with the variable-length fi d 01011000101100h ‘’t z prefix code as 0·101·100 = 0101100, where we use ‘·’ to denote concatenation.Prefix codes simplify decoding (解码)z 16.3.1 Prefix codes prefix codes (prefix-free codes): no codeword is a prefix of some other codeword (字首码前缀代码前置代码〔前缀无关码〕:没有字码是45codeword. (字首码,前缀代码,前置代码〔前缀无关码〕:没有字码是其他字码的前缀)a b c d e f 0zVariable-length codeword 101 100 111 1101 1100Encoding is always simple for any binary character code P fi d i lif d diz Prefix codes simplify decoding Since no codeword is a prefix of any other, the codewordthat begins an encoded file is unambiguous that begins an encoded file is unambiguous (明确的).We can simply identify the initial codeword, translate itback to the original character, and repeat the decoding th i d f th d d filprocess on the remainder of the encoded file.Exam: 001011101 uniquely as 0·0·101·1101, which decodesto “aabe”.to aabe .16.3.1 Prefix codes a b c d e f460 101 100 111 1101 1100001011101uniquely as 0·0·101·1101,which decodes to “aabe”.D di z Decodingthe process needs a convenient representation for theprefix code so that the initial codeword can be easily p ypicked off.(为了使初始字码容易识别,解码过程需要一种前置无关码的方便表示) A binary tree whose leaves are the given charactersprovides one such representation.树是种方便的表示方法树叶为给定字符(二叉树是一种方便的表示方法,树叶为给定字符)16.3.1 Prefix codes a b c d e f470 101 100 111 1101 1100001011101uniquely as 0·0·101·1101,which decodes to “aabe”.z DecodingWe interpret the binary codeword for a character as thepath from the root to that character.(将字符的二叉码表示解释为一条路径:从树根到树叶)not binary search trees since the leaves need not appear innot binary search trees, since the leaves need not appear in sorted order and internal nodes do not contain character keys.(不是二叉搜索树)16.3.1 Prefix codes48An optimal code for a file is always represented by a full binary tree every zAn optimal code for a file is always represented by a full binary tree, every nonleaf node has two children (Ex16.3-1). The fixed-length code in our example is not optimal.We can restrict our attention to full binary treesz We can restrict our attention to full binary trees C is the alphabet,all character frequencies >0th t f ti l fi d h |C|l f h l tt f C the tree for an optimal prefix code has |C| leaves, one for each letter of C ,and exactly |C|-1 internal nodes.16.3.1 Prefix codes 49Compute # of bits required to encode a file T ) f )(165)c CB (T ) = f (c )d T (c ) (16.5)z Given a tree T corresponding to a prefix code, for each character c in thealphabet C,f (c): frequency of c in the file f ()q y d T (c): depth of c’s leaf in the tree (length of the codeword for character c ). Then, # of bits required to encode a filewhich we define as the cost of the tree T.16.3.2 Constructing a Huffman code HUFFMAN(C)1n |C|50Huffman code:a greedy algorithm 1 n ← |C|2 Q ← C3 for i ← 1 to n -14do )a new node that constructs an optimal prefix code4 do allocate (分配) a new node z5 left[z] ← x ← EXTRACT-MIN (Q)6 right[z] ← y ← EXTRACT-MIN (Q)7f [z]←f [x]+f 8z 7 f [z] f [x] f [y]INSERT(Q, z)9 return EXTRACT-MIN(Q) //return the root of the tree.C: set of n characters, c ∈C: an object with frequency f [c].j q y f Build the tree T corresponding to the optimal code.Begin with |C| leaves, perform |C|-1 “merging” operations.A min priority queue Q keyed on f is used to identify the twoA min-priority queue Q, keyed on f , is used to identify the two least-frequent objects to merge together. Result of the merger is a new object whose frequency is the sum of the frequencies of the bj h d two objects that were merged.。

相关主题