当前位置:文档之家› 华南理工大学数据结构chapter9

华南理工大学数据结构chapter9

A hash function maps key values to positions. It is denoted by h. A hash table is an array that holds the records. It is denoted by HT. HT has M slots, indexed form 0 to M-1.
Cn 0.122n.
8
Self-Organizing Lists
Self-organizing lists modify the order of records within the list based on the actual pattern of record accesses.
Self-organizing lists use a heuristic for deciding how to reorder the list. These heuristics are similar to the rules for managing buffer pools.
11
on: Text Compression. Keep a table of words already seen, organized via Move-to-Front heuristic.
– – If a word not yet seen, send the word. Otherwise, send (current) index in the table.
Searching sorted Arrays-- Jump Search.
For some value j, we check every j’th element in L, that is, we check elements L[j], L[2j], and so on. Define m such that mj≦ n < (m + 1)j, then the total cost of this algorithm is at most m + (j-1) 3-way comparisons. (3-way comparison means that we need to know if K is less than, equal to or greater than L[i].) The cost on n items with a jump of size j is
14
Hashing (2)
For any value K in the key range and some hash function h, h(K) = i, 0 <= i < M, such that HT[i] = K. Hashing is appropriate only for sets (no duplicates). Good for both in-memory and disk-based applications. Hashing is not good for range queries, minimum or maximum queries and queries in key order.
Buffer pools are an example of a self-organizing list.
9
Heuristics
1. Order by actual historical frequency of access. (Similar to LFU buffer pool replacement strategy.) 2. Move-to-Front: When a record is found, move it to the front of the list. (Similar to LRU buffer pool replacement strategy.) 3. Transpose: When a record is found, swap it with the record ahead of it.
An exact-match query is a search for the record whose key value matches a specified key value. A range query is a search for all records whose key value falls within a specified range of key values.
Order lists by (expected) frequency of occurrence.
– Perform sequential search Assume that for each key ki, the probability pi that the record with key ki will be requested. Cost to access first record: 1 Cost to access second record: 2 Expected search cost:
-> 0011010100010100 (from the left, 1 represents primes 2,3,5,7,11,13)
-> 0011010100010100 & 0101010101010101
13
Hashing (1)
Hashing: The process of mapping a key value to a position in a table.
3
Searching Unsorted and Sorted Arrays
Searching Unsorted Arrays-- Sequential Search
How many comparisons does linear search do on average?
For large collections of records that are searched repeatedly, sequential search is unacceptably slow.
Search
Given: Distinct keys k1, k2, …, kn and collection L of n records of the form (k1, I1), (k2, I2), …, (kn, In) where Ij is the information associated with key kj from record j for 1 <= j <= n. • Search Problem: For key value K, locate a record (kj, Ij) in L such that kj = K. (if one exists). Searching is a systematic method for locating the record(s) with key value kj = K.
When we know something about the distribution of key values, then search L at a position p that is appropriate to the value of K as follows.
5
Lists Ordered by Frequency
1
Successful vs. Unsuccessful
A successful search is one in which a record with key kj = K is found. An unsuccessful search is one in which no record with kj = K is found (and no such record exists).
-> For dense sets (small range, high percentage of elements in set). -> Can use logical bit operators. 1 if element in set, 0 otherwise.
2) For Example: to find all primes that are odd numbers between 0 and 15
The car on the left hit the car I left. The car on 3 left hit 3 5 I 5. This is similar in spirit to Ziv-Lempel coding.
12
Searching in Sets
1) Using bit vector to represent the set.
10
Example 9.4-page 321 Assume that we have eight records, with key values A to H, and that they are initially placed in alphabetical Order (A B C D E F G H). Now, consider the result of applying the following access pattern: F D F G E G F A D F G E. 1) Count heuristic, the final list resulting will be F G D E A B C H, the total cost -- 45 comparisons. 2) Move-to-front heuristic, the final list resulting will be E G F D A B C H, the total cost -- 54 comparisons. 3) Transpose heuristic, the final list resulting will be A B F D G E C H, the total cost -- 62 comparisons.
相关主题