当前位置：文档之家› 字符串哈希算法

字符串哈希算法

经典字符串Hash函数工作中经常需要用大hash这个强有力的工具，hash表最核心的部分则在于怎么设计一个好的hash函数，以使数据更均匀地分布在若干个桶上。

下面来介绍一下我现在用到的一个hash函数，我们来看代码：unsigned chostcachehash::get_host_key(const string& host){int result = 1;unsigned i = 0;for (i = 0; i <host.size(); i++)result = 31 * result + host[i];return abs(result);}inline unsigned getkey(unsigned key){return (key % m_capacity);}m_capacity为hash桶的个数对于一个字符串，我们首先调用get_host_key()来得到一个key，然后再用这个key调用getkey来得到他在hash桶里的位置。

这个是我们在工作一直使用的hash函数，效果也还可以。

今天忽然心血来潮，在网上搜了一下看还有没有更好的hash函数，被我发现了这篇文章，于是转过来看看。

php中出现的字符串hash函数static unsigned long hashpjw(char *arkey，unsigned intnkeylength){unsigned long h = 0，g;char *arend=arkey+nkeylength;while (arkey<arend) {h = (h << 4) + *arkey++;if ((g = (h & 0xf0000000))) {h = h ^ (g >> 24);h = h ^ g;}}return h;}openssl中出现的字符串hash函数unsigned long lh_strhash(char *str){int i，l;unsigned long ret=0;unsigned short *s;if (str == null) return(0);l=(strlen(str)+1)/2;s=(unsigned short *)str;for (i=0; i %有点看不懂ret^=(s[i]<<(i&0x0f));return(ret);} *//* the following hash seems to work very well on normal text strings * no collisions on /usr/dict/words and it distributes on %2^n quite* well，not as good as md5，but still good.*/unsigned long lh_strhash(const char *c){unsigned long ret=0;long n;unsigned long v;int r;if ((c == null) || (*c == "{post.abstract}"))return(ret);/*unsigned char b[16];md5(c，strlen(c)，b);return(b[0]|(b[1]<<8)|(b[2]<<16)|(b[3]<<24));*/n=0x100;while (*c){v=n|(*c);n+=0x100;r= (int)((v>>2)^v)&0x0f;ret=(ret<>(32-r));ret&=0xffffffffl;ret^=v*v;c++;}return((ret>>16)^ret);}mysql中出现的字符串hash函数#ifndefnew_hash_function/* calchashvalue for a key */static uintcalc_hashnr(const byte *key，uint length){register uint nr=1，nr2=4;while (length--){nr^= (((nr & 63)+nr2)*((uint) (uchar) *key++))+ (nr << 8);nr2+=3;}return((uint) nr);}/* calchashvalue for a key，case indepenently */static uintcalc_hashnr_caseup(const byte *key，uint length){register uint nr=1，nr2=4;while (length--){nr^= (((nr & 63)+nr2)*((uint) (uchar) toupper(*key++)))+ (nr << 8);nr2+=3;}return((uint) nr);}#else/** fowler/noll/vo hash** the basis of the hash algorithm was taken from an idea sent by email to the * ieeeposix p1003.2 mailing list from phongvo (kpv@) and * glenn fowler (gsf@). landon curt noll (chongo@) * later improved on their algorithm.** the magic is in the interesting relationship between the special prime* 16777619 (2^24 + 403) and 2^32 and 2^8.** this hash produces the fewest collisions of any function that we"ve seen so * far，and works well on both numbers and strings.*/uintcalc_hashnr(const byte *key，uintlen){const byte *end=key+len;uint hash;for (hash = 0; key < end; key++){hash *= 16777619;hash ^= (uint) *(uchar*) key;}return (hash);}uintcalc_hashnr_caseup(const byte *key，uintlen) {const byte *end=key+len;uint hash;for (hash = 0; key < end; key++){hash *= 16777619;hash ^= (uint) (uchar) toupper(*key);}return (hash);}#endif从上表可以看出，这些经典软件虽然构造字符串Hash函数的方法不同，但是它们的效率都是不错的，相互之间差距很小，读者可以参考实际情况从其中借鉴使用。

暴雪公司有个经典的字符串的hash公式先提一个简单的问题，假如有一个庞大的字符串数组，然后给你一个单独的字符串，让你从这个数组中查找是否有这个字符串并找到它，你会怎么做？有一个方法最简单，老老实实从头查到尾，一个一个比较，直到找到为止，我想只要学过程序设计的人都能把这样一个程序作出来，但要是有程序员把这样的程序交给用户，我只能用无语来评价，或许它真的能工作，但...也只能如此了。

最合适的算法自然是使用HashTable（哈希表），先介绍介绍其中的基本知识，所谓Hash，一般是一个整数，通过某种算法，可以把一个字符串"压缩" 成一个整数，这个数称为Hash，当然，无论如何，一个32位整数是无法对应回一个字符串的，但在程序中，两个字符串计算出的Hash值相等的可能非常小，下面看看在MPQ中的Hash算法unsigned long HashString(char *lpszFileName, unsigned long dwHashType){unsigned char *key = (unsigned char *)lpszFileName;unsigned long seed1 = 0x7FED7FED, seed2 = 0xEEEEEEEE;intch;while(*key != 0){ch = toupper(*key );seed1 = cryptTable[(dwHashType<< 8) ch] ^ (seed1 seed2);seed2 = ch seed1 seed2 (seed2 << 5) 3;}return seed1;}Blizzard的这个算法是非常高效的，被称为"One-Way Hash"，举个例子，字符串"unitneutralacritter.grp"通过这个算法得到的结果是0xA26067F3。

是不是把第一个算法改进一下，改成逐个比较字符串的Hash值就可以了呢，答案是，远远不够，要想得到最快的算法，就不能进行逐个的比较，通常是构造一个哈希表(Hash Table)来解决问题，哈希表是一个大数组，这个数组的容量根据程序的要求来定义，例如1024，每一个Hash值通过取模运算(mod)对应到数组中的一个位置，这样，只要比较这个字符串的哈希值对应的位置又没有被占用，就可以得到最后的结果了，想想这是什么速度？是的，是最快的O(1)，现在仔细看看这个算法吧intGetHashTablePos(char *lpszString, SOMESTRUCTURE *lpTable, intnTableSize){intnHash = HashString(lpszString), nHashPos = nHash % nTableSize;if (lpTable[nHashPos].bExists&& !strcmp(lpTable[nHashPos].pString, lpszString)) returnnHashPos;elsereturn -1; //Error value}看到此，我想大家都在想一个很严重的问题："假如两个字符串在哈希表中对应的位置相同怎么办？",究竟一个数组容量是有限的，这种可能性很大。

e商务文档

字符串哈希算法

相关文档推荐：