Online Book Reader

Home Category

Mastering Algorithms With C - Kyle Loudon [74]

By Root 1417 0
not all positions will be probed.

Linear probing


One simple approach to probing an open-addressed hash table is to probe successive positions in the table. Formally stated, if we let i go between and m - 1, where m is the number of positions in the table, a hash function for linear probing is defined as:

h(k,i) = (h'(k)+i) mod m

The function h' is an auxiliary hash function, which is selected like any hash function; that is, so that elements are distributed in a uniform and random manner. For example, we might choose to use the division method of hashing and let h' (k) = k mod m. In this case, if we hash an element with key k = 2998 into a table of size m = 1000, the hash codings produced are (998 + 0) mod 1000 = 998 when i = 0, (998 + 1) mod 1000 = 999 when i = 1, (998 + 2) mod 1000 = when i = 2, and so on. Therefore, to insert an element with key k = 2998, we would look for an unoccupied position first at position 998, then 999, then 0, and so on.

The advantage of linear probing is that it is simple and there are no constraints on m to ensure that all positions will eventually be probed. Unfortunately, linear probing does not approximate uniform hashing very well. In particular, linear probing suffers from a phenomenon known as primary clustering, in which large chains of occupied positions begin to develop as the table becomes more and more full. This results in excessive probing (see Figure 8.2).

Figure 8.2. Linear probing with h(k, i) = (k mod 11 + i) mod 11

Double hashing


One of the most effective approaches for probing an open-addressed hash table focuses on adding the hash codings of two auxiliary hash functions. Formally stated, if we let i go between and m - 1, where m is the number of positions in the table, a hash function for double hashing is defined as:

h(k,i) = (h 1(k)+ih 2(k)) mod m

The functions h 1 and h 2 are auxiliary hash functions, which are selected like any hash function: so that elements are distributed in a uniform and random manner. However, in order to ensure that all positions in the table are visited before any position is visited twice, we must adhere to one of the following procedures: we must select m to be a power of 2 and make h 2 always return an odd value, or we must make m prime and design h 2 so that it always returns a positive integer less than m.

Typically, we let h 1 (k) = k mod m and h 2 (k) = 1 + (k mod m' ), where m' is slightly less than m, say, m - 1 or m - 2. Using this approach, for example, if the hash table contains m = 1699 positions (a prime number) and we hash the key k = 15,385, the positions probed are (94 + (0)(113)) mod 1699 = 94 when i = 0, and every 113th position after this as i increases.

The advantage of double hashing is that it is one of the best forms of probing, producing a good distribution of elements throughout a hash table (see Figure 8.3). The disadvantage is that m is constrained in order to ensure that all positions in the table will be visited in a series of probes before any position is probed twice.

Figure 8.3. Hashing the same keys as Figure 8.2 but with double hashing, where h(k, i) = (k mod 11 + i(1 + k mod 9)) mod 11

Interface for Open-Addressed Hash Tables

Name


ohtbl_init

Synopsis

int ohtbl_init(OHTbl *htbl, int positions, int (*h1)(const void *key)

int (*h2)(const void *key), int (*match)(const void *key1

const void *key2), void (*destroy)(void *data));

Return Value

0if initializing the hash table is successful, or -1 otherwise.

Description

Initializes the open-addressed hash table specified by htbl. This operation must be called for an open-addressed hash table before the hash table can be used with any other operation. The number of positions to be allocated in the hash table is specified by positions. The function pointers h1 and h2 specify user-defined auxiliary hash functions for double hashing. The function pointer match specifies a user-defined function to determine if two keys match. It should perform in a manner similar to that described for chtbl_init. The destroy argument provides

Return Main Page Previous Page Next Page

®Online Book Reader