TutHashCreate


Name

TutHashCreate — create a hash table

Synopsis

T_HASH_TABLE TutHashCreate(init_size, hash_func, test_func) 
T_INT4 init_size; 
T_HASH_CALC_FUNC hash_func; 
T_HASH_TEST_FUNC test_func; 

Arguments

init_size — initial size of the hash table

hash_func — function to calculate the hash code for a key

test_func — function to compare two keys

Return Values

Returns a new hash table.

Diagnostics

None

Description

TutHashCreate creates a new hash table. A hash table consists of keys and values, where each key can be used to look up the corresponding value (hash tables have also been called symbol tables and dictionaries). Values can be inserted, deleted, and looked up very quickly. Hash tables provide a good balance between time used and storage used.

A key can be any pointer-sized (usually 4 bytes) data type (such as a character string, an integer, or any generic pointer). Each hash table normally uses the same data type for all of its keys; for example, all keys may be strings or integers, but not both. SmartSockets provides convenience functions that create hash tables with these types of keys:

SmartSockets hash tables consist of a hash function hash_func, a test function test_func, and an array of buckets. When a value is looked up from its key, the key is hashed using hash_func to provide an index into the bucket array. Each bucket is a linked list of (key, value) pairs. SmartSockets hash tables use open hashing. In open hashing, different keys can hash to the same bucket. To find the proper (key, value) pair from the bucket, the bucket is then searched linearly using test_func to find the proper key.

One problem with hash tables is knowing how big to make the bucket array. If the array of buckets is large enough and the hash function provides a fairly random spread of bucket indices, then on average each bucket only has one (key, value) pair stored in it. This provides for constant time for each look up. If the bucket array is too large, then storage is wasted. If the bucket array is too small, then each bucket has more than one (key, value) pair, and the time for each look up increases. (Each bucket is searched using a linear search.)

To avoid wasting both storage and time, SmartSockets hash tables dynamically resize themselves if they get too full. Each time a new value is inserted into a hash table, the hash table checks how many entries it has and how large its bucket array is. If the number of entries is approaching the number of buckets, then the size of the bucket array is doubled, and the existing entries are redistributed throughout the new bucket array. Resizing never takes place on a look up, only on an insert.

The initial size of the hash table bucket array is specified by init_size. The initial size should be large enough to prevent numerous resizes as entries are inserted, yet small enough to prevent wasted storage.

Write the hash function hash_func to take one argument: the key it is hashing. hash_func should hash the key to a hash code and return that hash code. The array index in the bucket array is then calculated by (hash code) modulo (bucket array size). Hash functions should try to provide a wide range of hash codes.

Write the test function test_func to take two arguments: the two keys that are being compared. test_func should return TRUE if the two keys are the same, and FALSE if the two keys are different.

It is normally not necessary to write hash functions and test functions. The convenience functions TutHashCreateInt, TutHashCreatePtr, TutHashCreateStr, and TutHashCreateStri all create hash tables with predefined hash and test functions.

Hash tables do not manage storage for their keys and values. It is up to the caller to manage the memory allocation of the data structures for the keys and values.

Caution

If init_size is too small, then inserting new entries into the hash table causes the hash table to be resized often.

See Also

TutHashCreateInt, TutHashCreatePtr, TutHashCreateStr, TutHashCreateStri, TutHashDestroy

Examples

This example creates a simple hash function for character strings:

T_INT4 hash_string(key) 
T_PTR key; 
{ 
  T_INT4 hash_code = 0; 
  T_STR str_key = (T_STR)key;  
  while (*str_key != ’\0’) { 
    hash_code += *str_key; 
    str_key++; /* move on to next character */ 
   }  
   return hash_code; 
} /* hash_string */ 

This example is a simple test function for character strings.

T_BOOL test_string(key1, key2) 
T_PTR key1; 
T_PTR key2; 
{ 
  /* return TRUE if the keys are identical */ 
  return (strcmp((T_STR)key1, (T_STR)key2) == 0); 
} /* test_string */ 

This example creates a hash table using those functions with enough room initially for 4 entries.

T_HASH_TABLE hash_table; 
hash_table = TutHashCreate(4, hash_string, test_string); 

Once the hash table is created, you can store values using TutHashInsert and look them up with TutHashLookup.

/* use the above hash table to map the directions to integers */ 
TutHashInsert(hash_table, "NORTH", (T_PTR)1); 
TutHashInsert(hash_table, "EAST", (T_PTR)2); 
TutHashInsert(hash_table, "SOUTH", (T_PTR)3); 
TutHashInsert(hash_table, "WEST", (T_PTR)4); 
/* look up "EAST" */ 
TutOut("the value for EAST is %d\n",  
      (T_INT4)TutHashLookup(hash_table, "EAST")); 

TIBCO SmartSockets™ Utilities
Software Release 6.8, July 2006
Copyright © TIBCO Software Inc. All rights reserved
www.tibco.com