xuyan.nus: Hash Algorithms

Hash (Design Skills) should be considered firstly in case of Insert and Query.

Instead of scanning to find a element, in Hash, a Hash function is designed to map the key directly to an integer (or an address to find the element), so that the cost turns to be O(1).

The three key features:
1) Hash Function
2) Collision Process
3) Dynamic Internal Array Size

Basic:
1) Hash Function (Based on Uniform Distribution)
    1.1: distribute the keys uniformly into the slots of the table
    1.2: Example: h(k) = k mod m.
           Pick m to be a prime not too close to a power of 2 or 10.
   Purpose: keys are likely even dirtributed.
   Example m: 77813

    1.3: Example string hash function:

        int address = key[0] + 31 * key[1] + 137 * key[2] + 1571 * key[3] + 
              11047 * key[4] + 77813 * key[5];
        return address % kNumBuckets;

2) Collision Process
    2.1: Open Address:
          Key Idea:
          Two parameters (key and index)
          Problem:
          Deletion
Hash Load factor:
Given an open-addressed hash table with load factor α = n/m < 1, the
expected number of probes in an unsuccessful search is at most 1/(1–α).

    2.2: Linked List

3) Dynamic Array Size
if the internal array (s) is too small, you will have to do n/s operations for a lookup, which is still O(n). So you need to dynamically expand that internal array as the number of items in the hash table grows in order to keep the O(1) performance.

Implementation:
1: C++ std::map:
Very wired, the underlying data structure is a binary search tree. (a red-black tree).
WHY? ordered iterator. Do we need that? In most cases, no!!!

2: unordered_map:
The underlying data structure is a Hash Function.
unordered_map will be faster on insert and delete than a map.

xuyan.nus

Wednesday, 30 April 2014

Hash Algorithms

No comments:

Post a Comment