Mastering Algorithms With C - Kyle Loudon [64]
A: When inserting a member into a set, in which members may not be duplicated, we must search the entire set to ensure that we do not duplicate a member. This is an O (n) process. Removing a member from a set is O (n) as well because we may have to search the entire set again. In a multiset, inserting a member is considerably more efficient because we do not have to traverse the members looking for duplicates. Therefore, we can insert the new member in O (1) time. In a multiset, removing a member remains an O (n) process because we still must search for the member we want to remove.
Related Topics
Venn diagrams
Graphical representations of sets that help determine the results of set operations visually. For example, a Venn diagram depicting two intersecting sets consists of two slightly overlapping circles. The overlapping regions represent the intersection of the sets.
Bit-vector representation
A representation for sets useful when the universe is small and known. Each member in the universe is represented as a bit in an array. If a member exists in the set, its bit is set to 1; otherwise, its bit is set to 0.
Multisets
Sets in which members may be duplicated. In some problems the restriction of no duplicate members is too strict. A multiset is an alternative type of set for these problems.
Chapter 8. Hash Tables
Hash tables support one of the most efficient types of searching: hashing. Fundamentally, a hash table consists of an array in which data is accessed via a special index called a key. The primary idea behind a hash table is to establish a mapping between the set of all possible keys and positions in the array using a hash function. A hash function accepts a key and returns its hash coding, or hash value. Keys vary in type, but hash codings are always integers.
Since both computing a hash value and indexing into an array can be performed in constant time, the beauty of hashing is that we can use it to perform constant-time searches. When a hash function can guarantee that no two keys will generate the same hash coding, the resulting hash table is said to be directly addressed. This is ideal, but direct addressing is rarely possible in practice. For example, imagine a phone-mail system in which eight-character names are hashed to find messages for users in the system. If we were to rely on direct addressing, the hash table would contain more than 268 = (2.09)1011 entries, and the majority would be unused since most character combinations are not names.
Typically, the number of entries in a hash table is small relative to the universe of possible keys. Consequently, most hash functions map some keys to the same position in the table. When two keys map to the same position, they collide. A good hash function minimizes collisions, but we must still be prepared to deal with them. This chapter presents two types of hash tables that resolve collisions in different ways.
This chapter covers:
Chained hash tables
Hash tables that store data in buckets . Each bucket is a linked list that can grow as large as necessary to accommodate collisions.
Open-addressed hash tables
Hash tables that store data in the table itself instead of in buckets. Collisions are resolved using various methods of probing the table.
Selecting a hash function
The crux of hashing. By distributing keys in a random manner about the table, collisions are minimized. Thus, it is important to select a hash function that accomplishes this.
Collision resolution
Methods of managing when several keys map to the same index. Chained hash tables have an inherent way to resolve collisions. Open-addressed hash tables use various forms of probing.
Some applications of hash tables are:
Database systems
Specifically, those that require efficient random access. Generally, database systems try to optimize between two types of access methods: sequential and random. Hash tables are an important part of efficient random access because they