Online Book Reader

Home Category

Beautiful Code [228]

By Root 5100 0
against the overhead it adds in space or calculation time. There were places where the Python developers found that a relatively naïve implementation was better in the long run than an extra optimization that seemed more appealing at first. In short, it often pays to keep things simple.

Second, real-life benchmarking is critical; only that way can you discover what's really worth doing.

18.1. Inside the Dictionary

Dictionaries are represented by a C structure, PyDictObject, defined in Include/dictobject.h. Here's a schematic of the structure representing a small dictionary mapping "aa", "bb", "cc", …, "mm" to the integers 1 to 13:

Code View: Scroll / Show All

int ma_fill 13

int ma_used 13

int ma_mask 31

PyDictEntry ma_table[]:

[0]: aa, 1 hash(aa) == -1549758592, -1549758592 & 31 = 0

[1]: ii, 9 hash(ii) == -1500461680, -1500461680 & 31 = 16

[2]: null, null

[3]: null, null

[4]: null, null

[5]: jj, 10 hash(jj) == 653184214, 653184214 & 31 = 22

[6]: bb, 2 hash(bb) == 603887302, 603887302 & 31 = 6

[7]: null, null

[8]: cc, 3 hash(cc) == -1537434360, -1537434360 & 31 = 8

[9]: null, null

[10]: dd, 4 hash(dd) == 616211530, 616211530 & 31 = 10

[11]: null, null

[12]: null, null

[13]: null, null

[14]: null, null

[15]: null, null

[16]: gg, 7 hash(gg) == -1512785904, -1512785904 & 31 = 16

[17]: ee, 5 hash(ee) == -1525110136, -1525110136 & 31 = 8

[18]: hh, 8 hash(hh) == 640859986, 640859986 & 31 = 18

[19]: null, null

[20]: null, null

[21]: kk, 11 hash(kk) == -1488137240, -1488137240 & 31 = 8

[22]: ff, 6 hash(ff) == 628535766, 628535766 & 31 = 22

[23]: null, null

[24]: null, null

[25]: null, null

[26]: null, null

[27]: null, null

[28]: null, null

[29]: ll, 12 hash(ll) == 665508394, 665508394 & 31 = 10

[30]: mm, 13 hash(mm) == -1475813016, -1475813016 & 31 = 8

[31]: null, null

The ma_prefix in the field names comes from the word mapping, Python's term for data types that provide key/value lookups. The fields in the structure are:

ma_used

Number of slots occupied by keys (in this case, 13).

ma_fill

Number of slots occupied by keys or by dummy entries (also 13).

ma_mask

Bitmask representing the size of the hash table. The hash table contains ma_mask+1 slots—in this case, 32. The number of slots in the table is always a power of 2, so this value is always of the form 2n–1 for some n, and therefore consists of n set bits.

ma_table

Pointer to an array of PyDictEntry structures. PyDictEntry contains pointers to:

The key object

The value object

A cached copy of the key's hash code

The hash value is cached for the sake of speed. When searching for a key, the exact hash values can be quickly compared before performing a slower, full equality comparison of the keys. Resizing a dictionary also requires the hash value for each key, so caching the value saves having to rehash all the keys when resizing.

We don't keep track directly of the number of slots in the table, but derive it instead as needed from ma_mask. When looking up the entry for a key, slot = hash & mask is used to figure out the initial slot for a particular hash value. For instance, the hash function for the first entry generated a hash of –1549758592, and –1549758592 mod 31 is 0, so the entry is stored in slot 0.

Because the mask is needed so often, we store it instead of the number of slots. It's easy to calculate the number of slots by adding 1, and we never need to do so in the most speed-critical sections of code.

ma_fill and ma_used are updated as objects are added and deleted. ma_used is the number of keys present in the dictionary; adding a new key increases it by 1, and deleting a key decreases it by 1. To delete a key, we make the appropriate slot point to a dummy key; ma_fill therefore remains the same when a key is deleted, but may increase by 1 when a new key is added. (ma_fill is never decremented, but will be given a new value when a dictionary is resized.)

Python's Dictionary Implementation: Being All Things to All People > Special Accommodations

18.2.

Return Main Page Previous Page Next Page

®Online Book Reader