Beautiful Code [47]
>>> obj = MyClass() # Create a class instance
>>> obj.name = 'object' # Add a .name attribute
>>> obj.id = 14 # Add a .id attribute
>>> obj._ _dict_ _ # Retrieve the underlying dictionary
{'name': 'object', 'id': 14}
>>> obj._ _dict_ _['id'] = 12 # Store a new value in the dictionary
>>> obj.id # Attribute is changed accordingly
12
Module contents are also represented as a dictionary, most notably the _ _builtin_ _ module that contains built-in identifiers such as int and open. Any expression that uses such built-ins will therefore result in a few dictionary lookups. Another use of dictionaries is to pass keyword arguments to a function, so a dictionary could potentially be created and destroyed on every function call. This internal use of the dictionary type means that any running Python program has many dictionaries active at the same time, even if the user's program code doesn't explicitly use a dictionary. It's therefore important that dictionaries can be created and destroyed quickly and not use an overly large amount of memory.
The implementation of dictionaries in Python teaches several lessons about performance-critical code. First, one has to trade off the advantages of an optimization against the overhead it adds in space or calculation time. There were places where the Python developers found that a relatively naïve implementation was better in the long run than an extra optimization that seemed more appealing at first. In short, it often pays to keep things simple.
Second, real-life benchmarking is critical; only that way can you discover what's really worth doing.
18.1. Inside the Dictionary
Dictionaries are represented by a C structure, PyDictObject, defined in Include/dictobject.h. Here's a schematic of the structure representing a small dictionary mapping "aa", "bb", "cc", …, "mm" to the integers 1 to 13:
Code View: Scroll / Show All
int ma_fill 13
int ma_used 13
int ma_mask 31
PyDictEntry ma_table[]:
[0]: aa, 1 hash(aa) == -1549758592, -1549758592 & 31 = 0
[1]: ii, 9 hash(ii) == -1500461680, -1500461680 & 31 = 16
[2]: null, null
[3]: null, null
[4]: null, null
[5]: jj, 10 hash(jj) == 653184214, 653184214 & 31 = 22
[6]: bb, 2 hash(bb) == 603887302, 603887302 & 31 = 6
[7]: null, null
[8]: cc, 3 hash(cc) == -1537434360, -1537434360 & 31 = 8
[9]: null, null
[10]: dd, 4 hash(dd) == 616211530, 616211530 & 31 = 10
[11]: null, null
[12]: null, null
[13]: null, null
[14]: null, null
[15]: null, null
[16]: gg, 7 hash(gg) == -1512785904, -1512785904 & 31 = 16
[17]: ee, 5 hash(ee) == -1525110136, -1525110136 & 31 = 8
[18]: hh, 8 hash(hh) == 640859986, 640859986 & 31 = 18
[19]: null, null
[20]: null, null
[21]: kk, 11 hash(kk) == -1488137240, -1488137240 & 31 = 8
[22]: ff, 6 hash(ff) == 628535766, 628535766 & 31 = 22
[23]: null, null
[24]: null, null
[25]: null, null
[26]: null, null
[27]: null, null
[28]: null, null
[29]: ll, 12 hash(ll) == 665508394, 665508394 & 31 = 10
[30]: mm, 13 hash(mm) == -1475813016, -1475813016 & 31 = 8
[31]: null, null
The ma_prefix in the field names comes from the word mapping, Python's term for data types that provide key/value lookups. The fields in the structure are:
ma_used
Number of slots occupied by keys (in this case, 13).
ma_fill
Number of slots occupied by keys or by dummy entries (also 13).
ma_mask
Bitmask representing the size of the hash table. The hash table contains ma_mask+1 slots—in this case, 32. The number of slots in the table is always a power of 2, so this value is always of the form 2n–1 for some n, and therefore consists of n set bits.
ma_table
Pointer to an array of PyDictEntry structures. PyDictEntry contains pointers to:
The key object
The value object
A cached copy of the key's hash code
The hash value is cached for the sake of speed. When searching for a key, the exact hash values can be quickly compared before performing a slower, full equality comparison of the keys. Resizing