Beautiful Code [229]
When trying to be all things to all people—a time- and memory-efficient data type for Python users, an internal data structure used as part of the interpreter's implementation, and a readable and maintainable code base for Python's developers—it's necessary to complicate a pure, theoretically elegant implementation with special-case code for particular cases… but not too much.
18.2.1. A Special-Case Optimization for Small Hashes
The PyDictObject also contains space for an eight-slot hash table. Small dictionaries with five elements or fewer can be stored in this table, saving the time cost of an extra malloc() call. This also improves cache locality; for example, PyDictObject structures occupy 124 bytes of space when using x86 GCC and therefore can fit into two 64-byte cache lines. The dictionaries used for keyword arguments most commonly have one to three keys, so this optimization helps improve function-call performance.
18.2.2. When Special-Casing Is Worth the Overhead
As previously explained, a single dictionary can contain keys of several different data types. In most Python programs, the dictionaries underlying class instances and modules have only strings as keys. It's natural to wonder whether a specialized dictionary object that only accepted strings as keys might provide benefits. Perhaps a special-case data type would be useful and make the interpreter run faster?
18.2.2.1. The Java implementation: another special-case optimization
In fact, there is a string-specialized dictionary type in Jython (http://www.jython.org), an implementation of Python in Java. Jython has an org.python.org.PyStringMap class used only for dictionaries in which all keys are strings; it is used for the _ _dict_ _ dictionary underpinning class instances and modules. Jython code that creates a dictionary for user code employs a different class, org.python.core.PyDictionary, a heavyweight object that uses a java.util.Hashtable to store its contents and does extra indirection to allow PyDictionary to be subclassed.
Python's language definition doesn't allow users to replace the internal_ _dict_ _ dictionaries by a different data type, making the overhead of supporting subclassing unnecessary. For Jython, having a specialized string-only dictionary type makes sense.
18.2.2.2. The C implementation: selecting the storage function dynamically
CPython does not have a specialized dictionary type, as Jython does. Instead, it employs a different trick: an individual dictionary uses a string-only function until a search for non-string data is requested, and then a more general function is used. The implementation is simple. PyDictObject contains one field, ma_lookup, that's a pointer to the function used to look up keys:
struct PyDictObject {
...
PyDictEntry *(*ma_lookup)(PyDictObject *mp, PyObject *key, long hash);
};
PyObject is the C structure that represents any Python data object, containing basic fields such as a reference count and a pointer to a type object. Specific types such as PyIntObject and PyStringObject extend the structure with additional fields as necessary. The dictionary implementation calls (dict->ma_lookup)(dict, key, hash) to find a key; key is a pointer to the PyObject representing the key, and hash is the hash value derived for the key.
ma_lookup is initially set to lookdict_string, a function that assumes that both the keys in the dictionary and the key being searched for are strings represented as Python's standard PyStringObject type. lookdict_string can therefore take a few shortcuts. One shortcut is that string-to-string comparisons never raise exceptions, so some unnecessary error checking can be skipped. Another is that there's no need to check for rich comparisons on the object; arbitrary Python data types can provide their own separate versions of <, >, <=, >=, ==, and !=, but the standard string type has no such special cases.
If a nonstring key is encountered, either because it's used as a dictionary key or the program makes an attempt to search for it, the ma_lookup