Beautiful Code [219]
This comfortable scaling model, unfortunately, was found to be completely false when it came to one class of "enterprise" system, the s390 mainframe computer. This computer could run Linux in a virtual partition (up to 1,024 instances at the same time on a single machine) and had a huge number of different storage devices connected to it. Overall, the system had a lot of memory, but each virtual partition would have only a small slice of that memory. Each virtual partition wanted to see all different storage devices (20,000 could be typical), while only being allocated a few hundred megabytes of RAM.
On these systems, the device tree was quickly found to suck up a huge percentage of memory that was never released back to the user processes. It was time to put the driver model on a diet, and some very smart IBM kernel developers went to work on the problem.
What the developers found was initially surprising. It turned out that the main struct device structure was only around 160 bytes (for a 32-bit processor). With 20,000 devices in the system, that amounted to only 3 to 4 MB of RAM being used, a very manageable usage of memory. The big memory hog was the RAM-based filesystem mentioned earlier, sysfs, which showed all of these devices to userspace. For every device, sysfs created both a struct inode and a struct dentry. These are both fairly heavy structures, with the struct inode weighing in around 256 bytes and struct dentry about 140 bytes.[§]
[§] Both of these structures have since been shrunk, and therefore are smaller in current kernel versions.
For every struct device, at least one struct dentry and one struct inode were being created. Generally, many different copies of these filesystem structures were created, one for every virtual file per device in the system. As an example, a single block device would create about 10 different virtual files, so that meant that a single structure of 160 bytes would end up using 4 KB. In a system of 20,000 devices, about 80 MB were wasted on the virtual filesystem. This memory was consumed by the kernel, unable to be used by any user programs, even if they never wanted to look at the information stored in sysfs.
The solution for this was to rewrite the sysfs code to put these struct inode and struct dentry structures in the kernel's caches, creating them on the fly when the filesystem was accessed. The solution was just a matter of dynamically creating the directories and files on the fly as a user walked through the tree, instead of preallocating everything when the device was originally created. Because these structures are in the main caches of the kernel, if memory pressure is placed on the system by userspace programs, or other parts of the kernel, the caches are freed and the memory is returned to those who need it at the time. This was all done by touching the backend sysfs code, and not the main struct device structures.
The Linux Kernel Driver Model: The Benefits of Working Together > Small Objects Loosely Joined
16.4. Small Objects Loosely Joined
The Linux driver model shows how the C language can be used to create a heavily object-oriented model of code by creating numerous small objects, all doing one thing well. These objects can be embedded in other objects, all without any runtime type identification, creating a very powerful and flexible object tree. And based on the real-world usage of these objects, their memory footprint is minimal, allowing the Linux kernel to be flexible enough to work from the same code base for tiny embedded systems up to the largest supercomputers in the world.
The development of this model also shows two very interesting and powerful aspects of the way Linux kernel development works.