Pulling Strings With Puppet - James Turnbull [6]
A Transactional Layer
Puppet's transactional layer is the engine of the Puppet client-server deployment. Configurations are created and can be applied repeatedly on the target hosts. The Puppet application architecture describes this sort of configuration as idempotent, meaning multiple applications of the same operation will yield the same result.
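As a sketch of what idempotence means in practice, the hypothetical resource below leaves the node in the same state no matter how many times it is applied:

```puppet
# A hypothetical resource: applying this once or a hundred times
# produces the same end state -- the file exists with this owner
# and these permissions -- rather than repeating an action each run.
file { "/etc/motd":
    owner => "root",
    group => "root",
    mode  => 644,
}
```

If the file already matches this description, Puppet makes no changes on subsequent runs.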
Puppet is not fully transactional: your transactions aren't logged (other than informative logging), so you can't roll back transactions as you can with many databases. You can, however, run transactions in a noop (no-operation) mode that allows you to test the execution of your changes without actually making any changes.
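As a sketch, noop can also be set on an individual resource with the noop metaparameter (the resource shown here is hypothetical); Puppet then reports what it would have changed without changing it:

```puppet
# Hypothetical example: with noop set, Puppet only reports that
# this file's ownership and mode would have been corrected,
# without actually modifying the file.
file { "/etc/sudoers":
    owner => "root",
    group => "root",
    mode  => 440,
    noop  => true,
}
```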
A Resource Abstraction Layer
Lastly, Puppet provides an abstraction layer between your platform and the description of your configuration. The resources defined in Puppet to configure your nodes are independent of the commands, formats, and syntax required to configure those resources locally on your nodes. Whichever of the many supported platforms you want to create a user on, Puppet considers the definition of that user to be identical.
Puppet performs this abstraction using providers. Providers are implementations of resource types. In the provider model, a resource is defined in Puppet and set to be applied on a node or nodes. Puppet then detects the platform of each node, and the appropriate provider for that platform is called to actually implement the configuration on the node. For example, when creating a user, we define a user type resource in Puppet and tell Puppet that we want to create that user on all Mac OS X and OpenBSD nodes. When an OS X node connects, the OS X user provider is called and the user is created; when an OpenBSD node connects, the OpenBSD user provider is called instead. Some platforms share providers: managing files, for example, is similar, if not identical, on many platforms. Other resource types, such as the package resource type, which installs and manages software packages, have numerous providers, because package management differs across platforms.
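As a sketch, the hypothetical user resource below is written once and is identical regardless of platform; the platform-specific user providers translate it into the appropriate local commands on each node:

```puppet
# A hypothetical user resource: the same definition applies on
# both Mac OS X and OpenBSD nodes. Puppet's per-platform user
# providers handle the local implementation details.
user { "jamest":
    ensure => present,
    uid    => 1001,
    shell  => "/bin/sh",
    home   => "/home/jamest",
}
```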
Puppet Performance and Hardware
Understanding of Puppet's scalability and performance is still immature at the time of writing. There are two facets to performance management: the number of nodes connected and the amount of configuration defined on each node. There are no clear-cut guidelines for how many nodes, and how much configuration on each, can be supported by a single master server, or for the scale and capacity of hardware required to run the master. Anecdotal evidence suggests that 50 to 100 nodes with a moderate amount of configuration can be managed by a single-CPU master with 2GB of RAM. More nodes will obviously require scaled-up hardware.
Internally, Puppet uses the WEBrick web server to interface with clients. WEBrick has performance limitations and hence does not provide a fully scalable solution. As an alternative, Puppet can instead make use of the more scalable Mongrel web server. A load balancer, such as Apache with mod_proxy or Pound, is then placed in front of Puppet, allowing the use of multiple load-balanced Puppet master instances and a more scalable solution. Generally, WEBrick no longer performs adequately once you are managing 50 or more nodes, and migration to Mongrel will probably be needed.
Note: In Chapter 6, I'll demonstrate how to replace WEBrick with Mongrel.
Sites have reported that when running Puppet with load balancing and Mongrel, node volumes of 5000 or more are feasible with appropriate hardware.
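As a hedged sketch, on Puppet versions of this era the master's web server could be selected with the servertype configuration option (the exact option and section name may vary by version), with a load balancer placed in front of the Mongrel instances:

```ini
# puppet.conf on the master -- a sketch: servertype selects
# Mongrel instead of the default WEBrick. Multiple masters
# configured this way would then sit behind a load balancer
# such as Apache with mod_proxy or Pound.
[puppetmasterd]
servertype = mongrel
```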
The Future for Puppet
Lastly, it is very important to remember that Puppet is a young tool and is still in the midst of development and change. The Puppet community is growing quickly, and many new ideas, developments, patches, and recipes appear every