Controlling Your Cloud with Puppet

Introduction

If you haven't noticed already, we are getting pretty excited about cloud computing at Evolving Web. Our latest deployment included five Rackspace Cloud servers controlled by Puppet, which gave the client the ability to scale up and down easily depending on traffic. Scaling up involves spinning up a new server instance and configuring it for its role. But what is the best way to configure it? By hand? With scripts? With images? Technically any of those methods will work, but I want to tell you about one you may not have heard of.

Configuration management tools are becoming more and more important when working in the cloud. Configuration is done programmatically and in one place. For example, to change your PHP memory_limit you edit only one file, and that change propagates to every machine that uses that config file. This is just a simple example of what configuration management tools can do and how they can simplify your sysadmin tasks.

Puppet

Puppet is an open-source configuration management tool that is becoming quite popular. It is already used by companies like Google, Twitter and Red Hat, and has a very active community behind it.

I started looking into Puppet soon after playing with Amazon's EC2 cloud service. EC2 gives you access to hundreds of ready-to-go server images that you can boot up at will, but I couldn't find one that fit my configuration needs perfectly. This is when Puppet came to my rescue: within a few days I could take a fresh Ubuntu 9.10 EC2 instance and turn it into a fully configured LAMP stack. Puppet has a bit of a learning curve. I'll cover some of the basics here, but the best way to learn is by looking at examples, so I've listed the links I found most helpful at the bottom of this post.

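At the lowest level, Puppet manages systems through resources. Examples include packages, files and directories, mounts, services, and commands to execute; a file's contents can come from a static file on the Puppet server or be generated from a template. You can even define your own resource types, either in the Puppet language itself or, with a bit of Ruby, as native types. The next step up from resources is the class: a class groups multiple resources, usually along with some logic to make your setup more robust. More detail about the Puppet language can be found in the Puppet documentation linked at the bottom of this post.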

Now for some examples. First is my vim class.

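class vim {
  package { "vim":
    ensure => latest,
  }

  file { "/etc/vim/vimrc":
    owner   => root,
    group   => root,
    mode    => "0644",                         # quoted: newer Puppet rejects bare numeric modes
    source  => "puppet:///modules/vim/vimrc",  # served from the vim module's files/ directory
    require => Package["vim"],                 # install vim before managing its config
  }

  # Make vim the system default editor; skipped when it already is.
  # exec needs a search path (or a fully qualified command) to find binaries.
  exec { "update-alternatives --set editor /usr/bin/vim.basic":
    path    => ["/usr/bin", "/usr/sbin", "/bin"],
    unless  => "test /etc/alternatives/editor -ef /usr/bin/vim.basic",
    require => Package["vim"],                 # vim.basic only exists once vim is installed
  }
}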

This is a pretty simple class, but still very powerful. Any machine that includes it will have the latest version of vim installed, vim set as the default editor and, most importantly, the /etc/vim/vimrc config file pulled in from the Puppet server. The require => Package["vim"] line makes sure vim is installed before Puppet manages its config file. Now you can edit this file on the Puppet server, and every machine will pick up the change the next time its Puppet client checks in.
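A changed vimrc reaches every machine, while an unchanged one is left alone, because a managed file is only rewritten when its content differs from what the Puppet server says it should be. The Ruby sketch below models that check. converge_file is a hypothetical helper, not part of Puppet's API, and comparing SHA-256 checksums is an assumption made for illustration.

```ruby
require "digest/sha2"
require "fileutils"
require "tmpdir"

# Hypothetical helper modelling how a managed file converges:
# rewrite the file only when its checksum differs from the desired content.
# Returns true if the file was changed, false if it was already in sync.
def converge_file(path, desired_content)
  desired_sum = Digest::SHA256.hexdigest(desired_content)
  current_sum = File.exist?(path) ? Digest::SHA256.file(path).hexdigest : nil
  return false if current_sum == desired_sum

  FileUtils.mkdir_p(File.dirname(path))
  File.write(path, desired_content)
  true
end

Dir.mktmpdir do |dir|
  vimrc   = File.join(dir, "etc/vim/vimrc")
  desired = "syntax on\nset background=dark\n"

  puts converge_file(vimrc, desired)      # first run creates the file
  puts converge_file(vimrc, desired)      # already in sync, nothing written
  File.write(vimrc, "set nocompatible\n") # simulate a local edit on the machine
  puts converge_file(vimrc, desired)      # local edit is overwritten
end
```

Running it prints true, false and true. The first call creates the file, the second finds it already in sync, and the third restores the desired content after the simulated local edit, which mirrors how Puppet undoes local changes to files it controls.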

Being able to change a single config file and watch the change roll out to all of your machines is pretty powerful. What's even more powerful is the ability to template config files. Puppet renders templates with Ruby's ERB templating system. Below is a snippet from our GlusterFS client config file (GlusterFS is a powerful distributed file system).

<% servers.each do |server| -%>
volume vol-<%= servers.index(server) %>
  type protocol/client
  option transport-type tcp
  option remote-host <%= server %>
  option transport.socket.nodelay on
  option remote-port 6996
  option remote-subvolume iothreads
  option username <%= username %>
  option password <%= password %>
end-volume
<% end -%>

volume mirror-0
  type cluster/replicate
  subvolumes <%= servers.collect { |x| "vol-#{servers.index(x)}" }.join(" ") %>
  <% if servers.index(hostname) -%>
  option read-subvolume vol-<%= servers.index(hostname) %>
  <% end -%>
end-volume

Without going into too much detail about how GlusterFS works, the client configuration requires you to list every server you are going to connect to. In our Puppet config we set an array of server hostnames, along with a few other variables, and the template reads them when Puppet renders it; the result is stored on the client machine. The hostname variable is the Facter fact for the machine being configured, so a client that is also one of the servers prefers to read from its own copy. That means if we ever add a GlusterFS server, we just edit the array, and the rebuilt client config is distributed to all of our machines!
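Because Puppet templates are plain ERB, you can preview one outside Puppet with Ruby's standard library. The sketch below, assuming Ruby 2.6 or newer, renders a trimmed version of the client template (a few options dropped) using the "-" trim mode that the -%> tags rely on. The hostnames and credentials are hypothetical; in Puppet the values would come from your manifests and from facts such as hostname.

```ruby
require "erb"

# Trimmed copy of the GlusterFS client template.
# Single-quoted heredoc so Ruby does not interpolate the #{...} inside it.
TEMPLATE = <<~'VOL'
  <% servers.each do |server| -%>
  volume vol-<%= servers.index(server) %>
    type protocol/client
    option transport-type tcp
    option remote-host <%= server %>
    option username <%= username %>
    option password <%= password %>
  end-volume
  <% end -%>

  volume mirror-0
    type cluster/replicate
    subvolumes <%= servers.collect { |x| "vol-#{servers.index(x)}" }.join(" ") %>
  <% if servers.index(hostname) -%>
    option read-subvolume vol-<%= servers.index(hostname) %>
  <% end -%>
  end-volume
VOL

# Render the template with the given values exposed as local variables,
# the same names the template refers to.
def render_client_vol(servers:, hostname:, username:, password:)
  erb = ERB.new(TEMPLATE, trim_mode: "-")
  erb.result_with_hash(servers: servers, hostname: hostname,
                       username: username, password: password)
end

# Hypothetical servers and credentials, rendered for the machine "gfs02".
puts render_client_vol(servers: %w[gfs01 gfs02], hostname: "gfs02",
                       username: "gluster", password: "secret")
```

Note that Ruby treats 0 as true, so the if servers.index(hostname) test still emits a read-subvolume line when the client is the first server in the list; only a hostname missing from the array (where index returns nil) skips it.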

So how do you decide which machine gets which configuration? As before, I'll show some examples.

node default {
  include git
  include subversion
  include sudo
  include ssh::client
  include ssh::server
  include vim

  package { ["htop", "emacs", "sysv-rc-conf", "python-software-properties", "rsync"]: ensure => latest }
}

First I created my default node. Puppet applies it to any node that doesn't match a more specific node definition, and the other node definitions below build on it through inheritance. It includes the basics I want on all my servers, such as git, subversion, sudo, the ssh server and client, and vim.

node webserver inherits default {
  include apache
  apache::module { ["rewrite", "headers"]: }
  include php::five
  include drupal::drush
  include mysql::client
}

Next I defined a webserver node. It installs Apache, enables the rewrite and headers Apache modules, and installs PHP 5, Drush and the MySQL client. Because it inherits from default, it also gets everything in the default node.

node 'web01', 'web02', 'web03' inherits webserver {
  # more configuration code here if needed
}

Now I can assign the webserver role to web01, web02 and web03. When any of these machines contacts the Puppet master, it receives the combination of the default and webserver configurations. I can also be fairly confident that all three machines will have identical configurations, at least for everything Puppet manages: if a local change is made to a configuration file that Puppet controls, it will be replaced the next time the Puppet client checks in with the master.
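How these node definitions combine can be modelled in a few lines of Ruby. This is a simplified, hypothetical model rather than Puppet's own implementation: it matches node names exactly, falls back to default when nothing matches, walks the inherits chain from parent to child, and tracks only the included classes (the package list and the apache::module declarations are left out).

```ruby
# Hypothetical model of a node definition: its parent (or nil) and included classes.
NodeDef = Struct.new(:parent, :classes)

NODES = {
  "default"   => NodeDef.new(nil, %w[git subversion sudo ssh::client ssh::server vim]),
  "webserver" => NodeDef.new("default", %w[apache php::five drupal::drush mysql::client]),
  "web01"     => NodeDef.new("webserver", []),
  "web02"     => NodeDef.new("webserver", []),
  "web03"     => NodeDef.new("webserver", []),
}.freeze

# Resolve the classes a host receives: pick its node definition (or default),
# then collect classes along the inherits chain, ancestors first.
def classes_for(hostname, nodes = NODES)
  name  = nodes.key?(hostname) ? hostname : "default"
  chain = []
  while name
    node = nodes.fetch(name)
    chain.unshift(node) # prepend so inherited classes come before the child's own
    name = node.parent
  end
  chain.flat_map(&:classes).uniq
end

p classes_for("web02") # default classes followed by the webserver classes
p classes_for("db01")  # no matching definition, so only the default classes
```

Because web01, web02 and web03 resolve through the same chain, they end up with the same class list, which is the property the paragraph above relies on.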

Conclusion

Puppet is just one of many configuration management tools, so I recommend trying out as many as you can and seeing which one works best for you. Chef is a relatively new tool that shares many similarities with Puppet but also has some interesting features of its own. I will be going over Chef in an upcoming blog post, but until then, try out Puppet and see what it can do for your infrastructure!

Links

Puppet documentation

Example Puppet files