Buildbot at SysGrove
What is buildbot
BuildBot (http://buildbot.net/) is a framework created to simplify the development of automation solutions in the field of software development. Unlike other systems such as Jenkins (http://jenkins-ci.org/) or TravisCI (https://travis-ci.org/) buildbot only provides a framework on which to build the infrastructure that you need . The project has client-server architecture where the server sends to the client the work to be performed, the server is called master and the clients are called slaves; the framework supports multiple masters and multiple slaves, in this way the system architecture can easily grow with the needs of your project. Buildbot is available for virtually any platform, even if the current version 0.8.x platform support win32-64 is not exactly the best though still usable with some changes (see http://trac.buildbot.net/wiki/RunningBuildbotOnWindows )
Buildbot at sysgrove
For our development needs in SysGrove( http://sysgrove.com) buildbot is used to generate the release of our application for the win32 platform. Every 12 hours we verify that there are some changes in our repository (https://bitbucket.org/sysgrove/sysgrove) and in this case, the following operations are performed: - Download the project from the repository (we use git, but the system supports hg, svn, bazaar etc ...) - Launch the test suite (in this case we use the testing tool discovery nosetests) - Generate the executable using py2exe - Build the initial database (sqlite and postgres) - It creates the archive with the application executable and test database(s) - Load the new release on "Confluence"
Clearly doing all these steps manually is tedious and can beprone to errors or oversights that would decrease the quality of our product.
As mentioned at the beginning we (http://sysgrove.com/) need to create an executable for the platform win32 (and possibly in the future for linux and osx) for which we need to run our script inside a win32 OS; given the relative difficulty of configuring a master on windows it was decided to have the master in a linux host (debian jessie) and a slave in a virtual machine with windows 7 ™ since that the slaves give far fewer problems .
The configuration file, as with cfg extension, is a simple python script, you can use this file to configure these main areas: - Slaves: you can define how many slaves you want each with its own identity - The scheduler: the scheduler indicate how and when the activities will be planned by the master - The builder: indicate the steps to take to create a build - Notifiers: the master can communicate in many ways the status of the build such as irc, email or web - Various configurations such as the database to use to store the state of the system, the identity of the project (name, url) etc.
As I said earlier I had some problems with windows, probably due to my inexperience with the version of python for this operating system. One of the first difficulty concerns sqlalchemy, an ORM for python; buildbot requires in fact a version of sqlalchemy == 0.7.10 as our application requires a newer version. Typically, such a problem is solved easily using the handy tool virtualenv (https://pypi.python.org/pypi/virtualenv) creating an environment for the master (the part that needs access to a db, sqlite by default) and one environment for the slave, but given my experience with windows, I could not separate the environments (which is why the master is on a debian host and the slave is a on windows VM).
Once you understand the general concepts of buildbot, extend and customize the functionality becomes simply a work of configuring and creating scripts that do the real work, besides the web interface by default enables us to keep track of our builds and forcing new if it were needed.
Clearly this is not a "point and click" tool to everyone and this is a good thing, in my humble opinion and responsibility for the maintenance of such a system requires knowledge of a typical IT department and not a manager who, if desired, can always send us an email to request a build rather than messing around with his hands no expert ;)
Exploring Distributed Calculation With RabbitMQ
Lately I’ve had to work on a complex web application which have started to have an increased (and increasing) number of users. Unfortunately this web application was not built to scale and so problems started to get to the surface. When the user base was small to medium the load on the application server was fairly low and we could focus on adding new features and growing our user base until we landed on the Facebook platform. At some point users started to came and the scheduled task took 10-20-40-70-90 minutues to complete leaving the users to stare at the “calculating tasks” page.
The first approved (the correct word should be "imposed") solution was to migrate our servers to AWS so we could increase host perfomance using bigger and bigger servers until we reached the limits of the single machine again.
After a lot of battling with the management we had the opportunity to detach the application core from the web site and started to build the new architecture with these loosely defined requirements:
better use of resources
better code organization and quality
ready for future development
I admit that those where my requirements, but the management undestood that they have lost the ability to understand the architecture beyhond a simple web site with a bounch of scheduled routines that, honestly, worked with the help of an infinite dose of luck…and also my prototype run in a fraction of time.
Our old backend has been broken in to those pieces:
libcore : the base code extracted from the old and stinky web site
RabbitMQ message broker
job scheduler : send messages to appropriate exchanges at appropriate time
workers : consume messages sent by the scheduler
Leaving libcore apart doing its stuff lets focus on the scheduler and the workers; for details about RabbitMQ please refer to its web site (http://www.rabbitmq.com/).
As a scheduling library I have choosen the open source Quartz.NET (http://quartznet.sourceforge.net/) which provided a simple but powerful interface to schedule jobs, I especially liked the ability to specify job triggers with the well known and compact cron syntax. Long story short the scheduled jobs harvests the tasks and send messages to exchanges accordingly. I think that the whole core plus the jobs are not bigger than some hundreds of lines of code. Wonderful!
In order to create a worker one should simply have to inherit from a BaseConsumer class and override the "consume" method to do its tasks when a message is received. Done. Cool :)
Given this simple architecture what are the benefits? Lets see:
scaling: I simply have to add more workers in the same or different machines and the calculating power will increase accordingly
flexibility: if a new task is required I have to create a new scheduled job and its consumer (even in a different language or platform) and maybe a new message type (messages are simple JSON described objects); after all is tested I have to update the scheduler and add a new worker either by updating a running worker or adding a new one without affecting the whole system; yes it can be done while the system is crunching its task and yes, obviously without affecting the web site(s)
cost effective: if we have a lot of task to complete at a certain hour in the day I could simply turn on a couple of more machines when I need it and turn those off when the whole work is completed; if you have ever used the Amazon AWS infrastructure you understand of much one can save using this approach
testing: having everything well separated I can test each block individually instead of having a code blob that is barely manageable
With this short post I hope to increase interest in this simple way of doing distribuited computing using a message broker and a simple but well architected system.