Distcc

Distributed C/C++ Compiler

What is Distcc?
Distributed C compiling makes it possible to use multiple machines to compile source code for a defined platform or architecture. The benefit is a faster compile time. It is especially useful when installing large packages on a slow machine with limited resources, such as OpenOffice on an Intel Pentium III. By distributing the work among two or more machines, the job can be completed in a measurably shorter time. As you can imagine, the more CPUs involved, the quicker the job is completed, up to the point where the overhead of distributing the jobs starts to outweigh the gain.

Distcc is not the compiler itself, but rather the vehicle used to communicate between the target machine, where the binary will run, and the volunteering machines. A distcc daemon running on each volunteering machine accepts the source code (provided the proper permissions have been granted), invokes an appropriate compiler, returns the resulting object file, and discards the source.
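The volunteer's role described above (check permission, compile, return the result, discard the source) can be modeled as a small Python sketch. This is a toy illustration, not distccd's actual code: the function names, the `job.i` file name and the use of `gcc -c` are assumptions, and the permission check only mimics the idea behind distccd's `--allow` option (a single address or a CIDR network).

```python
import ipaddress
import subprocess
import tempfile
from pathlib import Path
from typing import Callable, Sequence


def is_allowed(client_ip: str, allowed: Sequence[str]) -> bool:
    """Return True if client_ip lies in any allowed address or CIDR network."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(net, strict=False) for net in allowed)


def gcc_compile(src: Path) -> bytes:
    """Compile one preprocessed C file to an object file with gcc; return its bytes."""
    obj = src.with_suffix(".o")
    subprocess.run(["gcc", "-c", str(src), "-o", str(obj)], check=True)
    return obj.read_bytes()


def handle_job(
    client_ip: str,
    allowed: Sequence[str],
    source: str,
    compile_fn: Callable[[Path], bytes] = gcc_compile,
) -> bytes:
    """Toy volunteer: check permission, compile, return the object, discard the source."""
    if not is_allowed(client_ip, allowed):
        raise PermissionError(f"{client_ip} is not allowed to submit jobs")
    # The temporary directory (source and object file) is deleted when the
    # with-block exits, even though we return from inside it.
    with tempfile.TemporaryDirectory() as workdir:
        src = Path(workdir) / "job.i"  # .i tells gcc the C is already preprocessed
        src.write_text(source)
        return compile_fn(src)


if __name__ == "__main__":
    lab = ["192.168.0.0/24"]  # hypothetical lab subnet
    print(is_allowed("192.168.0.7", lab))  # True
    print(is_allowed("10.0.0.1", lab))     # False
```

Passing the compiler as a parameter keeps the sketch testable on machines without gcc; the real daemon runs whichever compiler the client names.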

How Does Distcc Work?
Distcc sends preprocessed source code over TCP (Transmission Control Protocol) port 3632 to designated machines on a network, where it is compiled, and then awaits the resulting object file. Each source file distributed is considered a 'job'. The number of jobs a machine can handle depends on the type and quantity of its processors; naturally, multi-core processors can handle more jobs than a single-core processor. The rule of thumb is two jobs per core involved, plus one. A typical machine has one CPU with one core and can therefore handle three jobs (2 x 1 core + 1); two such machines can handle five jobs (2 x 2 cores + 1); and so on. A computer lab such as ours, consisting of thirty single-core machines, could therefore run 61 jobs at once.
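The rule of thumb above is simple arithmetic over the total number of cores. A minimal sketch, assuming each list entry is the core count of one participating machine (the function name is hypothetical):

```python
def recommended_jobs(cores_per_machine: list[int]) -> int:
    """Rule of thumb from the text: two jobs per core across all machines, plus one."""
    return 2 * sum(cores_per_machine) + 1


if __name__ == "__main__":
    print(recommended_jobs([1]))       # one single-core machine -> 3
    print(recommended_jobs([1, 1]))    # two single-core machines -> 5
    print(recommended_jobs([1] * 30))  # thirty-machine lab -> 61
```

The resulting number is what would typically be handed to make as its parallel job count (for example `make -j5` for two single-core machines).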

Distcc can involve as many machines as desired, provided the overhead required to distribute the jobs is taken into consideration. The target machine manages the jobs issued to the other machines, and it also preprocesses every source file before sending it out, so it consumes resources that grow with the number of jobs involved. The theoretical limit for a single machine is not readily quantifiable, but limited machines can encounter problems when handling more than 20 jobs at a time and therefore could not use the full capacity of a distcc network the size of our computer lab.
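The bottleneck described above can be expressed as a cap: the usable parallelism is the smaller of the network's capacity and the number of jobs the target machine can manage. A minimal sketch, assuming the 20-job figure from the text as the client's limit (the function name and the idle-slot count are illustrative, not something distcc reports):

```python
def effective_jobs(cores_per_machine: list[int], client_limit: int) -> tuple[int, int]:
    """Return (jobs actually usable, network slots left idle).

    capacity follows the text's rule of thumb (2 x total cores + 1);
    client_limit is how many jobs the target machine can manage at once.
    """
    capacity = 2 * sum(cores_per_machine) + 1
    usable = min(capacity, client_limit)
    return usable, capacity - usable


if __name__ == "__main__":
    # Thirty single-core lab machines, client assumed to cope with 20 jobs.
    print(effective_jobs([1] * 30, 20))  # (20, 41)
    print(effective_jobs([1, 1], 20))    # (5, 0)
```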

Can Distcc Work Across Multiple Architectures?
Yes.  With the help of a cross-compiler, a Pentium machine can compile source code for use on a PowerPC or SPARC machine.  This does require additional software, but is easily achieved.
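Because distcc forwards the compiler name from the client to the volunteer, cross-compiling works when the client names a target-specific compiler and every volunteer has a compiler installed under that same name. A minimal sketch that builds such a command line, assuming GNU-style `<target>-gcc` naming; the target triplet and file names are hypothetical:

```python
def distcc_cross_command(target: str, source: str, output: str) -> list[str]:
    """Build a distcc command line that names a target-prefixed cross-compiler.

    Each volunteer must have a compiler installed under exactly this name
    (assumption: GNU-style '<target>-gcc' naming).
    """
    compiler = f"{target}-gcc"
    return ["distcc", compiler, "-c", source, "-o", output]


if __name__ == "__main__":
    # Hypothetical PowerPC target triplet; the build host could be a Pentium.
    print(" ".join(distcc_cross_command("powerpc-unknown-linux-gnu", "main.c", "main.o")))
    # distcc powerpc-unknown-linux-gnu-gcc -c main.c -o main.o
```

If a volunteer lacks the named cross-compiler, that job cannot be compiled there, which is why the additional software must be installed on every participating machine.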

My Project:
For my project, I chose to use Gentoo Linux because it compiles all software for each installation rather than installing pre-compiled binary packages. In theory, this yields a more efficient operating system, as each program installed is compiled specifically for your usage and architecture instead of generically for use across a broad spectrum of machines. Gentoo also has the added benefit of a package manager (emerge) that configures the software being installed for your machine and acquires the necessary dependencies when applicable. Because Gentoo targets a specific machine, initial setup of the operating system is more tedious, but rewarding, as you learn the intimate details of your configuration. It requires knowledge of the hardware installed on the machine and of any additional modules desired, so that you can properly configure the kernel and compile it for use. Gentoo requires the kernel to be compiled for each installation, as it does not include one for generic use.

Installing Gentoo begins with a typical live CD, available for multiple architectures and for either GUI or minimal command-line installation. I chose the minimal interface for maximum configurability and proceeded to configure and install Gentoo on my first machine (machine-1). Once installation was complete, I 'emerged' distcc and then began the installation process on an additional machine (machine-2) just as I had on the first. When it came time to compile the kernel, I 'emerged' distcc on the new machine with the --nodeps option to install only distcc without any dependencies. I then configured machine-1 to accept jobs from machine-2 via the distcc daemon, and configured machine-2 to send jobs to machine-1. Finally, I started top (a CPU/memory resource monitor) on machine-1 and proceeded to compile the kernel on machine-2.
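The two-machine setup above comes down to three pieces: the daemon command on machine-1, the host list on machine-2, and the kernel build command on machine-2. The sketch below only assembles those commands; it does not run them. The IP address and subnet are hypothetical, the `distccd --daemon --allow` and `make CC="distcc gcc"` forms are common usage rather than the exact commands used in this project, and the job count reuses the rule of thumb from earlier.

```python
from dataclasses import dataclass


@dataclass
class DistccPlan:
    daemon_argv: list[str]  # command to run on machine-1 (the volunteer)
    distcc_hosts: str       # DISTCC_HOSTS value for machine-2 (the client)
    make_argv: list[str]    # kernel build command to run on machine-2


def plan_two_machine_build(
    volunteer_ip: str, client_network: str, cores: tuple[int, ...] = (1, 1)
) -> DistccPlan:
    """Assemble the commands for a machine-1/machine-2 kernel build."""
    jobs = 2 * sum(cores) + 1  # rule of thumb: two jobs per core, plus one
    return DistccPlan(
        # Start distccd in the background, accepting jobs only from client_network.
        daemon_argv=["distccd", "--daemon", "--allow", client_network],
        # machine-1 listed first, then localhost so machine-2 also compiles some jobs.
        distcc_hosts=f"{volunteer_ip} localhost",
        # Route the kernel's compiler calls through distcc (one argv element).
        make_argv=["make", f"-j{jobs}", "CC=distcc gcc"],
    )


if __name__ == "__main__":
    plan = plan_two_machine_build("192.168.0.1", "192.168.0.0/24")  # hypothetical addresses
    print(plan.daemon_argv)
    print(plan.distcc_hosts)
    print(plan.make_argv)
```

Each argv list could be passed to `subprocess.run` on the matching machine, with `DISTCC_HOSTS` set in machine-2's environment; distcc can also read the host list from a hosts file instead.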

SUCCESS!  I was able to watch the distcc daemon and C compilers activate via top on machine-1 while machine-2's kernel was compiled.

I then finished the installation process and watched machine-2 come alive. Having worked roughly half as hard as machine-1 had to get started, it completed installation in considerably less time.


Jack E Dorris

jack.at.jacknat.dot.com

CIS122 Spring 2007