Distcc
Distributed C/C++ Compiler
What is Distcc?
Distributed C Compiling gives the ability to use multiple machines to compile source code for a defined platform or architecture.
The benefit of this process is a faster compile time. It is especially useful when installing large packages on a slow machine where resources are limited, such as OpenOffice on an Intel Pentium III.
By distributing the work among two or more machines, the job can be completed in a measurably shorter time frame. As you can imagine, the more CPUs involved, the quicker the job is completed.
Distcc is not the compiler itself, but rather the vehicle used to communicate between
the target machine, where the finished program will run, and the volunteering machines.
A distcc daemon running on each volunteering machine accepts the source code (provided
the proper permissions were granted) and invokes an appropriate compiler. It then returns
the resulting object file and discards the source.
How Does Distcc Work?
Distcc sends preprocessed source code to designated machines on any given
network via TCP (Transmission Control Protocol) port 3632 to be compiled, and then
awaits the return of the resulting object file. Each source file distributed
is considered a 'job'. The number of jobs a machine can work varies from machine
to machine, depending on the type and quantity of processors. Naturally, multi-core
processors can handle more jobs than a single-core processor. The rule of
thumb is two jobs per core involved, plus one. A typical machine with one
single-core CPU can therefore handle three jobs (2 x CPUs + 1); two such machines can
handle five jobs; and so on. Given a computer lab such as ours, consisting of thirty
single-core machines, the possibility of 61 jobs exists.
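The rule of thumb above is plain arithmetic, so a short Python sketch makes the numbers easy to check. The function name recommended_jobs is made up for this illustration and is not part of distcc.

```python
def recommended_jobs(total_cores: int) -> int:
    """Rule of thumb from the text: two jobs per core involved, plus one."""
    if total_cores < 1:
        raise ValueError("at least one core is required")
    return 2 * total_cores + 1


if __name__ == "__main__":
    # One machine, two machines, and the thirty-machine lab from the text.
    for cores in (1, 2, 30):
        print(cores, "core(s):", recommended_jobs(cores), "jobs")
```

For one, two, and thirty single-core machines this gives 3, 5, and 61 jobs, matching the figures quoted above.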
Distcc can involve as many machines as desired, provided the overhead required to
distribute the jobs is taken into consideration. The target machine has the
task of managing the jobs issued to the other machines, and thus will consume some resources
depending upon the quantity of jobs involved. The theoretical limit for a
given machine is not readily quantifiable, but limited machines can encounter issues
when handling more than 20 jobs at a time, and therefore could not utilize the full capacity
of a distcc network the size of our computer lab.
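To make the trade-off concrete, here is a minimal Python sketch that caps the capacity of the network at whatever the target machine can manage. The name usable_jobs is hypothetical, and the figure of 20 is only the rough observation quoted above, not a constant defined by distcc.

```python
def usable_jobs(total_cores: int, coordinator_limit: int) -> int:
    """Jobs that can actually run at once.

    The network offers 2 * cores + 1 jobs (the rule of thumb from the text),
    but the target machine can only manage coordinator_limit of them.
    """
    if total_cores < 1 or coordinator_limit < 1:
        raise ValueError("both arguments must be at least 1")
    network_capacity = 2 * total_cores + 1
    return min(network_capacity, coordinator_limit)


if __name__ == "__main__":
    # Thirty single-core machines, managed by a machine that tops out near 20 jobs.
    print(usable_jobs(30, 20))
```

Under these assumptions, a lab of thirty machines offers 61 jobs, but a target machine limited to about 20 jobs leaves roughly two thirds of that capacity idle.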
Can Distcc Work Across Multiple Architectures?
Yes. With the help of a cross-compiler, a Pentium machine can compile
source code for use on a PowerPC or SPARC machine. This does require additional
software, but it is easily achieved.
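One practical detail, offered here as an assumption rather than something taken from this project: cross-compilers built from the GNU toolchain are conventionally installed under a name prefixed with the target triplet, so the same compiler name can be requested on every machine. The Python sketch below only builds such a name; the function and the triplet shown are illustrative.

```python
def cross_compiler_name(target_triplet: str, tool: str = "gcc") -> str:
    """Build a triplet-prefixed tool name, e.g. '<triplet>-gcc'.

    This follows the usual GNU naming convention; it does not check
    that such a compiler is actually installed anywhere.
    """
    triplet = target_triplet.strip()
    if not triplet or " " in triplet:
        raise ValueError("target triplet must be a single non-empty word")
    return f"{triplet}-{tool}"


if __name__ == "__main__":
    # Example triplet for a PowerPC Linux target (illustrative only).
    print(cross_compiler_name("powerpc-unknown-linux-gnu"))
```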
My Project:
For my project, I chose to use Gentoo
Linux because it compiles all software from source for each installation
rather than installing pre-compiled binary packages. In theory, this yields
a more efficient operating system, as each program installed is compiled
specifically for your usage and architecture instead of generically for
use across a broad spectrum of machines. Gentoo also has the added
benefit of a package manager (emerge) that configures the software being installed
for your machine and acquires any necessary dependencies. Because
Gentoo targets a specific machine, initial setup of the operating
system is more tedious, but it is rewarding, as you learn the intimate specifications of
your configuration. Setup requires knowledge of the hardware installed
on the machine and of any additional modules desired, so that you can properly configure
the kernel and compile it for use. Gentoo requires the kernel to be compiled
for each installation, as it does not include one for generic usage.
Installing Gentoo begins with a typical live CD, available for multiple architectures
and in GUI or minimal command-line versions. I chose the minimal interface
for maximum configurability and proceeded to configure and install on my machine
(machine-1). Once installation was complete, I 'emerged' distcc and then began
the installation process on an additional machine (machine-2), just as I had on the
first. When it came time to compile the kernel, I 'emerged' distcc on the
new machine with the 'no-deps' flag (--nodeps) to install only distcc without any dependencies.
I then configured machine-1 to accept jobs from machine-2 via the distcc daemon and
configured machine-2 to send jobs to machine-1. Next, I started top (a CPU/memory
resource monitor) on machine-1 and proceeded to compile the kernel on machine-2.
SUCCESS! I was able to watch the distcc daemon and C compilers activate in
top on machine-1 while machine-2's kernel was compiled.
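The configuration described above can be sketched in Python. This is an illustration under stated assumptions, not the exact setup used in the project: the per-host job limits are invented, the helper names are hypothetical, and the only distcc facts relied on are that a host list is a space-separated series of host/limit entries and that the daemon listens on TCP port 3632.

```python
import socket
from typing import Mapping

DISTCCD_PORT = 3632  # TCP port named in the text


def distcc_hosts(limits: Mapping[str, int]) -> str:
    """Build a space-separated host list where each entry is host/limit."""
    return " ".join(f"{host}/{limit}" for host, limit in limits.items())


def make_jobs_flag(total_cores: int) -> str:
    """Parallel-build flag using the rule of thumb: 2 x cores + 1."""
    return f"-j{2 * total_cores + 1}"


def volunteer_is_listening(host: str, port: int = DISTCCD_PORT,
                           timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-lookup failures.
        return False


if __name__ == "__main__":
    # Hypothetical limits as seen from machine-2, which sends jobs to machine-1.
    limits = {"machine-1": 3, "localhost": 2}
    print(distcc_hosts(limits))
    print(make_jobs_flag(2))
```

Before starting a long build on machine-2, calling volunteer_is_listening("machine-1") is a quick way to confirm that the daemon on machine-1 is reachable.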
I then finished the installation process and watched machine-2 come alive, having
worked half as hard as machine-1 did to get started and having completed its installation in
considerably less time.
Jack E Dorris
jack.at.jacknat.dot.com
CIS122 Spring 2007