Cluster computing with Linux & Beowulf

Levi Pearson levipearson at gmail.com
Sun Feb 23 02:46:59 MST 2014


On Sat, Feb 22, 2014 at 2:42 AM, Dan Egli <ddavidegli at gmail.com> wrote:
> Hey pluggers,
>
>
>
> I've a couple questions on the Linux cluster computing software, Beowulf (I
> think that's what it was called). Anyone know if there's a maximum number
> of nodes that a cluster can support? And can the cluster master add new
> nodes on the fly so to speak?
>

Beowulf clusters are a vaguely-defined concept. The basic idea was to
take a bunch of commodity-grade hardware and build a supercomputer out
of it, leveraging as much free software as possible in the process.
This enabled a lot of universities to run much bigger simulations and
other scientific computing tasks than they would otherwise have been
able to afford. Eventually the idea spread to gov't research labs and
commercial R&D labs as well.

I worked for Linux Networx for many years, which specialized in
building what were essentially Beowulf clusters. There's not really a
limit to how many nodes you can stick in them, but eventually you run
into issues supplying enough electricity and air conditioning to keep
them running. There are usually multiple layers of software involved
in managing a system like that, but the key layer for actually running
the codes these systems are built for is MPI, which is the de-facto
standard for distributed number-crunching problems.

>
> My understanding of cluster computing is that you can send a job to the
> master server, and the master server will in turn break the job between all
> the nodes in the cluster, then combine the results. I think it's not all
> that dissimilar to multi-processor specific tasks. Is this correct?
>

Yes, MPI is sort of a distributed multi-threading architecture, except
that the threads communicate via message passing rather than shared
memory. You have to write your programs specifically against the MPI
library, and then you use some cluster management software to get the
binary distributed to all the compute nodes that will run it. You fire
it up on some "master node" and it coordinates execution across the
compute nodes that were allocated to that particular job.
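
To give a flavor of what that looks like, here's a minimal sketch of
an MPI program in C. The MPI_Init/MPI_Comm_rank/MPI_Comm_size calls
are the real MPI API, but the program itself is just illustrative, not
something lifted from an actual cluster's job scripts:

    /* Illustrative sketch only: each MPI process learns its rank and the
     * total process count, then rank 0 plays the "master" role. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);                /* start the MPI runtime    */
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's id        */
        MPI_Comm_size(MPI_COMM_WORLD, &size);  /* processes in the job     */

        if (rank == 0)
            printf("master: coordinating %d workers\n", size - 1);
        else
            printf("worker %d of %d checking in\n", rank, size - 1);

        MPI_Finalize();                        /* shut the runtime down    */
        return 0;
    }

You'd typically compile it with mpicc and launch it with something
like "mpirun -np 64 ./hello"; the MPI runtime and the cluster's job
scheduler take care of placing those processes on the compute nodes.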

>
> Here's a scenario I could envision. Please tell me if there's a serious
> flaw in the idea. I hope it explains what I'm thinking of.
>

The serious flaw would be that you have only a vague idea of what
you're asking about, and vague ideas rarely translate well into
distributed algorithms. There's no magic software that will make
arbitrary programs scale across large clusters of machines, and the
techniques that do distribute a program effectively over multiple
machines vary greatly with the kind of computing you're doing.

>
> Program X is intended to handle quite literally up to hundreds of millions
> of simultaneous TCP connections across a number of network interfaces (the
> exact number is unimportant). So as program X runs, the overall load begins
> to rise as X has to do more and more work for each of the connections. That
> much is a given. Take, for instance, an MMORPG. The more people login to
> the game and interact with the world, the more work is involved for the
> actual server. Same idea here. My thought was that if X was built to
> support clustering and it was run on a Beowulf cluster, that would slow the
> initial growth of the load. Then, supposing that the dynamic addition is
> possible, I can watch the system and if the system load average on the
> cluster gets too large (due to overwhelming the CPU, not the internet
> pipe), someone could reduce the load without the end user noticing, simply
> by adding new machines. Is this right? Or do I have a major flaw in my
> thinking?
>

Beowulf clusters are almost never tasked with this sort of massive
connectivity problem. They are typically running massive *compute*
jobs, doing big number crunching simulation jobs or similar things.

How well a program scales as you add more nodes depends to a huge
degree on what kind of constraints your problem has. Some problems are
"embarrassingly parallel" and can scale more or less indefinitely just
by adding nodes. Others only decompose into highly interrelated
sub-problems and can barely be scaled at all by adding nodes.
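
As a toy example of the embarrassingly parallel end of that spectrum,
here's a sketch (again C with MPI, again purely illustrative) that
estimates pi by having each rank integrate its own slice of 4/(1+x^2)
over [0,1], with no communication at all until one final reduce.
Doubling the number of ranks roughly halves each rank's work:

    /* Illustrative sketch only: an embarrassingly parallel estimate of pi.
     * The interval count n is an arbitrary choice for the example. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        const long n = 10000000;      /* total midpoint-rule intervals */
        int rank, size;
        double local = 0.0, pi = 0.0;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Each rank takes every size-th interval; no inter-node chatter. */
        for (long i = rank; i < n; i += size) {
            double x = (i + 0.5) / n;
            local += 4.0 / (1.0 + x * x) / n;
        }

        /* The only communication: combine the partial sums on rank 0. */
        MPI_Reduce(&local, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("pi is approximately %.12f\n", pi);

        MPI_Finalize();
        return 0;
    }

Contrast that with something like a tightly-coupled simulation where
every node has to exchange boundary data with its neighbors every time
step; there, communication rather than arithmetic quickly becomes the
limit on how much adding nodes helps.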

>
> If Beowulf isn't the software I'm thinking of, what is? And what other
> issues, other than perhaps memory usage, would be involved? And speaking of
> memory, is there a way to distribute the memory at all? I know that there
> are 8GB dimms available easily these days, and a high-end motherboard can
> usually support 8 dimms. So that's 64GB of ram. But for hundreds of
> millions of connections, I could see the possibility of exceeding memory
> capacity being a potential issue. I suppose I could find bigger dimms (I've
> heard rumors of 16GB and even 32GB dimms, but never seen them) but that's
> going to seriously jack up the server costs if I do it that way. So if
> there's a way to distribute the memory usage along with distributing the
> CPU work, that would be of great assistance.
>

Again, this is a hard problem with no 100% solutions. There's an
important result in the field of distributed algorithms known as the
CAP Theorem. It essentially says that of the three desirable
properties of a distributed system (consistency, availability, and
partition tolerance) you can't have all three at once. Consistency
means that all users of the system see the same view of the data at
any point in time. Availability means that the system always gives a
response to a request, successful or not. And partition tolerance
means the system keeps operating even when network failures split the
nodes into groups that can't communicate with each other.

There are a number of algorithms available to maintain *some* of these
properties, but you can't guarantee all of them at once. So you need
to decide which matter most to you if you want to farm a task out
among multiple machines, especially if it's a live-updating task that
needs to run quickly.
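
As a toy illustration of the consistency-vs-availability choice, here
is a single-process C sketch (no real networking; the replica_t,
write_cp, and write_ap names are made up for the example) showing two
ways a pair of replicas can react to a write while the link between
them is down:

    /* Illustrative sketch only: two "replicas" in one process, with a flag
     * standing in for a network partition between them. */
    #include <stdio.h>
    #include <stdbool.h>

    typedef struct { int value; } replica_t;

    static bool partitioned = true;   /* pretend the link A<->B is down */

    /* CP choice: refuse writes we can't replicate; consistent, unavailable. */
    static bool write_cp(replica_t *a, replica_t *b, int v)
    {
        if (partitioned)
            return false;             /* client sees an error */
        a->value = v;
        b->value = v;
        return true;
    }

    /* AP choice: accept the write locally; available, temporarily divergent. */
    static bool write_ap(replica_t *a, replica_t *b, int v)
    {
        a->value = v;
        if (!partitioned)
            b->value = v;             /* replicate only if B is reachable */
        return true;
    }

    int main(void)
    {
        replica_t a = {0}, b = {0};

        printf("CP write accepted? %s\n", write_cp(&a, &b, 42) ? "yes" : "no");
        write_ap(&a, &b, 42);
        printf("AP replicas after write: a=%d b=%d\n", a.value, b.value);
        return 0;
    }

In the first case the client gets an error but both replicas stay in
agreement; in the second the client is always served but the replicas
temporarily diverge and have to be reconciled later.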

    --Levi

