Concurrency, was Re: Doh! Stupid Programming Mistakes <humor>
bryan.sant at gmail.com
Fri Oct 27 13:04:56 MDT 2006
On 10/27/06, Levi Pearson <levi at cold.org> wrote:
> There have been efforts to build distributed shared memory systems,
> but I think they are fundamentally misguided. Even with today's high
> speed, low-latency interconnect fabrics, remote memory access is
> still significantly slower than local memory access to the point that
> hiding it behind an abstraction layer is counterproductive. In order
> to predict the performance of your system, you still need to know
> exactly when an access is local and when it is remote. Considering
> that the point of these systems is high performance, abstracting away
> an important factor in performance is not particularly wise.
I agree with Levi. The simplicity provided by an abstraction like
that is tempting, but the details it hides are so significant that the
abstraction ends up being not just unhelpful but harmful. When going
off box is hundreds or thousands of times slower than local access,
you're going to want to control that explicitly as a developer.
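To put a rough number on that gap, here's a back-of-envelope sketch using commonly cited ballpark latencies (the specific figures are assumptions for illustration, not measurements of any particular system):

```python
# Rough, commonly cited ballpark latencies -- illustrative only.
LOCAL_DRAM_NS = 100          # local DRAM access: roughly 100 ns
GIGE_ROUNDTRIP_NS = 100_000  # gigabit-Ethernet round trip: roughly 100 us

ratio = GIGE_ROUNDTRIP_NS / LOCAL_DRAM_NS
print(f"Remote access is roughly {ratio:.0f}x slower than local DRAM")
```

Even with generous assumptions, the ratio lands in the hundreds-to-thousands range, which is exactly why hiding it behind a uniform memory abstraction is dangerous.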
> This is especially true the less tightly connected your compute nodes
> get. A multi-processor computer with a Hypertransport bus can
> probably get away with abstracting away local vs. remote memory
I believe LinuxNetworx does this, but like you said, they have
intimate control over the nodes involved and their interconnect. It's
more similar to an integrated circuit than a loose network cluster.
> access. In a multi-node cluster connected by an Infiniband fabric,
> latency differences between local and remote access become
> significant, but one can typically assume fairly low latency and
> fairly high reliability and bandwidth. A cluster with gigabit
> ethernet moves to higher latency and lower bandwidth, and a grid
> system consisting of nodes spanning multiple networks makes treating
> remote operations like local ones downright insane.
What he said.
> Add these details to the increased difficulty of programming in a
> shared-state concurrency system, and it starts to look like a pretty
> bad idea. There are plenty of established mechanisms for concurrency
> and distribution that work well and provide a model simple enough to
> reason about effectively. Letting people used to writing threaded
> code in C/C++/Java on a platform with limited parallelism carry their
> paradigms over to highly-parallel systems is NOT a good idea in this
> case. Retraining them to use MPI, tuple space, or some other
> reasonable mechanism for distributed programming is definitely worth
> the effort.
I agree completely.