Concurrency, was Re: Doh! Stupid Programming Mistakes <humor>

Levi Pearson levi at cold.org
Wed Oct 25 15:23:25 MDT 2006


On Oct 25, 2006, at 11:51 AM, Bryan Sant wrote:
>
> I always just go with threads.  But then again, I do a lot of desktop
> software, where interaction between components is frequent and shared
> memory is more efficient, reliable, and convenient than message
> passing via pipes or some other IPC mechanism.  I'm not saying that
> Levi's points aren't valid, on the contrary, they are.  Memory space
> protection provided by a process is valuable...  Valuable if you're
> using C, or some other language that can stomp on or leak memory.  If
> you're using a language with memory management (Perl, C#, Java, Lisp),
> then the protection provided by processes has little value and some
> down sides.

You're conflating two different problems here.  First, there is the  
problem of memory safety.  C and C++ allow you to fairly easily write  
into memory that you should not write into.  Memory-safe languages  
don't let you do it at all.  Memory safety requires that all  
primitive memory allocation and especially deallocation be managed by  
the language runtime, typically via a garbage collector.
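
To make that concrete, here's about the smallest C sketch of the
problem I can think of (the buffer and sizes are made up); a
memory-safe language would reject this at compile time or trap it at
run time, while C compiles it without complaint:

    #include <string.h>

    int main(void)
    {
        char buf[8];
        /* Sixteen bytes written into an eight-byte buffer: the write
           silently clobbers whatever happens to live next to buf. */
        memset(buf, 'x', 16);
        return 0;
    }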

The second problem is the nondeterministic interleaving of execution  
that exists in the shared-memory concurrency model.  Every heap  
variable is, by default, shared by all threads.  Since the scheduler  
can switch between threads at arbitrary times, a program that uses  
heap variables naively will almost certainly behave unpredictably and  
not do what you want.  Enter locks.  They allow you to re-serialize  
the execution of your program in certain areas, so only one thread  
can run at a time.  This solves one problem, but creates a few more.
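
Here's a minimal pthreads sketch of that nondeterminism, with no
locks at all (the counter and loop counts are made up; compile with
gcc -pthread).  Run it a few times and the result is usually
different each time, and usually well short of 2000000:

    #include <pthread.h>
    #include <stdio.h>

    /* A global (heap-like) counter shared by both threads, with no
       synchronization whatsoever. */
    static long counter = 0;

    static void *bump(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 1000000; i++)
            counter++;             /* read-modify-write, not atomic */
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, bump, NULL);
        pthread_create(&b, NULL, bump, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("%ld\n", counter);  /* nondeterministic result */
        return 0;
    }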

First of all, you must remember to put locks in all the right  
places.  Some higher level languages help out quite a bit with this,  
but if you're doing raw pthreads in C, it's pretty easy to screw up  
and create a race condition, where nondeterminism creeps into your  
program again.  And in any language higher-level than assembly, it's  
entirely possible that an operation that looks atomic on the surface  
(i.e., can't be broken down any further in that language) actually  
consists of many machine operations, so the scheduler could switch to  
a different thread /in the middle/ of that operation.  Doing
shared-memory concurrency safely in a high-level language requires a
lot of
information about the implementation of that language, which kind of  
defeats the purpose.
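
For instance, the counter++ in the sketch above looks atomic in C,
but it is really a load, an add, and a store, and the scheduler can
switch threads between any two of them; that's where the lost updates
come from.  The usual raw-pthreads fix is a mutex around the critical
section (this just continues the earlier made-up example, so the
includes and counter are assumed from it):

    /* counter++ compiles to roughly:
         load   counter  -> register
         add    1        -> register
         store  register -> counter
       A thread switch between the load and the store loses an update. */

    static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *bump_locked(void *arg)
    {
        int i;
        (void)arg;
        for (i = 0; i < 1000000; i++) {
            pthread_mutex_lock(&counter_lock);
            counter++;                      /* now serialized */
            pthread_mutex_unlock(&counter_lock);
        }
        return NULL;
    }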

Second, you are hampered in your ability to create new abstractions.
When multiple shared resources are involved, you must be careful to
acquire and release the locks in a consistent order.  This is a pain:
it creates concerns that cross abstraction barriers, and it is
generally an impediment to good software design practices.
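
A sketch of the ordering problem, with made-up lock and resource
names: if one piece of code takes the locks as (a, b) and another as
(b, a), each thread can end up holding one lock while waiting forever
for the other.  The only cure is a lock-ordering convention that
every module touching either resource has to know about, which is
exactly the kind of concern that leaks across abstraction barriers:

    #include <pthread.h>

    static pthread_mutex_t lock_a = PTHREAD_MUTEX_INITIALIZER;
    static pthread_mutex_t lock_b = PTHREAD_MUTEX_INITIALIZER;

    static void *worker1(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_a);       /* order: a, then b */
        pthread_mutex_lock(&lock_b);
        /* ... use both resources ... */
        pthread_mutex_unlock(&lock_b);
        pthread_mutex_unlock(&lock_a);
        return NULL;
    }

    static void *worker2(void *arg)
    {
        (void)arg;
        pthread_mutex_lock(&lock_b);       /* order: b, then a -- if it
                                              grabs lock_b while worker1
                                              holds lock_a, both threads
                                              wait forever */
        pthread_mutex_lock(&lock_a);
        /* ... */
        pthread_mutex_unlock(&lock_a);
        pthread_mutex_unlock(&lock_b);
        return NULL;
    }

Spawn the two workers just like the counter threads above and some
runs will simply hang.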

Finally, locks can create performance issues.  The purpose of a lock  
is to serialize your program, and if there are too many of them, your  
amount of parallelism drops through the floor and you end up with a  
serial program.  In the worst case, you can deadlock and bring the  
program to a halt.  Getting good performance with locks while
eliminating 100% of race conditions and deadlocks is very hard.  As
the amount of concurrency goes up, the performance penalty of locks
and the chance of hitting a lurking race condition go up, too.
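
To put the serialization point in terms of the same made-up counter
example: hold the lock around the whole unit of work and the second
thread just waits for the first to finish, so you have paid the cost
of threads and locks and still ended up with a serial program:

    /* Coarse-grained variant of bump_locked: the lock is held for the
       entire loop, so the two threads run strictly one after the other. */
    static void *bump_coarse(void *arg)
    {
        int i;
        (void)arg;
        pthread_mutex_lock(&counter_lock);
        for (i = 0; i < 1000000; i++)
            counter++;
        pthread_mutex_unlock(&counter_lock);
        return NULL;
    }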

So, I hope that made the distinction between the problems caused by  
lack of memory safety and the problems caused by shared-state  
concurrency clear.  Regardless of the problems, both are still  
sometimes the right solution.  They just shouldn't be the DEFAULT  
solution for a programmer who wants to write correct code, in  
general.  Some particular high-level languages and programming
environments make every other concurrency paradigm at least as
difficult to use as shared-state threads; programmers in such
environments are simply screwed, and should demand better tools.

> You can achieve a much more natural programming model by
> using threads and semaphores, than processes and marshaled messages.

What feels natural to do is largely defined by the language you are  
using, so that is only true for a subset of languages.  I would
argue that languages that make shared-state concurrency the most
natural way to approach a problem ought to be redesigned so that
shared-state concurrency is well supported when necessary, but the
alternatives feel just as natural, or preferably more so.

You have also left out one important option from your list, though:
threads that by default share nothing, but can explicitly ask for
regions of memory to be shared.  Combine that with software
transactional memory (aka optimistic or lock-free concurrency),
message passing, and deterministic concurrency, each used where it is
appropriate, and you can pick the tool that suits your problem and
eliminate the possibility of large classes of programming errors,
just as memory protection eliminates another large class of
programming errors.
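
Standard C doesn't give you share-nothing threads or STM out of the
box, but the share-nothing-by-default idea itself is easy to sketch
with processes plus one explicitly shared region (Linux-flavored and
purely illustrative; error checking omitted):

    #include <stdio.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    int main(void)
    {
        /* The only memory the two processes share is this one mapping;
           everything else stays private to each of them. */
        int *shared = mmap(NULL, sizeof(int),
                           PROT_READ | PROT_WRITE,
                           MAP_SHARED | MAP_ANONYMOUS, -1, 0);
        *shared = 0;

        if (fork() == 0) {          /* child: private memory, except... */
            *shared = 42;           /* ...writes here are visible outside */
            _exit(0);
        }
        wait(NULL);                 /* parent waits, then reads */
        printf("%d\n", *shared);    /* prints 42 */
        return 0;
    }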

		--Levi


