Concurrency, was Re: Doh! Stupid Programming Mistakes <humor>
bryan.sant at gmail.com
Wed Oct 25 17:11:45 MDT 2006
On 10/25/06, Levi Pearson <levi at cold.org> wrote:
> On Oct 25, 2006, at 11:51 AM, Bryan Sant wrote:
> You're conflating two different problems here. First, there is the
Nuh-uh, you are one.
> The second problem is the nondeterministic interleaving of execution
> that exists in the shared-memory concurrency model. Every heap
> variable is, by default, shared by all threads. Since the scheduler
> can switch between threads at arbitrary times, a program that uses
> heap variables naively will almost certainly behave unpredictably and
> not do what you want. Enter locks. They allow you to re-serialize
> the execution of your program in certain areas, so only one thread
> can run at a time. This solves one problem, but creates a few more.
This same thing can be achieved by using threads and... Drum roll...
Not accessing shared resources. Threads of this nature are called
"worker threads" and serve the same purpose as a child process. Spawn
a separate thread and let that sucker run. No locks, no shared memory
access, just a dumb worker. Threads shine in the face of a separate
process when you have to do a lot of interaction between two or more
threads. However, just because threads are good at interaction via
shared resources and locks doesn't mean you HAVE to use that feature.
So just because you're using threads doesn't mean you have to
grapple with all of these insane locking/race/deadlock conditions.
Though I think that managing locks is simple anyway.
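A minimal sketch of such a share-nothing worker in Java (the class and
method names here are illustrative, not from any library):

```java
// A worker thread that shares no mutable state with its creator.
// It gets a private copy of its input, so no locks are ever needed.
class Worker implements Runnable {
    private final String input;      // private to this worker
    private volatile String result;  // written only by the worker thread

    Worker(String input) {
        this.input = input;
    }

    String getResult() {
        return result;
    }

    public void run() {
        // Work happens in isolation -- nothing on the heap is shared.
        result = input.toUpperCase();
    }

    public static void main(String[] args) throws InterruptedException {
        Worker worker = new Worker("some job");
        Thread t = new Thread(worker);
        t.start();
        t.join(); // join() establishes visibility of the worker's writes
        System.out.println(worker.getResult());
    }
}
```

Because the parent only reads the result after join(), there is no
window where two threads touch the same data concurrently.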
> First of all, you must remember to put locks in all the right
> places. Some higher level languages help out quite a bit with this,
Or you just dip into the vast "thread safe" libraries that come with
your runtime. All common thread safety issues are handled by your
data structures et al. I can't speak for other lesser languages, but
threading is a cake walk in Java due to the great threading support
built into the language.
List myList = new Vector(); // Thread safe (every method is synchronized).
Keeping all of my shared data in a List or a thread-safe Map
(Hashtable) ensures data integrity between threads without my having
to explicitly mess around with locks. If I do want to, there is a
built-in "synchronized" keyword in Java that makes lock management
s-i-m-p-l-e.
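A sketch of both pieces together, the synchronized collections and the
"synchronized" keyword (the class and its methods are made-up examples):

```java
import java.util.Hashtable;
import java.util.List;
import java.util.Map;
import java.util.Vector;

class SharedCounters {
    // Every method on Vector and Hashtable is internally synchronized,
    // so single reads and writes are thread-safe with no explicit locks.
    private final List<String> events = new Vector<String>();
    private final Map<String, Integer> counts = new Hashtable<String, Integer>();

    // A compound action (get, then put) spans several operations, so the
    // "synchronized" keyword makes the whole method one critical section.
    public synchronized void record(String event) {
        events.add(event);
        Integer n = counts.get(event);
        counts.put(event, n == null ? 1 : n + 1);
    }

    public synchronized int countOf(String event) {
        Integer n = counts.get(event);
        return n == null ? 0 : n;
    }
}
```

Note the distinction: the collections make individual operations safe,
while the synchronized methods make the multi-step update atomic.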
> but if you're doing raw pthreads in C, it's pretty easy to screw up
Right. Don't use threads if you're using C/C++. Do use threads when
using a higher-level language. Threads can do everything a forked
process can do, but a process can't do everything a thread can do. So
stick with threads if you're cool like me.
> and create a race condition, where nondeterminism creeps into your
> program again. And in any language higher-level than assembly, it's
> entirely possible that an operation that looks atomic on the surface
> (i.e., can't be broken down any further in that language) actually
Again, don't use threads with C.
> consists of many machine operations, so the scheduler could switch to
> a different thread /in the middle/ of that operation. Doing shared-
> memory concurrency safely in a high-level language requires a lot of
> information about the implementation of that language, which kind of
> defeats the purpose.
I don't understand how that defeats the purpose. Please explain.
> Second, you are hampered in your ability to create new abstractions.
> When multiple shared resources are involved, you must be careful to
> obtain and release the locks in the correct order. This is a pain,
> it creates concerns that cross abstraction barriers, and is generally
> an impediment to good software design practices.
I completely disagree. You can design much cleaner software with
minor interaction between threads via locks and shared resources
versus child processes and marshaled messages. Keep your interaction
via locks and thread-safe shared resources to a minimum, but go ahead
and use that ability. It isn't a big deal. You make it out to be a
monumental task that /hampers one's ability to create new
abstractions/. That's nonsense.
> Finally, locks can create performance issues. The purpose of a lock
> is to serialize your program, and if there are too many of them, your
> amount of parallelism drops through the floor and you end up with a
> serial program. In the worst case, you can deadlock and bring the
> program to a halt. Getting good performance with locks along with
> elimination of 100% of race conditions and deadlocks is a very hard
> thing to do. As the amount of concurrency goes up, the performance
> penalty of locks and the chance of hitting a lurking race condition
> goes up, too.
Horrors! Locks serialize things? That's why you scope your locks to
be very specific. Or, as stated before, if you're afraid of locks and
shared resource interaction, then you can always be a coward and use a
thread just like a child process with no interaction at all (or only
via some marshaled method).
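Scoping a lock tightly looks something like this sketch (names are
illustrative): only the brief shared update is serialized, while the
expensive per-thread work runs fully in parallel.

```java
class ScopedLocking {
    private final Object lock = new Object();
    private long total = 0;

    public void process(long[] data) {
        // Expensive, thread-local computation: no lock held here,
        // so many threads can run this part simultaneously.
        long partial = 0;
        for (long d : data) {
            partial += d;
        }

        // Only this tiny critical section is serialized, so the lock
        // barely dents the program's parallelism.
        synchronized (lock) {
            total += partial;
        }
    }

    public long getTotal() {
        synchronized (lock) {
            return total;
        }
    }
}
```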
> So, I hope that made the distinction between the problems caused by
> lack of memory safety and the problems caused by shared-state
> concurrency clear. Regardless of the problems, both are still
> sometimes the right solution. They just shouldn't be the DEFAULT
> solution for a programmer who wants to write correct code, in
> general. Some particular high-level languages and programming
> environments make using any other concurrency paradigm at least as
> difficult; programmers in such environments are simply screwed, and
> should demand better tools.
In that case, C and Lisp should demand better tools for threading.
> What feels natural to do is largely defined by the language you are
> using, so that is only true for a subset of languages. I would
> argue that languages that make shared-state concurrency the most
> natural way to approach a problem ought to be redesigned so that
> shared-state concurrency is well-supported when necessary, but
> alternatives feel just as (or more, preferably) natural.
That's true by virtue of the fact that a thread can be used just like
a child process but the reverse is not true. Threads give you the
option to touch shared data -- not an obligation to do so. Child
processes restrict your options.
> You have also left out one important option from your list, though;
> threads that by default share nothing, but can explicitly ask for
> regions of memory to be shared. Combine that with software
> transactional memory (aka optimistic or lock-free concurrency) and
> message-passing and deterministic concurrency whenever they are
> appropriate, and you can use the tool that suits your problem and
> eliminate the possibility of large classes of programming errors,
> just like memory protection eliminates another large class of
> programming errors.
I get the benefits of what you're describing today by using threads
and then choosing if I'll allow that thread to access shared data or
not. In a good piece of OO software no data (or at least VERY little)
is global. The thread can't just go out and touch some data when it
wants like in C. When I create a thread, I pass only the objects (and
their data) that I WANT the thread to have access to -- otherwise, the
thread is hands-off from all other heap data. So as described above,
my threads only have access to the explicit things I give them access
to. Life is good.
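A sketch of that confinement style (class names are hypothetical): the
thread's constructor receives exactly the objects it may touch, and
nothing else on the heap is reachable from it.

```java
import java.util.List;
import java.util.Vector;

class ConfinedThread implements Runnable {
    // The only data this thread can reach is what it was handed here.
    private final List<String> inbox;

    ConfinedThread(List<String> inbox) {
        this.inbox = inbox; // a thread-safe Vector, shared on purpose
    }

    public void run() {
        inbox.add("done"); // touches only the object it was given
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> shared = new Vector<String>(); // explicitly shared
        Thread t = new Thread(new ConfinedThread(shared));
        t.start();
        t.join();
        System.out.println(shared);
    }
}
```

Since the shared Vector is the only handle passed in, the degree of
sharing is an explicit design decision rather than a default.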
But thank you for describing the utopia of parallel processing that I
enjoy today in that last paragraph.