[OT] This feels wrong (pthreads question)

Levi Pearson levi at cold.org
Mon Jan 29 14:19:37 MST 2007


"Bryan Sant" <bryan.sant at gmail.com> writes:

> The single advantage of threads over other parallel processing options
> is a shared heap.  If you don't need to share data frequently between
> different lanes of execution, then don't bother with threads.  I think
> you could make a valid performance case for wanting to share the
> R-tree between multiple threads for this app.

Well, both the event-driven and the multithreaded designs we've
discussed live in a single address space, and using multiple processes
would be even more heavyweight than kernel threads, though it would
make it easier to scale across multiple servers.

> I'm sure this happens a lot with novice thread developers, but an
> experienced thread developer knows that A) thread creation is
> heavyweight, and B) threads should be pooled and reused.  Allowing a
> threaded app to spawn threads at will is a recipe for disappointment
> in both performance and resource usage.

It sounds like you're trying to disagree with me, but this is exactly
what I've been saying.  A reasonably-sized thread pool is not what I
meant by massively multithreaded, and is precisely what I advocated
once he expressed concern over CPU utilization on a multi-processor
machine.
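
To be concrete about what I mean by a reasonably-sized pool: the
threads are created once, up front, and reused.  Something like this
untested sketch, where handle_jobs is a made-up worker function that
loops pulling jobs off a queue:

    #include <pthread.h>

    #define POOL_SIZE 8   /* fixed at startup; roughly the core count,
                             not one per connection */

    static pthread_t workers[POOL_SIZE];

    extern void *handle_jobs(void *arg);   /* made-up worker function */

    int start_pool(void)
    {
        int i;

        for (i = 0; i < POOL_SIZE; i++) {
            if (pthread_create(&workers[i], NULL, handle_jobs, NULL) != 0)
                return -1;    /* creation cost is paid exactly once, here */
        }
        return 0;
    }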

> That link is broken, but it's a straw man anyway.  I'd like to see the
> apache config.  If apache was configured to use a thread *pool*,
> there'd be no increased DOS risk (and throughput would be much
> better).

The link works for me.  Google for 'apache vs yaws' if it still
doesn't work for you.  The page includes a little information about
the apache configuration, namely that it uses mpm_worker_module;
beyond that, I don't know how the setup was configured.  In any case,
I was not trying to attack apache with the comparison, but rather the
naive "one thread per connection" model that apache seemed to be
using in this case.  So it was not a straw man at all, but an analogy.
A thread pool, as you and I have both suggested, simply combines event
dispatch with a small pool of worker threads, so it would have a
profile similar to yaws/lighttpd/etc.
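
To make the "event dispatch plus workers" point concrete: the
dispatcher thread does the select/poll and accept work and just hands
ready descriptors to the pool through a queue.  Roughly (untested, no
overflow handling, and the actual protocol work omitted):

    #include <pthread.h>

    /* hypothetical bounded queue of ready file descriptors */
    #define QSIZE 128
    static int queue[QSIZE];
    static int head, tail, count;
    static pthread_mutex_t qlock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  qcond = PTHREAD_COND_INITIALIZER;

    void enqueue_fd(int fd)          /* called by the event-loop thread */
    {
        pthread_mutex_lock(&qlock);
        queue[tail] = fd;
        tail = (tail + 1) % QSIZE;
        count++;
        pthread_cond_signal(&qcond);
        pthread_mutex_unlock(&qlock);
    }

    int dequeue_fd(void)             /* called by each worker thread */
    {
        int fd;

        pthread_mutex_lock(&qlock);
        while (count == 0)
            pthread_cond_wait(&qcond, &qlock);
        fd = queue[head];
        head = (head + 1) % QSIZE;
        count--;
        pthread_mutex_unlock(&qlock);
        return fd;
    }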

> It's ironic that threads are essentially an event loop handled by the
> kernel's scheduler rather than your userland process.  So in both
> cases, you're really just comparing two event loop models.  In the
> case of threads there is the potential bonus of true parallel
> execution and thus improved performance and CPU utilization, on the
> down side you have potential context switches and a comparatively high
> creation time.

Creation time is not the only issue.  System resource utilization by
threads is also significant, since a new process structure and a new
stack must be allocated for each thread.  Scaling to the 100k+ thread
level may even require a kernel recompile to raise the relevant
limits, depending on how the distro has configured the kernel.
Clearly a thread pool doesn't have this problem, though.
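
The stack is usually the big cost; on Linux the default per-thread
stack reservation is typically measured in megabytes, which is exactly
why 100k threads hurts.  You can shrink it per thread, but only so
far:

    #include <pthread.h>

    extern void *worker(void *arg);   /* made-up thread function */

    int spawn_small_stack(pthread_t *tid)
    {
        pthread_attr_t attr;
        int rc;

        pthread_attr_init(&attr);
        /* shrink the per-thread stack from the default (often several
           megabytes of address space) down to 64K; PTHREAD_STACK_MIN
           is the lower bound */
        pthread_attr_setstacksize(&attr, 64 * 1024);
        rc = pthread_create(tid, &attr, worker, NULL);
        pthread_attr_destroy(&attr);
        return rc;
    }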

> If you create your thread pool up front and don't use many more
> threads than you have CPU cores, then you'll see a performance champ
> (good overall performance, but more importantly, good /throughput/).
> If you use too many threads, then you may have context thrashing and
> might actually see performance go down.

Again, this is pretty much what I've been saying.  Hopefully it will
be convincing coming from so many people.
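
For what it's worth, sizing the pool to the hardware is trivial.
_SC_NPROCESSORS_ONLN isn't strictly POSIX, but glibc and most of the
BSDs provide it:

    #include <unistd.h>

    long pool_size(void)
    {
        /* _SC_NPROCESSORS_ONLN is a common extension, not strictly
           POSIX, but it's the usual way to pick a pool size */
        long n = sysconf(_SC_NPROCESSORS_ONLN);
        return (n > 0) ? n : 1;
    }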

                --Levi


