Corey Edwards tensai at zmonkey.org
Wed Oct 1 10:42:39 MDT 2008

On Tue, 2008-09-30 at 19:53 -0600, Matthew Walker wrote:
> On Tue, September 30, 2008 4:57 pm, Michael Torrie wrote:
> > Not sure what you mean here.  High load normally means the CPU is *not*
> > being utilized efficiently.  In fact, processes are not running because
> > they are waiting for stuff.  So a high load often will have a processor
> > that's nearly idle.  Sometimes a process can cause a high cpu usage and
> > cause the load average to climb if the process is holding down resources
> > that other processes are waiting on.
> >
> Huh. Maybe I've generally had well-designed systems, but my experience has been that if
> my load average is high, there are too many CPU-intensive tasks running, and they're vying
> for the processor.
> I can only think of a handful of instances where some other resource was the bottleneck
> that raised my load average.

I've seen both. Does that make me special or just unlucky?

Michael is right that I/O causes load to climb. If a process is
otherwise runnable but just waiting for a disk read, it will count in
the load average. I had a mail server which didn't appreciate having
more than one active LVM snapshot at a time. After making a second
snapshot, the disk would seize up and load would immediately skyrocket
until the mail daemon saw it hit 300. At that point it started refusing
connections. All that time the CPU was mostly idle.

Matthew is right that CPU-intensive tasks cause load to climb. Each of
those processes is runnable and fighting for the CPU, so they count toward
the load average. That's why you don't run SETI@home and Doom III on
your render farm.
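Both cases push the same number because, on Linux, the load average counts
processes in the runnable state (R) *and* processes in uninterruptible sleep
(D), which is usually disk wait. A rough sketch of checking that yourself
(the /proc path is Linux-specific, and the state letters assume the usual
ps STAT codes):

```shell
# Count processes currently runnable (R) or in uninterruptible
# sleep (D) -- both states feed the Linux load average.
ps -eo stat= | awk '$1 ~ /^[RD]/ { n++ } END { print n + 0 }'

# Compare with the 1-minute load average itself (Linux only):
cut -d' ' -f1 /proc/loadavg
```

If the count from ps stays near the load average while the CPU is idle,
you're looking at the disk-wait case, not a CPU fight.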

The important point is to differentiate the two, since the causes and
resolutions are so different. Vmstat is great here. Watch the "us",
"sys", "id" and "wa" columns. Lots of I/O wait time ("wa") indicates the
first scenario. No idle time ("id") plus high user ("us") or system
("sys") time points to the second.
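For a quick triage you can average a few samples; this one-liner is a
sketch that assumes the standard procps vmstat layout, where "id" and "wa"
land in columns 15 and 16:

```shell
# Average the CPU idle ("id", column 15) and I/O wait ("wa",
# column 16) percentages over five 1-second vmstat samples.
# NR > 2 skips vmstat's two header lines.
vmstat 1 5 | awk 'NR > 2 { id += $15; wa += $16; n++ }
    END { if (n) printf "avg id=%d%% wa=%d%%\n", id / n, wa / n }'
```

High "wa" with plenty of "id" left over points at the disk; "id" near zero
with low "wa" means the processes really are fighting for CPU.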

Resolutions are harder to come by. Sometimes the only answer is better
hardware. The server mentioned above replaced three previous servers and
did a better job because the I/O card was upgraded (U320 vs. U160) and
the storage device was much faster (a RAID 10 SAN vs. onboard RAID 5). The
previous systems were always struggling under load, and the only way to
"fix" it was to disconnect clients.



More information about the PLUG mailing list