System comes to a halt on heavy disk I/O

Charles Curley charlescurley at charlescurley.com
Mon Feb 1 13:04:57 MST 2010


On Mon, 01 Feb 2010 09:51:55 -0700
Kenneth Burgener <kenneth at mail1.ttak.org> wrote:

> On 2/1/2010 9:03 AM, Charles Curley wrote:
> > When I run fairly disk intensive tasks, like copying tens of
> > gigabytes to this machine, it slows to a crawl. Disk I/O slows down
> > by two orders of magnitude.
> >    
> 
> 
> Linux tends to use disk cache as much as possible, so until you start 
> performing disk operations that fill all of the available RAM for the 
> disk cache, things will appear snappier.
> 
> While you are performing your disk operations, try watching the
> 'wa' (IO wait) column of 'vmstat 2' to see what percentage of the
> CPU time is being spent waiting for IO.  This number should remain
> as close to zero as possible.  If the IO queue is so backed up that
> requests aren't being handled promptly, then you will quickly
> notice IO-based apps beginning to crawl.  Adding more RAM usually
> helps with IO issues, as more of the disk can be cached in RAM.

Thanks. That was very helpful.

That number is the percentage of CPU time spent waiting for I/O. I
started my transfer, and for just under five minutes I saw values
from 0 up to as high as 40, each dropping back to 0 within a second
or two, along with lots of high idle readings.

Then I saw it hit 99 for two seconds, and it stayed above 45
thereafter, and the system was brought to its knees. The sum of idle
and wait time was 98 or more during that stretch.
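
For anyone who wants to watch just that column, something like this
should do it (the field number matches the vmstat header here; check
yours before trusting it):

    # print only the 'wa' (I/O wait) column every two seconds;
    # NR>2 skips the header lines, and $16 is where 'wa' falls in
    # this vmstat's output -- adjust to match your own header
    vmstat 2 | awk 'NR > 2 { print $16; fflush() }'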

Swap is turned off. Adding RAM won't help, as I am running a 32-bit
system and already have more RAM in the box than it can address (RAM
is cheap!). I am transferring several tens of gigabytes as a test.
However, the transfer is via rsync over 100 megabit Ethernet, which
tops out around 12 MB/sec on the wire, so I doubt I'm swamping the
hard drive (which normally manages 85 to 92 MB/sec).

The process that is doing the data transfer is running with "nice -n
10".


> 
> Also check 'smartctl -a /dev/sda' to see if the error rate is
> increasing rapidly.  If the disk is spending its time recovering
> from failures, this would decrease the throughput, and it also
> indicates that the drive is probably going bad.

Right. I see no failures from smartctl, and no evidence of bad data on
the drive. Nor do I see write errors in the log files.
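
For anyone who wants to run the same check, something like this pulls
out the SMART attributes that most often point to a dying drive (the
attribute names vary a little from vendor to vendor):

    # show only the attributes that usually flag a failing disk
    smartctl -A /dev/sdb | egrep -i 'Reallocated|Pending|Uncorrect|CRC'
    # and dump the drive's own error log
    smartctl -l error /dev/sdb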

So far I've been testing by writing to sdb. A similar test sending
data to sda is next. If I can't swamp the system with that, it will
suggest that sdb is the problem.
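
One way to run that test is simply to point the same rsync at a
filesystem on sda; alternatively, a purely local write like the one
below takes the network out of the picture entirely. Either way the
mount point is a placeholder:

    # write ~20 GB directly to a filesystem on sda, bypassing the
    # page cache so the result reflects the drive rather than
    # cached writes
    dd if=/dev/zero of=/mnt/sda1/dd-test bs=1M count=20000 oflag=direct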

Thanks, that was very useful.

-- 

Charles Curley                  /"\    ASCII Ribbon Campaign
Looking for fine software       \ /    Respect for open standards
and/or writing?                  X     No HTML/RTF in email
http://www.charlescurley.com    / \    No M$ Word docs in email

Key fingerprint = CE5C 6645 A45A 64E4 94C0  809C FFF6 4C48 4ECD DFDB


