System comes to a halt on heavy disk I/O
charlescurley at charlescurley.com
Mon Feb 1 13:04:57 MST 2010
On Mon, 01 Feb 2010 09:51:55 -0700
Kenneth Burgener <kenneth at mail1.ttak.org> wrote:
> On 2/1/2010 9:03 AM, Charles Curley wrote:
> > When I run fairly disk intensive tasks, like copying tens of
> > gigabytes to this machine, it slows to a crawl. Disk I/O slows down
> > by two orders of magnitude.
> Linux tends to use disk cache as much as possible, so until you start
> performing disk operations that fill all of the available RAM for the
> disk cache, things will appear snappier.
> While you are performing your disk operations, try watching the 'wa'
> (IO wait) column of 'vmstat 2' to see what percentage of the CPU time
> is being spent waiting for IO. This number should remain as close to
> zero as possible. If the IO queue is so backed up that requests
> aren't being handled promptly, then you will quickly notice IO-based
> apps begin to crawl. Adding more RAM usually helps with IO issues,
> as more of the disk can be cached to RAM.
Thanks. That was very helpful.
That number is the percentage of CPU time spent waiting for I/O. I
started my transfer, and for just under five minutes I saw values from
0 up to as high as 40, each returning to 0 within a second or two,
along with lots of high idle times.
Then I saw it go to 99 for two seconds, and it stayed above 45
thereafter, and the system went to its knees. The sum of idle and wait
time was 98+ during that time.
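For anyone who wants to log these figures rather than eyeball them, here
is a minimal Python sketch of how the idle and wait percentages can be
derived from two snapshots of the aggregate "cpu" line in /proc/stat.
The function names and the sample counter values are made up for
illustration, and this is not a claim about how vmstat itself is
implemented.

```python
import time


def parse_cpu_line(stat_text):
    """Return the cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            # Order: user nice system idle iowait irq softirq steal.
            # Trailing guest fields are skipped (guest time is part of user).
            return [int(value) for value in fields[1:9]]
    raise ValueError("no aggregate 'cpu' line found")


def cpu_percentages(before, after):
    """Percent of CPU time spent idle and in I/O wait between two snapshots."""
    old = parse_cpu_line(before)
    new = parse_cpu_line(after)
    deltas = [n - o for o, n in zip(old, new)]
    total = sum(deltas)
    if total <= 0:
        raise ValueError("snapshots are identical or out of order")
    iowait = deltas[4] if len(deltas) > 4 else 0
    return {"idle": 100.0 * deltas[3] / total, "wa": 100.0 * iowait / total}


def sample_live(interval=2.0):
    """Take two readings of /proc/stat, 'interval' seconds apart (Linux only)."""
    with open("/proc/stat") as handle:
        before = handle.read()
    time.sleep(interval)
    with open("/proc/stat") as handle:
        after = handle.read()
    return cpu_percentages(before, after)


# Hypothetical snapshots, chosen so the arithmetic is easy to follow.
BEFORE = "cpu  1000 0 500 8000 500 0 0 0 0 0\n"
AFTER = "cpu  1010 0 510 8020 660 0 0 0 0 0\n"
result = cpu_percentages(BEFORE, AFTER)
```

With the sample values, 200 ticks elapse, of which 20 are idle and 160
are I/O wait. Calling sample_live() in a loop during a transfer would
give a rough equivalent of watching 'vmstat 2'.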
Swap is turned off. Adding RAM won't help, as I am running a 32 bit
system and have more RAM in there than it will address (RAM is cheap!).
I am transferring several tens of gigabytes as a test. However, the
transfer is via rsync over 100 megabit Ethernet, which tops out at
about 12.5 MB/sec, so I doubt I'm swamping the hard drive (which
normally sustains 85 to 92 MB/sec).
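The arithmetic behind that doubt, as a small Python sketch. It assumes
decimal megabytes, ignores Ethernet and rsync protocol overhead (which
would only lower the network figure), and uses the low end of the disk
range quoted above; the names are illustrative.

```python
def link_ceiling_mb_per_sec(megabits_per_sec):
    """Upper bound on payload rate in MB/sec for a link rated in Mbit/sec."""
    return megabits_per_sec / 8.0  # 8 bits per byte


ETHERNET_MBIT = 100        # link speed stated in the post
DISK_MB_PER_SEC_LOW = 85   # low end of the disk's usual throughput

ceiling = link_ceiling_mb_per_sec(ETHERNET_MBIT)
fraction_of_disk = ceiling / DISK_MB_PER_SEC_LOW

print("network ceiling: %.1f MB/sec" % ceiling)
print("share of disk throughput: %.0f%%" % (fraction_of_disk * 100))
```

Even at full wire speed the network can deliver only about a seventh of
what the disk normally sustains for sequential writes.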
The process that is doing the data transfer is running with "nice -n
> Also check 'smartctl -a /dev/sda' and check to see if the error rate
> is increasing rapidly. If the disk is spending its time recovering
> from failures, this would decrease the throughput, and also indicates
> that the drive is probably going bad.
Right. I see no failures from smartctl, and no evidence of bad data on
the drive. Nor do I see write errors in the log files.
So far I've been testing by writing to sdb. I think a similar test to
send data to sda is next. If I can't swamp the system with that, that
will suggest that sdb is the problem.
Thanks, that was very useful.
Charles Curley                  /"\    ASCII Ribbon Campaign
Looking for fine software       \ /    Respect for open standards
and/or writing?                  X     No HTML/RTF in email
http://www.charlescurley.com    / \    No M$ Word docs in email
Key fingerprint = CE5C 6645 A45A 64E4 94C0 809C FFF6 4C48 4ECD DFDB