High Load average no CPU utilization

adam fisher afisher at circlepix.com
Thu Mar 29 16:11:49 MDT 2007


Okay, here we go. I think we are making some headway with NFS.

I followed this knowledge base article, and now all the NFS mounts come up wonderfully.

http://tinyurl.com/2g2qmw


When the load average climbs to 149 I get a "MaxClients reached" warning in my error log.  I can browse the web page fine, and fast, until that limit is reached.  So I am thinking it is an Apache issue and it is not closing the connections.
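One rough way to confirm the children are blocked on NFS rather than leaking is to look at the kernel wait channel of the D-state workers. This is just a sketch; the `stuck_httpd` helper name is made up here, and on a hung NFS mount the wait channel is typically an nfs_* or rpc_* symbol:

```shell
# Filter ps output down to workers stuck in uninterruptible sleep (state D).
# Reads ps output on stdin; field 2 is assumed to be the STAT column.
stuck_httpd() {
  awk '$2 ~ /^D/ { print }'
}

# On a live box:
#   ps -o pid,stat,wchan:25,comm -C httpd | stuck_httpd
```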


Right now I have this many established connections:

# netstat -anlp | grep ESTABLISHED | wc -l
28
# netstat -anlp | grep ESTABLISHED | wc -l
27
# netstat -anlp | grep ESTABLISHED | wc -l
24

But then this many httpd processes in D (uninterruptible sleep) state:

# ps aux | grep httpd | grep "D" | wc -l
112
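As an aside, `grep "D"` can match a capital D anywhere in the ps line, not just the state column. A slightly safer count keys on the STAT field itself; this is a sketch, and `count_state` is an illustrative name, not a standard tool:

```shell
# Count httpd processes whose STAT column (field 8 of `ps aux`) begins
# with a given state letter, instead of grepping the whole line.
count_state() {
  awk -v s="$1" '/httpd/ && $8 ~ ("^" s) { n++ } END { print n + 0 }'
}

# On a live box:
#   ps aux | count_state D   # uninterruptible sleep (likely blocked on NFS I/O)
#   ps aux | count_state S   # interruptible sleep
```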

This is why I am thinking something is messed up in my httpd.conf:

<IfModule prefork.c>
StartServers       8
MinSpareServers    5
MaxSpareServers    20
MaxClients         150
MaxRequestsPerChild  1000
</IfModule>


<IfModule worker.c>
ServerLimit         16
StartServers         2
MaxClients         150
MinSpareThreads     25
MaxSpareThreads     75
ThreadsPerChild     25
MaxRequestsPerChild  1000
</IfModule>
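Only one of those two blocks actually applies: the one matching the MPM compiled into (or loaded by) the httpd binary. A quick sketch to check which one; `active_mpm` is an illustrative helper, and on Fedora the binary is usually /usr/sbin/httpd:

```shell
# Extract the MPM name from `httpd -l` output (the compiled-in module list).
active_mpm() {
  awk '/prefork\.c|worker\.c/ { gsub(/\.c/, ""); gsub(/ /, ""); print; exit }'
}

# On a live box:
#   /usr/sbin/httpd -l | active_mpm
```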


Running out of ideas,

Adam

----- Steve Alligood <steve at bluehost.com> wrote:
> You are getting into the area of nfs that I usually have to poke
> around and hope to get lucky.
> 
> Try forcing either nfs v2, or udp on nfs v3 (the nfsvers=2 or udp
> mount options).
> 
> Also, turn off the atime checking (noatime option), as this will pound
> your nfs mount for every read.
> 
> And you can try async, but I think that may only be for writes.
> 
> -Steve
> 
> adam fisher wrote:
> > So these are the mount statements for nfs:
> > 
> > 
> > 10.11.1.91:/data/media     /mnt/media     nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 10.11.1.91:/data/halo      /mnt/web       nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 10.11.1.91:/data/util      /mnt/util      nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 10.11.1.91:/data/www       /www           nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 10.11.1.91:/data/online    /mnt/online    nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 10.11.1.91:/data/library   /mnt/library   nfs    rsize=8192,wsize=8192,timeo=14,rw,hard,intr  0  0
> > 
> > 
> > When I restart the box only the last three are mounted.  When I run
> > mount -a, all of them mount and everything runs.  I can browse the
> > website just fine until the load average gets to be around 70 or so;
> > it eventually gets to 149 and then just stays there because
> > MaxClients is set at 150.
> > 
> > When I run nfsstat -o net I get this:
> > 
> > Server packet stats:
> > packets    udp        tcp        tcpconn
> > 0          0          0          0
> > 
> > Client packet stats:
> > packets    udp        tcp        tcpconn
> > 0          0          0          0
> > 
> > However, nfsstat does show activity.
> > 
> > Server rpc stats:
> > calls      badcalls   badauth    badclnt    xdrcall
> > 0          0          0          0          0
> > 
> > Client rpc stats:
> > calls      retrans    authrefrsh
> > 40915      0          0
> > 
> > Client nfs v3:
> > null         getattr      setattr      lookup       access       readlink
> > 0         0% 37438    91% 0         0% 537       1% 1957      4% 7         0%
> > read         write        create       mkdir        symlink      mknod
> > 966       2% 0         0% 0         0% 0         0% 0         0% 0         0%
> > remove       rmdir        rename       link         readdir      readdirplus
> > 0         0% 0         0% 0         0% 0         0% 0         0% 0         0%
> > fsstat       fsinfo       pathconf     commit
> > 3         0% 6         0% 0         0% 0         0%
> > 
> > 
> > Any other ideas?  What am I missing?
> > 
> > thanks,
> > Adam
> > 
> > 
> > ----- Steve Alligood <steve at bluehost.com> wrote:
> >> If the other boxes are working fine with nfs, it probably isn't the
> >> number of nfsd processes running (though you can change that in
> >> /etc/sysconfig/nfs with the RPCNFSDCOUNT setting; default is 8).
> >>
> >> Again, I would make sure it can actually cat the files from the
> >> fedora box during the higher load times, make sure the mount isn't
> >> stale, that the network is performing correctly (forced NIC and
> >> switchport rather than auto; check with netstat -in for interface
> >> errors), and even make sure to force the nfs mount options rather
> >> than assume the defaults (BSD may default to a larger window, etc,
> >> etc).
> >>
> >> None of these are certain, but they are all places worth checking.
> >>
> >> -Steve
> >>
> >> adam fisher wrote:
> >>> This is the mount statement for our BSD boxes and the fedora box.
> >>>
> >>> 10.11.1.91:/data/online    /mnt/online    nfs    rw,port=2049,intr    0    0
> >>> We then have a symlink /online -> /mnt/online.
> >>>
> >>> Fedora says the default is v2.
> >>>
> >>> I am not sure what the 0   0 are doing at the end of the mount, but
> >>> they were on the freebsd boxes so I just left them.
> >>>
> >>> Is there a way to make sure that we are allowing enough connections
> >>> on the NFS server?
> >>>
> >>> Let me know what you see.
> >>>
> >>> thanks,
> >>> Adam
> >>>
> >>>
> >>> ----- Steve Alligood <steve at bluehost.com> wrote:
> >>>> it may be HOW you are mounting it, and how fedora versus BSD
> >>>> defaults to mounting it.
> >>>>
> >>>> nfs v2 will be really quick, but not as reliable for data writes
> >>>> (aka, udp)
> >>>>
> >>>> nfs v3 will be more reliable (tcp) but slower
> >>>>
> >>>> nfs v4 will be reliable (tcp) and secure (encrypted) but a lot
> >>>> slower
> >>>>
> >>>> Fedora may default to v4 while your BSD does v3 or v2.
> >>>>
> >>>>
> >>>> I have some mounts I use nfs v2 because I am not as worried about
> >>>> writes and I need the speed.  I also change the read and write
> >>>> window sizes, and turn off atime checking:
> >>>>
> >>>> async,soft,noatime,intr,nfsvers=2,rsize=8192,wsize=8192
> >>>>
> >>>> Of course, the server must support the v2 nfs as well (obvious,
> >>>> but worth mentioning).
> >>>>
> >>>> -Steve
> >>>>
> >>>> adam fisher wrote:
> >>>>> I appreciate everybody's thoughts on this.
> >>>>>
> >>>>> I agree that NFS looks to be the bottleneck; however, we have 5
> >>>>> other load balanced web servers that are pulling the web data
> >>>>> from our NFS server.  We mount the partition and then create sym
> >>>>> links to those mounts.  The other 5 web boxes are up and running
> >>>>> fine.  It is the sixth alone that is having this issue.
> >>>>>
> >>>>> The first 5 are BSD; this is a Fedora installation, as we want to
> >>>>> get away from BSD.
> >>>>> Any other ideas?
> >>>>>
> >>>>> thanks,
> >>>>> Adam
> >>>>>
> >>>>>
> >>>>> ----- Ryan Simpkins <plug at ryansimpkins.com> wrote:
> >>>>>> On Wed, March 28, 2007 11:44, adam fisher wrote:
> >>>>>>> apache   17268  0.7  0.6  29552 12868 ?        D    04:27   0:04 /usr/sbin/httpd
> >>>>>>> apache   17456  1.1  0.6  29728 13168 ?        S    04:27   0:06 /usr/sbin/httpd
> >>>>>>> apache   17890  0.5  0.6  29928 12588 ?        D    04:28   0:02 /usr/sbin/httpd
> >>>>>>> apache   17893  0.0  0.5  29032 11548 ?        D    04:28   0:00 /usr/sbin/httpd
> >>>>>>> apache   17895  0.0  0.5  29184 11716 ?        D    04:28   0:00 /usr/sbin/httpd
> >>>>>>> apache   17896  0.0  0.5  28740 11256 ?        D    04:28   0:00 /usr/sbin/httpd
> >>>>>>> apache   17897  0.0  0.5  28912 11452 ?        D    04:28   0:00 /usr/sbin/httpd
> >>>>>>> apache   17904  0.3  0.5  29288 11876 ?        D    04:28   0:01 /usr/sbin/httpd
> >>>>>>> apache   17913  0.5  0.5  29316 11892 ?        D    04:29   0:02 /usr/sbin/httpd
> >>>>>>> apache   17923  0.1  0.5  29364 12052 ?        D    04:29   0:00 /usr/sbin/httpd
> >>>>>>
> >>>>>>> Device:    rrqm/s  wrqm/s   r/s   w/s  rsec/s  wsec/s avgrq-sz avgqu-sz  await  svctm  %util
> >>>>>>> sda          0.00   11.00  0.00  6.00    0.00  136.00    22.67     0.00   0.50   0.17   0.10
> >>>>>>> The web root is located on an NFS share.  I restarted NFS on
> >>>>>>> this box just to make sure.  When I restart httpd and the load
> >>>>>>> average drops to around 10 or 11 I can browse the webpage just
> >>>>>>> fine.  It is when it gets to around 150 that I can't.
> >>>>>> Bingo. Your web root is running over NFS. NFS is pure evil for
> >>>>>> this type of work. You may be able to improve performance playing
> >>>>>> around with the various NFS mount options.
> >>>>>>
> >>>>>> -Ryan
> >>>>>>
> >>>>>> /*
> >>>>>> PLUG: http://plug.org, #utah on irc.freenode.net
> >>>>>> Unsubscribe: http://plug.org/mailman/options/plug
> >>>>>> Don't fear the penguin.
> >>>>>> */
> >>>
> > 
> >





