High Load average no CPU utilization
adam fisher
afisher at circlepix.com
Thu Mar 29 16:11:49 MDT 2007
Okay, here we go. I think we are making some headway with NFS.
I followed this knowledge base article, and now all the NFS mounts come up wonderfully:
http://tinyurl.com/2g2qmw
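To confirm which options the mounts actually came up with (for example, whether the noatime suggestion quoted further down took effect), something like the following could help. This is only a sketch: nfs_without_noatime is a made-up helper name, and it assumes the six-field layout that /proc/mounts and /etc/fstab share.

```shell
# List NFS mount points whose option field lacks "noatime".
# Reads lines in /proc/mounts (or fstab) layout on stdin:
#   device mountpoint fstype options dump pass
# nfs_without_noatime is a hypothetical helper name, not an existing tool.
nfs_without_noatime() {
    awk '($3 == "nfs" || $3 == "nfs4") && ("," $4 ",") !~ /,noatime,/ { print $2 }'
}

# Intended use on the client:
#   nfs_without_noatime < /proc/mounts
```

Wrapping the option field in commas keeps the match from firing on a longer option name that merely contains "noatime".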
When the load average reaches 149, I get a "MaxClients reached" message in my error log. Until that limit is reached, I can browse the web page just fine, and it is fast. So I am thinking it is an Apache issue: it is not closing the connections.
Right now I have this many established connections:
# netstat -anlp | grep ESTABLISHED | wc -l
28
# netstat -anlp | grep ESTABLISHED | wc -l
27
# netstat -anlp | grep ESTABLISHED | wc -l
24
But then I have this many httpd processes in D state:
# ps aux | grep httpd | grep "D" | wc -l
112
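A note on that count: grep "D" matches a capital D anywhere on the line, not only in the STAT column, so the figure may be slightly high. On Linux, processes in uninterruptible sleep (state D) count toward the load average even though they use no CPU, which would fit a high load with idle CPUs. A stricter count could be sketched like this, assuming procps ps and awk; count_dstate is a made-up helper name.

```shell
# Count processes with a given command name whose state starts with "D"
# (uninterruptible sleep). Reads "STAT COMMAND" pairs on stdin.
# count_dstate is a hypothetical helper name, not an existing tool.
count_dstate() {
    awk -v name="$1" '$2 == name && $1 ~ /^D/ { n++ } END { print n + 0 }'
}

# Intended use on a live box (an empty header after "=" suppresses the
# header line in procps ps):
#   ps -e -o stat= -o comm= | count_dstate httpd
```

Matching on the first field only means a state such as "D+" is counted while an unrelated "D" elsewhere on the line is not.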
This is why I am thinking something is messed up in my httpd.conf:
<IfModule prefork.c>
StartServers 8
MinSpareServers 5
MaxSpareServers 20
MaxClients 150
MaxRequestsPerChild 1000
</IfModule>
<IfModule worker.c>
ServerLimit 16
StartServers 2
MaxClients 150
MinSpareThreads 25
MaxSpareThreads 75
ThreadsPerChild 25
MaxRequestsPerChild 1000
</IfModule>
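For context on the two blocks above: only the block for the MPM that httpd was built with applies (httpd -l should list prefork.c or worker.c among the compiled-in modules). With prefork, MaxClients 150 caps the number of httpd children at 150, so if nearly all of them are stuck in D state the load average can plateau near that figure. A rough sketch for pulling the prefork value out of a config so it can be compared with the running process count; prefork_maxclients is a made-up helper name, and the config path in the usage comment is an assumption.

```shell
# Print the MaxClients value found inside the <IfModule prefork.c> block.
# Reads httpd.conf text on stdin; prints nothing if the block is absent.
# prefork_maxclients is a hypothetical helper name, not an existing tool.
prefork_maxclients() {
    awk '
        $1 == "<IfModule" && $2 == "prefork.c>" { inblock = 1; next }
        $1 == "</IfModule>"                     { inblock = 0 }
        inblock && $1 == "MaxClients"           { print $2 }
    '
}

# Intended use (the path is an assumption; adjust to your install):
#   prefork_maxclients < /etc/httpd/conf/httpd.conf
#   ps -e -o comm= | grep -c '^httpd$'
```

If the second number sits at or near the first, new requests have to wait for a free child, which matches the "MaxClients reached" message in the error log.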
Running out of ideas,
Adam
----- Steve Alligood <steve at bluehost.com> wrote:
> You are getting into the area of nfs that I usually have to poke
> around and hope to get lucky.
>
> Try forcing to either nfs v2 or udp on nfs v3 (the nfsvers=2 or udp
> mount options).
>
> Also, turn off the atime checking (noatime option), as this will pound
> your nfs mount for every read.
>
> And you can try async, but I think that may only be for writes.
>
> -Steve
>
> adam fisher wrote:
> > So these are the mount statements for NFS:
> >
> >
> > 10.11.1.91:/data/media   /mnt/media   nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> > 10.11.1.91:/data/halo    /mnt/web     nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> > 10.11.1.91:/data/util    /mnt/util    nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> > 10.11.1.91:/data/www     /www         nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> > 10.11.1.91:/data/online  /mnt/online  nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> > 10.11.1.91:/data/library /mnt/library nfs rsize=8192,wsize=8192,timeo=14,rw,hard,intr 0 0
> >
> >
> > When I restart the box, only the last three are mounted. When I run
> > mount -a, all of them mount and everything runs. I can browse the
> > website just fine until the load average gets to around 70 or so. It
> > eventually gets to 149 and then just stays there, because MaxClients
> > is set to 150.
> >
> > When I run nfsstat -o net I get this:
> >
> > Server packet stats:
> > packets udp tcp tcpconn
> > 0 0 0 0
> >
> > Client packet stats:
> > packets udp tcp tcpconn
> > 0 0 0 0
> >
> > However, nfsstat does show activity.
> >
> > Server rpc stats:
> > calls badcalls badauth badclnt xdrcall
> > 0 0 0 0 0
> >
> > Client rpc stats:
> > calls retrans authrefrsh
> > 40915 0 0
> >
> > Client nfs v3:
> > null       getattr    setattr    lookup     access     readlink
> > 0 0%       37438 91%  0 0%       537 1%     1957 4%    7 0%
> > read       write      create     mkdir      symlink    mknod
> > 966 2%     0 0%       0 0%       0 0%       0 0%       0 0%
> > remove     rmdir      rename     link       readdir    readdirplus
> > 0 0%       0 0%       0 0%       0 0%       0 0%       0 0%
> > fsstat     fsinfo     pathconf   commit
> > 3 0%       6 0%       0 0%       0 0%
> >
> >
> > Any other ideas? What am I missing?
> >
> > thanks,
> > Adam
> >
> >
> > ----- Steve Alligood <steve at bluehost.com> wrote:
> >> If the other boxes are working fine with nfs, it probably isn't the
> >> number of nfsd processes running (though you can change that in
> >> /etc/sysconfig/nfs with the RPCNFSDCOUNT setting, default is 8).
> >>
> >> Again, I would make sure you can actually cat the files from the
> >> fedora box during the higher load times, make sure the mount isn't
> >> stale, that the network is performing correctly (forced NIC and
> >> switchport rather than auto, check with netstat -in for interface
> >> errors), and even make sure to force the nfs mount options rather
> >> than assume the defaults (BSD may default to a larger window, etc.).
> >>
> >> None of these are certain, but they are places worth checking.
> >>
> >> -Steve
> >>
> >> adam fisher wrote:
> >>> This is the mount statement for our BSD boxes and the fedora box.
> >>>
> >>> 10.11.1.91:/data/online /mnt/online nfs rw,port=2049,intr 0 0
> >>>
> >>> We then have a symlink /online -> /mnt/online
> >>>
> >>> Fedora says the default is v2.
> >>>
> >>> I am not sure what the 0 0 are doing at the end of the mount line,
> >>> but they were on the freebsd boxes so I just left them.
> >>>
> >>> Is there a way to make sure that we are allowing enough connections
> >>> on the NFS server?
> >>>
> >>> Let me know what you see.
> >>>
> >>> thanks,
> >>> Adam
> >>>
> >>>
> >>> ----- Steve Alligood <steve at bluehost.com> wrote:
> >>>> it may be HOW you are mounting it, and how fedora versus BSD
> >>>> defaults to mount it.
> >>>>
> >>>> nfs v2 will be really quick, but not as reliable for data writes
> >>>> (aka, udp)
> >>>>
> >>>> nfs v3 will be more reliable (tcp) but slower
> >>>>
> >>>> nfs v4 will be reliable (tcp) and secure (encrypted) but a lot
> >>>> slower
> >>>>
> >>>> Fedora may default to v4 while your BSD does v3 or v2.
> >>>>
> >>>>
> >>>> I have some mounts I use nfs v2 because I am not as worried about
> >>>> writes and I need the speed. I also change the read and write
> >>>> window sizes, and turn off atime checking:
> >>>>
> >>>> async,soft,noatime,intr,nfsvers=2,rsize=8192,wsize=8192
> >>>>
> >>>> Of course, the server must support the v2 nfs as well (obvious, but
> >>>> worth mentioning)
> >>>>
> >>>> -Steve
> >>>>
> >>>> adam fisher wrote:
> >>>>> I appreciate everybody's thoughts on this.
> >>>>>
> >>>>> I agree that NFS looks to be the bottleneck; however, we have 5
> >>>>> other load-balanced web servers that are pulling the web data from
> >>>>> our NFS server. We mount the partition and then create symlinks to
> >>>>> those mounts. The other 5 web boxes are up and running fine. It is
> >>>>> the sixth alone that is having this issue.
> >>>>>
> >>>>> The first 5 are BSD; this one is a Fedora installation, as we want
> >>>>> to get away from BSD.
> >>>>>
> >>>>> Any other ideas?
> >>>>>
> >>>>> thanks,
> >>>>> Adam
> >>>>>
> >>>>>
> >>>>> ----- Ryan Simpkins <plug at ryansimpkins.com> wrote:
> >>>>>> On Wed, March 28, 2007 11:44, adam fisher wrote:
> >>>>>>> apache 17268 0.7 0.6 29552 12868 ? D 04:27 0:04 /usr/sbin/httpd
> >>>>>>> apache 17456 1.1 0.6 29728 13168 ? S 04:27 0:06 /usr/sbin/httpd
> >>>>>>> apache 17890 0.5 0.6 29928 12588 ? D 04:28 0:02 /usr/sbin/httpd
> >>>>>>> apache 17893 0.0 0.5 29032 11548 ? D 04:28 0:00 /usr/sbin/httpd
> >>>>>>> apache 17895 0.0 0.5 29184 11716 ? D 04:28 0:00 /usr/sbin/httpd
> >>>>>>> apache 17896 0.0 0.5 28740 11256 ? D 04:28 0:00 /usr/sbin/httpd
> >>>>>>> apache 17897 0.0 0.5 28912 11452 ? D 04:28 0:00 /usr/sbin/httpd
> >>>>>>> apache 17904 0.3 0.5 29288 11876 ? D 04:28 0:01 /usr/sbin/httpd
> >>>>>>> apache 17913 0.5 0.5 29316 11892 ? D 04:29 0:02 /usr/sbin/httpd
> >>>>>>> apache 17923 0.1 0.5 29364 12052 ? D 04:29 0:00 /usr/sbin/httpd
> >>>>>>
> >>>>>>> Device: rrqm/s wrqm/s  r/s  w/s rsec/s wsec/s avgrq-sz avgqu-sz await svctm %util
> >>>>>>> sda       0.00  11.00 0.00 6.00   0.00 136.00    22.67     0.00  0.50  0.17  0.10
> >>>>>>> The web root is located on an NFS share. I restarted NFS on this
> >>>>>>> box just to make sure. When I restart httpd and the load average
> >>>>>>> drops to around 10 or 11 I can browse the webpage just fine. It
> >>>>>>> is when it gets to around 150 that I can't.
> >>>>>>
> >>>>>> Bingo. Your web root is running over NFS. NFS is pure evil for
> >>>>>> this type of work. You may be able to improve performance playing
> >>>>>> around with the various NFS mount options.
> >>>>>>
> >>>>>> -Ryan
> >>>>>>