Linux Process Sampling

Frank Sorenson frank at tuxrocks.com
Sun Aug 22 20:58:54 MDT 2010


  On 08/19/2010 01:56 PM, Lonnie Olson wrote:
> On Thu, Aug 19, 2010 at 12:08 PM, Dave Smith <dave at thesmithfam.org> wrote:
>> In Mac OS X, I can use the "Activity Monitor" app to sample a process.
>> It shows the call tree at the moment, which is very handy to debug a
>> runaway process. I can find no such feature in Linux, so a co-worker
>> whipped up a script to use gdb to attach to a process repeatedly, dump
>> stack traces, and build a call tree. This works okay, but is very slow.
>> Mac OS X can sample a process in a few seconds. Mine can take a few
>> minutes (to sample it 10 times, for example).
> Unfortunately, until DTrace can be ported to Linux, strace and gdb are
> as close as we can get.
> And DTrace is even less likely to be an option now that Oracle owns
> it.  We're screwed.

Why would we have to depend on DTrace?  I can think of several ways to 
do this without DTrace.  And why would we want DTrace, when we've got 
SystemTap?

First, there's "gstack" (aka pstack), which ships (at least on my
system) as part of gdb.  It appears to work much like the script Dave
described, but likely performs better (dumping the stacks of a Firefox
process with 12 threads takes barely 1 second).

Then, there's "fstack", which is part of frysk.  It uses ptrace to
attach to each thread, examine its register structures, and analyze the
stack.  It also performs well (the same 12-thread Firefox process takes
0.4 seconds).

There are also a number of features of strace that are very useful, 
besides just "strace -p <pid>".  To display the frequency of each system 
call, along with the time spent in those calls, use the "-c" option (and 
perhaps "-f" to monitor child processes as well).  This displays a nice 
summary at the end.  For example:
$ strace -c -p 4068
Process 4068 attached - interrupt to quit
^CProcess 4068 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  91.86    0.058622           7      8334           poll
   3.89    0.002484           1      2433        12 futex
   3.26    0.002083           0     12712      8658 read
   0.49    0.000314           0      1148           writev
   0.49    0.000312          13        24           munmap
...
------ ----------- ----------- --------- --------- ----------------
100.00    0.063815                 24981      8673 total
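
A summary like this is easy to post-process.  Here is a minimal sketch,
assuming the column layout shown above, that pulls out per-syscall
error rates.  parse_strace_summary and error_rates are hypothetical
helpers written for this example, not part of strace.

```python
def parse_strace_summary(text):
    """Return one dict per syscall row of an `strace -c` summary table."""
    rows = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows have 5 fields, or 6 when the errors column is filled in.
        if len(fields) not in (5, 6) or fields[-1] == "total":
            continue
        try:
            row = {
                "syscall": fields[-1],
                "pct_time": float(fields[0]),
                "seconds": float(fields[1]),
                "usecs_per_call": int(fields[2]),
                "calls": int(fields[3]),
                "errors": int(fields[4]) if len(fields) == 6 else 0,
            }
        except ValueError:
            continue  # separator lines and other non-numeric rows
        rows.append(row)
    return rows


def error_rates(rows):
    """Map each syscall name to the fraction of its calls that failed."""
    return {r["syscall"]: r["errors"] / r["calls"] for r in rows if r["calls"]}
```

Applied to the output above, this reports that read failed in 8658 of
12712 calls (about 68%), while poll never failed.  strace writes the
summary to stderr unless "-o <file>" is given, so capture it
accordingly.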

If the process in question is one you compiled yourself (so that you
can rebuild it with debugging symbols or profiling support), you might
also look into the various profilers and other debuggers.

SystemTap also provides a number of functions and ready-made scripts
for monitoring just about any process or kernel function.  Look through
the examples (http://sourceware.org/systemtap/examples/ or under
/usr/share/doc/systemtap-*/examples for the more recent examples) to see
whether some of them might get you the information you're looking for,
or write your own as necessary (I've written several special-purpose
SystemTap scripts that easily tracked down otherwise-difficult system
problems).


Frank
-- 

Frank Sorenson - KD7TZK
Linux Systems Engineer, DSS Engineering, UBS AG
frank at tuxrocks.com


