Thanks for your explanations, they helped greatly. Using ktrdump and
schedgraph.py, and after modifying our test program to automatically
set and unset debug.ktr.mask, I've been able to get useful information.
First, it made me realize that with the default settings and 2 active
processes, task switching only occurs every 100ms. Knowing that,
expecting a latency around 100ms was kind of silly :)
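For reference, a rough shell equivalent of what the test program now
does internally around each measured run would be (the binary name
latency_test is made up for this example):

$ sysctl debug.ktr.mask=0x20000000   # enable KTR_SCHED event logging
$ ./latency_test                     # run the measurement
$ sysctl debug.ktr.mask=0            # freeze the ring buffer right away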
Next, it seems most of the latency spikes are due to a process starting
or waking up. For instance, it usually happens when the openssl speed
test is started (
http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_openssl_start.png
) or when pagedaemon wakes up (I forgot to disable swap, and my test
program used too much memory to store the result values...). I'm not
sure why, but when we start openssl, it is often allowed to run for >=
300ms, even with our test program set to real-time priority. My
intuition is that, at first, openssl is considered an interactive
process, until the scheduler realizes it's not. But then, does anyone
know why it would take more than 300ms for the scheduler to realize
that?
Anyway, by setting kern.sched.interact=5 (so openssl isn't considered
an interactive process), kern.sched.slice=3 (to get a high enough
scheduling resolution), and running our program at real-time priority,
we got rid of both problems. I'm just a little bit worried about
kern.sched.slice=3. Is there any known side effect of reducing the
slice size?
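For completeness, the whole setup boils down to something like this
(latency_test again stands in for our real test binary):

$ sysctl kern.sched.interact=5   # openssl no longer scores as interactive
$ sysctl kern.sched.slice=3      # shorter slices, finer-grained scheduling
$ rtprio 0 ./latency_test        # run the test at real-time priority 0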
Also, another issue remains: we were hoping to keep our program at a
normal priority. However, when we set our test program to a normal
priority (but still a higher priority than openssl), both get 50% of
the CPU (I guess this is to be expected), and from time to time we have
a "hiccup" in the scheduling:
http://jflesch.kwain.net/~jflesch/sys_latence/sched/sched_graph_hicups.png
Is there any way to avoid them? In other words, is it possible to make
sure that the low-priority process never gets more CPU time than the
high-priority one?
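To be concrete, by "a normal priority (but still higher than openssl)"
I mean something along these lines (the exact nice value is just an
example):

$ nice -n 10 openssl speed &   # competing load at a slightly lower priority
$ ./latency_test               # test program left at the default priority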
On 23.02.2011 02:08, Ryan Stone wrote:
To debug weird scheduling issues I find it helpful to start by looking
at a schedgraph. schedgraph is a tool that can display a graphical
representation of what the scheduler was doing over a small slice of
time. The one downside is that you have to recompile your kernel to
get the hooks that collect the necessary data, but it sounds like your
problem is pretty easy to reproduce and so that shouldn't be a big
problem.
To enable the hooks, put the following in your kernel conf:
options KTR
options KTR_COMPILE=KTR_SCHED
options KTR_MASK=KTR_SCHED
options KTR_ENTRIES=(128*1024)
Then rebuild and install the new kernel. Next, run your test. The
instant that your test has detected the long delay, set the sysctl
debug.ktr.mask to 0. The scheduler hooks record data into a ring
buffer, so the interesting data can be overwritten within seconds.
Clearing that sysctl will stop any further events from being logged,
which means that the interesting data will stay there until you go and
collect it.
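For the "rebuild and install" step, the usual procedure is roughly as
follows (MYKERNEL stands for whatever kernel config file the options
were added to):

$ cd /usr/src
$ make buildkernel KERNCONF=MYKERNEL
$ make installkernel KERNCONF=MYKERNEL
$ shutdown -r now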
You can get the raw data by redirecting the output of "ktrdump -ct"
into a file. Then from any machine with a graphical interface (FreeBSD
with X installed, Windows, Mac, whatever) and python installed, run:
$ python schedgraph.py /path/to/ktrdump/output
You can get schedgraph.py from /usr/src/tools/sched.
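Concretely, the dump and analysis steps might look like this (the
output path is arbitrary):

$ ktrdump -ct > /var/tmp/ktr.out
$ python schedgraph.py /var/tmp/ktr.out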
If you want to collect the data again, set the sysctl debug.ktr.mask
back to 0x20000000 to restart gathering scheduler data.
Ryan