Edward Pilatowicz wrote:
On Fri, Jun 11, 2010 at 02:26:50PM -0700, J. J. Farrell wrote:
Can anyone point me at any measurements and/or analysis of the cost
of moving back and forth between user and kernel space - when doing
an ioctl call into a driver, for example. Interested in current
OpenSolaris on x86-64 in particular.
I'm looking at the additional costs of sharing work between modules
in user and kernel space, trying to get some quantitative feel for
how much it costs to hop back and forth. It's obvious we want to
minimise the number of transitions, but I'd like some understanding
of the numbers to see how much effort it's worth putting into
minimising them.
I'm also interested in other context switch costs - principally thread
sleep and wakeup. That's a much more complex area to analyse, but any
pointers to useful write-ups or measurements would be welcome.
no offence intended, but unless you have some performance data
indicating that your application is spending too much time context
switching it seems to me like you're over-optimizing.
that said, i don't know of any generic syscall-overhead write-ups lying
around. but you could always measure this yourself using something
like libmicro:
http://hub.opensolaris.org/bin/view/Project+libmicro/
just try benchmarking a super simple system call like getpid().
you should be aware that different x86 machines use different system
call mechanisms. if you look in /usr/lib/libc/ you'll see three
different versions of libc, all of which use different syscall
mechanisms. the default mechanism for system calls is chosen at boot
time by lofs mounting one of the copies of libc above onto
/lib/libc.so.1. so depending on which version of libc you're using
you'll get different performance numbers. (and not all syscall
mechanisms are supported by all x86 processors.)
ed
Thanks Ed; no offence taken, but I'm not committing premature
optimization. This is partly just a search for general background
knowledge, and partly an input to design decisions for a complex
application which will occupy the whole system. What I need to know is:
in the event that it's CPU-limited, would the extra effort involved in
reducing u/k transitions and context switches save me a fraction of a
percent of latency on each operation, and allow me to increase overall
throughput by a fraction of a percent, or could it have a more
substantial effect? If it's very low it's unlikely to be worth worrying
about at all. If it could make a more significant difference, it could
well be worth paying attention to up front.
I can certainly measure it myself, and I expect I'll end up doing so
anyway. I'm sure this area has been studied in great depth before, and
by people with a much better understanding than mine of what they're
measuring and how to measure it. If any of that's public, I can almost
certainly learn from the analysis and discussion - and the numbers might
tell me all I need.
Thanks for the libmicro pointer and the information on syscall
mechanisms. I knew these were very processor specific, but I hadn't
realised there was that much variance within the x86-64 implementations.
Regards,
jjf
_______________________________________________
driver-discuss mailing list
[email protected]
http://mail.opensolaris.org/mailman/listinfo/driver-discuss