------- Comment #6 from gnb at sgi dot com  2006-07-24 02:23 -------
Ian Lance Taylor says:
> Please send e-mail to [EMAIL PROTECTED] first.  When that process is
> complete, send the patch to [EMAIL PROTECTED]

Thanks, that was the guidance I needed ;-)  The process is underway.

Seongbae Park says:
> It seems that you didn't change libgcov.c,
> which suggests that you didn't address __gcov_{pow2,interval}_profiler.

That's correct, I haven't addressed those, for the simple
reason that I don't use them and so hadn't noticed an issue.

> so it would be very nice if you can fix that as well 
> (this is just a suggestion).

I'm willing to try ;-) but no promises.

> I think there are three choices:
> 
> #1 make above profiler routines to use atomic increment *always*

I decided not to do this for the -fprofile-arcs case because:

a.  Atomic increment is a new feature of gcc and so I assume it's not
    available on all platforms, even all those whose ISA supports it.

b.  Even when available, atomic increment may incur additional overhead
    which isn't necessary in a single-threaded context, e.g. the mf
    (memory fence) instruction on ia64.

I expect the same arguments apply to -fprofile-values.

> #2 introduce a new environment variable to pick atomic/non-atomic increment

The -fprofile-arcs instrumentation would be hard to make behave this
way without incurring additional runtime overhead, so I think this
approach would just provide an opportunity for the two options to
behave inconsistently.

> #3 make atomic increment version of those routines and
> -fprofile-multithread to generate codes to call them.
>
> I prefer #3, [...]

Agreed.

> Another comment is about the name of -fprofile-multithread.
> There's an alternative MT-safe profiling scheme of making counters TLS.

Yes, actually that was my first approach.  However I couldn't figure
out a way to make it work in the kernel context (need to gather coverage
for dynamically loaded kernel modules), safe (need to aggregate counters
from multiple threads/cpus atomically while the code may be running),
and efficient (avoid one or two indirections).  It would clearly be a
better approach if it could be made to work, because it avoid the problem
of bouncing counter cachelines in large multiprocessor machines such as
SGI makes.  Using the atomic increment builtins is a lesser but easier
solution.

> So I'd prefer if the option for atomic increment is more explicit, 
> something like -fprofile-atomic-increment.

Fair enough, I'll make that change.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=28441

Reply via email to