On 01/18/11 22:12, Roman Divacky wrote:
On Tue, Jan 18, 2011 at 09:35:17AM -0800, Steve Kargl wrote:
On Tue, Jan 18, 2011 at 06:16:57PM +0100, Roman Divacky wrote:
On Tue, Jan 18, 2011 at 04:43:13PM +0200, Kostik Belousov wrote:
On Tue, Jan 18, 2011 at 03:32:05PM +0100, Roman Divacky wrote:
On Mon, Jan 17, 2011 at 10:44:11AM -0800, Steve Kargl wrote:
How does one build an executable for profiling with clang?

LLVM (and thus clang) does not support GPROF profiling.

clang -o testf -O2 -march=native -pipe -static -pg -I/usr/local/include -I../mp testf.c -L/usr/local/lib -L../mp -lsgk -lmpfr -lgmp -L/usr/home/kargl/work/lib -lm_clang_p
clang: warning: the clang compiler does not support '-pg'


If you are really desperate to find the hotspots in your program when compiled with clang, you could call clang with -v to find the call to /bin/ld. Then call ld directly, appending _p to the appropriate libs if still needed and replacing crt1.o with gcrt1.o. E.g.

"/usr/bin/ld" -Bstatic -o testcoll /usr/lib/gcrt1.o /usr/lib/crti.o /usr/lib/crtbegin.o testcoll.o angle.o apsis.o error.o minmax.o qags.o qext.o qk21.o sort.o timint.o zero.o vmol.o -lm_p -lgcc -lgcc_eh -lc_p -lgcc -lgcc_eh -t /usr/lib/crtend.o /usr/lib/crtn.o

You will get a profile without the number of calls for the objects compiled with clang, but with the time spent. In my case:

granularity: each sample hit covers 4 byte(s) for 0.00% of 6.41 seconds

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 30.3       1.94     1.94        0  100.00%           f_timint [2]
 20.2       3.24     1.29        0  100.00%           _mcount [3]
 19.4       4.48     1.24 21900000     0.00     0.00  exp [4]
 13.2       5.32     0.85        0   40.51%           vmol [1]
  7.3       5.79     0.47        0  100.00%           f_angle [5]
  2.8       5.98     0.18  1000000     0.00     0.00  pow [7]
  2.7       6.15     0.17        0   48.70%           qk21 [6]
  2.4       6.30     0.15        0  100.00%           .mcount (51)
  0.5       6.33     0.03        0  100.00%           zero [8]
  0.4       6.35     0.02        0  100.00%           qext [9]
  0.4       6.38     0.02        0  100.00%           qags [10]
...

I suppose it will be pointless to ask, but shouldn't clang
support one of the most basic gcc compiler options if clang
is to replace gcc as the base system compiler?

Is GPROF really needed at this point? We have HWPMC; isn't
it sufficient?
Hwpmc requires additional work for each new CPU model. Also,
hwpmc is not supported even on all Intel or AMD CPUs, esp. older
models, and e.g. VIA cores.

Not to mention !x86 architectures.

Yes, I agree. HWPMC is not a 100% solution.

For those interested in profiling in LLVM in detail:

         http://llvm.org/pubs/2010-04-NeustifterProfiling.html

summary: LLVM supports inserting profiling probes (but the selection
          of places where to put them is very naive) but there's no
          "GPROF writer".

I mailed the author of the thesis yesterday and it looks like his work may
get committed to upstream LLVM.


Thanks for the url and checking on the status of profiling with llvm.

I checked the LLVM code instead and here's what I found:

LLVM actually supports profiling, in its own format (llvmprof.out). This can
only be used for its PGO optimization (BasicBlockPlacement) and is very naive.

Theoretically it should be possible to write an "llvmprof.out -> a.out.gmon"
converter, but I have no idea how feasible that is. I guess it would not be very easy.

I believe it would be fairly easy to write a "gprof-like dumper" for
the llvmprof.out files (if there is not one already) that would print
stuff like "foo called X times, bar called Y times". I don't know about
the actual measuring of time. I think it's not in the llvmprof.out.
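
To make the "gprof-like dumper" idea concrete, here is a minimal sketch in C. It does not read llvmprof.out (that file format is not described in this thread); it assumes the per-function counts have already been loaded into memory. The names struct func_count and dump_call_counts are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical in-memory record: one entry per profiled function. */
struct func_count {
    const char *name;      /* function name */
    unsigned long calls;   /* number of recorded calls */
};

/* Order by call count, highest first; break ties by name. */
static int by_calls_desc(const void *a, const void *b)
{
    const struct func_count *fa = a;
    const struct func_count *fb = b;

    if (fa->calls != fb->calls)
        return fa->calls < fb->calls ? 1 : -1;
    return strcmp(fa->name, fb->name);
}

/*
 * Sort the records in place and write one "NAME called N times" line
 * per record into out. Returns the number of bytes written (not
 * counting the terminating NUL), or -1 if out is too small.
 */
static int dump_call_counts(struct func_count *counts, size_t n,
                            char *out, size_t outlen)
{
    size_t used = 0;

    assert(counts != NULL && out != NULL && outlen > 0);
    out[0] = '\0';
    qsort(counts, n, sizeof counts[0], by_calls_desc);

    for (size_t i = 0; i < n; i++) {
        int w = snprintf(out + used, outlen - used,
                         "%s called %lu times\n",
                         counts[i].name, counts[i].calls);
        if (w < 0 || (size_t)w >= outlen - used)
            return -1;     /* output was truncated */
        used += (size_t)w;
    }
    return (int)used;
}
```

Calling dump_call_counts from main with a small array and printing the buffer gives a flat list of call counts. As noted above, there is no timing information in this kind of output.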


I have not yet completely read the reference provided, but my impression is that it describes considerably more sophistication than needed to get gprof running with clang (though the thesis looks very interesting!). All gprof needs is statistical profiling as provided by the kernel through profil(2) and addition by the compiler of a call to .mcount (and possibly allocation of a small amount of storage) on entry of each function. gcc (and pcc before it) has done this for more than 20 years, although I must admit that the code generated for the amd64 using -pg is a bit opaque to me (i386 is straightforward, though). The rest of the machinery needed is already there (in lib/libc/gmon and e.g. lib/csu/amd64/crt1.c).
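
As a rough illustration of the "call on entry of each function" part, the sketch below does by hand what a -pg style compiler would do automatically. It is a simplification under stated assumptions: the real .mcount records caller/callee arcs by address, whereas this hypothetical sketch_mcount only counts entries per function name, and the PROFILE_ENTRY macro stands in for the call the compiler would emit. None of these names come from libc or from clang.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_FUNCS 16   /* small fixed table, enough for a sketch */

/* Hypothetical per-function counter; real gmon data is keyed by PC. */
struct call_record {
    const char *name;
    unsigned long calls;
};

static struct call_record records[MAX_FUNCS];
static size_t nrecords;

/* Stand-in for the profiling hook: note one entry into `name`. */
static void sketch_mcount(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < nrecords; i++) {
        if (strcmp(records[i].name, name) == 0) {
            records[i].calls++;
            return;
        }
    }
    if (nrecords < MAX_FUNCS) {       /* silently drop if table is full */
        records[nrecords].name = name;
        records[nrecords].calls = 1;
        nrecords++;
    }
}

/* Return the recorded call count for `name`, or 0 if never entered. */
static unsigned long sketch_calls(const char *name)
{
    for (size_t i = 0; i < nrecords; i++)
        if (strcmp(records[i].name, name) == 0)
            return records[i].calls;
    return 0;
}

/* Placed by hand here; with -pg the compiler inserts the call itself. */
#define PROFILE_ENTRY() sketch_mcount(__func__)

static double square(double x)
{
    PROFILE_ENTRY();
    return x * x;
}

static double sum_of_squares(int n)
{
    PROFILE_ENTRY();
    double s = 0.0;
    for (int i = 1; i <= n; i++)
        s += square((double)i);
    return s;
}
```

After calling sum_of_squares(3) from main, sketch_calls("square") reports three entries and sketch_calls("sum_of_squares") reports one. The time column in a gprof report comes from the separate statistical sampling via profil(2), which this sketch does not attempt to model.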

Kind regards,

Hans Ottevanger
_______________________________________________
[email protected] mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-toolchain
