Re: Adding Profiling support - GCC 4.1.1
Jim Wilson wrote:
> Rohit Arul Raj wrote:
>> 1. The function mcount: While building with native gcc, the mcount
>> function is defined in glibc. Is the same mcount function available
>> in newlib? Or is it that we have to define it in our back end, as
>> SPARC does (gmon-sol2.c)?
>
> Did you try looking at newlib? Try something like this:
>
>   find . -type f | xargs grep mcount
>
> That will show you all of the mcount support in newlib/libgloss.
>
> sparc-solaris is a special case. Early versions of Solaris shipped
> without the necessary support files. (Maybe it still does? I don't
> know, and don't care to check.) I think that they were part of the
> add-on extra-cost compiler. This meant that people using gcc only
> were not able to use profiling unless gcc provided the mcount
> library. Otherwise it never would have been put there. mcount belongs
> in the C library.
>
>> 2. Is it possible to reuse the existing mcount definition or is it
>> customized for every backend?
>
> It must be customized for every backend.
>
>> 3. Any other existing back ends that support profiling?
>
> Pretty much all targets do, at least ones for operating systems. It
> is much harder to make mcount work for an embedded target with no
> file system. If you want to learn how mcount works, just pick any
> existing target with mcount support, and study it.

You might take a look at the profiling support in the GNU tool chain
for the XScale that Intel distributes. There was some support to use
GDB to read the required information out of the embedded target even if
it didn't have a file system.

-Will
Re: how to tweak x86 code generation to instrument certain opcodes with CC trap?
On 10/23/2015 01:37 AM, Yasser Shalabi wrote:
> Hello,
>
> I am new to the GCC code. I want to make a simple modification to the
> back end. I want to add a debug exception (int3) to be generated
> before any instance of certain x86 instructions.
>
> I tried to modify gcc/config/i386/i386.md by adding a "int3" to the
> define_insn for instructions of interest. But that just caused
> configure to fail (cannot run generated C programs).
>
> Any pointers on how to approach this? Also, suggestions for
> alternative approaches are also welcome.
>
> Thanks!

Hi,

Do you need the int3 specifically before those instructions? Or are you
just looking to instrument the code and collect some information before
those instructions are executed? Some alternative instrumentation tools
you might look at to instrument existing code are:

dyninst: http://www.dyninst.org/
Valgrind: http://valgrind.org/
Intel's Pin tool: https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool

-Will
Re: how to tweak x86 code generation to instrument certain opcodes with CC trap?
On 10/23/2015 11:37 AM, Yasser Shalabi wrote:
> Hey Will,
>
> Thanks for the quick reply. Yeah, I need the int3 instruction to be
> statically included in the binary so I can't use any dynamic
> instrumentation tool.

Dyninst can do binary rewrites of executables, so it might still be
suitable:

http://www.dyninst.org/sites/default/files/downloads/w2009/legendre-binrewriter.pdf

-Will

> On Fri, Oct 23, 2015 at 10:32 AM, William Cohen wrote:
>> On 10/23/2015 01:37 AM, Yasser Shalabi wrote:
>>> Hello,
>>>
>>> I am new to the GCC code. I want to make a simple modification to the
>>> back end. I want to add a debug exception (int3) to be generated
>>> before any instance of certain x86 instructions.
>>>
>>> I tried to modify gcc/config/i386/i386.md by adding a "int3" to the
>>> define_insn for instructions of interest. But that just caused
>>> configure to fail (cannot run generated C programs).
>>>
>>> Any pointers on how to approach this? Also, suggestions for
>>> alternative approaches are also welcome.
>>>
>>> Thanks!
>>
>> Hi,
>>
>> Do you need the int3 specifically before those instructions? Or are you
>> just looking to instrument the code and collect some information before
>> those instructions are executed? Some alternative instrumentation tools
>> you might look at to instrument existing code are:
>>
>> dyninst: http://www.dyninst.org/
>> Valgrind: http://valgrind.org/
>> Intel's Pin tool:
>> https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool
>>
>> -Will
Re: eliminate dead stores across functions
On 03/06/2018 09:28 AM, Richard Biener wrote:
> On Tue, Mar 6, 2018 at 1:00 PM, Prathamesh Kulkarni wrote:
>> Hi,
>> For the following test-case,
>>
>> int a;
>>
>> __attribute__((noinline))
>> static void foo()
>> {
>>   a = 3;
>> }
>>
>> int main()
>> {
>>   a = 4;
>>   foo ();
>>   return a;
>> }
>>
>> I assume it's safe to remove "a = 4" since 'a' would be overwritten
>> by the call to foo?
>> IIUC, the ipa-reference pass does mod/ref analysis to compute the
>> side-effects of a function call, so could we perhaps use
>> ipa_reference_get_not_written_global() in the dse pass to check
>> whether a global variable will be killed on a call to a function?
>> If not, I suppose we could write a similar ipa pass that computes
>> the set of killed global variables per function, but I am not sure
>> if that's the correct approach.
>
> Do you think the situation happens often enough to make this worthwhile?
>
> ipa-reference doesn't compute must-def, only may-def and may-use IIRC.
>
> Richard.
>
>> Thanks,
>> Prathamesh

This dead-write optimization sounds similar to "DeadSpy: a tool to
pinpoint program inefficiencies" by Milind Chabbi and John
Mellor-Crummey of Rice University:

https://dl.acm.org/citation.cfm?id=2259033

The abstract says there were numerous dead writes in the SPEC 2006 gcc
benchmark, and eliminating those provided an average 15% improvement in
performance.

-Will
Re: for getting profiling times in millisecond resolution.
jayaraj wrote:
> Hi,
> I want to get the profiling data of an application in Linux. Now I am
> using the -pg option of gcc for generating the profile data, then
> using gprof for generating profiles. Here I am getting times only in
> seconds, but I want millisecond resolution. Can anybody help me?
> Thanks & regards
> Jayaraj

The sampling with -pg profiling is fairly low resolution, 100 samples a
second on Linux, which works out to about 10 milliseconds per sample.
The only way you are going to get millisecond estimates for a function
is if there are multiple calls to it: the accumulated time is divided
equally between the counted function calls. You might make more runs
over the same section of code to accumulate more samples and function
calls and get a better estimate of the time.

If you are just looking for flat profiles with higher resolution, you
might look at OProfile. The sampling intervals can be much smaller.
However, you need to be careful on some processors because the time for
a clock cycle can be changed by power management.

If you know what sections of code you are interested in, you might use
the timestamp register to read timing information and compute the clock
cycles (time) spent in certain regions of code. Alternatively, you
might use perfmon or perfctr to access the performance counters
(assuming that the kernel has the appropriate patches for them).

-Will