Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 24, 2005, at 3:16 PM, Andrew Pinski wrote:

> I wonder why combine can do the simplification though which is why
> still produce good code for the simple testcase:
>
> void f1(double *d,float *f2)
> {
>   *f2 = 0.0;
>   *d = 0.0;
> }

It is hard to reproduce the simple test case, exhibiting the same
problem (-O1 producing better code than -O2). Yes, small test cases
move the desired simplification to other phases.

- fariborz

> Thanks,
> Andrew Pinski
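For readers who want to try the small test case themselves, here is a minimal, compilable sketch of it. The function f1 is taken from the message above; the helper f1_clears_both is hypothetical and is there only to check that the two stores happen. As the thread notes, this small case is not expected to reproduce the -O1 versus -O2 difference seen in 252.eon.

```c
#include <assert.h>

/* Test case from the thread: store zero through a float pointer
   and then through a double pointer. */
void f1(double *d, float *f2)
{
    *f2 = 0.0;
    *d = 0.0;
}

/* Hypothetical helper, not part of the thread: calls f1 on nonzero
   inputs and returns 1 if both objects were cleared, 0 otherwise. */
int f1_clears_both(void)
{
    double d = 1.5;
    float f = 2.5f;

    f1(&d, &f);
    return d == 0.0 && f == 0.0f;
}
```

One way to compare the generated code is to compile the file twice, once with `gcc -O1 -S` and once with `gcc -O2 -S`, and diff the two assembly files. Adding `-fdump-tree-optimized` writes the .optimized dump that comes up later in the thread.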
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 24, 2005, at 5:06 PM, Steven Bosscher wrote:

> On Saturday 25 June 2005 01:48, fjahanian wrote:
>> On Jun 24, 2005, at 3:16 PM, Andrew Pinski wrote:
>>> I wonder why combine can do the simplification though which is why
>>> still produce good code for the simple testcase:
>>>
>>> void f1(double *d,float *f2)
>>> {
>>>   *f2 = 0.0;
>>>   *d = 0.0;
>>> }
>>
>> It is hard to reproduce the simple test case, exhibiting the same
>> problem (-O1 producing better code than -O2). Yes, small test cases
>> move the desired simplification to other phases.
>
> It often helps if you know what function your poorer code is in. You
> could e.g. try to make the .optimized dump of that function compilable
> and see if the problem shows up there again. Then work your way down
> to something small.

Yes, I am planning to do this. My first question was though if the RTL
generated by -O2, which does not get simplified, is correct and should
be optimized in one of the rtl optimizers. If not, then focus shifts to
tree optimizers.

- Thanks, fariborz

> Gr.
> Steven
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 24, 2005, at 5:20 PM, fjahanian wrote:

> On Jun 24, 2005, at 5:06 PM, Steven Bosscher wrote:
>> On Saturday 25 June 2005 01:48, fjahanian wrote:
>>> On Jun 24, 2005, at 3:16 PM, Andrew Pinski wrote:
>>>> I wonder why combine can do the simplification though which is why
>>>> still produce good code for the simple testcase:
>>>>
>>>> void f1(double *d,float *f2)
>>>> {
>>>>   *f2 = 0.0;
>>>>   *d = 0.0;
>>>> }
>>>
>>> It is hard to reproduce the simple test case, exhibiting the same
>>> problem (-O1 producing better code than -O2). Yes, small test cases
>>> move the desired simplification to other phases.
>>
>> It often helps if you know what function your poorer code is in. You
>> could e.g. try to make the .optimized dump of that function compilable
>> and see if the problem shows up there again. Then work your way down
>> to something small.
>
> Yes, I am planning to do this. My first question was though if the RTL
> generated by -O2, which does not get simplified, is correct and should
> be optimized in one of the rtl optimizers. If not, then focus shifts to
> tree optimizers.

This email went through late and was superseded by earlier exchanges.
It turned out to be all RTL-related issues.

- fariborz

> - Thanks, fariborz
>
>> Gr.
>> Steven
Re: [RFH] - Less than optimal code compiling 252.eon -O2 for x86
On Jun 27, 2005, at 2:50 PM, Fariborz Jahanian wrote:

> On Jun 27, 2005, at 12:56 PM, Richard Henderson wrote:
>> Hmm. I would suspect this is obsolete now. We'll have forced
>> everything into "registers" (or something equivalent that we can work
>> with) during tree optimization. Any CSEs that can be made should have
>> been made.
>
> I will do sanity check followed by SPEC runs (x86 and ppc darwin) and
> see if behavior changes by obsoleting -fforce-mem in -O2 (or higher).

Bootstrapped and dejagnu tested on apple-x86-darwin and
apple-ppc-darwin. We also observed that on ppc, SPEC did not show any
performance change either way. On apple-x86-darwin 252.eon improved by
7% as expected, with no noticeable change in other benchmarks. One
caveat to all these is that this may expose optimization bugs which
were previously hidden by inclusion of -fforce-mem.

OK for check-in?

- fariborz

ChangeLog:

2005-06-30  Fariborz Jahanian <[EMAIL PROTECTED]>

        * opts.c (decode_options): Don't set -fforce-mem with -O2 and more.

Index: opts.c
===
RCS file: /cvs/gcc/gcc/gcc/opts.c,v
retrieving revision 1.114
diff -c -p -r1.114 opts.c
*** opts.c      24 Jun 2005 03:09:45 -      1.114
--- opts.c      30 Jun 2005 15:55:15 -
*** decode_options (unsigned int argc, const
*** 559,565
        flag_rerun_cse_after_loop = 1;
        flag_rerun_loop_opt = 1;
        flag_caller_saves = 1;
-       flag_force_mem = 1;
        flag_peephole2 = 1;
  #ifdef INSN_SCHEDULING
        flag_schedule_insns = 1;
--- 559,564

> - Thanks, fariborz
>
>> r~
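To make the effect of the patch concrete, the following is a rough, self-contained model of the change, not the actual GCC source. The struct opt_flags and the function model_decode_options are hypothetical; only the flag names come from the diff. The `optimize >= 2` guard is an assumption based on the ChangeLog wording "with -O2 and more".

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for the handful of global flags visible in
   the diff context. GCC itself keeps these as separate globals. */
struct opt_flags {
    int rerun_cse_after_loop;
    int rerun_loop_opt;
    int caller_saves;
    int force_mem;
    int peephole2;
};

/* Hypothetical model of the -O2 block in decode_options.
   with_patch == 0 models the code before the patch, with_patch == 1
   the code after it. The optimize >= 2 test is an assumption. */
struct opt_flags model_decode_options(int optimize, int with_patch)
{
    struct opt_flags f;

    memset(&f, 0, sizeof f);
    if (optimize >= 2) {
        f.rerun_cse_after_loop = 1;
        f.rerun_loop_opt = 1;
        f.caller_saves = 1;
        if (!with_patch)
            f.force_mem = 1;    /* the line the patch removes */
        f.peephole2 = 1;
    }
    return f;
}
```

In this model, `model_decode_options(2, 0).force_mem` is 1 and `model_decode_options(2, 1).force_mem` is 0, while the other flags set at that level are the same in both cases. This matches the one-line deletion shown in the diff.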