Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag
On Wed, May 6, 2015 at 2:56 AM, wrote:
> On 05/05/2015 08:27 AM, Renlin Li wrote:
>>
>> Hi all,
>>
>> For the following illustrative code,
>>
>> double f1(int x) { return (double)(float)x; } --> return (double)x;
>> int f2(double x) { return (int)(float)x; } --> return (int)x;
>>
>> Is it okay for the compiler to do the simplifications shown above
>> with -ffast-math enabled?
>
> Such a transformation would yield different results for integers that
> are exactly representable in double but not in float. For example,
> the smallest positive integer with such a property in IEEE 754,
> 16,777,217, converts to 16,777,216 in float. I'm not a math expert,
> but such a result would seem unexpected even with -ffast-math.

Yeah, such changes would not be welcome with -ffast-math.

Richard.

> Martin
>
>> Regards,
>> Renlin Li
Re: Merging debug-early work?
On Wed, May 6, 2015 at 12:33 AM, Aldy Hernandez wrote:
> Gentlemen!
>
> I believe I have done as much as is reasonable for a merge, but I'd
> like to get your opinion before I post a huge patch to the list.
>
> The branch bootstraps with one regression in GCC
> (gcc.dg/debug/dwarf2/stacked-qualified-types-3.c) and none for GDB.

On which triplets?

> The GCC regression is a missed optimization while merging the common
> denominator of a set of qualifiers for a type within a DIE. For
> example, if two types share "const volatile" (say "const volatile
> int" and "const volatile char"), dwarf2out outputs things in the most
> efficient manner so as to share the maximum number of common type
> DIEs. This is not working in the branch, as TYPE_MAIN_VARIANTs are
> not complete by the time early dwarf is run. If possible, I'd like to
> work on this one regression post-merge. Not a big deal if you
> disagree, but I'd prefer to postpone this non-crucial bit.
>
> A few caveats...
>
> Richi wants to play around with free-lang-data in the non-LTO path. I
> haven't done so, and it's left as an exercise for the reader :).

Yeah - I'd also like the early/late paths in dwarf2out.c to be
refactored into completely different functions (that is, not have a
single function creating and/or annotating DIEs early and late, but
two - with the late one only doing the annotation work and only
annotating with stuff we expect). The branch has already accumulated
quite a few checks like "if DIE was created early...", and with the
LTO prototype work I saw I'd only need to add many more of those.

> Shortly after the merge I'll work on a pass to prune unused decl
> DIEs, as we're presently creating more DIEs than mainline. This was
> expected, and if I understood Jason correctly, it is OK to work on
> this post-merge.
> However, even without such a pass, the .debug_info size difference is
> reasonable:
>
> gcc/* (except testsuite):
> Total .debug_info size for [debug-early]: 91081591
> Total .debug_info size for [mainline]:    84777565
> Total change: 7.44%
>
> libstdc++-v3/* (except testsuite):
> Total .debug_info size for [debug-early]: 5173014
> Total .debug_info size for [mainline]:    5044971
> Total change: 2.54%
>
> x86_64-unknown-linux-gnu/*:
> Total .debug_info size for [debug-early]: 5893131
> Total .debug_info size for [mainline]:    5694176
> Total change: 3.49%
>
> The above stats are from "size -A | grep debug_info...".
>
> Within gcc there were a handful of files that were significantly
> bigger (twice as big), and at least the 3-4 I investigated were all
> due to extra unused DIEs that will be handled by a DECL DIE
> optimization pass. Specifically, there are cases where external
> variables have their DIEs generated because we cannot look at
> TREE_USED within early dwarf. Stuff like this will get debug info
> (which is not terribly bad IMO):
>
> struct somestruct { int somefield; };
> extern struct somestruct *sometable;
>
> The other common scenario is the ICF pass, which marks hunks as
> undebuggable late in the compilation process (by setting
> DECL_IGNORED_P) -- actually any pass calling expand_hunk(). This
> happens for something like c-family/stub-objc.c, which has multiple
> identical stubs that get folded into one function.
>
> So... all in all, the .debug_info increase is within what was
> expected when we started this project (3-7%). Actually, I'm
> pleasantly surprised it's not 10-15%. I expect to get this down
> significantly in short time.
>
> Thoughts on moving forward? Is the stacked qualifier regression a
> show stopper? Is the .debug_info size regression acceptable?

I think both are acceptable if they are fixed in a reasonable time
frame (before stage1 ends). So I suggest going forward with merging
and sending a nice patch-set.
> And of course... I'm not going anywhere. Unfortunately, I'm not even
> going on vacation... so I'm here to fix the fallout ;-).

Good to know.

Thanks,
Richard.

> Aldy
Question
Good day sir, how do I get a tutorial for GNU make? Thanks for your
answer.

Sent from my BlackBerry wireless device from MTN
Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag
On 05/06/2015 05:11 AM, Richard Biener wrote:
> On Wed, May 6, 2015 at 2:56 AM, wrote:
>> On 05/05/2015 08:27 AM, Renlin Li wrote:
>>> Hi all,
>>>
>>> For the following illustrative code,
>>>
>>> double f1(int x) { return (double)(float)x; } --> return (double)x;
>>> int f2(double x) { return (int)(float)x; } --> return (int)x;
>>>
>>> Is it okay for the compiler to do the simplifications shown above
>>> with -ffast-math enabled?
>>
>> Such a transformation would yield different results for integers
>> that are exactly representable in double but not in float. For
>> example, the smallest positive integer with such a property in IEEE
>> 754, 16,777,217, converts to 16,777,216 in float. I'm not a math
>> expert, but such a result would seem unexpected even with
>> -ffast-math.
>
> Yeah, such changes would not be welcome with -ffast-math.

Agreed.

jeff
Re: [i386] Scalar DImode instructions on XMM registers
2015-04-25 4:32 GMT+03:00 Jan Hubicka:
> Hi,
> I am adding Vladimir and Richard into CC. I tried to solve a similar
> problem with FP math years ago by having -mfpmath=sse,i387. The idea
> was to allow use of i387 registers when SSE ones run out, and
> possibly also model the fact that Pentium4 had faster i387 additions
> than SSE additions. I also had some plans to extend this to mixed
> SSE/MMX/GPR integer arithmetic, but never got to that.
>
> This did not really fly because the regalloc was not really able to
> understand it (I made a patch to regclass to propagate the classes
> and figure out what operations need to stay in i387 and what in SSE
> to avoid reloading, but that never got in).
>
> I believe Vladimir did some work on this with IRA (he is able to
> spill GPR regs into SSE and do a bit of other tricks).
>
> Also, I believe it was kind of Richard's design decision to avoid use
> of (paradoxical) subregs for vector conversions, because these have
> funny implications.
>
> The code for handling upper parts of paradoxical subregs is
> controlled by macros around SUBREG_PROMOTED_VAR_P, but I do not think
> it will handle V1DI->V2DI conversions fluently without some
> middle-end hacking (it will probably try to produce zero extensions).
>
> While we are on SSE instructions, it would be great to finally teach
> copy_by_pieces/store_by_pieces to use vector instructions (these are
> more compact and either equally fast or faster on some CPUs). I hope
> to get into this, but it would be great if someone beat me to it.
>
> Honza

I'm trying to implement it as a separate RTL pass which chooses a
scalar/vector mode for each 64-bit computation chain and performs the
transformation if we choose to use vectors. I also want to split DI
instructions which are going to be implemented on GPRs before RA
(currently this is done at the second split).
A good metric for such a transformation is a big question, but
currently I can't even make it generate correct code when paradoxical
subregs are used. It works in simple cases, but I get into trouble
when spills appear. I am trying to beat the following testcase:

test (long long *arr)
{
  register unsigned long long tmp;
  tmp = arr[0] | arr[1] & arr[2];
  while (tmp)
    {
      counter (tmp);
      tmp = *(arr++) & tmp;
    }
}

The RTL I generate seems OK to me (ignoring the fact that it is not
optimal):

(insn 6 3 50 2 (set (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ])
        (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
                (const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D) + 8B]+0 S8 A64])) pr65105-1.c:22 89 {*movdi_internal}
     (nil))
(insn 50 6 7 2 (set (reg:DI 104)
        (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
                (const_int 16 [0x10])) [2 MEM[(long long int *)arr_5(D) + 16B]+0 S8 A64])) pr65105-1.c:22 -1
     (nil))
(insn 7 50 51 2 (set (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
        (and:V2DI (subreg:V2DI (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ]) 0)
            (subreg:V2DI (reg:DI 104) 0))) pr65105-1.c:22 3487 {*andv2di3}
     (expr_list:REG_DEAD (subreg:V2DI (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ]) 0)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (expr_list:REG_EQUAL (and:DI (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
                        (const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D) + 8B]+0 S8 A64])
                    (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
                            (const_int 16 [0x10])) [2 MEM[(long long int *)arr_5(D) + 16B]+0 S8 A64]))
                (nil)))))
(insn 51 7 8 2 (set (reg:DI 105)
        (mem:DI (reg/v/f:SI 96 [ arr ]) [2 *arr_5(D)+0 S8 A64])) pr65105-1.c:22 -1
     (nil))
(insn 8 51 46 2 (set (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)
        (ior:V2DI (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
            (subreg:V2DI (reg:DI 105) 0))) pr65105-1.c:22 3489 {*iorv2di3}
     (expr_list:REG_DEAD (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
        (expr_list:REG_UNUSED (reg:CC 17 flags)
            (nil))))
(insn 46 8 47 2 (set (reg:V2DI 103)
        (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)) pr65105-1.c:22 -1
     (nil))
(insn 47 46 48 2 (set (subreg:SI (reg:DI 101) 0)
        (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
     (nil))
(insn 48 47 49 2 (set (reg:V2DI 103)
        (lshiftrt:V2DI (reg:V2DI 103)
            (const_int 32 [0x20]))) pr65105-1.c:22 -1
     (nil))
(insn 49 48 9 2 (set (subreg:SI (reg:DI 101) 4)
        (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
     (nil))
(note 9 49 10 2 NOTE_INSN_DELETED)
(insn 10 9 11 2 (parallel [
            (set (reg:CCZ 17 flags)
                (compare:CCZ (ior:SI (subreg:SI (reg:DI 101) 4)
                        (subreg:SI (reg:DI 101) 0))
                    (const_int 0 [0])))
            (clobber (scratch:SI))
        ]) pr65105-1.c:23 447 {*iorsi_3}
     (nil))
(jump_insn 11 10 37 2 (set (pc)
        (if_then_else (ne (reg:CCZ 17 flags)
                (const_int 0 [0]))
            (label_ref:SI 37)
            (pc))) pr65105
Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag
Hi,

On Wed, 6 May 2015, Richard Biener wrote:

> >> double f1(int x) { return (double)(float)x; } --> return (double)x;
> >> int f2(double x) { return (int)(float)x; } --> return (int)x;
> >>
> >> Is it okay for the compiler to do the simplifications shown above
> >> with -ffast-math enabled?
> >
> > Such a transformation would yield different results for integers
> > that are exactly representable in double but not in float. For
> > example, the smallest positive integer with such a property in IEEE
> > 754, 16,777,217, converts to 16,777,216 in float. I'm not a math
> > expert, but such a result would seem unexpected even with
> > -ffast-math.
>
> Yeah, such changes would not be welcome with -ffast-math.

It's just a normal 1ulp round-off error, and these are quite
acceptable under fast-math. It just so happens to look large because
of the base value, and it affects rounded integers. I don't see how
_that_ can be used as a reason to reject it from fast-math (we'd have
to reject pretty much all transformations of fast-math then). Also,
the above transformations strictly _increase_ precision, so programs
relying on fantasy values before should be equally fine with more
precise fantasy values.

More useful reasons for rejection are: breaks program such-and-such
(benchmarks), or "no known meaningful performance improvements" (only
microbenchmarks, for instance).

Ciao,
Michael.
Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag
On May 6, 2015 5:56:10 PM GMT+02:00, Michael Matz wrote:
> Hi,
>
> On Wed, 6 May 2015, Richard Biener wrote:
>
> >> double f1(int x) { return (double)(float)x; } --> return
> >> (double)x;
> >> int f2(double x) { return (int)(float)x; } --> return (int)x;
> >>
> >> Is it okay for the compiler to do the simplifications shown above
> >> with -ffast-math enabled?
> >
> > Such a transformation would yield different results for integers
> > that are exactly representable in double but not in float. For
> > example, the smallest positive integer with such a property in IEEE
> > 754, 16,777,217, converts to 16,777,216 in float. I'm not a math
> > expert, but such a result would seem unexpected even with
> > -ffast-math.
>
> Yeah, such changes would not be welcome with -ffast-math.
>
> It's just a normal 1ulp round-off error and these are quite
> acceptable under fast-math.

1ulp? In the double-precision result it's more than that. It's one ulp
for the int-to-float conversion.

> It just so happens to look large because of the base value, and it
> affects rounded integers. I don't see how _that_ can be used as a
> reason to reject it from fast-math (we'd have to reject pretty much
> all transformations of fast-math then). Also the above
> transformations strictly _increase_ precision, so programs relying
> on fantasy values before should be equally fine with more precise
> fantasy values.

Yes, if we think in infinite-precision math (maybe that's a good way
to document unsafe-math opts: that they can violate IEEE by
interpreting code as written with infinite-precision math).

> More useful reasons for rejection are: breaks program such-and-such
> (benchmarks), or "no known meaningful performance improvements" (only
> microbenchmarks, for instance).

Sure.

Richard.

> Ciao,
> Michael.
ANN: gcc-python-plugin 0.14
gcc-python-plugin is a plugin for GCC 4.6 onwards which embeds the
CPython interpreter within GCC, allowing you to write new compiler
warnings in Python, generate code visualizations, etc.

It ships with "gcc-with-cpychecker", which implements static analysis
passes for GCC aimed at finding bugs in CPython extensions. In
particular, it can automatically detect reference-counting errors:
http://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

This release adds support for GCC 5.

Tarball releases are available at:
https://fedorahosted.org/releases/g/c/gcc-python-plugin/

Prebuilt documentation can be seen at:
http://gcc-python-plugin.readthedocs.org/en/latest/index.html

The project's homepage is:
https://fedorahosted.org/gcc-python-plugin/

The plugin and checker are Free Software, licensed under the GPLv3 or
later.

Enjoy!
Dave Malcolm
gcc-4.9-20150506 is now available
Snapshot gcc-4.9-20150506 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150506/
and on various mirrors; see http://gcc.gnu.org/mirrors.html for
details.

This snapshot has been generated from the GCC 4.9 SVN branch with the
following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 222864

You'll find:

  gcc-4.9-20150506.tar.bz2  Complete GCC
    MD5=cb3e6b08d4f266cf322720b42c34f674
    SHA1=4f0a4d804e83c00655b837a01f90b70d53289571

Diffs from 4.9-20150429 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption, the
LATEST-4.9 link is updated and a message is sent to the gcc list.
Please do not use a snapshot before it has been announced that way.
Re: Compiler warnings while compiling gcc with clang
Am Tue, 5 May 2015 21:37:10 -0700, Andrew Pinski:
> On Tue, May 5, 2015 at 9:00 PM, Aditya K wrote:
> >>>
> >>> gcc/rtlanal.c:5573:23: warning: array index 1 is past the end of
> >>> the array (which contains 1 element) [-Warray-bounds]
> >>> ../../gcc/rtlanal.c:5573:23: warning: array index 1 is past the
> >>> end of the array (which contains 1 element) [-Warray-bounds]
> >>>         *second = GEN_INT (CONST_DOUBLE_HIGH (value));
> >>>                   ^
> >>
> >> These warnings are bogus due to the array being the last element
> >> of the structure.
> >>
> >> Please file that with clang.
> >
> > IIRC, C++ does not allow flexible array members.
>
> But this has been a common extension for many years now (since C++
> and C have been around). So the warning is useless.

A flexible array member has no size, or with the GCC extension has
size 0. Clang also does not warn about these. The array here seems to
have size 1, and I have seen similar cases in the GCC code base with
size 2. The benefit of cleaning this up would be that you could get
proper warnings for arrays at the end of a struct which are not meant
to be flexible array members.

Martin
Re: [OR1K port] where do I change the function frame structure
On 05/05/2015 05:19 PM, Peter T. Breuer wrote:
> Please .. where (in what file, dir) of the gcc (4.9.1) source should
> I rummage in order to change the sequence of instructions eventually
> emitted to do a function call?

Are you trying to change the caller or the callee?

For the callee, or1k_compute_frame_size calculates the frame size,
which depends on the frame layout. or1k_expand_prologue emits the RTL
for the prologue. or1k_expand_epilogue emits the RTL for the epilogue.
There are also a few other closely related helper functions. These are
all in gcc/config/or1k/or1k.c.

For the caller, I see that the or1k port already sets
ACCUMULATE_OUTGOING_ARGS, so there should be no stack pointer inc/dec
around a call; only in the prologue/epilogue.

Jim