Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag

2015-05-06 Thread Richard Biener
On Wed, May 6, 2015 at 2:56 AM,   wrote:
> On 05/05/2015 08:27 AM, Renlin Li wrote:
>>
>> Hi all,
>>
>> For the following illustrative code,
>>
>> double f1(int x) { return (double)(float)x; } --> return (double)x;
>> int f2(double x) { return (int)(float)x; } --> return (int)x;
>>
>> Is it Okay for the compiler to do the simplifications shown above with
>> fast-math enabled?
>
>
> Such a transformation would yield different results
> for integers that are exactly representable in double
> but not in float. For example, the smallest positive
> integer with such a property in IEEE 754, 16,777,217,
> converts to 16,777,216 in float. I'm not a math expert
> but such a result would seem unexpected even with
> -ffast-math.

Yeah, such changes would not be welcome with -ffast-math.

Richard.

> Martin
>
>>
>> Regards,
>> Renlin Li
>>
>


Re: Merging debug-early work?

2015-05-06 Thread Richard Biener
On Wed, May 6, 2015 at 12:33 AM, Aldy Hernandez  wrote:
> Gentlemen!
>
> I believe I have done as much as is reasonable for a merge, but I'd like to
> get your opinion before I post a huge patch to the list.
>
> The branch bootstraps with one regression in GCC
> (gcc.dg/debug/dwarf2/stacked-qualified-types-3.c) and none for GDB.

On which triplets?

> The GCC regression is a missed optimization while merging the common
> denominator of a set of qualifiers for a type within a DIE.  For example, if
> two types share "const volatile" (say "const volatile int" and "const
> volatile char"), dwarf2out outputs things in the most efficient manner so as
> to share the maximum number of common type DIEs.  This is not working on the
> branch because TYPE_MAIN_VARIANTs are not complete by the time early dwarf is
> run.  If possible, I'd like to work on this one regression post-merge.  Not a
> big deal if you disagree, but I'd prefer to postpone this non-crucial bit.
>
> A few caveats...
>
> Richi wants to play around with free-lang-data in the non-LTO path.  I
> haven't done so, and it's left as an exercise to the reader :).

Yeah - I'd also like the early/late paths in dwarf2out.c to be refactored
into completely different functions (that is, not have a single function
creating and/or annotating DIEs both early and late, but two - with the late
one only doing the annotation work and only annotating with stuff
we expect).  The branch has already accumulated quite some checks
like "if DIE was created early...", and with the LTO prototype work I saw
I'd only need to add many more of those.

> Shortly after the merge I'll work on a pass to prune unused decl DIEs as
> we're presently creating more DIEs than mainline.  This was expected, and if
> I understood Jason correctly, it is ok to work on this post-merge.  However,
> even without such a pass, the .debug_info size difference is reasonable:
>
> gcc/* (except testsuite):
> Total .debug_info size for [debug-early]: 91081591.00
> Total .debug_info size for [mainline]: 84777565.00
> Total change: 7.44%
>
> libstdc++-v3/* (except testsuite):
> Total .debug_info size for [debug-early]: 5173014.00
> Total .debug_info size for [mainline]: 5044971.00
> Total change: 2.54%
>
> x86_64-unknown-linux-gnu/*
> Total .debug_info size for [debug-early]: 5893131.00
> Total .debug_info size for [mainline]: 5694176.00
> Total change: 3.49%
>
> The above stats are for "size -A  | grep debug_info...".
>
> Within gcc there were a handful of files that were significantly bigger
> (twice as much), and at least the 3-4 I investigated were all due to extra
> unused DIEs that will be handled by a DECL DIE optimization pass.
> Specifically, there are cases where external variables have their DIEs
> generated, because we cannot look at TREE_USED within early dwarf.  Stuff
> like this will get debug info (which is not terribly bad, IMO):
>
> struct somestruct { int somefield; };
> extern  struct somestruct *sometable;
>
> The other common scenario is the ICF pass, which marks hunks as
> undebuggable late in the compilation process (by setting DECL_IGNORED_P)--
> actually any pass calling expand_hunk().  This happens for something like
> c-family/stub-objc.c, which has multiple identical stubs that get folded
> into one function.
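[The ICF scenario can be reproduced with a hypothetical pair of identical stubs; the function names below are invented, but the real stubs in stub-objc.c follow the same pattern:]

```c
#include <stddef.h>

/* Two functions with identical bodies, in the spirit of
   c-family/stub-objc.c.  With -O2, GCC's ICF pass can fold them into
   one hunk; the folded copy is then marked DECL_IGNORED_P late in
   compilation, after its DIE was already created early. */
void *objc_stub_one(void) { return NULL; }
void *objc_stub_two(void) { return NULL; }
```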
>
> So... all in all, the .debug_info increase is within what was expected when
> we started this project (3-7%).  Actually, I'm pleasantly surprised it's not
> 10-15%.  I expect to get this down significantly in a short time.
>
> Thoughts on moving forward?  Is the stacked qualifier regression a show
> stopper?  Is the .debug_info size regression acceptable?

I think both are acceptable if they are fixed in a reasonable time frame
(before stage1 ends).  So I suggest going forward with the merge and sending
a nice patch set.

> And of course... I'm not going anywhere.  Unfortunately, I'm not even going
> on vacation... so I'm here to fix the fallout ;-).

Good to know.

Thanks,
Richard.

> Aldy


Question

2015-05-06 Thread stanleyomachieful
Good day sir, how do I get a tutorial for GNU make? Thanks for your answer.
Sent from my BlackBerry wireless device from MTN



Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag

2015-05-06 Thread Jeff Law

On 05/06/2015 05:11 AM, Richard Biener wrote:

On Wed, May 6, 2015 at 2:56 AM,   wrote:

On 05/05/2015 08:27 AM, Renlin Li wrote:


Hi all,

For the following illustrative code,

double f1(int x) { return (double)(float)x; } --> return (double)x;
int f2(double x) { return (int)(float)x; } --> return (int)x;

Is it Okay for the compiler to do the simplifications shown above with
fast-math enabled?



Such a transformation would yield different results
for integers that are exactly representable in double
but not in float. For example, the smallest positive
integer with such a property in IEEE 754, 16,777,217,
converts to 16,777,216 in float. I'm not a math expert
but such a result would seem unexpected even with
-ffast-math.


Yeah, such changes would not be welcome with -ffast-math.

Agreed.

jeff



Re: [i386] Scalar DImode instructions on XMM registers

2015-05-06 Thread Ilya Enkovich
2015-04-25 4:32 GMT+03:00 Jan Hubicka :
> Hi,
> I am adding Vladimir and Richard into CC.  I tried to solve a similar
> problem with FP math years ago by having -mfpmath=sse,i387.  The idea was to
> allow use of i387 registers when the SSE ones run out, and possibly also
> model the fact that Pentium4 had faster i387 additions than SSE additions.
> I also had some plans to extend this to mixed SSE/MMX/GPR integer
> arithmetic, but never got to that.
>
> This did not really fly because the regalloc was not really able to
> understand it (I made a patch to regclass to propagate the classes and
> figure out which operations need to stay in i387 and which in SSE to avoid
> reloading, but that never got in).
>
> I believe Vladimir did some work on this with IRA (he is able to spill GPR
> regs into SSE and do a bit of other tricks).
>
> Also I believe it was kind of Richard's design decision to avoid use of
> (paradoxical) subregs for vector conversions, because these have funny
> implications.
>
> The code for handling upper parts of paradoxical subregs is controlled by
> macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
> V1DI->V2DI conversions fluently without some middle-end hacking. (it will
> probably try to produce zero extensions)
>
> When we are on SSE instructions, it would be great to finally teach
> copy_by_pieces/store_by_pieces to use vector instructions (these are more
> compact and either equally fast or faster on some CPUs).  I hope to get into
> this, but it would be great if someone beat me to it.
>
> Honza
>

I'm trying to implement it as a separate RTL pass which chooses a
scalar/vector mode for each 64-bit computation chain and performs the
transformation if we choose to use vectors.  I also want to split DI
instructions which are going to be implemented on GPRs before RA
(currently this is done at the second split).  A good metric for such a
transformation is a big question, but currently I can't even make it
generate correct code when paradoxical subregs are used.  It works in
simple cases but I run into trouble when spills appear.

Trying to beat the following testcase:

void
test (long long *arr)
{
  register unsigned long long tmp;
  tmp = arr[0] | arr[1] & arr[2];
  while (tmp)
    {
      counter (tmp);
      tmp = *(arr++) & tmp;
    }
}

RTL I generate seems OK to me (ignoring the fact that it is not optimal):

(insn 6 3 50 2 (set (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ])
(mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
(const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D)
+ 8B]+0 S8 A64])) pr65105-1.c:22 89 {*movdi_internal}
 (nil))
(insn 50 6 7 2 (set (reg:DI 104)
(mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
(const_int 16 [0x10])) [2 MEM[(long long int
*)arr_5(D) + 16B]+0 S8 A64])) pr65105-1.c:22 -1
 (nil))
(insn 7 50 51 2 (set (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
(and:V2DI (subreg:V2DI (reg:DI 98 [ MEM[(long long int
*)arr_5(D) + 8B] ]) 0)
(subreg:V2DI (reg:DI 104) 0))) pr65105-1.c:22 3487 {*andv2di3}
 (expr_list:REG_DEAD (subreg:V2DI (reg:DI 98 [ MEM[(long long int
*)arr_5(D) + 8B] ]) 0)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_EQUAL (and:DI (mem:DI (plus:SI (reg/v/f:SI
96 [ arr ])
(const_int 8 [0x8])) [2 MEM[(long long int
*)arr_5(D) + 8B]+0 S8 A64])
(mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
(const_int 16 [0x10])) [2 MEM[(long long
int *)arr_5(D) + 16B]+0 S8 A64]))
(nil)
(insn 51 7 8 2 (set (reg:DI 105)
(mem:DI (reg/v/f:SI 96 [ arr ]) [2 *arr_5(D)+0 S8 A64]))
pr65105-1.c:22 -1
 (nil))
(insn 8 51 46 2 (set (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)
(ior:V2DI (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
(subreg:V2DI (reg:DI 105) 0))) pr65105-1.c:22 3489 {*iorv2di3}
 (expr_list:REG_DEAD (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil
(insn 46 8 47 2 (set (reg:V2DI 103)
(subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)) pr65105-1.c:22 -1
 (nil))
(insn 47 46 48 2 (set (subreg:SI (reg:DI 101) 0)
(subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
 (nil))
(insn 48 47 49 2 (set (reg:V2DI 103)
(lshiftrt:V2DI (reg:V2DI 103)
(const_int 32 [0x20]))) pr65105-1.c:22 -1
 (nil))
(insn 49 48 9 2 (set (subreg:SI (reg:DI 101) 4)
(subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
 (nil))
(note 9 49 10 2 NOTE_INSN_DELETED)
(insn 10 9 11 2 (parallel [
(set (reg:CCZ 17 flags)
(compare:CCZ (ior:SI (subreg:SI (reg:DI 101) 4)
(subreg:SI (reg:DI 101) 0))
(const_int 0 [0])))
(clobber (scratch:SI))
]) pr65105-1.c:23 447 {*iorsi_3}
 (nil))
(jump_insn 11 10 37 2 (set (pc)
(if_then_else (ne (reg:CCZ 17 flags)
(const_int 0 [0]))
(label_ref:SI 37)
(pc))) pr65105

Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag

2015-05-06 Thread Michael Matz
Hi,

On Wed, 6 May 2015, Richard Biener wrote:

> >> double f1(int x) { return (double)(float)x; } --> return (double)x;
> >> int f2(double x) { return (int)(float)x; } --> return (int)x;
> >>
> >> Is it Okay for the compiler to do the simplifications shown above with
> >> fast-math enabled?
> >
> >
> > Such a transformation would yield different results
> > for integers that are exactly representable in double
> > but not in float. For example, the smallest positive
> > integer with such a property in IEEE 754, 16,777,217,
> > converts to 16,777,216 in float. I'm not a math expert
> > but such a result would seem unexpected even with
> > -ffast-math.
> 
> Yeah, such changes would not be welcome with -ffast-math.

It's just a normal 1 ulp round-off error and these are quite acceptable 
under fast-math.  It just so happens to look large because of the base 
value, and it affects rounded integers.  I don't see how _that_ can be 
used as a reason to reject it from fast-math (we'd have to reject pretty 
much all transformations of fast-math then).  Also, the above 
transformations strictly _increase_ precision, so programs relying 
on fantasy values before should be equally fine with more precise 
fantasy values.

More useful reasons for rejection are: breaks program such-and-such 
(benchmarks), or "no known meaningful performance improvement" (only 
microbenchmarks, for instance).


Ciao,
Michael.


Re: Is it Okay for GCC to do the following simplifications with "-ffast-math" flag

2015-05-06 Thread Richard Biener
On May 6, 2015 5:56:10 PM GMT+02:00, Michael Matz  wrote:
>Hi,
>
>On Wed, 6 May 2015, Richard Biener wrote:
>
>> >> double f1(int x) { return (double)(float)x; } --> return (double)x;
>> >> int f2(double x) { return (int)(float)x; } --> return (int)x;
>> >>
>> >> Is it Okay for the compiler to do the simplifications shown above with
>> >> fast-math enabled?
>> >
>> >
>> > Such a transformation would yield different results
>> > for integers that are exactly representable in double
>> > but not in float. For example, the smallest positive
>> > integer with such a property in IEEE 754, 16,777,217,
>> > converts to 16,777,216 in float. I'm not a math expert
>> > but such a result would seem unexpected even with
>> > -ffast-math.
>> 
>> Yeah, such changes would not be welcome with -ffast-math.
>
>It's just a normal 1 ulp round-off error and these are quite acceptable 
>under fast-math.  

1 ulp?  In the double-precision result it's more than that.  It's one ulp
for the int-to-float conversion.

>It just so happens to look large because of the base 
>value, and it affects rounded integers.  I don't see how _that_ can be 
>used as a reason to reject it from fast-math (we'd have to reject pretty 
>much all transformations of fast-math then).  Also, the above 
>transformations strictly _increase_ precision, so programs relying 
>on fantasy values before should be equally fine with more precise 
>fantasy values.

Yes, if we think in infinite-precision math (maybe that's a good way to 
document unsafe-math opts: that they can violate IEEE by interpreting code 
as written with infinite-precision math).

>More useful reasons for rejection are: breaks program such-and-such 
>(benchmarks), or "no known meaningful performance improvement" (only 
>microbenchmarks, for instance).

Sure.

Richard.

>
>Ciao,
>Michael.




ANN: gcc-python-plugin 0.14

2015-05-06 Thread David Malcolm
gcc-python-plugin is a plugin for GCC 4.6 onwards which embeds the
CPython interpreter within GCC, allowing you to write new compiler
warnings in Python, generate code visualizations, etc.

It ships with "gcc-with-cpychecker", which implements static analysis
passes for GCC aimed at finding bugs in CPython extensions.  In
particular, it can automatically detect reference-counting errors:
 http://gcc-python-plugin.readthedocs.org/en/latest/cpychecker.html

This release adds support for GCC 5.

Tarball releases are available at:
  https://fedorahosted.org/releases/g/c/gcc-python-plugin/

Prebuilt-documentation can be seen at:
  http://gcc-python-plugin.readthedocs.org/en/latest/index.html

The project's homepage is:
  https://fedorahosted.org/gcc-python-plugin/

The plugin and checker are Free Software, licensed under the GPLv3 or
later.

Enjoy!
Dave Malcolm




gcc-4.9-20150506 is now available

2015-05-06 Thread gccadmin
Snapshot gcc-4.9-20150506 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150506/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 222864

You'll find:

 gcc-4.9-20150506.tar.bz2 Complete GCC

  MD5=cb3e6b08d4f266cf322720b42c34f674
  SHA1=4f0a4d804e83c00655b837a01f90b70d53289571

Diffs from 4.9-20150429 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Compiler warnings while compiling gcc with clang

2015-05-06 Thread Martin Uecker
On Tue, 5 May 2015 21:37:10 -0700, Andrew Pinski wrote:
> On Tue, May 5, 2015 at 9:00 PM, Aditya K  wrote:

> >>>
> >>> gcc/rtlanal.c:5573:23: warning: array index 1 is past the end of the 
> >>> array (which contains 1 element) [-Warray-bounds]
> >>> ../../gcc/rtlanal.c:5573:23: warning: array index 1 is past the end of 
> >>> the array (which contains 1 element) [-Warray-bounds]
> >>> *second = GEN_INT (CONST_DOUBLE_HIGH (value));
> >>> ^
> >>
> >> These warnings are bogus due to the array being the last element of the 
> >> structure.
> >>
> >> Please file that with clang.
> >>
> >
> > IIRC, C++ does not allow flexible array members.
> 
> 
> > But this has been a common extension for many years now (for as long as
> > C++ and C have been around).  So the warning is useless.

A flexible array member has no size or, with the GCC extension, has
size 0.  Clang also does not warn about these.  The array here seems
to have size 1, and I have seen similar cases in the GCC code base
with size 2.

The benefit of cleaning this up would be that you could get proper
warnings for arrays at the end of a struct which are not meant
to be flexible array members.
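[The distinction can be sketched as follows; the struct and function names are invented for illustration, with `data[1]` being the pre-C99 idiom that draws the bounds warning and `data[]` the C99 flexible array member:]

```c
#include <stdlib.h>
#include <stddef.h>

/* Pre-C99 idiom: a size-1 trailing array that callers over-index at
   runtime.  Accesses such as s->data[1] are what clang's -Warray-bounds
   complains about. */
struct old_style {
    int len;
    int data[1];
};

/* C99 flexible array member: no declared size, so the bounds warning
   does not apply and the intent is explicit.  (GCC also accepts
   data[0] as an extension.) */
struct flexible {
    int len;
    int data[];
};

struct flexible *make_flexible(int n)
{
    struct flexible *p = malloc(offsetof(struct flexible, data)
                                + (size_t)n * sizeof(int));
    if (p)
        p->len = n;
    return p;
}
```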

Martin



Re: [OR1K port] where do I change the function frame structure

2015-05-06 Thread Jim Wilson

On 05/05/2015 05:19 PM, Peter T. Breuer wrote:

Please ..  where (in what file, dir) of the gcc (4.9.1) source should I
rummage in order to change the sequence of instructions eventually
emitted to do a function call?


Are you trying to change the caller or the callee?

For the callee, or1k_compute_frame_size calculates the frame size, which 
depends on the frame layout.  or1k_expand_prologue emits the RTL for the 
prologue.  or1k_expand_epilogue emits the RTL for the epilogue.  There 
are also a few other closely related helper functions.  These are all in 
gcc/config/or1k/or1k.c.


For the caller, I see that the or1k port already sets 
ACCUMULATE_OUTGOING_ARGS, so there should be no stack pointer inc/dec 
around a call.  Only in the prologue/epilogue.


Jim