Re: Bootstrap failure in stage 2 on i386.c

2016-11-23 Thread Richard Biener
On Wed, Nov 23, 2016 at 1:12 AM, Serge Belyshev
 wrote:
>> My builds for the last couple of days have all been failing in stage 2
>> like so:
>>
>> /home/arth/src/gcc/gcc/config/i386/i386.c: In function ‘rtx_def* 
>> ix86_expand_bui
>> ltin(tree, rtx, rtx, machine_mode, int)’:
>> /home/arth/src/gcc/gcc/config/i386/i386.c:38407:18: error: ‘fcn’ may be used 
>> uni
>> nitialized in this function [-Werror=maybe-uninitialized]
>> emit_insn (fcn (target, accum, wide_reg, mem));
>> ~~^~~~
>>
>> Anyone else seeing this?
>
> I'm seeing it too. The code is new, but the 'may be used unitialized'
> warning itself is, probably, an instance of the old PR36550.
>
> I have reduced this testcase to:
>
> //
> /* { dg-do compile } */
> /* { dg-options "-O2 -Wuninitialized" } */
>
> int force_reg (int, int);
> int expand_expr_real (int, int);
> void f (int);
>
> void ix86_expand_builtin (int fcode, int pmode)
> {
>   int fcn;
>   int i, addr, masked = 1;
>
>   switch (fcode)
> {
> case 1:
>   fcn = 1;
>   masked = 0;
>   goto s4fma_expand;
>
> case 2:
>   fcn = 2;
>   masked = 0;
>   goto s4fma_expand;
>
> case 4:
>   {
>   s4fma_expand:
> for (i = 0; i < 4; i++)
>   expand_expr_real (0, 0);
>
> addr = expand_expr_real (0, 0);
> force_reg ((pmode ? 0 : 2), addr);
>
> if (! masked)
>   f (fcn);
>   }
> }
> }
> //
>
> It fails with every gcc version down to 3.2 (the oldest one I could
> compile on my amd64 box).  Note that this testcase is particularly
> flaky: recent gccs will not issue a warning if one, for example, changes
> the '2' to '1' in the force_reg() call.

It's an interesting case where the uninit pass has to do sth else than
looking for a guard on the incoming path to the uninit PHI value
(there's none).  But it has to simplify the guard on the use with
values on the edge of the uninit var:

  # fcn_1 = PHI 
  # masked_3 = PHI <1(3), 0(14), 0(4)>
s4fma_expand:
...
  if (masked_3 == 0)
goto ;
  else
goto ;

  :
  goto  ();

  :
  f (fcn_1); [tail call]

that's not something it even tries to do (and this case is particularly
simple even)

Richard.


Re: [SVE] Support for variable-sized machine modes

2016-11-23 Thread Richard Biener
On Thu, Nov 17, 2016 at 11:00 PM, Richard Sandiford
 wrote:
> Thanks for the comments.
>
> Richard Biener  writes:
>> On Fri, Nov 11, 2016 at 6:50 PM, Richard Sandiford
>>> Constructing variable-length vectors
>>> 
>>>
>>> Currently both tree and rtl vector constants require the number of
>>> elements to be known at compile time and allow the elements to be
>>> arbitrarily different from one another.  SVE vector constants instead
>>> have a variable number of elements and require the constant to have
>>> some inherent structure, so that the values remain predictable however
>>> long the vector is.  In practice there are two useful types of constant:
>>>
>>> (a) a duplicate of a single value to all elements.
>>>
>>> (b) a linear series in which element E has the value BASE + E * STEP,
>>> for some given BASE and STEP.
>>>
>>> For integers, (a) could simply be seen as a special form of (b) in
>>> which the step is zero.  However, we've deliberately not defined (b)
>>> for floating-point modes, in order to avoid specifying whether element
>>> E should really be calculcated as BASE + E * STEP or as BASE with STEP
>>> added E times (which would round differently).  So treating (a) as a
>>> separate kind of constant from (b) is useful for floating-point types.
>>>
>>> We need to support the same operations for non-constant vectors as well
>>> as constant ones.  Both operations have direct analogues in SVE.
>>>
>>> rtl already supports (a) for variables via vec_duplicate.  For constants
>>> we simply wrapped such vec_duplicates in a (const ...), so for example:
>>>
>>>   (const:VnnHI (vec_duplicate:VnnHI (const_int 10)))
>>>
>>> represents a vector constant in which each element is the 16-bit value 10.
>>>
>>> For (b) we created a new vec_series rtl code that takes the base and step
>>> as operands.  A vec_series is constant if it has constant operands, in which
>>> case it too can be wrapped in a (const ...).  For example:
>>>
>>>   (const:VnnSI (vec_series:VnnSI (const_int 1) (const_int 3)))
>>>
>>> represents the constant { 1, 4, 7, 10, ... }.
>>>
>>> We only use constant vec_duplicate and vec_series when the number of
>>> elements is variable.  Vectors with a constant number of elements
>>> continue to use const_vector.  It might be worth considering using
>>> vec_duplicate across the board in future though, since it's significantly
>>> more compact when the number of elements is large.
>>>
>>> In both vec_duplicate and vec_series constants, the value of the element
>>> can be any constant that is valid for the element's mode; it doesn't have
>>> to be a const_int.
>>>
>>> The patches take a similar approach for trees.  A new VEC_DUPLICATE_EXPR
>>> returns a vector in which every element is equal to operand 0, while a new
>>> VEC_SERIES_EXPR creates a linear series, taking the same two operands as the
>>> rtl code.  The trees are TREE_CONSTANT if the operands are TREE_CONSTANT.
>>>
>>> The new trees are valid gimple values iff they are TREE_CONSTANT.
>>> This means that the constant forms can be used in a very similar way
>>> to VECTOR_CST, rather than always requiring a separate gimple assignment.
>>
>> Hmm.  They are hopefully (at least VEC_DUPLICATE_EXPR) not GIMPLE_SINGLE_RHS.
>> But it means they'd appear (when TREE_CONSTANT) as gimple operand in
>> GENERIC form.
>
> You guessed correctly: they're GIMPLE_SINGLE_RHS :-)  That seemed to be
> our current way of handling this kind of expression.  Do you not like it
> because of the overhead of the extra tree node in plain:
>
>reg = VEC_DUPLICATE_EXPR 
>
> assignments?

Not the overhead but it's a step backward of getting rid of GENERIC
_expressions_
in GIMPLE.  They require special handling in most of the middle-end (genmatch,
value-numbering, PRE to just name a few).

>  The problem is that if we treat them as unary, the
> VEC_DUPLICATE_EXPR node appears and disappears depending on whether
> the assignment has an operator or not, which makes them significantly
> different from VECTOR_CST.  The idea was to make them as similar as
> possible, so that most code wouldn't care that a different tree code
> is being used.
>
> I think in practice most duplicates are used as operands rather than
> as rhses in their own right.

Well, then maybe restrict them to this.  Or add VEC_DUPLICATE_EXPR
and VEC_DUPLICATE_CONST (like we fold a RHS CONSTRUCTOR
to a VECTOR_CST when all elements become constant).

>>> Variable-length permutes
>>> 
>>>
>>> SVE has a similar set of permute instructions to Advanced SIMD: it has
>>> a TBL instruction for general permutes and specific instructions like
>>> TRN1 for certain common operations.  Although it would be possible to
>>> construct variable-length masks for all the special-purpose permutes,
>>> the expression to construct things like "interleave high" would be
>>> relatively complex.  It seemed better to add optabs and internal
>>> functions for certain kinds of

Re: [PING][RFC] Assertions as optimization hints

2016-11-23 Thread Richard Biener
On Tue, Nov 22, 2016 at 6:52 PM, Yuri Gribov  wrote:
> Hi all,
>
> I've recently revisited an ancient patch from Paolo
> (https://gcc.gnu.org/ml/gcc-patches/2004-04/msg00551.html) which uses
> asserts as optimization hints. I've rewritten the patch to be more
> stable under expressions with side-effects and did some basic
> investigation of it's efficacy.
>
> Optimization is hidden under !defined NDEBUG && defined
> __ASSUME_ASSERTS__. !NDEBUG-part is necessary because assertions often
> rely on special !NDEBUG-protected support code outside of assert
> (dedicated fields in structures and similar stuff, collectively called
> "ghost variables"). __ASSUME_ASSERTS__ gives user a choice whether to
> enable optimization or not (should probably be hidden under a friendly
> compiler switch e.g. -fassume-asserts).
>
> I do not have access to a good machine for speed benchmarks so I only
> looked at size improvements in few popular projects. There are no
> revolutionary changes (0.1%-1%) but some functions see good reductions
> which may result in noticeable runtime improvements in practice. One
> good  example is MariaDB where you frequently find the following
> pattern:
>   struct A {
> virtual void foo() { assert(0); }
>   };
>   ...
>   A *a;
>   a->foo();
> Here the patch will prevent GCC from inlining A::foo (as it'll figure
> out that it's impossible to occur at runtime) thus saving code size.
>
> Does this approach make sense in general? If it does I can probably
> come up with more measurements.
>
> As a side note, at least some users may consider this a useful feature:
> http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209482.html

You should CC relevant maintainers or annotate the subject -- this is
a C/C++ frontend patch introducing __builtin_has_side_effects_p
plus a patch adding a GCC supplied assert.h header.

Note that from a distribution point of view I wouldn't enable assume-asserts
for a distro-build given the random behavior of __builtin_unreachable in case
of assert failure.

Richard.

> -I


Re: [PING][RFC] Assertions as optimization hints

2016-11-23 Thread Yuri Gribov
On Wed, Nov 23, 2016 at 11:31 AM, Richard Biener
 wrote:
> On Tue, Nov 22, 2016 at 6:52 PM, Yuri Gribov  wrote:
>> Hi all,
>>
>> I've recently revisited an ancient patch from Paolo
>> (https://gcc.gnu.org/ml/gcc-patches/2004-04/msg00551.html) which uses
>> asserts as optimization hints. I've rewritten the patch to be more
>> stable under expressions with side-effects and did some basic
>> investigation of it's efficacy.
>>
>> Optimization is hidden under !defined NDEBUG && defined
>> __ASSUME_ASSERTS__. !NDEBUG-part is necessary because assertions often
>> rely on special !NDEBUG-protected support code outside of assert
>> (dedicated fields in structures and similar stuff, collectively called
>> "ghost variables"). __ASSUME_ASSERTS__ gives user a choice whether to
>> enable optimization or not (should probably be hidden under a friendly
>> compiler switch e.g. -fassume-asserts).
>>
>> I do not have access to a good machine for speed benchmarks so I only
>> looked at size improvements in few popular projects. There are no
>> revolutionary changes (0.1%-1%) but some functions see good reductions
>> which may result in noticeable runtime improvements in practice. One
>> good  example is MariaDB where you frequently find the following
>> pattern:
>>   struct A {
>> virtual void foo() { assert(0); }
>>   };
>>   ...
>>   A *a;
>>   a->foo();
>> Here the patch will prevent GCC from inlining A::foo (as it'll figure
>> out that it's impossible to occur at runtime) thus saving code size.
>>
>> Does this approach make sense in general? If it does I can probably
>> come up with more measurements.
>>
>> As a side note, at least some users may consider this a useful feature:
>> http://www.nntp.perl.org/group/perl.perl5.porters/2013/11/msg209482.html
>
> You should CC relevant maintainers or annotate the subject -- this is
> a C/C++ frontend patch introducing __builtin_has_side_effects_p
> plus a patch adding a GCC supplied assert.h header.

Thanks, Richard, I've cc-ed Joseph (who is both frontend maintainer
and also commented on old version of this patch).

> Note that from a distribution point of view I wouldn't enable assume-asserts
> for a distro-build given the random behavior of __builtin_unreachable in case
> of assert failure.

Definitely, aggressively abusing assertions like this should be a
per-project decision. E.g. I've seen code which parallels assertions
with error checking i.e. something like
  FILE *p = fopen(...);
  assert(p);  // Quickly abort in debug mode
  if (!p)
return ERROR;
Enabling __ASSUME_ASSERTS__ for such code would effectively disable
all safety checks.

On the other hand many projects use assert(0) to mark unreachable
regions of code (e.g. in default branches of switch statements) and
optimizing these out sounds like a viable decision to me.

-Iurii


Re: Bootstrap failure in stage 2 on i386.c

2016-11-23 Thread Jeff Law

On 11/23/2016 01:29 AM, Richard Biener wrote:

On Wed, Nov 23, 2016 at 1:12 AM, Serge Belyshev
 wrote:

My builds for the last couple of days have all been failing in stage 2
like so:

/home/arth/src/gcc/gcc/config/i386/i386.c: In function ‘rtx_def* ix86_expand_bui
ltin(tree, rtx, rtx, machine_mode, int)’:
/home/arth/src/gcc/gcc/config/i386/i386.c:38407:18: error: ‘fcn’ may be used uni
nitialized in this function [-Werror=maybe-uninitialized]
emit_insn (fcn (target, accum, wide_reg, mem));
~~^~~~

Anyone else seeing this?


I'm seeing it too. The code is new, but the 'may be used unitialized'
warning itself is, probably, an instance of the old PR36550.

I have reduced this testcase to:

//
/* { dg-do compile } */
/* { dg-options "-O2 -Wuninitialized" } */

int force_reg (int, int);
int expand_expr_real (int, int);
void f (int);

void ix86_expand_builtin (int fcode, int pmode)
{
  int fcn;
  int i, addr, masked = 1;

  switch (fcode)
{
case 1:
  fcn = 1;
  masked = 0;
  goto s4fma_expand;

case 2:
  fcn = 2;
  masked = 0;
  goto s4fma_expand;

case 4:
  {
  s4fma_expand:
for (i = 0; i < 4; i++)
  expand_expr_real (0, 0);

addr = expand_expr_real (0, 0);
force_reg ((pmode ? 0 : 2), addr);

if (! masked)
  f (fcn);
  }
}
}
//

It fails with every gcc version down to 3.2 (the oldest one I could
compile on my amd64 box).  Note that this testcase is particularly
flaky: recent gccs will not issue a warning if one, for example, changes
the '2' to '1' in the force_reg() call.


It's an interesting case where the uninit pass has to do sth else than
looking for a guard on the incoming path to the uninit PHI value
(there's none).  But it has to simplify the guard on the use with
values on the edge of the uninit var:

  # fcn_1 = PHI 
  # masked_3 = PHI <1(3), 0(14), 0(4)>
s4fma_expand:
...
  if (masked_3 == 0)
goto ;
  else
goto ;

  :
  goto  ();

  :
  f (fcn_1); [tail call]

that's not something it even tries to do (and this case is particularly
simple even)
But what left that in the IL?  I'd expect jump threading to have 
optimized the masked_3 test away completely by isolating the paths where 
it's known to be zero from the path where it's known to be one.


jeff


Re: Bootstrap failure in stage 2 on i386.c

2016-11-23 Thread Richard Biener
On November 23, 2016 8:17:34 PM GMT+01:00, Jeff Law  wrote:
>On 11/23/2016 01:29 AM, Richard Biener wrote:
>> On Wed, Nov 23, 2016 at 1:12 AM, Serge Belyshev
>>  wrote:
 My builds for the last couple of days have all been failing in
>stage 2
 like so:

 /home/arth/src/gcc/gcc/config/i386/i386.c: In function ‘rtx_def*
>ix86_expand_bui
 ltin(tree, rtx, rtx, machine_mode, int)’:
 /home/arth/src/gcc/gcc/config/i386/i386.c:38407:18: error: ‘fcn’
>may be used uni
 nitialized in this function [-Werror=maybe-uninitialized]
 emit_insn (fcn (target, accum, wide_reg, mem));
 ~~^~~~

 Anyone else seeing this?
>>>
>>> I'm seeing it too. The code is new, but the 'may be used
>unitialized'
>>> warning itself is, probably, an instance of the old PR36550.
>>>
>>> I have reduced this testcase to:
>>>
>>>
>//
>>> /* { dg-do compile } */
>>> /* { dg-options "-O2 -Wuninitialized" } */
>>>
>>> int force_reg (int, int);
>>> int expand_expr_real (int, int);
>>> void f (int);
>>>
>>> void ix86_expand_builtin (int fcode, int pmode)
>>> {
>>>   int fcn;
>>>   int i, addr, masked = 1;
>>>
>>>   switch (fcode)
>>> {
>>> case 1:
>>>   fcn = 1;
>>>   masked = 0;
>>>   goto s4fma_expand;
>>>
>>> case 2:
>>>   fcn = 2;
>>>   masked = 0;
>>>   goto s4fma_expand;
>>>
>>> case 4:
>>>   {
>>>   s4fma_expand:
>>> for (i = 0; i < 4; i++)
>>>   expand_expr_real (0, 0);
>>>
>>> addr = expand_expr_real (0, 0);
>>> force_reg ((pmode ? 0 : 2), addr);
>>>
>>> if (! masked)
>>>   f (fcn);
>>>   }
>>> }
>>> }
>>>
>//
>>>
>>> It fails with every gcc version down to 3.2 (the oldest one I could
>>> compile on my amd64 box).  Note that this testcase is particularly
>>> flaky: recent gccs will not issue a warning if one, for example,
>changes
>>> the '2' to '1' in the force_reg() call.
>>
>> It's an interesting case where the uninit pass has to do sth else
>than
>> looking for a guard on the incoming path to the uninit PHI value
>> (there's none).  But it has to simplify the guard on the use with
>> values on the edge of the uninit var:
>>
>>   # fcn_1 = PHI 
>>   # masked_3 = PHI <1(3), 0(14), 0(4)>
>> s4fma_expand:
>> ...
>>   if (masked_3 == 0)
>> goto ;
>>   else
>> goto ;
>>
>>   :
>>   goto  ();
>>
>>   :
>>   f (fcn_1); [tail call]
>>
>> that's not something it even tries to do (and this case is
>particularly
>> simple even)
>But what left that in the IL?  I'd expect jump threading to have 
>optimized the masked_3 test away completely by isolating the paths
>where 
>it's known to be zero from the path where it's known to be one.

I'm looking at another case where we fail to thread (albeit a lot more 
complicated), it seems the backwards threader is not very good in handling 
opportunities that require forward propagating of PHI args and DOM requires 
excessive iteration (11 DOM passes in this other case...).

Richard.

>jeff




Re: Bootstrap failure in stage 2 on i386.c

2016-11-23 Thread Jeff Law

On 11/23/2016 12:48 PM, Richard Biener wrote:

On November 23, 2016 8:17:34 PM GMT+01:00, Jeff Law  wrote:

On 11/23/2016 01:29 AM, Richard Biener wrote:

On Wed, Nov 23, 2016 at 1:12 AM, Serge Belyshev
 wrote:

My builds for the last couple of days have all been failing in

stage 2

like so:

/home/arth/src/gcc/gcc/config/i386/i386.c: In function ‘rtx_def*

ix86_expand_bui

ltin(tree, rtx, rtx, machine_mode, int)’:
/home/arth/src/gcc/gcc/config/i386/i386.c:38407:18: error: ‘fcn’

may be used uni

nitialized in this function [-Werror=maybe-uninitialized]
emit_insn (fcn (target, accum, wide_reg, mem));
~~^~~~

Anyone else seeing this?


I'm seeing it too. The code is new, but the 'may be used

unitialized'

warning itself is, probably, an instance of the old PR36550.

I have reduced this testcase to:



//

/* { dg-do compile } */
/* { dg-options "-O2 -Wuninitialized" } */

int force_reg (int, int);
int expand_expr_real (int, int);
void f (int);

void ix86_expand_builtin (int fcode, int pmode)
{
  int fcn;
  int i, addr, masked = 1;

  switch (fcode)
{
case 1:
  fcn = 1;
  masked = 0;
  goto s4fma_expand;

case 2:
  fcn = 2;
  masked = 0;
  goto s4fma_expand;

case 4:
  {
  s4fma_expand:
for (i = 0; i < 4; i++)
  expand_expr_real (0, 0);

addr = expand_expr_real (0, 0);
force_reg ((pmode ? 0 : 2), addr);

if (! masked)
  f (fcn);
  }
}
}


//


It fails with every gcc version down to 3.2 (the oldest one I could
compile on my amd64 box).  Note that this testcase is particularly
flaky: recent gccs will not issue a warning if one, for example,

changes

the '2' to '1' in the force_reg() call.


It's an interesting case where the uninit pass has to do sth else

than

looking for a guard on the incoming path to the uninit PHI value
(there's none).  But it has to simplify the guard on the use with
values on the edge of the uninit var:

  # fcn_1 = PHI 
  # masked_3 = PHI <1(3), 0(14), 0(4)>
s4fma_expand:
...
  if (masked_3 == 0)
goto ;
  else
goto ;

  :
  goto  ();

  :
  f (fcn_1); [tail call]

that's not something it even tries to do (and this case is

particularly

simple even)

But what left that in the IL?  I'd expect jump threading to have
optimized the masked_3 test away completely by isolating the paths
where
it's known to be zero from the path where it's known to be one.


I'm looking at another case where we fail to thread (albeit a lot more 
complicated), it seems the backwards threader is not very good in handling 
opportunities that require forward propagating of PHI args and DOM requires 
excessive iteration (11 DOM passes in this other case...).

Pass it along.

I've got a case here which may need something similar.  I've got some 
skeleton code that I might be able to beat into shape.


Jeff