Re: PATCH: PR target/59363: [4.9 Regression] r203886 miscompiles git

2013-12-03 Thread Uros Bizjak
On Tue, Dec 3, 2013 at 2:05 AM, H.J. Lu  wrote:

> emit_memset fails to adjust destination address after gen_strset, which
> leads to the wrong address in aliasing info.  This patch fixes it.
> Tested on Linux/x86-64.  OK to install?
>
> 2013-12-03   H.J. Lu  
>
> PR target/59363
> * config/i386/i386.c (emit_memset): Adjust destination address
> after gen_strset.
>
> gcc/testsuite/
>
> 2013-12-03   H.J. Lu  
>
> PR target/59363
> * gcc.target/i386/pr59363.c: New file.

OK, but according to [1], there are other places where similar issues
should be fixed. I propose to wait for Michael's analysis and eventual
patch, and fix them all together.

[1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363#c21

Thanks,
Uros.


Re: [wide-int] i am concerned about the typedef for widest-int.

2013-12-03 Thread Richard Biener
On Mon, 2 Dec 2013, Kenneth Zadeck wrote:

> On 12/02/2013 03:34 PM, Richard Sandiford wrote:
> > Kenneth Zadeck  writes:
> > > see wide-int.h around line 290
> > > 
> > > the MAX_BITSIZE_MODE_ANY_INT is the largest mode on the machine. However,
> > > if the value coming in is an unsigned number of the type that represents
> > > that mode, don't we lose a bit?
> > That was the +1 mentioned here:
> > 
> > http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03745.html
> > 
> > I.e. it should be "widest supported arithmetic input + 1".
> > 
> > Thanks,
> > Richard
> do we want 129 or do we want to round that up to the next hwi?

round up to the next hwi of course.
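
(For illustration only, a minimal sketch of that rounding, assuming a
64-bit HOST_WIDE_INT; the macro names are made up for the example:)

  /* Round the needed precision up to a whole number of host wide ints:
     with 64-bit HWIs, 128 + 1 = 129 bits rounds up to 192 bits, i.e. 3 HWIs.  */
  #define EXAMPLE_HWI_BITS 64             /* stand-in for HOST_BITS_PER_WIDE_INT */
  #define EXAMPLE_NEEDED_BITS (128 + 1)   /* widest supported arithmetic input + 1 */
  #define EXAMPLE_ROUNDED_BITS \
    (((EXAMPLE_NEEDED_BITS + EXAMPLE_HWI_BITS - 1) / EXAMPLE_HWI_BITS) \
     * EXAMPLE_HWI_BITS)                  /* == 192 */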

Richard.


Re: PATCH: PR target/59363: [4.9 Regression] r203886 miscompiles git

2013-12-03 Thread Michael Zolotukhin
Hi Uros, HJ,

I checked expand_movmem_epilogue - it seemingly doesn't have such a
problem, so the patch is ok.

We might want to add a similar adjust_automodify_address_nv call here as well:
if (TARGET_64BIT)
{
  dest = change_address (destmem, DImode, destptr);
  emit_insn (gen_strset (destptr, dest, value));
  emit_insn (gen_strset (destptr, dest, value));
}
  else
{
  dest = change_address (destmem, SImode, destptr);
  emit_insn (gen_strset (destptr, dest, value));
  emit_insn (gen_strset (destptr, dest, value));
  emit_insn (gen_strset (destptr, dest, value));
  emit_insn (gen_strset (destptr, dest, value));
}
(code snippet from HJ's previous comment in bugzilla).
I think it's needed here, but I didn't manage to exploit the bug in
this code.  Maybe Uros or Jan can comment on whether it's needed in such
code or not.
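
Concretely, that would mean something along these lines (only a sketch of
the suggestion; the offsets assume the 8/4 byte element sizes of DImode
and SImode):

  if (TARGET_64BIT)
    {
      dest = change_address (destmem, DImode, destptr);
      emit_insn (gen_strset (destptr, dest, value));
      dest = adjust_automodify_address_nv (dest, DImode, destptr, 8);
      emit_insn (gen_strset (destptr, dest, value));
    }
  else
    {
      dest = change_address (destmem, SImode, destptr);
      emit_insn (gen_strset (destptr, dest, value));
      dest = adjust_automodify_address_nv (dest, SImode, destptr, 4);
      emit_insn (gen_strset (destptr, dest, value));
      dest = adjust_automodify_address_nv (dest, SImode, destptr, 8);
      emit_insn (gen_strset (destptr, dest, value));
      dest = adjust_automodify_address_nv (dest, SImode, destptr, 12);
      emit_insn (gen_strset (destptr, dest, value));
    }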

Thanks,
Michael


On 3 December 2013 12:11, Uros Bizjak  wrote:
> On Tue, Dec 3, 2013 at 2:05 AM, H.J. Lu  wrote:
>
>> emit_memset fails to adjust destination address after gen_strset, which
>> leads to the wrong address in aliasing info.  This patch fixes it.
>> Tested on Linux/x86-64.  OK to install?
>>
>> 2013-12-03   H.J. Lu  
>>
>> PR target/59363
>> * config/i386/i386.c (emit_memset): Adjust destination address
>> after gen_strset.
>>
>> gcc/testsuite/
>>
>> 2013-12-03   H.J. Lu  
>>
>> PR target/59363
>> * gcc.target/i386/pr59363.c: New file.
>
> OK, but according to [1], there are other places where similar issues
> should be fixed. I propose to wait for Michael's analysis and eventual
> patch, and fix them all together.
>
> [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363#c21
>
> Thanks,
> Uros.



-- 
---
Best regards,
Michael V. Zolotukhin,
Software Engineer
Intel Corporation.


[PATCH] Provide a prototype for runtime_traceback

2013-12-03 Thread Richard Biener

Hi, I'm getting a bootstrap fail with Go with our custom patched
compiler (it adds that warning):

09:48 < richi> ../../../libgo/runtime/go-signal.c: In function
   'runtime_sighandler':
09:48 < richi> ../../../libgo/runtime/go-signal.c:221:4: error: call to
   function 'runtime_traceback' without a real prototype
   [-Werror=unprototyped-calls] runtime_traceback (g);

which is because the declaration of runtime_traceback looks like

  void   runtime_traceback();

and thus if you call it the compiler doesn't know the number of
arguments or their types, so unwanted promotions may apply
that change calling conventions (for example float -> double).
In this case the argument is a pointer, so it's probably not
an issue.  Still, the above is not a prototype, which the patch
below fixes.
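
(A tiny standalone illustration of the hazard, with made-up names, not
from libgo:)

  /* Old-style declaration: no parameter information at all.  */
  void log_value ();

  void
  call_it (float f)
  {
    /* Without a prototype in scope the default argument promotions
       apply, so F is passed as double; if the definition actually
       takes a float, caller and callee disagree on the call ABI.  */
    log_value (f);
  }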

Bootstrap/testing in progress on x86_64-unknown-linux-gnu
(let's hope this was the only one).

Ian, please merge.

Thanks,
Richard.

2013-12-03   Richard Biener  

libgo/
* runtime/runtime.h (runtime_traceback): Prototype properly.

Index: libgo/runtime/runtime.h
===
--- libgo/runtime/runtime.h (revision 205585)
+++ libgo/runtime/runtime.h (working copy)
@@ -453,7 +453,7 @@ enum {
 };
 void   runtime_hashinit(void);
 
-void   runtime_traceback();
+void   runtime_traceback(G*);
 void   runtime_tracebackothers(G*);
 
 /*


Re: [PATCH] Fix nested function ICE with VLAs (PR middle-end/59011)

2013-12-03 Thread Richard Biener
On Tue, 3 Dec 2013, Jakub Jelinek wrote:

> Hi!
> 
> tree-nested.c uses declare_vars with last argument true, which
> relies on BLOCK_VARS of gimple_bind_block being a tail
> of the gimple_bind_vars chain.  But unfortunately a debug info
> improvement I've added to gimplify_var_or_parm_decl 4 years ago
> violates this assumption, in that it adds some VAR_DECLs
> at the head of BLOCK_VARS (DECL_INITIAL (current_function_decl))
> chain, but doesn't adjust gimple_bind_vars chain correspondingly.
> 
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk/4.8?

Ok.

Thanks,
Richard.

> 2013-12-02  Jakub Jelinek  
> 
>   PR middle-end/59011
>   * gimplify.c (nonlocal_vla_vars): New variable.
>   (gimplify_var_or_parm_decl): Put VAR_DECLs for VLAs into
>   nonlocal_vla_vars chain.
>   (gimplify_body): Call declare_vars on nonlocal_vla_vars chain
>   if outer_bind has DECL_INITIAL (current_function_decl) block.
> 
>   * gcc.dg/pr59011.c: New test.
> 
> --- gcc/gimplify.c.jj 2013-12-02 14:33:34.0 +0100
> +++ gcc/gimplify.c2013-12-02 20:32:02.883491995 +0100
> @@ -1689,6 +1689,9 @@ gimplify_conversion (tree *expr_p)
>  /* Nonlocal VLAs seen in the current function.  */
>  static struct pointer_set_t *nonlocal_vlas;
>  
> +/* The VAR_DECLs created for nonlocal VLAs for debug info purposes.  */
> +static tree nonlocal_vla_vars;
> +
>  /* Gimplify a VAR_DECL or PARM_DECL.  Return GS_OK if we expanded a
> DECL_VALUE_EXPR, and it's worth re-examining things.  */
>  
> @@ -1737,14 +1740,13 @@ gimplify_var_or_parm_decl (tree *expr_p)
>   ctx = ctx->outer_context;
> if (!ctx && !pointer_set_insert (nonlocal_vlas, decl))
>   {
> -   tree copy = copy_node (decl), block;
> +   tree copy = copy_node (decl);
>  
> lang_hooks.dup_lang_specific_decl (copy);
> SET_DECL_RTL (copy, 0);
> TREE_USED (copy) = 1;
> -   block = DECL_INITIAL (current_function_decl);
> -   DECL_CHAIN (copy) = BLOCK_VARS (block);
> -   BLOCK_VARS (block) = copy;
> +   DECL_CHAIN (copy) = nonlocal_vla_vars;
> +   nonlocal_vla_vars = copy;
> SET_DECL_VALUE_EXPR (copy, unshare_expr (value_expr));
> DECL_HAS_VALUE_EXPR_P (copy) = 1;
>   }
> @@ -8562,6 +8564,21 @@ gimplify_body (tree fndecl, bool do_parm
>  
>if (nonlocal_vlas)
>  {
> +  if (nonlocal_vla_vars)
> + {
> +   /* tree-nested.c may later on call declare_vars (..., true);
> +  which relies on BLOCK_VARS chain to be the tail of the
> +  gimple_bind_vars chain.  Ensure we don't violate that
> +  assumption.  */
> +   if (gimple_bind_block (outer_bind)
> +   == DECL_INITIAL (current_function_decl))
> + declare_vars (nonlocal_vla_vars, outer_bind, true);
> +   else
> + BLOCK_VARS (DECL_INITIAL (current_function_decl))
> +   = chainon (BLOCK_VARS (DECL_INITIAL (current_function_decl)),
> +  nonlocal_vla_vars);
> +   nonlocal_vla_vars = NULL_TREE;
> + }
>pointer_set_destroy (nonlocal_vlas);
>nonlocal_vlas = NULL;
>  }
> --- gcc/testsuite/gcc.dg/pr59011.c.jj 2013-12-02 20:14:22.350702153 +0100
> +++ gcc/testsuite/gcc.dg/pr59011.c2013-12-02 20:15:10.902455660 +0100
> @@ -0,0 +1,22 @@
> +/* PR middle-end/59011 */
> +/* { dg-do compile } */
> +/* { dg-options "-std=gnu99" } */
> +
> +void
> +foo (int m)
> +{
> +  int a[m];
> +  void
> +  bar (void)
> +  {
> +{
> +  int
> +  baz (void)
> +  {
> + return a[0];
> +  }
> +}
> +a[0] = 42;
> +  }
> +  bar ();
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imend"orffer


Re: [PATCH] Fix up cmove expansion (PR target/58864, take 2)

2013-12-03 Thread Richard Biener
On Mon, 2 Dec 2013, Jeff Law wrote:

> On 12/02/13 15:51, Jakub Jelinek wrote:
> > Hi!
> > 
> > On Sat, Nov 30, 2013 at 12:38:30PM +0100, Eric Botcazou wrote:
> > > > Rather than adding do_pending_stack_adjust () in all the places,
> > > > especially
> > > > when it isn't clear whether emit_conditional_move will be called at all
> > > > and
> > > > whether it will actually do do_pending_stack_adjust (), I chose to add
> > > > two new functions to save/restore the pending stack adjustment state,
> > > > so that when instruction sequence is thrown away (either by doing
> > > > start_sequence/end_sequence around it and not emitting it, or
> > > > delete_insns_since) the state can be restored, and have changed all the
> > > > places that IMHO need it for emit_conditional_move.
> > > 
> > > Why not do it in emit_conditional_move directly then?  The code thinks
> > > it's
> > > clever to do:
> > > 
> > >do_pending_stack_adjust ();
> > >last = get_last_insn ();
> > >prepare_cmp_insn (XEXP (comparison, 0), XEXP (comparison, 1),
> > >   GET_CODE (comparison), NULL_RTX, unsignedp, OPTAB_WIDEN,
> > >   &comparison, &cmode);
> > > [...]
> > >delete_insns_since (last);
> > >return NULL_RTX;
> > > 
> > > but apparently not, so why not delete the stack adjustment as well and
> > > restore
> > > the state afterwards?
> > 
> > On Sat, Nov 30, 2013 at 09:10:35AM +0100, Richard Biener wrote:
> > > The idea is good but I'd like to see a struct rather than an array for the
> > > storage.
> > 
> > So, this patch attempts to include both of the proposed changes.
> > Furthermore, I've noticed that calls.c has been saving/restoring those
> > two values by hand, so now it can use the new APIs for that too.
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > What about 4.8 branch?  I could create an alternative patch for 4.8,
> > keep everything as is and just save/restore the two fields by hand in
> > emit_conditional_move like calls.c used to do it.

The patch is ok for the branch as-is after a while on trunk.

Thanks,
Richard.

> > 2013-12-02  Jakub Jelinek  
> > 
> > PR target/58864
> > * dojump.c (save_pending_stack_adjust, restore_pending_stack_adjust):
> > New functions.
> > * expr.h (struct saved_pending_stack_adjust): New type.
> > (save_pending_stack_adjust, restore_pending_stack_adjust): New
> > prototypes.
> > * optabs.c (emit_conditional_move): Call save_pending_stack_adjust
> > and get_last_insn before do_pending_stack_adjust, call
> > restore_pending_stack_adjust after delete_insns_since.
> > * expr.c (expand_expr_real_2): Don't call do_pending_stack_adjust
> > before calling emit_conditional_move.
> > * expmed.c (expand_sdiv_pow2): Likewise.
> > * calls.c (expand_call): Use {save,restore}_pending_stack_adjust.
> > 
> > * g++.dg/opt/pr58864.C: New test.
> OK.
> 
> Branch maintainers' call on how they want to deal with this in 4.8.
> 
> jeff
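
For reference, the save/restore API from the ChangeLog above would be used
roughly like this around a tentative expansion (a sketch only, not the
actual patch; expansion_failed is a placeholder and exact signatures may
differ):

  struct saved_pending_stack_adjust save;
  rtx last;

  /* Remember the pending stack adjustment state and the insn stream
     position before trying the conditional move expansion.  */
  save_pending_stack_adjust (&save);
  do_pending_stack_adjust ();
  last = get_last_insn ();

  /* ... try to expand the conditional move ... */

  if (expansion_failed)
    {
      /* Throw away the tentative insns and restore the pending stack
         adjustment so later code sees consistent state.  */
      delete_insns_since (last);
      restore_pending_stack_adjust (&save);
    }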



Re: [PATCH] Provide a prototype for runtime_traceback

2013-12-03 Thread Richard Biener
On Tue, 3 Dec 2013, Richard Biener wrote:

> 
> Hi, I'm getting a bootstrap fail with Go with our custom patched
> compiler (it adds that warning):
> 
> 09:48 < richi> ../../../libgo/runtime/go-signal.c: In function
>'runtime_sighandler':
> 09:48 < richi> ../../../libgo/runtime/go-signal.c:221:4: error: call to
>function 'runtime_traceback' without a real prototype
>[-Werror=unprototyped-calls] runtime_traceback (g);
> 
> which is because the declaration of runtime_traceback looks like
> 
>   void   runtime_traceback();
> 
> and thus if you call it the compiler doesn't know the number of
> arguments or their types and thus unwanted promotions may apply
> that change calling conventions (for example float -> double).
> In this case the argument is a pointer, so it's probably not
> an issue.  Still the above is not a prototype which the patch
> below fixes.
> 
> Bootstrap/testing in progress on x86_64-unknown-linux-gnu
> (let's hope this was the only one).

Hmm, didn't work out (the implementation has no argument).

Try the following instead.

2013-12-03  Richard Biener  

libgo/
* runtime/runtime.h (runtime_traceback): Fix prototype.
* runtime/go-signal.c (runtime_sighandler): Don't pass
excess argument to runtime_traceback.

Index: libgo/runtime/runtime.h
===
--- libgo/runtime/runtime.h (revision 205623)
+++ libgo/runtime/runtime.h (working copy)
@@ -453,7 +453,7 @@ enum {
 };
 void   runtime_hashinit(void);
 
-void   runtime_traceback();
+void   runtime_traceback(void);
 void   runtime_tracebackothers(G*);
 
 /*
Index: libgo/runtime/go-signal.c
===
--- libgo/runtime/go-signal.c   (revision 205623)
+++ libgo/runtime/go-signal.c   (working copy)
@@ -218,7 +218,7 @@ runtime_sighandler (int sig, Siginfo *in
  G *g;
 
  g = runtime_g ();
- runtime_traceback (g);
+ runtime_traceback ();
  runtime_tracebackothers (g);
 
  /* The gc library calls runtime_dumpregs here, and provides


Re: [PING^2] [PATCH] PR59063

2013-12-03 Thread Andreas Schwab
Jeff Law  writes:

> On 12/01/13 23:12, Yury Gribov wrote:
>>  > This is causing all the tests being run on all targets,
>>  > even if libsanitizer is not supported,
>>  > most of them failing due to link errors.
>>
>> Thanks for the info and sorry about this. I should probably check
>> non-sanitized platforms as well before committing patches. Does the
>> attached patch make sense to you? Worked for me on x64 and x64 with
>> manually disabled libsanitizer.
> [ ... ]
> Is this still necessary after HJ's patch?

The situation hasn't changed in the last four days.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


RE: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Bernd Edlinger
On Mon, 2 Dec 2013 15:42:48, Richard Biener wrote:
>
> On Wed, Nov 20, 2013 at 11:48 AM, Bernd Edlinger
>  wrote:
>> Hello Richard,
>>
>> as a follow-up patch to the bit-fields patch(es), I wanted to remove the 
>> dependencies on
>> the variable flag_strict_volatile_bitfields from expand_assignment and 
>> expand_expr_real_1.
>> Additionally I want the access mode of the field to be selected in the 
>> memory context,
>> instead of the structure's mode.
>>
>> Boot-strapped and regression-tested on x86_64-linux-gnu.
>>
>> OK for trunk?
>
> Ok.
>
> Thanks,
> Richard.
>


Oops

Sorry, tonight this patch caused an Ada regression, in pack17.adb and 
misaligned_nest.adb

So I'll have to put that on hold at the moment.

This ICE is very similar to PR59134.
It is again the recursion between store_fixed_bit_field and 
store_split_bit_field.

The trigger is a latent problem in the Ada gcc-interface.

That is, we have a bit-field of exactly 8 bits which is not byte-aligned,
but has DECL_MODE=QImode and DECL_BIT_FIELD=false, which looks quite strange
and is totally different from how C structures look.  I should mention that
there are even some places in the target back-ends where the attribute
DECL_BIT_FIELD is used for various purposes.

Now, due to this hunk in the cleanup-patch we get the QImode selected in the 
memory
context:

   if (MEM_P (to_rtx))
    {
- if (volatilep && flag_strict_volatile_bitfields> 0)
+ if (mode1 != VOIDmode)
    to_rtx = adjust_address (to_rtx, mode1, 0);

However even without that patch, I can arrange for "volatilep && 
flag_strict_volatile_bitfields> 0"
to be true in Ada (even on X86_64, or whatever platform you like):

-- { dg-do run }
-- { dg-options "-gnatp -fstrict-volatile-bitfields" }

procedure Misaligned_Volatile is

   type Byte is mod 2**8;

   type Block is record
  B : Boolean;
  V : Byte;
   end record;
   pragma Volatile (Block);
   pragma Pack (Block);
   for Block'Alignment use 1;

   type Pair is array (1 .. 2) of Block;

   P : Pair;
begin
   for K in P'Range loop
  P(K).V := 237;
   end loop;
   for K in P'Range loop
  if P(K).V /= 237 then
 raise Program_error;
  end if;
   end loop;
end;


This Ada test case causes either wrong code generation or an ICE at compile
time, if the -fstrict-volatile-bitfields option is either given by the user
or enabled by the target-specific default, as it is on ARM for instance
(which is completely pointless for Ada, I know!)...

Now I am preparing a new bitfields-update-patch which fixes this above test 
case and the
latent recursion problem.


Thanks ... for your patience :-(
Bernd.

Re: [Patch] Add comments for future regex work

2013-12-03 Thread Paolo Carlini

On 12/02/2013 10:15 PM, Tim Shen wrote:

...for optimization purposes. Should be done in one month.
Great. Consider that we are now in Stage 3, until mid of February.
Considering that <regex> is a new feature, I think we have some leeway
in terms of timing, but definitely we can't commit invasive changes not
fixing regressions while in Stage 4.


Thanks!
Paolo.



Re: [Patch] Add comments for future regex work

2013-12-03 Thread Paolo Carlini

.. I meant mid of *January* of course.

Paolo.


Re: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 10:47 AM, Bernd Edlinger
 wrote:
> On Mon, 2 Dec 2013 15:42:48, Richard Biener wrote:
>>
>> On Wed, Nov 20, 2013 at 11:48 AM, Bernd Edlinger
>>  wrote:
>>> Hello Richard,
>>>
>>> as a follow-up patch to the bit-fields patch(es), I wanted to remove the 
>>> dependencies on
>>> the variable flag_strict_volatile_bitfields from expand_assignment and 
>>> expand_expr_real_1.
>>> Additionally I want the access mode of the field to be selected in the 
>>> memory context,
>>> instead of the structure's mode.
>>>
>>> Boot-strapped and regression-tested on x86_64-linux-gnu.
>>>
>>> OK for trunk?
>>
>> Ok.
>>
>> Thanks,
>> Richard.
>>
>
>
> Oops
>
> Sorry, tonight this patch caused an Ada regression, in pack17.adb and 
> misaligned_nest.adb
>
> So I'll have to put that on hold at the moment.
>
> This ICE is very similar to PR59134.
> It is again the recursion between store_fixed_bit_field and 
> store_split_bit_field.
>
> The trigger is a latent problem in the ada gcc_interface.
>
> That is we have a bit-field of exactly 8 bit size, which is not byte-aligned,
> but DECL_MODE=QImode, DECL_BIT_FIELD=false which looks quite strange,

It's not required that DECL_BIT_FIELD is set I think, and the mode
is that of the type if it's not a bitfield ...

> and is totally different from how C structures look like. I should mention 
> that there
> are even some places in the target back-ends, where the attribute 
> DECL_BIT_FIELD is
> used for whatever.
>
> Now, due to this hunk in the cleanup-patch we get the QImode selected in the 
> memory
> context:
>
>if (MEM_P (to_rtx))
> {
> - if (volatilep && flag_strict_volatile_bitfields> 0)
> + if (mode1 != VOIDmode)
> to_rtx = adjust_address (to_rtx, mode1, 0);

Which I think is correct - the memory access is in QImode - that's what
get_inner_reference said.

So whatever issue we run into downstream points to the bug ...

OTOH Eric may know better what the middle-end can expect for
bit-aligned Ada records (apart from "all bets are off" which isn't
really a good solution ;))

Richard.

> However even without that patch, I can arrange for "volatilep && 
> flag_strict_volatile_bitfields> 0"
> to be true in Ada (even on X86_64, or whatever platform you like):
>
> -- { dg-do run }
> -- { dg-options "-gnatp -fstrict-volatile-bitfields" }
>
> procedure Misaligned_Volatile is
>
>type Byte is mod 2**8;
>
>type Block is record
>   B : Boolean;
>   V : Byte;
>end record;
>pragma Volatile (Block);
>pragma Pack (Block);
>for Block'Alignment use 1;
>
>type Pair is array (1 .. 2) of Block;
>
>P : Pair;
> begin
>for K in P'Range loop
>   P(K).V := 237;
>end loop;
>for K in P'Range loop
>   if P(K).V /= 237 then
>  raise Program_error;
>   end if;
>end loop;
> end;
>
>
> This Ada test case causes either wrong code generation or an ICE at compile 
> time,
> if the -fstrict-volatile-bitfields option is either given by the user,
> or by the target-specific default as it is on ARM for instance  (which is 
> completely
> pointless on Ada, I know!)...
>
> Now I am preparing a new bitfields-update-patch which fixes this above test 
> case and the
> latent recursion problem.
>
>
> Thanks  ...  for you patience :-(
> Bernd.


Re: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 10:59 AM, Richard Biener
 wrote:
> On Tue, Dec 3, 2013 at 10:47 AM, Bernd Edlinger
>  wrote:
>> On Mon, 2 Dec 2013 15:42:48, Richard Biener wrote:
>>>
>>> On Wed, Nov 20, 2013 at 11:48 AM, Bernd Edlinger
>>>  wrote:
 Hello Richard,

 as a follow-up patch to the bit-fields patch(es), I wanted to remove the 
 dependencies on
 the variable flag_strict_volatile_bitfields from expand_assignment and 
 expand_expr_real_1.
 Additionally I want the access mode of the field to be selected in the 
 memory context,
 instead of the structure's mode.

 Boot-strapped and regression-tested on x86_64-linux-gnu.

 OK for trunk?
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>
>>
>> Oops
>>
>> Sorry, tonight this patch caused an Ada regression, in pack17.adb and 
>> misaligned_nest.adb
>>
>> So I'll have to put that on hold at the moment.
>>
>> This ICE is very similar to PR59134.
>> It is again the recursion between store_fixed_bit_field and 
>> store_split_bit_field.
>>
>> The trigger is a latent problem in the ada gcc_interface.
>>
>> That is we have a bit-field of exactly 8 bit size, which is not byte-aligned,
>> but DECL_MODE=QImode, DECL_BIT_FIELD=false which looks quite strange,
>
> It's not required that DECL_BIT_FIELD is set I think, and the mode
> is that of the type if it's not a bitfield ...

Btw, I see DECL_BIT_FIELD and DECL_BIT_FIELD_TYPE set (but QImode
used) for 'v' on x86_64 and the testcase "works".

>> and is totally different from how C structures look like. I should mention 
>> that there
>> are even some places in the target back-ends, where the attribute 
>> DECL_BIT_FIELD is
>> used for whatever.
>>
>> Now, due to this hunk in the cleanup-patch we get the QImode selected in the 
>> memory
>> context:
>>
>>if (MEM_P (to_rtx))
>> {
>> - if (volatilep && flag_strict_volatile_bitfields> 0)
>> + if (mode1 != VOIDmode)
>> to_rtx = adjust_address (to_rtx, mode1, 0);
>
> Which I think is correct - the memory access is in QImode - that's what
> get_inner_reference said.
>
> So whatever issue we run into downstream points to the bug ...
>
> OTOH Eric may know better what the middle-end can expect for
> bit-aligned Ada records (apart from "all bets are off" which isn't
> really a good solution ;))
>
> Richard.
>
>> However even without that patch, I can arrange for "volatilep && 
>> flag_strict_volatile_bitfields> 0"
>> to be true in Ada (even on X86_64, or whatever platform you like):
>>
>> -- { dg-do run }
>> -- { dg-options "-gnatp -fstrict-volatile-bitfields" }
>>
>> procedure Misaligned_Volatile is
>>
>>type Byte is mod 2**8;
>>
>>type Block is record
>>   B : Boolean;
>>   V : Byte;
>>end record;
>>pragma Volatile (Block);
>>pragma Pack (Block);
>>for Block'Alignment use 1;
>>
>>type Pair is array (1 .. 2) of Block;
>>
>>P : Pair;
>> begin
>>for K in P'Range loop
>>   P(K).V := 237;
>>end loop;
>>for K in P'Range loop
>>   if P(K).V /= 237 then
>>  raise Program_error;
>>   end if;
>>end loop;
>> end;
>>
>>
>> This Ada test case causes either wrong code generation or an ICE at compile 
>> time,
>> if the -fstrict-volatile-bitfields option is either given by the user,
>> or by the target-specific default as it is on ARM for instance  (which is 
>> completely
>> pointless on Ada, I know!)...
>>
>> Now I am preparing a new bitfields-update-patch which fixes this above test 
>> case and the
>> latent recursion problem.
>>
>>
>> Thanks  ...  for you patience :-(
>> Bernd.


Re: [Patch] Add comments for future regex work

2013-12-03 Thread Tim Shen
On Tue, Dec 3, 2013 at 4:47 AM, Paolo Carlini  wrote:
> Great. Consider that we are now in Stage 3, until mid of February.
> Considering that <regex> is a new feature, I think we have some leeway in
> terms of timing, but definitely we can't commit invasive changes not fixing
> regressions while in Stage 4.

...committed.


-- 
Regards,
Tim Shen


Re: [PING^2] [PATCH] PR59063

2013-12-03 Thread Yury Gribov

> The situation hasn't changed in the last four days.

Thanks. Do you think you can check the patch in question?

-Y


RE: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Bernd Edlinger
On Tue, 3 Dec 2013 10:59:15, Richard Biener wrote:
>
> On Tue, Dec 3, 2013 at 10:47 AM, Bernd Edlinger
>  wrote:
>> On Mon, 2 Dec 2013 15:42:48, Richard Biener wrote:
>>>
>>> On Wed, Nov 20, 2013 at 11:48 AM, Bernd Edlinger
>>>  wrote:
 Hello Richard,

 as a follow-up patch to the bit-fields patch(es), I wanted to remove the 
 dependencies on
 the variable flag_strict_volatile_bitfields from expand_assignment and 
 expand_expr_real_1.
 Additionally I want the access mode of the field to be selected in the 
 memory context,
 instead of the structure's mode.

 Boot-strapped and regression-tested on x86_64-linux-gnu.

 OK for trunk?
>>>
>>> Ok.
>>>
>>> Thanks,
>>> Richard.
>>>
>>
>>
>> Oops
>>
>> Sorry, tonight this patch caused an Ada regression, in pack17.adb and 
>> misaligned_nest.adb
>>
>> So I'll have to put that on hold at the moment.
>>
>> This ICE is very similar to PR59134.
>> It is again the recursion between store_fixed_bit_field and 
>> store_split_bit_field.
>>
>> The trigger is a latent problem in the ada gcc_interface.
>>
>> That is we have a bit-field of exactly 8 bit size, which is not byte-aligned,
>> but DECL_MODE=QImode, DECL_BIT_FIELD=false which looks quite strange,
>
> It's not required that DECL_BIT_FIELD is set I think, and the mode
> is that of the type if it's not a bitfield ...
>

I agree. But I hope the back-ends agree too.

>> and is totally different from how C structures look like. I should mention 
>> that there
>> are even some places in the target back-ends, where the attribute 
>> DECL_BIT_FIELD is
>> used for whatever.
>>
>> Now, due to this hunk in the cleanup-patch we get the QImode selected in the 
>> memory
>> context:
>>
>> if (MEM_P (to_rtx))
>> {
>> - if (volatilep && flag_strict_volatile_bitfields> 0)
>> + if (mode1 != VOIDmode)
>> to_rtx = adjust_address (to_rtx, mode1, 0);
>
> Which I think is correct - the memory access is in QImode - that's what
> get_inner_reference said.
>

I absolutely agree. I want it this way; everything else is nonsense.

> So whatever issue we run into downstream points to the bug ...
>

That's for sure.  I think I have a good idea of how to fix the root of the
recursion.

I just have to write a change-log and will send an update in a few moments.

Thanks
Bernd.

> OTOH Eric may know better what the middle-end can expect for
> bit-aligned Ada records (apart from "all bets are off" which isn't
> really a good solution ;))
>

Yes. I somehow did expect DECL_BIT_FIELD to be something simple,
without any dependency on whether -fstrict-volatile-bitfields is given, which
language front end generated it, or whether STRICT_ALIGNMENT is defined.

> Richard.
>
>> However even without that patch, I can arrange for "volatilep && 
>> flag_strict_volatile_bitfields> 0"
>> to be true in Ada (even on X86_64, or whatever platform you like):
>>
>> -- { dg-do run }
>> -- { dg-options "-gnatp -fstrict-volatile-bitfields" }
>>
>> procedure Misaligned_Volatile is
>>
>> type Byte is mod 2**8;
>>
>> type Block is record
>> B : Boolean;
>> V : Byte;
>> end record;
>> pragma Volatile (Block);
>> pragma Pack (Block);
>> for Block'Alignment use 1;
>>
>> type Pair is array (1 .. 2) of Block;
>>
>> P : Pair;
>> begin
>> for K in P'Range loop
>> P(K).V := 237;
>> end loop;
>> for K in P'Range loop
>> if P(K).V /= 237 then
>> raise Program_error;
>> end if;
>> end loop;
>> end;
>>
>>
>> This Ada test case causes either wrong code generation or an ICE at compile 
>> time,
>> if the -fstrict-volatile-bitfields option is either given by the user,
>> or by the target-specific default as it is on ARM for instance (which is 
>> completely
>> pointless on Ada, I know!)...
>>
>> Now I am preparing a new bitfields-update-patch which fixes this above test 
>> case and the
>> latent recursion problem.
>>
>>
>> Thanks ... for you patience :-(
>> Bernd. 

Re: Fix a bug in points-to solver

2013-12-03 Thread Richard Biener
On Mon, Dec 2, 2013 at 6:38 PM, Xinliang David Li  wrote:
> The points-to solver has a bug that can cause complex constraints to be
> skipped, leading to wrong points-to results. In the case that exposed
> the problem, there is an sd constraint x = *y which is never processed.
> 'y''s final points-to set is { NULL READONLY ESCAPED NOLOCAL }, but 'x''s
> points-to set is {}.
>
> What happens is that before 'y' is processed, it is merged with another
> node 'z' during cycle elimination (the complex constraints get
> transferred to 'z'), but 'z' is not marked as 'changed' so it is
> skipped in a later iteration.
>
> The attached patch fixed the problem. The problem is exposed by a
> large program built with -fprofile-generate in LIPO mode -- so there
> is no small testcase attached.
>
> Bootstrapped and regression tested on x86_64-unknown-linux-gnu, OK for trunk?

Hmm, the unify_nodes call in eliminate_indirect_cycles is supposed to
set the changed bit...  which in this special case (updating of complex
constraints, not the solution!) doesn't happen because unify_nodes doesn't
consider this as a change.  Which needs to change.

So, can you please update your patch to return a bool from
merge_node_constraints (any change happened?) and update changed
accordingly in unify_nodes?
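
Something along these lines (only a sketch of that direction, assuming the
existing merge_node_constraints (graph, to, from) signature):

  /* In merge_node_constraints: report whether anything was added to or
     rewritten for TO.  */
  static bool
  merge_node_constraints (constraint_graph_t graph, unsigned int to,
                          unsigned int from)
  {
    bool any_change = false;
    /* ... existing merging code, setting any_change = true whenever a
       constraint is moved or rewritten ... */
    return any_change;
  }

  /* In unify_nodes: if merging the constraints changed TO, make sure the
     node is revisited.  */
  if (merge_node_constraints (graph, to, from))
    bitmap_set_bit (changed, to);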

Thanks,
Richard.

> Index: ChangeLog
> ===
> --- ChangeLog   (revision 205579)
> +++ ChangeLog   (working copy)
> @@ -1,3 +1,8 @@
> +2013-12-02  Xinliang David Li  
> +
> +   * tree-ssa-structalias.c (solve_graph): Mark rep node changed
> +   after cycle elimination.
> +
>  2013-12-01  Eric Botcazou  
>
> * config/i386/winnt.c (i386_pe_asm_named_section): Be prepared for an
> Index: tree-ssa-structalias.c
> ===
> --- tree-ssa-structalias.c  (revision 205579)
> +++ tree-ssa-structalias.c  (working copy)
> @@ -2655,8 +2655,13 @@ solve_graph (constraint_graph_t graph)
>
>   /* In certain indirect cycle cases, we may merge this
>  variable to another.  */
> - if (eliminate_indirect_cycles (i) && find (i) != i)
> -   continue;
> + if (eliminate_indirect_cycles (i))
> +{
> + unsigned int rep = find (i);
> + bitmap_set_bit (changed, rep);
> + if (i != rep)
> +   continue;
> +}
>
>   /* If the node has changed, we need to process the
>  complex constraints and outgoing edges again.  */


Re: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Eric Botcazou
> The trigger is a latent problem in the ada gcc_interface.
> 
> That is we have a bit-field of exactly 8 bit size, which is not
> byte-aligned, but DECL_MODE=QImode, DECL_BIT_FIELD=false which looks quite
> strange, and is totally different from how C structures look like. I should
> mention that there are even some places in the target back-ends, where the
> attribute DECL_BIT_FIELD is used for whatever.

This needs to be investigated because the intent is very clear in gigi, see 
create_field_decl and finish_record_type: we set DECL_BIT_FIELD conservatively
and clear it only when we know that we can.  This code is quite old now.

> However even without that patch, I can arrange for "volatilep &&
> flag_strict_volatile_bitfields> 0" to be true in Ada (even on X86_64, or
> whatever platform you like):
> 
> -- { dg-do run }
> -- { dg-options "-gnatp -fstrict-volatile-bitfields" }
> 
> procedure Misaligned_Volatile is
> 
>type Byte is mod 2**8;
> 
>type Block is record
>   B : Boolean;
>   V : Byte;
>end record;
>pragma Volatile (Block);
>pragma Pack (Block);
>for Block'Alignment use 1;
> 
>type Pair is array (1 .. 2) of Block;
> 
>P : Pair;
> begin
>for K in P'Range loop
>   P(K).V := 237;
>end loop;
>for K in P'Range loop
>   if P(K).V /= 237 then
>  raise Program_error;
>   end if;
>end loop;
> end;
> 
> 
> This Ada test case causes either wrong code generation or an ICE at compile
> time, if the -fstrict-volatile-bitfields option is either given by the
> user, or by the target-specific default as it is on ARM for instance 
> (which is completely pointless on Ada, I know!)...

The test indeed raises Program_Error on x86-64.

-- 
Eric Botcazou


[C++ Patch] Avoid pairs of error calls in duplicate_decls

2013-12-03 Thread Paolo Carlini

Hi,

thus, as agreed a few days ago, I audited duplicate_decls for those
pairs of error and permerror where we want to use inform. I found at
least one straightforward case having to do with pragma omp, but other
cases required a little more work (I suspected that). For example, in
some cases we issued pairs of errors with the same location (which I
merged); or we mentioned in the diagnostics the olddecl *before* the
newdecl (we want the inform about the olddecl). In some cases we had
inform (input_location, ...) and then '+', which seems pretty silly to
me. Finally, for consistency, I used 'ambiguating' similarly to
'conflicting' in at least one message; not sure if we could do better,
English-wise.


Tested x86_64-linux.

Thanks,
Paolo.

/
/cp
2013-12-03  Paolo Carlini  

* decl.c (duplicate_decls): Replace pairs of errors and permerrors
with error + inform (permerror + inform, respectively).

/testsuite
2013-12-03  Paolo Carlini  

* g++.dg/cpp0x/constexpr-46336.C: Adjust expected messages.
* g++.dg/cpp0x/defaulted2.C: Likewise.
* g++.dg/cpp1y/auto-fn8.C: Likewise.
* g++.dg/gomp/udr-3.C: Likewise.
* g++.dg/lookup/extern-c-redecl5.C: Likewise.
* g++.dg/lookup/linkage1.C: Likewise.
* g++.dg/overload/new1.C: Likewise.
* g++.dg/parse/friend5.C: Likewise.
* g++.dg/parse/namespace-alias-1.C: Likewise.
* g++.dg/parse/namespace10.C: Likewise.
* g++.dg/parse/redef2.C: Likewise.
* g++.dg/template/friend44.C: Likewise.
* g++.old-deja/g++.brendan/crash42.C: Likewise.
* g++.old-deja/g++.brendan/crash52.C: Likewise.
* g++.old-deja/g++.brendan/crash55.C: Likewise.
* g++.old-deja/g++.jason/overload21.C: Likewise.
* g++.old-deja/g++.jason/overload5.C: Likewise.
* g++.old-deja/g++.jason/redecl1.C: Likewise.
* g++.old-deja/g++.law/arm8.C: Likewise.
* g++.old-deja/g++.other/main1.C: Likewise.
Index: cp/decl.c
===
--- cp/decl.c   (revision 205590)
+++ cp/decl.c   (working copy)
@@ -1299,8 +1299,9 @@ duplicate_decls (tree newdecl, tree olddecl, bool
{
  if (warning (OPT_Wattributes, "function %q+D redeclared as inline",
   newdecl))
-   inform (input_location, "previous declaration of %q+D "
-   "with attribute noinline", olddecl);
+   inform (DECL_SOURCE_LOCATION (olddecl),
+   "previous declaration of %qD with attribute noinline",
+   olddecl);
}
   else if (DECL_DECLARED_INLINE_P (olddecl)
   && DECL_UNINLINABLE (newdecl)
@@ -1308,7 +1309,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
{
  if (warning (OPT_Wattributes, "function %q+D redeclared with "
   "attribute noinline", newdecl))
-   inform (input_location, "previous declaration of %q+D was inline",
+   inform (DECL_SOURCE_LOCATION (olddecl),
+   "previous declaration of %qD was inline",
olddecl);
}
 }
@@ -1343,11 +1345,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
warning (0, "library function %q#D redeclared as non-function %q#D",
 olddecl, newdecl);
  else
-   {
- error ("declaration of %q#D", newdecl);
- error ("conflicts with built-in declaration %q#D",
-olddecl);
-   }
+   error ("declaration of %q#D conflicts with built-in "
+  "declaration %q#D", newdecl, olddecl);
  return NULL_TREE;
}
   else if (DECL_OMP_DECLARE_REDUCTION_P (olddecl))
@@ -1355,8 +1354,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
  gcc_assert (DECL_OMP_DECLARE_REDUCTION_P (newdecl));
  error_at (DECL_SOURCE_LOCATION (newdecl),
"redeclaration of %<#pragma omp declare reduction%>");
- error_at (DECL_SOURCE_LOCATION (olddecl),
-   "previous %<#pragma omp declare reduction%> declaration");
+ inform (DECL_SOURCE_LOCATION (olddecl),
+ "previous %<#pragma omp declare reduction%> declaration");
  return error_mark_node;
}
   else if (!types_match)
@@ -1407,11 +1406,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
  /* A near match; override the builtin.  */
 
  if (TREE_PUBLIC (newdecl))
-   {
- warning (0, "new declaration %q#D", newdecl);
- warning (0, "ambiguates built-in declaration %q#D",
-  olddecl);
-   }
+   warning (0, "new declaration %q#D ambiguates built-in "
+"declaration %q#D", newdecl, olddecl);
  else
warning (OPT_Wshadow, 
  DECL_BUILT_IN (olddecl)
@@ -1504,7 +1500,8 @@ duplicate_decls (tree newdecl, tree olddecl, bool
   error

[PATCH] Strict volatile bit-fields clean-up, Take 2

2013-12-03 Thread Bernd Edlinger
Hi,

This is my proposal for ultimately getting rid of the nasty
store_fixed_bit_field recursion.

IMHO, the root of the recursion trouble is here:

@@ -1007,12 +1013,8 @@ store_fixed_bit_field (rtx op0, unsigned

   if (MEM_P (op0))
 {
  mode = GET_MODE (op0);
  if (GET_MODE_BITSIZE (mode) == 0
 || GET_MODE_BITSIZE (mode)> GET_MODE_BITSIZE (word_mode))
   mode = word_mode;
   mode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
   MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));


But, because now we always have bitregion_start and bitregion_end to limit
the access size, it is no longer necessary to restrict the largest mode that
get_best_mode may return.

This patch is very similar to the previous patch, which split up
extract_fixed_bit_field.

This time, I just split up store_fixed_bit_field and use
store_fixed_bit_field_1 to force the strict-volatile-bitfields mode if
necessary, and let get_best_mode find a mode that can be used to access
the field, which is no longer impacted by the memory context's selected
mode in this case.
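
Structurally, the split then looks roughly like this (a sketch of the shape
only, not the actual patch; the helper's exact parameters may differ):

  /* store_fixed_bit_field: for memory, pick an access mode within the
     bit region (no largest-mode restriction any more), narrow the MEM,
     then delegate the actual bit twiddling.  */
  static void
  store_fixed_bit_field (rtx op0, unsigned HOST_WIDE_INT bitsize,
                         unsigned HOST_WIDE_INT bitnum,
                         unsigned HOST_WIDE_INT bitregion_start,
                         unsigned HOST_WIDE_INT bitregion_end,
                         rtx value)
  {
    if (MEM_P (op0))
      {
        enum machine_mode mode
          = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
                           MEM_ALIGN (op0), VOIDmode, MEM_VOLATILE_P (op0));
        /* ... narrow op0 to MODE and adjust bitnum accordingly ... */
      }
    store_fixed_bit_field_1 (op0, bitsize, bitnum, value);
  }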

I tried this patch with an ARM cross compiler and a large eCos application, to
see if anything changes in the generated code, and 2 MB of code stays binary
identical; that's a good sign.

I added the new Ada test case, and the test case from PR59134, which no longer
reproduces after my previous C++ memory model patch, but that "fix" was more or
less by chance.


Boot-Strap on X86_64-pc-linux-gnu (languages=all,ada,go) and regression-tests
still running.


Ok for trunk (when the tests succeed)?


Thanks
Bernd.

2013-12-03  Bernd Edlinger  

PR middle-end/59134
* expmed.c (store_fixed_bit_field_1): New function.
(store_bit_field): Use narrow_bit_field_mem and
store_fixed_bit_field_1 for -fstrict-volatile-bitfields.
(extract_bit_field): Likewise. Use narrow_bit_field_mem and
extract_fixed_bit_field_1 to do the extraction.
(store_fixed_bit_field): Simplify mode selection algorithm.
Call store_fixed_bit_field_1 to do the real work.
(extract_fixed_bit_field_1): New function.

testsuite:
2013-12-03  Bernd Edlinger  

PR middle-end/59134
* gcc.c-torture/compile/pr59134.c: New test.
* gnat.dg/misaligned_volatile.adb: New test.



patch-bitfields-update-2.diff
Description: Binary data


Re: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Eric Botcazou
> Yes. I somehow did expect DECL_BIT_FIELD to be something simple,
> without dependency on if -fstrict-volatile-bitfields is given, or what
> Language front-end generated it, or if STRICT_ALIGNMENT is defined.

Yes, it should be that way, and everything else is a bug.

-- 
Eric Botcazou



Re: [PING][PATCH] LRA: check_rtl modifies RTL instruction stream

2013-12-03 Thread Alan Modra
On Mon, Dec 02, 2013 at 11:04:39PM -0700, Jeff Law wrote:
> On 11/28/13 16:50, Alan Modra wrote:
> >
> >This is due to that innocuous seeming change of setting
> >lra_in_progress before calling check_rtl(), in combination with
> >previous changes Vlad made to the rs6000 backend here:
> >http://gcc.gnu.org/ml/gcc-patches/2013-10/msg02208.html
> >In particular the "Call legitimate_constant_pool_address_p in strict
> >mode for LRA" change, that sets "strict" when lra_in_progress.
> Is this still an issue?

No, the changes Vlad made fixed the powerpc problem.  Thanks!

> That code has gone through a couple revisions.  Robert's change was
> reverted and Vlad twiddled thigns to use recog_memoized instead of
> insn_invalid_p which prevents check_rtl from incorrectly adding
> CLOBBERs after the point where an insn's form is supposed to be
> fixed.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] Strict volatile bit-fields clean-up, Take 2

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 12:01 PM, Bernd Edlinger
 wrote:
> Hi,
>
> This is my proposal for ultimately getting rid of the nasty
> store_fixed_bit_field recursion.
>
> IMHO, the root of the recursion trouble is here:
>
> @@ -1007,12 +1013,8 @@ store_fixed_bit_field (rtx op0, unsigned
>
>if (MEM_P (op0))
>  {
>   mode = GET_MODE (op0);
>   if (GET_MODE_BITSIZE (mode) == 0
>  || GET_MODE_BITSIZE (mode)> GET_MODE_BITSIZE (word_mode))
>mode = word_mode;
>mode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
>MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
>
>
> But, because now we always have bitregion_start and bitregion_end to limit
> the access size, it is no longer necessary to restrict the largest mode, that
> get_best_mode may return.

Note that this is not true, as get_bit_range itself may end up giving
up with bitregion_start = bitregion_end = 0.  But maybe this is not
what you are saying here?  That is, does a

  gcc_assert (bitregion_start != 0 || bitregion_end != 0);

before the get_best_mode call work for you?

> This patch is very similar to the previous patch, which split up
> extract_fixed_bit_field.
>
> This time, I just split up store_fixed_bit_field and use
> store_fixed_bit_field_1 to force the strict-volatile-bitfields mode if
> necessary, and let get_best_mode find a mode that can be used to access
> the field, which is no longer impacted by the memory context's selected
> mode in this case.
>
> I tried this patch with an ARM-Cross compiler and a large eCos application, 
> to see if anything
> changes in the generated code with this patch, but 2 MB of code stays binary 
> the same,
> that's a good sign.
>
> I added the new Ada test case, and the test case from PR59134, which does no 
> longer re-produce
> after my previous C++ memory model patch, but this "fix" was more or less by 
> chance.
>
>
> Boot-Strap on X86_64-pc-linux-gnu (languages=all,ada,go) and regression-tests
> still running.
>
>
> Ok for trunk (when the tests succeed)?
>
>
> Thanks
> Bernd.


RE: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Bernd Edlinger
>
>> Yes. I somehow did expect DECL_BIT_FIELD to be something simple,
>> without dependency on if -fstrict-volatile-bitfields is given, or what
>> Language front-end generated it, or if STRICT_ALIGNMENT is defined.
>
> Yes, it should be that way, and everything else is a bug.
>

I tried something like this, and it did bootstrap, but failed in some ACATS
tests:

  /* If the bitfield is volatile, we want to access it in the
 field's mode, not the computed mode.
 If a MEM has VOIDmode (external with incomplete type),
 use BLKmode for it instead.  */
  if (MEM_P (to_rtx))
    {
+  gcc_assert ((TREE_CODE (to) == COMPONENT_REF
+               && DECL_BIT_FIELD (TREE_OPERAND (to, 1)))
+              || (bitpos % BITS_PER_UNIT == 0
+                  && bitsize % BITS_PER_UNIT == 0));
  if (volatilep && flag_strict_volatile_bitfields> 0)
    to_rtx = adjust_address (to_rtx, mode1, 0);



Bernd.

Re: libsanitizer merge from upstream r196090

2013-12-03 Thread Jakub Jelinek
On Mon, Dec 02, 2013 at 05:43:17PM +0100, Konstantin Serebryany wrote:
> > No, so your patch doesn't regress anything. I can configure with
> > --disable-libsanitizer to skip build of libsanitizer, although it
> > would be nice to support RHEL5 derived long-term distributions.
> Ok, so this does not gate the merge.

Well, it regresses against 4.8, so it still is a P1 regression.

BTW, even the merge itself apparently regresses, powerpc* doesn't build any
longer and it did before this merge.

The first attached patch fixes the build on powerpc* (tested on RHEL6/ppc*)
and the second patch fixes the build on RHEL5/x86_64.  Beyond what I've
posted earlier, the patch attempts to handle the .cfi* stuff (I tried looking
whether clang defines some usable macro, but couldn't find any, so no idea how
you can find out if you are compiled with clang -fno-dwarf2-cfi-asm
(when tsan will probably not build at all) or with the -fdwarf2-cfi-asm
default; probably you either need to convince the llvm folks to add something,
or add a configure test), moves the linux/mroute* headers to the Linux-specific
file so that the linux/in.h stuff doesn't clash with netinet/in.h etc., and
finally adds hacks to avoid including linux/types.h at all costs; that header
has historically always been a terrible portability minefield, as it defines
types that glibc headers define too.

With these two patches on top of your patch, I get libsanitizer to build
in multilib configurations on Fedora19/x86_64, RHEL6/ppc* and RHEL5/x86_64
(in all cases both 32-bit and 64-bit builds, but not the third x32 multilib).

The testsuite passes on Fedora19; on RHEL5/x86_64, tests that fail typically
fail with
==2738==AddressSanitizer CHECK failed:
../../../../libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:260 
"((*tls_addr + *tls_size)) <= ((*stk_addr + *stk_size))" (0x2af8df1bc240, 
0x2af8df1bc000)
which clearly is a bug in sanitizer_common,

#if defined(__x86_64__) || defined(__i386__)
// sizeof(struct thread) from glibc.
// There has been a report of this being different on glibc 2.11 and 2.13. We
// don't know when this change happened, so 2.14 is a conservative estimate.
#if __GLIBC_PREREQ(2, 14)
const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
#else
const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
#endif

uptr ThreadDescriptorSize() {
  return kThreadDescriptorSize;
}
but also the InitTlsSize hack can't ever work reliably.
If you need this info, ask glibc folks for a proper supported API.
Hardcoding size of glibc private structure that glibc has changed multiple
times (note, RHEL5 has glibc 2.5 and clearly the size doesn't match there
even with the one you've computed on glibc 2.11) will most likely break any
time glibc adds further fields in there, not to mention that the above
means that if you e.g. build libsanitizer say on glibc 2.11 and run against
glibc 2.14, it will also not work.  Plus calling _dl_get_tls_static_info,
which is a GLIBC_PRIVATE symbol, is also going to break whenever glibc
decides to change the ABI of that function.
I believe libthread_db.so.1 has some APIs to query sizeof (struct pthread),
but have never used those APIs so don't know how to query that.

My recommendation would be to just ifdef out this for now until you use
some sane reliable and supportable API.  Otherwise, it may work if you are
lucky and always build libsanitizer against the exact same version of glibc
as you run it against, perhaps that is the only use model that llvm cares
about, but for GCC that is definitely not acceptable.

Jakub
--- libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h.jj3 
2013-12-03 03:33:20.0 -0500
+++ libsanitizer/sanitizer_common/sanitizer_platform_limits_posix.h 
2013-12-03 05:26:41.864437661 -0500
@@ -140,23 +140,32 @@ namespace __sanitizer {
 int gid;
 int cuid;
 int cgid;
-#ifdef __powerpc64__
+#ifdef __powerpc__
 unsigned mode;
 unsigned __seq;
+u64 __unused1;
+u64 __unused2;
 #else
 unsigned short mode;
 unsigned short __pad1;
 unsigned short __seq;
 unsigned short __pad2;
+#if defined(__x86_64__) && !defined(_LP64)
+u64 __unused1;
+u64 __unused2;
+#else
+unsigned long __unused1;
+unsigned long __unused2;
+#endif
 #endif
-uptr __unused1;
-uptr __unused2;
   };
 
   struct __sanitizer_shmid_ds {
 __sanitizer_ipc_perm shm_perm;
   #ifndef __powerpc__
 uptr shm_segsz;
+  #elif !defined(__powerpc64__)
+uptr __unused0;
   #endif
 uptr shm_atime;
   #ifndef _LP64
@@ -288,17 +297,20 @@ namespace __sanitizer {
   typedef long __sanitizer_clock_t;
 
 #if SANITIZER_LINUX
-#if defined(_LP64) || defined(__x86_64__)
+#if defined(_LP64) || defined(__x86_64__) || defined(__powerpc__)
   typedef unsigned __sanitizer___kernel_uid_t;
   typedef unsigned __sanitizer___kernel_gid_t;
-  typedef long long __sanitizer___kernel_off_t;
 #else
   typedef unsigned short __sanitizer___kernel_uid_t;
   typedef unsigned short __sani

Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-03 Thread Marcus Shawcroft
On 2 December 2013 23:44, Vladimir Makarov  wrote:

> If somebody with the rights approves, I can commit it tomorrow.
>
> 2013-12-02  Vladimir Makarov  
>
> * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
> LR_REGNUM.
> (aarch64_can_eliminate): Don't check elimination source when
> frame_pointer_required is false.
>


This is fine with me, go ahead and commit it.  Thanks /Marcus


Re: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Eric Botcazou
> I tried something like this, and it did boot-strap, but failed in some acats
> tests:

Am I supposed to guess which ones?  Note that you can have non-byte-aligned 
aggregate fields in Ada, i.e. arrays, so ARRAY_REFs can presumably reach here.

-- 
Eric Botcazou


RE: [PATCH] Strict volatile bit-fields clean-up

2013-12-03 Thread Bernd Edlinger

> From: ebotca...@adacore.com
> To: bernd.edlin...@hotmail.de
> CC: gcc-patches@gcc.gnu.org; richard.guent...@gmail.com
> Subject: Re: [PATCH] Strict volatile bit-fields clean-up
> Date: Tue, 3 Dec 2013 13:00:25 +0100
>
>> I tried something like this, and it did boot-strap, but failed in some acats
>> tests:
>
> Am I supposed to guess which ones? Note that you can have non-byte-aligned
> aggregate fields in Ada, i.e. arrays, so ARRAY_REFs can presumably reach here.
>
> --
> Eric Botcazou

Hmm, probably yes.  That was a long night, sorry.  They all had to do with
nested views, I think.

Re: PATCH: PR target/59363: [4.9 Regression] r203886 miscompiles git

2013-12-03 Thread H.J. Lu
On Tue, Dec 3, 2013 at 12:45 AM, Michael Zolotukhin
 wrote:
> Hi Uros, HJ,
>
> I checked expand_movmem_epilogue - it seemingly doesn't have such a
> problem, so the patch is ok.
>
> We might want to add similar adjust_automodify_address_nv call to here as 
> well:
> if (TARGET_64BIT)
> {
>   dest = change_address (destmem, DImode, destptr);
>   emit_insn (gen_strset (destptr, dest, value));
>   emit_insn (gen_strset (destptr, dest, value));
> }
>   else
> {
>   dest = change_address (destmem, SImode, destptr);
>   emit_insn (gen_strset (destptr, dest, value));
>   emit_insn (gen_strset (destptr, dest, value));
>   emit_insn (gen_strset (destptr, dest, value));
>   emit_insn (gen_strset (destptr, dest, value));
> }
> (code snippet from previous HJ's comment in bugzilla).
> I think it's needed here, but I didn't manage to exploit the bug in
> this code. Maybe Uros or Jan can comment whether it's needed in such
> code or not.

It is almost impossible to reach that code with the current
memset/memcpy strategy.  expand_setmem_epilogue is
called either with a constant count or with max_size > 32;
neither will trigger that code.  But we should
fix it to use the proper address.  Otherwise, we will run
into a similar problem if it is used one day in
the future.

> Thanks,
> Michael
>
>
> On 3 December 2013 12:11, Uros Bizjak  wrote:
>> On Tue, Dec 3, 2013 at 2:05 AM, H.J. Lu  wrote:
>>
>>> emit_memset fails to adjust destination address after gen_strset, which
>>> leads to the wrong address in aliasing info.  This patch fixes it.
>>> Tested on Linux/x86-64.  OK to install?
>>>
>>> 2013-12-03   H.J. Lu  
>>>
>>> PR target/59363
>>> * config/i386/i386.c (emit_memset): Adjust destination address
>>> after gen_strset.
>>>
>>> gcc/testsuite/
>>>
>>> 2013-12-03   H.J. Lu  
>>>
>>> PR target/59363
>>> * gcc.target/i386/pr59363.c: New file.
>>
>> OK, but according to [1], there are other places where similar issues
>> should be fixed. I propose to wait for Michael's analysis and eventual
>> patch, and fix them all together.
>>
>> [1] http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59363#c21
>>

Here is the updated patch.  Tested on Linux/x86-64.  It
fixed git.  OK to install?

Thanks.

-- 
H.J.
---
gcc/

2013-12-03   H.J. Lu  

PR target/59363
* config/i386/i386.c (emit_memset): Adjust destination address
after gen_strset.
(expand_setmem_epilogue): Likewise.

gcc/testsuite/

2013-12-03   H.J. Lu  

PR target/59363
* gcc.target/i386/pr59363.c: New file.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b11363be..d048511 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -22806,6 +22806,8 @@ emit_memset (rtx destmem, rtx destptr, rtx promoted_val,
   if (piece_size <= GET_MODE_SIZE (word_mode))
 {
   emit_insn (gen_strset (destptr, dst, promoted_val));
+  dst = adjust_automodify_address_nv (dst, move_mode, destptr,
+  piece_size);
   continue;
 }

@@ -22875,14 +22877,18 @@ expand_setmem_epilogue (rtx destmem, rtx
destptr, rtx value, rtx vec_value,
 {
   dest = change_address (destmem, DImode, destptr);
   emit_insn (gen_strset (destptr, dest, value));
+  dest = adjust_automodify_address_nv (dest, DImode, destptr, 8);
   emit_insn (gen_strset (destptr, dest, value));
 }
   else
 {
   dest = change_address (destmem, SImode, destptr);
   emit_insn (gen_strset (destptr, dest, value));
+  dest = adjust_automodify_address_nv (dest, SImode, destptr, 4);
   emit_insn (gen_strset (destptr, dest, value));
+  dest = adjust_automodify_address_nv (dest, SImode, destptr, 8);
   emit_insn (gen_strset (destptr, dest, value));
+  dest = adjust_automodify_address_nv (dest, SImode, destptr, 12);
   emit_insn (gen_strset (destptr, dest, value));
 }
   emit_label (label);
@@ -22900,6 +22906,7 @@ expand_setmem_epilogue (rtx destmem, rtx
destptr, rtx value, rtx vec_value,
 {
   dest = change_address (destmem, SImode, destptr);
   emit_insn (gen_strset (destptr, dest, value));
+  dest = adjust_automodify_address_nv (dest, SImode, destptr, 4);
   emit_insn (gen_strset (destptr, dest, value));
 }
   emit_label (label);
diff --git a/gcc/testsuite/gcc.target/i386/pr59363.c
b/gcc/testsuite/gcc.target/i386/pr59363.c
new file mode 100644
index 000..a4e1240
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/pr59363.c
@@ -0,0 +1,24 @@
+/* PR target/59363 */
+/* { dg-do run } */
+/* { dg-options "-O2 -mtune=amdfam10" } */
+
+typedef struct {
+  int ctxlen;
+  long interhunkctxlen;
+  int flags;
+  long find_func;
+  void *find_func_priv;
+  int hunk_func;
+} xdemitconf_t;
+
+__attribute__((noinline))
+int xdi_diff(xdemitconf_t *xecfg) {
+  if (xecfg->hunk_func == 0)
+__builtin_abort();
+  return 0;
+}
+int mai

Re: PATCH: PR target/59363: [4.9 Regression] r203886 miscompiles git

2013-12-03 Thread Jan Hubicka
> Here is the updated patch.  Tested on Linux/x86-64.  It
> fixed git.  OK to install?
> 
> Thanks.
> 
> -- 
> H.J.
> ---
> gcc/
> 
> 2013-12-03   H.J. Lu  
> 
> PR target/59363
> * config/i386/i386.c (emit_memset): Adjust destination address
> after gen_strset.
> (expand_setmem_epilogue): Likewise.
> 
> gcc/testsuite/
> 
> 2013-12-03   H.J. Lu  
> 
> PR target/59363
> * gcc.target/i386/pr59363.c: New file.

Yes, this seems fine to me.  As discussed previously, we probably want to make
the strmov patterns' use match strset (I will need to re-check codegen on
targets that do single memops) and then we will need a similar update of the
aliasing there, too.
Currently I assume we are fine because we use it only in the expand_movmem
epilogue, and on the way there we already cleared the alias offset on all code
paths?

Thanks for looking into it.
Honza
> 
> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
> index b11363be..d048511 100644
> --- a/gcc/config/i386/i386.c
> +++ b/gcc/config/i386/i386.c
> @@ -22806,6 +22806,8 @@ emit_memset (rtx destmem, rtx destptr, rtx 
> promoted_val,
>if (piece_size <= GET_MODE_SIZE (word_mode))
>  {
>emit_insn (gen_strset (destptr, dst, promoted_val));
> +  dst = adjust_automodify_address_nv (dst, move_mode, destptr,
> +  piece_size);
>continue;
>  }
> 
> @@ -22875,14 +22877,18 @@ expand_setmem_epilogue (rtx destmem, rtx
> destptr, rtx value, rtx vec_value,
>  {
>dest = change_address (destmem, DImode, destptr);
>emit_insn (gen_strset (destptr, dest, value));
> +  dest = adjust_automodify_address_nv (dest, DImode, destptr, 8);
>emit_insn (gen_strset (destptr, dest, value));
>  }
>else
>  {
>dest = change_address (destmem, SImode, destptr);
>emit_insn (gen_strset (destptr, dest, value));
> +  dest = adjust_automodify_address_nv (dest, SImode, destptr, 4);
>emit_insn (gen_strset (destptr, dest, value));
> +  dest = adjust_automodify_address_nv (dest, SImode, destptr, 8);
>emit_insn (gen_strset (destptr, dest, value));
> +  dest = adjust_automodify_address_nv (dest, SImode, destptr, 12);
>emit_insn (gen_strset (destptr, dest, value));
>  }
>emit_label (label);
> @@ -22900,6 +22906,7 @@ expand_setmem_epilogue (rtx destmem, rtx
> destptr, rtx value, rtx vec_value,
>  {
>dest = change_address (destmem, SImode, destptr);
>emit_insn (gen_strset (destptr, dest, value));
> +  dest = adjust_automodify_address_nv (dest, SImode, destptr, 4);
>emit_insn (gen_strset (destptr, dest, value));
>  }
>emit_label (label);
> diff --git a/gcc/testsuite/gcc.target/i386/pr59363.c
> b/gcc/testsuite/gcc.target/i386/pr59363.c
> new file mode 100644
> index 000..a4e1240
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/pr59363.c
> @@ -0,0 +1,24 @@
> +/* PR target/59363 */
> +/* { dg-do run } */
> +/* { dg-options "-O2 -mtune=amdfam10" } */
> +
> +typedef struct {
> +  int ctxlen;
> +  long interhunkctxlen;
> +  int flags;
> +  long find_func;
> +  void *find_func_priv;
> +  int hunk_func;
> +} xdemitconf_t;
> +
> +__attribute__((noinline))
> +int xdi_diff(xdemitconf_t *xecfg) {
> +  if (xecfg->hunk_func == 0)
> +__builtin_abort();
> +  return 0;
> +}
> +int main() {
> +  xdemitconf_t xecfg = {0};
> +  xecfg.hunk_func = 1;
> +  return xdi_diff(&xecfg);
> +}


[REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Bernd Edlinger
Hi Jeff,

please find attached the patch (incl. test cases) for the unaligned read BUG 
that I found while investigating
on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748

one test case is this one:

pr57748-3.c:
/* PR middle-end/57748 */
/* { dg-do run } */
/* wrong code in expand_expr_real_1.  */

#include <stdlib.h>

extern void abort (void);

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));

struct __attribute__((packed)) T { char c; P s; };

void __attribute__((noinline, noclone))
check (P *p)
{
  if (p->b[0][0] != 3 || p->b[0][1] != 4)
    abort ();
}

void __attribute__((noinline, noclone))
foo (struct T *t)
{
  V a = { 3, 4 };
  t->s.b[0] = a;
}

int
main ()
{
  struct T *t = (struct T *) calloc (128, 1);

  foo (t);
  check (&t->s);

  free (t);
  return 0;
}


and the other one is
pr57748-4.c:
/* PR middle-end/57748 */
/* { dg-do run } */
/* wrong code in expand_expr_real_1.  */

#include <stdlib.h>

extern void abort (void);

typedef long long V
  __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));

typedef struct S { V b[1]; } P __attribute__((aligned (1)));

struct __attribute__((packed)) T { char c; P s; };

void __attribute__((noinline, noclone))
check (P *p)
{
  if (p->b[1][0] != 3 || p->b[1][1] != 4)
    abort ();
}

void __attribute__((noinline, noclone))
foo (struct T *t)
{
  V a = { 3, 4 };
  t->s.b[1] = a;
}

int
main ()
{
  struct T *t = (struct T *) calloc (128, 1);

  foo (t);
  check (&t->s);

  free (t);
  return 0;
}


The patch adds a boolean "expand_reference" parameter to expand_expr_real and
expand_expr_real_1.  I pass true when I intend to use the returned memory context
as an array reference instead of as a value.  At places where mis-aligned values
are extracted, I do not return a register with the extracted mis-aligned value if
expand_reference is true.  For a VIEW_CONVERT_EXPR I take care to pass the outer
"expand_reference" down to the inner expand_expr_real call.  Expand_reference is
quite similar to the expand_modifier "EXPAND_MEMORY".
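
To make the interface change concrete, here is a rough sketch of the new
prototypes (illustrative only; the names follow the description above and the
attached patch is authoritative):

/* Illustrative sketch of the expr.h change described above; the attached
   patch is authoritative.  The new trailing flag asks the expander to hand
   back the underlying MEM instead of extracting a mis-aligned value into a
   register.  */
extern rtx expand_expr_real (tree, rtx, enum machine_mode,
                             enum expand_modifier, rtx *,
                             bool /* expand_reference */);
extern rtx expand_expr_real_1 (tree, rtx, enum machine_mode,
                               enum expand_modifier, rtx *,
                               bool /* expand_reference */);

/* The convenience wrappers keep their old behaviour by passing false.  */
static inline rtx
expand_expr (tree exp, rtx target, enum machine_mode mode,
             enum expand_modifier modifier)
{
  return expand_expr_real (exp, target, mode, modifier, NULL, false);
}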

Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).

Ok for trunk?


Thanks
Bernd.

2013-11-07  Bernd Edlinger  

PR middle-end/57748
* expr.h (expand_expr_real, expand_expr_real_1): Add new parameter
expand_reference.
(expand_expr, expand_normal): Adjust.
* expr.c (expand_expr_real, expand_expr_real_1): Add new parameter
expand_reference. Use expand_reference to expand inner references.
(store_expr): Adjust.
* cfgexpand.c (expand_call_stmt): Adjust.

testsuite:
2013-11-07  Bernd Edlinger  

PR middle-end/57748
* gcc.dg/torture/pr57748-3.c: New test.
* gcc.dg/torture/pr57748-4.c: New test.



patch-pr57748-2.diff
Description: Binary data


Re: [PATCH] Don't create out-of-bounds BIT_FIELD_REFs

2013-12-03 Thread Jakub Jelinek
On Thu, Nov 28, 2013 at 12:23:43AM +0100, Tom de Vries wrote:
> Committed to trunk.
> 
> Also ok for 4.8 branch? It's a 4.8/4.9 regression.

Ok, but I guess you need to adjust your patch for 4.8 (tree_to_*
and tree_fits_* to host_integerp/tree_low_cst), so please make sure
you test it before committing.

Jakub


Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Yufeng Zhang

On 12/03/13 06:48, Jeff Law wrote:

On 12/02/13 08:47, Yufeng Zhang wrote:

Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html




Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:

On 11/26/13 12:45, Richard Biener wrote:

On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang wrote:

On 11/13/13 20:54, Bill Schmidt wrote:

The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it
helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.



Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks for
your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.

First a note, I need to check on voting for Bill as the slsr maintainer
from the steering committee.   Voting was in progress just before the
close of stage1 development so I haven't tallied the results :-)


Looking forward to some good news! :)



Yes, you are right that the non-trivial 'base' trees are rarely shared.
The cache is introduced mainly because get_alternative_base () may be
called twice on the same 'base' tree, once in the
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.

Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
   It's a pretty minor implementation detail either way.


I think the affine tree routines are generally expensive; it is worth 
having a cache to avoid calling them too many times.  I ran the slsr-*.c 
tests under gcc.dg/tree-ssa/ and found that the cache hit rates range 
from 55.6% to 90%, with an average of 73.5%.  The samples may not represent 
the real-world scenario well, but they do show that the 'base' tree can 
be shared to some extent.  So I'd like to keep the cache 
in the patch.
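
For illustration, the memoization shape being discussed is roughly the
following (simplified sketch; compute_alternative_base is a stand-in for the
real affine expansion and conversion stripping that get_alternative_base
does in the patch):

/* Simplified sketch of the caching discussed above.  alt_base_map maps a
   candidate base tree to its alternative base (or NULL_TREE if none).  */
static struct pointer_map_t *alt_base_map;

static tree
get_alternative_base (tree base)
{
  void **slot = pointer_map_contains (alt_base_map, base);

  if (slot)
    return (tree) *slot;                       /* cache hit */

  tree alt = compute_alternative_base (base);  /* expensive affine walk */
  *pointer_map_insert (alt_base_map, base) = alt;
  return alt;
}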







+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.


As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.

I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move forward.


I've updated the patch to dump more details when -fdump-tree-slsr-details 
is on.  The tests have also been updated to scan for these new dumps 
instead of MEMs.
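
The detail dump is along the following lines (illustrative sketch only; the
exact wording printed by the patch may differ):

/* Sketch of the extra -fdump-tree-slsr-details output in replace_ref;
   c is the slsr candidate whose reference is being replaced.  */
if (dump_file && (dump_flags & TDF_DETAILS))
  {
    fprintf (dump_file, "Replacing reference: ");
    print_gimple_stmt (dump_file, c->cand_stmt, 0, 0);
  }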




I suggest doing it quickly.  We're well past stage1 close at this point.


The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?

Thanks,
Yufeng

gcc/

* gimple-ssa-strength-reduction.c: Include tree-affine.h.
(name_expansions): New static variable.
(alt_base_map): Ditto.
(get_alternative_base): New function.
(find_basis_for_candidate): For CAND_REF, optionally call
find_basis_for_base_expr with the returned value from
get_alternative_base.
(record_potential_basis): Add new parameter 'base' of type 'tree';
add an assertion of non-NULL base; use base to set node->base_expr.
(alloc_cand_and_find_basis): Update; call record_potential_basis
for CAND_REF with the returned value from get_alternative_base.
(replace_refs): Dump details on the replacing.
(execute_strength_reduction): Call pointer_map_create for
alt_base_map; call free_affine_expand_cache with &name_expansions.

gcc/testsuite/

* gcc.dg/tree-ssa/slsr-39.c: Update.
* gcc.dg/tree-ssa/slsr-41.c: New test.

diff --git a/gcc/gimple-ssa-strength-reduction.c b/gcc/gimple-ssa-strength-reduction.c
index 88afc91..bf3362f 100644
--- a/gcc/gimple-ssa-strength-reduction.c
+++ b/gcc/gimple-ssa-strength-reduction.c
@@ -53,6 +53,7 @@ along with GCC; see 

Re: PATCH: PR target/59363: [4.9 Regression] r203886 miscompiles git

2013-12-03 Thread H.J. Lu
On Tue, Dec 3, 2013 at 4:41 AM, Jan Hubicka  wrote:
>> Here is the updated patch.  Tested on Linux/x86-64.  It
>> fixed git.  OK to install?
>>
>> Thanks.
>>
>> --
>> H.J.
>> ---
>> gcc/
>>
>> 2013-12-03   H.J. Lu  
>>
>> PR target/59363
>> * config/i386/i386.c (emit_memset): Adjust destination address
>> after gen_strset.
>> (expand_setmem_epilogue): Likewise.
>>
>> gcc/testsuite/
>>
>> 2013-12-03   H.J. Lu  
>>
>> PR target/59363
>> * gcc.target/i386/pr59363.c: New file.
>
> Yes, this seems fine to me.  As discussed previously, we probably want to make
> strmov patterns use to match strset (I will need to re-check codegen on 
> targets
> that does single memops) and then we will need similar update of aliasing
> there, too.
> Currently I assume we are fine becaue we use it only in expand_movmem epilogue
> and on the way there we already cleared the alias offset on all code paths?
>

I believe it is the case.

I checked it in.

Thanks.

-- 
H.J.


[v3 patch] fix formatting in doxygen comment

2013-12-03 Thread Jonathan Wakely
2013-12-03  Jonathan Wakely  

   * include/std/fstream (basic_filebuf::open): Use preformatted text
   for table in Doxygen comment.

Tested x86_64-linux. Committed to trunk.
commit 8699304a9d3261d2b52636ee46c0203538de8fa1
Author: Jonathan Wakely 
Date:   Tue Dec 3 12:40:46 2013 +

* include/std/fstream (basic_filebuf::open): Use preformatted text
for table in Doxygen comment.

diff --git a/libstdc++-v3/include/std/fstream b/libstdc++-v3/include/std/fstream
index 48e5c3d..701247a 100644
--- a/libstdc++-v3/include/std/fstream
+++ b/libstdc++-v3/include/std/fstream
@@ -238,9 +238,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  given in @a __mode.
*
*  Table 92, adapted here, gives the relation between openmode
-   *  combinations and the equivalent fopen() flags.
+   *  combinations and the equivalent @c fopen() flags.
*  (NB: lines app, in|out|app, in|app, binary|app, binary|in|out|app,
*  and binary|in|app per DR 596)
+   *  <pre>
*  +-+
   *  | ios_base Flag combination   stdio equivalent   |
*  |binary  in  out  trunc  app  |
@@ -265,6 +266,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
*  |   + +   +   +a+b|
*  |   + +   +a+b|
*  +-+
+   *  </pre>
*/
   __filebuf_type*
   open(const char* __s, ios_base::openmode __mode);


Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-12-03 Thread Bernd Schmidt
On 11/29/2013 06:12 PM, Jakub Jelinek wrote:
> On Fri, Nov 29, 2013 at 06:07:38PM +0100, Bernd Schmidt wrote:
>> By what mechanism do you choose? This is unclear to me from what I've
>> seen. Does this involve user action, and what's the advantage of doing
>> it this way?
> 
> See the 3 threads I've mentioned.  The compiler would know the list of
> available offloading targets (after all, it needs to build libgomp plugins
> for those targets), and that would be the default, and user could override
> that through link time command line options (say, ok, while gcc has been
> configured to support all of hsail-none, ptx-none and x86_64-k10m-linux
> offloading targets, I only want to support here one of those, and
> please use these additional options for compilation of that target).

Ok, IIUC the model is that we just compile all target code for all
targets (or a subset of them). Is that correct? In that case I can see
how the code that's on the branch now is sufficient; I'd assumed
something more fine-grained would be desirable.
It would be helpful to see the other pieces of this work if they already
exist.


Bernd



Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger
 wrote:
> Hi Jeff,
>
> please find attached the patch (incl. test cases) for the unaligned read BUG 
> that I found while investigating
> on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748
>
> one test case is this one:
>
> pr57748-3.c:
> /* PR middle-end/57748 */
> /* { dg-do run } */
> /* wrong code in expand_expr_real_1.  */
>
> #include 
>
> extern void abort (void);
>
> typedef long long V
>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>
> typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
>
> struct __attribute__((packed)) T { char c; P s; };
>
> void __attribute__((noinline, noclone))
> check (P *p)
> {
>   if (p->b[0][0] != 3 || p->b[0][1] != 4)
> abort ();
> }
>
> void __attribute__((noinline, noclone))
> foo (struct T *t)
> {
>   V a = { 3, 4 };
>   t->s.b[0] = a;
> }
>
> int
> main ()
> {
>   struct T *t = (struct T *) calloc (128, 1);
>
>   foo (t);
>   check (&t->s);
>
>   free (t);
>   return 0;
> }
>
>
> and the other one is
> pr57748-4.c:
> /* PR middle-end/57748 */
> /* { dg-do run } */
> /* wrong code in expand_expr_real_1.  */
>
> #include 
>
> extern void abort (void);
>
> typedef long long V
>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>
> typedef struct S { V b[1]; } P __attribute__((aligned (1)));
>
> struct __attribute__((packed)) T { char c; P s; };
>
> void __attribute__((noinline, noclone))
> check (P *p)
> {
>   if (p->b[1][0] != 3 || p->b[1][1] != 4)
> abort ();
> }
>
> void __attribute__((noinline, noclone))
> foo (struct T *t)
> {
>   V a = { 3, 4 };
>   t->s.b[1] = a;
> }
>
> int
> main ()
> {
>   struct T *t = (struct T *) calloc (128, 1);
>
>   foo (t);
>   check (&t->s);
>
>   free (t);
>   return 0;
> }
>
>
> The patch does add a boolean "expand_reference" parameter to expand_expr_real 
> and
> expand_expr_real_1. I pass true when I intend to use the returned memory 
> context
> as an array reference, instead of a value. At places where mis-aligned values 
> are extracted,
> I do not return a register with the extracted mis-aligned value if 
> expand_reference is true.
> When I have a VIEW_CONVERT_EXPR I pay attention to pass down the outer 
> "expand_reference"
> to the inner expand_expr_real call. Expand_reference, is pretty much similar 
> to the
> expand_modifier "EXPAND_MEMORY".
>
> Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).
>
> Ok for trunk?

It still feels like papering over the underlying issue.  Let me have a
second (or third?) look.

Richard.

>
> Thanks
> Bernd.


RE: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Iyer, Balaji V


> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Tuesday, December 3, 2013 1:30 AM
> To: Jason Merrill; Iyer, Balaji V; Aldy Hernandez
> Cc: gcc-patches@gcc.gnu.org; r...@redhat.com; Jakub Jelinek
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On 11/27/13 17:52, Jason Merrill wrote:
> > On 11/27/2013 04:14 PM, Iyer, Balaji V wrote:
> >> I completely agree with you that there are certain parts of Cilk
> >> Plus that is similar to OMP4, namely #pragma simd and SIMD-enabled
> >> functions (formerly called elemental functions). But, the Cilk
> >> keywords is almost completely orthogonal to OpenMP. They are
> >> semantically different  and one cannot be transformed to another.
> >> Cilk uses automatically load-balanced work-stealing using the Cilk
> >> runtime, whereas OMP uses work sharing via OMP runtime. There are a
> >> number of other semantic differences but this is the core-issue.
> >> #pragma simd and #pragma omp have converged in several places but the
> >> Cilk part has always been different from OpenMP.
> >
> > Yes, Cilk for loops will use the Cilk runtime and OMP for loops will
> > use the OMP runtime, but that doesn't mean they can't share a lot of
> > the middle end code along the way.
> >
> > We already have several different varieties of parallel/simd loops all
> > represented by GIMPLE_OMP_FOR, and I think this could be another
> > GP_OMP_FOR_KIND_.
> Right.  It's not a question of what runtime they call back into, but that both
> share a common overall structure.
> 
> Conceptually I look at a for loop as having 4 main components.
> 
> Initializer, test condition, increment and the body.
> 
> I'd like to hope things like the syntatic & semantic analysis of the first 
> three
> would be largely the same.  Most of the Cilk specific bits would be in the
> handling of the body -- but there may be some significant code sharing that
> can happen there too.
> 
> 
> >
> > ...
> >> As you can tell, this is not how openmp handles a #pragma omp for loop.
> >
> > It's different in detail, but #pragma omp parallel for works very
> > similarly: it creates a separate function for the body of the loop and
> > then passes that to GOMP_parallel along with any shared data.
> My thoughts exactly.

I understand you both now.  Let me look into the OMP routines to see what 
they do and how I can port that to _Cilk_for. 

Thanks,

Balaji V. Iyer.

> jeff


[PE-POST] Adjust Bit-region in expand_assignment

2013-12-03 Thread Bernd Edlinger
Hi,

I almost forgot about this (rather simple) one.  It falls in the category 
"Mostly Harmless",
but it would be nice to have bitregion_start/end consistent in all cases, 
even though
it does not currently cause any malfunction.

this was:
> Subject: RE: [PATCH, PR 57748] Check for out of bounds access, Part 3
> Date: Wed, 6 Nov 2013 15:40:23 +0100
>
>> Meanwhile I am able to give an example where that code is executed
>> with bitpos = 64, bitsize=32, bitregion_start = 32, bitregion_end = 95.
>>
>> Afterwards bitpos=0, bitsize=32, which is completely outside
>> bitregion_start=32, bitregion_end=95.
>>
>> However this can only be seen in the debugger, as the store_field
>> goes thru a code path that does not look at bitregion_start/end.
>>
>> Well that is at least extremely ugly, and I would not be sure, that
>> I cannot come up with a sample that crashes or creates wrong code.
>>
>> Currently I think that maybe the best way to fix that would be this:
>>
>> --- gcc/expr.c 2013-10-21 08:27:09.546035668 +0200
>> +++ gcc/expr.c 2013-10-22 15:19:56.749476525 +0200
>> @@ -4762,6 +4762,9 @@ expand_assignment (tree to, tree from, b
>> && MEM_ALIGN (to_rtx) == GET_MODE_ALIGNMENT (mode1))
>> {
>> to_rtx = adjust_address (to_rtx, mode1, bitpos / BITS_PER_UNIT);
>> + bitregion_start = 0;
>> + if (bitregion_end >= (unsigned HOST_WIDE_INT) bitpos)
>> + bitregion_end -= bitpos;
>> bitpos = 0;
>> }
>>

>
> well, as already discussed, the following example executes the above memory 
> block,
> and leaves bitregion_start/end in an undefined state:
>

extern void abort (void);

struct x{
 int a;
 int :32;
 volatile int b:32;
};

struct s
{
  int a,b,c,d;
  struct x xx[1];
};

struct s ss;
volatile int k;

int main()
{
  ss.xx[k].b = 1;
  //  asm volatile("":::"memory");
  if ( ss.xx[k].b != 1)
  abort ();
  return 0;
}

> Although this does not cause malfunction at the time, I'd propose to play 
> safe,
> and update the bitregion_start/bitregion_end. Additionally I'd propose to 
> remove
> this comment in expand_assignment and expand_expr_real_1:
>
>  "A constant address in TO_RTX can have VOIDmode, we must not try
>   to call force_reg for that case.  Avoid that case."
>
> This comment is completely out of sync: There is no longer any force_reg in 
> that if-block,
> and a constant address in TO_RTX has SImode or DImode in GET_MODE (XEXP 
> (to_rtx, 0))
> I do not know how to make it a VOIDmode, therefore the comment does not help 
> either.
>
>

Boot-strapped and regression-tested on x86_64-linux-gnu.

OK for trunk?


Regards
Bernd.

2013-11-06  Bernd Edlinger  

PR middle-end/57748
* expr.c (expand_assignment): Remove bogus comment.
Update bitregion_start/bitregion_end.
(expand_expr_real_1): Remove bogus comment.



patch-pr57748-3.diff
Description: Binary data


Re: [PATCH] Don't create out-of-bounds BIT_FIELD_REFs

2013-12-03 Thread Tom de Vries

On 03-12-13 13:49, Jakub Jelinek wrote:

On Thu, Nov 28, 2013 at 12:23:43AM +0100, Tom de Vries wrote:

Committed to trunk.

Also ok for 4.8 branch? It's a 4.8/4.9 regression.


Ok, but I guess you need to adjust your patch for 4.8 (tree_to_*
and tree_fits_* to host_integerp/tree_low_cst), so please make sure
you test it before commiting.



Jakub,

Richard B. already ok'ed the patch, but with the checking bit left out.

The functions you mention were in that checking bit.

I've tested and committed the attached patch.

Thanks,
- Tom


Jakub



2013-11-27  Tom de Vries  
	Marc Glisse  

	PR middle-end/59037
	* semantics.c (cxx_fold_indirect_ref): Don't create out-of-bounds
	BIT_FIELD_REF.

	* fold-const.c (fold_indirect_ref_1): Don't create out-of-bounds
	BIT_FIELD_REF.
	* gimplify.c (gimple_fold_indirect_ref): Same.

	* c-c++-common/pr59037.c: New testcase.

diff --git a/gcc/cp/semantics.c b/gcc/cp/semantics.c
index 571fc7c..e7e2034 100644
--- a/gcc/cp/semantics.c
+++ b/gcc/cp/semantics.c
@@ -7543,7 +7543,7 @@ cxx_fold_indirect_ref (location_t loc, tree type, tree op0, bool *empty_base)
 	  unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
 	  tree index = bitsize_int (indexi);
 
-	  if (offset/part_widthi <= TYPE_VECTOR_SUBPARTS (op00type))
+	  if (offset / part_widthi < TYPE_VECTOR_SUBPARTS (op00type))
 		return fold_build3_loc (loc,
 	BIT_FIELD_REF, type, op00,
 	part_width, index);
diff --git a/gcc/fold-const.c b/gcc/fold-const.c
index f666429..7433686 100644
--- a/gcc/fold-const.c
+++ b/gcc/fold-const.c
@@ -16595,7 +16595,7 @@ fold_indirect_ref_1 (location_t loc, tree type, tree op0)
 	  unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
 	  tree index = bitsize_int (indexi);
 
-	  if (offset/part_widthi <= TYPE_VECTOR_SUBPARTS (op00type))
+	  if (offset / part_widthi < TYPE_VECTOR_SUBPARTS (op00type))
 		return fold_build3_loc (loc,
 	BIT_FIELD_REF, type, op00,
 	part_width, index);
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index e711928..1f1deb8 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -4369,7 +4369,7 @@ gimple_fold_indirect_ref (tree t)
   unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
   tree index = bitsize_int (indexi);
   if (offset / part_widthi
-  <= TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
+  < TYPE_VECTOR_SUBPARTS (TREE_TYPE (addrtype)))
 return fold_build3 (BIT_FIELD_REF, type, TREE_OPERAND (addr, 0),
 part_width, index);
 	}
diff --git a/gcc/testsuite/c-c++-common/pr59037.c b/gcc/testsuite/c-c++-common/pr59037.c
new file mode 100644
index 000..fae13c2
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/pr59037.c
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+typedef int v4si __attribute__ ((vector_size (16)));
+
+int
+main (int argc, char** argv)
+{
+  v4si x = {0,1,2,3};
+  x = (v4si) {(x)[3], (x)[2], (x)[1], (x)[0]};
+  return x[4];
+}


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote:
> > >> As you can tell, this is not how openmp handles a #pragma omp for loop.
> > >
> > > It's different in detail, but #pragma omp parallel for works very
> > > similarly: it creates a separate function for the body of the loop and
> > > then passes that to GOMP_parallel along with any shared data.
> > My thoughts exactly.
> 
> I understand you both now. Let me look into the OMP routines and see what
> it is doing and see how I can port it to _Cilk_for.

Yeah.  The work is actually multi-stage: first, during gimplification,
the code determines what variables are used in the #pragma omp parallel
(etc., in your case _Cilk_for) region, and whether they should be shared
or privatized (and in that case in what way: normal private, firstprivate,
lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is the
omplower pass (already enabled for Cilk+ due to #pragma simd) that e.g.
lowers accesses to shared variables, creates new VAR_DECLs for the
privatized vars etc., and then the ompexp pass that creates the outlined body
function and the call to the runtime library.
I have no idea what privatization behavior _Cilk_for wants; I'd expect that
at least the IV must be privatized, otherwise it would be racy, but what
about other vars?
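
To make the ompexp step concrete, here is a hand-written C approximation of
the outlining it performs for a trivial parallel loop (the GOMP_parallel
declaration is simplified and the chunking is purely illustrative; this is
not the code GCC actually emits):

#include <omp.h>

/* libgomp entry point; simplified declaration for illustration only.  */
extern void GOMP_parallel (void (*fn) (void *), void *data,
                           unsigned num_threads, unsigned flags);

struct omp_data { int *a; int n; };     /* shared vars live in one struct */

static void
parloop_body (void *data)               /* outlined parallel region */
{
  struct omp_data *ctx = (struct omp_data *) data;
  int tid = omp_get_thread_num ();
  int nth = omp_get_num_threads ();
  /* The IV is private: each thread walks only its own slice.  */
  for (int i = tid; i < ctx->n; i += nth)
    ctx->a[i] *= 2;
}

void
foo (int *a, int n)
{
  struct omp_data ctx = { a, n };
  /* Roughly what "#pragma omp parallel for" expands to.  */
  GOMP_parallel (parloop_body, &ctx, 0 /* default nthreads */, 0);
}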

Jakub


Re: [PATCH][ARM] Implement CRC32 intrinsics for AArch32 in ARMv8-A

2013-12-03 Thread Kyrill Tkachov

Ping?
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02351.html

Thanks,
Kyrill

On 26/11/13 09:44, Kyrill Tkachov wrote:

Ping?

Thanks,
Kyrill

On 19/11/13 17:04, Kyrill Tkachov wrote:

On 19/11/13 16:26, Joseph S. Myers wrote:

In any target header installed for user use, such as arm_acle.h, you need
to be namespace-clean.  In this case, that means you need to use
implementation-namespace identifiers such as __a, __b and __d in case the
user has defined macros with names such as a, b and d (unless the ACLE
says that identifiers a, b and d are in the implementation's namespace
when this header is included, which would be a very odd thing for it to
do).
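
For readers unfamiliar with the rule: a user may legitimately write
"#define a 42" before including the header, so every local identifier in an
installed header has to live in the reserved double-underscore namespace.
An illustrative wrapper of that shape (not the actual arm_acle.h text; the
builtin name is only an example):

/* Illustrative only; parameter names in the reserved namespace cannot
   collide with user macros.  */
static __inline__ unsigned int __attribute__ ((__always_inline__))
__crc32b (unsigned int __a, unsigned char __b)
{
  return __builtin_arm_crc32b (__a, __b);
}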


Hi Joseph,

Thanks for the catch. ACLE doesn't expect a,b,d to be in the implementation
namespace. I've added underscores before them.

Made sure tests pass.

Revised patch attached.
How's this?

Kyrill

gcc/
2013-11-19  Kyrylo Tkachov  

   * Makefile.in (TEXI_GCC_FILES): Add arm-acle-intrinsics.texi.
   * config.gcc (extra_headers): Add arm_acle.h.
   * config/arm/arm.c (FL_CRC32): Define.
   (arm_have_crc): Likewise.
   (arm_option_override): Set arm_have_crc.
   (arm_builtins): Add CRC32 builtins.
   (bdesc_2arg): Likewise.
   (arm_init_crc32_builtins): New function.
   (arm_init_builtins): Initialise CRC32 builtins.
   (arm_file_start): Handle architecture extensions.
   * config/arm/arm.h (TARGET_CPU_CPP_BUILTINS): Define __ARM_FEATURE_CRC32.
   Define __ARM_32BIT_STATE.
   (TARGET_CRC32): Define.
   * config/arm/arm-arches.def: Add armv8-a+crc.
   * config/arm/arm-tables.opt: Regenerate.
   * config/arm/arm.md (type): Add crc.
   (): New insn.
   * config/arm/arm_acle.h: New file.
   * config/arm/iterators.md (CRC): New int iterator.
   (crc_variant, crc_mode): New int attributes.
   * config/arm/unspecs.md (UNSPEC_CRC32B, UNSPEC_CRC32H, UNSPEC_CRC32W,
   UNSPEC_CRC32CB, UNSPEC_CRC32CH, UNSPEC_CRC32CW): New unspecs.
   * doc/invoke.texi: Document -march=armv8-a+crc option.
   * doc/extend.texi: Document ACLE intrinsics.
   * doc/arm-acle-intrinsics.texi: New.


gcc/testsuite
2013-11-19  Kyrylo Tkachov  

   * lib/target-supports.exp (add_options_for_arm_crc): New procedure.
   (check_effective_target_arm_crc_ok_nocache): Likewise.
   (check_effective_target_arm_crc_ok): Likewise.
   * gcc.target/arm/acle/: New directory.
   * gcc.target/arm/acle/acle.exp: New.
   * gcc.target/arm/acle/crc32b.c: New test.
   * gcc.target/arm/acle/crc32h.c: Likewise.
   * gcc.target/arm/acle/crc32w.c: Likewise.
   * gcc.target/arm/acle/crc32d.c: Likewise.
   * gcc.target/arm/acle/crc32cb.c: Likewise.
   * gcc.target/arm/acle/crc32ch.c: Likewise.
   * gcc.target/arm/acle/crc32cw.c: Likewise.
   * gcc.target/arm/acle/crc32cd.c: Likewise.





Re: Ping Re: [gomp4] Dumping gimple for offload.

2013-12-03 Thread Michael V. Zolotukhin
Hi Bernd,

I am working on offloading support for OpenMP4, so I'll try to share my vision
of how everything works and answer your questions.

GCC compiles the host version of the code (as usual) and dumps Gimple, as it
does for LTO, but for offloading purposes.  Gimple IR is stored only for
functions/variables that have target versions - for OpenMP that means the
functions/variables under the 'omp declare target' pragma and functions
outlined from 'omp target' regions.  The dumped Gimple is then picked up by
lto-plugin, which performs compilations for all specified targets (specified
either at configure time or by compilation flags).  (Note that although we use
the LTO infrastructure, we may not do any link-time optimizations at all.)

The compiled target images are then embedded into data sections of the host
binary and registered in a global descriptor covering all targets.  This
descriptor is basically an array of pointers to the beginnings of the images
together with their sizes.

Along with the pointers to the images, the descriptor contains pointers to
function/global tables and some other data.  The function tables are used to
find the correspondence between the host and target versions of functions
that can be offloaded.
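
A rough C sketch of what such a descriptor could look like (the field and
type names below are made up for illustration; the real layout is whatever
the branch defines):

#include <stddef.h>

/* Illustration only: one entry per offload image embedded in the host
   binary; the field names are invented here.  */
struct offload_image_descr
{
  const void *image_start;        /* start of the embedded target image */
  size_t image_size;              /* its size in bytes */
  const void *host_func_table;    /* host addresses of offloadable functions */
  const void *target_func_table;  /* matching target-side entries */
  const void *var_table;          /* 'omp declare target' globals */
};

/* Global table registered with the runtime at program startup.  */
extern const struct offload_image_descr __offload_images[];
extern const unsigned __offload_image_count;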

To compile code for a target, lto-wrapper calls the target compiler, giving it
the fat object with the dumped Gimple as input.  This target compiler could
actually be anything that understands Gimple as its input and produces a
target image.  (The patch for this part hasn't been submitted yet, but I hope
to do so in the near future.)

Another big part of the picture is libgomp - the runtime supporting all of
this infrastructure.  To support different targets, we added plugin support to
libgomp.  Thus, libgomp operates in 'general' terms and calls plugin hooks
when it needs to do anything target-specific (examples of such hooks are
device_run_function, device_init_device, etc.).
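
As an illustration, a plugin's hook table could look roughly like this (apart
from the two hooks named above, all names here are invented; the actual ABI is
defined on the branch):

#include <stddef.h>

/* Illustration only.  libgomp dlopen()s one such plugin per offload target
   and resolves a fixed set of entry points from it.  */
struct gomp_device_hooks
{
  const char *(*get_name) (void);
  int  (*init_device) (int device_num);            /* device_init_device */
  void *(*alloc) (int device_num, size_t size);
  void (*host2dev) (int device_num, void *tgt, const void *host, size_t n);
  void (*dev2host) (int device_num, void *host, const void *tgt, size_t n);
  void (*run_function) (int device_num, void *fn,  /* device_run_function */
                        void *vars);
};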

So, for each target we need two important pieces:
 1) a compiler that accepts Gimple as its input and produces a target image
 2) a plugin for libgomp, which implements all the device-specific parts

That's just a general overview; I can explain some parts in more detail if
you need.  Not everything has been implemented yet; here is the current
status (also very general and maybe rough):
 - Plugin support in libgomp.  Done on the gomp4 branch.
 - Streaming Gimple IR into dedicated sections for later use by target
   compilers.  Done on the gomp4 branch.
 - Invoking target compilers and embedding target images into the host
   binary.  Work in progress; I hope to send the patch soon.
 - Configure and compile options for specifying offloading targets.  Work
   in progress.

Thanks,
Michael

On 03 Dec 14:00, Bernd Schmidt wrote:
> On 11/29/2013 06:12 PM, Jakub Jelinek wrote:
> > On Fri, Nov 29, 2013 at 06:07:38PM +0100, Bernd Schmidt wrote:
> >> By what mechanism do you choose? This is unclear to me from what I've
> >> seen. Does this involve user action, and what's the advantage of doing
> >> it this way?
> > 
> > See the 3 threads I've mentioned.  The compiler would know the list of
> > available offloading targets (after all, it needs to build libgomp plugins
> > for those targets), and that would be the default, and user could override
> > that through link time command line options (say, ok, while gcc has been
> > configured to support all of hsail-none, ptx-none and x86_64-k10m-linux
> > offloading targets, I only want to support here one of those, and
> > please use these additional options for compilation of that target).
> 
> Ok, IIUC the model is that we just compile all target code for all
> targets (or a subset of them). Is that correct? In that case I can see
> how the code that's on the branch now is sufficient; I'd assumed
> something more fine-grained would be desirable.
> It would be helpful to see the other pieces of this work if they already
> exist.
> 
> 
> Bernd
> 


Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 2:10 PM, Richard Biener
 wrote:
> On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger
>  wrote:
>> Hi Jeff,
>>
>> please find attached the patch (incl. test cases) for the unaligned read BUG 
>> that I found while investigating
>> on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748
>>
>> one test case is this one:
>>
>> pr57748-3.c:
>> /* PR middle-end/57748 */
>> /* { dg-do run } */
>> /* wrong code in expand_expr_real_1.  */
>>
>> #include 
>>
>> extern void abort (void);
>>
>> typedef long long V
>>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>
>> typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
>>
>> struct __attribute__((packed)) T { char c; P s; };
>>
>> void __attribute__((noinline, noclone))
>> check (P *p)
>> {
>>   if (p->b[0][0] != 3 || p->b[0][1] != 4)
>> abort ();
>> }
>>
>> void __attribute__((noinline, noclone))
>> foo (struct T *t)
>> {
>>   V a = { 3, 4 };
>>   t->s.b[0] = a;
>> }
>>
>> int
>> main ()
>> {
>>   struct T *t = (struct T *) calloc (128, 1);
>>
>>   foo (t);
>>   check (&t->s);
>>
>>   free (t);
>>   return 0;
>> }
>>
>>
>> and the other one is
>> pr57748-4.c:
>> /* PR middle-end/57748 */
>> /* { dg-do run } */
>> /* wrong code in expand_expr_real_1.  */
>>
>> #include 
>>
>> extern void abort (void);
>>
>> typedef long long V
>>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>
>> typedef struct S { V b[1]; } P __attribute__((aligned (1)));
>>
>> struct __attribute__((packed)) T { char c; P s; };
>>
>> void __attribute__((noinline, noclone))
>> check (P *p)
>> {
>>   if (p->b[1][0] != 3 || p->b[1][1] != 4)
>> abort ();
>> }
>>
>> void __attribute__((noinline, noclone))
>> foo (struct T *t)
>> {
>>   V a = { 3, 4 };
>>   t->s.b[1] = a;
>> }
>>
>> int
>> main ()
>> {
>>   struct T *t = (struct T *) calloc (128, 1);
>>
>>   foo (t);
>>   check (&t->s);
>>
>>   free (t);
>>   return 0;
>> }
>>
>>
>> The patch does add a boolean "expand_reference" parameter to 
>> expand_expr_real and
>> expand_expr_real_1. I pass true when I intend to use the returned memory 
>> context
>> as an array reference, instead of a value. At places where mis-aligned 
>> values are extracted,
>> I do not return a register with the extracted mis-aligned value if 
>> expand_reference is true.
>> When I have a VIEW_CONVERT_EXPR I pay attention to pass down the outer 
>> "expand_reference"
>> to the inner expand_expr_real call. Expand_reference, is pretty much similar 
>> to the
>> expand_modifier "EXPAND_MEMORY".
>>
>> Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).
>>
>> Ok for trunk?
>
> It still feels like papering over the underlying issue.  Let me have a
> second (or third?) look.

So I believe the issue is that we are asking the target for an optab
for movmisalign with the wrong mode and then creating a movmisalign
with a wrong mode.

Index: gcc/expr.c
===
--- gcc/expr.c  (revision 205623)
+++ gcc/expr.c  (working copy)
@@ -9737,7 +9737,7 @@ expand_expr_real_1 (tree exp, rtx target
&& mode != BLKmode
&& align < GET_MODE_ALIGNMENT (mode))
  {
-   if ((icode = optab_handler (movmisalign_optab, mode))
+   if ((icode = optab_handler (movmisalign_optab, tmode))
!= CODE_FOR_nothing)
  {
struct expand_operand ops[2];
@@ -9745,7 +9745,7 @@ expand_expr_real_1 (tree exp, rtx target
/* We've already validated the memory, and we're creating a
   new pseudo destination.  The predicates really can't fail,
   nor can the generator.  */
-   create_output_operand (&ops[0], NULL_RTX, mode);
+   create_output_operand (&ops[0], NULL_RTX, tmode);
create_fixed_operand (&ops[1], temp);
expand_insn (icode, 2, ops);
temp = ops[0].value;
@@ -9966,7 +9966,7 @@ expand_expr_real_1 (tree exp, rtx target
  != INTEGER_CST)
  && modifier != EXPAND_STACK_PARM
  ? target : NULL_RTX),
-VOIDmode,
+mode1,
 modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);

/* If the bitfield is volatile, we want to access it in the

fixes the testcases (the VIEW_CONVERT_EXPR path would need a
similar fix, I think, both for the recursion and the movmisalign handling).
Note that the alignment checks guarding the movmisalign handling
use the wrong mode as well; they are also somewhat nonsensical
as they apply to the non-offsetted object.

That said, the "get me a 'base' MEM for this" via recursing is misdesigned.
I still think that splitting it out is the correct fix in the end (doing the
movmisalign handling only there, but with correct info from the parent).

I will try if I can come up with something that f

Re: wide-int, avr

2013-12-03 Thread Denis Chertykov
2013/11/23 Mike Stump :
> Richi has asked that we break up the wide-int patch so that the individual port 
> and front end maintainers can review their parts without having to go through 
> the entire patch.  This patch covers the avr port.
>
> Ok?
>

Ok.
Excuse my long delay. I was on vacation.

Please, apply.

Denis.


RE: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Iyer, Balaji V


> -Original Message-
> From: Jakub Jelinek [mailto:ja...@redhat.com]
> Sent: Tuesday, December 3, 2013 8:40 AM
> To: Iyer, Balaji V
> Cc: Jeff Law; Jason Merrill; Aldy Hernandez; gcc-patches@gcc.gnu.org;
> r...@redhat.com
> Subject: Re: [PATCH] _Cilk_for for C and C++
> 
> On Tue, Dec 03, 2013 at 01:25:57PM +, Iyer, Balaji V wrote:
> > > >> As you can tell, this is not how openmp handles a #pragma omp for
> loop.
> > > >
> > > > It's different in detail, but #pragma omp parallel for works very
> > > > similarly: it creates a separate function for the body of the loop
> > > > and then passes that to GOMP_parallel along with any shared data.
> > > My thoughts exactly.
> >
> > I understand you both now. Let me look into the OMP routines and see
> > what it is doing and see how I can port it to _Cilk_for.
> 
> Yeah.  The work is actually multi-stage, first during gimplification the code
> does determine what variables are used in the #pragma omp parallel (etc., in
> your case _Cilk_for) region, and whether they should be shared, or
> privatized (and in that case in what way, normal private, firstprivate,
> lastprivate, firstprivate+lastprivate, reduction, ...).  Then there is 
> omplower
> pass (already enabled for Cilk+ due to #pragma simd) that e.g.
> lowers accesses to shared variables, creates new VAR_DECLs for the
> privatized vars etc. and then ompexp pass that will create the outlined body
> of the function and create call to the runtime library.
> I have no idea what privatization behavior _Cilk_for wants, I'd expect that at
> least the IV must be privatized, otherwise it would be racy, but about other
> vars?
> 

In _Cilk_for you don't need to privatize any variables.  I need to pass the 
loop count, the grain (if the user specifies one), the nested function and its 
context to a Cilk-specific function: __cilkrts_cilk_for_64 (or _32).  The nested 
function holds the body of the _Cilk_for and takes 3 parameters (context, start 
and end), where start and end are passed in by the runtime to tell it which part 
of the loop to execute.  This thread has an example: 
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html
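
Schematically, the generated shape is roughly the following (hand-written
sketch; the runtime declaration is simplified and the authoritative prototype
of __cilkrts_cilk_for_64 is in the Cilk Plus runtime headers):

#include <stdint.h>

/* Simplified declaration matching the description above.  */
extern void __cilkrts_cilk_for_64 (void (*body) (void *, uint64_t, uint64_t),
                                   void *context, uint64_t count, int grain);

struct ctx { int *a; };

/* Outlined _Cilk_for body: the runtime tells each worker which
   [start, end) sub-range of the iteration space to execute.  */
static void
cilk_for_body (void *data, uint64_t start, uint64_t end)
{
  struct ctx *c = (struct ctx *) data;
  for (uint64_t i = start; i < end; i++)
    c->a[i] = 2 * (int) i;
}

void
foo (int *a, uint64_t n)
{
  struct ctx c = { a };
  /* _Cilk_for (uint64_t i = 0; i < n; i++) a[i] = 2 * i;  becomes:  */
  __cilkrts_cilk_for_64 (cilk_for_body, &c, n /* loop count */, 0 /* grain */);
}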

Thanks,

Balaji V. Iyer.

>   Jakub


Re: [Ping, avr] Emit diagnostics if -f{pic,PIC,pie,PIE} or -shared is passed

2013-12-03 Thread Denis Chertykov
2013/11/18 Senthil Kumar Selvaraj :
> Ping!
>
> Regards
> Senthil
>
> On Mon, Nov 04, 2013 at 06:45:19PM +0530, Senthil Kumar Selvaraj wrote:
>> The AVR backend does not generate position independent code, yet it
>> happily accepts -fpic, -fPIC, -fpie and -fPIE. The generated code
>> doesn't change at all. Also, it accepts the -shared option to generate a
>> shared library, without really doing anything with it.
>>
>> This causes one of the regression tests
>> (gcc.dg/lto/pr54709 c_lto_pr54709_0.o-c_lto_pr54709_1.o link) to fail with
>> an 'undefined reference to main' error, when the test is trying to build
>> a shared object.
>>
>> The attached patch generates a warning if one of the -f{pic,PIC,pie,PIE}
>> options is provided, and an error if -shared is provided (
>> config/mep/mep.c and config/s390/tpf.h already do something very similar).
>>
>> Regression tested with no new failures.Tests which exercise PIC now report as
>> unsupported.
>>
>> If ok, could someone commit please? I don't have commit access.
>>
>> Regards
>> Senthil
>>
>> gcc/ChangeLog
>> 2013-11-04  Senthil Kumar Selvaraj  
>>
>>   * config/avr/avr.c (avr_option_override): Warn if asked to generate
>>   position independent code.
>>   * config/avr/avr.h: Modify LINK_SPEC to reject -shared.
>>

Sorry for the delay (I was on vacation).

Committed.

Denis.


RE: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Bernd Edlinger

> Date: Tue, 3 Dec 2013 14:51:21 +0100
> Subject: Re: [REPOST] Invalid Code when reading from unaligned zero-sized 
> array
> From: richard.guent...@gmail.com
> To: bernd.edlin...@hotmail.de
> CC: l...@redhat.com; gcc-patches@gcc.gnu.org; ja...@redhat.com
>
> On Tue, Dec 3, 2013 at 2:10 PM, Richard Biener
>  wrote:
>> On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger
>>  wrote:
>>> Hi Jeff,
>>>
>>> please find attached the patch (incl. test cases) for the unaligned read 
>>> BUG that I found while investigating
>>> on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748
>>>
>>> one test case is this one:
>>>
>>> pr57748-3.c:
>>> /* PR middle-end/57748 */
>>> /* { dg-do run } */
>>> /* wrong code in expand_expr_real_1. */
>>>
>>> #include 
>>>
>>> extern void abort (void);
>>>
>>> typedef long long V
>>> __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>>
>>> typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
>>>
>>> struct __attribute__((packed)) T { char c; P s; };
>>>
>>> void __attribute__((noinline, noclone))
>>> check (P *p)
>>> {
>>> if (p->b[0][0] != 3 || p->b[0][1] != 4)
>>> abort ();
>>> }
>>>
>>> void __attribute__((noinline, noclone))
>>> foo (struct T *t)
>>> {
>>> V a = { 3, 4 };
>>> t->s.b[0] = a;
>>> }
>>>
>>> int
>>> main ()
>>> {
>>> struct T *t = (struct T *) calloc (128, 1);
>>>
>>> foo (t);
>>> check (&t->s);
>>>
>>> free (t);
>>> return 0;
>>> }
>>>
>>>
>>> and the other one is
>>> pr57748-4.c:
>>> /* PR middle-end/57748 */
>>> /* { dg-do run } */
>>> /* wrong code in expand_expr_real_1. */
>>>
>>> #include 
>>>
>>> extern void abort (void);
>>>
>>> typedef long long V
>>> __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>>
>>> typedef struct S { V b[1]; } P __attribute__((aligned (1)));
>>>
>>> struct __attribute__((packed)) T { char c; P s; };
>>>
>>> void __attribute__((noinline, noclone))
>>> check (P *p)
>>> {
>>> if (p->b[1][0] != 3 || p->b[1][1] != 4)
>>> abort ();
>>> }
>>>
>>> void __attribute__((noinline, noclone))
>>> foo (struct T *t)
>>> {
>>> V a = { 3, 4 };
>>> t->s.b[1] = a;
>>> }
>>>
>>> int
>>> main ()
>>> {
>>> struct T *t = (struct T *) calloc (128, 1);
>>>
>>> foo (t);
>>> check (&t->s);
>>>
>>> free (t);
>>> return 0;
>>> }
>>>
>>>
>>> The patch does add a boolean "expand_reference" parameter to 
>>> expand_expr_real and
>>> expand_expr_real_1. I pass true when I intend to use the returned memory 
>>> context
>>> as an array reference, instead of a value. At places where mis-aligned 
>>> values are extracted,
>>> I do not return a register with the extracted mis-aligned value if 
>>> expand_reference is true.
>>> When I have a VIEW_CONVERT_EXPR I pay attention to pass down the outer 
>>> "expand_reference"
>>> to the inner expand_expr_real call. Expand_reference, is pretty much 
>>> similar to the
>>> expand_modifier "EXPAND_MEMORY".
>>>
>>> Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).
>>>
>>> Ok for trunk?
>>
>> It still feels like papering over the underlying issue. Let me have a
>> second (or third?) look.
>
> So I believe the issue is that we are asking the target for an optab
> for movmisaling with the wrong mode and then create a movmisalign
> with a wrong mode.
>
> Index: gcc/expr.c
> ===
> --- gcc/expr.c (revision 205623)
> +++ gcc/expr.c (working copy)
> @@ -9737,7 +9737,7 @@ expand_expr_real_1 (tree exp, rtx target
> && mode != BLKmode
> && align < GET_MODE_ALIGNMENT (mode))
> {
> - if ((icode = optab_handler (movmisalign_optab, mode))
> + if ((icode = optab_handler (movmisalign_optab, tmode))
> != CODE_FOR_nothing)
> {
> struct expand_operand ops[2];
> @@ -9745,7 +9745,7 @@ expand_expr_real_1 (tree exp, rtx target
> /* We've already validated the memory, and we're creating a
> new pseudo destination. The predicates really can't fail,
> nor can the generator. */
> - create_output_operand (&ops[0], NULL_RTX, mode);
> + create_output_operand (&ops[0], NULL_RTX, tmode);
> create_fixed_operand (&ops[1], temp);
> expand_insn (icode, 2, ops);
> temp = ops[0].value;
> @@ -9966,7 +9966,7 @@ expand_expr_real_1 (tree exp, rtx target
> != INTEGER_CST)
> && modifier != EXPAND_STACK_PARM
> ? target : NULL_RTX),
> - VOIDmode,
> + mode1,
> modifier == EXPAND_SUM ? EXPAND_NORMAL : modifier);
>
> /* If the bitfield is volatile, we want to access it in the
>
> fixes the testcases (the VIEW_CONVERT_EXPR path would need a
> similar fix I think, both for the recursion and the movmisaling handling).
> Note that the alignment checks guarding the movmisalign handling
> use the wrong mode as well, they are also somewhat non-sensical
> as they apply to the non-offsetted object.
>
> That said, the "get me a 'base' MEM for this" via recursing is misdesigned.
> I still think that splitting it out is the correct fix in the end (doing the
> movmisaling handling only there, but with correct info fro

Re: [PowerPC] libffi fixes and support for PowerPC64 ELFv2

2013-12-03 Thread David Edelsohn
On Thu, Nov 21, 2013 at 9:57 PM, Alan Modra  wrote:
> David,
> Here comes the inevitable followup..  I broke backwards compatibility
> when adding an extra field to ffi_cif.  I'd like to import again from
> upstream, where I've already fixed the problem.
>
> https://sourceware.org/ml/libffi-discuss/2013/msg00220.html
>
> Actually, it's not a straight import because many files outside of
> libffi/src/powerpc/ have diverged, but fortunately for me, not
> significantly.  For the record, I've shown the files that need
> patching below.  Identical patches went in upstream (except for
> formatting differences in Makefile.am).  Bootstrapped etc.
> powerpc64-linux and powerpc64le-linux.  OK to apply?
>
> libffi/
> * src/powerpc/ffitarget.h: Import from upstream.
> * src/powerpc/ffi_powerpc.h: Likewise.
> * src/powerpc/ffi.c: Likewise.
> * src/powerpc/ffi_sysv.c: Likewise.
> * src/powerpc/ffi_linux64.c: Likewise.
> * src/powerpc/sysv.S: Likewise.
> * src/powerpc/ppc_closure.S: Likewise.
> * src/powerpc/linux64.S: Likewise.
> * src/powerpc/linux64_closure.S: Likewise.
> * src/types.c: Likewise.
> * Makefile.am (EXTRA_DIST): Add new src/powerpc files.
> (nodist_libffi_la_SOURCES ): Likewise.
> * configure.ac (HAVE_LONG_DOUBLE_VARIANT): Define for powerpc.
> * include/ffi.h.in (ffi_prep_types): Declare.
> * src/prep_cif.c (ffi_prep_cif_core): Call ffi_prep_types.
> * configure: Regenerate.
> * fficonfig.h.in: Regenerate.
> * Makefile.in: Regenerate.
> * man/Makefile.in: Regenerate.
> * include/Makefile.in: Regenerate.
> * testsuite/Makefile.in: Regenerate.

Have you tested this patch on targets other than powerpc-linux? Have
you tested this patch on AIX?

- David


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 02:01:17PM +, Iyer, Balaji V wrote:
> In Cilk_for you don't need to privatize any variables. I need to pass in
> the loop_count, the grain (if the user specifies one), the nested function
> and its context to a Cilk specific function:__cilkrts_cilk_for_64 (or 32). 
> The nested function has the body of the _Cilk_for and it has 3 parameter,
> context, start and end, where the start and end are passed in by the
> runtime which will tell what parts of the loop to execute.  This thread
> has an example: http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03567.html

So Cilk+ only allows say:
_Cilk_for (int ii = 5; ii < 15; ii++)
{

}
and not
int ii;
_Cilk_for (ii = 5; ii < 15; ii++)
{

}
?  All other variables can be shared; that is after all the default
for #pragma omp parallel for, except for the IVs and a couple of other
exceptions (e.g. readonly vars etc.).  If somebody wants private vars,
they can surely be declared inside of the body somewhere.

Jakub


Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 2:10 PM, Richard Biener
 wrote:
> On Tue, Dec 3, 2013 at 1:48 PM, Bernd Edlinger
>  wrote:
>> Hi Jeff,
>>
>> please find attached the patch (incl. test cases) for the unaligned read BUG 
>> that I found while investigating
>> on PR#57748: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=57748
>>
>> one test case is this one:
>>
>> pr57748-3.c:
>> /* PR middle-end/57748 */
>> /* { dg-do run } */
>> /* wrong code in expand_expr_real_1.  */
>>
>> #include 
>>
>> extern void abort (void);
>>
>> typedef long long V
>>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>
>> typedef struct S { V a; V b[0]; } P __attribute__((aligned (1)));
>>
>> struct __attribute__((packed)) T { char c; P s; };
>>
>> void __attribute__((noinline, noclone))
>> check (P *p)
>> {
>>   if (p->b[0][0] != 3 || p->b[0][1] != 4)
>> abort ();
>> }
>>
>> void __attribute__((noinline, noclone))
>> foo (struct T *t)
>> {
>>   V a = { 3, 4 };
>>   t->s.b[0] = a;
>> }
>>
>> int
>> main ()
>> {
>>   struct T *t = (struct T *) calloc (128, 1);
>>
>>   foo (t);
>>   check (&t->s);
>>
>>   free (t);
>>   return 0;
>> }
>>
>>
>> and the other one is
>> pr57748-4.c:
>> /* PR middle-end/57748 */
>> /* { dg-do run } */
>> /* wrong code in expand_expr_real_1.  */
>>
>> #include 
>>
>> extern void abort (void);
>>
>> typedef long long V
>>   __attribute__ ((vector_size (2 * sizeof (long long)), may_alias));
>>
>> typedef struct S { V b[1]; } P __attribute__((aligned (1)));
>>
>> struct __attribute__((packed)) T { char c; P s; };
>>
>> void __attribute__((noinline, noclone))
>> check (P *p)
>> {
>>   if (p->b[1][0] != 3 || p->b[1][1] != 4)
>> abort ();
>> }
>>
>> void __attribute__((noinline, noclone))
>> foo (struct T *t)
>> {
>>   V a = { 3, 4 };
>>   t->s.b[1] = a;
>> }
>>
>> int
>> main ()
>> {
>>   struct T *t = (struct T *) calloc (128, 1);
>>
>>   foo (t);
>>   check (&t->s);
>>
>>   free (t);
>>   return 0;
>> }
>>
>>
>> The patch does add a boolean "expand_reference" parameter to 
>> expand_expr_real and
>> expand_expr_real_1. I pass true when I intend to use the returned memory 
>> context
>> as an array reference, instead of a value. At places where mis-aligned 
>> values are extracted,
>> I do not return a register with the extracted mis-aligned value if 
>> expand_reference is true.
>> When I have a VIEW_CONVERT_EXPR I pay attention to pass down the outer 
>> "expand_reference"
>> to the inner expand_expr_real call. Expand_reference, is pretty much similar 
>> to the
>> expand_modifier "EXPAND_MEMORY".
>>
>> Boot-strapped and regression-tested on X86_64-pc-linux-gnu (many times).
>>
>> Ok for trunk?
>
> It still feels like papering over the underlying issue.  Let me have a
> second (or third?) look.

A few comments on your patch.

@@ -9520,6 +9526,7 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
align = get_object_alignment (exp);
if (modifier != EXPAND_WRITE
&& modifier != EXPAND_MEMORY
+   && !expand_reference
&& mode != BLKmode
&& align < GET_MODE_ALIGNMENT (mode)
/* If the target does not have special handling for unaligned

(TARGET_MEM_REF): expand_reference should never be true here;
there may be no component-refs around TARGET_MEM_REFs.

You missed adjusting the VIEW_CONVERT_EXPR path?  (The line numbers
are off by a lot in your patch, and the context doesn't help very much :/
It does not seem to be against 4.8 either ...)

Index: gcc/cfgexpand.c
===
--- gcc/cfgexpand.c (revision 204411)
+++ gcc/cfgexpand.c (working copy)
@@ -2189,7 +2189,7 @@ expand_call_stmt (gimple stmt)
   if (lhs)
 expand_assignment (lhs, exp, false);
   else
-expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL);
+expand_expr_real_1 (exp, const0_rtx, VOIDmode, EXPAND_NORMAL, NULL, false);

   mark_transaction_restart_calls (stmt);
 }

this should use

  expand_expr (exp, const0_rtx, VOIDmode, EXPAND_NORMAL);

anyway.

@@ -10286,7 +10297,10 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
  op0 = copy_rtx (op0);
  set_mem_align (op0, MAX (MEM_ALIGN (op0), TYPE_ALIGN (type)));
}
- else if (mode != BLKmode
+ else if (modifier != EXPAND_WRITE
+  && modifier != EXPAND_MEMORY
+  && !expand_reference
+  && mode != BLKmode
   && MEM_ALIGN (op0) < GET_MODE_ALIGNMENT (mode)
   /* If the target does have special handling for unaligned
  loads of mode then use them.  */

@@ -10307,6 +10321,9 @@ expand_expr_real_1 (tree exp, rtx target, enum mac
  return reg;
}
  else if (STRICT_ALIGNMENT
+  && modifier != EXPAND_WRITE
+  && modifier != EXPAND_MEMORY
+  && !expand_reference
   && mode != BLKmode
   && MEM_ALIGN (

RE: [PATCH] Strict volatile bit-fields clean-up, Take 2

2013-12-03 Thread Bernd Edlinger
On Tue, 3 Dec 2013 12:23:11, Richard Biener wrote:
>
> On Tue, Dec 3, 2013 at 12:01 PM, Bernd Edlinger
>  wrote:
>> Hi,
>>
>> This is my proposal for ulimately getting rid of the nasty 
>> store_fixed_bit_field recursion.
>>
>> IMHO, the root of the recursion trouble is here:
>>
>> @@ -1007,12 +1013,8 @@ store_fixed_bit_field (rtx op0, unsigned
>>
>> if (MEM_P (op0))
>> {
>> mode = GET_MODE (op0);
>> if (GET_MODE_BITSIZE (mode) == 0
>> || GET_MODE_BITSIZE (mode)> GET_MODE_BITSIZE (word_mode))
>> mode = word_mode;
>> mode = get_best_mode (bitsize, bitnum, bitregion_start, bitregion_end,
>> MEM_ALIGN (op0), mode, MEM_VOLATILE_P (op0));
>>
>>
>> But, because now we always have bitregion_start and bitregion_end to limit
>> the access size, it is no longer necessary to restrict the largest mode, that
>> get_best_mode may return.
>
> Note that this is not true, as get_bit_range itself may end up giving
> up with bitregion_start = bitregion_end = 0. But maybe this is not
> what you are saying here? That is, does a
>
> gcc_assert (bitregion_start != 0 || bitregion_end != 0);
>
> before the get_best_mode call work for you?

Unfortunately not in Ada, BUT there must be some implicit guarantee about
what a store_bit_field(to, bitsize, bitpos, 0, 0, VOIDmode, value) may
overwrite.  I always assumed that [bitpos:bitpos+bitsize-1] is just stretched
to the next MEM_ALIGN(to) or WORD_MODE boundary, whichever is smaller?

After all, extract_bit_field uses the same bounding region.

The problem is that there are cases where the mode in the CONTEXT is
just too small, and the fall-back must be bullet-proof.

>
>> This patch is very similar to the previous patch, which split up
>> extract_fixed_bit_field.
>>
>> This time, I just split up store_fixed_bit_field and use
>> store_fixed_bit_field_1 to force the strict-volatile-bitfield mode if
>> necessary, and let get_best_mode find a mode that can be used to access
>> the field, which is no longer impacted by the memory context's selected
>> mode in this case.
>>
>> I tried this patch with an ARM-Cross compiler and a large eCos application, 
>> to see if anything
>> changes in the generated code with this patch, but 2 MB of code stays binary 
>> the same,
>> that's a good sign.
>>
>> I added the new Ada test case, and the test case from PR59134, which does no 
>> longer re-produce
>> after my previous C++ memory model patch, but this "fix" was more or less by 
>> chance.
>>
>>
>> Boot-Strap on X86_64-pc-linux-gnu (languages=all,ada,go) and regression-tests
>> still running.
>>
>>
>> Ok for trunk (when the tests succeed)?
>>
>>
>> Thanks
>> Bernd. 

Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Richard Biener
On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang  wrote:
> On 12/03/13 06:48, Jeff Law wrote:
>>
>> On 12/02/13 08:47, Yufeng Zhang wrote:
>>>
>>> Ping~
>>>
>>> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html
>>
>>
>>>
>>> Thanks,
>>> Yufeng
>>>
>>> On 11/26/13 15:02, Yufeng Zhang wrote:

 On 11/26/13 12:45, Richard Biener wrote:
>
> On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
> Zhangwrote:
>>
>> On 11/13/13 20:54, Bill Schmidt wrote:
>>>
>>> The second version of your original patch is ok with me with the
>>> following changes.  Sorry for the little side adventure into the
>>> next-interp logic; in the end that's going to hurt more than it
>>> helps in
>>> this case.  Thanks for having a look at it, anyway.  Thanks also for
>>> cleaning up this version to be less intrusive to common interfaces; I
>>> appreciate it.
>>
>>
>>
>> Thanks a lot for the review.  I've attached an updated patch with the
>> suggested changes incorporated.
>>
>> For the next-interp adventure, I was quite happy to do the
>> experiment; it's
>> a good chance of gaining insight into the pass.  Many thanks for
>> your prompt
>> replies and patience in guiding!
>>
>>
>>> Everything else looks OK to me.  Please ask Richard for final
>>> approval,
>>> as I'm not a maintainer.
>>
>> First a note, I need to check on voting for Bill as the slsr maintainer
>> from the steering committee.   Voting was in progress just before the
>> close of stage1 development so I haven't tallied the results :-)
>
>
> Looking forward to some good news! :)
>
>

 Yes, you are right that the non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may
 be
 called twice on the same 'base' tree, once in the
 find_basis_for_candidate () for look-up and the other time in
 alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
 to leave out the cache if you think the benefit is trivial.
>>
>> Without some sense of how expensive the lookups are vs how often the
>> cache hits it's awful hard to know if the cache is worth it.
>>
>> I'd say take it out unless you have some sense it's really saving time.
>>It's a pretty minor implementation detail either way.
>
>
> I think the affine tree routines are generally expensive; it is worth having
> a cache to avoid calling them too many times.  I run the slsr-*.c tests
> under gcc.dg/tree-ssa/ and find out that the cache hit rates range from
> 55.6% to 90%, with 73.5% as the average.  The samples may not well represent
> the real world scenario, but they do show the fact that the 'base' tree can
> be shared to some extent.  So I'd like to have the cache in the patch.
>
>
>>

> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-slsr" } */
> +
> +typedef int arr_2[50][50];
> +
> +void foo (arr_2 a2, int v1)
> +{
> +  int i, j;
> +
> +  i = v1 + 5;
> +  j = i;
> +  a2 [i-10] [j] = 2;
> +  a2 [i] [j++] = i;
> +  a2 [i+20] [j++] = i;
> +  a2 [i-3] [i-1] += 1;
> +  return;
> +}
> +
> +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
> +/* { dg-final { cleanup-tree-dump "slsr" } } */
>
> scanning for 5 MEMs looks non-sensical.  What transform do
> you expect?  I see other slsr testcases do similar non-sensical
> checking which is bad, too.


 As the slsr optimizes CAND_REF candidates by simply lowering them to
 MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
 is an effective check.  Alternatively, I can add a follow-up patch to
 add some dumping facility in replace_ref () to print out the replacing
 actions when -fdump-tree-slsr-details is on.
>>
>> I think adding some details to the dump and scanning for them would be
>> better.  That's the only change that is required for this to move forward.
>
>
> I've updated the patch to dump more details when -fdump-tree-slsr-details is
> on.  The tests have also been updated to scan for these new dumps instead of
> MEMs.
>
>
>>
>> I suggest doing it quickly.  We're well past stage1 close at this point.
>
>
> The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?

I still don't like it.  It's using the wrong and too expensive tools to do
stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

  /* Base expression for the chain of candidates:  often, but not
 always, an SSA name.  */
  tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.

Richard.

> Thanks,
>
> Yufeng
>
> gcc/
>
> * gimple-ssa-strength-reduction.c: Include tre

Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-12-03 Thread Ian Lance Taylor
On Mon, Dec 2, 2013 at 10:39 PM, Kugan
 wrote:
> ping

This patch needs to be approved by an ARM maintainer.

Ian

> On 27/11/13 15:30, Kugan wrote:
>> On 27/11/13 02:07, Richard Earnshaw wrote:
>>> On 23/11/13 01:54, Kugan wrote:
>>
>> [snip]
>>
 +2013-11-22  Kugan Vivekanandarajah  
 +
 +   * libgcc/config/arm/pbapi-lib.h (HAVE_NO_HW_DIVIDE): Define for
>>>
>>> It's bpabi-lib.h
>>
>> Thanks for the review.
>>
 +   __ARM_ARCH_7_A__.
 +


>>>
>>> No, this will:
>>> 1) Do the wrong thing for Cortex-a7, A12 and A15 (which all have HW
>>> divide, and currently define __ARM_ARCH_7_A__).
>>> 2) Do the wrong thing for v7-M and v7-R devices, which have Thumb HW
>>> division instructions.
>>> 3) Do the wrong thing for all pre-v7 devices, which don't have HW division.
>>>
>>> I think the correct solution is to test !defined(__ARM_ARCH_EXT_IDIV__)
>>
>> I understand it now and updated the code as attached.
>>
>> +2013-11-27  Kugan Vivekanandarajah  
>> +
>> + * config/arm/bpapi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for
>> + architectures that does not have hardware divide instruction.
>> + i.e. architectures that does not define __ARM_ARCH_EXT_IDIV__.
>> +
>>
>>
>> Is this OK for trunk now?
>> Thanks,
>> Kugan
>>
>


Re: [PATCH] Provide a prototype for runtime_traceback

2013-12-03 Thread Ian Lance Taylor
On Tue, Dec 3, 2013 at 1:32 AM, Richard Biener  wrote:
> On Tue, 3 Dec 2013, Richard Biener wrote:
>
>>
>> Hi, I'm getting a bootstrap fail with Go with our custom patched
>> compiler (it adds that warning):
>>
>> 09:48 < richi> ../../../libgo/runtime/go-signal.c: In function
>>'runtime_sighandler':
>> 09:48 < richi> ../../../libgo/runtime/go-signal.c:221:4: error: call to
>>function 'runtime_traceback' without a real prototype
>>[-Werror=unprototyped-calls] runtime_traceback (g);
>>
>> which is because the declaration of runtime_traceback looks like
>>
>>   void   runtime_traceback();
>>
>> and thus if you call it the compiler doesn't know the number of
>> arguments or their types and thus unwanted promotions may apply
>> that change calling conventions (for example float -> double).
>> In this case the argument is a pointer, so it's probably not
>> an issue.  Still the above is not a prototype which the patch
>> below fixes.
>>
>> Bootstrap/testing in progress on x86_64-unknown-linux-gnu
>> (let's hope this was the only one).
>
> Hmm, didn't work out (the implementation has no argument).
>
> Try the following instead.
>
> 2013-12-03  Richard Biener  
>
> libgo/
> * runtime/runtime.h (runtime_traceback): Fix prototype.
> * runtime/go-signal.c (runtime_sighandler): Don't pass
> excess argument to runtime_traceback.

Thanks.  Too much C++/Go on my part, I guess.  Committed to mainline
and 4.8 branch.

Ian
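
As a side note for anyone unfamiliar with the distinction Richard describes, a
small self-contained C illustration (not libgo code):

/* Illustrative only.  The first declaration is an old-style declaration,
   not a prototype: the compiler does not know the parameter types, so the
   default argument promotions apply at every call (float -> double,
   char/short -> int), which can silently change the calling convention.  */
void report ();
void report_ok (float x);   /* a real prototype */

void
use (float f)
{
  report (f);      /* f is promoted to double before the call */
  report_ok (f);   /* f is passed as float, as the callee expects */
}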


Re: [wide-int] Drop some lingering uses of precision 0

2013-12-03 Thread Kenneth Zadeck

if i did not already say so, this is fine.

kenny

On 12/02/2013 03:20 PM, Richard Sandiford wrote:

I noticed that there were still a couple of tests for zero precision.
This patch replaces them with asserts when handling separately-supplied
precisions and simply drops them when handling existing wide_ints.
(The idea is that most code would break for zero precision wide_ints
and only asserting in some use sites would be inconsistent.)

Also, to_shwi is called either with a nonzero precision argument or
with no argument.  I think it'd be clearer to split the two cases into
separate (overloaded) functions.  It's also more efficient, since the
compiler doesn't know that a variable-precision argument must be nonzero.

Tested on x86_64-linux-gnu.  OK to install?

Thanks,
Richard


Index: gcc/wide-int.cc
===
--- gcc/wide-int.cc 2013-12-02 20:03:50.112581766 +
+++ gcc/wide-int.cc 2013-12-02 20:12:22.178998274 +
@@ -275,9 +275,8 @@ wi::from_mpz (const_tree type, mpz_t x,
  wide_int
  wi::max_value (unsigned int precision, signop sgn)
  {
-  if (precision == 0)
-return shwi (0, precision);
-  else if (sgn == UNSIGNED)
+  gcc_checking_assert (precision != 0);
+  if (sgn == UNSIGNED)
  /* The unsigned max is just all ones.  */
  return shwi (-1, precision);
else
@@ -290,7 +289,8 @@ wi::max_value (unsigned int precision, s
  wide_int
  wi::min_value (unsigned int precision, signop sgn)
  {
-  if (precision == 0 || sgn == UNSIGNED)
+  gcc_checking_assert (precision != 0);
+  if (sgn == UNSIGNED)
  return uhwi (0, precision);
else
  /* The signed min is all zeros except the top bit.  This must be
@@ -1487,9 +1487,6 @@ wi::popcount (const wide_int_ref &x)
unsigned int i;
int count;
  
-  if (x.precision == 0)
-    return 0;
-
/* The high order block is special if it is the last block and the
   precision is not an even multiple of HOST_BITS_PER_WIDE_INT.  We
   have to clear out any ones above the precision before doing
@@ -2082,10 +2079,6 @@ wi::ctz (const wide_int_ref &x)
  int
  wi::exact_log2 (const wide_int_ref &x)
  {
-  /* 0-precision values can only hold 0.  */
-  if (x.precision == 0)
-return -1;
-
/* Reject cases where there are implicit -1 blocks above HIGH.  */
if (x.len * HOST_BITS_PER_WIDE_INT < x.precision && x.sign_mask () < 0)
  return -1;
Index: gcc/wide-int.h
===
--- gcc/wide-int.h  2013-12-02 19:52:05.424989079 +
+++ gcc/wide-int.h  2013-12-02 20:12:22.179998282 +
@@ -644,8 +644,10 @@ class GTY(()) generic_wide_int : public
generic_wide_int (const T &, unsigned int);
  
/* Conversions.  */

-  HOST_WIDE_INT to_shwi (unsigned int = 0) const;
-  unsigned HOST_WIDE_INT to_uhwi (unsigned int = 0) const;
+  HOST_WIDE_INT to_shwi (unsigned int) const;
+  HOST_WIDE_INT to_shwi () const;
+  unsigned HOST_WIDE_INT to_uhwi (unsigned int) const;
+  unsigned HOST_WIDE_INT to_uhwi () const;
HOST_WIDE_INT to_short_addr () const;
  
/* Public accessors for the interior of a wide int.  */

@@ -735,18 +737,23 @@ inline generic_wide_int ::gener
  inline HOST_WIDE_INT
  generic_wide_int ::to_shwi (unsigned int precision) const
  {
-  if (precision == 0)
-{
-  if (is_sign_extended)
-   return this->get_val ()[0];
-  precision = this->get_precision ();
-}
if (precision < HOST_BITS_PER_WIDE_INT)
  return sext_hwi (this->get_val ()[0], precision);
else
  return this->get_val ()[0];
  }
  
+/* Return THIS as a signed HOST_WIDE_INT, in its natural precision.  */

+template 
+inline HOST_WIDE_INT
+generic_wide_int ::to_shwi () const
+{
+  if (is_sign_extended)
+return this->get_val ()[0];
+  else
+return to_shwi (this->get_precision ());
+}
+
  /* Return THIS as an unsigned HOST_WIDE_INT, zero-extending from
 PRECISION.  If THIS does not fit in PRECISION, the information
 is lost.  */
@@ -754,14 +761,20 @@ generic_wide_int ::to_shwi (uns
  inline unsigned HOST_WIDE_INT
  generic_wide_int ::to_uhwi (unsigned int precision) const
  {
-  if (precision == 0)
-precision = this->get_precision ();
if (precision < HOST_BITS_PER_WIDE_INT)
  return zext_hwi (this->get_val ()[0], precision);
else
  return this->get_val ()[0];
  }
  
+/* Return THIS as an signed HOST_WIDE_INT, in its natural precision.  */

+template 
+inline unsigned HOST_WIDE_INT
+generic_wide_int ::to_uhwi () const
+{
+  return to_uhwi (this->get_precision ());
+}
+
  /* TODO: The compiler is half converted from using HOST_WIDE_INT to
 represent addresses to using offset_int to represent addresses.
 We use to_short_addr at the interface from new code to old,
@@ -2289,9 +2302,7 @@ wi::add (const T1 &x, const T2 &y, signo
unsigned HOST_WIDE_INT xl = xi.ulow ();
unsigned HOST_WIDE_INT yl = yi.ulow ();
unsigned HOST_WIDE_INT resultl

Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Eric Botcazou
> The patch does add a boolean "expand_reference" parameter to
> expand_expr_real and expand_expr_real_1. I pass true when I intend to use
> the returned memory context as an array reference, instead of a value. At
> places where mis-aligned values are extracted, I do not return a register
> with the extracted mis-aligned value if expand_reference is true. When I
> have a VIEW_CONVERT_EXPR I pay attention to pass down the outer
> "expand_reference" to the inner expand_expr_real call. Expand_reference, is
> pretty much similar to the expand_modifier "EXPAND_MEMORY".

IMO that's not acceptable as-is, you're papering over the real issue which are 
the out-of-bounds accesses in the code, i.e. you essentially have an object 
with variable size and non-BLKmode.  The patch is too big an hammer for such 
an obvious lie to the middle-end.  Moreover there is not a _single_ line of 
explanation in it.

If we really want to go for a hack instead of fixing the underlying issue, it 
should be restricted to the cases where stor-layout.c goes wrong.

-- 
Eric Botcazou
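
For context, a hedged C sketch of the kind of construct the PR title refers to
(the real testcase is in the bug report; the names here are made up):

/* A trailing zero-sized array: every read from it is formally outside
   the declared size of the object, and the packed enclosing struct makes
   the access potentially misaligned on STRICT_ALIGNMENT targets.  */
struct __attribute__ ((packed)) msg
{
  char tag;
  int data[0];
};

int
first_word (struct msg *m)
{
  return m->data[0];   /* unaligned read past the declared object size */
}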


Re: libsanitizer merge from upstream r196090

2013-12-03 Thread Konstantin Serebryany
On Tue, Dec 3, 2013 at 3:50 PM, Jakub Jelinek  wrote:
> On Mon, Dec 02, 2013 at 05:43:17PM +0100, Konstantin Serebryany wrote:
>> > No, so your patch doesn't regress anything. I can configure with
>> > --disable-libsanitizer to skip build of libsanitizer, although it
>> > would be nice to support RHEL5 derived long-term distributions.
>> Ok, so this does not gate the merge.
>
> Well, it regresses against 4.8, so it still is a P1 regression.

Does anyone care?
Maintaining asan&co on older platforms has a cost, and we are not
willing to pay it;
even reviewing patches like yours (still, thanks!) requires time.
The only long term strategy to support old systems is if someone works
closely with us in upstream
and we keep the code clean all the time.
If no one cares enough to do that, why should we?

I'll look at your second patch though.

>
> BTW, even the merge itself apparently regresses, powerpc* doesn't build any
> longer and it did before this merge.

The current llvm trunk builds on powerpc64 fine (just checked); we
build only the 64-bit variant.
I suppose that your patch fixes the 32-bit build.
It is broken beyond repair, I don't have any indication anyone ever used it.
Unless I hear otherwise I propose to disable 32-bit powerpc build.
64-bit variant is broken too (tests don't pass), but at least it builds.
If someone wants usable 32-bit powerpc they need to work with us upstream.

> The first attached patch fixes the build on powerpc* (tested on RHEL6/ppc*)
> and the second patch fixes the build on RHEL5/x86_64, beyond what I've
> posted earlier the patch attempts to handle the .cfi* stuff (tried looking
> if clang defines some usable macro, but couldn't find any, so no idea how
> you can find out if you are compiled with clang -fno-dwarf2-cfi-asm

Yes, we will need to find out some other macro to hide .CFI and such.
Maybe just something like SANITIZER_ALLOW_CFI_ASM or similar.
Again, we need to keep the code clean all the time in upstream,
otherwise gcc merges get too expensive.

> (when tsan will probably not build at all), or with the -fdwarf2-cfi-asm
> default.  Probably either you need to convince llvm folks to add something,
> or add configure test, moved the linux/mroute* headers to the Linux specific
> file so that linux/in.h stuff doesn't clash with netinet/in.h etc. and
> finally hacks to avoid including linux/types.h at all costs, that header
> historically has always been terrible portability minefield, as it defines
> types that glibc headers define too.
>
> With these two patches on top of your patch, I get libsanitizer to build
> in multilib configurations on Fedora19/x86_64, RHEL6/ppc* and RHEL5/x86_64
> (in all cases both 32-bit and 64-bit builds, but not the third x32 multilib).
>
> Testsuite passes on Fedora19, on RHEL5/x86_64 tests that fail typically fail
> on
> ==2738==AddressSanitizer CHECK failed:
> ../../../../libsanitizer/sanitizer_common/sanitizer_linux_libcdep.cc:260 
> "((*tls_addr + *tls_size)) <= ((*stk_addr + *stk_size))" (0x2af8df1bc240, 
> 0x2af8df1bc000)
> which clearly is a bug in sanitizer_common,
>
> #if defined(__x86_64__) || defined(__i386__)
> // sizeof(struct thread) from glibc.
> // There has been a report of this being different on glibc 2.11 and 2.13. We
> // don't know when this change happened, so 2.14 is a conservative estimate.
> #if __GLIBC_PREREQ(2, 14)
> const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
> #else
> const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
> #endif



>
> uptr ThreadDescriptorSize() {
>   return kThreadDescriptorSize;
> }
> but also the InitTlsSize hack can't ever work reliably.
> If you need this info, ask glibc folks for a proper supported API.

This won't happen any time soon, right?
I'd like to ask glibc for many other things, not just this, but the latency
of the glibc changes propagating to users is too low, so we don't
bother (although we should)
E.g. we've been hit by the ugly
pthread_getattr_np/pthread_attr_getstack multiple times.
:(

> Hardcoding size of glibc private structure that glibc has changed multiple
> times (note, RHEL5 has glibc 2.5 and clearly the size doesn't match there
> even with the one you've computed on glibc 2.11) will most likely break any
> time glibc adds further fields in there, not to mention that the above
> means that if you e.g. build libsanitizer say on glibc 2.11 and run against
> glibc 2.14, it will also not work.  Plus calling _dl_get_tls_static_info,
> which is a GLIBC_PRIVATE symbol, is also going to break whenever glibc
> decides to change the ABI of that function.
> I believe libthread_db.so.1 has some APIs to query sizeof (struct pthread),
> but have never used those APIs so don't know how to query that.

I think one of us tried that without success.

> My recommendation would be to just ifdef out this for now until you use
> some sane reliable and supportable API.  Otherwise, it may work if you are
> lucky and always build libsanitizer against the 

RE: _Cilk_spawn and _Cilk_sync for C++

2013-12-03 Thread Iyer, Balaji V
> 
> > case CILK_SPAWN_STMT:
> >   gcc_assert
> > (fn_contains_cilk_spawn_p (cfun)
> >  && lang_hooks.cilkplus.cilk_detect_spawn_and_unwrap (expr_p));
> >   if (!seen_error ())
> > {
> >   ret = (enum gimplify_status)
> > lang_hooks.cilkplus.gimplify_cilk_spawn (expr_p, pre_p,
> >  post_p);
> >   break;
> > }
> >   /* If errors are seen, then just process it as a CALL_EXPR.
> > */
> 
> Please remove these langhooks and instead add handling of
> CILK_SPAWN_STMT to c_gimplify_expr and cp_gimplify_expr.
> 
Hi Jason,
I really cannot do this because if the spawned function returns a 
value, the whole expression must be pushed into the spawn helper. Thus, this 
lang_hooks function is called in the gimplify_modify_expr and a couple other 
places.

E.g.  In:

x = _Cilk_spawn func ()

The spawn helper should contain:

x = func ();

Thanks,

Balaji V. Iyer.


Re: patch for elimination to SP when it is changed in RTL (PR57293)

2013-12-03 Thread Vladimir Makarov

On 12/3/2013, 6:54 AM, Marcus Shawcroft wrote:

On 2 December 2013 23:44, Vladimir Makarov  wrote:


If somebody with the rights approves, I can commit it tomorrow.

2013-12-02  Vladimir Makarov  

 * config/aarch64/aarch64.c (aarch64_frame_pointer_required): Check
 LR_REGNUM.
 (aarch64_can_eliminate): Don't check elimination source when
 frame_pointer_requred is false.




This is fine with me, go ahead and commit it.  Thanks /Marcus


Committed as rev. 205637 with changelog fix of a typo found by Jeff.



Re: libsanitizer merge from upstream r196090

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 07:18:14PM +0400, Konstantin Serebryany wrote:
> >> > No, so your patch doesn't regress anything. I can configure with
> >> > --disable-libsanitizer to skip build of libsanitizer, although it
> >> > would be nice to support RHEL5 derived long-term distributions.
> >> Ok, so this does not gate the merge.
> >
> > Well, it regresses against 4.8, so it still is a P1 regression.
> 
> Does anyone care?

Of course.  First of all all users trying to compile gcc just to find out
it won't build out of the box.  Also users that were using asan just fine
on older platforms in gcc 4.8 and now they'll find out that the support
has been dropped.

> Maintaining asan&co on older platforms has a cost, and we are not
> willing to pay it;

Well, but then you just get approximately same cost if not higher in
maintaining configure bits for checking all the silent assumptions the code
makes and disabling libsanitizer in that case.

As you can see in the patches I've posted, most of the issues are in the
headers which you should be able to test the way I've posted yesterday, just
unpack a couple of .../usr/include trees from various distros.  The only
ones not found by that are the .cfi_* stuff, you can test it by building
with -fno-dwarf2-cfi-asm and see whether it still builds, and
the tls size issue, which needs some solution in any case, because it is
just totally broken.  Really ask the glibc folks for an API or talk to them
how to query it using libthread_db.so.1, struct pthread's size changes
frequently.

> > BTW, even the merge itself apparently regresses, powerpc* doesn't build any
> > longer and it did before this merge.
> 
> The current llvm trunk builds on powerpc64 fine (just checked); we
> build only the 64-bit variant.
> I suppose that your patch fixes the 32-bit build.
> It is broken beyond repair, I don't have any indication anyone ever used it.
> Unless I hear otherwise I propose to disable 32-bit powerpc build.
> 64-bit variant is broken too (tests don't pass), but at least it builds.
> If someone wants usable 32-bit powerpc they need to work with us upstream.

Why do you think it would be broken beyond repair?

> > The first attached patch fixes the build on powerpc* (tested on RHEL6/ppc*)
> > and the second patch fixes the build on RHEL5/x86_64, beyond what I've
> > posted earlier the patch attempts to handle the .cfi* stuff (tried looking
> > if clang defines some usable macro, but couldn't find any, so no idea how
> > you can find out if you are compiled with clang -fno-dwarf2-cfi-asm
> 
> Yes, we will need to find out some other macro to hide .CFI and such.
> Maybe just something like SANITIZER_ALLOW_CFI_ASM or similar.
> Again, we need to keep the code clean all the time in upstream,
> otherwise gcc merges get too expensive.

Sure, just invent a macro for it and use it everywhere, for GCC that macro
can be defined based on whether __GCC_HAVE_DWARF2_CFI_ASM is defined, for
llvm/clang configure or whatever else.
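
A minimal sketch of how such a macro could be wired up (SANITIZER_ALLOW_CFI_ASM
is only the name floated above, and CFI_STARTPROC/CFI_ENDPROC stand in for
whatever wrappers the sanitizer sources actually use):

/* Sketch only.  GCC predefines __GCC_HAVE_DWARF2_CFI_ASM exactly when
   -fdwarf2-cfi-asm is in effect, so it can drive the switch; other
   build systems would define SANITIZER_ALLOW_CFI_ASM themselves.  */
#if !defined (SANITIZER_ALLOW_CFI_ASM) && defined (__GCC_HAVE_DWARF2_CFI_ASM)
# define SANITIZER_ALLOW_CFI_ASM 1
#endif

#if SANITIZER_ALLOW_CFI_ASM
# define CFI_STARTPROC ".cfi_startproc\n"
# define CFI_ENDPROC   ".cfi_endproc\n"
#else
# define CFI_STARTPROC ""
# define CFI_ENDPROC   ""
#endif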

> > #if defined(__x86_64__) || defined(__i386__)
> > // sizeof(struct thread) from glibc.
> > // There has been a report of this being different on glibc 2.11 and 2.13. 
> > We
> > // don't know when this change happened, so 2.14 is a conservative estimate.
> > #if __GLIBC_PREREQ(2, 14)
> > const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
> > #else
> > const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
> > #endif
> >
> > uptr ThreadDescriptorSize() {
> >   return kThreadDescriptorSize;
> > }
> > but also the InitTlsSize hack can't ever work reliably.
> > If you need this info, ask glibc folks for a proper supported API.
> 
> This won't happen any time soon, right?
> I'd like to ask glibc for many other things, not just this, but the latency
> of the glibc changes propagating to users is too low, so we don't
> bother (although we should)
> E.g. we've been hit by the ugly
> pthread_getattr_np/pthread_attr_getstack multiple times.
> :(

If you never ask for it, it will never be done, it is that simple.

Anyway, does the code rely on exactly the right size of struct pthread, or
is a conservative minimum fine?  Perhaps it is just a matter of adding
another case, if you tested say glibc 2.11, just don't do anything at all
for older glibcs (i.e. the same thing as for non-i?86/x86_64) - tls size 0.

Jakub
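
Jakub's fallback suggestion, expressed as a sketch against the snippet quoted
above (purely illustrative; the version cut-offs are the ones mentioned in this
thread, not newly measured values):

/* Only claim a thread-descriptor size for glibc versions that were
   actually measured; for anything older fall back to 0, i.e. skip the
   TLS adjustment just as on non-x86 targets.  Reuses uptr and
   FIRST_32_SECOND_64 from the quoted sanitizer code.  */
#if __GLIBC_PREREQ(2, 14)
const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1216, 2304);
#elif __GLIBC_PREREQ(2, 11)
const uptr kThreadDescriptorSize = FIRST_32_SECOND_64(1168, 2304);
#else
const uptr kThreadDescriptorSize = 0;  /* unknown glibc: do nothing */
#endif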


Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Yufeng Zhang

On 12/03/13 14:20, Richard Biener wrote:

On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang  wrote:

On 12/03/13 06:48, Jeff Law wrote:


On 12/02/13 08:47, Yufeng Zhang wrote:


Ping~

http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html





Thanks,
Yufeng

On 11/26/13 15:02, Yufeng Zhang wrote:


On 11/26/13 12:45, Richard Biener wrote:


On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
Zhang wrote:


On 11/13/13 20:54, Bill Schmidt wrote:


The second version of your original patch is ok with me with the
following changes.  Sorry for the little side adventure into the
next-interp logic; in the end that's going to hurt more than it
helps in
this case.  Thanks for having a look at it, anyway.  Thanks also for
cleaning up this version to be less intrusive to common interfaces; I
appreciate it.




Thanks a lot for the review.  I've attached an updated patch with the
suggested changes incorporated.

For the next-interp adventure, I was quite happy to do the
experiment; it's
a good chance of gaining insight into the pass.  Many thanks for
your prompt
replies and patience in guiding!



Everything else looks OK to me.  Please ask Richard for final
approval,
as I'm not a maintainer.


First a note, I need to check on voting for Bill as the slsr maintainer
from the steering committee.   Voting was in progress just before the
close of stage1 development so I haven't tallied the results :-)



Looking forward to some good news! :)




Yes, you are right that the non-trivial 'base' trees are rarely shared.
 The cache is introduced mainly because get_alternative_base () may
be
called twice on the same 'base' tree, once in the
find_basis_for_candidate () for look-up and the other time in
alloc_cand_and_find_basis () for record_potential_basis ().  I'm happy
to leave out the cache if you think the benefit is trivial.


Without some sense of how expensive the lookups are vs how often the
cache hits it's awful hard to know if the cache is worth it.

I'd say take it out unless you have some sense it's really saving time.
It's a pretty minor implementation detail either way.



I think the affine tree routines are generally expensive; it is worth having
a cache to avoid calling them too many times.  I run the slsr-*.c tests
under gcc.dg/tree-ssa/ and find out that the cache hit rates range from
55.6% to 90%, with 73.5% as the average.  The samples may not well represent
the real world scenario, but they do show the fact that the 'base' tree can
be shared to some extent.  So I'd like to have the cache in the patch.







+/* { dg-do compile } */
+/* { dg-options "-O2 -fdump-tree-slsr" } */
+
+typedef int arr_2[50][50];
+
+void foo (arr_2 a2, int v1)
+{
+  int i, j;
+
+  i = v1 + 5;
+  j = i;
+  a2 [i-10] [j] = 2;
+  a2 [i] [j++] = i;
+  a2 [i+20] [j++] = i;
+  a2 [i-3] [i-1] += 1;
+  return;
+}
+
+/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
+/* { dg-final { cleanup-tree-dump "slsr" } } */

scanning for 5 MEMs looks non-sensical.  What transform do
you expect?  I see other slsr testcases do similar non-sensical
checking which is bad, too.



As the slsr optimizes CAND_REF candidates by simply lowering them to
MEM_REF from e.g. ARRAY_REF, I think scanning for the number of MEM_REFs
is an effective check.  Alternatively, I can add a follow-up patch to
add some dumping facility in replace_ref () to print out the replacing
actions when -fdump-tree-slsr-details is on.


I think adding some details to the dump and scanning for them would be
better.  That's the only change that is required for this to move forward.



I've updated the patch to dump more details when -fdump-tree-slsr-details is
on.  The tests have also been updated to scan for these new dumps instead of
MEMs.




I suggest doing it quickly.  We're well past stage1 close at this point.



The bootstrapping on x86_64 is still running.  OK to commit if it succeeds?


I still don't like it.  It's using the wrong and too expensive tools to do
stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

   /* Base expression for the chain of candidates:  often, but not
  always, an SSA name.  */
   tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p, 
e.g. ARRAY_REF, COMPONENT_REF, etc.  In most cases, it is the address of 
the object returned by get_inner_reference ().


Given a test case like the following:

typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
  a2[i+10][j] = 1;
  a2[i+10][j+1] = 1;
  a2[i+20][j] = 1;
}

The IR before SLSR is (on x86_64):

  _2 = (long unsigned int) i_1(D);
  _3 = _2 * 80;
  _4 = _3 + 800;
  _6 = a2_5(D) + _4;
  *_6[j_8(D)] = 1;
  _10 = j_8(D) + 1;
  *_6[_10] = 1;
  _12 = _3 + 1600;
  _13 = a2_5(D) +

Re: [wide-int] small cleanup in wide-int.*

2013-12-03 Thread Kenneth Zadeck

On 11/29/2013 05:24 AM, Richard Biener wrote:

On Thu, Nov 28, 2013 at 6:11 PM, Kenneth Zadeck
 wrote:

This patch does three things in wide-int:

1) it cleans up some comments.
2) removes a small amount of trash.
3) it changes the max size of the wide int from being 4x of
MAX_BITSIZE_MODE_ANY_INT to 2x +1.   This should improve large muls and divs
as well as perhaps help with some cache behavior.

@@ -235,8 +233,8 @@ along with GCC; see the file COPYING3.
 range of a multiply.  This code needs 2n + 2 bits.  */

  #define WIDE_INT_MAX_ELTS \
-  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
-   / HOST_BITS_PER_WIDE_INT)
+  (((2 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
+/ HOST_BITS_PER_WIDE_INT) + 1)

I always wondered why VRP (if that is the only reason we do 2*n+1)
cannot simply use FIXED_WIDE_INT(MAX_BITSIZE_MODE_ANY_INT*2 + 1)?
Other widest_int users should not suffer IMHO.  widest_int should
strictly cover all modes that the target can do any arithmetic on
(thus not XImode or OImode on x86_64).

Richard.


ok to commit
enclosed is the code to change the large multiplies in tree-vrp as well 
as set the size of WIDE_INT_MAX_ELTS as we have discussed.


Bootstrapped and tested on x86-64.

ok to commit?

kenny
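
For reference, a worked check (illustrative, using prec == 32) of why the
double-width type in the tree-vrp.c hunk below is large enough for all four
products:

/* Illustrative arithmetic only.  For a wrapping type with prec == 32:
     unsigned extremes:  (2^32 - 1) * (2^32 - 1)  <  2^64
     signed extremes:    (-2^31) * (-2^31)        == 2^62
   so every product fits in 64 magnitude bits, and since the code does
   all of the math as signed, one extra sign bit is needed: 2*prec + 1
   = 65 bits.  FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2), together
   with the spare HWI now built into WIDE_INT_MAX_ELTS, provides at
   least that for every precision the target supports.  */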
Index: tree-vrp.c
===
--- tree-vrp.c	(revision 205597)
+++ tree-vrp.c	(working copy)
@@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
 
   signop sign = TYPE_SIGN (expr_type);
   unsigned int prec = TYPE_PRECISION (expr_type);
-  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
 
   if (range_int_cst_p (&vr0)
 	  && range_int_cst_p (&vr1)
 	  && TYPE_OVERFLOW_WRAPS (expr_type))
 	{
-	  wide_int sizem1 = wi::mask (prec, false, prec2);
-	  wide_int size = sizem1 + 1;
+	  /* vrp_int is twice as wide as anything that the target
+	 supports so it can support a full width multiply.  No
+	 need to add any more padding for an extra sign bit
+	 because that comes with the way that WIDE_INT_MAX_ELTS is
+	 defined.  */ 
+	  typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) 
+	vrp_int;
+	  vrp_int sizem1 = wi::mask  (prec, false);
+	  vrp_int size = sizem1 + 1;
 
 	  /* Extend the values using the sign of the result to PREC2.
 	 From here on out, everthing is just signed math no matter
 	 what the input types were.  */
-	  wide_int min0 = wide_int::from (vr0.min, prec2, sign);
-	  wide_int max0 = wide_int::from (vr0.max, prec2, sign);
-	  wide_int min1 = wide_int::from (vr1.min, prec2, sign);
-	  wide_int max1 = wide_int::from (vr1.max, prec2, sign);
+	  vrp_int min0 = wi::to_vrp (vr0.min);
+	  vrp_int max0 = wi::to_vrp (vr0.max);
+	  vrp_int min1 = wi::to_vrp (vr1.min);
+	  vrp_int max1 = wi::to_vrp (vr1.max);
 
 	  /* Canonicalize the intervals.  */
 	  if (sign == UNSIGNED)
@@ -2644,17 +2650,17 @@ extract_range_from_binary_expr_1 (value_
 		}
 	}
 
-	  wide_int prod0 = min0 * min1;
-	  wide_int prod1 = min0 * max1;
-	  wide_int prod2 = max0 * min1;
-	  wide_int prod3 = max0 * max1;
+	  vrp_int prod0 = min0 * min1;
+	  vrp_int prod1 = min0 * max1;
+	  vrp_int prod2 = max0 * min1;
+	  vrp_int prod3 = max0 * max1;
 
 	  /* Sort the 4 products so that min is in prod0 and max is in
 	 prod3.  */
 	  /* min0min1 > max0max1 */
 	  if (wi::gts_p (prod0, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod0;
 	  prod0 = tmp;
 	}
@@ -2662,21 +2668,21 @@ extract_range_from_binary_expr_1 (value_
 	  /* min0max1 > max0min1 */
 	  if (wi::gts_p (prod1, prod2))
 	{
-	  wide_int tmp = prod2;
+	  vrp_int tmp = prod2;
 	  prod2 = prod1;
 	  prod1 = tmp;
 	}
 
 	  if (wi::gts_p (prod0, prod1))
 	{
-	  wide_int tmp = prod1;
+	  vrp_int tmp = prod1;
 	  prod1 = prod0;
 	  prod0 = tmp;
 	}
 
 	  if (wi::gts_p (prod2, prod3))
 	{
-	  wide_int tmp = prod3;
+	  vrp_int tmp = prod3;
 	  prod3 = prod2;
 	  prod2 = tmp;
 	}
Index: tree.h
===
--- tree.h	(revision 205597)
+++ tree.h	(working copy)
@@ -4565,10 +4565,12 @@ namespace wi
 static const unsigned int precision = N;
   };
 
-  generic_wide_int  >
+  generic_wide_int  >
   to_widest (const_tree);
 
   generic_wide_int  > to_offset (const_tree);
+
+  generic_wide_int  > to_vrp (const_tree);
 }
 
 inline unsigned int
@@ -4586,7 +4588,7 @@ wi::int_traits ::decompose (
 			  precision);
 }
 
-inline generic_wide_int  >
+inline generic_wide_int  >
 wi::to_widest (const_tree t)
 {
   return t;
@@ -4597,6 +4599,12 @@ wi::to_offset (const_tree t)
 {
   return t;
 }
+
+inline generic_wide_int  >
+wi::to_vrp (const_tree t)
+{
+  return t;
+}
 
 template 
 inline wi::extended_tree ::extended_tree (const_tree t)
Index: wide-int.h
===
--- wide-int.h	(r

[PING][PATCH][ARM]Use of vcvt for float to fixed point conversions.

2013-12-03 Thread Renlin Li

Hi Ramana/Richard,

Could you please help to review this patch?

Thanks in advance!

Kind regards,
Renlin Li

On 20/11/13 16:27, Renlin Li wrote:

Hi all,

This patch will make the arm back-end use vcvt for float to fixed 
point conversions when applicable.


Test on arm-none-linux-gnueabi has been done on the model.
Okay for trunk?


Kind regards,
Renlin Li


gcc/ChangeLog:

2013-11-20  Renlin Li  

* config/arm/arm-protos.h (vfp_const_double_for_bits): Declare.
* config/arm/constraints.md (Dp): Define new constraint.
* config/arm/predicates.md (const_double_vcvt_power_of_two): Define
new predicate.
* config/arm/arm.c (arm_print_operand): Add print for new function.
(vfp3_const_double_for_bits): New function.
* config/arm/vfp.md (combine_vcvtf2i): Define new instruction.

gcc/testsuite/ChangeLog:

2013-11-20  Renlin Li  

* gcc.target/arm/fixed_float_conversion.c: New test case.
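
For illustration, a hedged example of the kind of source the new
combine_vcvtf2i pattern is meant to catch (the actual test lives in
fixed_float_conversion.c):

/* Illustrative only.  Multiplying a float by a power of two is exact,
   so scaling and truncating to int is a fixed-point conversion and can
   be done by one vcvt with a #fbits operand instead of a vmul followed
   by a separate vcvt.  */
int
to_q16 (float x)
{
  return (int) (x * 65536.0f);   /* candidate for: vcvt.s32.f32 ..., #16 */
}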





Re: libsanitizer merge from upstream r196090

2013-12-03 Thread Jeff Law

On 12/03/13 08:47, Jakub Jelinek wrote:

On Tue, Dec 03, 2013 at 07:18:14PM +0400, Konstantin Serebryany wrote:

No, so your patch doesn't regress anything. I can configure with
--disable-libsanitizer to skip build of libsanitizer, although it
would be nice to support RHEL5 derived long-term distributions.

Ok, so this does not gate the merge.


Well, it regresses against 4.8, so it still is a P1 regression.


Does anyone care?


Of course.  First of all all users trying to compile gcc just to find out
it won't build out of the box.  Also users that were using asan just fine
on older platforms in gcc 4.8 and now they'll find out that the support
has been dropped.
Right.  We certainly care.  Those old platforms are still in widespread 
use (for better or worse).





Maintaining asan&co on older platforms has a cost, and we are not
willing to pay it;


Well, but then you just get approximately same cost if not higher in
maintaining configure bits for checking all the silent assumptions the code
makes and disabling libsanitizer in that case.
Right.  Fixing this problem and fixing it correctly will reduce the long 
term pain for both projects.





This won't happen any time soon, right?
I'd like to ask glibc for many other things, not just this, but the latency
of the glibc changes propagating to users is too low, so we don't
bother (although we should)
E.g. we've been hit by the ugly
pthread_getattr_np/pthread_attr_getstack multiple times.
:(


If you never ask for it, it will never be done, it is that simple.
glibc pushes out a release every six months which then gets put into the 
next Fedora release which usually follows a few months later.  We're 
talking about a worst case lag time of roughly 9 months, often much 
less.  It's probably similar for the SuSE community releases.


I think everyone who has had a bad experience working with glibc's team 
in the past should try to put it behind them.  The new team running 
glibc is much more reasonable to work with -- the biggest problem now is 
rebooting the project after years of mis-management.  The progress 
they've made towards that over the last year has been tremendous.


jeff




Re: [wide-int] small cleanup in wide-int.*

2013-12-03 Thread Richard Sandiford
Kenneth Zadeck  writes:
> Index: tree-vrp.c
> ===
> --- tree-vrp.c(revision 205597)
> +++ tree-vrp.c(working copy)
> @@ -2611,22 +2611,28 @@ extract_range_from_binary_expr_1 (value_
>  
>signop sign = TYPE_SIGN (expr_type);
>unsigned int prec = TYPE_PRECISION (expr_type);
> -  unsigned int prec2 = (prec * 2) + (sign == UNSIGNED ? 2 : 0);
>  
>if (range_int_cst_p (&vr0)
> && range_int_cst_p (&vr1)
> && TYPE_OVERFLOW_WRAPS (expr_type))
>   {
> -   wide_int sizem1 = wi::mask (prec, false, prec2);
> -   wide_int size = sizem1 + 1;
> +   /* vrp_int is twice as wide as anything that the target
> +  supports so it can support a full width multiply.  No
> +  need to add any more padding for an extra sign bit
> +  because that comes with the way that WIDE_INT_MAX_ELTS is
> +  defined.  */ 
> +   typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION * 2) 
> + vrp_int;
> +   vrp_int sizem1 = wi::mask  (prec, false);
> +   vrp_int size = sizem1 + 1;
>  
> /* Extend the values using the sign of the result to PREC2.
>From here on out, everthing is just signed math no matter
>what the input types were.  */
> -   wide_int min0 = wide_int::from (vr0.min, prec2, sign);
> -   wide_int max0 = wide_int::from (vr0.max, prec2, sign);
> -   wide_int min1 = wide_int::from (vr1.min, prec2, sign);
> -   wide_int max1 = wide_int::from (vr1.max, prec2, sign);
> +   vrp_int min0 = wi::to_vrp (vr0.min);
> +   vrp_int max0 = wi::to_vrp (vr0.max);
> +   vrp_int min1 = wi::to_vrp (vr1.min);
> +   vrp_int max1 = wi::to_vrp (vr1.max);

I think we should avoid putting to_vrp in tree.h if vrp_int is only
local to this block.  Instead you could have:

  typedef generic_wide_int
  > vrp_int_cst;
  ...
  vrp_int_cst min0 = vr0.min;
  vrp_int_cst max0 = vr0.max;
  vrp_int_cst min1 = vr1.min;
  vrp_int_cst max1 = vr1.max;

> @@ -228,15 +228,16 @@ along with GCC; see the file COPYING3.
>  #endif
>  
>  /* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
> -   early examination of the target's mode file.  Thus it is safe that
> -   some small multiple of this number is easily larger than any number
> -   that that target could compute.  The place in the compiler that
> -   currently needs the widest ints is the code that determines the
> -   range of a multiply.  This code needs 2n + 2 bits.  */
> -
> +   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
> +   can accomodate at least 1 more bit so that unsigned numbers of that
> +   mode can be represented.  This will accomodate every place in the
> +   compiler except for a multiply routine in tree-vrp.  That function
> +   makes its own arrangements for larger wide-ints.  */

I think we should drop the "This will accomodate..." bit, since it'll soon
get out of date.  Maybe something like:

Note that it is still possible to create fixed_wide_ints that have
precisions greater than MAX_BITSIZE_MODE_ANY_INT.  This can be useful
when representing a double-width multiplication result, for example.  */

>  #define WIDE_INT_MAX_ELTS \
> -  ((4 * MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1) \
> -   / HOST_BITS_PER_WIDE_INT)
> +  (((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT - 1)  \
> +/ HOST_BITS_PER_WIDE_INT) + 1)

I think this should be:

  (MAX_BITSIZE_MODE_ANY_INT / HOST_BITS_PER_WIDE_INT + 1)

We only need an extra HWI if MAX_BITSIZE_MODE_ANY_INT is an exact multiple
of HOST_BITS_PER_WIDE_INT.

Looks good to me otherwise FWIW.

You probably already realise this, but for avoidance of doubt, Richard
was also asking that we reduce MAX_BITSIZE_MODE_ANY_INT to 128 on x86_64,
since that's the largest scalar_mode_supported_p mode.

Thanks,
Richard
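
A worked check of the two forms of the macro, assuming 64-bit HOST_WIDE_INTs
(illustrative arithmetic only):

/* MAX_BITSIZE_MODE_ANY_INT == 128, an exact multiple of 64:
     ((128 + 64 - 1) / 64) + 1  ==  3
     (128 / 64) + 1             ==  3    -- same; the spare HWI is kept.

   MAX_BITSIZE_MODE_ANY_INT == 129, not a multiple of 64:
     ((129 + 64 - 1) / 64) + 1  ==  4
     (129 / 64) + 1             ==  3    -- rounding up to a whole HWI
                                            already leaves spare bits.  */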



[PATCH, PR 58253] Make IPA-SRA created PARM_DECLs always naturally aligned

2013-12-03 Thread Martin Jambor
Hi,

the patch below fixes a failure on the epiphany target where callers
and callees do not agree on registers for parameter passing because
they see different alignment of actual arguments and formal parameters
(there is some more information on this in bugzilla).  The actual
arguments are SSA names - created by force_gimple_operand_gsi in
ipa_modify_call_arguments - which are of a naturally aligned type
while formal parameters are PARM_DECLs - directly built in
ipa_modify_formal_parameters - of the types specified in
ipa_parm_adjustment_vec which may not be aligned.

Because we use the alignment of types in ipa_parm_adjustment_vec to
signal to ipa_modify_call_arguments that it needs to built unaligned
MEM_REFs, it is ipa_modify_formal_parameters that has to fix up the
PARM_DECLs in cases where callers will produce SSA_NAMES, i.e. when
the type is a gimple_register_type.  That's what the patch below
does.

Bootstrapped and tested on x86_64-linux, only a slightly different
patch also passed bootstrap on ppc64-linux and has been confirmed to
fix the problem on epiphany.  OK for trunk?

Thanks,

Martin


2013-11-28  Martin Jambor  

PR ipa/58253
* ipa-prop.c (ipa_modify_formal_parameters): Create decls of
non-BLKmode in their naturally aligned type.

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 712dab7..83dc53e 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3444,7 +3444,15 @@ ipa_modify_formal_parameters (tree fndecl, 
ipa_parm_adjustment_vec adjustments)
  if (adj->by_ref)
ptype = build_pointer_type (adj->type);
  else
-   ptype = adj->type;
+   {
+ ptype = adj->type;
+ if (is_gimple_reg_type (ptype))
+   {
+ unsigned malign = GET_MODE_ALIGNMENT (TYPE_MODE (ptype));
+ if (TYPE_ALIGN (ptype) < malign)
+   ptype = build_aligned_type (ptype, malign);
+   }
+   }
 
  if (care_for_types)
new_arg_types = tree_cons (NULL_TREE, ptype, new_arg_types);
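
A hedged illustration of the kind of mismatch involved (the types and names
below are made up; the epiphany case in the PR is analogous):

/* Illustrative only.  When IPA-SRA replaces a by-reference aggregate
   parameter with the scalar member that is actually used, the
   replacement's type can inherit a reduced alignment from a packed
   enclosing struct, roughly: */
struct __attribute__ ((packed)) s { char c; int i; };
typedef int int_align1 __attribute__ ((aligned (1)));  /* stand-in for that type */

/* The caller, however, passes an ordinary naturally aligned int SSA
   value, so on targets where alignment influences argument passing the
   two sides disagree.  Building the PARM_DECL with build_aligned_type,
   as in the hunk above, makes register-type parameters use
   GET_MODE_ALIGNMENT on both sides.  */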


[PATCH] Fix missing plugin headers

2013-12-03 Thread David Malcolm
I attempted to build gcc-python-plugin against gcc trunk (r205358
specifically).

I ran into three kinds of issues:
  (A) tracking down where declarations have landed in the new layout
  (B) adding new headers to the list of those to be installed
  (C) some porting due to "real" changes e.g. passes becoming C++
classes.

Attached is a patch for gcc trunk to add various headers to
PLUGIN_HEADERS (either directly, or indirectly via TREE_H), ensuring
that all of the needed headers are installed.

One detail is that some headers are target-specific: on i386 there are
the headers "stringop.def" and "x86-tune.def" needed by
config/i386/i386.h.

I initially tried fixing this by adding
  PLUGIN_HEADERS += config/i386/stringop.def config/i386/x86-tune.def
to gcc/config/i386/t-i386, but this didn't work, presumably because the:
   include $(tmake_file)
happens in line 1567 of my generated Makefile, whereas PLUGIN_HEADERS is
set up way further down at line 3127.  So we perhaps could reorder these
(I don't know if this leads to another issue), but instead I invented a
macro TARGET_PLUGIN_HEADERS, setting it up in t-i386 on this target, and
relying on an empty default expansion on other targets.

Successfully bootstrapped on x86_64-unknown-linux-gnu.

OK for trunk?

For reference, the corresponding patch to fixup gcc-python-plugin to
build against gcc 4.9 (or at least current trunk) can be seen here:
https://git.fedorahosted.org/cgit/gcc-python-plugin.git/commit/?id=903202a97f30f1cf10e4c57b70c31894f0c2cc97
(posting here since it might be helpful to other plugins).
commit 022bcca7a42be2cb794d378691b2044edc424e0d
Author: David Malcolm 
Date:   Mon Dec 2 14:45:35 2013 -0500

Add missing headers to PLUGIN_HEADERS

gcc/
	* Makefile.in (BUILTINS_DEF): Add chkp-builtins.def.
	(TREE_H): Add fold-const.h.
	(PLUGIN_HEADERS): Add gimple-iterator.h, gimple-walk.h, print-tree.h,
	context.h, pass_manager.h, pass-instances.def, and new macro
	TARGET_PLUGIN_HEADERS.

	* config/i386/t-i386 (TARGET_PLUGIN_HEADERS): New.

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 4d683a0..0522e2c 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -873,7 +873,8 @@ RTL_H = $(RTL_BASE_H) $(FLAGS_H) genrtl.h
 READ_MD_H = $(OBSTACK_H) $(HASHTAB_H) read-md.h
 PARAMS_H = params.h params.def
 BUILTINS_DEF = builtins.def sync-builtins.def omp-builtins.def \
-	gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def
+	gtm-builtins.def sanitizer.def cilkplus.def cilk-builtins.def \
+	chkp-builtins.def
 INTERNAL_FN_DEF = internal-fn.def
 INTERNAL_FN_H = internal-fn.h $(INTERNAL_FN_DEF)
 TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
@@ -882,7 +883,7 @@ TREE_CORE_H = tree-core.h coretypes.h all-tree.def tree.def \
 	$(VEC_H) treestruct.def $(HASHTAB_H) \
 	double-int.h alias.h $(SYMTAB_H) $(FLAGS_H) \
 	$(REAL_H) $(FIXED_VALUE_H)
-TREE_H = tree.h $(TREE_CORE_H)  tree-check.h
+TREE_H = tree.h $(TREE_CORE_H)  tree-check.h fold-const.h
 REGSET_H = regset.h $(BITMAP_H) hard-reg-set.h
 BASIC_BLOCK_H = basic-block.h $(PREDICT_H) $(VEC_H) $(FUNCTION_H) \
 	cfg-flags.def cfghooks.h
@@ -3117,7 +3118,9 @@ PLUGIN_HEADERS = $(TREE_H) $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
   cppdefault.h flags.h $(MD5_H) params.def params.h prefix.h tree-inline.h \
   $(GIMPLE_PRETTY_PRINT_H) realmpfr.h \
   $(IPA_PROP_H) $(TARGET_H) $(RTL_H) $(TM_P_H) $(CFGLOOP_H) $(EMIT_RTL_H) \
-  version.h stringpool.h
+  version.h stringpool.h gimple-iterator.h gimple-walk.h print-tree.h \
+  context.h pass_manager.h pass-instances.def \
+  $(TARGET_PLUGIN_HEADERS)
 
 # generate the 'build fragment' b-header-vars
 s-header-vars: Makefile
diff --git a/gcc/config/i386/t-i386 b/gcc/config/i386/t-i386
index 1a76c41..c7a9fdf 100644
--- a/gcc/config/i386/t-i386
+++ b/gcc/config/i386/t-i386
@@ -16,6 +16,8 @@
 # along with GCC; see the file COPYING3.  If not see
 # .
 
+TARGET_PLUGIN_HEADERS = config/i386/stringop.def config/i386/x86-tune.def
+
 i386-c.o: $(srcdir)/config/i386/i386-c.c i386-builtin-types.inc
 	  $(COMPILE) $<
 	  $(POSTCOMPILE)


[PATCH, rs6000] Committed fix for HTM macro name typos

2013-12-03 Thread Peter Bergner
Adhemerval noticed that the _TEXASR_INSTRUCTION_FETCH_CONFLICT and
_TEXASRU_INSTRUCTION_FETCH_CONFLICT HTM macros were misspelled.
I committed the following patch as obvious to fix them.

Peter

* config/rs6000/htmintrin.h (_TEXASR_INSTRUCTION_FETCH_CONFLICT): Fix
typo in macro name.
(_TEXASRU_INSTRUCTION_FETCH_CONFLICT): Likewise.

Index: gcc/config/rs6000/htmintrin.h
===
--- gcc/config/rs6000/htmintrin.h   (revision 205640)
+++ gcc/config/rs6000/htmintrin.h   (revision 205641)
@@ -99,9 +99,9 @@ typedef uintptr_t tfhar_t;
 #define _TEXASRU_IMPLEMENTAION_SPECIFIC(TEXASRU) \
   _TEXASRU_EXTRACT_BITS(TEXASRU, 15, 1)

-#define _TEXASR_INSRUCTION_FETCH_CONFLICT(TEXASR) \
+#define _TEXASR_INSTRUCTION_FETCH_CONFLICT(TEXASR) \
   _TEXASR_EXTRACT_BITS(TEXASR, 16, 1)
-#define _TEXASRU_INSRUCTION_FETCH_CONFLICT(TEXASRU) \
+#define _TEXASRU_INSTRUCTION_FETCH_CONFLICT(TEXASRU) \
   _TEXASRU_EXTRACT_BITS(TEXASRU, 16, 1)

 #define _TEXASR_ABORT(TEXASR) \




Re: [wide-int] small cleanup in wide-int.*

2013-12-03 Thread Mike Stump
On Dec 3, 2013, at 8:52 AM, Richard Sandiford  
wrote:
> You probably already realise this, but for avoidance of doubt, Richard
> was also asking that we reduce MAX_BITSIZE_MODE_ANY_INT to 128 on x86_64,
> since that's the largest scalar_mode_supported_p mode.

Personally, I'd love to see scalar_mode_supported_p be auto-generated from the 
gen*.c programs; then, we could just add a line to that program to synthesize 
this value.  Absent that, which I think is a nice long term direction to shoot 
for, we'd need yet another #define for the target to explain the largest 
supported size.

RE: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Bernd Edlinger
On Tue, 3 Dec 2013 16:14:49, Eric Botcazou wrote:
>
>> The patch does add a boolean "expand_reference" parameter to
>> expand_expr_real and expand_expr_real_1. I pass true when I intend to use
>> the returned memory context as an array reference, instead of a value. At
>> places where mis-aligned values are extracted, I do not return a register
>> with the extracted mis-aligned value if expand_reference is true. When I
>> have a VIEW_CONVERT_EXPR I pay attention to pass down the outer
>> "expand_reference" to the inner expand_expr_real call. Expand_reference, is
>> pretty much similar to the expand_modifier "EXPAND_MEMORY".
>
> IMO that's not acceptable as-is, you're papering over the real issue which are
> the out-of-bounds accesses in the code, i.e. you essentially have an object
> with variable size and non-BLKmode. The patch is too big an hammer for such
> an obvious lie to the middle-end. Moreover there is not a _single_ line of
> explanation in it.
>

OK, I should add some comments. Right.

> If we really want to go for a hack instead of fixing the underlying issue, it
> should be restricted to the cases where stor-layout.c goes wrong.
>

To me it does not feel like a hack at all, if the expansion needs to know in 
what
context the result will be used, if it is in an LHS or in a RHS or if the 
address of
the memory is needed.

To make that perfectly clear, I just want to help.

If you or Richard want to come forward with a better proposal, that's fine.

And even if I commit something, it can be reverted anytime, when we have
something better.

Thanks
Bernd

> --
> Eric Botcazou   

Re: [GOMP4] SIMD enabled function for C/C++

2013-12-03 Thread Aldy Hernandez

On 11/30/13 10:15, Iyer, Balaji V wrote:

Hello Jakub,
I was looking at my elemental function for C patch that I fixed up and 
sent as requested by Aldy, and I saw two changes there that were used for C and 
C++ and they were pretty obvious. Here are the changes. Can I just commit them?



OK as obvious.



Re: [patch] combine ICE fix

2013-12-03 Thread Mike Stump
On Dec 2, 2013, at 10:26 PM, Jeff Law  wrote:
> On 11/27/13 17:13, Cesar Philippidis wrote:
>> 
>> I looked into adding support for incremental DF scanning from df*.[ch]
>> in combine but there are a couple of problems. First of all, combine
>> does its own DF analysis. It does so because its usage falls under this
>> category (df-core.c):
>> 
>>c) If the pass modifies insns several times, this incremental
>>   updating may be expensive.
>> 
>> Furthermore, combine's DF relies on the DF scanning to be deferred, so
>> the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED
>> take place before it updates the notes for those insns. Also, combine
>> has a tendency to undo its changes occasionally.
> I think at this stage of the release cycle, converting combine to incremental 
> DF is probably a no-go.  However, we should keep it in mind for the future -- 
> while hairy I'd really like to see that happen in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure he sees it.

Re: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Jeff Law

On 12/03/13 11:40, Bernd Edlinger wrote:




To me it does not feel like a hack at all, if the expansion needs to know in 
what
context the result will be used, if it is in an LHS or in a RHS or if the 
address of
the memory is needed.
Expansion needs some context and it's currently passed into expand_expr 
via a variety of methods.  Most are documented at the start of 
expand_expr_real.  It's unclean, but that doesn't mean we should make it 
worse.  Adding more context is something we should think long and hard 
about.


Eric's comment that we have a non-BLKmode object with a variable size I 
think is critical.  I'm pretty sure that's not a well-defined object. 
Given a variable sized object that is not BLKmode, how can we reliably 
get its size (you can't, that's part of the reason why BLKmode exists)


I'd pull on that thread.



To make that perfectly clear, I just want to help.

Understood and its greatly appreciated.



And even if I commit something, it can be reverted anytime, when we have
something better.
We try to get it right the first time :-)  Reverting bits is undesirable 
and virtually impossible if it gets out into a release.


jeff



Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Jeff Law

On 12/03/13 08:52, Yufeng Zhang wrote:


I still don't like it.  It's using the wrong and too expensive tools
to do
stuff.  What kind of bases are we ultimately interested in?  Browsing
the code it looks like we're having

   /* Base expression for the chain of candidates:  often, but not
  always, an SSA name.  */
   tree base_expr;

which isn't really too informative but I suppose they are all
kind-of-gimple_val()s?  That said, I wonder if you can simply
use get_addr_base_and_unit_offset in place of get_alternative_base (),
ignoring the returned offset.


'base_expr' is essentially the base address of a handled_component_p,
e.g. ARRAY_REF, COMPONENT_REF, etc.  In most cases, it is the address of
the object returned by get_inner_reference ().

Given a test case like the following:

typedef int arr_2[20][20];

void foo (arr_2 a2, int i, int j)
{
   a2[i+10][j] = 1;
   a2[i+10][j+1] = 1;
   a2[i+20][j] = 1;
}

The IR before SLSR is (on x86_64):

   _2 = (long unsigned int) i_1(D);
   _3 = _2 * 80;
   _4 = _3 + 800;
   _6 = a2_5(D) + _4;
   *_6[j_8(D)] = 1;
   _10 = j_8(D) + 1;
   *_6[_10] = 1;
   _12 = _3 + 1600;
   _13 = a2_5(D) + _12;
   *_13[j_8(D)] = 1;

The base_expr for the 1st and 2nd memory references is the same, i.e.
_6, while the base_expr for a2[i+20][j] is _13.

_13 is essentially (_6 + 800), so all of the three memory references
essentially share the same base address.  As their strides are also the
same (MULT_EXPR (j, 4)), the three references can all be lowered to
MEM_REFs.  What this patch does is to use the tree affine tools to help
recognize the underlying base address expression; as it requires looking
into the definitions of SSA_NAMEs, get_addr_base_and_unit_offset ()
won't help here.

Bill has helped me explore other ways of achieving this in SLSR, but so
far we think this is the best way to proceed.  The use of tree affine
routines has been restricted to CAND_REFs only and there is the
aforementioned cache facility to help reduce the overhead.
Right and I think Bill's opinions should carry the weight here since he 
wrote the SLSR code and will likely be its maintainer.


So let's keep the cache per your data.  OK for the trunk.  Thanks for 
your patience and understanding.


jeff



Re: [PATCH, ia64] [PR target/52731] internal compiler error: in ia64_st_address_bypass_p, at config/ia64/ia64.c:9357

2013-12-03 Thread Steve Ellcey
On Fri, 2013-11-29 at 15:01 +0300, Kirill Yukhin wrote:
> Hello,
> On 20 Nov 18:37, Kirill Yukhin wrote:
> > Hello,
> > Patch in the bottom fixes PR52731.
> > Is it ok for trunk?
> Ping?
> 
> --
> Thanks, K

OK.

Steve Ellcey
sell...@mips.com




Re: [patch] combine ICE fix

2013-12-03 Thread Kenneth Zadeck

On 12/03/2013 01:52 PM, Mike Stump wrote:

On Dec 2, 2013, at 10:26 PM, Jeff Law  wrote:

On 11/27/13 17:13, Cesar Philippidis wrote:

I looked into adding support for incremental DF scanning from df*.[ch]
in combine but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under this
category (df-core.c):

c) If the pass modifies insns several times, this incremental
   updating may be expensive.

Furthermore, combine's DF relies on the DF scanning to be deferred, so
the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED
take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

I think at this stage of the release cycle, converting combine to incremental 
DF is probably a no-go.  However, we should keep it in mind for the future -- 
while hairy I'd really like to see that happen in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure he sees it.
It is the tendency to undo things (I would use the word "frequently"
rather than "occasionally") that kept me from doing this years ago.


kenny


Re: [patch] combine ICE fix

2013-12-03 Thread Jeff Law

On 12/03/13 12:25, Kenneth Zadeck wrote:

On 12/03/2013 01:52 PM, Mike Stump wrote:

On Dec 2, 2013, at 10:26 PM, Jeff Law  wrote:

On 11/27/13 17:13, Cesar Philippidis wrote:

I looked into adding support for incremental DF scanning from df*.[ch]
in combine but there are a couple of problems. First of all, combine
does its own DF analysis. It does so because its usage falls under this
category (df-core.c):

c) If the pass modifies insns several times, this incremental
   updating may be expensive.

Furthermore, combine's DF relies on the DF scanning to be deferred, so
the DF_REF_DEF_COUNT values would be off. Eg, calls to SET_INSN_DELETED
take place before it updates the notes for those insns. Also, combine
has a tendency to undo its changes occasionally.

I think at this stage of the release cycle, converting combine to
incremental DF is probably a no-go.  However, we should keep it in
mind for the future -- while hairy I'd really like to see that happen
in the long term.

I think Kenny has some thoughts in this area.  I'll cc him to ensure
he sees it.

It is the tendency to undo things (I would use the word "frequently"
rather than "occasionally") that kept me from doing this years ago.
Shove a bunch of things together, simplify, then try to recognize the 
result.  If that fails, undo everything.


In theory, this could be replaced by making a copy of the original,
doing the combination/simplification, then recognition.  If successful,
then update DF and remove the original I3; if not successful, drop the
copy.  That avoids the undo nonsense.
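
As a rough illustration of that idea, here is a sketch with made-up stand-in
types and helpers; nothing below is combine.c code:

/* Try the combination on a throw-away copy; commit only if the result
   is recognizable, so nothing ever needs to be undone.  */
#include <stdbool.h>

struct insn { int pattern[4]; };                  /* stand-in for an RTL insn */

static bool simplify (struct insn *trial)         /* stand-in for simplification */
{
  trial->pattern[0] |= trial->pattern[1];
  return true;
}

static bool recognize (const struct insn *trial)  /* stand-in for recog */
{
  return trial->pattern[0] != 0;
}

static bool
try_combine_on_copy (struct insn *i3)
{
  struct insn trial = *i3;                        /* work on a copy of I3 */
  if (simplify (&trial) && recognize (&trial))
    {
      *i3 = trial;        /* commit; this is where DF would be updated */
      return true;
    }
  return false;           /* drop the copy; the original stays untouched */
}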


jeff


Re: [PATCH, PR 58253] Make IPA-SRA created PARM_DECLs always naturally aligned

2013-12-03 Thread Jeff Law

On 12/03/13 09:53, Martin Jambor wrote:

Hi,

the patch below fixes a failure on the epiphany target where callers
and callees do not agree on registers for parameter passing because
they see different alignment of actual arguments and formal parameters
(there is some more information on this in bugzilla).  The actual
arguments are SSA names - created by force_gimple_operand_gsi in
ipa_modify_call_arguments - which are of a naturally aligned type
while formal parameters are PARM_DECLs - directly built in
ipa_modify_formal_parameters - of the types specified in
ipa_parm_adjustment_vec which may not be aligned.

Because we use the alignment of types in ipa_parm_adjustment_vec to
signal to ipa_modify_call_arguments that it needs to build unaligned
MEM_REFs, it is ipa_modify_formal_parameters that has to fix up the
PARM_DECLs in cases where callers will produce SSA_NAMES, i.e. when
the type is a gimple_register_type.  That's what the patch below
does.
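
For a concrete picture, here is a hypothetical example of the kind of
situation being described (not a reduced epiphany testcase):

/* IPA-SRA can replace the pointer parameter of the local callee with the
   scalar value of p->i; because of the packed layout that field is not
   naturally aligned, which is where callers and callees can end up with
   different views of the parameter's alignment.  */
struct __attribute__ ((packed)) s
{
  char c;
  int i;
};

static int __attribute__ ((noinline))
callee (struct s *p)
{
  return p->i;                  /* only the field is used */
}

int
caller (struct s *p)
{
  return callee (p);            /* after IPA-SRA: effectively callee (p->i) */
}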

Bootstrapped and tested on x86_64-linux, only a slightly different
patch also passed bootstrap on ppc64-linux and has been confirmed to
fix the problem on epiphany.  OK for trunk?

Thanks,

Martin


2013-11-28  Martin Jambor  

PR ipa/58253
* ipa-prop.c (ipa_modify_formal_parameters): Create decls of
non-BLKmode in their naturally aligned type.

OK for the trunk.
jeff



Re: [PATCH] Use DW_LANG_D for D

2013-12-03 Thread Cary Coutant
> This patches gen_compile_unit_die to use the DW_LANG_D DWARF language
> code for D.  It is related to some other D language fixes that are
> going to be submitted to gdb.

Is this for a private front end? I'm not aware of any front ends that
set the language name to "GNU D".

Since it's so trivial, though, I have no problem with this patch for
Stage 3 -- if you do have a separate front end that sets that language
string, then it's arguably a bug fix. If this patch is preparation for
more substantial changes to the GCC tree, however, I suspect you're
going to need to wait for Stage 1 to reopen anyway.

So, if this is a standalone patch, it's OK, but you also need a ChangeLog entry.

-cary


Re: [PATCH] _Cilk_for for C and C++

2013-12-03 Thread Jeff Law

On 12/03/13 06:25, Iyer, Balaji V wrote:



I understand you both now. Let me look into the OMP routines, see what they
are doing, and see how I can port it to _Cilk_for.
Thanks.  I know it's a bit of a pain, but part of what's driving the desire 
to share is to reduce the long term maintenance cost for everyone.


jeff


Re: [PATCH] Add reference binding instrumentation

2013-12-03 Thread Marek Polacek
On Tue, Nov 19, 2013 at 10:07:00AM -0500, Jason Merrill wrote:
> On 11/18/2013 11:39 AM, Marek Polacek wrote:
> >+init = fold_build2 (COMPOUND_EXPR, TREE_TYPE (init),
> >+ubsan_instrument_reference (input_location, init),
> >+init);
> 
> This looks like it will evaluate init twice.

You're right.  I wanted to use cp_save_expr and/or stabilize_expr, but
that didn't work out.  So I resorted to restricting the condition a bit
and only passing INDIRECT_REFs to the ubsan routine (which, after all,
has
  if (!INDIRECT_REF_P (init))
    return init;
at its start).  In that case, it seems we don't have to worry about multiple evaluation
of the initializer.
Nevertheless, I added a test that checks that we don't evaluate the
initializer twice.

BTW, we still detect e.g.
  int *p = 0;
  int &qr = ++*p;
while clang doesn't detect it at all.

So, ok for trunk?  Ubsan testsuite passes.

2013-12-03  Marek Polacek  

* c-family/c-ubsan.h (ubsan_instrument_reference): Declare.
* c-family/c-ubsan.c (ubsan_instrument_reference): New function.
* cp/decl.c (cp_finish_decl): Instrument reference bindings whose
initializer is an INDIRECT_REF.
* testsuite/g++.dg/ubsan/null-1.C: New test.

--- gcc/c-family/c-ubsan.h.mp2  2013-12-03 19:07:28.382466550 +0100
+++ gcc/c-family/c-ubsan.h  2013-12-03 19:09:02.245819384 +0100
@@ -25,5 +25,6 @@ extern tree ubsan_instrument_division (l
 extern tree ubsan_instrument_shift (location_t, enum tree_code, tree, tree);
 extern tree ubsan_instrument_vla (location_t, tree);
 extern tree ubsan_instrument_return (location_t);
+extern tree ubsan_instrument_reference (location_t, tree);
 
 #endif  /* GCC_C_UBSAN_H  */
--- gcc/c-family/c-ubsan.c.mp2  2013-12-03 19:07:33.437485687 +0100
+++ gcc/c-family/c-ubsan.c  2013-12-03 19:08:15.649643663 +0100
@@ -190,3 +190,30 @@ ubsan_instrument_return (location_t loc)
   tree t = builtin_decl_explicit (BUILT_IN_UBSAN_HANDLE_MISSING_RETURN);
   return build_call_expr_loc (loc, t, 1, build_fold_addr_expr_loc (loc, data));
 }
+
+/* Instrument reference binding, that is, ensure that the reference
+   declaration doesn't bind the reference to a NULL pointer.  */
+
+tree
+ubsan_instrument_reference (location_t loc, tree init)
+{
+  if (!INDIRECT_REF_P (init))
+/* This may happen, e.g. int &&r4 = p;, so don't put an assert here.  */
+return init;
+
+  init = TREE_OPERAND (init, 0);
+  tree eq_expr = fold_build2 (EQ_EXPR, boolean_type_node, init,
+ build_zero_cst (TREE_TYPE (init)));
+  const struct ubsan_mismatch_data m
+= { build_zero_cst (pointer_sized_int_node),
+   build_int_cst (unsigned_char_type_node, UBSAN_REF_BINDING)};
+  tree data = ubsan_create_data ("__ubsan_null_data",
+loc, &m,
+ubsan_type_descriptor (TREE_TYPE (init),
+   true), NULL_TREE);
+  data = build_fold_addr_expr_loc (loc, data);
+  tree fn = builtin_decl_implicit (BUILT_IN_UBSAN_HANDLE_TYPE_MISMATCH);
+  fn = build_call_expr_loc (loc, fn, 2, data,
+   build_zero_cst (pointer_sized_int_node));
+  return fold_build3 (COND_EXPR, void_type_node, eq_expr, fn, void_zero_node);
+}
--- gcc/cp/decl.c.mp2   2013-12-03 19:07:38.678505371 +0100
+++ gcc/cp/decl.c   2013-12-03 20:40:34.649989640 +0100
@@ -6245,6 +6245,12 @@ cp_finish_decl (tree decl, tree init, bo
  if (decl_maybe_constant_var_p (decl))
TREE_CONSTANT (decl) = 1;
}
+  if (flag_sanitize & SANITIZE_NULL
+ && TREE_CODE (type) == REFERENCE_TYPE
+ && INDIRECT_REF_P (init))
+   init = fold_build2 (COMPOUND_EXPR, TREE_TYPE (init),
+   ubsan_instrument_reference (input_location, init),
+   init);
 }
 
   if (processing_template_decl)
--- gcc/testsuite/g++.dg/ubsan/null-1.C.mp2 2013-11-22 16:16:05.073181508 +0100
+++ gcc/testsuite/g++.dg/ubsan/null-1.C 2013-12-03 19:23:48.351305205 +0100
@@ -0,0 +1,32 @@
+// { dg-do run }
+// { dg-options "-fsanitize=null -Wall -Wno-unused-variable -std=c++11" }
+// { dg-skip-if "" { *-*-* } { "-flto" } { "" } }
+
+typedef const long int L;
+
+int
+main (void)
+{
+  int *p = 0;
+  L *l = 0;
+
+  int &r = *p;
+  auto &r2 = *p;
+  L &lr = *l;
+
+  /* Try an rvalue reference.  */
+  auto &&r3 = *p;
+
+  /* Don't evaluate the reference initializer twice.  */
+  int i = 1;
+  int *q = &i;
+  int &qr = ++*q;
+  if (i != 2)
+__builtin_abort ();
+  return 0;
+}
+
+// { dg-output "reference binding to null pointer of type 'int'(\n|\r\n|\r)" }
+// { dg-output "\[^\n\r]*reference binding to null pointer of type 
'int'(\n|\r\n|\r)" }
+// { dg-output "\[^\n\r]*reference binding to null pointer of type 'const 
L'(\n|\r\n|\r)" }
+// { dg-output "\[^\n\r]*reference binding to null pointer of type 
'int'(\n|\r\n|\r)" }

Marek


Re: [RFC][LIBGCC][2 of 2] 64 bit divide implementation for processor without hw divide instruction

2013-12-03 Thread Jeff Law

On 12/02/13 23:39, Kugan wrote:


+2013-11-27  Kugan Vivekanandarajah  
+
+   * config/arm/bpabi-lib.h (TARGET_HAS_NO_HW_DIVIDE): Define for
+   architectures that do not have a hardware divide instruction,
+   i.e. architectures that do not define __ARM_ARCH_EXT_IDIV__.
+


Is this OK for trunk now?
Yes, this part is fine too.  AFAICT, it just implements what Richard E. 
suggested ;-)


jeff


Re: [PATCH] Allow compounds with empty initializer in pedantic mode (PR c/59351)

2013-12-03 Thread Jeff Law

On 12/02/13 13:10, Marek Polacek wrote:

We triggered an assert on the attached testcase, because when building the
compound literal with empty initial value complete_array_type returns
3, but we assert it returns 0.  It returns 3 only in pedantic mode,
where empty initializer braces are forbidden.  Since we already gave
a warning, I think we could loosen the assert a bit and allow
empty initial values at that point.  sizeof on such compound literal
then yields zero, which I think is correct.
The assert exists even in GCC 4.0.
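
For reference, a minimal example of the construct in question (hypothetical,
not the actual pr59351.c testcase):

/* Empty initializer braces are a GNU extension, so -pedantic warns here,
   and complete_array_type reports the "empty braces" case that used to
   trip the assert in build_compound_literal.  */
int
main (void)
{
  int *p = (int []) {};                 /* compound literal, empty initializer */
  (void) p;
  return (int) sizeof ((int []) {});    /* sizeof such a literal is 0 */
}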

Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8 and
perhaps even 4.7?

2013-12-02  Marek Polacek  

PR c/59351
c/
* c-decl.c (build_compound_literal): Allow compound literals with
empty initial value.
testsuite/
* gcc.dg/pr59351.c: New test.
I was going to ask for some details about where we detect and warn, so that
I could review that code as well, but given that your test verifies the
warning, we're good to go.


OK for the trunk.  Branch maintainers have final say for their branches.

Thanks,

Jeff



Re: [PATCH] SIMD clones LTO fixes part 1 (PR lto/59326)

2013-12-03 Thread Jeff Law

On 11/28/13 15:58, Jakub Jelinek wrote:
 +  for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (expr)]; i++)

+stream_write_tree (ob, OMP_CLAUSE_OPERAND (expr, i), ref_p);
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+{
+  /* We don't stream these right now, handle it if streaming
+of them is needed.  */
+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) == NULL);
+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) == NULL);
+}
ISTM we will need to stream these.  Is there some reason to think they 
won't be needed?  Otherwise it looks good.


If you've got a good reason to believe these won't be needed, then it's 
OK as-is.  If not, I'd like to see this stuff handled too.




jeff


Re: [PATCH] SIMD clones LTO fixes part 2 (PR lto/59326)

2013-12-03 Thread Jeff Law

On 11/28/13 16:00, Jakub Jelinek wrote:

Hi!

And here is second part of the fixes.  Still, the vect-simd-clone-12.c
testcase fails with -flto -flto-partition=1to1, so there is further work to
do, but at least all current test succeed and actually use SIMD elementals
when they should.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2013-11-28  Jakub Jelinek  
Richard Biener  

PR lto/59326
* omp-low.c (simd_clone_create): Return NULL if for definition
!cgraph_function_with_gimple_body_p (old_node).  Call cgraph_get_body
before calling cgraph_function_versioning.
(expand_simd_clones): Look for "omp declare simd" attribute first.
Don't check targetm.simd_clone.compute_vecsize_and_simdlen here.
Punt if node->global.inlined_to.
(pass_omp_simd_clone::gate): Also enable if flag_ltrans.  Disable
pass if targetm.simd_clone.compute_vecsize_and_simdlen is NULL.
* lto-streamer-out.c (hash_tree): Handle OMP_CLAUSE.
lto/
* lto.c (compare_tree_sccs_1): Handle OMP_CLAUSE.
testsuite/
* gcc.dg/vect/vect-simd-clone-12.c: New test.
* gcc.dg/vect/vect-simd-clone-12a.c: New test.
* gcc.dg/vect/vect-simd-clone-10a.c: Remove extern keywords.

OK.

Jeff



Re: [PATCH] Allow compounds with empty initializer in pedantic mode (PR c/59351)

2013-12-03 Thread Marek Polacek
On Tue, Dec 03, 2013 at 01:03:43PM -0700, Jeff Law wrote:
> On 12/02/13 13:10, Marek Polacek wrote:
> >We triggered an assert on attached testcase, because when building the
> >compound literal with empty initial value complete_array_type returns
> >3, but we assert it returns 0.  It returns 3 only in the pedantic mode,
> >where empty initializer braces are forbidden.  Since we already gave
> >a warning, I think we could loosen the assert a bit and allow
> >empty initial values at that point.  sizeof on such compound literal
> >then yields zero, which I think is correct.
> >The assert exists even in GCC 4.0.
> >
> >Regtested/bootstrapped on x86_64-linux, ok for trunk and 4.8 and
> >perhaps even 4.7?
> >
> >2013-12-02  Marek Polacek  
> >
> > PR c/59351
> >c/
> > * c-decl.c (build_compound_literal): Allow compound literals with
> > empty initial value.
> >testsuite/
> > * gcc.dg/pr59351.c: New test.
> I was going to ask for some details about where we detect and warn
> to review that code as well, but given your test verifies the
> warning as well, we're good to go.

Yeah, the patch should really only fix the ICE.

> OK for the trunk.  Branch maintainers have final say for their branches.

Note that I already put this into trunk and after ack by Jakub into
4.7/4.8 branches as well. 

Marek


Re: [PATCH] Masked load/store vectorization (take 6)

2013-12-03 Thread Jeff Law

On 11/28/13 16:09, Jakub Jelinek wrote:

On Wed, Nov 27, 2013 at 04:10:16PM +0100, Richard Biener wrote:

As you pinged this ... can you re-post a patch with changelog that
includes the followups as we decided?


Ok, here is the updated patch against latest trunk with the follow-ups
incorporated.  Bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?
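
For context, the kind of loop this work targets looks roughly like the
following (illustrative only, not taken from the patch or its testcases):

/* The conditional store is if-converted into a MASK_STORE internal call
   and can then be vectorized on targets that provide the new
   maskload/maskstore expanders (e.g. with -O3 -mavx2).  */
void
f (int *a, const int *b, int n)
{
  int i;
  for (i = 0; i < n; i++)
    if (b[i] > 0)
      a[i] = b[i];
}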

2013-11-28  Jakub Jelinek  

* tree-vectorizer.h (struct _loop_vec_info): Add scalar_loop field.
(LOOP_VINFO_SCALAR_LOOP): Define.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
* config/i386/sse.md (maskload, maskstore): New expanders.
* tree-data-ref.c (struct data_ref_loc_d): Replace pos field with ref.
(get_references_in_stmt): Don't record operand addresses, but
operands themselves.  Handle MASK_LOAD and MASK_STORE.
(find_data_references_in_stmt, graphite_find_data_references_in_stmt):
Adjust for the pos -> ref change.
* internal-fn.def (LOOP_VECTORIZED, MASK_LOAD, MASK_STORE): New
internal fns.
* tree-if-conv.c: Include target.h, expr.h, optabs.h and
tree-ssa-address.h.
(release_bb_predicate): New function.
(free_bb_predicate): Use it.
(reset_bb_predicate): Likewise.  Don't unallocate bb->aux
just to immediately allocate it again.
(if_convertible_phi_p): Add any_mask_load_store argument, if true,
handle it like flag_tree_loop_if_convert_stores.
(insert_gimplified_predicates): Likewise.  If bb dominates
loop->latch, call reset_bb_predicate.
(ifcvt_can_use_mask_load_store): New function.
(if_convertible_gimple_assign_stmt_p): Add any_mask_load_store
argument, check if some conditional loads or stores can't be
converted into MASK_LOAD or MASK_STORE.
(if_convertible_stmt_p): Add any_mask_load_store argument,
pass it down to if_convertible_gimple_assign_stmt_p.
(predicate_bbs): Don't return bool, only check if the last stmt
of a basic block is GIMPLE_COND and handle that.  For basic blocks
that dominate loop->latch assume they don't need to be predicated.
(if_convertible_loop_p_1): Only call predicate_bbs if
flag_tree_loop_if_convert_stores and free_bb_predicate in that case
afterwards, check gimple_code of stmts here.  Replace is_predicated
check with dominance check.  Add any_mask_load_store argument,
pass it down to if_convertible_stmt_p and if_convertible_phi_p,
call if_convertible_phi_p only after all if_convertible_stmt_p
calls.
(if_convertible_loop_p): Add any_mask_load_store argument,
pass it down to if_convertible_loop_p_1.
(predicate_mem_writes): Emit MASK_LOAD and/or MASK_STORE calls.
(combine_blocks): Add any_mask_load_store argument, pass
it down to insert_gimplified_predicates and call predicate_mem_writes
if it is set.  Call predicate_bbs.
(version_loop_for_if_conversion): New function.
(tree_if_conversion): Adjust if_convertible_loop_p and combine_blocks
calls.  Return todo flags instead of bool, call
version_loop_for_if_conversion if if-conversion should be just
for the vectorized loops and nothing else.
(main_tree_if_conversion): Adjust caller.  Don't call
tree_if_conversion for dont_vectorize loops if if-conversion
isn't explicitly enabled.
* tree-vect-data-refs.c (vect_check_gather): Handle
MASK_LOAD/MASK_STORE.
(vect_analyze_data_refs, vect_supportable_dr_alignment): Likewise.
* gimple.h (gimple_expr_type): Handle MASK_STORE.
* internal-fn.c (expand_LOOP_VECTORIZED, expand_MASK_LOAD,
expand_MASK_STORE): New functions.
* tree-vectorizer.c: Include tree-cfg.h and gimple-fold.h.
(vect_loop_vectorized_call, vect_loop_select): New functions.
(vectorize_loops): Don't try to vectorize loops with
loop->dont_vectorize set.  Set LOOP_VINFO_SCALAR_LOOP for if-converted
loops, fold LOOP_VECTORIZED internal call depending on if loop
has been vectorized or not.  Use vect_loop_select to attempt to
vectorize an if-converted loop before it's non-if-converted
counterpart.  If outer loop vectorization is successful in that
case, ensure the loop in the soon to be dead non-if-converted loop
is not vectorized.
* tree-vect-loop-manip.c (slpeel_duplicate_current_defs_from_edges):
New function.
(slpeel_tree_duplicate_loop_to_edge_cfg): Add scalar_loop argument.
If non-NULL, copy basic blocks from scalar_loop instead of loop, but
still to loop's entry or exit edge.
(slpeel_tree_peel_loop_to_edge): Add scalar_loop argument, pass it
down to slpeel_tree_duplicate_loop_to_edge_cfg.
(vect_do_peeling_for_loop_bound, vect_do_peeling_for_loop_ali

Re: [PATCH] SIMD clones LTO fixes part 1 (PR lto/59326)

2013-12-03 Thread Jakub Jelinek
On Tue, Dec 03, 2013 at 01:11:08PM -0700, Jeff Law wrote:
> On 11/28/13 15:58, Jakub Jelinek wrote:
>  +  for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (expr)]; i++)
> >+stream_write_tree (ob, OMP_CLAUSE_OPERAND (expr, i), ref_p);
> >+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
> >+{
> >+  /* We don't stream these right now, handle it if streaming
> >+ of them is needed.  */
> >+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) == NULL);
> >+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) == NULL);
> >+}
> ISTM we will need to stream these.  Is there some reason to think
> they won't be needed?  Otherwise it looks good.
> 
> If you've got a good reason to believe these won't be needed, then
> it's OK as-is.  If not, I'd like to see this stuff handled too.

Normally omp clauses disappear during omp expansion; right now the only
clauses the LTO streamer sees are those for #pragma omp declare simd,
and those are never reductions.  I chose to implement streaming of all
omp clauses rather than just the subset of them that can appear in the
"omp declare simd" attribute argument; just streaming of the above
gimple seqs would be too much work for something that isn't needed.
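
To illustrate, the clauses that do reach the streamer come from declarations
of this shape (hypothetical example, not from the testsuite):

/* Clauses hanging off the "omp declare simd" attribute survive into LTO
   streaming; reduction clauses never show up here.  */
#pragma omp declare simd simdlen(8) uniform(scale) linear(i:1) notinbranch
extern float scale_elem (const float *v, int i, float scale);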

Jakub


Re: [PATCH] SIMD clones LTO fixes part 1 (PR lto/59326)

2013-12-03 Thread Jeff Law

On 12/03/13 13:14, Jakub Jelinek wrote:

On Tue, Dec 03, 2013 at 01:11:08PM -0700, Jeff Law wrote:

On 11/28/13 15:58, Jakub Jelinek wrote:
  +  for (i = 0; i < omp_clause_num_ops[OMP_CLAUSE_CODE (expr)]; i++)

+stream_write_tree (ob, OMP_CLAUSE_OPERAND (expr, i), ref_p);
+  if (OMP_CLAUSE_CODE (expr) == OMP_CLAUSE_REDUCTION)
+{
+  /* We don't stream these right now, handle it if streaming
+of them is needed.  */
+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_INIT (expr) == NULL);
+  gcc_assert (OMP_CLAUSE_REDUCTION_GIMPLE_MERGE (expr) == NULL);
+}

ISTM we will need to stream these.  Is there some reason to think
they won't be needed?  Otherwise it looks good.

If you've got a good reason to believe these won't be needed, then
it's OK as-is.  If not, I'd like to see this stuff handled too.


Normally omp clauses disappear during omp expansion; right now the only
clauses the LTO streamer sees are those for #pragma omp declare simd,
and those are never reductions.  I chose to implement streaming of all
omp clauses rather than just the subset of them that can appear in the
"omp declare simd" attribute argument; just streaming of the above
gimple seqs would be too much work for something that isn't needed.
So the reductions are handled (expanded) prior to streaming out and we 
don't have to attach them back to something when we stream in?


Jeff


Re: [PING] [PATCH] Optional alternative base_expr in finding basis for CAND_REFs

2013-12-03 Thread Richard Biener
Yufeng Zhang  wrote:
>On 12/03/13 14:20, Richard Biener wrote:
>> On Tue, Dec 3, 2013 at 1:50 PM, Yufeng Zhang 
>wrote:
>>> On 12/03/13 06:48, Jeff Law wrote:

 On 12/02/13 08:47, Yufeng Zhang wrote:
>
> Ping~
>
> http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03360.html


>
> Thanks,
> Yufeng
>
> On 11/26/13 15:02, Yufeng Zhang wrote:
>>
>> On 11/26/13 12:45, Richard Biener wrote:
>>>
>>> On Thu, Nov 14, 2013 at 12:25 AM, Yufeng
>>> Zhang wrote:

 On 11/13/13 20:54, Bill Schmidt wrote:
>
> The second version of your original patch is ok with me with
>the
> following changes.  Sorry for the little side adventure into
>the
> next-interp logic; in the end that's going to hurt more than
>it
> helps in
> this case.  Thanks for having a look at it, anyway.  Thanks
>also for
> cleaning up this version to be less intrusive to common
>interfaces; I
> appreciate it.



 Thanks a lot for the review.  I've attached an updated patch
>with the
 suggested changes incorporated.

 For the next-interp adventure, I was quite happy to do the
 experiment; it's
 a good chance of gaining insight into the pass.  Many thanks
>for
 your prompt
 replies and patience in guiding!


> Everything else looks OK to me.  Please ask Richard for final
> approval,
> as I'm not a maintainer.

 First a note, I need to check on voting for Bill as the slsr
>maintainer
 from the steering committee.   Voting was in progress just before
>the
 close of stage1 development so I haven't tallied the results :-)
>>>
>>>
>>> Looking forward to some good news! :)
>>>
>>>
>>
>> Yes, you are right about the non-trivial 'base' tree are rarely
>shared.
>>  The cached is introduced mainly because get_alternative_base
>() may
>> be
>> called twice on the same 'base' tree, once in the
>> find_basis_for_candidate () for look-up and the other time in
>> alloc_cand_and_find_basis () for record_potential_basis ().  I'm
>happy
>> to leave out the cache if you think the benefit is trivial.

 Without some sense of how expensive the lookups are vs how often
>the
 cache hits it's awful hard to know if the cache is worth it.

 I'd say take it out unless you have some sense it's really saving
>time.
 It's a pretty minor implementation detail either way.
>>>
>>>
>>> I think the affine tree routines are generally expensive; it is
>worth having
>>> a cache to avoid calling them too many times.  I run the slsr-*.c
>tests
>>> under gcc.dg/tree-ssa/ and find out that the cache hit rates range
>from
>>> 55.6% to 90%, with 73.5% as the average.  The samples may not well
>represent
>>> the real world scenario, but they do show the fact that the 'base'
>tree can
>>> be shared to some extent.  So I'd like to have the cache in the
>patch.
>>>
>>>

>>
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-O2 -fdump-tree-slsr" } */
>>> +
>>> +typedef int arr_2[50][50];
>>> +
>>> +void foo (arr_2 a2, int v1)
>>> +{
>>> +  int i, j;
>>> +
>>> +  i = v1 + 5;
>>> +  j = i;
>>> +  a2 [i-10] [j] = 2;
>>> +  a2 [i] [j++] = i;
>>> +  a2 [i+20] [j++] = i;
>>> +  a2 [i-3] [i-1] += 1;
>>> +  return;
>>> +}
>>> +
>>> +/* { dg-final { scan-tree-dump-times "MEM" 5 "slsr" } } */
>>> +/* { dg-final { cleanup-tree-dump "slsr" } } */
>>>
>>> scanning for 5 MEMs looks non-sensical.  What transform do
>>> you expect?  I see other slsr testcases do similar non-sensical
>>> checking which is bad, too.
>>
>>
>> As the slsr optimizes CAND_REF candidates by simply lowering them
>to
>> MEM_REF from e.g. ARRAY_REF, I think scanning for the number of
>MEM_REFs
>> is an effective check.  Alternatively, I can add a follow-up
>patch to
>> add some dumping facility in replace_ref () to print out the
>replacing
>> actions when -fdump-tree-slsr-details is on.

 I think adding some details to the dump and scanning for them would
>be
 better.  That's the only change that is required for this to move
>forward.
>>>
>>>
>>> I've updated to patch to dump more details when
>-fdump-tree-slsr-details is
>>> on.  The tests have also been updated to scan for these new dumps
>instead of
>>> MEMs.
>>>
>>>

 I suggest doing it quickly.  We're well past stage1 close at this
>point.
>>>
>>>
>>> The bootstrapping on x86_64 is still running.  OK to commit if it
>succeeds?
>>
>> I still don't like it.  It's using the wrong and too expensive tools
>to do
>> stuff.  What kind of bases are we ultimately interested in?  Browsing
>> the code it looks like we're having
>>
>>/* Base expression for the chain of candidates:  often, but not
>> 

RE: [REPOST] Invalid Code when reading from unaligned zero-sized array

2013-12-03 Thread Bernd Edlinger
On Tue, 3 Dec 2013 12:16:39, Jeff Law wrote:
>
> On 12/03/13 11:40, Bernd Edlinger wrote:
>>>
>>
>> To me it does not feel like a hack at all, if the expansion needs to know in 
>> what
>> context the result will be used, if it is in an LHS or in a RHS or if the 
>> address of
>> the memory is needed.
> Expansion needs some context and it's currently passed into expand_expr
> via a variety of methods. Most are documented at the start of
> expand_expr_real. It's unclean, but that doesn't mean we should make it
> worse. Adding more context is something we should think long and hard
> about.
>
> Eric's comment that we have a non-BLKmode object with a variable size I
> think is critical. I'm pretty sure that's not a well-defined object.
> Given a variable sized object that is not BLKmode, how can we reliably
> get its size (you can't, that's part of the reason why BLKmode exists)
>
> I'd pull on that thread.
>

Well, there are examples of invalid C that produce invalid code (without
warning).  Garbage in, garbage out; that's OK.

Then there is this one:
struct S {
  X a;
  Y b[0];
};

it has the mode of X, which is questionable. I could detect that in expansion
and only set expand_reference if b is accessed. Maybe.

But then we have:
struct S {
  Y b[1];
};

it has the mode of Y

and even...
struct S {
  Y b[2];
};

this one has an integer mode of size(Y)*2

I have no idea how to detect this in expansion, and how to handle,
for instance, a volatile b correctly. And I cannot declare that data structure
invalid.

Of course this depends on the back-end, what modes it supports.
And if there are movmisalign optabs, or SLOW_UNALIGNED_ACCESS.

>>
>> To make that perfectly clear, I just want to help.
> Understood and its greatly appreciated.
>
>>
>> And even if I commit something, it can be reverted anytime, when we have
>> something better.
> We try to get it right the first time :-) Reverting bits is undesirable
> and virtually impossible if it gets out into a release.
>
> jeff
> 

Re: [PATCH] Add signed integer overflow checking to ubsan

2013-12-03 Thread Jeff Law

On 11/27/13 01:13, Marek Polacek wrote:

On Fri, Nov 22, 2013 at 10:54:16AM +0100, Marek Polacek wrote:

Hi!

Working virtually out of Pago Pago.

The following is the implementation of the signed integer overflow
checking for the UndefinedBehaviorSanitizer.  I wrote some of the
generic bits; Jakub did the i?86 handling/optabs as well as VRP/fold
bits.
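
For readers following along, what gets instrumented is ordinary signed
arithmetic like this (illustrative example, not from the patch):

/* Compiled with -fsanitize=signed-integer-overflow, the addition is
   emitted as an UBSAN_CHECK_ADD internal call, and an overflowing input
   produces a runtime diagnostic instead of silently wrapping.  */
int
add (int a, int b)
{
  return a + b;
}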


I'd like to ping this patch.  Here's a rebased version that contains
a fix for miscompilation with -Os -m32.

2013-11-27  Jakub Jelinek  
 Marek Polacek  

 * opts.c (common_handle_option): Handle
 -fsanitize=signed-integer-overflow.
 * config/i386/i386.md (addv4, subv4, mulv4,
 negv3, negv3_1): Define expands.
 (*addv4, *subv4, *mulv4, *negv3): Define
 insns.
* sanitizer.def (BUILT_IN_UBSAN_HANDLE_ADD_OVERFLOW,
 BUILT_IN_UBSAN_HANDLE_SUB_OVERFLOW,
 BUILT_IN_UBSAN_HANDLE_MUL_OVERFLOW,
 BUILT_IN_UBSAN_HANDLE_NEGATE_OVERFLOW): Define.
 * ubsan.h (PROB_VERY_UNLIKELY, PROB_EVEN, PROB_VERY_LIKELY,
 PROB_ALWAYS): Define.
 (ubsan_build_overflow_builtin): Declare.
* gimple-fold.c (gimple_fold_stmt_to_constant_1): Add folding of
 internal functions.
 * ubsan.c (PROB_VERY_UNLIKELY): Don't define here.
 (ubsan_build_overflow_builtin): New function.
 (instrument_si_overflow): Likewise.
 (ubsan_pass): Add signed integer overflow checking.
 (gate_ubsan): Enable the pass also when SANITIZE_SI_OVERFLOW.
 * flag-types.h (enum sanitize_code): Add SANITIZE_SI_OVERFLOW.
 * internal-fn.c: Include ubsan.h and target.h.
 (ubsan_expand_si_overflow_addsub_check): New function.
 (ubsan_expand_si_overflow_neg_check): Likewise.
(ubsan_expand_si_overflow_mul_check): Likewise.
 (expand_UBSAN_CHECK_ADD): Likewise.
 (expand_UBSAN_CHECK_SUB): Likewise.
 (expand_UBSAN_CHECK_MUL): Likewise.
 * fold-const.c (fold_binary_loc): Don't fold A + (-B) -> A - B and
 (-A) + B -> B - A when doing the signed integer overflow checking.
 * internal-fn.def (UBSAN_CHECK_ADD, UBSAN_CHECK_SUB, UBSAN_CHECK_MUL):
 Define.
 * tree-vrp.c (extract_range_basic): Handle internal calls.
 * optabs.def (addv4_optab, subv4_optab, mulv4_optab, negv4_optab): New
 optabs.
c-family/
 * c-gimplify.c (c_gimplify_expr): If doing the integer-overflow
 sanitization, call unsigned_type_for only when !TYPE_OVERFLOW_WRAPS.
testsuite/
 * c-c++-common/ubsan/overflow-mul-2.c: New test.
 * c-c++-common/ubsan/overflow-add-1.c: New test.
 * c-c++-common/ubsan/overflow-add-2.c: New test.
 * c-c++-common/ubsan/overflow-mul-1.c: New test.
 * c-c++-common/ubsan/overflow-sub-1.c: New test.
 * c-c++-common/ubsan/overflow-sub-2.c: New test.
 * c-c++-common/ubsan/overflow-negate-1.c: New test.
Perhaps split this patch into two parts which can be reviewed 
independently, but go into the tree at the same time.  The obvious hope 
would be that Uros or one of the other x86 backend folks could chime in 
on that part.






--- gcc/ubsan.h.mp  2013-11-27 08:46:28.046629473 +0100
+++ gcc/ubsan.h 2013-11-27 08:46:57.578753342 +0100
@@ -21,6 +21,12 @@ along with GCC; see the file COPYING3.
  #ifndef GCC_UBSAN_H
  #define GCC_UBSAN_H

+/* From predict.c.  */
+#define PROB_VERY_UNLIKELY (REG_BR_PROB_BASE / 2000 - 1)
+#define PROB_EVEN  (REG_BR_PROB_BASE / 2)
+#define PROB_VERY_LIKELY   (REG_BR_PROB_BASE - PROB_VERY_UNLIKELY)
+#define PROB_ALWAYS(REG_BR_PROB_BASE)

Seems like this should factor out rather than get duplicated.



--- gcc/gimple-fold.c.mp2013-11-27 08:46:27.979629191 +0100
+++ gcc/gimple-fold.c   2013-11-27 08:46:57.556753251 +0100
@@ -2660,8 +2660,30 @@ gimple_fold_stmt_to_constant_1 (gimple s
tree fn;

if (gimple_call_internal_p (stmt))
- /* No folding yet for these functions.  */
- return NULL_TREE;
+ {
+   enum tree_code subcode = ERROR_MARK;
+   switch (gimple_call_internal_fn (stmt))
+ {
+ case IFN_UBSAN_CHECK_ADD: subcode = PLUS_EXPR; break;
+ case IFN_UBSAN_CHECK_SUB: subcode = MINUS_EXPR; break;
+ case IFN_UBSAN_CHECK_MUL: subcode = MULT_EXPR; break;

Minor detail, put the case value and associated codes on separate lines.

  case FU:
code;
more code
break;
  case BAR
blah;
break;


--- gcc/internal-fn.c.mp2013-11-27 08:46:28.014629338 +0100
+++ gcc/internal-fn.c   2013-11-27 08:46:57.559753263 +0100
@@ -31,6 +31,8 @@ along with GCC; see the file COPYING3.
  #include "gimple-expr.h"
  #include "is-a.h"
  #include "gimple.h"
+#include "ubsan.h"
+#include "target.h"

  /* The names of each internal function, indexed by function number.  */
  const char *const internal_fn_name_arr

[PATCH/AARCH64 3/6] Fix up multi-lib options

2013-12-03 Thread Andrew Pinski

Hi,
  The arguments to --with-multilib-list for AARCH64 are exclusive, but currently
they are being treated as ones which are not.  This causes problems in that we get
four library sets with --with-multilib-list=lp64,ilp32: empty, lp64, ilp32, and
lp64/ilp32.  The first and last ones do not make sense and should not be there.

This patch changes the definition of MULTILIB_OPTIONS so we have a / in between
the options rather than a space.

OK?  Build and tested on aarch64-elf with both --with-multilib-list=lp64,ilp32 
and without it.

Thanks,
Andrew Pinski

* config/aarch64/t-aarch64 (MULTILIB_OPTIONS): Fix definition so
that options are conflicting ones.
---
 gcc/ChangeLog|2 +-
 gcc/config/aarch64/t-aarch64 |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/t-aarch64 b/gcc/config/aarch64/t-aarch64
index 9f8d8cd..98a30d8 100644
--- a/gcc/config/aarch64/t-aarch64
+++ b/gcc/config/aarch64/t-aarch64
@@ -41,5 +41,5 @@ aarch-common.o: $(srcdir)/config/arm/aarch-common.c 
$(CONFIG_H) $(SYSTEM_H) \
$(srcdir)/config/arm/aarch-common.c
 
 comma=,
-MULTILIB_OPTIONS= $(patsubst %, mabi=%, $(subst $(comma), ,$(TM_MULTILIB_CONFIG)))
+MULTILIB_OPTIONS= $(subst $(comma),/, $(patsubst %, mabi=%, $(subst $(comma),$(comma)mabi=,$(TM_MULTILIB_CONFIG))))
 MULTILIB_DIRNAMES   = $(subst $(comma), ,$(TM_MULTILIB_CONFIG))
-- 
1.7.2.5



[PATCH/middle-end 2/6] __builtin_thread_pointer and AARCH64 ILP32

2013-12-03 Thread Andrew Pinski
Hi,
  With ILP32 AARCH64, Pmode (DImode) != ptr_mode (SImode), so the variable decl
has a mode of SImode while the register is DImode.  So the target that gets
passed down to expand_builtin_thread_pointer is NULL, as expand does not
know how to get a subreg for a pointer type.

This fixes the problem by handling a NULL target the same way we already
handle a non-register or wrong-mode target inside expand_builtin_thread_pointer.
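
A minimal reproducer of the kind described would be (assumed, not taken
from a bug report):

/* With -mabi=ilp32 the pointer variable below has SImode while the
   thread-pointer register is DImode, so expand used to hand a NULL
   target to expand_builtin_thread_pointer.  */
void *
get_tp (void)
{
  void *tp = __builtin_thread_pointer ();
  return tp;
}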

OK?  Build and tested for aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* builtins.c (expand_builtin_thread_pointer): Create a new target
when the target is NULL.
---
 gcc/ChangeLog  |5 +
 gcc/builtins.c |2 +-
 2 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f1c818..66797fa 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -5699,7 +5699,7 @@ expand_builtin_thread_pointer (tree exp, rtx target)
   if (icode != CODE_FOR_nothing)
 {
   struct expand_operand op;
-  if (!REG_P (target) || GET_MODE (target) != Pmode)
+  if (target == NULL_RTX || !REG_P (target) || GET_MODE (target) != Pmode)
target = gen_reg_rtx (Pmode);
   create_output_operand (&op, target, Pmode);
   expand_insn (icode, 1, &op);
-- 
1.7.2.5



[PATCH/AARCH64 0/6] Add ILP32 GNU/Linux support

2013-12-03 Thread Andrew Pinski
This patch set adds ILP32 support to GCC for GNU/Linux.  

A patch to fix the size and pointer difference types for ILP32.

A middle-end patch to fix an ICE with __builtin_thread_pointer with ILP32.

A patch which fixes the multi-lib list, as the arguments to --with-multilib-list
are exclusive.

A patch which adds the trap pattern which I had submitted before but this
time with the change for the updated immediate.  This is needed to be able to
test using an older glibc which has ILP32 support included with it.  I am
working on the glibc and kernel portions still as there are a few ABI changes
which I am testing out.

A patch which fixes TLS variables with ILP32; the problem shows up while
compiling glibc, so no new testcases are added.

One final patch which adds the name of the dynamic linker, passes the linker
script to the linker, and allows the multi-lib to work correctly.

All of these patches are tested incrementally.  Only the last patch depends on
the rest of the patches.  The rest can be applied independently.


Thanks,
Andrew Pinski


Andrew Pinski (6):
  2013-12-02  Andrew Pinski  
  2013-11-25  Andrew Pinski  
  2013-12-02  Andrew Pinski  
  2013-12-02  Andrew Pinski  
  2013-12-02  Andrew Pinski  
  2013-12-02  Andrew Pinski  

 gcc/builtins.c |2 +-
 gcc/config/aarch64/aarch64-linux.h |5 +-
 gcc/config/aarch64/aarch64.c   |   23 --
 gcc/config/aarch64/aarch64.h   |4 +-
 gcc/config/aarch64/aarch64.md  |   81 ++--
 gcc/config/aarch64/t-aarch64   |2 +-
 gcc/config/aarch64/t-aarch64-linux |7 +--
 8 files changed, 138 insertions(+), 28 deletions(-)

-- 
1.7.2.5



[PATCH/AARCH64 4/6] Implement the trap pattern

2013-12-03 Thread Andrew Pinski
As mentioned in http://gcc.gnu.org/ml/gcc-patches/2013-11/msg02228.html, this
implements the trap pattern according to what was decided about which brk
immediate should be used.
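
For reference, this pattern is what __builtin_trap expands to; with it in
place the compiler emits the brk directly instead of calling abort
(illustrative example):

void
die (void)
{
  __builtin_trap ();   /* now expands via the new "trap" pattern: brk #1000 */
}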

OK?  Built and tested on aarch64-elf with no regressions.  Also built a glibc 
for aarch64-linux-gnu with this patch included (LP64).

Thanks,
Andrew Pinski

* config/aarch64/aarch64.md (trap): New pattern.
---
 gcc/ChangeLog |9 +
 gcc/config/aarch64/aarch64.md |5 +
 2 files changed, 14 insertions(+), 0 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 8b3dbd7..313517f 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -290,6 +290,11 @@
   [(set_attr "type" "no_insn")]
 )
 
+(define_insn "trap"
+  [(trap_if (const_int 1) (const_int 8))]
+  ""
+  "brk #1000")
+
 (define_expand "prologue"
   [(clobber (const_int 0))]
   ""
-- 
1.7.2.5



[PATCH/AARCH64 1/6] Fix size and pointer different types for ILP32.

2013-12-03 Thread Andrew Pinski

While compiling some programs, GCC's and glibc's (and newlib's) definitions of
size_t did not agree, causing format warnings.  The simple testcase for this
is:
#include 
#include 

int main(void)
{
  ssize_t t = 0x1;
  printf("%zd\n", t);
  return 0;
}
- CUT -

OK?  Built and tested for aarch64-elf without any regressions.

Thanks,
Andrew Pinski



* config/aarch64/aarch64.h (SIZE_TYPE):  Set to unsigned int for ILP32.
(PTRDIFF_TYPE): Set to int for ILP32.
---
 gcc/ChangeLog|5 +
 gcc/config/aarch64/aarch64.h |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h
index cead022..65d226d 100644
--- a/gcc/config/aarch64/aarch64.h
+++ b/gcc/config/aarch64/aarch64.h
@@ -144,9 +144,9 @@
 /* Using long long breaks -ansi and -std=c90, so these will need to be
made conditional for an LLP64 ABI.  */
 
-#define SIZE_TYPE  "long unsigned int"
+#define SIZE_TYPE  (TARGET_ILP32 ? "unsigned int" : "long unsigned int")
 
-#define PTRDIFF_TYPE   "long int"
+#define PTRDIFF_TYPE   (TARGET_ILP32 ? "int" : "long int")
 
 #define PCC_BITFIELD_TYPE_MATTERS  1
 
-- 
1.7.2.5



[PATCH/AARCH64 5/6] Fix TLS for ILP32.

2013-12-03 Thread Andrew Pinski
Hi,
  With ILP32, some simple usage of TLS variables causes an unrecognizable
instruction due to needing to use SImode for loading pointers from memory.
This fixes the three (tlsie_small, tlsle_small, tlsdesc_small) patterns to
support SImode for pointers.
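
A minimal example of the "simple usage" meant here (illustrative, not from
glibc):

/* Accessing a TLS variable; under -mabi=ilp32 the pointer loads involved
   need SImode, which the old DImode-only patterns could not recognize.  */
__thread int counter;

int
bump (void)
{
  return ++counter;
}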

OK?  Build and tested on aarch64-elf with no regressions.

Thanks,
Andrew Pinski

* config/aarch64/aarch64.c (aarch64_load_symref_appropriately):
Handle TLS for ILP32.
* config/aarch64/aarch64.md (tlsie_small): Change to an expand to
handle ILP32.
(tlsie_small_<mode>): New pattern.
(tlsle_small): Change to an expand to handle ILP32.
(tlsle_small_<mode>): New pattern.
(tlsdesc_small): Change to an expand to handle ILP32.
(tlsdesc_small_<mode>): New pattern.
---
 gcc/ChangeLog |   12 ++
 gcc/config/aarch64/aarch64.c  |   23 ++--
 gcc/config/aarch64/aarch64.md |   76 ++---
 3 files changed, 94 insertions(+), 17 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index b1b4eef..a3e4532 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -628,22 +628,37 @@ aarch64_load_symref_appropriately (rtx dest, rtx imm,
 
 case SYMBOL_SMALL_TLSDESC:
   {
-   rtx x0 = gen_rtx_REG (Pmode, R0_REGNUM);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx x0 = gen_rtx_REG (mode, R0_REGNUM);
rtx tp;
 
+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsdesc_small (imm));
tp = aarch64_load_tp (NULL);
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, x0)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, x0)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
 
 case SYMBOL_SMALL_GOTTPREL:
   {
-   rtx tmp_reg = gen_reg_rtx (Pmode);
+   enum machine_mode mode = GET_MODE (dest);
+   rtx tmp_reg = gen_reg_rtx (mode);
rtx tp = aarch64_load_tp (NULL);
+
+   gcc_assert (mode == Pmode || mode == ptr_mode);
+
emit_insn (gen_tlsie_small (tmp_reg, imm));
-   emit_insn (gen_rtx_SET (Pmode, dest, gen_rtx_PLUS (Pmode, tp, 
tmp_reg)));
+
+   if (mode != Pmode)
+ tp = gen_lowpart (mode, tp);
+
+   emit_insn (gen_rtx_SET (mode, dest, gen_rtx_PLUS (mode, tp, tmp_reg)));
set_unique_reg_note (get_last_insn (), REG_EQUIV, imm);
return;
   }
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 313517f..08fcc94 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -3577,35 +3577,85 @@
   [(set_attr "type" "call")
(set_attr "length" "16")])
 
-(define_insn "tlsie_small"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(match_operand:DI 1 "aarch64_tls_ie_symref" "S")]
+(define_expand "tlsie_small"
+  [(set (match_operand 0 "register_operand" "=r")
+(unspec [(match_operand 1 "aarch64_tls_ie_symref" "S")]
+  UNSPEC_GOTSMALLTLS))]
+  ""
+{
+  if (TARGET_ILP32)
+{
+  operands[0] = gen_lowpart (ptr_mode, operands[0]);
+  emit_insn (gen_tlsie_small_si (operands[0], operands[1]));
+}
+  else
+emit_insn (gen_tlsie_small_di (operands[0], operands[1]));
+  DONE;
+})
+
+(define_insn "tlsie_small_"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+(unspec:PTR [(match_operand 1 "aarch64_tls_ie_symref" "S")]
   UNSPEC_GOTSMALLTLS))]
   ""
-  "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]"
+  "adrp\\t%0, %A1\;ldr\\t%0, [%0, #%L1]"
   [(set_attr "type" "load1")
(set_attr "length" "8")]
 )
 
-(define_insn "tlsle_small"
-  [(set (match_operand:DI 0 "register_operand" "=r")
-(unspec:DI [(match_operand:DI 1 "register_operand" "r")
-   (match_operand:DI 2 "aarch64_tls_le_symref" "S")]
+
+(define_expand "tlsle_small"
+  [(set (match_operand 0 "register_operand" "=r")
+(unspec [(match_operand 1 "register_operand" "r")
+   (match_operand 2 "aarch64_tls_le_symref" "S")]
+   UNSPEC_GOTSMALLTLS))]
+  ""
+{
+  if (TARGET_ILP32)
+{
+  rtx temp = gen_reg_rtx (ptr_mode);
+  operands[1] = gen_lowpart (ptr_mode, operands[1]);
+  emit_insn (gen_tlsle_small_si (temp, operands[1], operands[2]));
+  emit_move_insn (operands[0], gen_lowpart (GET_MODE (operands[0]), temp));
+}
+  else
+emit_insn (gen_tlsle_small_di (operands[0], operands[1], operands[2]));
+  DONE;
+
+})
+
+(define_insn "tlsle_small_"
+  [(set (match_operand:PTR 0 "register_operand" "=r")
+(unspec:PTR [(match_operand:PTR 1 "register_operand" "r")
+   (match_operand 2 "aarch64_tls_le_symref" "S")]
   UNSPEC_GOTSMALLTLS))]
   ""
-  "add\\t%0, %1, #%G2\;add\\t%0, %0, #%L2"
+  "add\\t%0, %1, #%G2\;add\\t%0, %0, #%L2"
   [(

[PATCH/AARCH64 6/6] Support ILP32 multi-lib

2013-12-03 Thread Andrew Pinski
Hi,
  This is the final patch which adds support for the dynamic linker and
multi-lib directories for ILP32.  I did not change multi-arch support as
I did not know what it should be changed to, and internally here at Cavium
we don't use multi-arch.


OK?  Build and tested for aarch64-linux-gnu with and without 
--with-multilib-list=lp64,ilp32.

Thanks,
Andrew Pinski



* config/aarch64/aarch64-linux.h (GLIBC_DYNAMIC_LINKER): 
/lib/ld-linux32-aarch64.so.1
is used for ILP32.
(LINUX_TARGET_LINK_SPEC): Add linker script
file whose name depends on -mabi= and -mbig-endian.
* config/aarch64/t-aarch64-linux (MULTILIB_OSDIRNAMES): Handle LP64 
better
and handle ilp32 too.
(MULTILIB_OPTIONS): Delete.
(MULTILIB_DIRNAMES): Delete.
---
 gcc/ChangeLog  |   11 +++
 gcc/config/aarch64/aarch64-linux.h |5 +++--
 gcc/config/aarch64/t-aarch64-linux |7 ++-
 3 files changed, 16 insertions(+), 7 deletions(-)

diff --git a/gcc/config/aarch64/aarch64-linux.h b/gcc/config/aarch64/aarch64-linux.h
index 83efad4..408297a 100644
--- a/gcc/config/aarch64/aarch64-linux.h
+++ b/gcc/config/aarch64/aarch64-linux.h
@@ -21,7 +21,7 @@
 #ifndef GCC_AARCH64_LINUX_H
 #define GCC_AARCH64_LINUX_H
 
-#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux-aarch64.so.1"
+#define GLIBC_DYNAMIC_LINKER "/lib/ld-linux%{mabi=ilp32:32}-aarch64.so.1"
 
 #define CPP_SPEC "%{pthread:-D_REENTRANT}"
 
@@ -32,7 +32,8 @@
%{rdynamic:-export-dynamic} \
-dynamic-linker " GNU_USER_DYNAMIC_LINKER " \
-X  \
-   %{mbig-endian:-EB} %{mlittle-endian:-EL}"
+   %{mbig-endian:-EB} %{mlittle-endian:-EL}\
+   -maarch64linux%{mabi=ilp32:32}%{mbig-endian:b}"
 
 #define LINK_SPEC LINUX_TARGET_LINK_SPEC
 
diff --git a/gcc/config/aarch64/t-aarch64-linux b/gcc/config/aarch64/t-aarch64-linux
index ca1525e..5032ea9 100644
--- a/gcc/config/aarch64/t-aarch64-linux
+++ b/gcc/config/aarch64/t-aarch64-linux
@@ -22,10 +22,7 @@ LIB1ASMSRC   = aarch64/lib1funcs.asm
 LIB1ASMFUNCS = _aarch64_sync_cache_range
 
 AARCH_BE = $(if $(findstring TARGET_BIG_ENDIAN_DEFAULT=1, $(tm_defines)),_be)
-MULTILIB_OSDIRNAMES = .=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
+MULTILIB_OSDIRNAMES = mabi.lp64=../lib64$(call if_multiarch,:aarch64$(AARCH_BE)-linux-gnu)
 MULTIARCH_DIRNAME = $(call if_multiarch,aarch64$(AARCH_BE)-linux-gnu)
 
-# Disable the multilib for linux-gnu targets for the time being; focus
-# on the baremetal targets.
-MULTILIB_OPTIONS=
-MULTILIB_DIRNAMES   =
+MULTILIB_OSDIRNAMES += mabi.ilp32=../lib32
-- 
1.7.2.5


