Re: how to check if target supports andnot instruction ?

2016-10-17 Thread Richard Biener
On Sat, 15 Oct 2016, Prathamesh Kulkarni wrote:

> On 13 October 2016 at 13:22, Marc Glisse  wrote:
> > On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote:
> >
> >> On 12 October 2016 at 14:43, Richard Biener  wrote:
> >>>
> >>> On Wed, 12 Oct 2016, Marc Glisse wrote:
> >>>
>  On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:
> 
> > I was having a look at PR71636 and added the following pattern to
> > match.pd:
> > x & ((1U << b) - 1) -> x & ~(~0U << b)
> > However the transform is useful only if the target supports "andnot"
> > instruction.
> 
> 
>  rth was selling the transformation as a canonicalization, which is
>  beneficial
>  when there is an andnot instruction, and neutral otherwise, so it could
>  be
>  done always.
> >>>
> >>>
> >>> Well, its three instructions to three instructions and a more expensive
> >>> constant(?).  ~0U might not be available as immediate for the shift
> >>> instruction and 1U << b might be available as a bit-set instruction ...
> >>> (vs. the andnot).
> >
> >
> > True, I hadn't thought of bit-set.
> >
> >>> So yes, we might decide to canonicalize to andnot (and decide that
> >>> three binary to two binary and one unary op is "better").
> >>>
> >>> So no excuse to explore the target specific .pd fragment idea ... :/
> >>
> >> Hi,
> >> I have attached patch that adds the transform.
> >> Does that look OK ?
> >
> >
> > Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested in
> > the PR? If we only do the transformation when (1< > then we probably want to require that it has a single use (maybe even the
> > shift).
> >
> >> I am not sure how to write test-cases for it though.
> >> For the test-case:
> >> unsigned f(unsigned x, unsigned b)
> >> {
> >>  unsigned t1 = 1U << b;
> >>  unsigned t2 = t1 - 1;
> >>  unsigned t3 = x & t2;
> >>  return t3;
> >> }
> >>
> >> forwprop dump shows:
> >> Applying pattern match.pd:523, gimple-match.c:47419
> >> gimple_simplified to _6 = 4294967295 << b_1(D);
> >> _8 = ~_6;
> >> t3_5 = x_4(D) & _8;
> >>
> >> I could scan for "_6 = 4294967295 << b_1(D);"  however I suppose
> >> ~0 would depend on width of int and not always be 4294967295 ?
> >> Or should I scan for "_6 = 4294967295 << b_1(D);"
> >> and add /* { dg-require-effective int32 } */  to the test-case ?
> >
> >
> > You could check that you have ~, or that you don't have " 1 << ".
> Thanks for the suggestions.
> Does the attached patch look OK ?
> 
> For test-cases, scan-tree-dump-not "1 <<" works well for pr71636-1.c
> which tests GENERIC folding,
> however for GIMPLE folding, "1 << " still remains in the forwprop dump
> because dce isn't
> run to remove unused values.
> 
> For the test-case:
> unsigned f(unsigned x, unsigned b)
> {
>   unsigned t1 = 1U << b;
>   unsigned t2 = t1 - 1;
>   unsigned t3 = x & t2;
>   return t3;
> }
> 
> forwprop dump shows:
> Applying pattern match.pd:523, gimple-match.c:47418
> gimple_simplified to _6 = 4294967295 << b_1(D);
> _8 = ~_6;
> t3_5 = x_4(D) & _8;
> f (unsigned int x, unsigned int b)
> {
>   unsigned int t3;
>   unsigned int t2;
>   unsigned int t1;
>   unsigned int _6;
>   unsigned int _8;
> 
>   :
>   t1_2 = 1 << b_1(D);
>   t2_3 = t1_2 + 4294967295;
>   _6 = 4294967295 << b_1(D);
>   _8 = ~_6;
>   t3_5 = x_4(D) & _8;
>   return t3_5;
> 
> }
> 
> Instead I scanned for _8 = ~_6 with:
> /* { dg-final { scan-tree-dump "_\[0-9\] = ~_\[0-9\]" "forwprop1" } } */
> because rhs has bit_not and lhs doesn't.
> Is that OK ?

That's ok -- note I usually scan cddce1 instead of forwprop to have
DCE run on the IL.

Ok (with or without changing to scan cddce1 instead for not-1<<).

Thanks,
Richard.

> Bootstrap+tested on x86_64-unknown-linux-gnu.
> Cross-tested on arm*-*-*, aarch64*-*-*
> 
> Thanks,
> Prathamesh
> >
> > --
> > Marc Glisse
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: Question about sibling call epilogues & registers

2016-10-17 Thread Daniel Santos
It would probably be useful to post the actual code. The below function 
emit_msabi_outlined_restore() is is called from ix86_expand_epilogue() 
to emit the RTL to call the restore stub. Like ix86_expand_epilogue, it 
uses style == 0 to indicate that there will be a sibling call following 
the epilogue, so we will call the stub rather than jmp. But it also uses 
a call if we need to pop incoming args or are using a hard frame pointer.


The problem appears to be the lack of a function declaration causing 
get_call_reg_set_usage() (in final.c) to use the target default 
"regs_invalidated_by_call" value instead of what I've supplied with 
add_function_usage_to() and the gen_frame_load() insns for each register 
restored. I'm developing on 5.4.0 since I need a known good compiler for 
Wine testing and I plan to rebase it later.


static bool
emit_msabi_outlined_restore (const struct ix86_frame &frame, bool use_call,
 int style)
{
  struct machine_function *m = cfun->machine;
  const unsigned ncregs = NUM_X86_64_MS_CLOBBERED_REGS
  + m->outline_ms_sysv_extra_regs;
  rtvec v = rtvec_alloc (ncregs - 1 + (use_call ? 3 : 5));
  rtx insn, sym, tmp;
  rtx rsi = gen_rtx_REG (word_mode, SI_REG);
  rtx use = NULL_RTX;
  rtx note = NULL_RTX;
  unsigned i = 0;
  const struct xlogue_layout &xlogue = xlogue_layout::get_instance ();
  HOST_WIDE_INT stack_restore_offset;
  HOST_WIDE_INT reg_data_offset;
  HOST_WIDE_INT rsi_offset;
  rtx rsi_frame_load = NULL_RTX;
  HOST_WIDE_INT rsi_restore_offset = 0x7fff;
  const typeof (xlogue.regs[0]) *ri;

  gcc_assert (m->fs.sp_valid);

  stack_restore_offset = m->fs.sp_offset - frame.hard_frame_pointer_offset;
  rsi_offset = stack_restore_offset - xlogue.get_offset ();
  reg_data_offset = stack_restore_offset;

  /* adjust for alignment */
  if (m->outline_ms_sysv_offset_in)
reg_data_offset -= UNITS_PER_WORD;

  tmp = gen_rtx_PLUS (Pmode, stack_pointer_rtx, GEN_INT(rsi_offset));
  insn = emit_insn (gen_rtx_SET (VOIDmode, rsi, tmp));
  use_reg (&use, rsi);

  /* construct restore_multiple/restore_multiple_and_return insn */
  sym = xlogue.get_stub_rtx (use_call ? XLOGUE_STUB_RESTORE
  : XLOGUE_STUB_RESTORE_RET);

  /* Verify that note queue is empty. */
  gcc_assert(!queued_cfa_restores);

  /* If:
 * we need to pop incoming args,
 * this is a sibcall, or
 * we have a hard frame pointer
 then we want to call the epilogue stub instead of jumping to it. */
  if (use_call)
{
  tmp = gen_rtx_MEM (QImode, sym);
  RTVEC_ELT (v, i++) = gen_rtx_CALL (VOIDmode, tmp, const0_rtx);
}
  else
{
  rtx r10;

  RTVEC_ELT (v, i++) = ret_rtx;
  RTVEC_ELT (v, i++) = gen_rtx_USE (VOIDmode, sym);
  tmp = GEN_INT(stack_restore_offset);
  tmp = gen_rtx_PLUS (Pmode, stack_pointer_rtx, tmp);
  r10 = gen_rtx_REG (DImode, R10_REG);
  RTVEC_ELT (v, i++) = gen_rtx_SET (VOIDmode, r10, tmp);

  gcc_assert (m->fs.cfa_reg == stack_pointer_rtx);
  gcc_assert (m->fs.sp_valid);
  m->fs.sp_offset -= stack_restore_offset;

  note = gen_rtx_PLUS (Pmode, stack_pointer_rtx,
   GEN_INT(stack_restore_offset));
  note = gen_rtx_SET (VOIDmode, stack_pointer_rtx, note);
}

  RTVEC_ELT (v, i++) = gen_rtx_CLOBBER (VOIDmode,
gen_rtx_REG (CCmode, FLAGS_REG));

  for (ri = &xlogue.regs[0]; ri != &xlogue.regs[ncregs]; ++ri)
{
  enum machine_mode mode = SSE_REGNO_P(ri->regno) ? V4SFmode : 
word_mode;

  rtx reg, restore_note;
  HOST_WIDE_INT offset = ri->offset - 0x70;

  reg = gen_rtx_REG (mode, ri->regno);
  restore_note = gen_frame_load (reg, rsi, offset);

  /* Make sure RSI frame load/restore note is last */
  /* TODO: Do I really need to reorder this? */
  if (ri->regno == SI_REG)
{
  gcc_assert (!rsi_frame_load);
  rsi_frame_load = restore_note;
  rsi_restore_offset = offset;
}
  else
{
  RTVEC_ELT (v, i++) = restore_note;
  ix86_add_cfa_restore_note (NULL_RTX, reg, offset);
}
}

  /* add frame load & restore note for RSI last */
  gcc_assert (rsi_frame_load);
  RTVEC_ELT (v, i++) = rsi_frame_load;
  ix86_add_cfa_restore_note (NULL_RTX, gen_rtx_REG (DImode, SI_REG),
 rsi_restore_offset);

  gcc_assert (i == (unsigned)GET_NUM_ELEM (v));

  tmp = gen_rtx_PARALLEL (VOIDmode, v);
  if (use_call)
{
  insn = emit_call_insn (tmp);
  add_reg_note (insn, REG_CALL_DECL, sym);
  add_function_usage_to (insn, use);
}
  else
{
  insn = emit_jump_insn (tmp);
  JUMP_LABEL (insn) = ret_rtx;
  add_reg_note(insn, REG_CFA_ADJUST_CFA, note);
}

  RTX_FRAME_RELATED_P(insn) = true;
  ix86_add_queued_cfa_restore_notes (insn);

  if (use_call)
pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
 

Re: how to check if target supports andnot instruction ?

2016-10-17 Thread Prathamesh Kulkarni
On 17 October 2016 at 13:52, Richard Biener  wrote:
> On Sat, 15 Oct 2016, Prathamesh Kulkarni wrote:
>
>> On 13 October 2016 at 13:22, Marc Glisse  wrote:
>> > On Thu, 13 Oct 2016, Prathamesh Kulkarni wrote:
>> >
>> >> On 12 October 2016 at 14:43, Richard Biener  wrote:
>> >>>
>> >>> On Wed, 12 Oct 2016, Marc Glisse wrote:
>> >>>
>>  On Wed, 12 Oct 2016, Prathamesh Kulkarni wrote:
>> 
>> > I was having a look at PR71636 and added the following pattern to
>> > match.pd:
>> > x & ((1U << b) - 1) -> x & ~(~0U << b)
>> > However the transform is useful only if the target supports "andnot"
>> > instruction.
>> 
>> 
>>  rth was selling the transformation as a canonicalization, which is
>>  beneficial
>>  when there is an andnot instruction, and neutral otherwise, so it could
>>  be
>>  done always.
>> >>>
>> >>>
>> >>> Well, its three instructions to three instructions and a more expensive
>> >>> constant(?).  ~0U might not be available as immediate for the shift
>> >>> instruction and 1U << b might be available as a bit-set instruction ...
>> >>> (vs. the andnot).
>> >
>> >
>> > True, I hadn't thought of bit-set.
>> >
>> >>> So yes, we might decide to canonicalize to andnot (and decide that
>> >>> three binary to two binary and one unary op is "better").
>> >>>
>> >>> So no excuse to explore the target specific .pd fragment idea ... :/
>> >>
>> >> Hi,
>> >> I have attached patch that adds the transform.
>> >> Does that look OK ?
>> >
>> >
>> > Why bit_not of build_zero_cst instead of build_all_ones_cst, as suggested 
>> > in
>> > the PR? If we only do the transformation when (1<> > then we probably want to require that it has a single use (maybe even the
>> > shift).
>> >
>> >> I am not sure how to write test-cases for it though.
>> >> For the test-case:
>> >> unsigned f(unsigned x, unsigned b)
>> >> {
>> >>  unsigned t1 = 1U << b;
>> >>  unsigned t2 = t1 - 1;
>> >>  unsigned t3 = x & t2;
>> >>  return t3;
>> >> }
>> >>
>> >> forwprop dump shows:
>> >> Applying pattern match.pd:523, gimple-match.c:47419
>> >> gimple_simplified to _6 = 4294967295 << b_1(D);
>> >> _8 = ~_6;
>> >> t3_5 = x_4(D) & _8;
>> >>
>> >> I could scan for "_6 = 4294967295 << b_1(D);"  however I suppose
>> >> ~0 would depend on width of int and not always be 4294967295 ?
>> >> Or should I scan for "_6 = 4294967295 << b_1(D);"
>> >> and add /* { dg-require-effective int32 } */  to the test-case ?
>> >
>> >
>> > You could check that you have ~, or that you don't have " 1 << ".
>> Thanks for the suggestions.
>> Does the attached patch look OK ?
>>
>> For test-cases, scan-tree-dump-not "1 <<" works well for pr71636-1.c
>> which tests GENERIC folding,
>> however for GIMPLE folding, "1 << " still remains in the forwprop dump
>> because dce isn't
>> run to remove unused values.
>>
>> For the test-case:
>> unsigned f(unsigned x, unsigned b)
>> {
>>   unsigned t1 = 1U << b;
>>   unsigned t2 = t1 - 1;
>>   unsigned t3 = x & t2;
>>   return t3;
>> }
>>
>> forwprop dump shows:
>> Applying pattern match.pd:523, gimple-match.c:47418
>> gimple_simplified to _6 = 4294967295 << b_1(D);
>> _8 = ~_6;
>> t3_5 = x_4(D) & _8;
>> f (unsigned int x, unsigned int b)
>> {
>>   unsigned int t3;
>>   unsigned int t2;
>>   unsigned int t1;
>>   unsigned int _6;
>>   unsigned int _8;
>>
>>   :
>>   t1_2 = 1 << b_1(D);
>>   t2_3 = t1_2 + 4294967295;
>>   _6 = 4294967295 << b_1(D);
>>   _8 = ~_6;
>>   t3_5 = x_4(D) & _8;
>>   return t3_5;
>>
>> }
>>
>> Instead I scanned for _8 = ~_6 with:
>> /* { dg-final { scan-tree-dump "_\[0-9\] = ~_\[0-9\]" "forwprop1" } } */
>> because rhs has bit_not and lhs doesn't.
>> Is that OK ?
>
> That's ok -- note I usually scan cddce1 instead of forwprop to have
> DCE run on the IL.
>
> Ok (with or without changing to scan cddce1 instead for not-1<<).
Thanks, committed as r241229.

Regards,
Prathamesh
>
> Thanks,
> Richard.
>
>> Bootstrap+tested on x86_64-unknown-linux-gnu.
>> Cross-tested on arm*-*-*, aarch64*-*-*
>>
>> Thanks,
>> Prathamesh
>> >
>> > --
>> > Marc Glisse
>>
>
> --
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Clear basic block flags before using BB_VISITED for OpenACC loops processing (was: basic_block flags, BB_VISITED)

2016-10-17 Thread Thomas Schwinge
Hi!

On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener  
wrote:
> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> > On 10/14/16 05:28, Richard Biener wrote:
> >
> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
> >> You have to ensure it is cleared yourself.
> >
> >
> > In that case the openacc (&nvptx?) passes should be modified to clear the
> > flags at their start, rather than at their end.

The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
BB_VISITED for all basic block it works on.

> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when not
> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.

Is there a GCC PR for that, or where are you tracking such issues?

OK to commit the following?  Is such a test case appropriate (which would
have caught this issue right away), in particular the dg-final
scan-tree-dump line?

commit 4e8abdfd25aa08abbad0c3fe2e9ec6182308f78c
Author: Thomas Schwinge 
Date:   Mon Oct 17 11:29:43 2016 +0200

Clear basic block flags before using BB_VISITED for OpenACC loops processing

gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags.

gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c  |  9 +
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 213bf8c..5257d21 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
  that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
+  /* Clear basic block flags again, as otherwise IRA will explode later on.  */
+  clear_bb_flags ();
 
   return top;
 }
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 000..2f0b3a2
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+int vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
+  {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)
+#pragma acc loop auto
+  for (int ix = 0; ix < size; ix++)
+   ary[ix] = place ();
+  }
+}
+
+/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 
14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
\\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE 
\\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
\\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = 
UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE 
\\\(OACC_FORK, \\\.data_dep\\\.\[0-9_\]+, 
2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_TAIL_MARK, 
\\\.data_dep\\\.\[0-9_\]+, 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE 
\\\(OACC_JOIN, \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */


Grüße
 Thomas


Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing (was: basic_block flags, BB_VISITED)

2016-10-17 Thread Richard Biener
On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
 wrote:
> Hi!
>
> On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
>  wrote:
>> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
>> > On 10/14/16 05:28, Richard Biener wrote:
>> >
>> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
>> >> You have to ensure it is cleared yourself.
>> >
>> >
>> > In that case the openacc (&nvptx?) passes should be modified to clear the
>> > flags at their start, rather than at their end.
>
> The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
> BB_VISITED for all basic block it works on.
>
>> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when 
>> not
>> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
>
> Is there a GCC PR for that, or where are you tracking such issues?

No, just tracking in my head.

> OK to commit the following?  Is such a test case appropriate (which would
> have caught this issue right away), in particular the dg-final
> scan-tree-dump line?

Ugh.  Not worse to what we do in various dwarf scanning I guess.

Doesn't failure lead to a miscompile eventually?  So you could formulate
this as a dg-do run test with a check for the desired outcome?

Richard.

> commit 4e8abdfd25aa08abbad0c3fe2e9ec6182308f78c
> Author: Thomas Schwinge 
> Date:   Mon Oct 17 11:29:43 2016 +0200
>
> Clear basic block flags before using BB_VISITED for OpenACC loops 
> processing
>
> gcc/
> * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
>
> gcc/testsuite/
> * gcc.dg/goacc/loop-processing-1.c: New file.
> ---
>  gcc/omp-low.c  |  9 +
>  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
>  2 files changed, 23 insertions(+), 4 deletions(-)
>
> diff --git gcc/omp-low.c gcc/omp-low.c
> index 213bf8c..5257d21 100644
> --- gcc/omp-low.c
> +++ gcc/omp-low.c
> @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
>  static oacc_loop *
>  oacc_loop_discovery ()
>  {
> -  basic_block bb;
> +  /* Clear basic block flags, in particular BB_VISITED which we're going to 
> use
> + in the following.  */
> +  clear_bb_flags ();
>
>oacc_loop *top = new_oacc_loop_outer (current_function_decl);
>oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
>   that diagnostics come out in an unsurprising order.  */
>top = oacc_loop_sibling_nreverse (top);
>
> -  /* Reset the visited flags.  */
> -  FOR_ALL_BB_FN (bb, cfun)
> -bb->flags &= ~BB_VISITED;
> +  /* Clear basic block flags again, as otherwise IRA will explode later on.  
> */
> +  clear_bb_flags ();
>
>return top;
>  }
> diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
> gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> new file mode 100644
> index 000..2f0b3a2
> --- /dev/null
> +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> @@ -0,0 +1,18 @@
> +/* Make sure that OpenACC loop processing happens.  */
> +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> +
> +extern int place ();
> +
> +int vector_1 (int *ary, int size)
> +{
> +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
> firstprivate (size)
> +  {
> +#pragma acc loop gang
> +for (int jx = 0; jx < 1; jx++)
> +#pragma acc loop auto
> +  for (int ix = 0; ix < size; ix++)
> +   ary[ix] = place ();
> +  }
> +}
> +
> +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 
> 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ 
> = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ 
> = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE 
> \\\(OACC_FORK, \\\.data_dep\\\.\[0-9_\]+, 
> 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_TAIL_MARK, 
> \\\.data_dep\\\.\[0-9_\]+, 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE 
> \\\(OACC_JOIN, \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */
>
>
> Grüße
>  Thomas


Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener  
wrote:
> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
>  wrote:
> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> >  wrote:
> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >
> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
> >> >> You have to ensure it is cleared yourself.
> >> >
> >> >
> >> > In that case the openacc (&nvptx?) passes should be modified to clear the
> >> > flags at their start, rather than at their end.
> >
> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
> > BB_VISITED for all basic block it works on.
> >
> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) when 
> >> not
> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
> >
> > Is there a GCC PR for that, or where are you tracking such issues?
> 
> No, just tracking in my head.

Tsk, tsk...  ;-)


> > OK to commit the following?  Is such a test case appropriate (which would
> > have caught this issue right away), in particular the dg-final
> > scan-tree-dump line?
> 
> Ugh.  Not worse to what we do in various dwarf scanning I guess.

;-|

> Doesn't failure lead to a miscompile eventually?  So you could formulate
> this as a dg-do run test with a check for the desired outcome?

No, unfortunately.  In this case the error is "benign" such that the
OpenACC loop processing machinery will decide to not parallelize loops
that ought to be parallelized.  This won't generally cause any problem
(apart from performance regression, obviously); it just caused problems
in a few libgomp test cases that actually at run time test for
parallelized execution -- which will/did trigger only with nvptx
offloading enabled, which not too many people are testing.  The test case
I propose below will trigger also for non-offloading configurations.

> > Clear basic block flags before using BB_VISITED for OpenACC loops 
> > processing
> >
> > gcc/
> > * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
> >
> > gcc/testsuite/
> > * gcc.dg/goacc/loop-processing-1.c: New file.
> > ---
> >  gcc/omp-low.c  |  9 +
> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
> >  2 files changed, 23 insertions(+), 4 deletions(-)
> >
> > --- gcc/omp-low.c
> > +++ gcc/omp-low.c
> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
> >  static oacc_loop *
> >  oacc_loop_discovery ()
> >  {
> > -  basic_block bb;
> > +  /* Clear basic block flags, in particular BB_VISITED which we're going 
> > to use
> > + in the following.  */
> > +  clear_bb_flags ();
> >
> >oacc_loop *top = new_oacc_loop_outer (current_function_decl);
> >oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
> >   that diagnostics come out in an unsurprising order.  */
> >top = oacc_loop_sibling_nreverse (top);
> >
> > -  /* Reset the visited flags.  */
> > -  FOR_ALL_BB_FN (bb, cfun)
> > -bb->flags &= ~BB_VISITED;
> > +  /* Clear basic block flags again, as otherwise IRA will explode later 
> > on.  */
> > +  clear_bb_flags ();
> >
> >return top;
> >  }
> > --- /dev/null
> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
> > @@ -0,0 +1,18 @@
> > +/* Make sure that OpenACC loop processing happens.  */
> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
> > +
> > +extern int place ();
> > +
> > +void vector_1 (int *ary, int size)
> > +{
> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
> > firstprivate (size)
> > +  {
> > +#pragma acc loop gang
> > +for (int jx = 0; jx < 1; jx++)
> > +#pragma acc loop auto
> > +  for (int ix = 0; ix < size; ix++)
> > +   ary[ix] = place ();
> > +  }
> > +}
> > +
> > +/* { dg-final { scan-tree-dump "OpenACC loops.*Loop 0\\\(0\\\).*Loop 
> > 14\\\(1\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 20\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 20\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 0\\\);.*Loop 
> > 6\\\(4\\\).*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 1, 
> > 6\\\);.*Head-0:.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_HEAD_MARK, 0, 
> > 1, 6\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_FORK, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);.*Tail-0:.*\\\.data_dep\\\.\[0-9_\]+ = 
> > UNIQUE \\\(OACC_TAIL_MARK, \\\.data_dep\\\.\[0-9_\]+, 
> > 1\\\);.*\\\.data_dep\\\.\[0-9_\]+ = UNIQUE \\\(OACC_JOIN, 
> > \\\.data_dep\\\.\[0-9_\]+, 2\\\);" "oaccdevlow" } } */


Grüße

Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Richard Biener
On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
 wrote:
> Hi!
>
> On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
>  wrote:
>> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
>>  wrote:
>> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
>> >  wrote:
>> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
>> >> > On 10/14/16 05:28, Richard Biener wrote:
>> >> >
>> >> >> The BB_VISITED flag has indetermined state at the beginning of a pass.
>> >> >> You have to ensure it is cleared yourself.
>> >> >
>> >> >
>> >> > In that case the openacc (&nvptx?) passes should be modified to clear 
>> >> > the
>> >> > flags at their start, rather than at their end.
>> >
>> > The gcc/config/nvptx/nvptx.c handling seems fine -- it explicitly clears
>> > BB_VISITED for all basic block it works on.
>> >
>> >> Yes.  But as I said, I ran into IRA ICEs (somewhere in the testsuite) 
>> >> when not
>> >> cleaning up after tree-ssa-propagate.c.  So somebody has to fix IRA first.
>> >
>> > Is there a GCC PR for that, or where are you tracking such issues?
>>
>> No, just tracking in my head.
>
> Tsk, tsk...  ;-)

bb-reorder.c has the same issue.

Index: gcc/bb-reorder.c
===
--- gcc/bb-reorder.c(revision 241228)
+++ gcc/bb-reorder.c(working copy)
@@ -2355,7 +2355,10 @@ reorder_basic_blocks_simple (void)
  To start with, everything points to itself, nothing is assigned yet.  */

   FOR_ALL_BB_FN (bb, cfun)
-bb->aux = bb;
+{
+  bb->aux = bb;
+  bb->flags &= ~BB_VISITED;
+}

   EXIT_BLOCK_PTR_FOR_FN (cfun)->aux = 0;


note that I didn't really understand the IRA issue (the code looks
fine from a quick look - it initializes
to BB_VISITED).  Still removing the tree-ssa-propagate.c BB_VISITED
clearing resulted in a bootstrap failure on x86_64.

Richard.

>
>> > OK to commit the following?  Is such a test case appropriate (which would
>> > have caught this issue right away), in particular the dg-final
>> > scan-tree-dump line?
>>
>> Ugh.  Not worse to what we do in various dwarf scanning I guess.
>
> ;-|
>
>> Doesn't failure lead to a miscompile eventually?  So you could formulate
>> this as a dg-do run test with a check for the desired outcome?
>
> No, unfortunately.  In this case the error is "benign" such that the
> OpenACC loop processing machinery will decide to not parallelize loops
> that ought to be parallelized.

So you can scan for "loop parallelized" instead?  I fear your pattern
is quite fragile
to maintain over time.

Richard.

>  This won't generally cause any problem
> (apart from performance regression, obviously); it just caused problems
> in a few libgomp test cases that actually at run time test for
> parallelized execution -- which will/did trigger only with nvptx
> offloading enabled, which not too many people are testing.  The test case
> I propose below will trigger also for non-offloading configurations.
>
>> > Clear basic block flags before using BB_VISITED for OpenACC loops 
>> > processing
>> >
>> > gcc/
>> > * omp-low.c (oacc_loop_discovery): Call clear_bb_flags.
>> >
>> > gcc/testsuite/
>> > * gcc.dg/goacc/loop-processing-1.c: New file.
>> > ---
>> >  gcc/omp-low.c  |  9 +
>> >  gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
>> >  2 files changed, 23 insertions(+), 4 deletions(-)
>> >
>> > --- gcc/omp-low.c
>> > +++ gcc/omp-low.c
>> > @@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
>> >  static oacc_loop *
>> >  oacc_loop_discovery ()
>> >  {
>> > -  basic_block bb;
>> > +  /* Clear basic block flags, in particular BB_VISITED which we're going 
>> > to use
>> > + in the following.  */
>> > +  clear_bb_flags ();
>> >
>> >oacc_loop *top = new_oacc_loop_outer (current_function_decl);
>> >oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
>> > @@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
>> >   that diagnostics come out in an unsurprising order.  */
>> >top = oacc_loop_sibling_nreverse (top);
>> >
>> > -  /* Reset the visited flags.  */
>> > -  FOR_ALL_BB_FN (bb, cfun)
>> > -bb->flags &= ~BB_VISITED;
>> > +  /* Clear basic block flags again, as otherwise IRA will explode later 
>> > on.  */
>> > +  clear_bb_flags ();
>> >
>> >return top;
>> >  }
>> > --- /dev/null
>> > +++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
>> > @@ -0,0 +1,18 @@
>> > +/* Make sure that OpenACC loop processing happens.  */
>> > +/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
>> > +
>> > +extern int place ();
>> > +
>> > +void vector_1 (int *ary, int size)
>> > +{
>> > +#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
>> > firstprivate (size)
>> > +  {
>> > +#pragma acc loop gang
>> > +for (int jx = 0; jx < 1; jx++)
>> > +#pragma acc loop auto
>> > +  for (int ix = 0; ix < size; ix++)
>> > +   ary[ix] = place ();
>> >

Re: Clear basic block flags before using BB_VISITED for OpenACC loops processing

2016-10-17 Thread Thomas Schwinge
Hi!

On Mon, 17 Oct 2016 14:08:44 +0200, Richard Biener  
wrote:
> On Mon, Oct 17, 2016 at 1:47 PM, Thomas Schwinge
>  wrote:
> > On Mon, 17 Oct 2016 13:22:17 +0200, Richard Biener 
> >  wrote:
> >> On Mon, Oct 17, 2016 at 11:38 AM, Thomas Schwinge
> >>  wrote:
> >> > On Fri, 14 Oct 2016 13:06:59 +0200, Richard Biener 
> >> >  wrote:
> >> >> On Fri, Oct 14, 2016 at 1:00 PM, Nathan Sidwell  wrote:
> >> >> > On 10/14/16 05:28, Richard Biener wrote:
> >> >> >
> >> >> >> The BB_VISITED flag has indetermined state at the beginning of a 
> >> >> >> pass.
> >> >> >> You have to ensure it is cleared yourself.
> >> >> >
> >> >> >
> >> >> > In that case the openacc (&nvptx?) passes should be modified to clear 
> >> >> > the
> >> >> > flags at their start, rather than at their end.

> >> > OK to commit the following?  Is such a test case appropriate (which would
> >> > have caught this issue right away), in particular the dg-final
> >> > scan-tree-dump line?
> >>
> >> Ugh.  Not worse to what we do in various dwarf scanning I guess.
> >
> > ;-|
> >
> >> Doesn't failure lead to a miscompile eventually?  So you could formulate
> >> this as a dg-do run test with a check for the desired outcome?
> >
> > No, unfortunately.  In this case the error is "benign" such that the
> > OpenACC loop processing machinery will decide to not parallelize loops
> > that ought to be parallelized.
> 
> So you can scan for "loop parallelized" instead?

The dump would still contain the outer loop's "Loop 0(0)" marker, so I'd
have to scan for "Head"/"Tail"/"UNIQUE" or similar instead -- but that
seems likewise fragile (for false negatives), and less useful than
scanning for the complete pattern.

> I fear your pattern
> is quite fragile
> to maintain over time.

Agreed -- but then, that's intentional: my idea for this new test case
has been to have it actually verify the expected OpenACC loop processing,
so it's clear that this pattern will need to be adjusted if changing the
OpenACC loop processing.

> >  This won't generally cause any problem
> > (apart from performance regression, obviously); it just caused problems
> > in a few libgomp test cases that actually at run time test for
> > parallelized execution -- which will/did trigger only with nvptx
> > offloading enabled, which not too many people are testing.  The test case
> > I propose below will trigger also for non-offloading configurations.

On IRC, Segher suggested to 'use {} instead of "" to avoid [all those
backslashes]' -- thanks, done.

commit 88260dc23e752c3e05c6644ee3b653a947714440
Author: Thomas Schwinge 
Date:   Mon Oct 17 15:33:09 2016 +0200

Clear basic block flags before using BB_VISITED for OpenACC loops processing

gcc/
* omp-low.c (oacc_loop_discovery): Call clear_bb_flags.

gcc/testsuite/
* gcc.dg/goacc/loop-processing-1.c: New file.
---
 gcc/omp-low.c  |  9 +
 gcc/testsuite/gcc.dg/goacc/loop-processing-1.c | 18 ++
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git gcc/omp-low.c gcc/omp-low.c
index 213bf8c..5257d21 100644
--- gcc/omp-low.c
+++ gcc/omp-low.c
@@ -19340,7 +19340,9 @@ oacc_loop_sibling_nreverse (oacc_loop *loop)
 static oacc_loop *
 oacc_loop_discovery ()
 {
-  basic_block bb;
+  /* Clear basic block flags, in particular BB_VISITED which we're going to use
+ in the following.  */
+  clear_bb_flags ();
   
   oacc_loop *top = new_oacc_loop_outer (current_function_decl);
   oacc_loop_discover_walk (top, ENTRY_BLOCK_PTR_FOR_FN (cfun));
@@ -19349,9 +19351,8 @@ oacc_loop_discovery ()
  that diagnostics come out in an unsurprising order.  */
   top = oacc_loop_sibling_nreverse (top);
 
-  /* Reset the visited flags.  */
-  FOR_ALL_BB_FN (bb, cfun)
-bb->flags &= ~BB_VISITED;
+  /* Clear basic block flags again, as otherwise IRA will explode later on.  */
+  clear_bb_flags ();
 
   return top;
 }
diff --git gcc/testsuite/gcc.dg/goacc/loop-processing-1.c 
gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
new file mode 100644
index 000..619576a
--- /dev/null
+++ gcc/testsuite/gcc.dg/goacc/loop-processing-1.c
@@ -0,0 +1,18 @@
+/* Make sure that OpenACC loop processing happens.  */
+/* { dg-additional-options "-O2 -fdump-tree-oaccdevlow" } */
+
+extern int place ();
+
+void vector_1 (int *ary, int size)
+{
+#pragma acc parallel num_workers (32) vector_length(32) copy(ary[0:size]) 
firstprivate (size)
+  {
+#pragma acc loop gang
+for (int jx = 0; jx < 1; jx++)
+#pragma acc loop auto
+  for (int ix = 0; ix < size; ix++)
+   ary[ix] = place ();
+  }
+}
+
+/* { dg-final { scan-tree-dump {OpenACC loops.*Loop 0\(0\).*Loop 
14\(1\).*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*Head-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_HEAD_MARK, 0, 1, 
20\);.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_FORK, \.data_dep\.[0-9_]+, 
0\);.*Tail-0:.*\.data_dep\.[0-9_]+ = UNIQUE \(OACC_TAIL_MARK, 
\.data_dep\.[0-9_]+, 1\);.*\.data_dep\.[0-9_]+ = UNIQU

Re: Make GCC emit ASM instructions in 'gcc/except.c' for i686 MinGW targets ?

2016-10-17 Thread Jeff Law

On 10/16/2016 08:58 PM, lhmouse wrote:

Hi there,

I come up with an idea about implementing stack unwinding for
the i686-w64-mingw32 target using native Windows Structured
Exception Handling (a.k.a SEH) for efficiency reasons.

Unlike DWARF and SEH for x64, SEH for x86 is stack-based
and works like the SJLJ exception model: The operating system
keeps a thread specific pointer to an SEH node on the stack
that must be installed/uninstalled during run time.

The SEH-head pointer is stored in `fs:[0]`.
Typecially, an SEH handler is installed like this, in Intel syntax:

# typedef EXCEPTION_DISPOSITION
#   filter_function(
# EXCEPTION_RECORD *record, void *establisher_frame,
# CONTEXT *machine_context, void *dispatcher_context)
#   __attribute__((__cdecl__));
# struct x86_seh_node_header {
#   struct x86_seh_node_header *next;
#   filter_function *filter;
#   char extra_data[];
# };

sub esp, 8  # struct x86_seh_node_header this_node;
mov ecx, dword ptr fs:[0]   #
mov dword ptr[esp], ecx # this_node.next = get_thread_seh_head();
mov dword ptr[esp + 4], offset my_seh_filter
# this_node.filter = &my_seh_filter
mov dword ptr fs:[0], esp   # set_thread_seh_head(&this_node);

Before the function exits and its frame is destroyed, the node
must be uninstalled like this:

mov ecx, dword ptr fs:[0]   #
mov dword ptr fs:[0], ecx   # set_thread_seh_head(this_node.next);

Since I am looking at the SJLJ exception model and it seems using
a slim, inlined version of `setjmp()` with `__builtin_longjmp()`
that only stores 3 or 4 pointers, extending that structure should be
a simple matter. The problem is that, installation and uninstallation
of SEH nodes require target-specific ASM code generation.

Is it possible to do in 'gcc/except.c' ?

I wouldn't do this solely in gcc/except.c.

I'd probably create a new exception handling model and conditionalize 
whatever code you need based on that.  Emission of code for that new 
exception model would likely require some amount of target specific code 
called via target hooks.


jeff


Re: Re: Make GCC emit ASM instructions in 'gcc/except.c' for i686 MinGW targets ?

2016-10-17 Thread lhmouse
> I'd probably create a new exception handling model and conditionalize 
> whatever code you need based on that. 

That would require copy-n-paste of tons of code...
All this remains contingent on Microsoft's generosity because
they don't provide APIs for SEH on x86, unlike on x64.
So I have to reuse stack unwinding code from SJLJ at the moment.

> Emission of code for that new 
> exception model would likely require some amount of target specific code 
> called via target hooks.

Hooks... Er, are you talking about those global pointer-to-functions?
There are a lot, indeed.

--   
Best regards,
lh_mouse
2016-10-17




Re: Explicitly list all tree codes in gcc/tree-streamer.c:record_common_node

2016-10-17 Thread Thomas Schwinge
Hi!

On Fri, 30 Sep 2016 09:47:56 +0200, Richard Biener  
wrote:
> On Thu, Sep 29, 2016 at 4:48 PM, Thomas Schwinge
>  wrote:
> > On Mon, 19 Sep 2016 13:25:01 +0200, Richard Biener 
> >  wrote:
> >> On Mon, Sep 19, 2016 at 1:19 PM, Thomas Schwinge
> >>  wrote:
> >> > On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener 
> >> >  wrote:
> >> >> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
> >> >>  wrote:
> >> >> > --- gcc/tree-streamer.c
> >> >> > +++ gcc/tree-streamer.c
> >> >> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
> >> >> > *cache, tree node)
> >> >> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
> >> >> >
> >> >> >if (POINTER_TYPE_P (node)
> >> >> > -  || TREE_CODE (node) == COMPLEX_TYPE
> >> >> >|| TREE_CODE (node) == ARRAY_TYPE)
> >> >> >  record_common_node (cache, TREE_TYPE (node));
> >> >> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
> >> >> > [...]
> >> >> >else if (TREE_CODE (node) == RECORD_TYPE)
> >
> >> [looks to me we miss handling of vector type components alltogether,
> >> maybe there are no global vector type trees ...]
> >
> > Looks like it, yes.  Would a patch like the following be reasonable,
> > which explicitly lists/handles all expected tree codes, or is something
> > like that not feasible?  (That's a subset of tree codes I gathered by a
> > partial run of the GCC testsuite, and libgomp testsuite; not claiming
> > this is complete.)
> 
> I think it would be a nice thing to have indeed.
> 
> So -- I'm inclined to approve this patch ;)

After quite a bit of testing (contrib/config-list.mk, modified to run
-fself-test with -flto, to exercise the code I'm modifying), I have now
committed this to trunk in r241246:

commit 29cfc397b0ec2c953ff929d0ba57001c7018ec0c
Author: tschwinge 
Date:   Mon Oct 17 15:56:22 2016 +

Explicitly list all tree codes in gcc/tree-streamer.c:record_common_node

gcc/
* tree-streamer.c (record_common_node): Explicitly list expected
tree codes.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@241246 
138bc75d-0d04-0410-961f-82ee72b054a4
---
 gcc/ChangeLog   |  5 +
 gcc/tree-streamer.c | 30 +-
 2 files changed, 30 insertions(+), 5 deletions(-)

diff --git gcc/ChangeLog gcc/ChangeLog
index 86df616..9acc738 100644
--- gcc/ChangeLog
+++ gcc/ChangeLog
@@ -1,3 +1,8 @@
+2016-10-17  Thomas Schwinge  
+
+   * tree-streamer.c (record_common_node): Explicitly list expected
+   tree codes.
+
 2016-10-17  Richard Biener  
 
PR tree-optimization/77988
diff --git gcc/tree-streamer.c gcc/tree-streamer.c
index 7ea7096..2139e96 100644
--- gcc/tree-streamer.c
+++ gcc/tree-streamer.c
@@ -277,12 +277,28 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
  in the cache as hash value.  */
   streamer_tree_cache_append (cache, node, cache->nodes.length ());
 
-  if (POINTER_TYPE_P (node)
-  || TREE_CODE (node) == COMPLEX_TYPE
-  || TREE_CODE (node) == ARRAY_TYPE)
-record_common_node (cache, TREE_TYPE (node));
-  else if (TREE_CODE (node) == RECORD_TYPE)
+  switch (TREE_CODE (node))
 {
+case ERROR_MARK:
+case FIELD_DECL:
+case FIXED_POINT_TYPE:
+case IDENTIFIER_NODE:
+case INTEGER_CST:
+case INTEGER_TYPE:
+case POINTER_BOUNDS_TYPE:
+case REAL_TYPE:
+case TREE_LIST:
+case VOID_CST:
+case VOID_TYPE:
+  /* No recursive trees.  */
+  break;
+case ARRAY_TYPE:
+case COMPLEX_TYPE:
+case POINTER_TYPE:
+case REFERENCE_TYPE:
+  record_common_node (cache, TREE_TYPE (node));
+  break;
+case RECORD_TYPE:
   /* The FIELD_DECLs of structures should be shared, so that every
 COMPONENT_REF uses the same tree node when referencing a field.
 Pointer equality between FIELD_DECLs is used by the alias
@@ -291,6 +307,10 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
 nonoverlapping_component_refs_of_decl_p).  */
   for (tree f = TYPE_FIELDS (node); f; f = TREE_CHAIN (f))
record_common_node (cache, f);
+  break;
+default:
+  /* Unexpected tree code.  */
+  gcc_unreachable ();
 }
 }
 


Grüße
 Thomas


Re: [PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx types

2016-10-17 Thread Thomas Schwinge
Hi!

Ping.

On Thu, 29 Sep 2016 15:18:00 +0200, Thomas Schwinge  
wrote:
> On Mon, 19 Sep 2016 13:25:01 +0200, Richard Biener 
>  wrote:
> > On Mon, Sep 19, 2016 at 1:19 PM, Thomas Schwinge
> >  wrote:
> > > On Mon, 19 Sep 2016 10:18:35 +0200, Richard Biener 
> > >  wrote:
> > >> On Fri, Sep 16, 2016 at 3:32 PM, Thomas Schwinge
> > >>  wrote:
> > >> > --- gcc/tree-streamer.c
> > >> > +++ gcc/tree-streamer.c
> > >> > @@ -278,9 +278,23 @@ record_common_node (struct streamer_tree_cache_d 
> > >> > *cache, tree node)
> > >> >streamer_tree_cache_append (cache, node, cache->nodes.length ());
> > >> >
> > >> >if (POINTER_TYPE_P (node)
> > >> > -  || TREE_CODE (node) == COMPLEX_TYPE
> > >> >|| TREE_CODE (node) == ARRAY_TYPE)
> > >> >  record_common_node (cache, TREE_TYPE (node));
> > >> > +  else if (TREE_CODE (node) == COMPLEX_TYPE)
> > >> > +{
> > >> > +  /* Assert that complex types' component types have already been 
> > >> > handled
> > >> > +(and we thus don't need to recurse here).  See PR lto/77458.  
> > >> > */
> > >> > +[...]
> 
> > >> So I very much like to go forward with this kind of change as well
> 
> > > [patch]
> 
> > Ok with [changes]
> 
> Like this?  (I'll then continue to replicate this for other tree codes.)

commit 1c7dd6d92b7805cf12aaf822e509fa384cfcfbb0
Author: Thomas Schwinge 
Date:   Wed Sep 28 12:36:59 2016 +0200

[PR lto/77458] Avoid ICE in offloading with differing _FloatN, _FloatNx 
types

gcc/
PR lto/77458
* tree-core.h (enum tree_index): Put the complex types after their
component types.
* tree-streamer.c (verify_common_node_recorded): New function.
(preload_common_nodes) : Use it.
---
 gcc/tree-core.h | 31 +--
 gcc/tree-streamer.c | 32 +++-
 2 files changed, 48 insertions(+), 15 deletions(-)

diff --git gcc/tree-core.h gcc/tree-core.h
index 1bfe682..3e3f31e 100644
--- gcc/tree-core.h
+++ gcc/tree-core.h
@@ -556,20 +556,6 @@ enum tree_index {
   TI_BOOLEAN_FALSE,
   TI_BOOLEAN_TRUE,
 
-  TI_COMPLEX_INTEGER_TYPE,
-  TI_COMPLEX_FLOAT_TYPE,
-  TI_COMPLEX_DOUBLE_TYPE,
-  TI_COMPLEX_LONG_DOUBLE_TYPE,
-
-  TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
-  TI_COMPLEX_FLOAT32_TYPE,
-  TI_COMPLEX_FLOAT64_TYPE,
-  TI_COMPLEX_FLOAT128_TYPE,
-  TI_COMPLEX_FLOAT32X_TYPE,
-  TI_COMPLEX_FLOAT64X_TYPE,
-  TI_COMPLEX_FLOAT128X_TYPE,
-
   TI_FLOAT_TYPE,
   TI_DOUBLE_TYPE,
   TI_LONG_DOUBLE_TYPE,
@@ -599,6 +585,23 @@ enum tree_index {
 - TI_FLOATN_NX_TYPE_FIRST  \
 + 1)
 
+  /* Put the complex types after their component types, so that in (sequential)
+ tree streaming we can assert that their component types have already been
+ handled (see tree-streamer.c:record_common_node).  */
+  TI_COMPLEX_INTEGER_TYPE,
+  TI_COMPLEX_FLOAT_TYPE,
+  TI_COMPLEX_DOUBLE_TYPE,
+  TI_COMPLEX_LONG_DOUBLE_TYPE,
+
+  TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOATN_NX_TYPE_FIRST = TI_COMPLEX_FLOAT16_TYPE,
+  TI_COMPLEX_FLOAT32_TYPE,
+  TI_COMPLEX_FLOAT64_TYPE,
+  TI_COMPLEX_FLOAT128_TYPE,
+  TI_COMPLEX_FLOAT32X_TYPE,
+  TI_COMPLEX_FLOAT64X_TYPE,
+  TI_COMPLEX_FLOAT128X_TYPE,
+
   TI_FLOAT_PTR_TYPE,
   TI_DOUBLE_PTR_TYPE,
   TI_LONG_DOUBLE_PTR_TYPE,
diff --git gcc/tree-streamer.c gcc/tree-streamer.c
index 2139e96..60118dc 100644
--- gcc/tree-streamer.c
+++ gcc/tree-streamer.c
@@ -248,6 +248,32 @@ streamer_tree_cache_lookup (struct streamer_tree_cache_d 
*cache, tree t,
 }
 
 
+/* Verify that NODE is in CACHE.  */
+
+static void
+verify_common_node_recorded (struct streamer_tree_cache_d *cache, tree node)
+{
+  /* Restrict this to flag_checking only because in general violating it is
+ harmless plus we never know what happens on all targets/frontend/flag(!)
+ combinations.  */
+  if (!flag_checking)
+return;
+
+  bool found = false;
+  if (cache->node_map)
+gcc_assert (streamer_tree_cache_lookup (cache, node, NULL));
+  else
+{
+  gcc_assert (cache->nodes.exists ());
+  /* Linear search...  */
+  for (unsigned i = 0; !found && i < cache->nodes.length (); ++i)
+   if (cache->nodes[i] == node)
+ found = true;
+  gcc_assert (found);
+}
+}
+
+
 /* Record NODE in CACHE.  */
 
 static void
@@ -293,11 +319,15 @@ record_common_node (struct streamer_tree_cache_d *cache, 
tree node)
   /* No recursive trees.  */
   break;
 case ARRAY_TYPE:
-case COMPLEX_TYPE:
 case POINTER_TYPE:
 case REFERENCE_TYPE:
   record_common_node (cache, TREE_TYPE (node));
   break;
+case COMPLEX_TYPE:
+  /* Verify that a complex type's component type (node_type) has been
+handled already (and we thus don't need to recurse here).  */
+  verify_common_node_recorded (cache, TREE_TYPE (node));
+  break;
 case RECORD_TYPE:
   /* The FIELD_DECLs of structures should be shared, so that eve

Re: [RFC] Reliable compiler specification setting (at least include/lib dirs) through the process environment

2016-10-17 Thread Joseph Myers
On Sun, 16 Oct 2016, Shea Levy wrote:

> options) and clearly have the semantics we want. Ideally we would be
> able to specify something on the level of abstraction of "this directory
> should be treated like you would normally treat /usr" and get
> e.g. /include, /lib, frameworks on OS X, etc. handled properly.

What that suggests to me is options for having multiple sysroots (which 
are treated like / not like /usr, but that's the existing concept for a 
directory containing both header and library subdirectories, and you could 
combine this with a Hurd-style configuration of the expected sysroot 
subdirectories, i.e. no /usr inside the sysroot).  That would however be 
rather complex; both GCC and ld presume there is a single global sysroot 
(modulo SYSROOT_SUFFIX_SPEC / SYSROOT_HEADERS_SUFFIX_SPEC that append to 
it), as do the interfaces for other specs that pass down sysroot 
information to cc1 etc. - and ld then interprets absolute paths in linker 
scripts such as libc.so in a sysroot relative to that sysroot (so would 
need to track which sysroot a particular linker script was found in to 
know how to interpret absolute paths in it), and options such as -I=dir 
for a sysroot-relative include don't have a clear meaning with multiple 
sysroots.

I'm wary of adding environment variables as they tend to make debugging 
hard when the same compiler behaves differently for different people for 
no obvious reason.

You should not need to exclude linker options (as opposed to linker input 
files) from the command line when not linking, or compiler options when 
linking.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Who played with the GCC Bugzilla git repo?

2016-10-17 Thread Frank Ch. Eigler
Frédéric Buclin  writes:

> Someone played with the GCC Bugzilla git repo last week with no real reason:
> Author: root 
> Date:   Fri Oct 7 15:28:43 2016 +
> snap-data
> [...]

That was little old me, with the reason being to conserve local changes
with version control.

> Looks like the goal was to drop all CSS and JS files in data/assets/.

No, I believe there was some other sourceware-oriented customization in
there, but I forget the details.

> [...]  Moreover, this means that the GCC Bugzilla git repo is no
> longer in sync with the upstream Bugzilla git repo, because the one
> who played with git also committed my local changes (I didn't do it
> for a reason). I can no longer view my local changes, nor can I easily
> sync both repos with a fast-forward merge (I think). [...]

That's just a matter of tracking upstream bugzilla on one branch, and
the sourceware installation of bugzilla on another branch, and merging
from the former into the latter periodically.  I renamed "5.0" to
"5.0-sourceware", and recreated the "5.0" branch to assist this.

- FChE


[SOLVED-ish] Question about sibling call epilogues & registers

2016-10-17 Thread Daniel Santos
So the core problem was my "restore multiple" insn contained a CALL insn 
and was a call_insn. The symbol it called is in the static section of 
libgcc. However, during peephole2 pass, get_call_reg_set_usage in 
final.c didn't find a function declaration attached to the symbol and so 
defaulted to say that it depends upon and clobbers most registers. (the 
target's default set)


My solution was to change the pattern so that I do not generate a CALL 
and emit the parallel using emit_insn instead of emit_call_insn. In this 
way, I still declare everything that calling stub ("__msabi_restore_15" 
in the below case) does and it doesn't presume that I depend upon or 
clobber any other regs, so that the sibling call emitted after this 
still works. Since I'm using RSI for the base address, I'm only changing 
registers that have to be saved anyway, so the sibcall should never need 
to use one of these registers anyway (I hope :)



(insn/f 148 147 149 11 (parallel [
(use (symbol_ref:DI ("__msabi_restore_15")))
(clobber (reg:CC 17 flags))
(set (reg:V4SF 52 xmm15)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -88 [0xffa8])) [0 S16 A8]))
(set (reg:V4SF 51 xmm14)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -72 [0xffb8])) [0 S16 A8]))
(set (reg:V4SF 50 xmm13)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -56 [0xffc8])) [0 S16 A8]))
(set (reg:V4SF 49 xmm12)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -40 [0xffd8])) [0 S16 A8]))
(set (reg:V4SF 48 xmm11)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -24 [0xffe8])) [0 S16 A8]))
(set (reg:V4SF 47 xmm10)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int -8 [0xfff8])) [0  S16 A8]))
(set (reg:V4SF 46 xmm9)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 8 [0x8])) [0  S16 A8]))
(set (reg:V4SF 45 xmm8)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 24 [0x18])) [0  S16 A8]))
(set (reg:V4SF 28 xmm7)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 40 [0x28])) [0  S16 A8]))
(set (reg:V4SF 27 xmm6)
(mem/c:V4SF (plus:DI (reg:DI 4 si)
(const_int 56 [0x38])) [0  S16 A8]))
(set (reg:DI 5 di)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 72 [0x48])) [0  S8 A8]))
(set (reg:DI 3 bx)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 80 [0x50])) [0  S8 A8]))
(set (reg:DI 6 bp)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 88 [0x58])) [0  S8 A8]))
(set (reg:DI 41 r12)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 96 [0x60])) [0  S8 A8]))
(set (reg:DI 4 si)
(mem/c:DI (plus:DI (reg:DI 4 si)
(const_int 64 [0x40])) [0  S8 A8]))
]) 
/home/daniel/proj/emu/wine/github/dlls/winex11.drv/window.c:1655 -1

 (expr_list:REG_UNUSED (reg:CC 17 flags)
(expr_list:REG_CFA_RESTORE (reg:DI 4 si)
(expr_list:REG_CFA_RESTORE (reg:DI 41 r12)
(expr_list:REG_CFA_RESTORE (reg:DI 6 bp)
(expr_list:REG_CFA_RESTORE (reg:DI 3 bx)
(expr_list:REG_CFA_RESTORE (reg:DI 5 di)
(expr_list:REG_CFA_RESTORE (reg:V4SF 27 xmm6)
(expr_list:REG_CFA_RESTORE (reg:V4SF 28 
xmm7)
(expr_list:REG_CFA_RESTORE 
(reg:V4SF 45 xmm8)
(expr_list:REG_CFA_RESTORE 
(reg:V4SF 46 xmm9)

(nil