Re: [RFC, LRA] Repeated looping over subreg reloads.

2013-12-05 Thread Tejas Belagod

Vladimir Makarov wrote:

On 12/4/2013, 6:15 AM, Tejas Belagod wrote:

Hi,

I'm trying to relax CANNOT_CHANGE_MODE_CLASS for aarch64 to allow all
mode changes on FP_REGS as aarch64 does not have register-packing, but
I'm running into an LRA ICE. A test case generates an RTL subreg of the
following form

(set (reg:DF 97) (subreg:DF (reg:V2DF 95) 8))

LRA has to reload the subreg because the subreg is not representable as
a full register. When LRA reloads this in
lra-constraints.c:simplify_operand_subreg (), it seems to reload
SUBREG_REG() and leave the byte offset alone.

i.e.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))

The code in lra-constraints.c is this conditional:

   /* Force a reload of the SUBREG_REG if this is a constant or PLUS or
  if there may be a problem accessing OPERAND in the outer
  mode.  */
   if ((REG_P (reg)
   
   insert_move_for_subreg (insert_before ? &before : NULL,
   insert_after ? &after : NULL,
   reg, new_reg);
 }
   

What happens subsequently is that LRA keeps looping over this RTL and
keeps reloading the SUBREG_REG() till the limit of constraint passes is
reached.

  (set (reg:V2DF 100) (reg:V2DF 95))
  (set (reg:DF 97) (subreg:DF (reg:V2DF 100) 8))

I can't see any place where this subreg is resolved (e.g. into an equiv
memref) before the next iteration comes around to reload the inputs
and outputs of curr_insn. Or am I missing some part of the code
that tries reloading the subreg with different alternatives or reg classes?
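
The looping described above can be pictured with a toy model. This is only a sketch under stated assumptions, not LRA code: every sketch_ name is made up, the limit of 30 is taken from the ICE message quoted later in the thread, and the rule "a non-zero byte offset always needs a reload" is a simplification of the situation reported here.

```c
#include <assert.h>

/* Toy model only: none of these names exist in GCC.  */
#define SKETCH_MAX_PASSES 30  /* the limit quoted in the ICE message */

struct sketch_subreg
{
  int inner_regno;  /* pseudo wrapped by the subreg, e.g. 95 */
  int byte_offset;  /* SUBREG_BYTE, e.g. 8 */
};

/* Assumption: the operand keeps needing a reload for as long as the
   byte offset is non-zero.  */
static int
sketch_needs_reload (const struct sketch_subreg *op)
{
  return op->byte_offset != 0;
}

/* Mimic the reported behaviour: copy the inner register into a fresh
   pseudo and leave the byte offset alone.  */
static void
sketch_reload_inner (struct sketch_subreg *op, int *next_pseudo)
{
  op->inner_regno = (*next_pseudo)++;
}

/* Return the number of passes used, or -1 if the limit was reached.  */
int
sketch_constraint_passes (struct sketch_subreg op, int first_new_pseudo)
{
  int next_pseudo = first_new_pseudo;
  int pass;

  assert (first_new_pseudo > op.inner_regno);
  for (pass = 0; pass < SKETCH_MAX_PASSES; pass++)
    {
      if (!sketch_needs_reload (&op))
        return pass;
      sketch_reload_inner (&op, &next_pseudo);
    }
  return -1;
}
```

In this model an operand with offset 8 never converges, because each pass changes only the inner register, while an operand with offset 0 needs no pass at all.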



I guess this behaviour is wrong.  We could spill the V2DF pseudo or put
it into a register of another class, but that is not implemented.  This
code is actually a modified version of the one in the reload pass.  We
could implement alternative strategies and a check for a potential loop
(such code exists in process_alt_operands).


Could you send me the macro change and the test?  I'll look at it and
figure out what we can do.


Hi,

Thanks for looking at this.

The macro change is in this patch 
http://gcc.gnu.org/ml/gcc-patches/2013-11/msg03638.html. The test is 
gcc.c-torture/compile/simd-3.c and when compiled with -O1 for aarch64, ICEs:


gcc/testsuite/gcc.c-torture/compile/simd-3.c:22:1: internal compiler error: 
Maximum number of LRA constraint passes is achieved (30)


Also, I'm curious to know: is it possible to use vec_extract for vector-mode
subregs and zero/sign extract for scalars, with spilling as the last resort if
neither of these is possible? As you say, a non-zero SUBREG_BYTE offset could
also be resolved using a different regclass where the sub-mode could just be a
full register.


Thanks,
Tejas.



Remove spam in GCC mailing list

2013-12-05 Thread Tae Wong
Here are some spam posts in the mailing lists:

http://gcc.gnu.org/ml/gcc-bugs/2013-07/msg01127.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00190.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00276.html
http://gcc.gnu.org/ml/gcc/2013-04/msg00143.html

The mailing list administrators need to clean up the spam.


Re: Dependency confusion in sched-deps

2013-12-05 Thread Michael Matz
Hi,

On Thu, 5 Dec 2013, Maxim Kuvyrkov wrote:

> Output dependency is the right type (write after write).  Anti 
> dependency is write after read, and true dependency is read after write.
> 
> Dependency type plays a role for estimating costs and latencies between 
> instructions (which affects performance), but using wrong or imprecise 
> dependency type does not affect correctness.

In the context of GCC and the middle end's memory model this statement is
not correct.  For some dependency types we're using type-based aliasing to
disambiguate, i.e. ignore that dependency, while for others we don't.  In
particular a read-after-write memory-access dependency can be ignored if 
type info says they can't alias (because a program where both _would_ 
access the same memory would be invalid according to our mem model), but 
for write-after-read or write-after-write we cannot do that disambiguation 
(because the last write overrides the dynamic type of the memory cell even 
if it was incompatible with the one before).
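
A small C sketch of the two situations may help. The function names are made up for illustration, and the comments restate the rule from the paragraph above rather than describe what any particular GCC version does with this exact code.

```c
#include <assert.h>

/* Read after write: the load from *fp follows the store to *ip.  A
   valid program cannot have an int store and a float load refer to
   the same object, so this dependency may be ignored and the load
   moved above the store.  */
float
sketch_read_after_write (int *ip, float *fp)
{
  assert (ip && fp);
  *ip = 1;
  return *fp;
}

/* Write after write: the second store may give the memory cell a new
   dynamic type, so the two stores have to stay in order even though
   the types differ.  */
void
sketch_write_after_write (int *ip, float *fp)
{
  assert (ip && fp);
  *ip = 1;
  *fp = 2.0f;
}
```

Called with two distinct objects, both functions are well defined; the difference only matters for how freely the accesses may be reordered.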


Ciao,
Michael.


Libbacktrace backtrace_vector_finish

2013-12-05 Thread Jakub Jelinek
Hi!

I'm trying to understand how the backtrace_vector_* APIs are meant to work
and be used, but at least for alloc.c I don't see how they can work properly:

Both backtrace_vector_grow and backtrace_vector_release use
  base = realloc (vec->base, alc);
or
  vec->base = realloc (vec->base, vec->size);
(note, in the latter case it is even a memory leak if realloc fails),
but that assumes that vec->base has been returned by malloc/realloc
etc.  But,
void 
backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED,
 struct backtrace_vector *vec)
{
  vec->base = (char *) vec->base + vec->size;
  vec->size = 0;
}
will change vec->base so that it no longer is an address returned by
malloc/realloc, so next time you call backtrace_vector_grow, if it will
actually need to reallocate anything, it will crash in realloc or silently
misbehave.  If this works properly in mmap.c implementation, perhaps
backtrace_vector_finish in alloc.c should just backtrace_vector_release
and memset (vec, 0, sizeof (*vec)); ?
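
A minimal sketch of the idea, assuming a malloc-backed vector: base is only ever NULL or a pointer returned by realloc, a failed realloc does not lose the old block, and finishing hands the block to the caller and zeroes the vector. The struct and function names are hypothetical stand-ins, not the libbacktrace declarations.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct backtrace_vector.  */
struct sketch_vector
{
  void *base;   /* always NULL or a pointer returned by realloc */
  size_t size;  /* bytes in use */
  size_t alc;   /* bytes allocated but not yet in use */
};

/* Reserve SIZE more bytes.  Returns the start of the new space, or
   NULL on failure, in which case VEC is left untouched.  */
void *
sketch_vector_grow (struct sketch_vector *vec, size_t size)
{
  void *ret;

  assert (vec != NULL && size > 0);
  if (size > vec->alc)
    {
      size_t want = (vec->size + size) * 2;
      void *base = realloc (vec->base, want);

      if (base == NULL)
        return NULL;  /* the old block is still owned by VEC */
      vec->base = base;
      vec->alc = want - vec->size;
    }
  ret = (char *) vec->base + vec->size;
  vec->size += size;
  vec->alc -= size;
  return ret;
}

/* Hand the finished entries to the caller and reset VEC, so that the
   next grow starts from a pointer realloc is allowed to see.  */
void *
sketch_vector_finish (struct sketch_vector *vec)
{
  void *ret = vec->base;

  if (vec->size > 0)
    {
      void *trimmed = realloc (vec->base, vec->size);

      if (trimmed != NULL)
        ret = trimmed;  /* a failed shrink just keeps the old block */
    }
  memset (vec, 0, sizeof (*vec));
  return ret;
}
```

The caller owns whatever sketch_vector_finish returns. Treating a failed shrink as harmless is a choice made for this sketch only.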

Jakub


Re: C++ std headers and malloc, realloc poisoning

2013-12-05 Thread Jason Merrill

On 12/04/2013 04:03 PM, Jakub Jelinek wrote:

I think the most important reason is that we want to handle out of mem
cases consistently, so instead of malloc etc. we want users to use xmalloc
etc. that guarantee non-NULL returned value, or fatal error and never
returning.  For operator new that is solvable through std::set_new_handler
I guess, but for malloc we really don't want people to deal with checking
NULL return values from those everywhere.


A simple workaround would be to disable poisoning of malloc/realloc on 
OS X (or when the build machine uses libc++, if that's easy to detect).


Jason



Re: C++ std headers and malloc, realloc poisoning

2013-12-05 Thread Oleg Endo
On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote:
> On 12/04/2013 04:03 PM, Jakub Jelinek wrote:
> > I think the most important reason is that we want to handle out of mem
> > cases consistently, so instead of malloc etc. we want users to use xmalloc
> > etc. that guarantee non-NULL returned value, or fatal error and never
> > returning.  For operator new that is solvable through std::set_new_handler
> > I guess, but for malloc we really don't want people to deal with checking
> > NULL return values from those everywhere.
> 
> A simple workaround would be to disable poisoning of malloc/realloc on 
> OS X (or when the build machine uses libc++, if that's easy to detect).

Whether libc++ uses malloc/realloc/free in some implementation in a
header file or not is an implementation detail.  It could use it today
and stop doing so tomorrow ;)
Maybe a configure option to disable the poisoning would be better in
this case?

Cheers,
Oleg



Re: Truncate optimisation question

2013-12-05 Thread Eric Botcazou
> The comment says that we're trying to match:
> 
> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int))))
> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int))))
> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
> 8. (set (reg:SI) (leu:SI (plus:QI ...)))
> 
> And I think that's what we should be matching in cases where the
> extension isn't redundant, even on RISC targets.

Which one(s) exactly?  Most of the RISC targets we have are parameterized 
(WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes 
smaller than the word mode.

> The problem here isn't really about which mode is on the plus,
> but whether we recognise that the extension instruction is redundant.
> I.e. we start with:
> 
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>  (nil))
> (insn 10 9 11 2 (set (reg:SI 121)
> (and:SI (reg:SI 120)
> (const_int 255 [0xff]))) test.c:6 -1
>  (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 121)
> (const_int 9 [0x9]))) test.c:6 -1
>  (nil))
> 
> and what we want combine to do is to recognise that insn 10 is redundant
> and reduce the sequence to:
> 
> (insn 9 8 10 2 (set (reg:SI 120)
> (plus:SI (subreg:SI (reg:QI 118) 0)
> (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>  (nil))
> (insn 11 10 12 2 (set (reg:CC 100 cc)
> (compare:CC (reg:SI 120)
> (const_int 9 [0x9]))) test.c:6 -1
>  (nil))
> 
> But insn 10 is redundant on all targets, not just RISC ones.
> It isn't about whether the target has a QImode addition or not.

That's theoretical though since, on x86 for example, the redundant instruction 
isn't even generated because of the QImode addition...

> Well, I think making the simplify-rtx code conditional on the target
> would be the wrong way to go.  If we really can't live with it being
> unconditional then I think we should revert it.  But like I say I think
> it would be better to make combine recognise the redundancy even with
> the new form.  (Or as I say, longer term, not to rely on combine to
> eliminate redundant extensions.)  But I don't have time to do that myself...

It helps x86 so we won't revert it.  My fear is that we'll need to add code in 
other places to RISCify back the result of this "simplification".

-- 
Eric Botcazou


Re: C++ std headers and malloc, realloc poisoning

2013-12-05 Thread Jason Merrill

On 12/05/2013 10:59 AM, Oleg Endo wrote:

On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote:

A simple workaround would be to disable poisoning of malloc/realloc on
OS X (or when the build machine uses libc++, if that's easy to detect).


Whether libc++ uses malloc/realloc/free in some implementation in a
header file or not is an implementation detail.  It could use it today
and stop doing so tomorrow ;)


Yep, which is why I described my suggestion as a workaround.  :)

But having the poisoning disabled when building with clang doesn't seem 
like a significant problem even if it becomes unnecessary, since any 
misuse will still show up when building stage 2 and on other platforms.



Maybe a configure option to disable the poisoning would be better in
this case?


That seems unlikely to help users.

Jason



Re: C++ std headers and malloc, realloc poisoning

2013-12-05 Thread Jakub Jelinek
On Thu, Dec 05, 2013 at 12:05:23PM -0500, Jason Merrill wrote:
> On 12/05/2013 10:59 AM, Oleg Endo wrote:
> >On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote:
> >>A simple workaround would be to disable poisoning of malloc/realloc on
> >>OS X (or when the build machine uses libc++, if that's easy to detect).
> >
> >Whether libc++ uses malloc/realloc/free in some implementation in a
> >header file or not is an implementation detail.  It could use it today
> >and stop doing so tomorrow ;)
> 
> Yep, which is why I described my suggestion as a workaround.  :)
> 
> But having the poisoning disabled when building with clang doesn't
> seem like a significant problem even if it becomes unnecessary,
> since any misuse will still show up when building stage 2 and on
> other platforms.

Guess the problem is that clang pretends to be an (old) version of GCC.
Otherwise all the poisoning, which is guarded by:
#if (GCC_VERSION >= 3000)
wouldn't be applied.  So perhaps we want a hack there && !defined __clang__
or similar.
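
A sketch of what such a guard could look like. The SKETCH_ names are made up, the version macro is derived here from __GNUC__ and __GNUC_MINOR__ as an assumption about how such a macro is typically built, and a made-up identifier is poisoned instead of malloc/realloc so that the fragment can be compiled next to ordinary code.

```c
#include <assert.h>

/* Assumption: a GCC_VERSION-style macro built from the compiler's own
   version macros.  The SKETCH_ names are hypothetical.  */
#if defined __GNUC__
# define SKETCH_GCC_VERSION (__GNUC__ * 1000 + __GNUC_MINOR__)
#else
# define SKETCH_GCC_VERSION 0
#endif

/* clang also defines __GNUC__, so it has to be excluded explicitly.  */
#if (SKETCH_GCC_VERSION >= 3000) && !defined __clang__
# define SKETCH_POISONING 1
/* Poison a made-up identifier so that real code is not disturbed.  */
# pragma GCC poison sketch_forbidden_alloc
#else
# define SKETCH_POISONING 0
#endif

/* Report whether the poisoning branch was taken for this compiler.  */
int
sketch_poisoning_enabled (void)
{
  return SKETCH_POISONING;
}
```

With this shape the poisoning stays active for GCC itself and is skipped whenever __clang__ is defined, which is the effect the hack above is after.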

Jakub


Re: C++ std headers and malloc, realloc poisoning

2013-12-05 Thread Oleg Endo
On Thu, 2013-12-05 at 18:11 +0100, Jakub Jelinek wrote:
> On Thu, Dec 05, 2013 at 12:05:23PM -0500, Jason Merrill wrote:
> > On 12/05/2013 10:59 AM, Oleg Endo wrote:
> > >On Thu, 2013-12-05 at 10:45 -0500, Jason Merrill wrote:
> > >>A simple workaround would be to disable poisoning of malloc/realloc on
> > >>OS X (or when the build machine uses libc++, if that's easy to detect).
> > >
> > >Whether libc++ uses malloc/realloc/free in some implementation in a
> > >header file or not is an implementation detail.  It could use it today
> > >and stop doing so tomorrow ;)
> > 
> > Yep, which is why I described my suggestion as a workaround.  :)
> > 
> > But having the poisoning disabled when building with clang doesn't
> > seem like a significant problem even if it becomes unnecessary,
> > since any misuse will still show up when building stage 2 and on
> > other platforms.
> 
> Guess the problem is that clang pretends to be an (old) version of GCC.
> Otherwise all the poisoning, which is guarded by:
> #if (GCC_VERSION >= 3000)
> wouldn't be applied.  So perhaps we want a hack there && !defined __clang__
> or similar.

The problem is not clang but the exposed internals of libc++ (at least
the version Apple currently ships).  The problem would be the same if
GCC was used as the compiler but with libc++ instead of libstdc++ (it
seems some people have been trying to do that, see
http://lists.cs.uiuc.edu/pipermail/cfe-dev/2010-August/010149.html)

BTW, the #include  in sh.c also triggered the
"do_not_use_isalpha_with_safe_ctype" stuff in include/safe-ctype.h,
which is a similar problem (isalpha being used in some implementation in
libc++).

Cheers,
Oleg



Re: Truncate optimisation question

2013-12-05 Thread Richard Sandiford
Eric Botcazou writes:
>> The comment says that we're trying to match:
>> 
>> 1. (set (reg:SI) (zero_extend:SI (plus:QI (mem:QI) (const_int))))
>> 2. (set (reg:QI) (plus:QI (mem:QI) (const_int)))
>> 3. (set (reg:QI) (plus:QI (subreg:QI) (const_int)))
>> 4. (set (reg:CC) (compare:CC (subreg:QI) (const_int)))
>> 5. (set (reg:CC) (compare:CC (plus:QI (mem:QI) (const_int))))
>> 6. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>> 7. (set (reg:SI) (leu:SI (subreg:QI) (const_int)))
>> 8. (set (reg:SI) (leu:SI (plus:QI ...)))
>> 
>> And I think that's what we should be matching in cases where the
>> extension isn't redundant, even on RISC targets.
>
> Which one(s) exactly?  Most of the RISC targets we have are parameterized 
> (WORD_REGISTER_OPERATIONS, PROMOTE_MODE, etc) to avoid operations in modes 
> smaller than the word mode.

The first one, sorry.

>> The problem here isn't really about which mode is on the plus,
>> but whether we recognise that the extension instruction is redundant.
>> I.e. we start with:
>> 
>> (insn 9 8 10 2 (set (reg:SI 120)
>> (plus:SI (subreg:SI (reg:QI 118) 0)
>> (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>>  (nil))
>> (insn 10 9 11 2 (set (reg:SI 121)
>> (and:SI (reg:SI 120)
>> (const_int 255 [0xff]))) test.c:6 -1
>>  (nil))
>> (insn 11 10 12 2 (set (reg:CC 100 cc)
>> (compare:CC (reg:SI 121)
>> (const_int 9 [0x9]))) test.c:6 -1
>>  (nil))
>> 
>> and what we want combine to do is to recognise that insn 10 is redundant
>> and reduce the sequence to:
>> 
>> (insn 9 8 10 2 (set (reg:SI 120)
>> (plus:SI (subreg:SI (reg:QI 118) 0)
>> (const_int -48 [0xffffffffffffffd0]))) test.c:6 -1
>>  (nil))
>> (insn 11 10 12 2 (set (reg:CC 100 cc)
>> (compare:CC (reg:SI 120)
>> (const_int 9 [0x9]))) test.c:6 -1
>>  (nil))
>> 
>> But insn 10 is redundant on all targets, not just RISC ones.
>> It isn't about whether the target has a QImode addition or not.
>
> That's theoretical though since, on x86 for example, the redundant
> instruction isn't even generated because of the QImode addition...

Not for this testcase, sure, but we use an SImode addition and keep the
equivalent redundant extension until combine for:

int foo (unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0xff) < 10;
}

immediately before combine:
(insn 7 6 8 2 (parallel [
(set (reg:SI 93 [ D.1753 ])
(plus:SI (reg:SI 92 [ D.1753 ])
(const_int -48 [0xffffffffffffffd0])))
(clobber (reg:CC 17 flags))
]) /tmp/foo.c:3 261 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
(insn 8 7 9 2 (set (reg:SI 94 [ D.1753 ])
(zero_extend:SI (subreg:QI (reg:SI 93 [ D.1753 ]) 0))) /tmp/foo.c:3 133 {*zero_extendqisi2}
 (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
(nil)))
(insn 9 8 10 2 (set (reg:CC 17 flags)
(compare:CC (reg:SI 94 [ D.1753 ])
(const_int 9 [0x9]))) /tmp/foo.c:3 7 {*cmpsi_1}
 (expr_list:REG_DEAD (reg:SI 94 [ D.1753 ])
(nil)))

What saves us isn't QImode addition but QImode comparison:

combine:
(insn 7 6 8 2 (parallel [
(set (reg:SI 93 [ D.1753 ])
(plus:SI (reg:SI 92 [ D.1753 ])
(const_int -48 [0xffffffffffffffd0])))
(clobber (reg:CC 17 flags))
]) /tmp/foo.c:3 261 {*addsi_1}
 (expr_list:REG_DEAD (reg:SI 92 [ D.1753 ])
(expr_list:REG_UNUSED (reg:CC 17 flags)
(nil))))
(note 8 7 9 2 NOTE_INSN_DELETED)
(insn 9 8 10 2 (set (reg:CC 17 flags)
(compare:CC (subreg:QI (reg:SI 93 [ D.1753 ]) 0)
(const_int 9 [0x9]))) /tmp/foo.c:3 5 {*cmpqi_1}
 (expr_list:REG_DEAD (reg:SI 93 [ D.1753 ])
(nil)))

movzbl  (%rdi), %eax
subl    $48, %eax
cmpb    $9, %al
setbe   %al
movzbl  %al, %eax
ret

(The patch didn't affect things here.)

FWIW, change the testcase to:

int foo (unsigned char *x)
{
  return (((unsigned int) *x - 48) & 0x1ff) < 10;
}

and we keep the redundant AND, again regardless of whether the patch is
applied.
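
As a sanity check on the claim that the AND is redundant in both test cases, here is a small sketch that compares the 0xff form, the 0x1ff form and the unmasked form for every possible byte value. The function names are made up; the expected result is written with the literal values 48 and 57 to match the constant in the test cases.

```c
#include <assert.h>

/* The form from the first test case: subtract, mask to 8 bits, compare.  */
int
sketch_masked_ff (unsigned char c)
{
  return (((unsigned int) c - 48) & 0xff) < 10;
}

/* The form from the second test case, with a 9-bit mask.  */
int
sketch_masked_1ff (unsigned char c)
{
  return (((unsigned int) c - 48) & 0x1ff) < 10;
}

/* The same comparison with the AND dropped altogether.  */
int
sketch_unmasked (unsigned char c)
{
  return ((unsigned int) c - 48) < 10;
}

/* Return 1 if all three forms agree for every possible byte value.  */
int
sketch_masks_are_redundant (void)
{
  unsigned int v;

  for (v = 0; v < 256; v++)
    {
      unsigned char c = (unsigned char) v;
      int expected = (v >= 48 && v <= 57);

      if (sketch_masked_ff (c) != expected
          || sketch_masked_1ff (c) != expected
          || sketch_unmasked (c) != expected)
        return 0;
    }
  return 1;
}
```

For inputs below 48 the subtraction wraps around, and both masks still leave a value of at least 208, so the comparison against 10 fails in every form.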

>> Well, I think making the simplify-rtx code conditional on the target
>> would be the wrong way to go.  If we really can't live with it being
>> unconditional then I think we should revert it.  But like I say I think
>> it would be better to make combine recognise the redundancy even with
>> the new form.  (Or as I say, longer term, not to rely on combine to
>> eliminate redundant extensions.)  But I don't have time to do that myself...
>
> It helps x86 so we won't revert it.  My fear is that we'll need to add
> code in other places to RISCify back the result of this
> "simplification".

But that's the problem with trying to do the optimisation in this way.
We first simplify a truncation of an SImode addition 

Re: Libbacktrace backtrace_vector_finish

2013-12-05 Thread Ian Lance Taylor
On Thu, Dec 5, 2013 at 7:32 AM, Jakub Jelinek wrote:
>
> I'm trying to understand how the backtrace_vector_* APIs are meant to work
> and used, but at least for alloc.c don't see how it can work properly:
>
> Both backtrace_vector_grow and backtrace_vector_release use
>   base = realloc (vec->base, alc);
> or
>   vec->base = realloc (vec->base, vec->size);
> (note, in the latter case it is even a memory leak if realloc fails),
> but that assumes that that vec->base has been returned by malloc/realloc
> etc.  But,
> void
> backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED,
>  struct backtrace_vector *vec)
> {
>   vec->base = (char *) vec->base + vec->size;
>   vec->size = 0;
> }
> will change vec->base so that it no longer is an address returned by
> malloc/realloc, so next time you call backtrace_vector_grow, if it will
> actually need to reallocate anything, it will crash in realloc or silently
> misbehave.  If this works properly in mmap.c implementation, perhaps
> backtrace_vector_finish in alloc.c should just backtrace_vector_release
> and memset (*vec, 0, sizeof (*vec)); ?

You're quite right.  That was dumb.  Thanks for noticing.  Fixed with
this patch.  Committed to mainline and 4.8 branch.

Ian

2013-12-05  Ian Lance Taylor  

* alloc.c (backtrace_vector_finish): Add error_callback and data
parameters.  Call backtrace_vector_release.  Return address base.
* mmap.c (backtrace_vector_finish): Add error_callback and data
parameters.  Return address base.
* dwarf.c (read_function_info): Get new address base from
backtrace_vector_finish.
* internal.h (backtrace_vector_finish): Update declaration.
Index: dwarf.c
===
--- dwarf.c	(revision 205711)
+++ dwarf.c	(working copy)
@@ -2535,19 +2535,23 @@ read_function_info (struct backtrace_sta
   if (pfvec->count == 0)
 return;
 
-  addrs = (struct function_addrs *) pfvec->vec.base;
   addrs_count = pfvec->count;
 
   if (fvec == NULL)
 {
   if (!backtrace_vector_release (state, &lvec.vec, error_callback, data))
 	return;
+  addrs = (struct function_addrs *) pfvec->vec.base;
 }
   else
 {
   /* Finish this list of addresses, but leave the remaining space in
 	 the vector available for the next function unit.  */
-  backtrace_vector_finish (state, &fvec->vec);
+  addrs = ((struct function_addrs *)
+	   backtrace_vector_finish (state, &fvec->vec,
+	error_callback, data));
+  if (addrs == NULL)
+	return;
   fvec->count = 0;
 }
 
Index: internal.h
===
--- internal.h	(revision 205711)
+++ internal.h	(working copy)
@@ -233,13 +233,17 @@ extern void *backtrace_vector_grow (stru
 struct backtrace_vector *vec);
 
 /* Finish the current allocation on VEC.  Prepare to start a new
-   allocation.  The finished allocation will never be freed.  */
+   allocation.  The finished allocation will never be freed.  Returns
+   a pointer to the base of the finished entries, or NULL on
+   failure.  */
 
-extern void backtrace_vector_finish (struct backtrace_state *state,
- struct backtrace_vector *vec);
+extern void* backtrace_vector_finish (struct backtrace_state *state,
+  struct backtrace_vector *vec,
+  backtrace_error_callback error_callback,
+  void *data);
 
-/* Release any extra space allocated for VEC.  Returns 1 on success, 0
-   on failure.  */
+/* Release any extra space allocated for VEC.  This may change
+   VEC->base.  Returns 1 on success, 0 on failure.  */
 
 extern int backtrace_vector_release (struct backtrace_state *state,
  struct backtrace_vector *vec,
Index: mmap.c
===
--- mmap.c	(revision 205711)
+++ mmap.c	(working copy)
@@ -230,12 +230,19 @@ backtrace_vector_grow (struct backtrace_
 
 /* Finish the current allocation on VEC.  */
 
-void
-backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED,
-			 struct backtrace_vector *vec)
+void *
+backtrace_vector_finish (
+  struct backtrace_state *state ATTRIBUTE_UNUSED,
+  struct backtrace_vector *vec,
+  backtrace_error_callback error_callback ATTRIBUTE_UNUSED,
+  void *data ATTRIBUTE_UNUSED)
 {
+  void *ret;
+
+  ret = vec->base;
   vec->base = (char *) vec->base + vec->size;
   vec->size = 0;
+  return ret;
 }
 
 /* Release any extra space allocated for VEC.  */
Index: alloc.c
===
--- alloc.c	(revision 205711)
+++ alloc.c	(working copy)
@@ -113,12 +113,24 @@ backtrace_vector_grow (struct backtrace_
 
 /* Finish the current allocation on VEC.  */
 
-void
-backtrace_vector_finish (struct backtrace_state *state ATTRIBUTE_UNUSED,
-			 struct backtrace_vector *vec)
+void *
+backtrace_vector_finish (struct backtrace_state *state,
+			 struct ba

Re: Dependency confusion in sched-deps

2013-12-05 Thread shmeel gutl

On 05-Dec-13 02:39 AM, Maxim Kuvyrkov wrote:

Dependency type plays a role for estimating costs and latencies between 
instructions (which affects performance), but using wrong or imprecise 
dependency type does not affect correctness.

On multi-issue architectures it does make a difference. Anti dependence 
permits the two instructions to be issued during the same cycle whereas 
true dependency and output dependency would forbid this.


Or am I misinterpreting your comment?



gcc-4.8-20131205 is now available

2013-12-05 Thread gccadmin
Snapshot gcc-4.8-20131205 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20131205/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 205719

You'll find:

 gcc-4.8-20131205.tar.bz2 Complete GCC

  MD5=c5f3079d76068b3d2a89356c278ef4cd
  SHA1=b5e77ad4395561ac3f13f3a635ed5d704bab3786

Diffs from 4.8-20131128 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: Dependency confusion in sched-deps

2013-12-05 Thread Maxim Kuvyrkov
On 6/12/2013, at 4:25 am, Michael Matz wrote:

> Hi,
> 
> On Thu, 5 Dec 2013, Maxim Kuvyrkov wrote:
> 
>> Output dependency is the right type (write after write).  Anti 
>> dependency is write after read, and true dependency is read after write.
>> 
>> Dependency type plays a role for estimating costs and latencies between 
>> instructions (which affects performance), but using wrong or imprecise 
>> dependency type does not affect correctness.
> 
> In the context of GCC and the middle end's memory model this statement is
> not correct.  For some dependency types we're using type-based aliasing to
> disambiguate, i.e. ignore that dependency, while for others we don't.  In
> particular a read-after-write memory-access dependency can be ignored if 
> type info says they can't alias (because a program where both _would_ 
> access the same memory would be invalid according to our mem model), but 
> for write-after-read or write-after-write we cannot do that disambiguation 
> (because the last write overrides the dynamic type of the memory cell even 
> if it was incompatible with the one before).

Yes, this is correct for dependencies between memory locations in the general 
context of GCC.  [The clarifications below are for the benefit of Paolo and 
anyone else who wants to find out how GCC scheduling works.]

Scheduler dependency analysis is a user of the aforementioned alias analysis 
and it simply won't create a dependency between instructions if alias analysis 
tells it that it is OK to do so.  In the context of the scheduler, the dependencies 
(and their types) are between instructions, not individual registers or memory 
locations.  The mere fact of two instructions having a dependency of any kind 
will make the scheduler produce correct code.  The difference between two 
instructions having true vs anti vs output dependency will manifest itself in 
how close the 2nd instruction will be issued to the 1st one.

Furthermore, when two instructions have dependencies on several items (e.g., 
both on register and on memory location), the resulting dependency type is set 
to the greater of dependency types of all dependent items: true-dependency 
having most weight, followed by anti-dependency, followed by output-dependency.

Consider instructions

[r1] = r2
r1 = [r2]

The scheduler dependency analysis will find an anti-dependency on r1 and 
true-dependency on memory locations (assuming [r1] and [r2] may alias).  The 
resulting dependency between instructions will be true-dependency and the 
instructions will be scheduled several cycles apart.  However, one might argue 
that [r1] and [r2] are unlikely to alias and scheduling these instructions 
back-to-back (downgrading dependency type from true to anti) would produce 
better code on average.  This is one of countless improvements that could be 
made to the GCC scheduler.
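
The rule can be sketched in a few lines of C. This is an illustration only, not sched-deps code: all names are hypothetical, resources are modelled as bits, and memory is treated as a single resource, which corresponds to assuming that [r1] and [r2] may alias.

```c
#include <assert.h>

/* Listed in increasing weight, following the order given above.  */
enum sketch_dep
{
  SKETCH_DEP_NONE,
  SKETCH_DEP_OUTPUT,  /* write after write */
  SKETCH_DEP_ANTI,    /* write after read */
  SKETCH_DEP_TRUE     /* read after write */
};

/* Resources as bits.  Memory is one resource, i.e. everything in
   memory is assumed to alias.  */
#define SKETCH_R1  (1u << 0)
#define SKETCH_R2  (1u << 1)
#define SKETCH_MEM (1u << 2)

struct sketch_insn
{
  unsigned int reads;
  unsigned int writes;
};

/* Classify the dependency of SECOND on FIRST.  The checks run from
   the lightest to the heaviest type, so the last match is the
   greatest one.  */
enum sketch_dep
sketch_classify (struct sketch_insn first, struct sketch_insn second)
{
  enum sketch_dep dep = SKETCH_DEP_NONE;

  if (first.writes & second.writes)
    dep = SKETCH_DEP_OUTPUT;
  if (first.reads & second.writes)
    dep = SKETCH_DEP_ANTI;
  if (first.writes & second.reads)
    dep = SKETCH_DEP_TRUE;
  return dep;
}
```

For the pair above, [r1] = r2 reads r1 and r2 and writes memory, while r1 = [r2] reads r2 and memory and writes r1, so the sketch reports a true dependency. Clearing the memory write of the first instruction models the no-alias case and leaves only the anti dependency on r1.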

--
Maxim Kuvyrkov
www.kugelworks.com




Re: Dependency confusion in sched-deps

2013-12-05 Thread Maxim Kuvyrkov
On 6/12/2013, at 8:44 am, shmeel gutl wrote:

> On 05-Dec-13 02:39 AM, Maxim Kuvyrkov wrote:
>> Dependency type plays a role for estimating costs and latencies between 
>> instructions (which affects performance), but using wrong or imprecise 
>> dependency type does not affect correctness.
> On multi-issue architectures it does make a difference. Anti dependence 
> permits the two instructions to be issued during the same cycle whereas true 
> dependency and output dependency would forbid this.
> 
> Or am I misinterpreting your comment?

On VLIW-flavoured machines without resource conflict checking, yes: it is 
critical not to use an anti dependency where an output or true dependency exists.  
This is only the case, though, because these machines do not follow sequential 
semantics for instruction execution (i.e., effects from previous instructions 
are not necessarily observed by subsequent instructions on the same/close 
cycles).

On machines with internal resource conflict checking having a wrong type on the 
dependency should not cause wrong behavior, but "only" suboptimal performance.

Thank you,

--
Maxim Kuvyrkov
www.kugelworks.com




Re: Hmmm, I think we've seen this problem before (lto build):

2013-12-05 Thread Trevor Saunders
On Mon, Dec 02, 2013 at 12:16:18PM +0100, Richard Biener wrote:
> On Sun, Dec 1, 2013 at 12:30 PM, Toon Moene wrote:
> > http://gcc.gnu.org/ml/gcc-testresults/2013-12/msg1.html
> >
> > FAILED: Bootstrap (build config: lto; languages: fortran; trunk revision
> > 205557) on x86_64-unknown-linux-gnu
> >
> > In function 'release',
> > inlined from 'release' at /home/toon/compilers/gcc/gcc/vec.h:1428:3,
> > inlined from '__base_dtor ' at
> > /home/toon/compilers/gcc/gcc/vec.h:1195:0,
> > inlined from 'compute_antic_aux' at
> > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:2212:0,
> > inlined from 'compute_antic' at
> > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:2493:0,
> > inlined from 'do_pre' at
> > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:4738:23,
> > inlined from 'execute' at
> > /home/toon/compilers/gcc/gcc/tree-ssa-pre.c:4818:0:
> > /home/toon/compilers/gcc/gcc/vec.h:312:3: error: attempt to free a non-heap
> > object 'worklist' [-Werror=free-nonheap-object]
> >::free (v);
> >^
> > lto1: all warnings being treated as errors
> > make[4]: *** [/dev/shm/wd26755/cczzGuTZ.ltrans13.ltrans.o] Error 1
> > make[4]: *** Waiting for unfinished jobs
> > lto-wrapper: make returned 2 exit status
> > /usr/bin/ld: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> 
> Yes, I still see this - likely caused by IPA-CP / partial inlining and a 
> "bogus"
> warning for unreachable code.

I'm really sorry about the long delay here, I took a week off for
Thanksgiving then was pretty busy with other stuff :/

If I remove the now useless worklist.release (); on line 2211 of
tree-ssa-pre.c, lto bootstrap gets past this issue to a stage 2 / 3
comparison failure.  However, doing that also causes these two test
failures in a normal bootstrap / regression test cycle:

Tests that now fail, but worked before:

unix/-m32: 17_intro/headers/c++200x/stdc++.cc (test for excess errors)
unix/-m32: 17_intro/headers/c++200x/stdc++_multiple_inclusion.cc (test
for excess errors)

Both of these failures are because of this ICE:

Executing on host: /tmp/tmp.rsz07gSDni/test-objdir/./gcc/xg++
-shared-libgcc -B/tmp/tmp.rsz07gSDni/test-objdir/./gcc -nostdinc++
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/.libs
-B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/bin/
-B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/lib/
-isystem
/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/include
-isystem
/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/sys-include
-m32
-B/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-fdiagnostics-color=never -D_GLIBCXX_ASSERT -fmessage-length=0
-ffunction-sections -fdata-sections -g -O2 -D_GNU_SOURCE -g -O2
-D_GNU_SOURCE -DLOCALEDIR="." -nostdinc++
-I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/libsupc++
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/include/backward
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/util
/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/17_intro/headers/c++200x/stdc++_multiple_inclusion.cc
-std=gnu++0x -S  -m32 -o stdc++_multiple_inclusion.s (timeout = 600)
spawn /tmp/tmp.rsz07gSDni/test-objdir/./gcc/xg++ -shared-libgcc
-B/tmp/tmp.rsz07gSDni/test-objdir/./gcc -nostdinc++
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-L/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/libsupc++/.libs
-B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/bin/
-B/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/lib/
-isystem
/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/include
-isystem
/tmp/tmp.rsz07gSDni/test-install/x86_64-unknown-linux-gnu/sys-include
-m32
-B/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/src/.libs
-fdiagnostics-color=never -D_GLIBCXX_ASSERT -fmessage-length=0
-ffunction-sections -fdata-sections -g -O2 -D_GNU_SOURCE -g -O2
-D_GNU_SOURCE -DLOCALEDIR="." -nostdinc++
-I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include/x86_64-unknown-linux-gnu
-I/tmp/tmp.rsz07gSDni/test-objdir/x86_64-unknown-linux-gnu/32/libstdc++-v3/include
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/libsupc++
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/include/backward
-I/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/util
/tmp/tmp.rsz07gSDni/libstdc++-v3/testsuite/17_intro/headers/c++200x/stdc++_multiple_inclusion.cc
-std=gnu++0x -S -m32 -o stdc++_multiple_inclusion.s
cc1plus: internal compiler error: Segmentation fault
0xb8745f crash_signal
  ../..