Re: Variable DECL_SIZE

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 7:05 PM, Ilya Enkovich  wrote:
> Hi,
>
> I fall into ssa verification failure when try to pass field's
> DECL_SIZE as an operand for CALL_EXPR. The fail occurs if field's size
> is not a constant. In such case DECL_SIZE holds a VAR_DECL and I need
> to find it's proper SSA_NAME. I thought it may be the default one but
> gimple_default_def returns NULL. What is the right way to obtain
> correct ssa_name in such case?

There is no way.  You have to know that you need it's size at the point of
gimplification.  Later there is no way to recover it.

Richard.

>
> Thanks,
> Ilya


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
> In C++ there is a common idiom called "initialize on first use".  In its
> simplest form it looks like
>
> int& lazy_i()
> {
>   static int i = init;
>   return i;
> }
>
> If the initialization is expensive or order-sensitive, this is a useful
> alternative to initialization on load
> (http://www.parashift.com/c++-faq/static-init-order-on-first-use.html).
>
> An interesting property of such functions is that they only have
> side-effects the first time they are called, so subsequent calls can be
> optimized away to just use the return value of the first call.

If the result is not needed, are we allowed to remove a call to this
function?  Basically, is the function allowed to "detect" if it was
called before?
If we can remove it, thus the function may not exploit knowledge on wheter
it was called before then we can also perform partial redundancy elimination in

 if ()
   lazy_i ();
 lazy_i ();

so I really hope we can also DCE these calls - otherwise their property would
be really weird, as CSE also involves a "DCE" piece:

 i = lazy_i ();
j = lazy_j ();

->
  i = lazy_i ();
  lazy_j ();
  j = i;

->

 i = lazy_i ();
 j = i;

But if we are allowed to DCE a lazy_i call with unused result then it really is
"pure".  Basically with this property we consider the static variable
it writes to
as local, as it cannot be inspected in any other way than by calling the
function or inspecting its return value.

So - what's wrong with using pure?  That it doesn't "feel" correct?
The I suppose
we should simply re-word its definition ;)

> Currently there is no way to express this in GCC so that the optimizers know
> that multiple calls are redundant.  Marking the function as pure causes
> calls to be CSEd as desired, but also asserts that the first call has no
> side-effects, which is not the case.
>
> My implementation of dynamic initialization of TLS variables as mandated by
> the C++11 and OpenMP standards uses this idiom implicitly, so I'd really
> like to be able to optimize away multiple calls.
>
> Right now we have the following ECF flags (among others):
>
> const = no read, no write, no side-effects, cseable
> pure = no write, no side-effects, cseable
> novops = no read, no write

note that I consider novops a red-herring.  novops is basically
'const' volatile, but we don't have  a volatile flag on calls (we
have it, but that's for the store to the result).  I'd rather get rid
of novops ...

> looping_const_or_pure = wait, actually there are side-effects
>
> Seems like the difference between novops and looping_const_or_pure is
> whether calls are subject to CSE.  Is that right?

No, novops and looping_const_or_pure tell you whether they are subject
to DCE, all const/pure functions are subject to CSE, novops functions
are not (as said above, novops is really a volatile const).

> Rather than checking these flag bundles, it seems to me that we ought to
> break them up into
>
> no reads
> no writes
> no side-effects
> cseable side-effects

no side-effects and cseable side-effects doesn't sound like a good
distinction to me - side-effects are not 'cse'-able or they would not
be side-effects.  But I suppose it all boils down to my question above,
where the PRE case is really the most interesting (can the formerly
second call "detect" it used the initialized path or the non-initialized
path?)

> So const  would be noread|nowrite|noside
>pure   would be nowrite|noside
> const|looping would be noread|nowrite|cseside
> pure|looping  would be nowrite|cseside
>novops would be noread|nowrite
>
> and the behavior I want would be just cseside.
>
> Does this make sense?  Anyone more familiar with this area of the code want
> to implement it for me?  :}

I'd say it's already implemented, just poorly documented.  Use pure!

Richard.

> Jason


[AVR] Missing avr51-flash1.x in avr target specific tests?

2012-10-04 Thread Senthil Kumar Selvaraj
Some tests in gcc/testsuite/gcc.target/avr/torture (builtins-2.c, for
e.g.) have -Tavr51-flash1.x specified in dg-options. The tests currently
fail with an unable to open linker script error for that file.

Is that linker script supposed to be checked into source control? Or am
I missing some configure/build option?

Regards
Senthil


Re: reverse bitfield patch

2012-10-04 Thread Richard Guenther
On Wed, Oct 3, 2012 at 12:07 AM, DJ Delorie  wrote:
>
> Here's my current patch for the bitfield reversal feature I've been
> working on for a while, with an RX-specific pragma to apply it
> "globally".  Could someone please review this?  It would be nice
> to get it in before stage1 closes again...

ChangeLog missing, new functions need a toplevel comment documenting
function, argument and return value as per coding conventions.

Richard.

>
> Index: gcc/doc/extend.texi
> ===
> --- gcc/doc/extend.texi (revision 192009)
> +++ gcc/doc/extend.texi (working copy)
> @@ -5427,12 +5427,74 @@ Note that the type visibility is applied
>  associated with the class (vtable, typeinfo node, etc.).  In
>  particular, if a class is thrown as an exception in one shared object
>  and caught in another, the class must have default visibility.
>  Otherwise the two shared objects will be unable to use the same
>  typeinfo node and exception handling will break.
>
> +@item bit_order
> +Normally, GCC allocates bitfields from either the least significant or
> +most significant bit in the underlying type, such that bitfields
> +happen to be allocated from lowest address to highest address.
> +Specifically, big-endian targets allocate the MSB first, where
> +little-endian targets allocate the LSB first.  The @code{bit_order}
> +attribute overrides this default, allowing you to force allocation to
> +be MSB-first, LSB-first, or the opposite of whatever gcc defaults to.  The
> +@code{bit_order} attribute takes an optional argument:
> +
> +@table @code
> +
> +@item native
> +This is the default, and also the mode when no argument is given.  GCC
> +allocates LSB-first on little endian targets, and MSB-first on big
> +endian targets.
> +
> +@item swapped
> +Bitfield allocation is the opposite of @code{native}.
> +
> +@item lsb
> +Bits are allocated LSB-first.
> +
> +@item msb
> +Bits are allocated MSB-first.
> +
> +@end table
> +
> +A short example demonstrates bitfield allocation:
> +
> +@example
> +struct __attribute__((bit_order(msb))) @{
> +  char a:3;
> +  char b:3;
> +@} foo = @{ 3, 5 @};
> +@end example
> +
> +With LSB-first allocation, @code{foo.a} will be in the 3 least
> +significant bits (mask 0x07) and @code{foo.b} will be in the next 3
> +bits (mask 0x38).  With MSB-first allocation, @code{foo.a} will be in
> +the 3 most significant bits (mask 0xE0) and @code{foo.b} will be in
> +the next 3 bits (mask 0x1C).
> +
> +Note that it is entirely up to the programmer to define bitfields that
> +make sense when swapped.  Consider:
> +
> +@example
> +struct __attribute__((bit_order(msb))) @{
> +  short a:7;
> +  char b:6;
> +@} foo = @{ 3, 5 @};
> +@end example
> +
> +On some targets, or if the struct is @code{packed}, GCC may only use
> +one byte of storage for A despite it being a @code{short} type.
> +Swapping the bit order of A would cause it to overlap B.  Worse, the
> +bitfield for B may span bytes, so ``swapping'' would no longer be
> +defined as there is no ``char'' to swap within.  To avoid such
> +problems, the programmer should either fully-define each underlying
> +type, or ensure that their target's ABI allocates enough space for
> +each underlying type regardless of how much of it is used.
> +
>  @end table
>
>  To specify multiple attributes, separate them by commas within the
>  double parentheses: for example, @samp{__attribute__ ((aligned (16),
>  packed))}.
>
> Index: gcc/c-family/c-common.c
> ===
> --- gcc/c-family/c-common.c (revision 192009)
> +++ gcc/c-family/c-common.c (working copy)
> @@ -310,12 +310,13 @@ struct visibility_flags visibility_optio
>
>  static tree c_fully_fold_internal (tree expr, bool, bool *, bool *);
>  static tree check_case_value (tree);
>  static bool check_case_bounds (tree, tree, tree *, tree *);
>
>  static tree handle_packed_attribute (tree *, tree, tree, int, bool *);
> +static tree handle_bitorder_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_nocommon_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_common_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_noreturn_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_hot_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_cold_attribute (tree *, tree, tree, int, bool *);
>  static tree handle_noinline_attribute (tree *, tree, tree, int, bool *);
> @@ -601,12 +602,14 @@ const unsigned int num_c_common_reswords
>  const struct attribute_spec c_common_attribute_table[] =
>  {
>/* { name, min_len, max_len, decl_req, type_req, fn_type_req, handler,
> affects_type_identity } */
>{ "packed", 0, 0, false, false, false,
>   handle_packed_attribute , false},
> +  { "bit_order",  0, 1, false, true, false,
> + handle_bitorder_a

Re: inlined memcpy/memset degradation in gcc 4.6 or later

2012-10-04 Thread Richard Guenther
On Tue, Oct 2, 2012 at 4:19 PM, Walter Lee  wrote:
>
> On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in
> gcc 4.6 and later versus gcc 4.4.  Though I find the problem on
> TILE-Gx, I think this is a problem for any architectures with
> SLOW_UNALIGNED_ACCESS set to 1.
>
> Consider the following program:
>
> struct foo {
>   int x;
> };
>
> void copy(struct foo* f0, struct foo* f1)
> {
>   memcpy (f0, f1, sizeof(struct foo));
> }
>
> In gcc 4.4, I get the desired inline memcpy:
>
> copy:
> ld4sr1, r1
> st4 r0, r1
> jrp lr
>
> In gcc 4.7, however, I get inlined byte-by-byte copies:
>
> copy:
> ld1u_add r10, r1, 1
> st1_add  r0, r10, 1
> ld1u_add r10, r1, 1
> st1_add  r0, r10, 1
> ld1u_add r10, r1, 1
> st1_add  r0, r10, 1
> ld1u r10, r1
> st1  r0, r10
> jrp  lr
>
> The inlining of memcpy is done in expand_builtin_memcpy in builtins.c.
> Tracing through that, I see that the alignment of src_align and
> dest_align, which is computed by get_pointer_alignment, has degraded:
> in gcc 4.4 they are 32 bits, but in gcc 4.7 they are 8 bits.  This
> causes the loads generated by the inlined memcopy to be per-byte
> instead of per-4-byte.
>
> Looking further, gcc 4.7 uses the "align" field in "struct
> ptr_info_def" to compute the alignment.  This field appears to be
> initialized in get_ptr_info in tree-ssanames.c but it is always
> initialized to 1 byte and does not appear to change.  gcc 4.4 computes
> its alignment information differently.
>
> I get the same byte-copies with gcc 4.8 and gcc 4.6.
>
> I see a couple related open PRs: 50417, 53535, but no suggested fixes
> for them yet.  Can anyone advise on how this can be fixed?  Should I
> file a new bug, or add this info to one of the existing PRs?

There is no way to fix it.  memcpy does not require aligned arguments
and the merely presence of a typed pointer contains zero information
of alignment for the middle-end.  If you want to excercise C's notion
of alignemnt requirements then do not use memcpy but

 *f0 = *f1;

which works equally well.  ISTR Tom had various patches improving
alignment info for pointers but we cannot improve this degenerate
case.  Btw, the new beavior even fixed bugs.

Richard.

> Thanks,
>
> Walter
>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:
> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
> If the result is not needed, are we allowed to remove a call to this
> function?

No.  Unless you know the same function has been already called.

> So - what's wrong with using pure?  That it doesn't "feel" correct?

No, it can modify memory.
Consider:

a.h:
extern int g;
struct S
{
  S () { __atomic_add_fetch (&g, 1, __ATOMIC_RELAXED); s = 7; }
  int s;
};
extern thread_local S s;

liba.C:
#include "a.h"
thread_local S s;

libb.C:
#include "a.h"

int g;

void
foo (S *&p, S *&q, int &a, int &b)
{
  a = g;
  p = &s; // this runs S::S() in the current thread
  b = g;
  q = &s;
}

I think the plan was for these functions not to return any value,
so that they could be aliases for all thread_local vars defined in the same
TU.  Then you really can't DCE the first call, you can the second call.
If it returns void, CSEable is probably a bad word.
Even if it returns value and the value is unused, it still needs to be
called the first time in a function, and can modify anything the ctor(s)
can modify.

Jakub


[LRA] liveness update not done after elimination??

2012-10-04 Thread Steven Bosscher
Hello Vlad,

Is it intentional that DF_LR_IN and DF_LR_OUT are not updated after
"Updating elimination of equiv for reg..."? I have some checking in
place in process_bb_lives at the end of the function, and it triggers
on the test case. (Checking code and test case is at the end of this
e-mail.) It show:

LIVEIN-mismatch in bb 5:
66, 69,
66, 69, 73
73
t.c: In function 'get_mem_attrs':
t.c:47:1: internal compiler error: in process_bb_lives, at lra-lives.c:866
 }
 ^

And that's correct if you look at the LR_IN set of basic block 5 in
the IRA  (.193r.ira) dump:

;; basic block 2, loop depth 0, count 0, freq 1, maybe hot
...
9 r73:SI=[argp:DI]
  REG_EQUIV: [argp:DI]
...
;; lr  out   6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73
...
;; basic block 5, loop depth 0, count 0, freq 5014, maybe hot
;;  prev block 4, next block 6, flags: (RTL)
;;  pred:   4
;; bb 5 artificial_defs: { }
;; bb 5 artificial_uses: { u22(6){ }u23(7){ }u24(16){ }u25(20){ }}
;; lr  in6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73
;; lr  use   6 [bp] 7 [sp] 16 [argp] 20 [frame] 69 73
;; lr  def   17 [flags] 74 76
;; live  in  6 [bp] 7 [sp] 16 [argp] 20 [frame] 66 69 73
;; live  gen 17 [flags] 74 76
;; live  kill
   23 NOTE_INSN_BASIC_BLOCK
   24 r74:DI=sign_extend(r73:SI)
  REG_DEAD: r73:SI
   26 r76:DI=zero_extend([r74:DI+`mode_size'])
  REG_DEAD: r74:DI
   27 flags:CCZ=cmp(r76:DI,[r69:DI])
  REG_DEAD: r76:DI
  REG_DEAD: r69:DI
   28 pc={(flags:CCZ==0)?L57:pc}
  REG_DEAD: flags:CCZ
  REG_BR_PROB: 0x19e
;;  succ:   8
;;  6
;; lr  out   6 [bp] 7 [sp] 16 [argp] 20 [frame] 66
;; live  out 6 [bp] 7 [sp] 16 [argp] 20 [frame] 66


But that insn setting reg 73 has been eliminated according to the LRA
(.194r.reload) dump:

Updating elimination of equiv for reg 73
...
  Removing equiv init insn 9 (freq=1)
9 r73:SI=[argp:DI+0x30]
  REG_EQUIV: [argp:DI]
deleting insn with uid = 9.
 Choosing alt 0 in insn 15:  (0) r
 Choosing alt 0 in insn 18:  (0) r
Changing pseudo 73 in operand 0 of insn 21 on equiv [argp:DI+0x30]
 Choosing alt 0 in insn 21:  (0) rm  (1) re
Changing pseudo 73 in operand 1 of insn 24 on equiv [argp:DI+0x30]
 Choosing alt 1 in insn 24:  (0) r  (1) rm
 Choosing alt 0 in insn 26:  (0) =r  (1) qm
 Choosing alt 1 in insn 27:  (0) r  (1) rm
 Choosing alt 1 in insn 31:  (0) m  (1) re

You do update the DF_LR sets elsewhere in LRA, so I was wondering if
it is intentional or not to leave the DF_LR sets unchanged after
elimination.

Ciao!
Steven



-- 8< --

   EXECUTE_IF_SET_IN_SPARSESET (pseudos_live, i)
mark_pseudo_dead (i);
+{
+  bitmap b = BITMAP_ALLOC (®_obstack);
+  bitmap_copy (b, DF_LR_IN (bb));
+  bitmap_clear_range (b, 0, FIRST_PSEUDO_REGISTER);
+  EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb), FIRST_PSEUDO_REGISTER, j, bi)
+if (sparseset_bit_p (pseudos_live, j))
+  bitmap_clear_bit (b, j);
+  if (! bitmap_empty_p (b))
+{
+  fprintf (stderr, "LIVEIN-mismatch in bb %d:\n", bb->index);
+  EXECUTE_IF_SET_IN_BITMAP (DF_LR_IN (bb),
FIRST_PSEUDO_REGISTER, j, bi)
+if (sparseset_bit_p (pseudos_live, j))
+  fprintf (stderr, "%u, ", j);
+  fprintf (stderr, "\n");
+  bitmap_clear_range (DF_LR_IN (bb), 0, FIRST_PSEUDO_REGISTER);
+  dump_bitmap (stderr, DF_LR_IN (bb));
+  dump_bitmap (stderr, b);
+  gcc_unreachable ();
+}
+  BITMAP_FREE (b);
+}
-- 8< --

-- 8< --
typedef long unsigned int size_t;

union tree_node;
typedef union tree_node *tree;
struct rtx_def;
typedef struct rtx_def *rtx;

struct rtx_def {
  union u {
long hwint[1];
  } u;
};

typedef struct mem_attrs
{
  tree expr;
  rtx offset;
  rtx size;
  int alias;
  unsigned int align;
} mem_attrs;

extern void *ggc_alloc_stat (size_t);

typedef struct htab *htab_t;
extern void ** htab_find_slot (htab_t, const void *, int);
static htab_t mem_attrs_htab;

extern unsigned char mode_size[256];

mem_attrs *get_mem_attrs (int, tree, rtx, rtx, unsigned int,
  unsigned char, int);

mem_attrs *
get_mem_attrs (int alias, tree expr, rtx offset, rtx size,
   unsigned int align, unsigned char addrspace, int mode)
{
  mem_attrs attrs;
  void **slot;
  if (alias == 0
  && (size == 0
  || (mode != 1 && mode_size[mode] == size->u.hwint[0])))
return 0;
  attrs.alias = alias;
  slot = htab_find_slot (mem_attrs_htab, &attrs, 1);
  return (mem_attrs *) *slot;
}
-- 8< --


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 08:45 AM, Jakub Jelinek wrote:

On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:

On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
If the result is not needed, are we allowed to remove a call to this
function?


No.  Unless you know the same function has been already called.


Yes, we are.  If we optimize away a use of the variable, we can also 
remove the call to the function.  We can also hoist the call so its 
side-effects occur before other side-effects, as the standard only 
requires that the initialization of the variable occur some time between 
thread creation and the first use of the variable.



So - what's wrong with using pure?  That it doesn't "feel" correct?


No, it can modify memory.


Right, that's the issue.  The back end assumes that calls to pure 
functions won't clobber memory; calls to these functions can clobber 
arbitrary memory, but only on the first call.



I think the plan was for these functions not to return any value,


No, I'm talking about the wrapper function which returns a reference to 
the variable (as in my example).


Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 08:56:03AM -0400, Jason Merrill wrote:
> >I think the plan was for these functions not to return any value,
> 
> No, I'm talking about the wrapper function which returns a reference
> to the variable (as in my example).

Sure, but I thought you want to inline the wrapper function as soon as
possible.  Or do you want to keep it as some kind of builtin that gets
expanded during expansion to the test and call?

Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 2:56 PM, Jason Merrill  wrote:
> On 10/04/2012 08:45 AM, Jakub Jelinek wrote:
>>
>> On Thu, Oct 04, 2012 at 10:42:59AM +0200, Richard Guenther wrote:
>>>
>>> On Wed, Oct 3, 2012 at 9:00 PM, Jason Merrill  wrote:
>>> If the result is not needed, are we allowed to remove a call to this
>>> function?
>>
>>
>> No.  Unless you know the same function has been already called.
>
>
> Yes, we are.  If we optimize away a use of the variable, we can also remove
> the call to the function.  We can also hoist the call so its side-effects
> occur before other side-effects, as the standard only requires that the
> initialization of the variable occur some time between thread creation and
> the first use of the variable.
>
>
>>> So - what's wrong with using pure?  That it doesn't "feel" correct?
>>
>>
>> No, it can modify memory.
>
>
> Right, that's the issue.  The back end assumes that calls to pure functions
> won't clobber memory; calls to these functions can clobber arbitrary memory,
> but only on the first call.

Ugh.  Especially with the above (you can DCE those calls) makes this
severly mis-specified ... and any implementation error-prone (look what
mess our losely defined 'malloc' attribute opened ...).

I thought of a testcase like

 int *p = get_me ();
 .. = *p;
 int *q = get_me ();
 .. = *q;

and get_me allocating/initalizing and returning a singleton.

But you tell me it's more complicated and get_me () needs to
be a barrier for any load/store (as it may modify arbitrary memory,
but only on the "first" call).

I think that "may modify arbitrary memory" isn't going to fly and
my answer would be, better don't try to optimize anything here,
at least not in generic terms.  How would you handle this in
the alias oracle?  How would you then make CSE recognize
two functions return the same value and are CSEable?

>> I think the plan was for these functions not to return any value,
>
>
> No, I'm talking about the wrapper function which returns a reference to the
> variable (as in my example).

Can you come up with a short but complete testcase illustrating the issue
better (preferably C, so I don't need to read-in magic points where construction
happens)?

Thanks,
Richard.

> Jason
>


IRA subregs playing badly with REG_EQUIV (Was: Reload reloading outdated values ?)

2012-10-04 Thread Frederic Riss
Thanks to the help of segher and iant on IRC (thanks again!), I
narrowed my problem down to something I can fully understand and
explain (well I hope).

There is a bad interaction between the IRA handling of subregs and
reload's use of REG_EQUIV annotations. Each one seems to be in its
right to do what it does, but the end result is wrong generated code.
Here's what happens, using some simplified RTL:

Before IRA:

(insn 1 [(set (subreg:SI (reg:DI 6835) 0)
 (set (subreg:SI (reg:DI 6835) 4)])
(insn 2 (set (reg:SI 229) (mem:SI (subreg:SI (reg:DI 6835) 0)))
(REG_EQUIV (mem:SI (subreg:SI (reg:DI 6835) 0
(insn 3 (set (reg:SI 3346) (...)))
(insn 4 (use (reg:SI 3346))
(REG_DEAD (reg:SI 3346)))
(insn 5 (use (reg:SI 229)))
(insn 6 (use (subreg:SI (reg:DI 6835) 4))
(REG_DEAD (reg:DI 6835)))

IRA does the following allocation:
  pseudo reg 6835 gets hard reg 8
  pseudo reg 3346 gets hard reg 8
  pseudo reg 229 gets spilled

IRA is in its right to allocate hard reg 8 for both reg 3346 and 6835,
because it tracks conflicts at the word level, and SI reg 3346 only
conflicts with the high part of DI reg 6835. Had reg 229 been
allocated to a hard register, the code would be correct.

In my example though, the register pressure is high and reg 229 gets
no hard register. Reload then chooses to use the equivalent memory
location for this register. We end up with a reload for (insn 5) that
does:

(insn 6061 (set (reg:SI 44) (mem:SI (reg:SI 8)))
(insn 5 (use (reg:SI 44)))

The expression used in (insn 6061) for the reload value is the
allocated version of the REG_EQUIV expression, but at the point it
gets inserted, r8 has been clobbered by (insn 4).

To try to sum it up, the use of the memory equivalence by reload
extends the liveness of one half the reg 6835, thus generating code
that isn't wrong because the interference graph computed by IRA isn't
right anymore.

I'm not sure what the right fix is:
 1/ Denying REG_EQUIVs which contain subregs
 2/ Tightening the interference graph computed by IRA
 3/ Some hack in reload to prevent this equivalence to be used in this case

1/ seems simple but I'm not sure about its consequences. 2/ would
incur a degradation in register allocation quality. I don't know about
3/. What do people think?


Re: [AVR] Missing avr51-flash1.x in avr target specific tests?

2012-10-04 Thread Georg-Johann Lay
Senthil Kumar Selvaraj wrote:
> Some tests in gcc/testsuite/gcc.target/avr/torture (builtins-2.c, for
> e.g.) have -Tavr51-flash1.x specified in dg-options. The tests currently
> fail with an unable to open linker script error for that file.
> 
> Is that linker script supposed to be checked into source control? Or am
> I missing some configure/build option?

I found no way to specify a linker script for one single test and supply that
linker script in the testsuite source tree.


It's likely that I used the wrong approach just because my limited knowledge on
DejaGNU and GCC testsuite framework.

The linker script must be located in a place where the compiler will search for
it, e.g. some directory that contains the .x and that you added per -L to the
ldflags in your DejaGNU board description.

Johann



Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 09:07 AM, Richard Guenther wrote:

Ugh.  Especially with the above (you can DCE those calls) makes this
severly mis-specified ... and any implementation error-prone (look what
mess our losely defined 'malloc' attribute opened ...).

I thought of a testcase like

  int *p = get_me ();
  .. = *p;
  int *q = get_me ();
  .. = *q;

and get_me allocating/initalizing and returning a singleton.


Right.


But you tell me it's more complicated and get_me () needs to
be a barrier for any load/store (as it may modify arbitrary memory,
but only on the "first" call).


Yes, because the initialization is user-written code.


I think that "may modify arbitrary memory" isn't going to fly and
my answer would be, better don't try to optimize anything here,
at least not in generic terms.  How would you handle this in
the alias oracle?  How would you then make CSE recognize
two functions return the same value and are CSEable?


For aliasing purposes, the call is like a call to a normal function. 
For CSE purposes, we want to recognize identical calls and combine them. 
 I don't know the GCC bits well enough to be any more specific.



Can you come up with a short but complete testcase illustrating the issue
better (preferably C, so I don't need to read-in magic points where construction
happens)?


int init_count;
int data;

void init()
{
  static int initialized;
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int sink;

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
  int *q = get_me();
  if (init_count != 1)
__builtin_abort();
  return *p + *q;
}

On this testcase, gcc -O2 doesn't reload init_count after the call to 
get_me because it thinks that the call can't have modified init_count. 
I want the compiler to know that it is safe to discard the redundant 
assignment, but not make assumptions about memory.


On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at 
08:56:03AM -0400, Jason Merrill wrote:

> Sure, but I thought you want to inline the wrapper function as soon as
> possible.  Or do you want to keep it as some kind of builtin that gets
> expanded during expansion to the test and call?

Ah, I see your point; if get_me is inlined we end up with two calls to 
init, so it would be good to mark init with the same hypothetical attribute.


Jason



Re: IRA subregs playing badly with REG_EQUIV (Was: Reload reloading outdated values ?)

2012-10-04 Thread Ian Lance Taylor
On Thu, Oct 4, 2012 at 6:40 AM, Frederic Riss  wrote:
> Thanks to the help of segher and iant on IRC (thanks again!), I
> narrowed my problem down to something I can fully understand and
> explain (well I hope).
>
> There is a bad interaction between the IRA handling of subregs and
> reload's use of REG_EQUIV annotations. Each one seems to be in its
> right to do what it does, but the end result is wrong generated code.
> Here's what happens, using some simplified RTL:
>
> Before IRA:
>
> (insn 1 [(set (subreg:SI (reg:DI 6835) 0)
>  (set (subreg:SI (reg:DI 6835) 4)])
> (insn 2 (set (reg:SI 229) (mem:SI (subreg:SI (reg:DI 6835) 0)))
> (REG_EQUIV (mem:SI (subreg:SI (reg:DI 6835) 0
> (insn 3 (set (reg:SI 3346) (...)))
> (insn 4 (use (reg:SI 3346))
> (REG_DEAD (reg:SI 3346)))
> (insn 5 (use (reg:SI 229)))
> (insn 6 (use (subreg:SI (reg:DI 6835) 4))
> (REG_DEAD (reg:DI 6835)))
>
> IRA does the following allocation:
>   pseudo reg 6835 gets hard reg 8
>   pseudo reg 3346 gets hard reg 8
>   pseudo reg 229 gets spilled
>
> IRA is in its right to allocate hard reg 8 for both reg 3346 and 6835,
> because it tracks conflicts at the word level, and SI reg 3346 only
> conflicts with the high part of DI reg 6835. Had reg 229 been
> allocated to a hard register, the code would be correct.
>
> In my example though, the register pressure is high and reg 229 gets
> no hard register. Reload then chooses to use the equivalent memory
> location for this register. We end up with a reload for (insn 5) that
> does:
>
> (insn 6061 (set (reg:SI 44) (mem:SI (reg:SI 8)))
> (insn 5 (use (reg:SI 44)))
>
> The expression used in (insn 6061) for the reload value is the
> allocated version of the REG_EQUIV expression, but at the point it
> gets inserted, r8 has been clobbered by (insn 4).
>
> To try to sum it up, the use of the memory equivalence by reload
> extends the liveness of one half the reg 6835, thus generating code
> that isn't wrong because the interference graph computed by IRA isn't
> right anymore.
>
> I'm not sure what the right fix is:
>  1/ Denying REG_EQUIVs which contain subregs
>  2/ Tightening the interference graph computed by IRA
>  3/ Some hack in reload to prevent this equivalence to be used in this case
>
> 1/ seems simple but I'm not sure about its consequences. 2/ would
> incur a degradation in register allocation quality. I don't know about
> 3/. What do people think?

I have not really looked at how IRA works.  But I think that it must
track subreg life and death somehow.  So another approach would be for
validate_equiv_mem in ira.c to observe when a memory address uses a
subreg that has died.  It already checks for full registers that have
died.  That code could be extended to handle subregs.

Ian


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill  wrote:
> On 10/04/2012 09:07 AM, Richard Guenther wrote:
>>
>> Ugh.  Especially with the above (you can DCE those calls) makes this
>> severly mis-specified ... and any implementation error-prone (look what
>> mess our losely defined 'malloc' attribute opened ...).
>>
>> I thought of a testcase like
>>
>>   int *p = get_me ();
>>   .. = *p;
>>   int *q = get_me ();
>>   .. = *q;
>>
>> and get_me allocating/initalizing and returning a singleton.
>
>
> Right.
>
>
>> But you tell me it's more complicated and get_me () needs to
>> be a barrier for any load/store (as it may modify arbitrary memory,
>> but only on the "first" call).
>
>
> Yes, because the initialization is user-written code.
>
>
>> I think that "may modify arbitrary memory" isn't going to fly and
>> my answer would be, better don't try to optimize anything here,
>> at least not in generic terms.  How would you handle this in
>> the alias oracle?  How would you then make CSE recognize
>> two functions return the same value and are CSEable?
>
>
> For aliasing purposes, the call is like a call to a normal function. For CSE
> purposes, we want to recognize identical calls and combine them.  I don't
> know the GCC bits well enough to be any more specific.
>
>
>> Can you come up with a short but complete testcase illustrating the issue
>> better (preferably C, so I don't need to read-in magic points where
>> construction
>> happens)?
>
>
> int init_count;
> int data;
>
> void init()
> {
>   static int initialized;
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
>
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
>
> int sink;
>
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   int *q = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   return *p + *q;
> }
>
> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me
> because it thinks that the call can't have modified init_count. I want the
> compiler to know that it is safe to discard the redundant assignment, but
> not make assumptions about memory.

But isn't it a fact that it _cannot_ modify init_count?  If the second call
is CSEable then it cannot have side-effects that are observable at
the call site.  Is the following an example you would consider to fall
under your CSEing?

int init_count;
int data;
int initialized;

void init()
{
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int sink;

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
  initialized = 0;
  int *q = get_me();
  if (init_count != 2)
__builtin_abort();
  return *p + *q;
}

?  If so, then why can we assume get_me returns the same pointer even here?

Richard.

>
> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at
> 08:56:03AM -0400, Jason Merrill wrote:
>> Sure, but I thought you want to inline the wrapper function as soon as
>> possible.  Or do you want to keep it as some kind of builtin that gets
>> expanded during expansion to the test and call?
>
> Ah, I see your point; if get_me is inlined we end up with two calls to init,
> so it would be good to mark init with the same hypothetical attribute.
>
> Jason
>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:08 PM, Richard Guenther
 wrote:
> On Thu, Oct 4, 2012 at 5:22 PM, Jason Merrill  wrote:
>> On 10/04/2012 09:07 AM, Richard Guenther wrote:
>>>
>>> Ugh.  Especially with the above (you can DCE those calls) makes this
>>> severly mis-specified ... and any implementation error-prone (look what
>>> mess our losely defined 'malloc' attribute opened ...).
>>>
>>> I thought of a testcase like
>>>
>>>   int *p = get_me ();
>>>   .. = *p;
>>>   int *q = get_me ();
>>>   .. = *q;
>>>
>>> and get_me allocating/initalizing and returning a singleton.
>>
>>
>> Right.
>>
>>
>>> But you tell me it's more complicated and get_me () needs to
>>> be a barrier for any load/store (as it may modify arbitrary memory,
>>> but only on the "first" call).
>>
>>
>> Yes, because the initialization is user-written code.
>>
>>
>>> I think that "may modify arbitrary memory" isn't going to fly and
>>> my answer would be, better don't try to optimize anything here,
>>> at least not in generic terms.  How would you handle this in
>>> the alias oracle?  How would you then make CSE recognize
>>> two functions return the same value and are CSEable?
>>
>>
>> For aliasing purposes, the call is like a call to a normal function. For CSE
>> purposes, we want to recognize identical calls and combine them.  I don't
>> know the GCC bits well enough to be any more specific.
>>
>>
>>> Can you come up with a short but complete testcase illustrating the issue
>>> better (preferably C, so I don't need to read-in magic points where
>>> construction
>>> happens)?
>>
>>
>> int init_count;
>> int data;
>>
>> void init()
>> {
>>   static int initialized;
>>   if (!initialized)
>> {
>>   data = ++init_count;
>>   initialized = 1;
>> }
>> }
>>
>> inline int *get_me() __attribute ((pure));
>> inline int *get_me()
>> {
>>   init ();
>>   return &data;
>> }
>>
>> int sink;
>>
>> int main()
>> {
>>   sink = init_count;
>>   int *p = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   int *q = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   return *p + *q;
>> }
>>
>> On this testcase, gcc -O2 doesn't reload init_count after the call to get_me
>> because it thinks that the call can't have modified init_count. I want the
>> compiler to know that it is safe to discard the redundant assignment, but
>> not make assumptions about memory.
>
> But isn't it a fact that it _cannot_ modify init_count?  If the second call
> is CSEable then it cannot have side-effects that are observable at
> the call site.  Is the following an example you would consider to fall
> under your CSEing?
>
> int init_count;
> int data;
> int initialized;
>
> void init()
> {
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
>
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
>
> int sink;
>
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   initialized = 0;
>   int *q = get_me();
>   if (init_count != 2)
> __builtin_abort();
>   return *p + *q;
> }
>
> ?  If so, then why can we assume get_me returns the same pointer even here?

Or similar:

int main()
{
  sink = init_count;
  int *p = get_me();
  if (init_count != 1)
__builtin_abort();
}

is this required to not abort?  p is unused and with 'pure' you'd DCE the call.

That is, I am confused about the distinction you seem to make between
the static variable 'initialized' and the global 'init_count'.  You seem to
imply that the attribute would mean that we can CSE side-effects to
global memory but that those side-effects may be value-changing!

In practice for any CSE implementation this hypotetical "cse-side-effects"
attribute would only allow us to CSE if there are no intermediate
side-effects between two such calls, and as soon as the second get_me
call would be CSEd we'd also CSE the init_count value (which you
didn't want to CSE?!)

Still confused ;)

Richard.

> Richard.
>
>>
>> On 10/04/2012 08:59 AM, Jakub Jelinek wrote:> On Thu, Oct 04, 2012 at
>> 08:56:03AM -0400, Jason Merrill wrote:
>>> Sure, but I thought you want to inline the wrapper function as soon as
>>> possible.  Or do you want to keep it as some kind of builtin that gets
>>> expanded during expansion to the test and call?
>>
>> Ah, I see your point; if get_me is inlined we end up with two calls to init,
>> so it would be good to mark init with the same hypothetical attribute.
>>
>> Jason
>>


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jakub Jelinek
On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote:
> But isn't it a fact that it _cannot_ modify init_count?  If the second call
> is CSEable then it cannot have side-effects that are observable at
> the call site.  Is the following an example you would consider to fall
> under your CSEing?
> 
> int init_count;
> int data;
> int initialized;
> 
> void init()
> {
>   if (!initialized)
> {
>   data = ++init_count;
>   initialized = 1;
> }
> }
> 
> inline int *get_me() __attribute ((pure));
> inline int *get_me()
> {
>   init ();
>   return &data;
> }
> 
> int sink;
> 
> int main()
> {
>   sink = init_count;
>   int *p = get_me();
>   if (init_count != 1)
> __builtin_abort();
>   initialized = 0;
>   int *q = get_me();
>   if (init_count != 2)
> __builtin_abort();
>   return *p + *q;
> }

The above isn't a singleton function, as you are clearing initialized,
therefore it doesn't have the properties the thread_local var get_address
wrapper has.  The singleton function really is
void singleton (void)
{
  static __thread bool initialized;
  if (!initialized) {
initialized = true;
call_some_function_that_may_modify_memory ();
  }
}
and has side effects just first time in a thread.

Jakub


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 01:15 PM, Richard Guenther wrote:

That is, I am confused about the distinction you seem to make between
the static variable 'initialized' and the global 'init_count'.


The distinction is that "initialized" is an implementation detail that 
once set is never cleared; it is the mechanism by which only the first 
call has side effects.


init_count is a normal user variable.

Jason



Re: Functions that are CSEable but not pure

2012-10-04 Thread Richard Guenther
On Thu, Oct 4, 2012 at 7:15 PM, Jakub Jelinek  wrote:
> On Thu, Oct 04, 2012 at 07:08:02PM +0200, Richard Guenther wrote:
>> But isn't it a fact that it _cannot_ modify init_count?  If the second call
>> is CSEable then it cannot have side-effects that are observable at
>> the call site.  Is the following an example you would consider to fall
>> under your CSEing?
>>
>> int init_count;
>> int data;
>> int initialized;
>>
>> void init()
>> {
>>   if (!initialized)
>> {
>>   data = ++init_count;
>>   initialized = 1;
>> }
>> }
>>
>> inline int *get_me() __attribute ((pure));
>> inline int *get_me()
>> {
>>   init ();
>>   return &data;
>> }
>>
>> int sink;
>>
>> int main()
>> {
>>   sink = init_count;
>>   int *p = get_me();
>>   if (init_count != 1)
>> __builtin_abort();
>>   initialized = 0;
>>   int *q = get_me();
>>   if (init_count != 2)
>> __builtin_abort();
>>   return *p + *q;
>> }
>
> The above isn't a singleton function, as you are clearing initialized,
> therefore it doesn't have the properties the thread_local var get_address
> wrapper has.  The singleton function really is
> void singleton (void)
> {
>   static __thread bool initialized;
>   if (!initialized) {
> initialized = true;
> call_some_function_that_may_modify_memory ();
>   }
> }
> and has side effects just first time in a thread.

So I suppose the testcase that would be "valid" but break with using
pure would be instead

int init_count;
int data;

void init()
{
  static int initialized;
  if (!initialized)
{
  data = ++init_count;
  initialized = 1;
}
}

inline int *get_me() __attribute ((pure));
inline int *get_me()
{
  init ();
  return &data;
}

int main()
{
  int x = init_count;
  int *p = get_me();
  if (init_count == x)
__builtin_abort();
  int *q = get_me();
  if (init_count == x)
__builtin_abort();
}

here when get_me is pure we CSE init_count over the _first_ call of get_me.
Technically we only might CSE it over the second call of get_me.  Thus there
is flow-sensitive information coming out of nowhere when we try to skip
the second get_me () call in looking up the init_count load after it (coming
out of nowhere for the alias-oracle query which only includes 'init_count'
and the stmt 'int *q = get_me ()').

Thus my answer is, for the alias-oracle the hypothetical 'singleton_init'
attribute provides no useful information.  Thus CSE and/or DCE of
'singleton_init'
calls has to use special code (which I can think of being added to DCE, harder
to existing SCCVN, maybe easier to DOM) which will usually not fit into
the algorithm (so you might be able to teach FRE elimination of how to
elminiate the 2nd call to get_me () but you will need a 2nd FRE pass to
then figure out that init_count can be CSEd as well - which makes this
all very undesirable).  Inlining will also make this harder (you lose
the attribute and fall into the lucas situation ...).

At some point this "singleton"-ness should be auto-detected (there is
inlined code in SPEC 2k lucas that would benefit from constant propagation,
and SPEC 2k6 libquantum that uses a related concept - change / restore
of a global in a function to the net effect that the function should not be
considered clobbering it).

Richard.

> Jakub


Re: inlined memcpy/memset degradation in gcc 4.6 or later

2012-10-04 Thread Joe Buck

On Tue, Oct 2, 2012 at 4:19 PM, Walter Lee  wrote:
> > On TILE-Gx, I'm observing a degradation in inlined memcpy/memset in
> > gcc 4.6 and later versus gcc 4.4.  Though I find the problem on
> > TILE-Gx, I think this is a problem for any architectures with
> > SLOW_UNALIGNED_ACCESS set to 1.
> >
> > Consider the following program:
> >
> > struct foo {
> >   int x;
> > };
> >
> > void copy(struct foo* f0, struct foo* f1)
> > {
> >   memcpy (f0, f1, sizeof(struct foo));
> > }
> >
> > In gcc 4.4, I get the desired inline memcpy: ...
> > In gcc 4.7, however, I get inlined byte-by-byte copies: ...

On Thu, Oct 04, 2012 at 01:58:54PM +0200, Richard Guenther wrote:
> There is no way to fix it.  memcpy does not require aligned arguments
> and the merely presence of a typed pointer contains zero information
> of alignment for the middle-end.  If you want to excercise C's notion
> of alignemnt requirements then do not use memcpy but
> 
>  *f0 = *f1;
> 
> which works equally well.

Perhaps I'm missing something.  While memcpy is not permitted to assume
alignment of its arguments, copy is.  Otherwise, if I wrote

void copy(struct foo* f0, struct foo* f1)
{
*f0 = *f1;  
}

the compiler would also be forbidden from assuming any alignment.  So,
when compiling "copy", do we lack the information that the memcpy call is
to the standard ISO memcpy function?  If we know that it is the standard
function we should be able to do the copy assuming everything is properly
aligned.


> Btw, the new beavior even fixed bugs.

Could you point to a PR that was fixed by the change?  There must be some
way to distinguish this case from those cases.


Re: Functions that are CSEable but not pure

2012-10-04 Thread Jason Merrill

On 10/04/2012 01:42 PM, Richard Guenther wrote:

So I suppose the testcase that would be "valid" but break with using
pure would be instead

int main()
{
   int x = init_count;
   int *p = get_me();
   if (init_count == x)
 __builtin_abort();
   int *q = get_me();
   if (init_count == x)
 __builtin_abort();
}

here when get_me is pure we CSE init_count over the _first_ call of get_me.


That's OK for C++ thread_local semantics; the initialization is 
specified to happen some time before the first use, so the testcase is 
making an invalid assumption.  This might not be desirable for 
user-written singleton functions, though.


Jason



Re: ARM/getting rid of superfluous zero extension

2012-10-04 Thread Eric Botcazou
> Any suggestion about how I could avoid generating this zero_extension?

Redundant extensions have been a hot topic for some time.  The combiner should 
catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom 
recently posted:
  http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html
which should catch more complex cases.

I guess someone should gather the various missed optimization cases, draw a 
global picture of the situation and see how the various approaches could be 
fitted together.

-- 
Eric Botcazou


Re: reverse bitfield patch

2012-10-04 Thread DJ Delorie

> ChangeLog missing, new functions need a toplevel comment documenting
> function, argument and return value as per coding conventions.

Any review of the patch itself?  I know the overhead is not there...


libbacktrace and darwin

2012-10-04 Thread Jack Howarth
   Is libbacktrace currently functional in gcc trunk and is it expected
to function on darwin? While I could understand it not working on installed
binaries of FSF gcc that were stripped, I would think it should work for 
make check in the build tree since all of the debug code should be present
in the object files. Or doesn't libbacktrace know to look there for the
dwarf code? Thanks in advance for any clarifications.
  Jack


Re: libbacktrace and darwin

2012-10-04 Thread Ian Lance Taylor
On Thu, Oct 4, 2012 at 1:32 PM, Jack Howarth  wrote:
>Is libbacktrace currently functional in gcc trunk and is it expected
> to function on darwin? While I could understand it not working on installed
> binaries of FSF gcc that were stripped, I would think it should work for
> make check in the build tree since all of the debug code should be present
> in the object files. Or doesn't libbacktrace know to look there for the
> dwarf code? Thanks in advance for any clarifications.

libbacktrace is functional in GCC trunk.  However, it does not yet
support the Mach-O object file format.  I hope to work on that at some
point, but it would be great if somebody else tackled it.  It's
probably straightforward to implement based on code in
libiberty/simple-object-mach-o.c.  The libbacktrace code can be
simpler than the libiberty code--compare libiberty/simple-object-elf.c
and libbacktrace/elf.c.

Ian


Re: ARM/getting rid of superfluous zero extension

2012-10-04 Thread David Edelsohn
On Thu, Oct 4, 2012 at 2:18 PM, Eric Botcazou  wrote:
>> Any suggestion about how I could avoid generating this zero_extension?
>
> Redundant extensions have been a hot topic for some time.  The combiner should
> catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom
> recently posted:
>   http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html
> which should catch more complex cases.
>
> I guess someone should gather the various missed optimization cases, draw a
> global picture of the situation and see how the various approaches could be
> fitted together.

Kenny already did that as part of the thread with Tom.  Tom's patch
also is incomplete.

- David


Re: ARM/getting rid of superfluous zero extension

2012-10-04 Thread Kenneth Zadeck

David and i have been talking about this for some time.

what is needed is a real global optimization algorithm.  my leaning is 
to make do it at the rtl level because that is where everything has been 
exposed.   but it would be a lot easier in ssa form.


The first step in my opinion is to ask the backend what kind of 
operations are expensive and what kinds of operations are cheap. for 
instance some risc machines can do comparisons at any mode and some only 
do them on the width of a register. also some machines have different 
costs associated with sign or zero extension.  I really think that you 
not only need to connect up the sources and syncs but you have to 
understand the space of what kind of transformations changes are going 
to be cheapest.
People do not seem to ask this kind of question at the tree level which 
is one of the reasons that i lean to doing it at the rtl level.


but the chain that goes from a load or some other operation that has a 
specific width output to the
sync that has a specific width input can be a long one and only 
something equivalent to ssa or dataflow is going to let you connect the 
true sources with the true syncs.


this is the problem with tom's patch.One of the examples that i have 
seen is a loop that has di mode add of an index variable and (had) an si 
mode comparison to exit the loop.However, later passes removed the 
si mode comparison because on the ppc it used the loop count 
instruction.but the add of the index variable with sign extension 
remained because it was an index variable and so the value fed back into 
itself.


you are never going clean this up unless you do something global so that 
you can break the recurrence and see that there is no sync to that web 
that really needs the value to be si mode.


kenny

On 10/04/2012 06:16 PM, David Edelsohn wrote:

On Thu, Oct 4, 2012 at 2:18 PM, Eric Botcazou  wrote:

Any suggestion about how I could avoid generating this zero_extension?

Redundant extensions have been a hot topic for some time.  The combiner should
catch the local easy cases, we have ree.c for the nonlocal easy cases and Tom
recently posted:
   http://gcc.gnu.org/ml/gcc-patches/2012-07/msg00408.html
which should catch more complex cases.

I guess someone should gather the various missed optimization cases, draw a
global picture of the situation and see how the various approaches could be
fitted together.

Kenny already did that as part of the thread with Tom.  Tom's patch
also is incomplete.

- David




Error reporting functions

2012-10-04 Thread Iyer, Balaji V
Hi Everyone,
This question is mainly for some future submission. Am I allowed to use 
"fatal_error (..)"? Mainly, I want to use it in cases where I want to say "if 
this error has occurred, I see no reason to go forward with compilation."

Thanks,

Balaji V. Iyer.


Re: Error reporting functions

2012-10-04 Thread Ian Lance Taylor
On Thu, Oct 4, 2012 at 5:21 PM, Iyer, Balaji V  wrote:

> This question is mainly for some future submission. Am I allowed to 
> use "fatal_error (..)"? Mainly, I want to use it in cases where I want to say 
> "if this error has occurred, I see no reason to go forward with compilation."

You are allowed to use it in exceptional circumstances.  Not normally
"I see no reason to go forward" but "I can not go forward because some
critical operation has failed."  There is also internal_error for,
well, internal errors; use that for data corruption.  And gcc_assert
and gcc_unreachable for things that can not happen.

Ian


Re: [LRA] liveness update not done after elimination??

2012-10-04 Thread Vladimir Makarov

On 12-10-04 8:53 AM, Steven Bosscher wrote:

Hello Vlad,

Is it intentional that DF_LR_IN and DF_LR_OUT are not updated after
"Updating elimination of equiv for reg..."? I have some checking in
place in process_bb_lives at the end of the function, and it triggers
on the test case. (Checking code and test case is at the end of this
e-mail.) It show:

LIVEIN-mismatch in bb 5:
66, 69,
66, 69, 73
73
t.c: In function 'get_mem_attrs':
t.c:47:1: internal compiler error: in process_bb_lives, at lra-lives.c:866
  }
  ^

And that's correct if you look at the LR_IN set of basic block 5 in
the IRA  (.193r.ira) dump:

Ops.  I'll look at this.  As I remember, I thought about the problem.  
But I did not wanted to update live info because it is expensive and 
constraint equiv substitution by safe cases.  Somehow I added unsafe 
cases (containing pseudos) and even added the code to spill pseudos in 
substitutions when they start to conflict with others because of 
widening live ranges but forgot about live info. I'll try to fix it as 
soon as possible.