Re: RFA: fix dbr_schedule to leave unique ids unique

2012-10-20 Thread Richard Sandiford
Joern Rennecke  writes:
> Quoting Richard Sandiford :
>> Joern Rennecke  writes:
>>> Quoting Richard Sandiford :
 The fact that we even have shared unique ids is pretty bad -- and surely
 a contradiction in terms -- but I think both ways of handling them rely
 on the length being the same for all copies.  If we don't record a length
 (your version), we won't set something_changed if the length of this copy
 did in fact change, which could lead to branches remaining too short.
>>>
>>> That is not the case, as the current length of the inner insn is added to
>>> new_length (of the outer insn).
>>>
 If we do record a length (current version),
>>>
>>> In the current version, the length of the inner insn is calculated anew
>>> each iteration, so only the out-of-sequence copy suffers from the bogosity.
>>>
 we could end up changing
 the length of previous copies that have already been processed in
 this iteration.
>>>
>>> Both that, and also using the length of the previous insn for the inner
>>> length calculation, and ratcheting them up together.
>>> In fact, this can make delay slot instructions ineligible for the delay slot
>>> they are in.  Also, the unwanted length cross-interference can violate
>>> alignment invariants, and this leads to invalid code on ARCompact.
>>
>> But if you're saying that ARCompact wants different copies of the same insn
>> (with the same uid) to have different lengths, then please fix dbr_schedule
>> so that it uses different uids.  The "i == 0" thing is just too hackish IMO.
>
> Implemented with the attached patch.
> I've been wondering if I should use copy_insn, but that doesn't seem to make
> any real difference after reload.
> Also, copying only PATTERN, INSN_LOCATION, and REG_NOTES into the new
> insn obtained from make_insn_raw had seemed a possibility, but there is no
firm assurance that we will only see insns with the code INSN.

Yeah, that's unfortunate.

> In the end, continuing to use copy_rtx to make the copy seems least likely to
> break something.  And now that the copying is in one place, it's easier to
> change the way it is done if you want to try that.

Agreed.

> We waste a bit of memory (temporarily, as it is garbage collected) by using
> make_insn_raw just to increment cur_insn_uid.  Alternatives would be to
> move the function inside emit-rtl.c, or have emit-rtl.h provide a means to
> increment cur_insn_uid.  Well, or we could directly access  
> crtl->emit.x_cur_insn_uid, but that would certainly be hackish.

I agree using crtl->emit.x_cur_insn_uid from reorg.c would be hackish.
I think it would be better to put the new function in emit-rtl.c
and just have:

/* Return a copy of INSN that can be used in a SEQUENCE delay slot,
   on the assumption that INSN itself remains in its original place.  */

rtx
copy_delay_slot_insn (rtx insn)
{
  /* Copy INSN with its rtx_code, all its notes, location etc.  */
  insn = copy_rtx (insn);
  INSN_UID (insn) = cur_insn_uid++;
  return insn;
}

OK with that change, thanks.

Richard
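
A standalone toy illustration (not GCC code) of why each delay-slot copy
needs its own uid: per-insn attributes such as lengths are keyed by uid
(as in shorten_branches' insn_lengths[] array), so two copies sharing a
uid also share a single length slot, and recording a length for one copy
clobbers the other's.

#include <stdio.h>

/* Toy model of uid-keyed insn attributes; names are illustrative only.  */
struct insn { int uid; };
static int next_uid = 0;
static int insn_lengths[16];

static struct insn
make_insn (void)
{
  struct insn i = { next_uid++ };
  return i;
}

/* The fix in miniature: a copy gets a fresh uid, so recording a length
   for the delay-slot copy can no longer clobber the original's entry.  */
static struct insn
copy_insn_unique (struct insn src)
{
  struct insn copy = src;
  copy.uid = next_uid++;
  return copy;
}

int
main (void)
{
  struct insn orig = make_insn ();
  struct insn dup = copy_insn_unique (orig);
  insn_lengths[orig.uid] = 2;	/* short form at the original location */
  insn_lengths[dup.uid] = 4;	/* long form in the delay slot */
  printf ("%d %d\n", insn_lengths[orig.uid], insn_lengths[dup.uid]);
  return 0;
}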


Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-20 Thread Richard Sandiford
Joern Rennecke  writes:
> @@ -1165,6 +1175,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
>   get the current insn length.  If it has changed, reflect the change.
>   When nothing changes for a full pass, we are done.  */
>  
> +  bool first_pass ATTRIBUTE_UNUSED = true;
>while (something_changed)
>  {
>something_changed = 0;
> @@ -1220,6 +1231,7 @@ shorten_branches (rtx first ATTRIBUTE_UN
> rtx prev;
> int rel_align = 0;
> addr_diff_vec_flags flags;
> +   enum machine_mode vec_mode;
>  
> /* Avoid automatic aggregate initialization.  */
> flags = ADDR_DIFF_VEC_FLAGS (body);
> @@ -1298,9 +1310,12 @@ shorten_branches (rtx first ATTRIBUTE_UN
> else
>   max_addr += align_fuzz (max_lab, rel_lab, 0, 0);
>   }
> -   PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
> - max_addr - rel_addr,
> - body));
> +   vec_mode = CASE_VECTOR_SHORTEN_MODE (min_addr - rel_addr,
> +max_addr - rel_addr, body);
> +   if (first_pass
> +   || (GET_MODE_SIZE (vec_mode)
> +		>= GET_MODE_SIZE (GET_MODE (body))))
> + PUT_MODE (body, vec_mode);
> if (JUMP_TABLES_IN_TEXT_SECTION
> || readonly_data_section == text_section)
>   {

I think instead the set-up loop should have:

  if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
{
#ifdef CASE_VECTOR_SHORTEN_MODE
  if (increasing && GET_CODE (body) == ADDR_DIFF_VEC)
PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (0, 0, body));
#endif
  /* This only takes room if read-only data goes into the text
 section.  */
  if (JUMP_TABLES_IN_TEXT_SECTION
  || readonly_data_section == text_section)
insn_lengths[uid] = (XVECLEN (body,
  GET_CODE (body) == ADDR_DIFF_VEC)
 * GET_MODE_SIZE (GET_MODE (body)));
  /* Alignment is handled by ADDR_VEC_ALIGN.  */
}

(with just the CASE_VECTOR_SHORTEN_MODE part being new).
We then start with the most optimistic length possible,
as with everything else.

The main shortening if statement should then be conditional on:

#ifdef CASE_VECTOR_SHORTEN_MODE
  if (increasing
  && JUMP_P (insn)
  && GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
...

(testing "increasing" rather than "optimize").  The code for changing
the mode should simply be:

  if (GET_MODE_SIZE (vec_mode)
      >= GET_MODE_SIZE (GET_MODE (body)))
PUT_MODE (body, vec_mode);

with first_pass no longer being necessary.

OK with that change, if you agree.

Richard
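
To see why "increasing" makes first_pass unnecessary: if every length
starts at its optimistic minimum and each update may only grow it, the
iteration is monotonic over a finite set of values and must converge.
A minimal standalone sketch of that fixed-point idea (toy numbers, not
GCC's actual length logic):

#include <stdbool.h>
#include <stdio.h>

int
main (void)
{
  /* Start with the most optimistic lengths, as in the set-up loop.  */
  int len[3] = { 2, 2, 2 };
  bool something_changed = true;

  while (something_changed)
    {
      something_changed = false;
      for (int i = 0; i < 3; i++)
	{
	  /* Crude stand-in for the address calculation: an insn needs
	     the long form once the total span exceeds 5 "bytes".  */
	  int span = len[0] + len[1] + len[2];
	  int want = span > 5 ? 4 : 2;
	  if (want > len[i])	/* grow only: lengths never shrink */
	    {
	      len[i] = want;
	      something_changed = true;
	    }
	}
    }
  printf ("%d %d %d\n", len[0], len[1], len[2]);
  return 0;
}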


Fix C FE __builtin_unreachable definition

2012-10-20 Thread Jan Hubicka
Hi,
this patch fixes the BUILT_IN_UNREACHABLE declaration of the C frontend.  The
function is also const (so DSE can do its job).  As a special case, the ECF
flags for CONST & NORETURN also add looping, so this declaration is correct.

The implicit declaration of the builtin is already set this way.
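
As a concrete illustration (my example, not part of the patch) of what
the const flag buys: with a merely noreturn callee, DSE has to assume
the callee might inspect memory, so the store below must be kept; a
const+noreturn+looping builtin reads no memory, so the store is dead.

int g;

void
f (void)
{
  g = 1;			/* dead once the builtin is known const */
  __builtin_unreachable ();
}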

Bootstrapped/regtested x86_64-linux, committed as obvious.

Honza

* builtins.def (BUILT_IN_UNREACHABLE): Make 
ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST.
* builtin-attrs.def (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST): Define.
Index: builtins.def
===
--- builtins.def	(revision 192537)
+++ builtins.def	(working copy)
@@ -728,7 +728,7 @@ DEF_GCC_BUILTIN(BUILT_IN_SETJMP,
 DEF_EXT_LIB_BUILTIN(BUILT_IN_STRFMON, "strfmon", BT_FN_SSIZE_STRING_SIZE_CONST_STRING_VAR, ATTR_FORMAT_STRFMON_NOTHROW_3_4)
 DEF_LIB_BUILTIN(BUILT_IN_STRFTIME, "strftime", BT_FN_SIZE_STRING_SIZE_CONST_STRING_CONST_PTR, ATTR_FORMAT_STRFTIME_NOTHROW_3_0)
 DEF_GCC_BUILTIN(BUILT_IN_TRAP, "trap", BT_FN_VOID, ATTR_NORETURN_NOTHROW_LEAF_LIST)
-DEF_GCC_BUILTIN(BUILT_IN_UNREACHABLE, "unreachable", BT_FN_VOID, ATTR_NORETURN_NOTHROW_LEAF_LIST)
+DEF_GCC_BUILTIN(BUILT_IN_UNREACHABLE, "unreachable", BT_FN_VOID, ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST)
 DEF_GCC_BUILTIN(BUILT_IN_UNWIND_INIT, "unwind_init", BT_FN_VOID, ATTR_NULL)
 DEF_GCC_BUILTIN(BUILT_IN_UPDATE_SETJMP_BUF, "update_setjmp_buf", BT_FN_VOID_PTR_INT, ATTR_NULL)
 DEF_GCC_BUILTIN(BUILT_IN_VA_COPY, "va_copy", BT_FN_VOID_VALIST_REF_VALIST_ARG, ATTR_NOTHROW_LEAF_LIST)
Index: builtin-attrs.def
===
--- builtin-attrs.def   (revision 192537)
+++ builtin-attrs.def   (working copy)
@@ -131,6 +131,8 @@ DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHRO
ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_NORETURN_NOTHROW_LEAF_LIST, ATTR_NORETURN,\
ATTR_NULL, ATTR_NOTHROW_LEAF_LIST)
+DEF_ATTR_TREE_LIST (ATTR_CONST_NORETURN_NOTHROW_LEAF_LIST, ATTR_CONST,\
+   ATTR_NULL, ATTR_NORETURN_NOTHROW_LEAF_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LIST, ATTR_MALLOC, \
ATTR_NULL, ATTR_NOTHROW_LIST)
 DEF_ATTR_TREE_LIST (ATTR_MALLOC_NOTHROW_LEAF_LIST, ATTR_MALLOC,\


Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-20 Thread Joern Rennecke

Quoting Richard Sandiford :


I think instead the set-up loop should have:

  if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
{
#ifdef CASE_VECTOR_SHORTEN_MODE
  if (increasing && GET_CODE (body) == ADDR_DIFF_VEC)
PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (0, 0, body));
#endif
  /* This only takes room if read-only data goes into the text
 section.  */
  if (JUMP_TABLES_IN_TEXT_SECTION
  || readonly_data_section == text_section)
insn_lengths[uid] = (XVECLEN (body,
  GET_CODE (body) == ADDR_DIFF_VEC)
 * GET_MODE_SIZE (GET_MODE (body)));
  /* Alignment is handled by ADDR_VEC_ALIGN.  */
}

(with just the CASE_VECTOR_SHORTEN_MODE part being new).
We then start with the most optimistic length possible,
as with everything else.


Well, ports could always tailor the initial mode with CASE_VECTOR_MODE,
but it is indeed simpler when the branch shortening pass provides a
sensible initialization.

How about putting this at the end of this block:
#ifdef CASE_VECTOR_SHORTEN_MODE
  if (optimize)
{
  /* Look for ADDR_DIFF_VECs, and initialize their minimum and maximum
 label fields.  */

Of course increasing would have to be set earlier.

With regard to how it would make sense to change increasing to
something other than optimize, I think we could have a flag to control
this, which is turned on by default at -O1; it could then be turned off
if people want only some other (quicker?) optimizations, or if they want
to work around a machine-specific bug triggered by a single source file.
I.e. optimize might be set when increasing is not.  The other way round makes
little sense - iterating branch shortening is certainly an optimization.
So is using CASE_VECTOR_SHORTEN_MODE, but it does not require iterating,
as we've already calculated the required addresses in the preliminary pass
before the main loop.
So it makes sense to keep the condition for CASE_VECTOR_SHORTEN_MODE
as 'optimize' (or use a different flag, e.g. case_vector_shorten_p),
and have this at the end of this initial CASE_VECTOR_SHORTEN_MODE block:

  flags.min_after_base = min > rel;
  flags.max_after_base = max > rel;
  ADDR_DIFF_VEC_FLAGS (pat) = flags;

+ if (increasing)
+   PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (0, 0, body));
}
}
#endif /* CASE_VECTOR_SHORTEN_MODE */

  continue;
}
#endif /* CASE_VECTOR_SHORTEN_MODE */




The main shortening if statement should then be conditional on:

#ifdef CASE_VECTOR_SHORTEN_MODE
  if (increasing
  && JUMP_P (insn)
  && GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)


As explained above, CASE_VECTOR_SHORTEN_MODE can make sense in the
decreasing/non-iterating branch shortening mode.

so I think the code for changing the mode should be
   if (!increasing
  || (GET_MODE_SIZE (vec_mode)
	   >= GET_MODE_SIZE (GET_MODE (body))))
PUT_MODE (body, vec_mode);


[PATCH, ARM] Subregs of VFP registers in big-endian mode

2012-10-20 Thread Julian Brown
Hi,

Quite a few tests fail for big-endian multilibs which use VFP
instructions at present. One reason for many of these is glaringly
obvious once you notice it: for D registers interpreted as two S
registers, the lower-numbered register is always the less-significant
part of the value, and the higher-numbered register the
more-significant -- regardless of the endianness the processor is
running in.

However, for big-endian mode, when DFmode values are represented in
memory (or indeed core registers), the opposite is true. So, a subreg
expression such as the following will work fine on core registers (or
e.g. pseudos assigned to stack slots):

(subreg:SI (reg:DF) 0)

but, when applied to a VFP register Dn, it should be resolved to the
hard register S(n*2+1). At present though, it resolves to S(n*2) -- i.e.
the wrong half of the value (for WORDS_BIG_ENDIAN, such a subreg should
be the most-significant part of the value). For the relatively few cases
where DFmode values are interpreted as a pair of (integer) words, this
means that wrong code is generated.

My feeling is that implementing a "proper" solution to this problem is
probably impractical -- the closest existing macros to control
behaviour aren't sufficient for this case:

* FLOAT_WORDS_BIG_ENDIAN only refers to memory layout, which is correct
  as is it.

* REG_WORDS_BIG_ENDIAN controls whether values are stored in big-endian
  order in registers, but refers to *all* registers. We only want to
  change the behaviour for the VFP registers. Defining a new macro
  FLOAT_REG_WORDS_BIG_ENDIAN wouldn't do, because the behaviour would
  differ depending on the hard register under observation: that seems
  like too much to ask of generic machinery in the middle-end.

So, the attached patch just avoids the problem, by pretending that
greater-than-word-size values in VFP registers, in big-endian mode, are
opaque and cannot be subreg'ed. In practice, for at least the test case
I looked at, this isn't as much of a pessimisation as you might expect
-- the value in question might already be stored in core registers
(e.g. for function arguments with -mfloat-abi=softfp), so can be
retrieved directly from those rather than via memory.
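
The patch itself is attached to the original mail rather than quoted in
this archive; a hypothetical sketch of the shape of such a fix (assuming
the 4.x-era CANNOT_CHANGE_MODE_CLASS target macro and the existing
TARGET_BIG_END and VFP_REGS names) would be to reject narrowing mode
changes on multi-word values that may live in VFP registers:

/* Hypothetical sketch only, not the actual patch: treat multi-word
   values in VFP registers as opaque in big-endian mode by rejecting
   the mode change, forcing the value through core regs or memory.  */
#define CANNOT_CHANGE_MODE_CLASS(FROM, TO, CLASS)		\
  (TARGET_BIG_END						\
   && GET_MODE_SIZE (FROM) > UNITS_PER_WORD			\
   && GET_MODE_SIZE (TO) < GET_MODE_SIZE (FROM)			\
   ? reg_classes_intersect_p (VFP_REGS, (CLASS)) : 0)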

This is the testsuite delta for current FSF mainline, with multilibs
adjusted to build for little/big-endian, and using options
"-mbig-endian -mfloat-abi=softfp -mfpu=vfpv3" for testing:

FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O1  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O3 -fomit-frame-pointer  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -O3 -g  execution test
FAIL -> PASS: be-code-on-qemu/g++.sum:g++.dg/torture/type-generic-1.C  -Os  execution test
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/copysign1.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/ieee/mzero6.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr35456.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O1
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2 -flto -fno-use-linker-plugin -flto-partition=none
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O3 -fomit-frame-pointer
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -O3 -g
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -Og -g
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.c-torture/execute/pr44683.c execution,  -Os
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/compat/scalar-by-value-3 c_compat_x_tst.o-c_compat_y_tst.o execute
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O1  execution test
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2  execution test
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2 -flto -fno-use-linker-plugin -flto-partition=none  execution test
FAIL -> PASS: be-code-on-qemu/gcc.sum:gcc.dg/torture/type-generic-1.c  -O2 -flto -fuse-linker-plugin -fno-fat-lto-objects  execution test
FAIL -> 

Re: Ping: RFA: add lock_length attribute to break branch-shortening cycles

2012-10-20 Thread Richard Sandiford
Joern Rennecke  writes:
> Quoting Richard Sandiford :
>> I think instead the set-up loop should have:
>>
>>   if (GET_CODE (body) == ADDR_VEC || GET_CODE (body) == ADDR_DIFF_VEC)
>>  {
>> #ifdef CASE_VECTOR_SHORTEN_MODE
>>if (increasing && GET_CODE (body) == ADDR_DIFF_VEC)
>>  PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (0, 0, body));
>> #endif
>>/* This only takes room if read-only data goes into the text
>>   section.  */
>>if (JUMP_TABLES_IN_TEXT_SECTION
>>|| readonly_data_section == text_section)
>>  insn_lengths[uid] = (XVECLEN (body,
>>GET_CODE (body) == ADDR_DIFF_VEC)
>>   * GET_MODE_SIZE (GET_MODE (body)));
>>/* Alignment is handled by ADDR_VEC_ALIGN.  */
>>  }
>>
>> (with just the CASE_VECTOR_SHORTEN_MODE part being new).
>> We then start with the most optimistic length possible,
>> as with everything else.
>
> Well, ports could always tailor the initial mode with CASE_VECTOR_MODE,
> but it is indeed simpler when the branch shortening pass provides a
> sensible initialization.

That, plus I think CASE_VECTOR_MODE should always be the conservatively
correct mode.

> How about putting this at the end of this block:
> #ifdef CASE_VECTOR_SHORTEN_MODE
>if (optimize)
>  {
>/* Look for ADDR_DIFF_VECs, and initialize their minimum and maximum
>   label fields.  */

OK, that does sound better.

> With regard to how it would make sense to change increasing to
> something other than optimize, I think we could have a flag to control
> this, which is turned on by default at -O1; it could then be turned off
> if people want only some other (quicker?) optimizations, or if they want
> to work around a machine-specific bug triggered by a single source file.
> I.e. optimize might be set when increasing is not.  The other way round makes
> little sense - iterating branch shortening is certainly an optimization.
> So is using CASE_VECTOR_SHORTEN_MODE, but it does not require iterating,
> as we've already calculated the required addresses in the preliminary pass
> before the main loop.
> So it makes sense to keep the condition for CASE_VECTOR_SHORTEN_MODE
> as 'optimize' (or use a different flag, e.g. case_vector_shorten_p),
> and have this at the end of this initial CASE_VECTOR_SHORTEN_MODE block:
>
>flags.min_after_base = min > rel;
>flags.max_after_base = max > rel;
>ADDR_DIFF_VEC_FLAGS (pat) = flags;
>
> + if (increasing)
> +   PUT_MODE (body, CASE_VECTOR_SHORTEN_MODE (0, 0, body));
>  }
>  }
> #endif /* CASE_VECTOR_SHORTEN_MODE */
>
>continue;
>  }
> #endif /* CASE_VECTOR_SHORTEN_MODE */
>
>
>
>> The main shortening if statement should then be conditional on:
>>
>> #ifdef CASE_VECTOR_SHORTEN_MODE
>>if (increasing
>>&& JUMP_P (insn)
>>&& GET_CODE (PATTERN (insn)) == ADDR_DIFF_VEC)
>
> As explained above, CASE_VECTOR_SHORTEN_MODE can make sense in the
> decreasing/non-iterating branch shortening mode.
>
> so I think the code for changing the mode should be
> if (!increasing
>|| (GET_MODE_SIZE (vec_mode)
>	   >= GET_MODE_SIZE (GET_MODE (body))))
>  PUT_MODE (body, vec_mode);

OK, sounds good.

Richard


Re: Tidy store_bit_field_1 & co.

2012-10-20 Thread Eric Botcazou
>   * expmed.c (lowpart_bit_field_p): New function.
>   (store_bit_field_1): Remove unit, offset, bitpos and byte_offset
>   from the outermost scope.  Express conditions in terms of bitnum
>   rather than offset, bitpos and byte_offset.  Split the plain move
>   cases into two, one for memory accesses and one for register accesses.
>   Allow simplify_gen_subreg to fail rather than calling validate_subreg.
>   Move the handling of multiword OP0s after the code that coerces VALUE
>   to an integer mode.  Use simplify_gen_subreg for this case and assert
>   that it succeeds.  If the field still spans several words, pass it
>   directly to store_split_bit_field.  Assume after that point that
>   both sources and register targets fit within a word.  Replace
>   x-prefixed variables with non-prefixed forms.  Compute the bitpos
>   for insv register operands directly in the chosen unit size, rather
>   than going through an intermediate BITS_PER_WORD unit size.
>   Update the call to store_fixed_bit_field.
>   (store_fixed_bit_field): Replace the bitpos and offset parameters
>   with a single bitnum parameter, of the same form as store_bit_field.
>   Assume that OP0 contains the full field.  Simplify the memory offset
>   calculation.  Assert that the processed OP0 has an integral mode.
>   (store_split_bit_field): Update the call to store_fixed_bit_field.

This looks good to me, modulo:

>    /* If the target is a register, overwriting the entire object, or storing
> -     a full-word or multi-word field can be done with just a SUBREG.
> +     a full-word or multi-word field can be done with just a SUBREG.  */
> +  if (!MEM_P (op0)
> +      && bitnum % BITS_PER_WORD == 0
> +      && bitsize == GET_MODE_BITSIZE (fieldmode)
> +      && (bitsize == GET_MODE_BITSIZE (GET_MODE (op0))
> +	  || bitsize % BITS_PER_WORD == 0))
> +    {
> +      /* Use the subreg machinery either to narrow OP0 to the required
> +	 words or to cope with mode punning between equal-sized modes.  */
> +      rtx sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
> +				     bitnum / BITS_PER_UNIT);
> +      if (sub)
> +	{
> +	  emit_move_insn (sub, value);
> +	  return true;
> +	}
> +    }

Are you sure that you don't need to keep the bitnum == 0 condition in the 
first arm that was present in the previous patch?  And, on second thoughts, 
the first formulation was more in keeping with the comment just above (sorry 
about that).  So I'd reinstate it and swap the arms:

  /* If the target is a register, overwriting the entire object, or storing
     a full-word or multi-word field can be done with just a SUBREG.  */
  if (!MEM_P (op0)
      && bitsize == GET_MODE_BITSIZE (fieldmode)
      && ((bitsize == GET_MODE_BITSIZE (GET_MODE (op0)) && bitnum == 0)
	  || (bitsize % BITS_PER_WORD == 0 && bitnum % BITS_PER_WORD == 0)))
    {
      /* Use the subreg machinery either to narrow OP0 to the required
	 words or to cope with mode punning between equal-sized modes.  */
      rtx sub = simplify_gen_subreg (fieldmode, op0, GET_MODE (op0),
				     bitnum / BITS_PER_UNIT);
      if (sub)
	{
	  emit_move_insn (sub, value);
	  return true;
	}
    }

In any case, no need to retest, I'd apply it and wait for the apocalypse. :-)

-- 
Eric Botcazou


loop-unroll.c TLC 1/4

2012-10-20 Thread Jan Hubicka
Hi,
the TLC patch I sent last week became outdated for a few reasons.  I decided
to split it up for easier reviewing.
This is a simple correctness issue I am committing as obvious - my last
update to loop-iv missed the fact that loop-iv bounds may depend on further
conditions.  In that case we cannot record them as a hard upper bound.
I also double-checked that we actually use the bounds only when they are
unconditional, so I hopefully did not introduce any missed optimization.

Bootstrapped/regtested x86_64-linux, committed as obvious.

* loop-iv.c (iv_number_of_iterations): Record the upper bound
only if there are no further conditions on it.
Index: loop-iv.c
===
--- loop-iv.c   (revision 192632)
+++ loop-iv.c   (working copy)
@@ -2593,8 +2593,10 @@ iv_number_of_iterations (struct loop *lo
 ? iv0.base
 : mode_mmin);
  max = (up - down) / inc + 1;
- record_niter_bound (loop, double_int::from_uhwi (max),
- false, true);
+ if (!desc->infinite
+ && !desc->assumptions)
+   record_niter_bound (loop, double_int::from_uhwi (max),
+   false, true);
 
  if (iv0.step == const0_rtx)
{
@@ -2806,15 +2808,19 @@ iv_number_of_iterations (struct loop *lo
 
   desc->const_iter = true;
   desc->niter = val & GET_MODE_MASK (desc->mode);
-  record_niter_bound (loop, double_int::from_uhwi (desc->niter),
- false, true);
+  if (!desc->infinite
+ && !desc->assumptions)
+record_niter_bound (loop, double_int::from_uhwi (desc->niter),
+   false, true);
 }
   else
 {
   max = determine_max_iter (loop, desc, old_niter);
   gcc_assert (max);
-  record_niter_bound (loop, double_int::from_uhwi (max),
- false, true);
+  if (!desc->infinite
+ && !desc->assumptions)
+   record_niter_bound (loop, double_int::from_uhwi (max),
+   false, true);
 
   /* simplify_using_initial_values does a copy propagation on the registers
 in the expression for the number of iterations.  This prolongs life


Re: Fix array bound niter estimate (PR middle-end/54937)

2012-10-20 Thread Jan Hubicka
> > > > What about the conservative variant of simply
> > > > 
> > > >   else
> > > > delta = double_int_one;
> > > 
> > > I think it would be a bad idea: it makes us completely unroll one
> > > iteration too many, which bloats code for no benefit.  No optimization
> > > cancels the path in the CFG because of the undefined effect, and thus
> > > the code will be output (unless someone smarter, like VRP, cleans up
> > > later, but that is more an exception than the rule.)
> > 
> > OK, on deeper thought I guess I can add double_int_one always at that spot and
> > once we are done with everything I can walk nb_iter_bound for all statements
> > known to not be executed on last iteration and record them to pointer set.
> > 
> > Finally I can walk from the header in DFS order, stopping on loop exits,
> > side effects and those statements.  If I visit no loop exit or side effect,
> > I know I can lower the iteration count by 1 (in
> > estimate_numbers_of_iterations_loop).
> > 
> > This will give accurate answer and requires just little extra bookkeeping.
> > 
> > I will give this a try.
> 
> Here is updated patch.  It solves the testcase and gives better estimates 
> than before.
> 
> Here are obvious improvements: record_estimate can put all statements on
> the list, not only those that dominate the loop latch, and
> maybe_lower_iteration_bound can track the lowest estimate it finds on its
> walk.  This will need a bit more work and I am thus sending the bugfix
> separately, because I think it should go to 4.7, too.

This patch fixes a trivial bug that solves one regression in the Fortran
testsuite.  There are two left:
gfortran.dg/bounds_check_9.f90 and gfortran.dg/bounds_check_fail_2.f90

I am quite convinced the code makes the correct decision here, and I put a
request into the original PR http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31119

We were overly conservative here just because we did not handle multiple-exit
loops, so the testcase passed by accident rather than by design.  Perhaps
this is actually a bug in the -fbounds-check implementation?
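
To make the new walk concrete, a small example (mine, not from the PR)
of the situation maybe_lower_iteration_bound handles:

int a[100];

int
f (void)
{
  int i = 0, s = 0;
  do
    s += a[i];		/* deliberately undefined for i >= 100: this is
			   what bounds the loop */
  while (++i < 1000);
  return s;
}

Every path from the loop header crosses the access before reaching the
exit test or any side effect, and the access cannot be executed in an
iteration with i == 100, so the whole last iteration is impossible and
the recorded upper bound can be lowered by 1.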

Honza
> 
> Honza
> 
>   * tree-ssa-loop-niter.c (record_estimate): Remove confused
>   dominators check.
>   (maybe_lower_iteration_bound): New function.
>   (estimate_numbers_of_iterations_loop): Use it.

Index: tree-ssa-loop-niter.c
===
--- tree-ssa-loop-niter.c   (revision 192632)
+++ tree-ssa-loop-niter.c   (working copy)
@@ -2535,7 +2541,6 @@ record_estimate (struct loop *loop, tree
 gimple at_stmt, bool is_exit, bool realistic, bool upper)
 {
   double_int delta;
-  edge exit;
 
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
@@ -2570,14 +2577,10 @@ record_estimate (struct loop *loop, tree
 }
 
   /* Update the number of iteration estimates according to the bound.
- If at_stmt is an exit or dominates the single exit from the loop,
- then the loop latch is executed at most BOUND times, otherwise
- it can be executed BOUND + 1 times.  */
-  exit = single_exit (loop);
-  if (is_exit
-  || (exit != NULL
- && dominated_by_p (CDI_DOMINATORS,
-exit->src, gimple_bb (at_stmt
+ If at_stmt is an exit then the loop latch is executed at most BOUND times,
+ otherwise it can be executed BOUND + 1 times.  We will lower the estimate
+ later if such a statement must be executed on the last iteration.  */
+  if (is_exit)
 delta = double_int_zero;
   else
 delta = double_int_one;
@@ -2953,6 +2956,88 @@ gcov_type_to_double_int (gcov_type val)
   return ret;
 }
 
+/* See if every path through the loop goes through a statement that is known
+   not to execute on the last iteration.  In that case we can decrease the
+   iteration count by 1.  */
+
+static void
+maybe_lower_iteration_bound (struct loop *loop)
+{
+  pointer_set_t *not_executed_last_iteration = pointer_set_create ();
+  pointer_set_t *visited;
+  struct nb_iter_bound *elt;
+  bool found = false;
+  VEC (basic_block, heap) *queue = NULL;
+
+  for (elt = loop->bounds; elt; elt = elt->next)
+{
+  if (!elt->is_exit
+ && elt->bound.ult (loop->nb_iterations_upper_bound))
+   {
+ found = true;
+ pointer_set_insert (not_executed_last_iteration, elt->stmt);
+   }
+}
+  if (!found)
+{
+  pointer_set_destroy (not_executed_last_iteration);
+  return;
+}
+  visited = pointer_set_create ();
+  VEC_safe_push (basic_block, heap, queue, loop->header);
+  pointer_set_insert (visited, loop->header);
+  found = false;
+
+  while (VEC_length (basic_block, queue) && !found)
+{
+  basic_block bb = VEC_pop (basic_block, queue);
+  gimple_stmt_iterator gsi;
+  bool stmt_found = false;
+
+  for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
+   {
+ gimple stmt = gsi_stmt (gsi);
+ if (pointer_set_contains (not_executed_last_iteration, stmt))
+   {
+ stmt_found 

[lra] patch to fix testsuite regressions

2012-10-20 Thread Vladimir Makarov
  After recent patches there were too many regressions of LRA in the GCC
testsuite on x86 and x86-64.


  The following patch fixes all of them.

  It was successfully bootstrapped on x86/x86-64.

  Committed as rev. 192637.

2012-10-20  Vladimir Makarov  

* lra.c (check_rtx): Don't check UNSPEC address.  Fix typo with
comparing RTX_AUTOINC.
* lra-constraints.c (extract_loc_address_regs): Swap operands if
necessary.  Assign to disp before the extract_loc_address_regs
call.  Fix typo with PLUS comparison.  Use CONSTANT_P.
(simplify_operand_subreg): Put constant into memory for subreg
with mixed modes.
(process_alt_operands): Uncomment code for checking DEAD not for
early clobber.
* config/i386/i386.c (ix86_spill_class): Don't spill to SSE
regs when TARGET_MMX.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 192634)
+++ config/i386/i386.c  (working copy)
@@ -40819,7 +40819,7 @@ ix86_autovectorize_vector_sizes (void)
 static reg_class_t
 ix86_spill_class (reg_class_t rclass, enum machine_mode mode)
 {
-  if (TARGET_SSE && TARGET_GENERAL_REGS_SSE_SPILL
+  if (TARGET_SSE && TARGET_GENERAL_REGS_SSE_SPILL && ! TARGET_MMX
   && hard_reg_set_subset_p (reg_class_contents[rclass],
reg_class_contents[GENERAL_REGS])
   && (mode == SImode || (TARGET_64BIT && mode == DImode)))
Index: lra-constraints.c
===
--- lra-constraints.c   (revision 192634)
+++ lra-constraints.c   (working copy)
@@ -524,6 +524,7 @@ extract_loc_address_regs (bool top_p, en
   {
rtx *arg0_loc = &XEXP (x, 0);
rtx *arg1_loc = &XEXP (x, 1);
+   rtx *tloc;
rtx arg0 = *arg0_loc;
rtx arg1 = *arg1_loc;
enum rtx_code code0 = GET_CODE (arg0);
@@ -543,23 +544,34 @@ extract_loc_address_regs (bool top_p, en
code1 = GET_CODE (arg1);
  }
 
+   if (CONSTANT_P (arg0)
+   || code1 == PLUS || code1 == MULT || code1 == ASHIFT)
+ {
+   tloc = arg1_loc;
+   arg1_loc = arg0_loc;
+   arg0_loc = tloc;
+   arg0 = *arg0_loc;
+   code0 = GET_CODE (arg0);
+   arg1 = *arg1_loc;
+   code1 = GET_CODE (arg1);
+ }
/* If this machine only allows one register per address, it
   must be in the first operand.  */
if (MAX_REGS_PER_ADDRESS == 1 || code == LO_SUM)
  {
-   extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
- code1, modify_p, ad);
lra_assert (ad->disp_loc == NULL);
ad->disp_loc = arg1_loc;
+   extract_loc_address_regs (false, mode, as, arg0_loc, false, code,
+ code1, modify_p, ad);
  }
/* Base + disp addressing  */
-   else if (code != PLUS && code0 != MULT && code0 != ASHIFT
+   else if (code0 != PLUS && code0 != MULT && code0 != ASHIFT
 && CONSTANT_P (arg1))
  {
-   extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
- code1, modify_p, ad);
lra_assert (ad->disp_loc == NULL);
ad->disp_loc = arg1_loc;
+   extract_loc_address_regs (false, mode, as, arg0_loc, false, PLUS,
+ code1, modify_p, ad);
  }
/* If index and base registers are the same on this machine,
   just record registers in any non-constant operands.  We
@@ -575,14 +587,13 @@ extract_loc_address_regs (bool top_p, en
extract_loc_address_regs (false, mode, as, arg1_loc, true, PLUS,
  code0, modify_p, ad);
  }
-   /* It might be index * scale + disp. */
-   else if (code1 == CONST_INT || code1 == CONST_DOUBLE
-|| code1 == SYMBOL_REF || code1 == CONST || code1 == LABEL_REF)
+   /* It might be [base + ]index * scale + disp. */
+   else if (CONSTANT_P (arg1))
  {
lra_assert (ad->disp_loc == NULL);
ad->disp_loc = arg1_loc;
extract_loc_address_regs (false, mode, as, arg0_loc, context_p,
- PLUS, code1, modify_p, ad);
+ PLUS, code0, modify_p, ad);
  }
/* If both operands are registers but one is already a hard
   register of index or reg-base class, give the other the
@@ -624,11 +635,18 @@ extract_loc_address_regs (bool top_p, en
 
 case MULT:
 case ASHIFT:
-  extract_loc_address_regs (false, mode, as, &XEXP (*loc, 0), true,
-   outer_code, code, modify_p, ad);
-  lra_assert (ad->index_loc == NULL);
-  ad->index_loc = loc;
-  break;
+  {
+   rtx *arg0_loc = &XEXP (x, 0);
+   enum rtx_code code0 = GET_CODE (*arg0_loc);
+  

loop-unroll.c TLC 2/4

2012-10-20 Thread Jan Hubicka
Hi,
this patch fixes the heuristic in decide_unroll_constant_iterations to take
the profile into account: even when the loop is known to have a constant
iteration bound, it doesn't necessarily iterate many times.  So use the
profile and loop_max to double-check that this is the case.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* loop-unroll.c (decide_unroll_constant_iterations): Don't
perform unrolling for loops with low iterations bounds or estimates.
Index: loop-unroll.c
===
--- loop-unroll.c   (revision 192632)
+++ loop-unroll.c   (working copy)
@@ -519,6 +519,7 @@ decide_unroll_constant_iterations (struc
 {
   unsigned nunroll, nunroll_by_av, best_copies, best_unroll = 0, n_copies, i;
   struct niter_desc *desc;
+  double_int iterations;
 
   if (!(flags & UAP_UNROLL))
 {
@@ -561,8 +562,14 @@ decide_unroll_constant_iterations (struc
   return;
 }
 
-  /* Check whether the loop rolls enough to consider.  */
-  if (desc->niter < 2 * nunroll)
+  /* Check whether the loop rolls enough to consider.
+ Consult also loop bounds and profile; in case the loop has more
+ than one exit it may well loop fewer times than the determined
+ maximal number of iterations.  */
+  if (desc->niter < 2 * nunroll
+  || ((estimated_loop_iterations (loop, &iterations)
+  || max_loop_iterations (loop, &iterations))
+ && iterations.ult (double_int::from_shwi (2 * nunroll))))
 {
   if (dump_file)
fprintf (dump_file, ";; Not unrolling loop, doesn't roll\n");
Index: testsuite/gcc.dg/tree-prof/unroll-1.c
===
--- testsuite/gcc.dg/tree-prof/unroll-1.c   (revision 0)
+++ testsuite/gcc.dg/tree-prof/unroll-1.c   (revision 0)
@@ -0,0 +1,24 @@
+/* { dg-options "-O3 -fdump-rtl-loop2_unroll -funroll-loops -fno-peel-loops" } */
+void abort ();
+
+int a[1000];
+int
+__attribute__ ((noinline))
+t()
+{
+  int i;
+  for (i=0;i<1000;i++)
+if (!a[i])
+  return 1;
+  abort ();
+}
+main()
+{
+  int i;
+  for (i=0;i<1000;i++)
+t();
+  return 0;
+}
+/* { dg-final-use { scan-rtl-dump "Considering unrolling loop with constant number of iterations" "loop2_unroll" } } */
+/* { dg-final-use { cleanup-rtl-dump "Not unrolling loop, doesn't roll" } } */


Re: [PATCH, ARM] Fix PR44557 (Thumb-1 ICE)

2012-10-20 Thread Janis Johnson
On 10/19/2012 11:41 PM, Ramana Radhakrishnan wrote:
> On Tue, Oct 16, 2012 at 10:25 AM, Chung-Lin Tang
>  wrote:
>> On 12/9/27 6:25 AM, Janis Johnson wrote:
>>> On 09/26/2012 01:58 AM, Chung-Lin Tang wrote:
>>>
>>> +/* { dg-do compile } */
>>> +/* { dg-options "-mthumb -O1 -march=armv5te -fno-omit-frame-pointer 
>>> -fno-forward-propagate" }  */
>>> +/* { dg-require-effective-target arm_thumb1_ok } */
>>>
>>> This test will fail to compile for test flags that conflict with
>>> the -march option, and the specified -march option might be
>>> overridden with similar options from other test flags.  The problem
>>> might have also been seen for other -march options.  I recommend
>>> leaving it off and omitting the dg-require so the test can be run
>>> for more multilibs.
>>
>> I'm not sure, as the intent is to test a Thumb-1 case here. If the
>> maintainers think we should adjust the testcase, I'm of course fine with it.
> 
> I think this is OK but you need to prune out the conflict warnings to
> reduce annoyance for folks doing multilib testing and it does look
> like more than one group.
> 
> Longer term I wonder if we should reorganise gcc.target/arm and indeed
> gcc.target/aarch64 . Janis, do you have any other ideas ?
> 
> * to contain a torture script that goes through all combinations of
> architectures and fpus' / arm / thumb for all the tests.
> * another sub-level directory for such directed tests where multilib
> options aren't applied which are essentially from regressions.
> 
> However I don't know of an easy way by which we can ignore said
> multilib flags ?
> 
> Ramana

Multilib flags are added deep in DejaGnu, and we would need to have a
local copy of a large procedure in order to do that.

Do enough people run a default multilib that we could use a torture
script only for a multilib with no flags?

Janis


Re: [RFC] Fix PR rtl-optimization/54315 (partially)

2012-10-20 Thread Eric Botcazou
> The patch was fully tested on x86_64-suse-linux, where it removes half of
> the useless stores in the original testcase for PR rtl-optimization/54315,
> and manually tested for arm-linux-gnueabi (for now), where it also removes
> stores for small structures.  Comments?
>
> 2012-10-08  Eric Botcazou  
> 
>   * calls.c (expand_call): Don't deal specifically with BLKmode values
>   returned in naked registers.
>   * expr.h (copy_blkmode_from_reg): Adjust prototype.
>   * expr.c (copy_blkmode_from_reg): Rename first parameter into TARGET and
>   make it required.  Assert that SRCREG hasn't BLKmode.  Add a couple of
>   short-circuits for common cases and be prepared for sub-word registers.
>   (expand_assignment): Call copy_blkmode_from_reg for BLKmode values
>   returned in naked registers.
>   (store_expr): Likewise.
>   (store_field): Likewise.

Applied after testing on x86_64-suse-linux and mips64el-linux-gnu, where it 
also removes stores for small structures (at least for n32).

-- 
Eric Botcazou


Re: [Committed] S/390: Add support for the new IBM zEnterprise EC12

2012-10-20 Thread Gerald Pfeifer
On Wed, 10 Oct 2012, Andreas Krebbel wrote:
> the attached patch adds initial support for the latest release of
> the IBM mainframe series - the IBM zEnterprise EC12 (zEC12).

Nice.  Can you please also add a note to the release notes at
gcc-4.8/changes.html ?

In principle, I'm also in favor of adding a news item to our
main page for updates like this since it shows how GCC is
evolving and supporting the latest hardware releases (even
if, like here, the code changes are not huge).

Gerald


Re: [PATCH, ARM] Subregs of VFP registers in big-endian mode

2012-10-20 Thread Andrew Pinski
On Sat, Oct 20, 2012 at 4:38 AM, Julian Brown  wrote:
> Hi,
>
> Quite a few tests fail for big-endian multilibs which use VFP
> instructions at present. One reason for many of these is glaringly
> obvious once you notice it: for D registers interpreted as two S
> registers, the lower-numbered register is always the less-significant
> part of the value, and the higher-numbered register the
> more-significant -- regardless of the endianness the processor is
> running in.

[wwwdocs,Java] Replace sources.redhat.com by sourceware.org

2012-10-20 Thread Gerald Pfeifer
...and some other simplifications and improvements I noticed on
the way.

This was triggered by a note that the sources.redhat.com DNS entry
is going to go away at some point in the future that I got yesterday.

Applied.

Gerald


2012-10-21  Gerald Pfeifer  

* news.html: Replace references to sources.redhat.com by
sourceware.org.
Avoid a reference to CVS.
Some style adjustments to the February 8, 2001 entry.

Index: news.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/java/news.html,v
retrieving revision 1.12
diff -u -3 -p -r1.12 news.html
--- news.html	19 Sep 2010 20:35:03 -0000	1.12
+++ news.html	21 Oct 2012 02:02:51 -0000
@@ -153,7 +153,7 @@ code size heuristics.  It is enabled by 
 
 Gary Benson from Red Hat has released 
 <a href="http://people.redhat.com/gbenson/naoko/">Naoko</a>: a subset
-of the <a href="http://sources.redhat.com/rhug/">rhug</a> packages
+of the <a href="http://sourceware.org/rhug/">rhug</a> packages
 that have been repackaged for eventual inclusion in Red Hat Linux. 
 Naoko basically comprises binary RPMS of Ant, Tomcat, and their 
 dependencies built with gcj.
@@ -172,8 +172,8 @@ A team of hackers from Red Hat has relea
 of <a href="http://www.eclipse.org/">Eclipse</a>, a free software IDE
 written in Java, that has been compiled with a modified gcj.
 You can find more information
-<a href="http://sources.redhat.com/eclipse/">here</a>.  We'll be
-integrating the required gcj patches into cvs in the near future.
+<a href="http://sourceware.org/eclipse/">here</a>.  We'll be
+integrating the required gcj patches in the near future.
 
 
 July 31, 2003
@@ -426,7 +426,7 @@ find bugs!
 February 8, 2001
 
 Made use of Warren Levy's change to the
-<a href="http://sources.redhat.com/mauve/">Mauve</a> test suite to handle
+<a href="http://sourceware.org/mauve/">Mauve</a> test suite to handle
 regressions.
 Modifications have been made to mauve.exp to copy the newly created
 xfails file of known library failures from the source tree
to the directory where the libjava 'make check' is run.
 This allows the testsuite to ignore XFAILs and thus highlight
 true regressions in the library.  The Mauve tests are
 automatically run as part of a libjava
-'make check' as long as the Mauve suite is accessible
-and the env var MAUVEDIR is set to point to the top-level
-of the <a href="http://sources.redhat.com/mauve/download.html">Mauve</a> source.
+<code>make check</code> as long as the Mauve suite is accessible and the
+environment variable <code>MAUVEDIR</code> is set to point to the top-level
+of the Mauve sources.
 
 
 January 28, 2001


[lra] patch to fix PR54991

2012-10-20 Thread Vladimir Makarov

The following patch fixes PR54991.

Committed as rev. 192645.

2012-10-20  Vladimir Makarov  

PR rtl-optimization/54991
* lra-constraints.c (lra_constraints): Change equiv memory check
on reverse equivalence check.
(inherit_in_ebb): Invalidate usage insns for multi-word hard regs.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 192637)
+++ lra-constraints.c   (working copy)
@@ -3552,18 +3552,24 @@ lra_constraints (bool first_p)
else if ((x = get_equiv_substitution (regno_reg_rtx[i])) != NULL_RTX)
  {
bool pseudo_p = contains_reg_p (x, false, false);
+   rtx set, insn;
 
/* We don't use DF for compilation speed sake.  So it is
   problematic to update live info when we use an
   equivalence containing pseudos in more than one BB.  */
if ((pseudo_p && multi_block_pseudo_p (i))
-   /* We check that a pseudo in rhs of the init insn is
-  not dying in the insn.  Otherwise, the live info
-  at the beginning of the corresponding BB might be
-  wrong after we removed the insn.  When the equiv can
-  be a constant, the right hand side of the init insn
-  can be a pseudo.  */
-   || (ira_reg_equiv[i].memory == NULL_RTX
+   /* If it is not a reverse equivalence, we check that a
+  pseudo in rhs of the init insn is not dying in the
+  insn.  Otherwise, the live info at the beginning of
+  the corresponding BB might be wrong after we
+  removed the insn.  When the equiv can be a
+  constant, the right hand side of the init insn can
+  be a pseudo.  */
+   || (! ((insn = ira_reg_equiv[i].init_insns) != NULL_RTX
+  && INSN_P (insn)
+  && (set = single_set (insn)) != NULL_RTX
+  && REG_P (SET_DEST (set))
+  && (int) REGNO (SET_DEST (set)) == i)
&& init_insn_rhs_dead_pseudo_p (i)))
  ira_reg_equiv[i].defined_p = false;
else if (! first_p && pseudo_p)
@@ -4444,7 +4450,7 @@ static bitmap_head temp_bitmap;
 static bool
 inherit_in_ebb (rtx head, rtx tail)
 {
-  int i, src_regno, dst_regno;
+  int i, src_regno, dst_regno, nregs;
   bool change_p, succ_p;
   rtx prev_insn, next_usage_insns, set, last_insn;
   enum reg_class cl;
@@ -4607,13 +4613,26 @@ inherit_in_ebb (rtx head, rtx tail)
   reg_renumber[dst_regno]);
AND_COMPL_HARD_REG_SET (live_hard_regs, s);
  }
-   /* We should invalidate potential inheritance for the
-  current insn usages to the next usage insns (see
-  code below) as the output pseudo prevents this.  */
-   if (reg_renumber[dst_regno] < 0
-   || (reg->type == OP_OUT && ! reg->subreg_p))
- /* Invalidate.  */
- usage_insns[dst_regno].check = 0;
+   /* We should invalidate potential inheritance or
+  splitting for the current insn usages to the next
+  usage insns (see code below) as the output pseudo
+  prevents this.  */
+   if ((dst_regno >= FIRST_PSEUDO_REGISTER
+&& reg_renumber[dst_regno] < 0)
+   || (reg->type == OP_OUT && ! reg->subreg_p
+   && (dst_regno < FIRST_PSEUDO_REGISTER
+   || reg_renumber[dst_regno] >= 0)))
+ {
+   /* Invalidate.  */
+   if (dst_regno >= FIRST_PSEUDO_REGISTER)
+ usage_insns[dst_regno].check = 0;
+   else
+ {
+   nregs = hard_regno_nregs[dst_regno][reg->biggest_mode];
+   for (i = 0; i < nregs; i++)
+ usage_insns[dst_regno + i].check = 0;
+ }
+ }
  }
  if (! JUMP_P (curr_insn))
for (i = 0; i < to_inherit_num; i++)


Committed, libgcc MMIX: implement static marking of program and data memory

2012-10-20 Thread Hans-Peter Nilsson
With a simulator that doesn't just allocate zeros on any access, it's
necessary to tell the simulator the bounds of defined memory, both for
static and dynamically allocated memory.  This patch implements static
code and data allocatation; zero'd data and constants may not be
otherwise loaded.  A patch to newlib is about to be committed to
mark dynamically allocated memory.

The attached mmix-sim.ch (a "literate programming" / ctangle "change
file"; the equivalent of a patch) implements such marking and memory
checking.  Put it into the untarred mmix-20110831.tar.gz (may work
with other versions) before compiling the "mmix" simulator.  Beware
that the distribution terms of the mmix simulator requires (in my
layman interpretation) that the resulting program is not distributed
as any part of the original mmixware package.

There was some fallout from this new checking, but I believe all
needed patches have been at least submitted, though not all approved
and committed.

libgcc:
* config/mmix/crti.S: Mark program and data addresses using PRELD.
Remove typo'd and unnecessary alignment-LOC for .data.  Remove
no-longer-needed LDBU insns.

Index: crti.S
===
--- crti.S  (revision 192353)
+++ crti.S  (working copy)
@@ -35,20 +35,25 @@ see the files COPYING3 and COPYING.RUNTI
 % respectively, so the compiler can switch between them pretending they're
 % segments.

-% This little treasure is here so the 32 lowest address bits of user data
-% will not be zero.  Because of truncation, that would cause testcase
-% gcc.c-torture/execute/980701-1.c to incorrectly fail.
+% This little treasure (some contents) is required so the 32 lowest
+% address bits of user data will not be zero.  Because of truncation,
+% that would cause testcase gcc.c-torture/execute/980701-1.c to
+% incorrectly fail.

 	.data	! mmixal:= 8H LOC Data_Segment
 	.p2align 3
-	LOC @+(8-@)@7
-	OCTA 2009
+dstart	OCTA 2009
 
 	.text	! mmixal:= 9H LOC 8B; LOC #100
 	.global Main
 
 % The __Stack_start symbol is provided by the link script.
 stackpp	OCTA __Stack_start
+crtstxt	OCTA _init	% Assumed to be the lowest executed address.
+	OCTA __etext	% Assumed to be beyond the highest executed address.
+
+crtsdat	OCTA dstart	% Assumed to be the lowest accessed address.
+	OCTA _end	% Assumed to be beyond the highest accessed address.

 % "Main" is the magic symbol the simulator jumps to.  We want to go
 % on to "main".
@@ -56,16 +61,47 @@ stackpp OCTA __Stack_start
 Main	SETL	$255,32
 	PUT	rG,$255

+% Make sure we have valid memory for addresses in .text and .data (and
+% .bss, but we include this in .data), for the benefit of mmo-using
+% simulators that require validation of addresses for which contents
+% is not present.  Due to its implicit-zero nature, zeros in contents
+% may be left out in the mmo format, but we don't know the boundaries
+% of those zero-chunks; for mmo files from binutils, they correspond
+% to the beginning and end of sections in objects before linking.  We
+% validate the contents by executing PRELD (0; one byte) on each
+% 2048-byte-boundary of our .text .data, and we assume this size
+% matches the magic lowest-denominator chunk-size for all
+% validation-requiring simulators.  The effect of the PRELD (any size)
+% is assumed to be the same as initial loading of the contents, as
+% long as the PRELD happens before the first PUSHJ/PUSHGO.  If it
+% happens after that, we'll need to distinguish between
+% access-for-execution and read/write access.
+
+	GETA	$255,crtstxt
+	LDOU	$2,$255,0
+	ANDNL	$2,#7ff		% Align the start at a 2048-boundary.
+	LDOU	$3,$255,8
+	SETL	$4,2048
+0H	PRELD	0,$2,0
+	ADDU	$2,$2,$4
+	CMP	$255,$2,$3
+	BN	$255,0B
+
+	GETA	$255,crtsdat
+	LDOU	$2,$255,0
+	ANDNL	$2,#7ff
+	LDOU	$3,$255,8
+0H	PRELD	0,$2,0
+	ADDU	$2,$2,$4
+	CMP	$255,$2,$3
+	BN	$255,0B
+
 % Initialize the stack pointer.  It is supposedly made a global
 % zero-initialized (allowed to change) register in crtn.S; we use the
 % explicit number.
 	GETA	$255,stackpp
 	LDOU	$254,$255,0

-% Make sure we get more than one mem, to simplify counting cycles.
-	LDBU	$255,$1,0
-	LDBU	$255,$1,1
-
 	PUSHJ	$2,_init

 #ifdef __MMIX_ABI_GNU__

brgds, H-P

Copyright 2012 Hans-Peter Nilsson.  This file may be freely copied and
distributed, provided that no changes whatsoever are made.  Ah, just
kidding: you may change it as you like, except that this paragraph,
including the above attribution, must be kept unmodified and the
distribution terms not limited.  Add your own attribution below if you
change anything, so people don't blame me.  Special permission is
granted for the copyri

Ping: [RFA:] Fix frame-pointer-clobbering in builtins.c:expand_builtin_setjmp_receiver

2012-10-20 Thread Hans-Peter Nilsson
CC:ing middle-end maintainers this time.  I was a bit surprised
when Eric Botcazou wrote in his review, quoted below, that he's
not one of you.  Maybe approve that too?

On Mon, 15 Oct 2012, Hans-Peter Nilsson wrote:

> On Fri, 12 Oct 2012, Eric Botcazou wrote:
> > > (insn 168 49 51 3 (set (reg/f:DI 253 $253)
> > > (plus:DI (reg/f:DI 253 $253)
> > > (const_int 24 [0x18])))
> > > /tmp/mmiximp2/gcc/gcc/testsuite/gcc.c-torture/execute/built-in-setjmp.c:21
> > > -1 (nil))
> > > (insn 51 168 52 3 (clobber (reg/f:DI 253 $253))
> ...
>
> > > Note that insn 168 deleted, which seems a logical optimization.  The
> > > bug is to emit the clobber, not that the restoring insn is removed.
> >
> > Had that worked in the past for MMIX?
>
> Yes, for svn revision 106027 (20051030) 4.1.0-era (!)
> 
> where the test must have passed, as
> gcc.c-torture/execute/built-in-setjmp.c is at least four years
> older than that.
>
> >  If so, what changed recently?
>
> By "these days" I didn't mean "recent", just not "eons ago". :)
> I see in a gcc-test-results posting from Mike Stein (whom I'd
> like to thank for test-results posting over the years), matching
> FAILs for svn revision 126095 (20070628) 4.3.0-era
> .
>
> Sorry, I have nothing in between those reports, my bad.  Though
> I see no point narrowing down the failing revision further here
> IMO; as mentioned the bug is not that the restoring insn is
> removed.
>
> > Agreed.  However, I'd suggest rescuing the comment for the ELIMINABLE_REGS
> > block from expand_nl_goto_receiver as it still sounds valid to me.
>
> Oops, my bad; I see I removed all the good comments.  Fixed.
>
> > >   * stmt.c (expand_nl_goto_receiver): Remove almost-copy of
> > >   expand_builtin_setjmp_receiver.
> > >   (expand_label): Adjust, call expand_builtin_setjmp_receiver
> > >   with NULL for the label parameter.
> > >   * builtins.c (expand_builtin_setjmp_receiver): Don't clobber
> > >   the frame-pointer.  Adjust comments.
> > >   [HAVE_builtin_setjmp_receiver]: Emit builtin_setjmp_receiver
> > >   only if LABEL is non-NULL.
> >
> > I cannot formally approve, but this looks good to me modulo:
>
> > > +   If RECEIVER_LABEL is NULL, instead the port-specific parts of a
> > > +   nonlocal goto handler are emitted.  */
> >
> > The "port-specific parts" wording is a bit confusing I think.  I'd just 
> > write:
> >
> >   If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler.
>
> Sure.  Thanks for the review.  Updated patch below.  As nothing
> was changed from the previous post but comments as per the
> review (mostly moving / reviving, fixing one grammo), already
> covered by the changelog quoted above, the previous testing is
> still valid.
>
> Ok for trunk, approvers?
>
> Index: gcc/builtins.c
> ===
> --- gcc/builtins.c	(revision 192353)
> +++ gcc/builtins.c	(working copy)
> @@ -885,14 +885,15 @@ expand_builtin_setjmp_setup (rtx buf_add
>  }
>
>  /* Construct the trailing part of a __builtin_setjmp call.  This is
> -   also called directly by the SJLJ exception handling code.  */
> +   also called directly by the SJLJ exception handling code.
> +   If RECEIVER_LABEL is NULL, instead construct a nonlocal goto handler.  */
>
>  void
>  expand_builtin_setjmp_receiver (rtx receiver_label ATTRIBUTE_UNUSED)
>  {
>rtx chain;
>
> -  /* Clobber the FP when we get here, so we have to make sure it's
> +  /* Mark the FP as used when we get here, so we have to make sure it's
>   marked as used by this function.  */
>emit_use (hard_frame_pointer_rtx);
>
> @@ -907,17 +908,28 @@ expand_builtin_setjmp_receiver (rtx rece
>  #ifdef HAVE_nonlocal_goto
>if (! HAVE_nonlocal_goto)
>  #endif
> -{
> -  emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
> -  /* This might change the hard frame pointer in ways that aren't
> -  apparent to early optimization passes, so force a clobber.  */
> -  emit_clobber (hard_frame_pointer_rtx);
> -}
> +/* First adjust our frame pointer to its actual value.  It was
> +   previously set to the start of the virtual area corresponding to
> +   the stacked variables when we branched here and now needs to be
> +   adjusted to the actual hardware fp value.
> +
> +   Assignments to virtual registers are converted by
> +   instantiate_virtual_regs into the corresponding assignment
> +   to the underlying register (fp in this case) that makes
> +   the original assignment true.
> +   So the following insn will actually be decrementing fp by
> +   STARTING_FRAME_OFFSET.  */
> +emit_move_insn (virtual_stack_vars_rtx, hard_frame_pointer_rtx);
>
>  #if !HARD_FRAME_POINTER_IS_ARG_POINTER
>if (fixed_regs[ARG_POINTER_REGNUM])
>  {
>  #ifdef ELIMINABLE_REGS
> +  /* If the argument 

Committed: skip testsuite/23_containers/bitset/45713.cc for mmix-*-*.

2012-10-20 Thread Hans-Peter Nilsson
For mmix-knuth-mmixware, MAX_FIXED_MODE_SIZE is the default,
GET_MODE_BITSIZE (DImode), which of course isn't larger than the
size-type, the same size on this 64-bit target.  I don't think making
it larger (i.e. TImode) would help: that seems instead likely to
introduce awkward spurious non-host_integerp ()-related code
differences between hosts with/without a 128-bit integer type.
The minor benefit would be to be able to handle objects larger than
1/8 of the (architectural) address space.  Besides, of course,
supporting test-cases like the one below.  Committed.
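
Spelling out the arithmetic behind the 1/8 remark (my illustration, not
part of the commit): bit sizes are byte sizes times eight, so with a
64-bit bitsizetype the largest bit-addressable object is just under
2^61 bytes, one eighth of the 64-bit address space.

#include <stdint.h>

/* bits = bytes * 8 must not overflow the 64-bit bit-size type, so
   bytes can be at most (2^64 - 1) / 8 = 2^61 - 1.  */
_Static_assert ((UINT64_MAX >> 3) == ((uint64_t) 1 << 61) - 1,
		"largest bit-addressable object is just under 2^61 bytes");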

* testsuite/23_containers/bitset/45713.cc: Skip for mmix-*-*.
Tweak sizetype-related comment.

Index: libstdc++-v3/testsuite/23_containers/bitset/45713.cc
===
--- libstdc++-v3/testsuite/23_containers/bitset/45713.cc	(revision 192646)
+++ libstdc++-v3/testsuite/23_containers/bitset/45713.cc	(working copy)
@@ -16,9 +16,9 @@
 // .

 // The testcase requires bitsizetype to be wider than sizetype,
-// otherwise types/vars with 0x20000000 bytes or larger can't be used.
-// See http://gcc.gnu.org/PR54897
-// { dg-do compile { target { ! { avr*-*-* cris*-*-* h8300*-*-* mcore*-*-* moxie*-*-* } } } }
+// otherwise types/vars with (e.g. for 32-bit sizetype) 0x20000000
+// bytes or larger can't be used.  See http://gcc.gnu.org/PR54897
+// { dg-do compile { target { ! { avr*-*-* cris*-*-* h8300*-*-* mcore*-*-* moxie*-*-* mmix-*-* } } } }

 #include 

brgds, H-P


[PATCH][xtensa] Remove unused variable

2012-10-20 Thread Chung-Lin Tang
Hi Sterling,
the last thread pointer builtin changes left an unused 'arg' variable in
xtensa_expand_builtin(), which triggered a new warning. Thanks to
Jan-Benedict for testing this. Attached patch was committed as obvious.

Thanks,
Chung-Lin

* config/xtensa/xtensa.c (xtensa_expand_builtin): Remove unused
'arg' variable.
Index: config/xtensa/xtensa.c
===
--- config/xtensa/xtensa.c  (revision 192647)
+++ config/xtensa/xtensa.c  (working copy)
@@ -3133,7 +3133,6 @@ xtensa_expand_builtin (tree exp, rtx target,
 {
   tree fndecl = TREE_OPERAND (CALL_EXPR_FN (exp), 0);
   unsigned int fcode = DECL_FUNCTION_CODE (fndecl);
-  rtx arg;
 
   switch (fcode)
 {