RE: A Question About LRA/reload

2014-12-10 Thread Ajit Kumar Agarwal


-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jeff Law
Sent: Tuesday, December 09, 2014 11:26 PM
To: Vladimir Makarov; lin zuojian; gcc@gcc.gnu.org
Subject: Re: A Question About LRA/reload

On 12/09/14 10:10, Vladimir Makarov wrote:
> generate the correct code in many cases even for x86.  Jeff Law tried
> IRA coloring reusage too for reload but whole RA became slower
> (although he achieved performance improvements on x86).

>> Right.  After IRA was complete, I'd walk over the unallocated allocnos and
>> split their ranges at EBB boundaries.  That created new allocnos with a
>> smaller conflict set and reduced the conflict set for the original
>> unallocated allocnos.

Jeff: In the above approach, is the splitting of ranges for unallocated
allocnos done aggressively, or is it driven by some heuristic ensuring that
the live ranges of unallocated allocnos are not touched inside the EBBs?

>> After I'd done that splitting for all the EBBs, I called back into
>> ira_reassign_pseudos to try to assign the original unallocated allocnos as
>> well as the new allocnos.

>> To get good results, much of IRA's cost analysis had to be redone from
>> scratch.  And from a compile-time standpoint, that's a killer.

>> The other approach I was looking at was a backwards walk through each block.
>> When I found an insn with an unallocated pseudo, that would trigger one of
>> various range splitting techniques to try and free up a hard register.  Then
>> again I'd call into ira_reassign_pseudos to try the allocations again.  This
>> got even better results, but was obviously even more compile-time expensive.

After the above splitting, do you rebuild the conflict graph to assign the
new allocnos? If the conflict graph is rebuilt, that would hurt compile time.

Thanks & Regards
Ajit
>> I don't think much, if any, of that work is relevant given the current
>> structure and effectiveness of LRA.

>> jeff


Re: pointer math vs named address spaces

2014-12-10 Thread Richard Biener
On Wed, Dec 10, 2014 at 2:24 AM, Richard Henderson  wrote:
> On 12/04/2014 01:54 AM, Richard Biener wrote:
>> Apart from what Joseph already said using 'sizetype' in the middle-end
>> for sizes and offsets is really really deep-rooted into the compiler.
>> What you see above is one aspect - POINTER_PLUS_EXPR offsets
>> are forced to have sizetype type.  But you'd also see it in the inability
>> to create larger than sizetype objects (DECL_SIZE_UNIT's type).
>>
>> So for the middle-end part I'd suggest you make sure that sizetype
>> covers the largest pointer-mode precision your target offers.  That of course
>> doesn't fix what Joseph pointed out - that the user will still run into 
>> issues
>> when writing C programs or when using the C runtime (I suppose TR 18037
>> doesn't specify alternate memcpy for different address spaces?)
>
> I'd prefer it if the middle-end were more flexible about what types it allows
> for the offset of POINTER_PLUS_EXPR.  E.g. any integer type of an appropriate 
> mode.

What is an appropriate mode?  But yes, in general I agree that having
the offset type of POINTER_PLUS_EXPR fixed to sizetype (or any
compatible type) is technically not necessary - though changing that
requires touching a _lot_ of code (everyone using size_{un,bin}op
on it).  I've tried the more simplistic idea of allowing ssizetype as well
and failed due to all the fallout ... (that was ~2 years ago).

> I'd expect the right answer is to match targetm.addr_space.address_mode rather
> than .pointer_mode, so that any extension, if necessary, is already exposed to
> the optimizers.  But I suppose it's also possible that might pessimize things
> for these weird targets...

Well.  On the tree level we can certainly say that POINTER_PLUS_EXPR
is simply first converting the offset to an appropriate type and then performing
the addition.  The choice of that type is then deferred to the RTL expander.

For C code doing ptr[i] you still have the issue that sizeof (*ptr) * i needs
to be carried out in a large enough type (or use a widening multiplication,
or not be lowered to pointer arithmetic at all).  Currently the C frontend
chooses a type with the precision of sizetype for this.
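
(As a hedged illustration of that lowering, using size_t merely to
approximate the precision of sizetype; this is not the front end's
actual code:)

#include <stddef.h>

/* Sketch of how the C front end lowers ptr[i]: the element-size
   multiplication is carried out in an offset type with sizetype's
   precision, approximated here by size_t.  A 64-bit index on a
   target with 32-bit sizetype would thus be truncated first.  */
int *
lower_index (int *ptr, long long i)
{
  return (int *) ((char *) ptr + (size_t) i * sizeof *ptr);
}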

Richard.

>
> r~


Mirror

2014-12-10 Thread William Laeder
On the page: https://www.gnu.org/software/gcc/mirrors.html

The St. Louis mirror is not configured properly: all it shows is a default
Apache welcome page.  The GCC file system structure (assuming it's the same
for all mirrors) does not exist there either.


Re: pointer math vs named address spaces

2014-12-10 Thread Richard Henderson
On 12/10/2014 05:36 AM, Richard Biener wrote:
> On Wed, Dec 10, 2014 at 2:24 AM, Richard Henderson  wrote:
>> On 12/04/2014 01:54 AM, Richard Biener wrote:
>>> Apart from what Joseph already said using 'sizetype' in the middle-end
>>> for sizes and offsets is really really deep-rooted into the compiler.
>>> What you see above is one aspect - POINTER_PLUS_EXPR offsets
>>> are forced to have sizetype type.  But you'd also see it in the inability
>>> to create larger than sizetype objects (DECL_SIZE_UNIT's type).
>>>
>>> So for the middle-end part I'd suggest you make sure that sizetype
>>> covers the largest pointer-mode precision your target offers.  That of 
>>> course
>>> doesn't fix what Joseph pointed out - that the user will still run into 
>>> issues
>>> when writing C programs or when using the C runtime (I suppose TR 18037
>>> doesn't specify alternate memcpy for different address spaces?)
>>
>> I'd prefer it if the middle-end were more flexible about what types it allows
>> for the offset of POINTER_PLUS_EXPR.  E.g. any integer type of an 
>> appropriate mode.
> 
> What is an appropriate mode?

That was supposed to be answered in the next paragraph.  ;-)

> But yes, in general I agree that having
> the offset type of POINTER_PLUS_EXPR fixed to sizetype (or any
> compatible type) is technically not necessary - though changing that
> requires touching a _lot_ of code (everyone using size_{un,bin}op
> on it).  I've tried the more simplistic idea of allowing ssizetype as well
> and failed due to all the fallout ... (that was ~2 years ago).

Ug.  That's true.  I hadn't even started to think what the implications of
changing the type might be.  Oh well.


r~


Re: A Question About LRA/reload

2014-12-10 Thread Jeff Law

On 12/10/14 02:02, Ajit Kumar Agarwal wrote:
>> Right.  After IRA was complete, I'd walk over the unallocated
>> allocnos and split their ranges at EBB boundaries.  That created
>> new allocnos with a smaller conflict set and reduced the
>> conflict set for the original unallocated allocnos.
>
> Jeff: In the above approach, is the splitting of ranges for unallocated
> allocnos done aggressively, or is it driven by some heuristic ensuring
> that the live ranges of unallocated allocnos are not touched inside
> the EBBs?
It focused on allocnos that were live across EBB boundaries, splitting
their ranges at those boundaries.


In the case where the allocno was live across the EBB, but not used/set 
in the EBB, the split effectively homes the allocno in its stack slot 
across those EBBs.  That reduces the conflicts for the original allocno 
(and possibly other allocnos that need hard registers).


You can actually use that property to free up hard registers as well.
If there are allocnos which are not used/set in an EBB, but which got a
hard register, you can split their ranges at the EBB boundary.  This
results in those allocnos homing into their stack slots across those EBBs
(or some other hard register).  Which in turn frees one or more hard
registers within the EBB.


In fact, you can take that concept quite a bit further and use it as the
fundamental basis for range splitting by simply changing the range over
which you want the allocnos to be transparent.  The range could be an
EBB, a BB, a few insns, or a single insn.


And that was the basis for the backwards-walk-through-the-BB approach I was
experimenting with.  We just walked insns backwards in the BB; when we
encountered an insn with an unallocated allocno, we split the range of
some other allocno to free up a suitable hard register.  First we'd look
for an allocno that was transparent across the EBB, then a BB, then the
largest range from the insn needing the reload to the closest prior
use/set.  By splitting across these larger ranges, we tended to free up
a hard register over a large range and it could often be used to satisfy
multiple unassigned allocnos.  I was in the process of refactoring the
code to handle things like register pairs and such when I got pulled
away to other things.  I don't recall if this variant ever bootstrapped,
but it was getting great fill rates compared to reload.
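
(A hedged pseudocode sketch of that backwards walk; every helper name
below is hypothetical and only illustrates the strategy, not GCC's
actual API:)

  /* Walk insns backwards; on an unallocated allocno, split the range
     of some other allocno, preferring the widest transparent range.  */
  for (insn = bb_last_insn (bb); insn != NULL; insn = prev_insn (insn))
    if (insn_needs_hard_reg (insn))
      {
        victim = find_allocno_transparent_over_ebb (ebb);
        if (!victim)
          victim = find_allocno_transparent_over_bb (bb);
        if (!victim)
          victim = find_allocno_free_back_to_prior_ref (insn);
        if (victim)
          {
            split_range (victim);    /* homes victim to its stack slot */
            retry_assignment (insn); /* e.g. via ira_reassign_pseudos  */
          }
      }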

> After the above splitting, do you rebuild the conflict graph to assign
> the new allocnos? If the conflict graph is rebuilt, that would hurt
> compile time.
The conflict graph was incrementally updated IIRC.  The compile-time
issues were mostly in getting the register classes and IRA cost models
accurate.


Jeff


[RFC] GCC vector extension: binary operators vs. differing signedness

2014-12-10 Thread Ulrich Weigand
Hello,

we've noticed the following behavior of the GCC vector extension, and were
wondering whether this is actually intentional:

When you use binary operators on two vectors, GCC will accept not only operands
that use the same vector type, but also operands whose types only differ in
signedness of the vector element type.  The result type of such an operation
(in C) appears to be the type of the first operand of the binary operator.

For example, the following test case compiles:

typedef signed int vector_signed_int __attribute__ ((vector_size (16)));
typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16)));

vector_unsigned_int test (vector_unsigned_int x, vector_signed_int y)
{
  return x + y;
}

However, this variant

vector_unsigned_int test1 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

fails to build:

xxx.c: In function 'test1':
xxx.c:12:3: note: use -flax-vector-conversions to permit conversions between 
vectors with differing element types or numbers of subparts
   return y + x;
   ^
xxx.c:12:10: error: incompatible types when returning type 'vector_signed_int 
{aka __vector(4) int}' but 'vector_unsigned_int {aka __vector(4) unsigned int}' 
was expected
   return y + x;
  ^

Given a commutative operator, this behavior seems surprising.
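
(For comparison, the explicit cast that the documentation asks for is
accepted in either operand order; an illustrative variant:)

vector_unsigned_int test_cast (vector_unsigned_int x, vector_signed_int y)
{
  return (vector_unsigned_int) y + x;   /* both operands now agree */
}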


Note that for C++, the behavior is apparently different: both test
and test1 above compile as C++ code, but this version:

vector_signed_int test2 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

which builds as C, fails as C++ with:

xxx.C:17:14: note: use -flax-vector-conversions to permit conversions between 
vectors with differing element types or numbers of subparts
   return y + x;
  ^
xxx.C:17:14: error: cannot convert 'vector_unsigned_int {aka __vector(4) 
unsigned int}' to 'vector_signed_int {aka __vector(4) int}' in return

This C vs. C++ mismatch likewise seems surprising.


Now, the manual page for the GCC vector extension says:

You cannot operate between vectors of different lengths or different signedness 
without a cast.

And the change log of GCC 4.3, where the strict vector type checks (and the
above-mentioned -flax-vector-conversions option) were introduced, says:

Implicit conversions between generic vector types are now only permitted 
when the two vectors in question have the same number of elements and 
compatible element types. (Note that the restriction involves compatible 
element types, not implicitly-convertible element types: thus, a vector type 
with element type int may not be implicitly converted to a vector type with 
element type unsigned int.) This restriction, which is in line with 
specifications for SIMD architectures such as AltiVec, may be relaxed using the 
flag -flax-vector-conversions. This flag is intended only as a compatibility 
measure and should not be used for new code. 

Both of these statements appear to imply (as far as I can tell) that all
the functions above ought to be rejected (unless -flax-vector-conversions).

So at the very least, we should bring the documentation in line with the
actual behavior.  However, as seen above, that actual behavior is probably
not really useful in any case, at least in C.


So I'm wondering whether we should:

A. Bring C in line with C++ by making the result of a vector binary operator
   use the unsigned type if the two input types differ in signedness?

and/or

B. Enforce that both operands to a vector binary operator must have the same
   type (except for opaque vector types) unless -flax-vector-conversions?


Thanks,
Ulrich


PS: FYI some prior discussion of related issues that I found:

https://gcc.gnu.org/ml/gcc/2006-10/msg00235.html
https://gcc.gnu.org/ml/gcc/2006-10/msg00682.html
https://gcc.gnu.org/ml/gcc-patches/2006-11/msg00926.html

https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01634.html
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00450.html


-- 
  Dr. Ulrich Weigand
  GNU/Linux compilers and toolchain
  ulrich.weig...@de.ibm.com



gcc-4.9-20141210 is now available

2014-12-10 Thread gccadmin
Snapshot gcc-4.9-20141210 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20141210/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 218611

You'll find:

 gcc-4.9-20141210.tar.bz2 Complete GCC

  MD5=5407a78fb304f37a085ab6208618f23a
  SHA1=d80f6e5017f17e29f79c2315314f5788977381ac

Diffs from 4.9-20141203 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Do we create new insns in combine? Or can we rely on INSN_LUID checking the order of instructions?

2014-12-10 Thread Bin.Cheng
Hi,
I am looking into distribute_notes.  One reason it's complicated is that
the live range of the register noted by REG_DEAD can be either shrunk or
extended.  When the live range shrinks, we need to search backwards to
find the previous reference and mark it as REG_DEAD (or delete the
definition if there is no reference anymore); when the live range extends,
we need to search forward to see if we can mark a later reference as
REG_DEAD.  Maybe distribute_notes is so vulnerable because it guesses how
to distribute the DEAD note based on other information (elim_ix, i2, i3,
etc.), rather than on how the register's live range changes.
For example, PR62151 shows a case in which the REG_DEAD note should be
discarded, but distribute_notes falsely tries to shrink the live range
(even worse, from a wrong point), resulting in the wrong instruction
being deleted.

So I am wondering whether I can rely on INSN_LUID to check the order of
different instructions.  If that can be done, I can easily differentiate
live range shrinking from extension.
A further question: if we don't insert new insns, can I use INSN_LUID
safely for this purpose?
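
(For concreteness, a minimal sketch of the intended check, assuming the
luids are still valid, i.e. no insn has been created since they were
computed; this only illustrates the idea, it is not a claim about
combine's internals:)

/* Sketch only: order test via dataflow luids.  Meaningless for any
   insn created after the luids were assigned.  */
static bool
insn_precedes_p (rtx_insn *a, rtx_insn *b)
{
  return DF_INSN_LUID (a) < DF_INSN_LUID (b);
}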

Thanks,
bin