RE: A Question About LRA/reload
-----Original Message-----
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Jeff Law
Sent: Tuesday, December 09, 2014 11:26 PM
To: Vladimir Makarov; lin zuojian; gcc@gcc.gnu.org
Subject: Re: A Question About LRA/reload

On 12/09/14 10:10, Vladimir Makarov wrote:
> generate the correct code in many cases even for x86. Jeff Law tried
> IRA coloring reusage too for reload but whole RA became slower
> (although he achieved performance improvements on x86).

>> Right. After IRA was complete, I'd walk over the unallocated allocnos
>> and split their ranges at EBB boundaries. That created new allocnos
>> with a smaller conflict set and reduced the conflict set for the
>> original unallocated allocnos.

Jeff: Is the above approach of splitting the ranges for unallocated
allocnos aggressive, or is it based on some heuristic under which the
live ranges of unallocated allocnos are not touched inside the EBBs?

>> After I'd done that splitting for all the EBBs, I called back into
>> ira_reassign_pseudos to try to assign the original unallocated
>> allocnos as well as the new allocnos.
>>
>> To get good results, much of IRA's cost analysis had to be redone
>> from scratch. And from a compile-time standpoint, that's a killer.
>>
>> The other approach I was looking at was a backwards walk through each
>> block. When I found an insn with an unallocated pseudo, that would
>> trigger one of various range-splitting techniques to try and free up
>> a hard register. Then again I'd call into ira_reassign_pseudos to try
>> the allocations again. This got even better results, but was
>> obviously even more compile-time expensive.

After the above splitting, are you building the conflict graph again to
assign the new allocnos? If the conflict graph is built again, this
will affect the compile time.

Thanks & Regards
Ajit

>> I don't think much, if any, of that work is relevant given the
>> current structure and effectiveness of LRA.
>>
>> jeff
Re: pointer math vs named address spaces
On Wed, Dec 10, 2014 at 2:24 AM, Richard Henderson wrote:
> On 12/04/2014 01:54 AM, Richard Biener wrote:
>> Apart from what Joseph already said, using 'sizetype' in the
>> middle-end for sizes and offsets is really deeply rooted in the
>> compiler. What you see above is one aspect - POINTER_PLUS_EXPR
>> offsets are forced to have sizetype type. But you'd also see it in
>> the inability to create larger-than-sizetype objects
>> (DECL_SIZE_UNIT's type).
>>
>> So for the middle-end part I'd suggest you make sure that sizetype
>> covers the largest pointer-mode precision your target offers. That
>> of course doesn't fix what Joseph pointed out - that the user will
>> still run into issues when writing C programs or when using the C
>> runtime (I suppose TR 18037 doesn't specify an alternate memcpy for
>> different address spaces?)
>
> I'd prefer it if the middle-end were more flexible about what types it
> allows for the offset of POINTER_PLUS_EXPR. E.g. any integer type of
> an appropriate mode.

What is an appropriate mode? But yes, in general I agree that having
the offset type of POINTER_PLUS_EXPR fixed to sizetype (or any
compatible type) is technically not necessary - though changing that
requires touching a _lot_ of code (everyone using size_{un,bin}op on
it). I've tried the more simplistic idea of allowing ssizetype as
well and failed due to all the fallout ... (that was ~2 years ago).

> I'd expect the right answer is to match
> targetm.addr_space.address_mode rather than .pointer_mode, so that any
> extension, if necessary, is already exposed to the optimizers. But I
> suppose it's also possible that might pessimize things for these weird
> targets...

Well. On the tree level we can certainly say that POINTER_PLUS_EXPR
simply first converts the offset to an appropriate type and then
performs the addition; the choice of that type is then deferred to the
RTL expander. For C code doing ptr[i] you still have the issue that
sizeof (*ptr) * i needs to be carried out in a large enough type (or
use a widening multiplication, or not lower to pointer arithmetic at
all). Currently the C frontend chooses a type with the precision of
sizetype for this.

Richard.
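To make the ptr[i] point concrete, here is a minimal C sketch
(illustrative only, not code from this thread; size_t stands in for
sizetype, which matches common targets):

  #include <stddef.h>

  /* ptr[i] is *(ptr + i); the frontend lowers the element offset to
     sizeof (*ptr) * i, computed in a type with the precision of
     sizetype.  If that type is narrower than the address mode of the
     pointer's address space, the multiply can truncate for large i -
     the problem discussed above.  */
  int *
  index_lowered (int *ptr, size_t i)
  {
    return (int *) ((char *) ptr + sizeof (int) * i);
  }

On a target where a named address space is wider than sizetype, this
multiply would need a wider type or a widening multiplication, as
noted above.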
Mirror
On the page https://www.gnu.org/software/gcc/mirrors.html, the
St. Louis mirror is not configured properly: all it shows is a default
Apache welcome page. The GCC file-system structure (assuming it's the
same for all mirrors) does not exist there either.
Re: pointer math vs named address spaces
On 12/10/2014 05:36 AM, Richard Biener wrote:
> On Wed, Dec 10, 2014 at 2:24 AM, Richard Henderson wrote:
>> On 12/04/2014 01:54 AM, Richard Biener wrote:
>>> Apart from what Joseph already said, using 'sizetype' in the
>>> middle-end for sizes and offsets is really deeply rooted in the
>>> compiler. What you see above is one aspect - POINTER_PLUS_EXPR
>>> offsets are forced to have sizetype type. But you'd also see it in
>>> the inability to create larger-than-sizetype objects
>>> (DECL_SIZE_UNIT's type).
>>>
>>> So for the middle-end part I'd suggest you make sure that sizetype
>>> covers the largest pointer-mode precision your target offers. That
>>> of course doesn't fix what Joseph pointed out - that the user will
>>> still run into issues when writing C programs or when using the C
>>> runtime (I suppose TR 18037 doesn't specify an alternate memcpy for
>>> different address spaces?)
>>
>> I'd prefer it if the middle-end were more flexible about what types
>> it allows for the offset of POINTER_PLUS_EXPR. E.g. any integer
>> type of an appropriate mode.
>
> What is an appropriate mode?

That was supposed to be answered in the next paragraph. ;-)

> But yes, in general I agree that having the offset type of
> POINTER_PLUS_EXPR fixed to sizetype (or any compatible type) is
> technically not necessary - though changing that requires touching a
> _lot_ of code (everyone using size_{un,bin}op on it). I've tried the
> more simplistic idea of allowing ssizetype as well and failed due to
> all the fallout ... (that was ~2 years ago).

Ug. That's true. I hadn't even started to think about what the
implications of changing the type might be. Oh well.

r~
Re: A Question About LRA/reload
On 12/10/14 02:02, Ajit Kumar Agarwal wrote:
>>> Right. After IRA was complete, I'd walk over the unallocated
>>> allocnos and split their ranges at EBB boundaries. That created new
>>> allocnos with a smaller conflict set and reduced the conflict set
>>> for the original unallocated allocnos.
>
> Jeff: Is the above approach of splitting the ranges for unallocated
> allocnos aggressive, or is it based on some heuristic under which the
> live ranges of unallocated allocnos are not touched inside the EBBs?

It was focused on allocnos which were live at EBB boundaries, splitting
their ranges at those boundaries. In the case where the allocno was
live across the EBB, but not used/set in the EBB, the split effectively
homes the allocno in its stack slot across those EBBs. That reduces
the conflicts for the original allocno (and possibly other allocnos
that need hard registers).

You can actually use that property to free up hard registers as well.
If there are allocnos which are not used/set in an EBB, but which got a
hard register, you can split their ranges at the EBB boundary. This
results in those allocnos homing into their stack slot across those
EBBs (or some other hard register), which in turn frees one or more
hard registers within the EBB.

In fact, you can take that concept quite a bit further and use it as
the fundamental basis for range splitting, by simply changing the range
over which you want the allocnos to be transparent. The range could be
an EBB, a BB, a few insns, or a single insn.

And that was the basis for the backwards-walk-through-the-BB approach I
was experimenting with. We just walked insns backwards in the BB;
when we encountered an insn with an unallocated allocno, we split the
range of some other allocno to free up a suitable hard register. First
we'd look for an allocno that was transparent across the EBB, then a
BB, then the largest range from the insn needing the reload to the
closest prior use/set. By splitting across these larger ranges, we
tended to free up a hard register over a large range, and it could
often be used to satisfy multiple unassigned allocnos.

I was in the process of refactoring the code to handle things like
register pairs and such when I got pulled away to other things. I
don't recall if this variant ever bootstrapped, but it was getting
great fill rates compared to reload.

> After the above splitting, are you building the conflict graph again
> to assign the new allocnos? If the conflict graph is built again,
> this will affect the compile time.

The conflict graph was incrementally updated IIRC. The compile-time
issues were mostly in getting the register classes and IRA cost models
accurate.

Jeff
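To illustrate the "transparent" notion above, here is a toy C sketch
(invented for illustration, not GCC code): a range that is live across
a region but has no use or set inside it can be split at the region
boundaries, freeing its hard register within the region.

  #include <stdbool.h>
  #include <stdio.h>

  struct range { int start, end; };   /* live range, half-open */

  /* An allocno is transparent over [lo, hi) if its range covers the
     whole region but none of its references fall inside it.  */
  static bool
  transparent_p (const struct range *r, const int *refs, int nrefs,
                 int lo, int hi)
  {
    if (r->start > lo || r->end < hi)
      return false;             /* not live across the whole region */
    for (int i = 0; i < nrefs; i++)
      if (refs[i] >= lo && refs[i] < hi)
        return false;           /* used or set inside the region */
    return true;
  }

  int
  main (void)
  {
    struct range r = { 0, 100 };
    int refs[] = { 5, 90 };     /* references at insns 5 and 90 only */
    /* Transparent over [10, 80): splitting at 10 and 80 homes the
       value to its stack slot there, freeing its hard register.  */
    printf ("%d\n", transparent_p (&r, refs, 2, 10, 80));  /* 1 */
    return 0;
  }

The wider the region over which a candidate is transparent (EBB, then
BB, then a span of insns), the larger the stretch over which its hard
register becomes free - which is why the search order described above
starts with EBBs.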
[RFC] GCC vector extension: binary operators vs. differing signedness
Hello,

we've noticed the following behavior of the GCC vector extension, and
were wondering whether this is actually intentional:

When you use binary operators on two vectors, GCC will accept not only
operands that use the same vector type, but also operands whose types
differ only in the signedness of the vector element type. The result
type of such an operation (in C) appears to be the type of the first
operand of the binary operator. For example, the following test case
compiles:

typedef signed int vector_signed_int __attribute__ ((vector_size (16)));
typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16)));

vector_unsigned_int
test (vector_unsigned_int x, vector_signed_int y)
{
  return x + y;
}

However, this variant:

vector_unsigned_int
test1 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

fails to build:

xxx.c: In function 'test1':
xxx.c:12:3: note: use -flax-vector-conversions to permit conversions
between vectors with differing element types or numbers of subparts
   return y + x;
   ^
xxx.c:12:10: error: incompatible types when returning type
'vector_signed_int {aka __vector(4) int}' but 'vector_unsigned_int
{aka __vector(4) unsigned int}' was expected
   return y + x;
          ^

Given a commutative operator, this behavior seems surprising.

Note that for C++, the behavior is apparently different: both test and
test1 above compile as C++ code, but this version:

vector_signed_int
test2 (vector_unsigned_int x, vector_signed_int y)
{
  return y + x;
}

which builds as C, fails as C++ with:

xxx.C:17:14: note: use -flax-vector-conversions to permit conversions
between vectors with differing element types or numbers of subparts
   return y + x;
              ^
xxx.C:17:14: error: cannot convert 'vector_unsigned_int {aka
__vector(4) unsigned int}' to 'vector_signed_int {aka __vector(4)
int}' in return

This C vs. C++ mismatch likewise seems surprising.

Now, the manual page for the GCC vector extension says:

  You cannot operate between vectors of different lengths or different
  signedness without a cast.

And the change log of GCC 4.3, where the strict vector type checks
(and the above-mentioned -flax-vector-conversions option) were
introduced, says:

  Implicit conversions between generic vector types are now only
  permitted when the two vectors in question have the same number of
  elements and compatible element types. (Note that the restriction
  involves compatible element types, not implicitly-convertible
  element types: thus, a vector type with element type int may not be
  implicitly converted to a vector type with element type unsigned
  int.) This restriction, which is in line with specifications for
  SIMD architectures such as AltiVec, may be relaxed using the flag
  -flax-vector-conversions. This flag is intended only as a
  compatibility measure and should not be used for new code.

Both of these statements appear to imply (as far as I can tell) that
all of the functions above ought to be rejected (unless
-flax-vector-conversions is given). So at the very least, we should
bring the documentation in line with the actual behavior.

However, as seen above, that actual behavior is probably not really
useful in any case, at least in C. So I'm wondering whether we should:

A. Bring C in line with C++ by making the result of a vector binary
   operator use the unsigned type if the two input types differ in
   signedness?

and/or

B. Enforce that both operands of a vector binary operator have the
   same type (except for opaque vector types) unless
   -flax-vector-conversions is given?
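A quick way to pin down the C result type described above (a sketch;
it relies on GNU __typeof__ and __builtin_types_compatible_p, and on
C11 _Static_assert):

  typedef signed int vector_signed_int __attribute__ ((vector_size (16)));
  typedef unsigned int vector_unsigned_int __attribute__ ((vector_size (16)));

  vector_unsigned_int x;
  vector_signed_int y;

  /* In C, the result takes the type of the first operand.  */
  _Static_assert (__builtin_types_compatible_p (__typeof__ (x + y),
                                                vector_unsigned_int),
                  "x + y has the type of the first operand");
  _Static_assert (__builtin_types_compatible_p (__typeof__ (y + x),
                                                vector_signed_int),
                  "y + x has the type of the first operand");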
Thanks,
Ulrich

PS: FYI, some prior discussion of related issues that I found:
https://gcc.gnu.org/ml/gcc/2006-10/msg00235.html
https://gcc.gnu.org/ml/gcc/2006-10/msg00682.html
https://gcc.gnu.org/ml/gcc-patches/2006-11/msg00926.html
https://gcc.gnu.org/ml/gcc-patches/2013-08/msg01634.html
https://gcc.gnu.org/ml/gcc-patches/2013-09/msg00450.html

--
Dr. Ulrich Weigand
GNU/Linux compilers and toolchain
ulrich.weig...@de.ibm.com
gcc-4.9-20141210 is now available
Snapshot gcc-4.9-20141210 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20141210/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 218611

You'll find:

 gcc-4.9-20141210.tar.bz2             Complete GCC

  MD5=5407a78fb304f37a085ab6208618f23a
  SHA1=d80f6e5017f17e29f79c2315314f5788977381ac

Diffs from 4.9-20141203 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Do we create new insns in combine? Or can we rely on INSN_LUID to check the order of instructions?
Hi,

I am looking into distribute_notes. One reason it's complicated is
that the live range of the register noted by REG_DEAD can either
shrink or be extended. When the live range shrinks, we need to search
backwards to find the previous reference and mark it with REG_DEAD (or
delete the definition if there is no reference anymore); when the live
range is extended, we need to search forward to see if we can mark a
later reference with REG_DEAD.

Maybe the reason distribute_notes is so fragile is that it guesses how
to distribute the DEAD note based on other information (elim_ix, i2,
i3, etc.) rather than on how the register's live range actually
changes. For example, PR62151 shows a case in which the REG_DEAD note
should be discarded, but distribute_notes falsely tries to shrink the
live range (even worse, from a wrong point), resulting in a wrong
instruction being deleted.

So I am wondering if I can rely on INSN_LUID to check the order of
different instructions. If that can be done, I can easily
differentiate a live-range shrink from an extension. The further
question is: if we don't create new insns in combine, can I use
INSN_LUID safely for this purpose?

Thanks,
bin
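For reference, a sketch of what such an order check could look like (a
hypothetical helper, not existing combine code; it assumes df-style
per-basic-block luids, so treat the exact macros and types as
approximate):

  /* Program-order test via LUIDs.  Only meaningful for two insns in
     the same basic block, and only if neither insn was created after
     luids were last computed - a freshly made insn has no valid LUID,
     which is exactly the concern above.  */
  static bool
  insn_before_p (rtx_insn *a, rtx_insn *b)
  {
    gcc_assert (BLOCK_FOR_INSN (a) == BLOCK_FOR_INSN (b));
    return DF_INSN_LUID (a) < DF_INSN_LUID (b);
  }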