Re: gnat 4.9.2 on arm with rtems
> @Arnaud: I saw quite a lot of #pragma Debug-lines in the rts-code. Is there a
> simple way of activating them without having to recompile gnat?

No, you need to compile the runtime with -gnata to enable assertions and enable support for pragma Debug. You can add -gnata to GNATLIBFLAGS in libada/Makefile.in, for instance.

Note that enabling assertions may be enough to locate the early error/inconsistency, although since you likely have memory corruption, this isn't guaranteed.

Arno
Re: how to tweak x86 code generation to instrument certain opcodes with CC trap?
On Mon, Oct 26, 2015 at 8:47 PM, Yasser Shalabi wrote:
> So back to square one. Any tips on what code/config-files I need to
> modify with to get GCC to emit additional opcodes for certain
> instructions?

Maybe you should try cross-compiling. It looks like you have already succeeded with the instrumentation; the problem is just that the generated code is not in good enough shape to run during the build process. This should not be an issue with a cross-compilation.
Re: Prototype implementation: Improving effectiveness and generality of auto-vectorization
On Mon, Oct 26, 2015 at 6:59 AM, sameera wrote:
> On Wednesday 21 October 2015 05:23 PM, Richard Biener wrote:
>>
>> On Thu, Oct 15, 2015 at 12:20 PM, sameera
>> wrote:
>>>
>>> Hi Richard,
>>>
>>> This is with reference to our discussion at GNU Tools Cauldron 2015
>>> regarding my talk titled "Improving the effectiveness and generality of
>>> GCC auto-vectorization." We, at Imagination Technologies, have further
>>> worked on finalizing the algorithms for transformations to achieve
>>> efficient target-aware reordering instruction selection. We have
>>> implemented the prototype in python to demonstrate the capabilities of
>>> the algorithm and verify the claims made in the presentation.
>>>
>>> I am attaching the prototype along with the documented algorithm for
>>> review purposes. We would be starting with design and implementation of
>>> the same in GCC and would be glad to receive comments, feedback and
>>> suggestions.
>>
>>
>> So I finally sat down and skimmed over auto_vectorization.txt. The first
>> thing I notice is that you constrain your input language to sth you
>> define yourself. In the end you will need to work on GIMPLE in SSA form.
>> Thus the Algorithm as described needs to construct its AST from that
>> GIMPLE representation.
>
> Richard, we have defined the input language for convenience in prototype
> implementation. However, we will be using GIMPLE as our IR. As per grammar
> of our tree, p-tree denotes the permute order associated with the
> statement, whereas c-tree is actually the GIMPLE instruction, which
> performs compute operation. I tried looking at structures used in SLP,
> however they can not be used as they are, as main difference between
> current SLP implementation in GCC versus our representation is that,
> permute order in SLP is part of the tree node in current GCC, whereas in
> our representation permute order is represented as independent tree-node.
> Hence, I have created new tree structure for our pass, which will create
> p-tree nodes for permute order, and c-tree node which points to
> appropriate gimple statement.

Yes, that's the whole purpose - get the vectorizer (and SLP) a better data
structure which is explicit about permutes.

>> Loop count and data reference analysis is provided by GCC and you need to
>> work with the way their result is presented.
>
> I am trying to figure out where and how interleave pattern encapsulating
> whole loop can be represented, as the interleave pattern not only has the
> loop related information, but also the order in which dest array is being
> written. The data reference analysis can be used nicely with the data
> structures we have designed - as introduction of p-tree nodes does not
> alter the attributes of c-tree (GIMPLE stmt).
>
> I am also trying to identify relations between chain of recurrences for
> each SSA variable and vec_size associated with each tree-node in our
> structure. Logically, both of them compute same information, and I am
> seeing if it can be propagated in our tree.
>
>>
>> As with the presentation the paper is mostly about optimizing the
>> interleaving code. That's fine in principle but I suspect that the AST
>> needs to get explicit support for "vectorized stmts" that perform type
>> conversions (like type conversions themselves or widening operations),
>> that is, represent the fact that for certain loops you need N vectorized
>> stmts for stmt A and M vectorized stmts for stmt B. This is an important
>> observation once you get to the point supporting targets with multiple
>> vector sizes (and instructions like the x86_64 integer - double
>> conversions which go from 128bit to 256bit vectors).
>
> Yes, we haven't given much thought about the type conversions, because our
> assumption is that type conversions are order preserving transformations
> (c-tree), and not order altering transformations (p-tree). Because of
> which those instructions will be generated as any other compute
> instruction is generated. And I see that GCC is also having same
> assumption, because of which it treats vec_perm_const patterns different
> from vec_pack/unpack* patterns though the instructions generated can be
> same. And, as each statement can have its own vectorization count, the
> scenario that you are mentioning can be taken care. However, I will again
> look more into it, if we need to take additional care for type
> conversions.

Thanks.

>>
>>
>> I somewhat miss an idea on how to deal with irregular SLP cases - that
>> is, does the AST model each (current) SLP node as a single statement or
>> does it model individual vector elements (so you can insert no-op
>> compensation code to make the operations match). Consider
>>
>> a[i] = b[i] * 3;
>> a[i+1] = b[i+1];
>>
>> which can be vectorized with SLP if you realize you can multiply b[i+1]
>> by 1.
>
>
> The pass structure we are having for our optimization is as follows:
> - New pass target-aware-loop-vect with
Question about subregs on constants
Hi,

what speaks against folding SUBREGs on constants in fold_rtx?

CSE refuses to propagate constants into subreg expressions, probably because fold_rtx does not handle it, and in fact a subreg on a constant does not seem to be defined. I'm wondering why this is the case. What's the problem with simplifying subregs on constants?

If there is a good reason not to fold things like:

  (subreg:DI (const_int 1 [0x1]) 0)

what about simplifying:

  (and:DI (subreg:DI (const_int 1 [0x1]) 0)
          (const_int 63 [0x3f]))

Could we take care of it in simplify_binary_operation, perhaps?

I ran into these problems when trying to fix the shift patterns in the S/390 back end:
https://gcc.gnu.org/ml/gcc-patches/2015-08/msg00346.html

I see a few performance regressions with it due to missed optimizations.

Bye,

-Andreas-
Re: Question about subregs on constants
On Tue, Oct 27, 2015 at 05:47:19PM +0100, Andreas Krebbel wrote:
> Hi,
>
> what does speak against folding SUBREGs on constants in fold_rtx?
>
> CSE does refuse to propagate constants into subreg expressions probably
> because fold_rtx does not handle it - and in fact a subreg on a constant
> does not seem to be defined. I'm wondering why this is the case? What's
> the problem with simplifying subregs on constants?
>
> If there is a good reason not to fold things like:
> (subreg:DI (const_int 1 [0x1]) 0)

This is invalid RTL, so it shouldn't be generated at all. The problem is that CONST_INT has VOIDmode, and a valid SUBREG needs both the inner and the outer mode to figure out which bits it is talking about. Therefore, wherever you end up replacing SUBREG_REG with a CONST_INT or other modeless RTL, there is a bug; instead, the code should use something like simplify_replace_rtx or simplify_replace_fn_rtx, where the result is immediately simplified at the point where the original inner mode is still known.

Jakub
Re: inline asm and multi-alternative constraints
On 10/25/2015 09:41 PM, David Wohlferd wrote:
> Does gcc's inline asm support multi-alternative constraints? Or are they
> only supported for md?
>
> The fact that it is doc'ed with the other constraints
> (https://gcc.gnu.org/onlinedocs/gcc/Constraints.html) says it works for
> inline. But https://gcc.gnu.org/bugzilla/show_bug.cgi?id=10396#c17 says
> it only works for md.
>
> I've got a patch ready to remove this section from the non-md docs
> (attached). But there probably needs to be more support than an
> 11-year-old comment to approve it.
>
> Dropping a supported feature is always controversial. But if it doesn't
> work, perhaps less so. After all, doc'ing something that doesn't work is
> just as bad.
>
> dw
>
> PS If it *is* supported, then the docs need some work.

I think Richard corrected me last I spoke on this topic :-)

They *are* supported, i.e., something like this should work on a CISCy target:

  asm("add %0,%1" : "=r,m"(x) : "rim,ri"(y))
gcc-5-20151027 is now available
Snapshot gcc-5-20151027 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20151027/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch revision 229465

You'll find:

 gcc-5-20151027.tar.bz2               Complete GCC

  MD5=fba2ddcf8d19cc78b84aa347183538bc
  SHA1=171bff0d5bae594c7e97190bed0c50085eb94ea9

Diffs from 5-20151020 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
_Fract types and conversion routines
I have a question about the _Fract types and their conversion routines. If I compile this program:

  extern void abort (void);
  int main ()
  {
    signed char a = -1;
    _Sat unsigned _Fract b = a;
    if (b != 0.0ur)
      abort ();
    return 0;
  }

with -O0 on a MIPS32 system where char is 1 byte and unsigned (int) is 4 bytes, I see a call to '__satfractqiuhq' for the conversion.

Now I think the 'qi' part of the name is for the 'from' type of the conversion, a 1-byte signed type (signed char), and the 'uhq' part is for the 'to' type of the conversion. But 'uhq' would be a 2-byte unsigned fract, and the unsigned fract type on MIPS should be 4 bytes (unsigned int is 4 bytes). So shouldn't GCC have generated a call to __satfractqiusq instead? Or am I confused?

Steve Ellcey
sell...@imgtec.com