Re: A question about DCE
Revital,

* Revital1 Eres <[EMAIL PROTECTED]> [2008-10-15 02:20]:
>
> r77 is defined as 'fixed register' which is a register that the register
> allocator can not use. (triggers by SPU option -mfixed-range)
> r77 is used to pass information to some other routine at run-time (the
> next instruction is branch to this routine; the routine does not exist
> at the compile time of the function which contains this instruction).

You are using r77 as a function argument for this other routine. I believe there is a way to tell GCC that this function uses a special ABI, that it uses different parameter registers.

Trevor
Request for acceptance of new port (Cell SPU)
Dear Steering Committee, We, Sony Computer Entertainment, would like to contribute a port for a new target, the Cell SPU, and seek acceptance from the Steering Committee to do so. (David Edelsohn indicated that before submitting patches we should request acceptance for the new port from the Steering Committee.) Thank you, Trevor Smigiel
aligned attribute and the new operator (pr/15795)
Hi,

I would like to reopen the discussion for pr/15795, or at least get clarification on the current resolution of WONTFIX.

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=15795

Let me state right at the beginning that I am also volunteering to do the actual work to come up with an agreed solution and implement it.

Briefly, pr/15795 was submitted to address the issue of calling the new operator for a class which has an alignment greater than that guaranteed by the underlying malloc(). It specifically talks about alignment for vector types, but it applies to any alignment. For example, a 16-byte vector type typically requires 16-byte alignment and malloc() might only guarantee 8-byte alignment. When a class contains that vector the class requires alignment of 16, but a new operator that calls that malloc() will not always return properly aligned memory.

The pr proposes some solutions:

solution: operator new should always return 16-byte aligned memory
response: glibc doesn't do that, neither should libstdc++

I agree with this response. A solution that handles any alignment is better; simply changing the default isn't sufficient.

solution: create an additional signature for operator new which has a parameter that indicates the alignment. The compiler should call this version of the operator when necessary and the standard library should provide an appropriate implementation.
response: "would interact badly with overriding the default operator new."

At first glance, this could break some existing code, but I think this solution could be made to work. I'll discuss below.

solution: the library will provide an implementation of a new operator with an additional parameter, but the user is responsible for calling it.
response: doesn't work well with existing template class libraries, like STL or Qt.

I agree with this response. (There was no agreement or disagreement with this response in the pr.)
For the second proposed solution, let's define precisely the cases that "interact badly". I can think of only one case, defined by these conditions:
- a type has an alignment greater than what malloc() guarantees
- operator new for that type does not call the default implementation for operator new
- an object of that type is created using operator new
- an appropriate implementation of operator new with the additional alignment parameter is not provided for this type.

Clearly, if the compiler does not call the user-defined version of operator new in this case, the code is likely to break. Are there other cases which "interact badly"?

I'm hoping a solution to this case is as simple as:
- when the compiler would call the aligned version of operator new it first checks which definition of operator new would have been called if it were not aligned. If there is an aligned version of operator new in the same place*, call it, otherwise call the non-aligned version and issue a warning. (* for "place" fill in the appropriate C++ jargon for it to make sense, e.g., namespace, class, scope.)
- the default versions of operator new and the aligned version of operator new should be defined in the same section. That way, when a user overrides the default operator new, they will get a link error (duplicate definitions of new) unless they also define the aligned version of operator new.

Can anyone identify situations where this wouldn't work? In the case where it generates a warning the code might not work because of improper alignment, but at that point I would consider it the user's problem.

While I'm here let me also point out that an object which is allocated on the stack and has alignment greater than what the stack guarantees is also an issue. I have a patch which fixes this for any alignment, though it doesn't take advantage of stack ordering.

Thanks,
Trevor
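To make the case above concrete, here is a minimal sketch of a type that supplies its own operator new and honours its alignment. It is not the proposed compiler or library change. The helper name `aligned_new`, the type `Block`, and the use of `posix_memalign` (so a POSIX system) are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>   // std::free; posix_memalign on POSIX systems
#include <new>

// Hypothetical stand-in for "operator new with an alignment parameter".
// Assumes posix_memalign is available; alignment must be a power of two
// and a multiple of sizeof(void*).
inline void* aligned_new(std::size_t size, std::size_t alignment) {
    assert((alignment & (alignment - 1)) == 0);
    assert(alignment % sizeof(void*) == 0);
    void* p = nullptr;
    if (posix_memalign(&p, alignment, size) != 0)
        throw std::bad_alloc();
    return p;
}

// A type whose alignment exceeds what malloc() typically guarantees.
struct alignas(64) Block {
    float v[16];

    // Class-scope operator new: every `new Block` goes through the
    // alignment-aware helper instead of the default operator new.
    static void* operator new(std::size_t size) {
        return aligned_new(size, alignof(Block));
    }
    // Memory from posix_memalign is released with free().
    static void operator delete(void* p) noexcept { std::free(p); }
};

inline bool is_aligned(const void* p, std::size_t alignment) {
    return reinterpret_cast<std::uintptr_t>(p) % alignment == 0;
}
```

If `Block` instead overrode operator new with a plain malloc() call, it would match the conditions listed above: the override bypasses the default implementation and nothing supplies the alignment.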
Re: aligned attribute and the new operator (pr/15795)
* Ian Lance Taylor <[EMAIL PROTECTED]> [2006-10-10 10:15]:
> [EMAIL PROTECTED] writes:
>
> > - the default versions of operator new and the aligned version of
> >   operator new should be defined in the same section. That way,
> >   when a user overrides the default operator new, they will get
> >   a link error (duplicate definitions of new) unless they also
> >   define the aligned version of operator new.
>
> That would be a bad idea since it would break standard conformant
> programs which only override the default operator new.

Ahh, right. Regardless, I've realized this technique won't work anyway. No error would be given when the implementation of operator new is declared weak, which is common enough.

There are some other possibilities, none of them perfect, like a command line option, or teaching the linker something about operator new.

> You are implicitly assuming that we know the alignment of memory
> returned by new. If we know that, then whenever we need aligned
> memory, we can ask for a bit more memory and force the alignment.
> That will always work and avoids the concerns of overriding new. We
> can generate more optimal code by providing a command line option
> which specifies the alignment of memory returned by new.

I think I'm assuming the same thing GCC already assumes about new. I don't see anything in GCC that attempts to align the result of a call to operator new. For some targets it doesn't matter, for example, when load and store instructions work on unaligned addresses, but pr/15795 points out a case where this doesn't work.

Forcing the alignment of the result of new isn't going to work because we have no way to reconstruct the correct address when calling delete. We could save the correct address at the beginning of the memory returned, but that is an ABI change.
If we are willing to consider an ABI change, I think an approach that allows new to call some form of memalign would be better than having the compiler force alignment after calling new. Are we open to making an ABI change?

For now, I will still promote the original idea of having a second version of new with an alignment argument, and of requiring a command line option to emit calls to the aligned version of the default new operator. Is a command line option sufficient?

Trevor
Re: __builtin_expect for indirect function calls
If possible, I agree it seems natural to extend __builtin_expect. My concern would be backwards compatibility.

Currently, the prototype for __builtin_expect is

    long __builtin_expect (long expression, long constant);

Extending it to functions would change it to

    T __builtin_expect (T expression, T expected);

With these additional semantics and restrictions:
- when the return value is being used as a call expression:
  * T is the type of 'expression'
  * 'expected' is allowed to be a non-constant
- when the return value is not being used as a call expression:
  * T is type 'long'
  * 'expected' must be a compile-time constant

Given the above definition, I don't think there are any backwards compatibility issues because we are inspecting the context of the use of __builtin_expect.

Rather than the above definition, we could choose not to inspect the context and just say:
  * T is the type of 'expression'
  * 'expected' is allowed to be a non-constant

In this case I think there would only be compatibility issues with unusual uses of __builtin_expect, for example, if it was being used in a function argument and its type affected overload resolution. Or if the argument was a float and was being implicitly converted to a long (with a warning). There would also be code which previously gave warnings but does not with the extended __builtin_expect.

I'm ok with either of these definitions, if extending __builtin_expect is the preferred way to go. Is either of these definitions ok? Or are there other ideas how to define it?

Trevor

* Mark Mitchell <[EMAIL PROTECTED]> [2008-01-03 12:12]:
> Hans-Peter Nilsson wrote:
> > On Mon, 17 Dec 2007, [EMAIL PROTECTED] wrote:
> >> When we can't hint the real target, we want to hint the most common
> >> target. There are potentially clever ways for the compiler to do this
> >> automatically, but I'm most interested in giving the user some way to do
> >> it explicitly. One possibility is to have something similar to
> >> __builtin_expect, but for functions. For example, I propose:
> >>
> >> __builtin_expect_call (FP, PFP)
> >
> > Is there a hidden benefit? I mean, isn't this really
> > expressable using builtin_expect as-is, at least when it comes
> > to the syntax?
>
> That was my first thought as well. Before we add __builtin_expect_call,
> I think there needs to be a justification of why this can't be done with
> __builtin_expect as-is.
>
> --
> Mark Mitchell
> CodeSourcery
> [EMAIL PROTECTED]
> (650) 331-3385 x713
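The overload-resolution concern in the reply above can be illustrated with the builtin as it exists today. This sketch assumes a GCC-compatible compiler; the functions `which` and `overload_seen_through_expect` are hypothetical names used only for the demonstration.

```cpp
#include <cassert>

// Two overloads that report which one was selected.
inline int which(int)  { return 1; }
inline int which(long) { return 2; }

inline int overload_seen_through_expect(int x) {
    // With the current prototype the result is 'long' whatever the type of
    // the first argument, so the 'long' overload is chosen even though x
    // is an int.
    return which(__builtin_expect(x, 0));
}

inline void demo() {
    assert(which(7) == 1);   // a plain int argument picks which(int)
}
```

Under the "T is the type of 'expression'" definition, the same call would presumably resolve to `which(int)` instead, which is the kind of unusual use where behaviour could change.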
Re: __builtin_expect for indirect function calls
> >
> > which returns the value of FP with the same type as FP, and tells the
> > compiler that PFP is the expected target of FP. Trivial examples:
> >
> > typedef void (*fptr_t)(void);
> >
> > extern void foo(void);
> >
> > void
> > call_fp (fptr_t fp)
> > {
> >   /* Call the function pointed to by fp, but predict it as if it is
> >      calling foo() */
> >   __builtin_expect_call (fp, foo)();
>
>   __builtin_expect (fp, foo); /* alt __builtin_expect (fp == foo, 1); */
>   fp ();
>
> > }
> >
> > void
> > call_fp_predicted (fptr_t fp, fptr_t predicted)
> > {
> >   /* same as above but the function we are calling doesn't have to be
> >      known at compile time */
> >   __builtin_expect_call (fp, predicted)();
>
>   __builtin_expect (fp, predicted);
>   fp();
>
> I guess the information just isn't readily available in the
> preferred form when needed and *that* part could more or less
> simply be fixed?

The main reason I didn't like this is that in some other context 'fp' could be any expression, potentially with side effects. This would require either special handling by the compiler, or the user would have to be sure to write it such that the side effects only happen once.

Trevor
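The side-effect concern can be handled by the user today by evaluating the pointer expression once and then guarding the call, along the lines of the `__builtin_expect (fp == foo, 1)` alternative quoted above. The following is a rough sketch for a GCC-compatible compiler; `call_expecting`, `next_handler`, and the counters are hypothetical names, and whether a given target turns the hint into a branch hint is not something this sketch can promise.

```cpp
#include <cassert>

typedef void (*fptr_t)(void);

static int foo_calls = 0;
static int bar_calls = 0;
static int lookups = 0;

static void foo(void) { ++foo_calls; }
static void bar(void) { ++bar_calls; }

// An expression with a side effect, standing in for an arbitrary 'fp'.
static fptr_t next_handler(bool want_foo) {
    ++lookups;
    return want_foo ? foo : bar;
}

// Hypothetical helper. Because fp is a by-value parameter, the caller's
// expression is evaluated exactly once, however often fp is named here.
static inline void call_expecting(fptr_t fp, fptr_t predicted) {
    if (__builtin_expect(fp == predicted, 1))
        predicted();   // expected path; a direct call when 'predicted' is a
                       // known function and this helper is inlined
    else
        fp();          // unexpected target, still called correctly
}

static inline void demo() {
    call_expecting(next_handler(true), foo);
    assert(lookups == 1);
}
```

This differs from the proposed builtin in that it adds a compare and a branch rather than only hinting the existing indirect call.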
Re: [RFC] Improve Tree-SSA if-conversion - convergence of efforts
Tehila asked me a while ago to comment based on my experience with the RTL if convert pass and the discussions some of us had at the GCC summit. Sorry it took me so long to respond.

The target I care about (Cell SPU) has some things that make an aggressive if convert very useful and profitable. It has conditional moves for every mode (there is a single, unified register file), never traps on illegal addresses (addresses always wrap to the 256KB address space), and branches are expensive (there is no hardware cache).

The main limitation with the RTL if-convert pass is that it only recognizes specific patterns. It is easy to write a complicated if statement (just using normal C with &&/||) that would never get converted because it ends up with basic blocks that have many in-edges, which if-convert generally doesn't handle. (In our internal tree we modified the RTL pass to handle some cases of multiple in-edges, and can handle any number of insns in a basic block.)

I haven't looked at the tree-SSA if-convert code yet, but based on what was described to me at the summit it seemed to be taking the same approach as the RTL pass: recognize certain patterns and convert them.

I would like to see an approach that is able to take an arbitrary flow graph without backedges and convert it to straight line code, limited only by the cost model and impossible cases (e.g., inline asm). I'm not sure how that would be achieved in a target neutral way.

Trevor

* Tehila Meyzels <[EMAIL PROTECTED]> [2007-07-31 06:50]:
>
> Hi,
>
> I'd like to bring up on the list a discussion that a bunch of people (most
> of those CC-ed above) started at the GCC Summit:
>
> Lately, there were few efforts, that are not necessarily related to each
> other, but are all relevant to if-conversion.
> Each of them has its own restriction, like a specific control-flow, target
> dependent information, permission to transform speculative loads, etc.
>
> Few patches that I'm aware of are:
> 1. Conditional store sinking, by Michael Matz:
> http://gcc.gnu.org/ml/gcc-patches/2007-04/msg00724.html
>
> 2. If-conversion for multiple IF_THEN_ELSE clauses, by Victor Kaplansky:
> http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00265.html
> Also mentioned here: http://gcc.gnu.org/wiki/AutovectBranchOptimizations
> (2.3.3)
>
> 3. (unconditional) Store sinking (4.1.1 based), by Revital Eres and Victor
> Kaplansky:
> http://gcc.gnu.org/ml/gcc-patches/2006-05/msg00265.html (same patch as
> previous)
> Also mentioned here: http://gcc.gnu.org/wiki/AutovectBranchOptimizations
> (2.3.2)
>
> 4. Conditional load hoisting (4.1.1 based), by myself:
> http://gcc.gnu.org/ml/gcc-patches/2007-07/msg02168.html
>
> 5. Maybe more?
>
> You're welcome to share your/others related works here...
>
>
> I'd like to suggest to converge all these efforts into a single improved
> tree-level if-conversion pass (i.e., on the top of tree-if-conv.c).
> Currently, the tree-level if-conversion pass is quite limited in several
> ways, and mostly with respect to handling of loads/stores (it basically
> doesn't handle them), but not only.
>
> There are several reasons why to store-sinking and load-hoisting should be
> combined with the if-conversion pass:
> 1. Store-sinking/load hoisting effect one another and they both can create
> new opportunities for if-conversion (not only in vectorizable loops, for
> example).
> Currently, load-store motion pass happens too late and thus don't help
> the (tree-ssa) if-converter.
> 2. Store-sinking/load hoisting may have an overhead and may degrade
> performance unless the relevant conditional branch gets if-converted.
>
> Issues/Questions to be considered and discussed:
> 1. Cost model and machine dependency issues:
>    - When is it profitable to perform these motions? What is the algorithm
>      to decide whether there is a good chance for if-conversion?
>    - Target dependency - What to check?
>      A. Are there scalar select/cmove/predicated instructions (like in
>         SPU)?
>      B. Are there vector select/cmove/predicated instructions (like in
>         PowerPC)? + will the loop be vectorized?
>      C. Are speculative loads allowed? Do memory accesses trap?
>      D. More?
>
> 2. Which transformations we want to take care of in this pass?
>    A. Conditional/unconditional loads/stores.
>    B. PHI nodes with operands that are neither constants nor SSA NAMES
>       (Currently, this is not supported in tree-if-conv.c).
>    C. PHIOPT transformations (i.e., merge the PHIOPT pass into this pass
>       maybe)?
>    D. More?
> 3. New control-flow graphs we want to support (besides the regular
>    IF_THEN_ELSE, diamond-based):
>    A. Nested diamonds.
>    B. Sequential diamonds.
>    C. More?
> 4. After we complete this pass, will the RTL-level ifcvt be needed?
>    I guess the answer is yes, but I would like to hear more opinions.
>
> Any comments/ideas/thoughts are really appreciated.
>
> Thanks,
> Tehila.
>
>
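As a small illustration of what if-conversion means at the source level, the sketch below writes the same computation twice: once with a short-circuit condition, which produces basic blocks with several incoming edges, and once by hand as straight-line code that computes the whole condition and then selects. The function names are hypothetical and nothing here reflects the actual RTL or tree-SSA pass; a compiler is also free to compile either form with or without branches.

```cpp
#include <cassert>
#include <cstdint>

// Branchy form: the && / || short-circuit evaluation creates several
// basic blocks, some reachable from more than one predecessor.
inline std::uint32_t pick_branchy(std::uint32_t a, std::uint32_t b,
                                  std::uint32_t c) {
    std::uint32_t r = c;
    if ((a > 10 && b > 10) || c == 0)
        r = b;
    return r;
}

// Hand if-converted form: evaluate every sub-condition unconditionally,
// combine them, then select the result with a mask. This is only valid
// because the operands have no side effects and cannot trap.
inline std::uint32_t pick_converted(std::uint32_t a, std::uint32_t b,
                                    std::uint32_t c) {
    std::uint32_t cond = (static_cast<std::uint32_t>(a > 10) &
                          static_cast<std::uint32_t>(b > 10)) |
                         static_cast<std::uint32_t>(c == 0);   // 0 or 1
    std::uint32_t mask = 0u - cond;        // all ones if cond, else zero
    return (b & mask) | (c & ~mask);       // select b or c without a branch
}

inline void demo() {
    assert(pick_branchy(11, 12, 5) == pick_converted(11, 12, 5));
}
```

If one of the arms contained a load from memory, the converted form would perform that load speculatively, which is where the questions about trapping memory accesses in the quoted list come in.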
__builtin_expect for indirect function calls
Hi,

I'm looking for comments on a possible GCC extension described below.

For the target I'm interested in, Cell SPU, taken branches are only predicted correctly by explicitly inserting a specific instruction (a hint instruction) that says "the branch at address A is branching to address B". This allows the processor to prefetch the instructions at B, potentially with no penalty.

For indirect function calls, the ideal case is we know the target soon enough at run-time that the hint instruction simply specifies the real target. Soon enough means about 18 cycles before the execution of the branch. I don't have any numbers as to how often this happens, but there are enough cases where it doesn't.

When we can't hint the real target, we want to hint the most common target. There are potentially clever ways for the compiler to do this automatically, but I'm most interested in giving the user some way to do it explicitly. One possibility is to have something similar to __builtin_expect, but for functions. For example, I propose:

    __builtin_expect_call (FP, PFP)

which returns the value of FP with the same type as FP, and tells the compiler that PFP is the expected target of FP. Trivial examples:

    typedef void (*fptr_t)(void);

    extern void foo(void);

    void
    call_fp (fptr_t fp)
    {
      /* Call the function pointed to by fp, but predict it as if it is
         calling foo() */
      __builtin_expect_call (fp, foo)();
    }

    void
    call_fp_predicted (fptr_t fp, fptr_t predicted)
    {
      /* same as above but the function we are calling doesn't have to be
         known at compile time */
      __builtin_expect_call (fp, predicted)();
    }

I believe I can add this just for the SPU target without affecting anything else, but it could be useful for other targets.

Are there any comments about the name, semantics, or usefulness of this extension?

Thanks,
Trevor