Re: GCC 5.1.1 Status Report (2015-06-22)
On Tue, 30 Jun 2015, Jason Merrill wrote:

> I'm interested in your thoughts on fixing c++/65945 in 5.2.
>
> It's trivial to fix the alignment of nullptr_t, but I was concerned about ABI
> impact. On further research it seems that it won't cause any trouble with
> argument alignment, since that doesn't seem to rely on TYPE_ALIGN at all; I
> think the only ABI breakage would come from unaligned nullptr_t fields in
> classes, which I expect to be very rare. The testcases that were breaking on
> SPARC and ARM without this fix have to do with local stack slots, which are
> not part of an interface.
>
> So I think we can change this without breaking a significant amount of code,
> and better to break it now than after we've settled into the new library ABI.
> We should certainly mention it prominently in the release notes if we do, and
> I've added a -Wabi warning for the field alignment change.
>
> Does this make sense to you?

Yes, that makes sense to me.

Richard.

> Jason

--
Richard Biener
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Dilip Upmanyu,
Graham Norton, HRB 21284 (AG Nuernberg)
Re: pa indirect_jump instruction
On Tue, Jun 30, 2015 at 09:53:31PM +0100, Richard Sandiford wrote:
> I have a series of patches to convert all non-optab instructions to
> the target-insns.def interface. config-list.mk showed up one problem
> though. The pa indirect_jump pattern is:
>
> ;;; Hope this is only within a function...
> (define_insn "indirect_jump"
>   [(set (pc) (match_operand 0 "register_operand" "r"))]
>   "GET_MODE (operands[0]) == word_mode"
>   "bv%* %%r0(%0)"
>   [(set_attr "type" "branch")
>    (set_attr "length" "4")])
>
> so the C condition depends on operands[], which isn't usually allowed
> for named patterns. We get away with it at the moment because we only
> test for the existence of HAVE_indirect_jump, not its value:

yeah, I hit this a while ago and filed bug 66114. It looks like I had
trouble with fr30 too, is that fixed now?

Trev
Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt
Has anyone had a chance to compare the FP implementation in compiler_rt? I
still wonder why the sizes differ so much. Is the implementation in
compiler_rt incomplete?
compiler_rt claims it is IEEE-compliant.

2015-06-30 23:10 GMT+03:00 Joseph Myers:
> On Tue, 30 Jun 2015, H.J. Lu wrote:
>
>> > soft-fp is expected to be used on 32-bit and 64-bit systems for which a
>> > few kB code size is insignificant.
>>
>> Size is very important for IA MCU. Would it be acceptable to update
>> soft-fp to optimize for size with
>>
>> #ifdef __OPTIMIZE_SIZE__
>> #else
>> #endif
>
> I don't think there's any low-hanging fruit for size optimization. If you
> wanted to save that 6 or 7 kB (total, across all the float and double code
> in libgcc, as compared to fp-bit or ieeelib and mentioned in the Summit
> paper) you'd structure the library completely differently, making no
> attempt to support rounding modes, exceptions, signs of NaNs, choice of
> NaN results or any particular choice for anything not specified in IEEE,
> and using common functions whenever appropriate for things such as
> unpacking / packing, or shared addition / subtraction code, instead of
> inlining everything with macros. The result would be a completely
> different library design that wouldn't have anything much to share with
> soft-fp. Indeed, such a library might best be written in assembly code
> for each architecture (much like the existing code for ARM in libgcc).
>
> It would not surprise me if carefully examining the code generated for
> soft-fp functions (possibly the final assembly code, possibly the
> optimized GIMPLE) would show up scope for a few microoptimizations in GCC
> where it could optimize the soft-fp code better, but I expect the effects
> of such microoptimizations would be fairly small.
>
> --
> Joseph S. Myers
> jos...@codesourcery.com
Re: gcc feature request / RFC: extra clobbered regs
On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>> I'm working on a massive set of cleanups to Linux's syscall handling.
>> We currently have a nasty optimization in which we don't save rbx,
>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>> This works, but it makes the code a huge mess. I'd rather save all
>> regs in asm and then call C code.
>>
>> Unfortunately, this will add five cycles (on SNB) to one of the
>> hottest paths in the kernel. To counteract it, I have a gcc feature
>> request that might not be all that crazy. When writing C functions
>> intended to be called from asm, what if we could do:
>>
>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>> "r15"))) void func(void);
>>
>> This will save enough pushes and pops that it could easily give us our
>> five cycles back and then some. It's also easy to be compatible with
>> old GCC versions -- we could just omit the attribute, since preserving
>> a register is always safe.
>>
>> Thoughts? Is this totally crazy? Is it easy to implement?
>>
>> (I'm not necessarily suggesting that we do this for the syscall bodies
>> themselves. I want to do it for the entry and exit helpers, so we'd
>> still lose the five cycles in the full fast-path case, but we'd do
>> better in the slower paths, and the slower paths are becoming
>> increasingly important in real workloads.)
>
> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> options, which allow to tweak the calling conventions; but it is per
> translation unit right now. It isn't clear which of these options
> you mean with the extra_clobber.
> I assume you are looking for a possibility to change this to be
> per-function, with caller with a different calling convention having to
> adjust for different ABI callee. To some extent, recent GCC versions
> do that automatically with -fipa-ra already - if some call used registers
> are not clobbered by some call and the caller can analyze that callee,
> it can stick values in such registers across the call.
> I'd say the most natural API for this would be to allow
> f{fixed,call-{used,saved}}-REG in target attribute.

One consequence of frequently changing the calling convention or register
usage per function could be a GCC slowdown. RA calculates a lot of data,
and recalculating it after something in the register usage convention
changes takes a lot of time.

Another consequence would be that RA fails to generate code in some cases
and, even worse, the failure might depend on the GCC version (I have
already seen PRs where RA worked for an asm in one GCC version because a
pseudo was replaced by an equivalent constant, and failed in another GCC
version where that did not happen).

Other than that, I don't see other complications with implementing such a
feature.
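The compatibility argument in Andy's quoted request (an older compiler can simply omit the attribute, because preserving a register is always safe) could look roughly like the sketch below. Both the `extra_clobber` attribute and the feature macro `HAVE_EXTRA_CLOBBER_ATTR` are hypothetical; as far as I know no GCC release accepts such an attribute, so the macro expands to nothing and the function keeps the normal callee-saved behaviour.

```c
/* Hypothetical: "extra_clobber" is the attribute proposed in this thread,
   and HAVE_EXTRA_CLOBBER_ATTR is an invented feature-test macro.  Neither
   is known to exist in any released compiler. */
#ifdef HAVE_EXTRA_CLOBBER_ATTR
#define EXTRA_CLOBBER(...) __attribute__((extra_clobber(__VA_ARGS__)))
#else
/* Fallback: drop the attribute.  The function then saves and restores
   the registers as usual, which is slower but always correct. */
#define EXTRA_CLOBBER(...)
#endif

int helper_calls; /* counts calls, only so the sketch has visible behaviour */

/* A helper meant to be called from asm that has already saved all regs. */
EXTRA_CLOBBER("rbx", "rbp", "r12", "r13", "r14", "r15")
void entry_helper(void)
{
    helper_calls++;
}
```

The design point is only that the annotation is an optimization hint for the callee's prologue and epilogue, so removing it cannot break a caller that already assumes nothing is preserved.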
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess. I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel. To counteract it, I have a gcc feature
>>> request that might not be all that crazy. When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some. It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts? Is this totally crazy? Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow to tweak the calling conventions; but it is per
>> translation unit right now. It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee. To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown. RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

Do you mean that RA precalculates things based on the calling
convention and saves it across functions? Hmm. I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails generate the code in some cases
> and even worse the failure might depend on version of GCC (I already saw PRs
> where RA worked for an asm in one GCC version because a pseudo was changed
> by equivalent constant and failed in another GCC version where it did not
> happen).
>

Would this be a problem generating code for a function with extra
"used" regs or just a problem generating code to call such a function?
I imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with. In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
> >>(I'm not necessarily suggesting that we do this for the syscall bodies
> >>themselves. I want to do it for the entry and exit helpers, so we'd
> >>still lose the five cycles in the full fast-path case, but we'd do
> >>better in the slower paths, and the slower paths are becoming
> >>increasingly important in real workloads.)
> >GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
> >options, which allow to tweak the calling conventions; but it is per
> >translation unit right now. It isn't clear which of these options
> >you mean with the extra_clobber.
> >I assume you are looking for a possibility to change this to be
> >per-function, with caller with a different calling convention having to
> >adjust for different ABI callee. To some extent, recent GCC versions
> >do that automatically with -fipa-ra already - if some call used registers
> >are not clobbered by some call and the caller can analyze that callee,
> >it can stick values in such registers across the call.
> >I'd say the most natural API for this would be to allow
> >f{fixed,call-{used,saved}}-REG in target attribute.
> >
> >
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown. RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

That is true. i?86/x86_64 is a switchable target, so at least for the case
of info computed for the callee with non-standard calling convention such
info can be computed just once when the function with such a target
attribute would be seen first.
But for the caller side, I agree not everything can be precomputed, if we
can't use e.g. regsets saved in the callee; as a single function can call
different functions with different ABIs.
But to some extent we have that already with -fipa-ra, don't we?

Jakub
Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt
The only idea I have about the size difference is this: the header
comments in many of the FP-emulation files in compiler_rt contain lines
like:

  // This file implements quad-precision soft-float addition ***with the
  IEEE-754 default rounding*** (to nearest, ties to even).

2015-07-01 16:59 GMT+03:00 Zinovy Nis:
> Had anyone a chance to compare FP implementation in compiler_rt? I
> still wonder why the sizes differ so much, Incomplete implementation
> in compiler_rt?
> compiler_rt claims it is IEEE-compliant.
>
> 2015-06-30 23:10 GMT+03:00 Joseph Myers :
>> On Tue, 30 Jun 2015, H.J. Lu wrote:
>>
>>> > soft-fp is expected to be used on 32-bit and 64-bit systems for which a
>>> > few kB code size is insignificant.
>>>
>>> Size is very important for IA MCU. Would it be acceptable to update
>>> soft-fp to optimize for size with
>>>
>>> #ifdef __OPTIMIZE_SIZE__
>>> #else
>>> #endif
>>
>> I don't think there's any low-hanging fruit for size optimization. If you
>> wanted to save that 6 or 7 kB (total, across all the float and double code
>> in libgcc, as compared to fp-bit or ieeelib and mentioned in the Summit
>> paper) you'd structure the library completely differently, making no
>> attempt to support rounding modes, exceptions, signs of NaNs, choice of
>> NaN results or any particular choice for anything not specified in IEEE,
>> and using common functions whenever appropriate for things such as
>> unpacking / packing, or shared addition / subtraction code, instead of
>> inlining everything with macros. The result would be a completely
>> different library design that wouldn't have anything much to share with
>> soft-fp. Indeed, such a library might best be written in assembly code
>> for each architecture (much like the existing code for ARM in libgcc).
>>
>> It would not surprise me if carefully examining the code generated for
>> soft-fp functions (possibly the final assembly code, possibly the
>> optimized GIMPLE) would show up scope for a few microoptimizations in GCC
>> where it could optimize the soft-fp code better, but I expect the effects
>> of such microoptimizations would be fairly small.
>>
>> --
>> Joseph S. Myers
>> jos...@codesourcery.com
Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt
On 01/07/15 16:34, Zinovy Nis wrote:
> The only idea on size difference I have is:
>
> headers text in many of FP-emulation files from compiler_rt contains lines
> like:
>
> // This file implements quad-precision soft-float addition ***with the
> IEEE-754 default rounding*** (to nearest, ties to even).
>

Nearest rounding and no exception flags.

In other words, they assume no fenv access.
Re: Code size issues on FP-emulation on libgcc compared to LLVM's compiler_rt
On Wed, 1 Jul 2015, Zinovy Nis wrote:

> Had anyone a chance to compare FP implementation in compiler_rt? I
> still wonder why the sizes differ so much, Incomplete implementation
> in compiler_rt?
> compiler_rt claims it is IEEE-compliant.

If you examine the implementation approaches, you will see that apart from
the compiler_rt code not being set up for rounding mode and exceptions
support (and in some cases, it can be hard to completely optimize generic
code as much as code that never has to consider those issues), it (for
addition) does normalization in one place, and swaps the arguments in one
place so as to know which has the larger magnitude, whereas soft-fp tries
to reduce the amount of processing in each case by only normalizing when
and to the extent needed and duplicating code for each choice of which
argument has the larger exponent (and having separate code for the case of
equal exponents).

--
Joseph S. Myers
jos...@codesourcery.com
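The "swap the arguments in one place" idea can be illustrated with a small sketch. This is not compiler_rt's actual source; the function names are made up here, and the code assumes `float` is IEEE-754 binary32 and that neither operand is a NaN. The point is that after one swap, every later step of an addition can assume the first operand has the larger magnitude, instead of carrying separate code for each ordering.

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret a float as its bit pattern (assumes IEEE-754 binary32). */
uint32_t float_bits(float f)
{
    uint32_t u;
    memcpy(&u, &f, sizeof u);
    return u;
}

/* Reorder two operands so that *a has the larger (or equal) magnitude.
   With the sign bit masked off, non-NaN binary32 patterns sort as
   unsigned integers in the same order as their magnitudes, so a single
   integer comparison is enough. */
void order_by_magnitude(uint32_t *a, uint32_t *b)
{
    const uint32_t abs_mask = 0x7fffffffu;

    if ((*b & abs_mask) > (*a & abs_mask)) {
        uint32_t tmp = *a;
        *a = *b;
        *b = tmp;
    }
}
```

A soft-fp style implementation would instead branch on which exponent is larger and duplicate the alignment code in each branch, which trades code size for less work per case.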
Re: gcc feature request / RFC: extra clobbered regs
On 07/01/2015 11:31 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 11:23:17AM -0400, Vladimir Makarov wrote:
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now. It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee. To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown. RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> That is true. i?86/x86_64 is a switchable target, so at least for the case
> of info computed for the callee with non-standard calling convention such
> info can be computed just once when the function with such a target
> attribute would be seen first.

Yes, a more clever approach could be used. We can calculate the info for a
specific calling convention, save it, and reuse it for functions with the
same attributes. The compilation speed will be OK even with the current
implementation if there are few calling convention changes.

> But for the caller side, I agree not everything can be precomputed, if we
> can't use e.g. regsets saved in the callee; as a single function can call
> different functions with different ABIs.
> But to some extent we have that already with -fipa-ra, don't we?

Yes, with -fipa-ra, if we have already seen the function, we know which
registers it actually clobbers. If we have not processed it yet, we assume
the worst case (every register that the calling convention allows a call
to clobber is clobbered).

Actually, this raises a question for me. Suppose we declare that a function
clobbers more than the calling convention allows, then use the function as
a value (assigning it to a variable or passing it as an argument), lose
track of it, and then call it. How can RA know what the call actually
clobbers? So for a function with these attributes we should either prohibit
using it as a value, make the attributes part of the function type, or at
least say that it is unsafe.

So now I see this as a *bigger problem* with this extension, although I
guess it already exists, since we have descriptions of different ABIs as
an extension.
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 1, 2015 at 10:35 AM, Vladimir Makarov wrote:
> Actually it raise a question for me. If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it. How can RA know what the call clobbers actually. So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.

I think it should be part of the type. This shouldn't compile:

  void func(void) __attribute__((used_reg("r12")));
  void (*x)(void);
  x = func;

--Andy
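For comparison, the existing ABI attributes on x86-64 (`ms_abi` and `sysv_abi`) already behave roughly the way Andy asks for: as far as I understand, GCC carries them in the function type, so a pointer has to repeat the attribute to match. The sketch below uses `ms_abi` only as a stand-in for the proposed `used_reg` attribute, which does not exist; `ALT_ABI`, `add_alt`, `alt_fn`, and `call_through_pointer` are names invented for this illustration, and on non-x86-64 targets the macro expands to nothing so the code still builds.

```c
/* ms_abi is a real GCC/Clang attribute on x86-64.  On other targets this
   sketch falls back to the default calling convention. */
#if defined(__GNUC__) && defined(__x86_64__)
#define ALT_ABI __attribute__((ms_abi))
#else
#define ALT_ABI
#endif

/* A function with a non-default calling convention. */
ALT_ABI int add_alt(int a, int b)
{
    return a + b;
}

/* The pointer type repeats the attribute, so an indirect call through it
   uses the same convention as the callee. */
typedef int (ALT_ABI *alt_fn)(int, int);

int call_through_pointer(int a, int b)
{
    alt_fn fp = add_alt; /* the pointer and function types agree */
    return fp(a, b);
}
```

Assigning `add_alt` to a plain `int (*)(int, int)` on x86-64 is the analogue of the `x = func;` case above; I would expect a diagnostic about incompatible pointer types there, but have not checked every compiler version.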
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
> Actually it raise a question for me. If we describe that a function
> clobbers more than calling convention and then use it as a value (assigning
> a variable or passing as an argument) and loosing a track of it and than
> call it. How can RA know what the call clobbers actually. So for the
> function with the attributes we should prohibit use it as a value or make
> the attributes as a part of the function type, or at least say it is unsafe.
> So now I see this as a *bigger problem* with this extension. Although I
> guess it already exists as we have description of different ABI as an
> extension.

Unfortunately target attribute is function decl attribute rather than
function type. And having more attributes affect switchable targets will be
non-fun.

Jakub
Re: gcc feature request / RFC: extra clobbered regs
On 07/01/2015 11:27 AM, Andy Lutomirski wrote:
> On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov wrote:
>>
>> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>>> We currently have a nasty optimization in which we don't save rbx,
>>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>>> This works, but it makes the code a huge mess. I'd rather save all
>>>> regs in asm and then call C code.
>>>>
>>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>>> hottest paths in the kernel. To counteract it, I have a gcc feature
>>>> request that might not be all that crazy. When writing C functions
>>>> intended to be called from asm, what if we could do:
>>>>
>>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>>> "r15"))) void func(void);
>>>>
>>>> This will save enough pushes and pops that it could easily give us our
>>>> five cycles back and then some. It's also easy to be compatible with
>>>> old GCC versions -- we could just omit the attribute, since preserving
>>>> a register is always safe.
>>>>
>>>> Thoughts? Is this totally crazy? Is it easy to implement?
>>>>
>>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>>> themselves. I want to do it for the entry and exit helpers, so we'd
>>>> still lose the five cycles in the full fast-path case, but we'd do
>>>> better in the slower paths, and the slower paths are becoming
>>>> increasingly important in real workloads.)
>>>
>>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>>> options, which allow to tweak the calling conventions; but it is per
>>> translation unit right now. It isn't clear which of these options
>>> you mean with the extra_clobber.
>>> I assume you are looking for a possibility to change this to be
>>> per-function, with caller with a different calling convention having to
>>> adjust for different ABI callee. To some extent, recent GCC versions
>>> do that automatically with -fipa-ra already - if some call used registers
>>> are not clobbered by some call and the caller can analyze that callee,
>>> it can stick values in such registers across the call.
>>> I'd say the most natural API for this would be to allow
>>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>> One consequence of frequent changing calling convention per function or
>> register usage could be GCC slowdown. RA calculates too many data and it
>> requires a lot of time to recalculate them after something in the register
>> usage convention is changed.
> Do you mean that RA precalculates things based on the calling
> convention and saves it across functions?

RA calculates a lot of info (register classes, class x class relations,
etc.) based on the register usage convention (fixed regs, call-used
registers, etc.). If the register usage convention has not changed since
the previous function was compiled, RA reuses the info. Otherwise, RA
recalculates it.

> Hmm. I don't think this
> would be a big problem in my intended use case -- there would only be
> a handful of functions using this extension, and they'd have very few
> non-asm callers.

Good. I guess it will be rarely used and people will tolerate some extra
compilation time.

>> Another consequence would be that RA fails generate the code in some cases
>> and even worse the failure might depend on version of GCC (I already saw PRs
>> where RA worked for an asm in one GCC version because a pseudo was changed
>> by equivalent constant and failed in another GCC version where it did not
>> happen).
>>
> Would this be a problem generating code for a function with extra
> "used" regs or just a problem generating code to call such a function.
> I imagine that, in the former case, RA's job would be easier, not
> harder, since there would be more registers to work with.

Sorry, I meant that the problem will mostly arise when the attributes
describe more fixed regs. If you describe more clobbered regs, they can
still be used by the allocator, which can spill/restore them (around
calls) when they cannot be used. Still, I think there will be some rare and
complicated cases where even describing only clobbered regs can make RA
fail in a function calling the function with additional clobbered regs.

> In practice, though, I think it would just end up changing the prologue
> and epilogue.
Re: gcc feature request / RFC: extra clobbered regs
On 07/01/2015 01:43 PM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it. How can RA know what the call clobbers actually. So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.

Making the attributes part of the type would probably create a lot of
issues too. Although I am not a front-end developer, I think it would be
hard to implement in the front end. Sticking fully to this approach, it
would be logical to describe this in the debug info as well (I am not sure
that is even possible).

Portability would be an issue too. It is hard to prevent a regular C
developer from assigning such a function to a variable, because it is OK on
his system while compilation of the same code may fail on another system.
Re: gcc feature request / RFC: extra clobbered regs
On Wed, Jul 1, 2015 at 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raise a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and loosing a track of it and than
>> call it. How can RA know what the call clobbers actually. So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
>
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.

Just to make sure we're on the same page here, if I write:

  extern void normal_func(void);

  void weird_func(void) __attribute__((used_regs("r12")))
  {
    // do something
    normal_func();
    // do something
  }

I'd want the code that calls normal_func() to understand that
normal_func() *will* preserve r12 despite the fact that weird_func is
allowed to clobber r12. I think this means that the attribute would
have to be an attribute of a function, not of the RA while compiling
the function.

--Andy
making the new if-converter not mangle IR that is already vectorizer-friendly
Dear all,

[Please feel free to skip to the second instance of "end of introductions"
and read the introduction sections later or never.]

Hi! My name is Abe. Although I'm from New York City, I've been living in
Texas for about 5 years now, due to having been "sucked in" to Texas by
Texas A&M University and staying in Texas for an excellent job at the
Samsung Austin R&D Center ["SARC"], where the compiler team of which I am a
part is working on GCC.

As some of you already know, at SARC we are working on a new "if converter"
to help convert simple "if"-based blocks of code that appear inside loops
into an autovectorizer-friendly form that closely resembles the C ternary
operator ["c ? x : y"]. GCC already has such a converter, but it is off by
default, in part because it is unsafe: if enabled, it can cause certain
code to be transformed in such a way that it malfunctions even though the
non-converted code worked just fine with the same inputs. The new
converter, originally by my teammate Sebastian Pop, is safer [almost-always
safe *]; we are working on getting it into good-enough shape that the
always-safe transformations can be turned on by default whenever the
autovectorizer is on.

* Always safe for stores, sometimes a little risky for loads: speculative
  loads might cause multithreaded programs with insufficient locking to
  fail due to writes by another thread being "lost"/"missed", even though
  the same program works OK "by luck" when compiled without if-conversion
  of loads. This risk comes mainly/only from what the relevant literature
  calls a "half hammock": an "if" with a "then" section but no "else"
  section [or effectively vice-versa, e.g. an empty "then" and a non-empty
  "else"]. In this case, e.g. "if (c) X[x] = Y[y];" with no attached "else"
  section is risky to fully if-convert in the event of the code being
  compiled running multithreaded and not having been written with all the
  locking it really needs. Respectively, e.g.
  "if (c) ; /* empty ''then'' */ else X[x] = Y[y];".

[end of introductions]

One of the reasons the new if converter has not yet been submitted for
incorporation into GCC's trunk is that it still has some performance
regressions WRT the old converter, and most of those are "true
regressions", i.e. not just because the old converter was less safe and the
additional safety is what is causing the loss, but rather because there is
more work to do before the patch is ready.

As of this writing, the new if converter sometimes tries to "convert"
something that is already vectorizer-friendly, and in doing so it renders
that code now-NOT-vectorizer-friendly. I think this is the first class of
true regression that I should fix. The question is how to do so.

My first choice was to try to have the if converter ask the vectorizer "is
this code already vectorizable?", but that seems to not be feasible.

The second choice is to use loop versioning to defer the decision to the
vectorizer itself. In other words, when the if converter sees what it
"thinks" is an opportunity to convert a loop, it will duplicate that loop
inside a new "if" and convert exactly one of the duplicates, producing
something like this:

  if (__vectorizable_0001__) {
    /* if-converted loop */
  } else {
    /* original loop */
  }

Under this plan, the vectorizer will be modified to detect code such as the
above, and when appropriate will check to see if the if-converted code is
vectorizable, then replacing e.g. "__vectorizable_0001__" with either (0)
or (1) as per the check, thus allowing dead-code elimination to clean up
the temporary mess.

Sebastian has suggested that we add something like
"tree ifConversion_condition;" to "struct loop" so that we can keep track
of e.g. "__vectorizable_0001__" and use it in the vectorizer. When there is
no relevant conversion, this field will be set to null. This seems like a
good plan to me.

I/we seek your feedback on the above.

Regards,

Abe
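To make the loop-versioning plan concrete, here is a source-level sketch of what the two copies of a loop might look like. It is only an analogy: the real transformation happens on GIMPLE inside the compiler, and the function names and the `vectorizable_flag` parameter (standing in for `__vectorizable_0001__`, which the vectorizer would fold to 0 or 1 at compile time) are invented for this illustration. The example uses a full hammock, where both arms store to the same location, so the converted form adds no new stores.

```c
#include <stddef.h>

/* Original loop: control flow inside the body. */
void select_branchy(int *a, const int *b, const int *d,
                    const unsigned char *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (c[i])
            a[i] = b[i];
        else
            a[i] = d[i];
    }
}

/* If-converted loop: straight-line body with one select and one store.
   Both b[i] and d[i] are now loaded on every iteration, which is the
   kind of speculative load the message describes as sometimes risky. */
void select_converted(int *a, const int *b, const int *d,
                      const unsigned char *c, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int then_val = b[i];
        int else_val = d[i];
        a[i] = c[i] ? then_val : else_val;
    }
}

/* Versioned loop: keep both copies and choose between them. */
void select_versioned(int *a, const int *b, const int *d,
                      const unsigned char *c, size_t n,
                      int vectorizable_flag)
{
    if (vectorizable_flag)
        select_converted(a, b, d, c, n); /* if-converted loop */
    else
        select_branchy(a, b, d, c, n);   /* original loop */
}
```

Both versions must compute the same result; versioning only postpones the choice until the vectorizer knows whether the converted copy is actually vectorizable, after which dead-code elimination removes the copy that was not chosen.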
gcc-4.9-20150701 is now available
Snapshot gcc-4.9-20150701 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150701/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options:
  svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 225283

You'll find:

  gcc-4.9-20150701.tar.bz2    Complete GCC

    MD5=2c52d82836abd52743fce577121aa0a2
    SHA1=dfaa9f98bb6e067d209e98ac5f5dcbb924e95f31

Diffs from 4.9-20150624 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
RFC: Add R_X86_64_INDBR_GOTPCREL and R_386_INDBR_GOT32
To avoid indirect branches to locally defined functions, I am proposing to add a new relocation, R_X86_64_INDBR_GOTPCREL, to the x86-64 psABI:

1. When branching to an external function, foo, the toolchain generates
     call/jmp *foo@GOTPCREL(%rip)
   with an R_X86_64_INDBR_GOTPCREL relocation, instead of
     call/jmp foo[@PLT]
2. When function foo is locally defined, the linker converts
     call/jmp *foo@GOTPCREL(%rip)
   to
     nop call/jmp foo
3. Otherwise, the linker treats R_X86_64_INDBR_GOTPCREL the same way as R_X86_64_GOTPCREL.

For the i386 psABI, we add R_386_INDBR_GOT32:

1. When branching to an external function, foo, in non-PIC mode, the toolchain generates
     call/jmp *foo@GOT
   with an R_386_INDBR_GOT32 relocation, instead of
     call/jmp foo
   and in PIC mode
     call/jmp *foo@GOT(%reg)
   with an R_386_INDBR_GOT32 relocation, where REG holds the address of the GOT, instead of
     call/jmp foo@PLT
2. When function foo is locally defined, the linker converts
     call/jmp *foo@GOT[(%reg)]
   to
     nop call/jmp foo
3. Otherwise,
   a. In PIC mode, the linker treats R_386_INDBR_GOT32 the same way as R_386_GOT32, and "call/jmp *foo@GOT" is unsupported.
   b. In non-PIC mode, the linker computes its relocation value as the relocation value of R_386_GOT32 plus the address of the GOT, and converts call/jmp *foo@GOT(%reg) to call/jmp *foo@GOT if needed.

This new relocation effectively turns off lazy binding for function foo. For i386, the compiler is free to choose any register to hold the address of the GOT, and there is no need to make EBX a fixed register when branching to an external function in PIC mode.

With this new relocation, the only overhead is a one-byte NOP prefix, added when function foo, which the compiler determined to be external, turns out to be local at link time, because of -Bsymbolic or a definition in another input object file that the compiler has no knowledge of.

The new -fno-plt GCC option can use the R_X86_64_INDBR_GOTPCREL and R_386_INDBR_GOT32 relocations, if the linker supports them, to avoid indirect branches to internal functions.
For x86-64 GCC, it is implemented in the assembler and linker. The assembler should generate an R_X86_64_INDBR_GOTPCREL relocation, instead of an R_X86_64_GOTPCREL relocation, for "call/jmp *foo@GOTPCREL(%rip)".

For i386 GCC, most of it is implemented in the assembler and linker. The assembler should generate an R_386_INDBR_GOT32 relocation, instead of an R_386_GOT32 relocation, for "call/jmp *foo@GOT(%reg)". GCC also needs to be modified to generate "call/jmp *foo@GOT" in non-PIC mode.

H.J.
rl78 vs cse vs memory_address_addr_space
Consider this bit of code in explow.c:

    /* By passing constant addresses through registers we get a chance to cse them. */
    if (! cse_not_expected && CONSTANT_P (x) && CONSTANT_ADDRESS_P (x))
      x = force_reg (address_mode, x);

On the rl78, it results in code that's a bit too complex for later passes to optimize fully. Is there any way to indicate that the above force_reg() is bad for a particular target?
Re: gcc feature request / RFC: extra clobbered regs
On 07/01/2015 10:43 AM, Jakub Jelinek wrote:
> On Wed, Jul 01, 2015 at 01:35:16PM -0400, Vladimir Makarov wrote:
>> Actually it raises a question for me. If we describe that a function
>> clobbers more than calling convention and then use it as a value (assigning
>> a variable or passing as an argument) and losing track of it and then
>> call it. How can RA know what the call clobbers actually? So for the
>> function with the attributes we should prohibit use it as a value or make
>> the attributes as a part of the function type, or at least say it is unsafe.
>> So now I see this as a *bigger problem* with this extension. Although I
>> guess it already exists as we have description of different ABI as an
>> extension.
>
> Unfortunately target attribute is function decl attribute rather than
> function type. And having more attributes affect switchable targets will be
> non-fun.
>

How on Earth does that work with existing switchable ABIs? Keep in mind that we already support multiple ABIs...

-hpa