Re: Can shrink-wrapping ever move prologue past an ASM statement?
Hi,

On Tue, Jul 07, 2015 at 01:44:15PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 07, 2015 at 07:53:49PM +0200, Martin Jambor wrote:
> > I've been asked to look into the item one of
> > http://permalink.gmane.org/gmane.linux.kernel/1990397 and found out
> > that at least shrink-wrapping happily moves prologue past an asm
> > statement which can be bad if the asm statement contains a call
> > instruction.
> >
> > Am I right concluding that this is a bug? Looking into the manual and
> > at requires_stack_frame_p() in shrink-wrap.c, I do not see any obvious
> > way of marking the asm statement as requiring the stack frame (but I
> > will not mind being proven wrong). Do we want to create one, such as
> > only disallowing moving prologue past volatile asm statements? Any
> > other ideas?
>
> For architectures like PowerPC where all calls clobber some register,
> you can write e.g.
>
>   asm("bl func" : : : "lr");
>
> and all is well (better than saving/restoring LR manually, too).
>
>
> For other archs, e.g. x86-64, you can do
>
>   register void *sp asm("%sp");
>   asm volatile("call func" : "+r"(sp));
>

Thanks, this is very clever and apparently what is going to be used:
https://lkml.org/lkml/2015/7/7/1071

> and the result seems to be optimal as well.
>
>
> Some special clobber, maybe "stack" (like "memory", which won't work)
> could be nicer? What should the *exact* semantics of that be?
>

Well, I only have had a quick look at where things "go wrong" and have
not spent much time thinking about all the things that could break by
moving an asm statement before stack frame setup so I don't know. I'm
CCing Josh who has been bitten by this, perhaps he can share some
insight.

Nevertheless, given that this was discovered only now, it might not be
a common problem but of course it may become one if we improve
shrink-wrapping further. Using a register variable seems like a
workaround, so having a "stack" input operand is probably worth some
further thought.

Martin
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Tue, Jul 07, 2015 at 02:25:34PM -0600, Jeff Law wrote:
> On 07/07/2015 11:53 AM, Martin Jambor wrote:
> >Hi,
> >
> >I've been asked to look into the item one of
> >http://permalink.gmane.org/gmane.linux.kernel/1990397 and found out
> >that at least shrink-wrapping happily moves prologue past an asm
> >statement which can be bad if the asm statement contains a call
> >instruction.
> >
> >Am I right concluding that this is a bug? Looking into the manual and
> >at requires_stack_frame_p() in shrink-wrap.c, I do not see any obvious
> >way of marking the asm statement as requiring the stack frame (but I
> >will not mind being proven wrong). Do we want to create one, such as
> >only disallowing moving prologue past volatile asm statements? Any
> >other ideas?
> Shouldn't this be driven by dataflow?

Sure, the correct way of handling this internally would be to put the
stack pointer register into the use set of the asm statement. Please
disregard my mention of volatile asms, if that is what you are
referring to, that indeed does not seem wise. But see the other part
of this thread about a possible "stack" input operand.

Thanks,

Martin
Re: Question about always executed info computed in tree-ssa-loop-im.c
On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng wrote:
> Hi,
> Function fill_always_executed_in_1 computes basic blocks' always
> executed information, and it has below code and comment:
>
>   /* In a loop that is always entered we may proceed anyway.
>      But record that we entered it and stop once we leave it. */
>   inn_loop = bb->loop_father;
>
> Then in following iterations, it breaks the loop if basic block not
> belonging to the inner loop is encountered. This means basic blocks
> after inner loop won't have always executed information computed, even
> they dominates the original loop's latch.
>
> Am I missing something? Why is that?

To improve here it would need to verify that all exits of the inner loop
exit to the same BB of the outer loop and that no exit skips any blocks
in the outer loop. Like for

  for (;;)
    {
      for (;;) { if (x) goto skip; if (y) break; }
      foo();
  skip: ;
    }

so it is just a simple and conservative algorithm it seems. It's also
quadratic in the number of BBs of the outermost loop.

> Thanks,
> bin
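
To make the example above concrete, here is a minimal, bounded C sketch of the same loop nest. It is only an illustration, not GCC code: the function name `count_foo_calls`, the per-iteration condition arrays, and the inner bound `j >= 2` are assumptions added so that the code terminates and can be checked. It shows that the block after the inner loop is not reached on every outer iteration when one inner exit jumps straight to `skip`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, bounded variant of the loop nest discussed above.
   x[i] and y[i] are the conditions seen on outer iteration i.
   Returns how often the block after the inner loop (the "foo()"
   position) is actually executed. */
int count_foo_calls(const bool *x, const bool *y, int n)
{
    assert(x && y && n >= 0);

    int foo_calls = 0;
    for (int i = 0; i < n; i++) {
        for (int j = 0; ; j++) {
            if (x[i])
                goto skip;      /* this exit bypasses the foo() position */
            if (y[i] || j >= 2) /* j >= 2 only bounds the sketch */
                break;          /* this exit reaches the foo() position */
        }
        foo_calls++;            /* stands in for foo() */
    skip:
        ;
    }
    return foo_calls;
}
```

With `x = {false, true, false}` and `y = {true, false, false}` the second outer iteration takes the `goto`, so the counter ends at 2 rather than 3, which is why that block cannot be marked as always executed.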
Re: making the new if-converter not mangle IR that is already vectorizer-friendly
Abe wrote:

[snip]

> Predication of instructions can help to remove the burden of the
> conditional branch, but is not available on all relevant architectures.
> In some architectures that are typically implemented in modern
> high-speed processors -- i.e. with high core frequency, caches,
> speculative execution, etc. -- there is not full support for
> predication [as there is, for example, in 32-bit ARM] but there _is_
> support for "conditional move" [hereinafter "cmove"]. If/when the ISA
> in question supports cmove to/from main memory, perhaps it would be
> profitable to use two cmoves back-to-back with opposite conditions and
> the same register [destination for load, source for store] to implement
> e.g. "temp = c ? X[x] : Y[y]" and/or "temp = C[c] ? X[x] : Y[y]" etc.
> Even without cmove to/from main memory, two cmoves back-to-back with
> opposite conditions could still be used, e.g. [not for a real-world ISA]:
>
>   load X[x] -> reg1
>   load Y[y] -> reg2
>   cmove c ? reg1 -> reg3
>   cmove (!c) ? reg2 -> reg3
>
> Or even better if the ISA can support something like:
>
>   load X[x] -> reg1
>   load Y[y] -> reg2
>   cmove (c ? reg1 : reg2) -> reg3
>
> However, this is a code-gen issue from my POV, and at the present time
> all the if-conversion work is occurring at the GIMPLE level. If anybody
> reading this knows how I could make the if converter generate GIMPLE
> that leads to code-gen that is better for at least one ISA

Ramana writes that your patch for PR46029 generates more 'csel's (i.e.
the second form you write) for AArch64:
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01721.html

of course this says nothing about whether there is *some* other ISA
that gets regressed!

HTH
Alan
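
For readers who want the two shapes side by side, here is a small C sketch of "temp = c ? X[x] : Y[y]" written once with a branch and once in the load-both-then-select form quoted above. The function names are made up for illustration, and nothing here guarantees that a given compiler emits cmove or csel for the second form; note also that the second form is only valid when both loads are safe to execute.

```c
#include <assert.h>
#include <stddef.h>

/* Branching form: only one of the two loads is executed. */
int select_branchy(int c, const int *X, size_t x, const int *Y, size_t y)
{
    assert(X && Y);
    if (c)
        return X[x];
    return Y[y];
}

/* Flattened form: both loads run unconditionally, then one value is
   selected.  This mirrors the "load, load, select" pseudo-assembly
   above; it is only legal if both X[x] and Y[y] may be read. */
int select_flat(int c, const int *X, size_t x, const int *Y, size_t y)
{
    assert(X && Y);
    int reg1 = X[x];
    int reg2 = Y[y];
    return c ? reg1 : reg2;
}
```

Both functions return the same value for in-bounds indices; the difference is only in which memory accesses are performed.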
Re: Question about always executed info computed in tree-ssa-loop-im.c
On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener wrote: > On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng wrote: >> Hi, >> Function fill_always_executed_in_1 computes basic blocks' always >> executed information, and it has below code and comment: >> >> /* In a loop that is always entered we may proceed anyway. >> But record that we entered it and stop once we leave it. */ >> inn_loop = bb->loop_father; >> >> Then in following iterations, it breaks the loop if basic block not >> belonging to the inner loop is encountered. This means basic blocks >> after inner loop won't have always executed information computed, even >> they dominates the original loop's latch. >> >> Am I missing something? Why is that? > > To improve here it would need to verify that all exits of the inner loop > exit to the same BB of the outer loop and that no exit skips any blocks > in the outer loop. Like for But we are working on dominating tree anyway. Won't dominating latch be enough? Thanks, bin > > for (;;) > { > for (;;) { if (x) goto skip; if (y) break } > foo(); > skip: > } > > so it is just a simple and conservative algorithm it seems. It's also > quadratic in the number of BBs of the outermost loop. > >> Thanks, >> bin
Re: Question about always executed info computed in tree-ssa-loop-im.c
On Wed, Jul 8, 2015 at 5:58 PM, Bin.Cheng wrote: > On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener > wrote: >> On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng wrote: >>> Hi, >>> Function fill_always_executed_in_1 computes basic blocks' always >>> executed information, and it has below code and comment: >>> >>> /* In a loop that is always entered we may proceed anyway. >>> But record that we entered it and stop once we leave it. */ >>> inn_loop = bb->loop_father; >>> >>> Then in following iterations, it breaks the loop if basic block not >>> belonging to the inner loop is encountered. This means basic blocks >>> after inner loop won't have always executed information computed, even >>> they dominates the original loop's latch. >>> >>> Am I missing something? Why is that? >> >> To improve here it would need to verify that all exits of the inner loop >> exit to the same BB of the outer loop and that no exit skips any blocks >> in the outer loop. Like for > But we are working on dominating tree anyway. Won't dominating latch be > enough? > > Thanks, > bin >> >> for (;;) >> { >> for (;;) { if (x) goto skip; if (y) break } >> foo(); >> skip: >> } >> >> so it is just a simple and conservative algorithm it seems. It's also >> quadratic in the number of BBs of the outermost loop. What we need to do is check if inner loop's exit goes to basic block not belonging to outer loop? Thanks, bin
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
> > For other archs, e.g. x86-64, you can do
> >
> >   register void *sp asm("%sp");
> >   asm volatile("call func" : "+r"(sp));

> Well, I only have had a quick look at where things "go wrong" and have
> not spent much time thinking about all the things that could break by
> moving an asm statement before stack frame setup so I don't know.

What goes wrong is that the "plain" asm (just the call) does not say it
needs the prologue of this function to be before it. But it needs some
of its effects: at least it needs the stack pointer changes (by the
prologue) to be done before it.

Writing the asm with a clobber of the stack pointer causes all stack
accesses to go via the frame pointer, which causes pretty horrible
code.

Using just an input of the stack pointer, like in

  register void *sp asm("%sp");
  asm volatile("call func" : : "r"(sp));

also works, at least in my simple test case; but the version that also
writes to sp prevents the asm's movement more.

> Nevertheless, given that this was discovered only now, it might not be
> a common problem but of course it may become one if we improve
> shrink-wrapping further.

GCC 5 and/or GCC 6 make shrink wrapping more aggressive, quite nice.

> Using a register variable seems like a workaround,

That is only a workaround for the fact that there is no other way to
say "this input should be in this specific register" (in general;
some ports have such constraints for some registers).

Having the asm take the stack pointer as input is hackish yes.

> so having a "stack" input operand is probably worth some
> further thought.

It would be nice if you could just say "this asm has to stay behind
the prologue", yes. I suggested using a special clobber, not an input,
but that is just syntax quibbles.

The question is if this usage is common enough to warrant a special
syntax at all.


Segher
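
As a rough, hedged illustration of the "input of the stack pointer" form described above, the sketch below names the stack pointer as an input operand of an asm statement. Several things here are assumptions and not from the thread: the asm template is left empty instead of containing a call (so the example has no calling-convention hazards), the register is spelled "rsp", the helper name `sum_squares` is invented, and the block is compiled only for GCC on x86-64, falling back to plain C elsewhere. Whether this is sufficient for a real call site has to be tested, as the message itself says.

```c
#include <assert.h>

/* Hypothetical helper: uses a small on-stack buffer so the function
   has a reason to set up a stack frame. */
int sum_squares(int n)
{
    assert(n >= 0 && n <= 16);

    volatile int buf[16];
    for (int i = 0; i < n; i++)
        buf[i] = i * i;

#if defined(__GNUC__) && !defined(__clang__) && defined(__x86_64__)
    /* Bind a local register variable to the stack pointer and pass it
       as an asm input, so the asm is seen to depend on sp.  The empty
       template is an assumption of this sketch; the thread's version
       contains "call func". */
    register void *sp __asm__("rsp");
    __asm__ volatile("" : : "r"(sp));
#endif

    int total = 0;
    for (int i = 0; i < n; i++)
        total += buf[i];
    return total;
}
```

The asm emits no instructions, so the function's result is the same on every target; only the dependency on the stack pointer that the compiler sees differs.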
Re: Question about always executed info computed in tree-ssa-loop-im.c
On Wed, Jul 8, 2015 at 12:01 PM, Bin.Cheng wrote:
> On Wed, Jul 8, 2015 at 5:58 PM, Bin.Cheng wrote:
>> On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener
>> wrote:
>>> On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng wrote:
>>>> Hi,
>>>> Function fill_always_executed_in_1 computes basic blocks' always
>>>> executed information, and it has below code and comment:
>>>>
>>>>   /* In a loop that is always entered we may proceed anyway.
>>>>      But record that we entered it and stop once we leave it. */
>>>>   inn_loop = bb->loop_father;
>>>>
>>>> Then in following iterations, it breaks the loop if basic block not
>>>> belonging to the inner loop is encountered. This means basic blocks
>>>> after inner loop won't have always executed information computed, even
>>>> they dominates the original loop's latch.
>>>>
>>>> Am I missing something? Why is that?
>>>
>>> To improve here it would need to verify that all exits of the inner loop
>>> exit to the same BB of the outer loop and that no exit skips any blocks
>>> in the outer loop. Like for
>> But we are working on dominating tree anyway. Won't dominating latch be
>> enough?

Hmm, yes.

>> Thanks,
>> bin
>>>
>>> for (;;)
>>> {
>>> for (;;) { if (x) goto skip; if (y) break }
>>> foo();
>>> skip:
>>> }
>>>
>>> so it is just a simple and conservative algorithm it seems. It's also
>>> quadratic in the number of BBs of the outermost loop.
> What we need to do is check if inner loop's exit goes to basic block
> not belonging to outer loop?

Doesn't

      FOR_EACH_EDGE (e, ei, bb->succs)
        if (!flow_bb_inside_loop_p (loop, e->dest))
          break;
      if (e)
        break;

catch this?

Richard.

> Thanks,
> bin
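
The edge check quoted above relies on GCC-internal data structures. As a self-contained toy model of the same idea, the sketch below asks whether a block has any successor edge that leaves a given loop. Everything in it is hypothetical: `struct toy_bb`, the bitmask representation of loop membership, and the function name are inventions for illustration and do not correspond to GCC's `basic_block`, `FOR_EACH_EDGE` or `flow_bb_inside_loop_p` implementations.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_MAX_SUCCS 2

/* Toy basic block: just a list of successor block indices. */
struct toy_bb {
    int nsuccs;
    int succs[TOY_MAX_SUCCS];
};

/* Returns true if block 'bb' has at least one successor that is not a
   member of the loop.  Loop membership is a bitmask: bit i set means
   block i belongs to the loop (so at most 32 blocks in this sketch). */
bool has_edge_leaving_loop(const struct toy_bb *bbs, int bb,
                           unsigned loop_mask)
{
    assert(bbs && bb >= 0 && bb < 32);

    for (int i = 0; i < bbs[bb].nsuccs; i++) {
        int dest = bbs[bb].succs[i];
        assert(dest >= 0 && dest < 32);
        if (!(loop_mask & (1u << dest)))
            return true;    /* found an edge that exits the loop */
    }
    return false;
}
```

For a loop made of blocks 0, 1 and 2 (mask `0x7`) where block 1 also branches to block 3, only block 1 is reported as having an exiting edge.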
CFG transformation of loops with continue statement inside the loops.
All:

While/For (condition1)
{
  Some code here.
  If (condition2)
    continue;
  Some code here.
}

Fig(1)

For the above loop in Fig(1) there will be two backedges and multiple
latches. The above code can be transformed to the code below in order
to have a single backedge.

While/For (condition1)
{
  Some code here.
  If (condition2)
    Goto latch;
  Some code here.
Latch:
}

Fig(2)

With the transformation shown in Fig(2), the presence of the Goto inside
the loop affects and disables many optimizations. Alternatively, the
loop in Fig(1) can be transformed into multiple loops, one per backedge;
that would enable the optimizations that are disabled by the Goto in
Fig(2) and by the multiple latches in Fig(1).

Thoughts?

Thanks & Regards
Ajit
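
To make Fig(1) and Fig(2) concrete, here is a small C sketch of both shapes. The function names, the array input and the "skip negative elements" condition are assumptions chosen only so the two loops can be compared; the point is that the `continue` form and the `goto latch` form compute the same result, while the second one has a single point from which control returns to the loop header.

```c
#include <assert.h>

/* Fig(1) shape: in a while loop, 'continue' jumps back to the loop
   condition, and so does the end of the body, giving two back edges. */
int sum_with_continue(const int *a, int n)
{
    assert(a && n >= 0);
    int sum = 0, i = 0;
    while (i < n) {
        int v = a[i++];
        sum += 1;           /* "some code here" */
        if (v < 0)          /* condition2 */
            continue;
        sum += v;           /* "some code here" */
    }
    return sum;
}

/* Fig(2) shape: the early exit from the body goes to a single latch
   label, so only one back edge returns to the header. */
int sum_with_latch(const int *a, int n)
{
    assert(a && n >= 0);
    int sum = 0, i = 0;
    while (i < n) {
        int v = a[i++];
        sum += 1;
        if (v < 0)
            goto latch;
        sum += v;
    latch:
        ;
    }
    return sum;
}
```

For the input `{3, -1, 4, -5, 2}` both functions return 14 (five iterations plus 3 + 4 + 2).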
Re: [AD] UltraGDB, an alternative tool to debug GCC, GDB, LLDB, etc. on Windows and Linux
On Wed, Jul 8, 2015 at 3:18 AM, Sergio Durigan Junior wrote: > > I tried to find the source code of this package, but I could not find > it. Do you have a URL or something you can provide? > Source code is now available at https://github.com/ultragdb/org.eclipse.cdt -- Xu,Chiheng(徐持恒)
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote: > On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote: > > > For other archs, e.g. x86-64, you can do > > > > > > register void *sp asm("%sp"); > > > asm volatile("call func" : "+r"(sp)); > > > > > Well, I only have had a quick look at where things "go wrong" and have > > not spent much time thinking about all the things that coud break by > > moving an asm statement before stack frame setup so I don't know. > > What goes wrong is that the "plain" asm (just the call) does not say it > needs the prologue of this function to be before it. But it needs some > of its effects: at least it needs the stack pointer changes (by the > prologue) to be done before it. > > Writing the asm with a clobber of the stack pointer causes all stack > accesses to go via the frame pointer, which causes pretty horrible > code. As far as I can tell, most (but not all) kernel stack accesses already occur via the frame pointer anyway. Is that bad? Can you clarify what you mean by horrible code? > Using just an input of the stack pointer, like in > > register void *sp asm("%sp"); > asm volatile("call func" : : "r"(sp)); > > also works, at least in my simple test case; but the version that also > writes to sp prevents the asm's movement more. > > > Nevertheless, given that this was discovered only now, it might not be > > a common problem but of course it may become one if we improve > > shrink-wrapping further. > > GCC 5 and/or GCC 6 make shrink wrapping more aggressive, quite nice. > > > Using a register variable seems like a workaround, > > That is only a workaround for the fact that there is no other way to > say "this input should be in this specific register" (in general; > some ports have such constraints for some registers). > > Having the asm take the stack pointer as input is hackish yes. > > > so having a "stack" input operand is probably worth some > > further thought. 
> > It would be nice if you could just say "this asm has to stay behind > the prologue", yes. I suggested using a special clobber, not an input, > but that is just syntax quibbles. > > The question is if this usage is common enough to warrant a special > syntax at all. In the kernel it only affects a handful of code locations which are mostly wrapped in macros, so I don't really see the need for a special syntax. IMO, the existing syntax is fine for the kernel. Thanks! -- Josh
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> > Writing the asm with a clobber of the stack pointer causes all stack
> > accesses to go via the frame pointer, which causes pretty horrible
> > code.
>
> As far as I can tell, most (but not all) kernel stack accesses already
> occur via the frame pointer anyway. Is that bad? Can you clarify what
> you mean by horrible code?

If you already have a frame pointer anyway it won't get much worse,
sure :-)

If you don't have one yet you will get one, and a stack frame, and
all that boilerplate. You don't want to force that if you don't
need it.


Segher
Re: making the new if-converter not mangle IR that is already vectorizer-friendly
[Abe wrote:]

[snip]

> Even without cmove to/from main memory, two cmoves back-to-back with
> opposite conditions could still be used, e.g. [not for a real-world ISA]:
>
>   load X[x] -> reg1
>   load Y[y] -> reg2
>   cmove c ? reg1 -> reg3
>   cmove (!c) ? reg2 -> reg3
>
> Or even better if the ISA can support something like:
>
>   load X[x] -> reg1
>   load Y[y] -> reg2
>   cmove (c ? reg1 : reg2) -> reg3
>
> However, this is a code-gen issue from my POV, and at the present time
> all the if-conversion work is occurring at the GIMPLE level. If anybody
> reading this knows how I could make the if converter generate GIMPLE
> that leads to code-gen that is better for at least one ISA

[Alan wrote:]

> Ramana writes that your patch for PR46029 generates more 'csel's (i.e.
> the second form you write) for AArch64:
> https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01721.html

Thanks to Ramana for doing that analysis and thanks to Alan for informing me.

> of course this says nothing about whether there is *some* other ISA
> that gets regressed!

After finishing fixing the known regressions, I intend/plan to reg-test
for AArch64; after that, I think I'm going to need some community help
to reg-test for other ISAs.

Regards,

Abe
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 11:37:35AM -0500, Segher Boessenkool wrote:
> On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> > > Writing the asm with a clobber of the stack pointer causes all stack
> > > accesses to go via the frame pointer, which causes pretty horrible
> > > code.
> >
> > As far as I can tell, most (but not all) kernel stack accesses already
> > occur via the frame pointer anyway. Is that bad? Can you clarify what
> > you mean by horrible code?
>
> If you already have a frame pointer anyway it won't get much worse,
> sure :-)
>
> If you don't have one yet you will get one, and a stack frame, and
> all that boilerplate. You don't want to force that if you don't
> need it.

Ah. We're only clobbering the stack pointer when asm() has a call
instruction. In that case we need to save and setup the frame pointer
anyway. So I don't think it's a problem.

--
Josh
Re: X18 on AArch64
On 07.07.2015 at 23:01, pins...@gmail.com wrote:
> What does the elf abi say about x18, I thought it was just another temp. If
> the target does not use it as a platform reg. Note there are many assembly
> files which might use x18 also due to that.
>
> So this means you need wrapper functions when moving between the different
> abis. Nothing much can be done in gcc really. Please push this back to wine
> instead.
>
> Also I think windows had a bad choice of using x18 when there was a system
> register already for tls.
>
> Thanks,
> Andrew

Hi,
I guess I haven't explained the problem well enough.
The problem already starts with Wine's environment, which includes things we
can't do anything about, like the loader/dynamic linker...
Using wrapper functions in Wine is not feasible; that'd be a massive
infrastructure change to a huge and old codebase, would need a lot of work
and in the end wouldn't get upstream.

What's needed is that distributions don't use x18 by default, so we need
a good patch for that, hence I ask for a usable chain register (X8-X15?)
Truly it would be better to upstream that to gcc, but I don't have much hope
left.

For the ELF ABI thing, I didn't find specific calling conventions or
register usage description in it...
For the windows thing, good choices are not part of their development process ;)
Re: X18 on AArch64
> On Jul 8, 2015, at 11:39 AM, André Hentschel wrote:
>
>> On 07.07.2015 at 23:01, pins...@gmail.com wrote:
>> What does the elf abi say about x18, I thought it was just another temp. If
>> the target does not use it as a platform reg. Note there are many assembly
>> files which might use x18 also due to that.
>>
>> So this means you need wrapper functions when moving between the different
>> abis. Nothing much can be done in gcc really. Please push this back to wine
>> instead.
>>
>> Also I think windows had a bad choice of using x18 when there was a system
>> register already for tls.
>>
>> Thanks,
>> Andrew
>
> Hi,
> I guess i haven't explained the problem too good.
> The problem already starts with Wines environment which includes things we
> can't do something about like the loader/dynamic linker...
> Using wrapper functions in Wine is not feasible, that'd be a massive
> infrastructure change to huge and old codebase, would need a lot of work
> and in the end won't get upstream.

So what. Changing gcc and distros for one specific thing would be a step
backwards. Changing wine to better support huge differences in abi would
be better.

> What's needed is, is that distributions don't use x18 by default, so we need
> a good patch for that, hence i ask for a usable chain register (X8-X15?)

I think this is a bad idea because it would mean old code will no longer work.

> Truly it would be better to upstream that to gcc, but I don't have much hope
> left.
> For the ELF ABI thing, i didn't found specific calling conventions or
> register usage description in it...

It is part of the AAPCS calling convention that ARM provides.

Thanks,
Andrew

> For the windows thing, good choices are not part of their development process
> ;)
>
libgomp: gomp_ptrlock*() specification
Hello,

from the use cases in libgomp it seems that you may set the value of a
ptrlock object at most once via gomp_ptrlock_set() if it wasn't already
initialized to a non-NULL value via gomp_ptrlock_init()? Is this correct?

Why is there a specialized Linux implementation? Is this to save one extra
integer for the mutex?

--
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.
Re: libgomp: gomp_ptrlock*() specification
On Wed, Jul 08, 2015 at 09:36:43PM +0200, Sebastian Huber wrote:
> from the use cases in libgomp it seems that you may set the value of a
> ptrlock object at most once via gomp_ptrlock_set() if it wasn't already
> initialized to a non-NULL value via gomp_ptrlock_init()? Is this correct?

Please see
https://gcc.gnu.org/ml/gcc-patches/2008-03/msg01278.html
for details.

> Why is there a specialized Linux implementation? Is this to save one extra
> integer for the mutex?

Not just that, it is simply significantly more efficient, which is very
important for such a frequently used synchronization primitive.
You are free to define it for some other target also efficiently.

	Jakub
s390: larl for Simode on 64-bit
Is there any reason that LARL can't be used to load a 32-bit symbolic
value, in 64-bit mode? On TPF (64-bit) the app has the option of
being loaded in the first 4Gb so that all symbols are also valid
32-bit addresses, for backward compatibility. (and if not, the linker
would complain)

Index: s390.md
===================================================================
--- s390.md	(revision 225579)
+++ s390.md	(working copy)
@@ -1845,13 +1845,13 @@
   emit_symbolic_move (operands);
 })

 (define_insn "*movsi_larl"
   [(set (match_operand:SI 0 "register_operand" "=d")
         (match_operand:SI 1 "larl_operand" "X"))]
-  "!TARGET_64BIT && TARGET_CPU_ZARCH
+  "TARGET_CPU_ZARCH
    && !FP_REG_P (operands[0])"
   "larl\t%0,%1"
    [(set_attr "op_type" "RIL")
     (set_attr "type"    "larl")
     (set_attr "z10prop" "z10_fwd_A1")])
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
> > > > For other archs, e.g. x86-64, you can do
> > > >
> > > >   register void *sp asm("%sp");
> > > >   asm volatile("call func" : "+r"(sp));

I've found that putting "sp" in the clobber list also seems to work:

  asm volatile("call func" : : : "sp");

This syntax is nicer because it doesn't need a local variable associated
with the register. Do you see any issues with this approach?

--
Josh
Re: s390: larl for Simode on 64-bit
On 07/08/2015 02:33 PM, DJ Delorie wrote:
> Is there any reason that LARL can't be used to load a 32-bit symbolic
> value, in 64-bit mode? On TPF (64-bit) the app has the option of
> being loaded in the first 4Gb so that all symbols are also valid
> 32-bit addresses, for backward compatibility. (and if not, the linker
> would complain)

It would seem that we'd want the compiler to know when the app is going
to be loaded into that first 4G so that it can use the more efficient
addressing modes.

Jeff
Re: s390: larl for Simode on 64-bit
In the TPF case, the software has to explicitly mark such pointers as
SImode (such things happen only when structures that contain addresses
can't change size, for backwards compatibility reasons[1]):

  int * __attribute__((mode(SImode))) ptr;

  ptr = &some_var;

so I wouldn't consider this the "default" case for those apps, just
*a* case that needs to be handled "well enough", and the user is
already telling the compiler that they assume those addresses are
32-bit (that either the whole app, or at least the part with that
object, will be linked below 4Gb).

The majority of the addresses are handled as 64-bit.

[1] /me refrains from commenting on the worth of such practices, just
    that they exist and need to be (and have been) supported.
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 03:51:12PM -0500, Josh Poimboeuf wrote:
> > > > > For other archs, e.g. x86-64, you can do
> > > > >
> > > > >   register void *sp asm("%sp");
> > > > >   asm volatile("call func" : "+r"(sp));
>
> I've found that putting "sp" in the clobber list also seems to work:
>
>   asm volatile("call func" : : : "sp");
>
> This syntax is nicer because it doesn't need a local variable associated
> with the register. Do you see any issues with this approach?

Like I said, that forces the function to have a frame pointer. That might
not matter for x86 and your application (because you already have a frame
pointer for other reasons).

Clobbering sp in inline asm isn't a terribly sane thing to do (and neither
is writing to it, as in my code above), of course; we don't *actually*
clobber it here (that is, write an unspecified value to it), but still.
Just reading it would be better for your sanity, and is also enough to
prevent shrink-wrapping the asm.

I do believe it should work though, I see no issues with it _that_ way.
You'll have to test and see (it works fine in trivial tests).


Segher
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On 07/08/2015 02:51 PM, Josh Poimboeuf wrote:
> On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
>> On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote:
>>> On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
>>>>> For other archs, e.g. x86-64, you can do
>>>>>
>>>>>   register void *sp asm("%sp");
>>>>>   asm volatile("call func" : "+r"(sp));
>
> I've found that putting "sp" in the clobber list also seems to work:
>
>   asm volatile("call func" : : : "sp");
>
> This syntax is nicer because it doesn't need a local variable associated
> with the register. Do you see any issues with this approach?

Given that SP isn't subject to register allocation, I'd expect it's fine.
Note that some folks have (loudly) requested that GCC issue an error if an
asm tries to clobber sp.

The call doesn't actually clobber the stack pointer does it? ISTM that a
use of sp makes more sense and is better "future proof'd" than clobbering
sp.

Jeff
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 04:14:20PM -0500, Segher Boessenkool wrote: > On Wed, Jul 08, 2015 at 03:51:12PM -0500, Josh Poimboeuf wrote: > > > > > > For other archs, e.g. x86-64, you can do > > > > > > > > > > > > register void *sp asm("%sp"); > > > > > > asm volatile("call func" : "+r"(sp)); > > > > I've found that putting "sp" in the clobber list also seems to work: > > > > asm volatile("call func" : : : "sp"); > > > > This syntax is nicer because it doesn't need a local variable associated > > with the register. Do you see any issues with this approach? > > Like I said, that forces the function to have a frame pointer. That might > not matter for x86 and your application (because you already have a frame > pointer for other reasons). > > Clobbering sp in inline asm isn't a terribly sane thing to do (and neither > is writing to it, as in my code above), of course; we don't *actually* > clobber it here (that is, write an unspecified value to it), but still. > Just reading it would be better for your sanity, and is also enough to > prevent shrink-wrapping the asm. > > I do believe it should work though, I see no issues with it _that_ way. > You'll have to test and see (it works fine in trivial tests). My apologies, I misread your earlier comments. You did mention clobbering as a possibility. I'll do some testing with it as an input operand to see if that suffices. Thanks again. -- Josh
Re: Can shrink-wrapping ever move prologue past an ASM statement?
On Wed, Jul 08, 2015 at 03:15:12PM -0600, Jeff Law wrote:
> >For other archs, e.g. x86-64, you can do
> >
> >  register void *sp asm("%sp");
> >  asm volatile("call func" : "+r"(sp));
> >
> >I've found that putting "sp" in the clobber list also seems to work:
> >
> >  asm volatile("call func" : : : "sp");
> >
> >This syntax is nicer because it doesn't need a local variable associated
> >with the register. Do you see any issues with this approach?
> Given that SP isn't subject to register allocation, I'd expect it's
> fine. Note that some folks have (loudly) requested that GCC issue an
> error if an asm tries to clobber sp.
>
> The call doesn't actually clobber the stack pointer does it? ISTM that
> a use of sp makes more sense and is better "future proof'd" than
> clobbering sp.

LRA thinks that when an asm clobbers sp, it can no longer eliminate
to sp. When an asm "merely" writes to sp, it is fine with that.

In both cases everything happily pretends sp does not actually change.
So it is a good thing that in fact it doesn't ;-)

All these approaches assume the only thing you really want to happen
before the asm call is the sp changes done in the prologue; if the
prologue does other things you depend on (setting up fp?), if you want
futureproofitivity, you want to make the asm depend on that as well.


Segher
gcc-4.9-20150708 is now available
Snapshot gcc-4.9-20150708 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150708/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch revision 225585

You'll find:

 gcc-4.9-20150708.tar.bz2             Complete GCC

  MD5=1e59d4617757bcd6194c1f2bca6d8673
  SHA1=13cff3f9356d09143a1570607724486d665d646e

Diffs from 4.9-20150701 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Re: s390: larl for Simode on 64-bit
On 07/08/2015 03:05 PM, DJ Delorie wrote:
> In the TPF case, the software has to explicitly mark such pointers as
> SImode (such things happen only when structures that contain addresses
> can't change size, for backwards compatibility reasons[1]):
>
>   int * __attribute__((mode(SImode))) ptr;
>
>   ptr = &some_var;

So in effect, we have two pointer sizes, 64 being the default, but we can
also get a 32 bit pointer via the syntax above? Wow, I'm surprised that
works.

And the only time we'd be able to use larl is a dereference of a pointer
declared with the syntax above. Right?

OK for the trunk with a simple testcase. I think you can just scan the
assembler output for the larl instruction.

> so I wouldn't consider this the "default" case for those apps, just
> *a* case that needs to be handled "well enough", and the user is
> already telling the compiler that they assume those addresses are
> 32-bit (that either the whole app, or at least the part with that
> object, will be linked below 4Gb).
>
> The majority of the addresses are handled as 64-bit.
>
> [1] /me refrains from commenting on the worth of such practices, just
>     that they exist and need to be (and have been) supported.

Understood, but we also need to make sure that we don't do something that
breaks things. Thus I needed to know the tidbit about explicitly
declaring those pointers as SImode.

jeff
Re: s390: larl for Simode on 64-bit
> So in effect, we have two pointer sizes, 64 being the default, but > we can also get a 32 bit pointer via the syntax above? Wow, I'm > surprised that works. Yup, been that way for many years. > And the only time we'd be able to use larl is a dereference of a > pointer declared with the syntax above. Right larl would be used to load the address of an object to *initialize* such a pointer, but yes. Regular pointers still use larl but as a DImode operation. I.e. larl will always load a 64-bit value into a register, even if gcc will only use the 32 LSBs. > OK for the trunk with a simple testcase. I think you can just scan > the assembler output for the larl instruction. Will do, but it's part of a bigger patch. I just wanted to make sure there wasn't some side-effect of larl that precluded this use.