date:20150708

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Martin Jambor

Hi,

On Tue, Jul 07, 2015 at 01:44:15PM -0500, Segher Boessenkool wrote:
> On Tue, Jul 07, 2015 at 07:53:49PM +0200, Martin Jambor wrote:
> > I've been asked to look into the item one of
> > http://permalink.gmane.org/gmane.linux.kernel/1990397 and found out
> > that at least shrink-wrapping happily moves prologue past an asm
> > statement which can be bad if the asm statement contains a call
> > instruction.
> > 
> > Am I right concluding that this is a bug?  Looking into the manual and
> > at requires_stack_frame_p() in shrink-wrap.c, I do not see any obvious
> > way of marking the asm statement as requiring the stack frame (but I
> > will not mind being proven wrong).  Do we want to create one, such as
> > only disallowing moving prologue past volatile asm statements?  Any
> > other ideas?
> 
> For architectures like PowerPC where all calls clobber some register,
> you can write e.g.
> 
>   asm("bl func" : : : "lr");
> 
> and all is well (better than saving/restoring LR manually, too).
> 
> 
> For other archs, e.g. x86-64, you can do
> 
>   register void *sp asm("%sp");
>   asm volatile("call func" : "+r"(sp));
> 

Thanks, this is very clever and apparently what is going to be used:

https://lkml.org/lkml/2015/7/7/1071

> and the result seems to be optimal as well.
> 
> 
> Some special clobber, maybe "stack" (like "memory", which won't work)
> could be nicer?  What should the *exact* semantics of that be?
> 

Well, I have only had a quick look at where things "go wrong" and have
not spent much time thinking about all the things that could break by
moving an asm statement before stack frame setup, so I don't know.  I'm
CCing Josh who has been bitten by this, perhaps he can share some
insight.

Nevertheless, given that this was discovered only now, it might not be
a common problem but of course it may become one if we improve
shrink-wrapping further.  Using a register variable seems like a
workaround, so having a "stack" input operand is probably worth some
further thought.

Martin

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Martin Jambor

On Tue, Jul 07, 2015 at 02:25:34PM -0600, Jeff Law wrote:
> On 07/07/2015 11:53 AM, Martin Jambor wrote:
> >Hi,
> >
> >I've been asked to look into the item one of
> >http://permalink.gmane.org/gmane.linux.kernel/1990397 and found out
> >that at least shrink-wrapping happily moves prologue past an asm
> >statement which can be bad if the asm statement contains a call
> >instruction.
> >
> >Am I right concluding that this is a bug?  Looking into the manual and
> >at requires_stack_frame_p() in shrink-wrap.c, I do not see any obvious
> >way of marking the asm statement as requiring the stack frame (but I
> >will not mind being proven wrong).  Do we want to create one, such as
> >only disallowing moving prologue past volatile asm statements?  Any
> >other ideas?
> Shouldn't this be driven by dataflow?

Sure, the correct way of handling this internally would be to put the
stack pointer register into the use set of the asm statement.  Please
disregard my mention of volatile asms, if that is what you are
referring to, that indeed does not seem wise.  But see the other part
of this thread about a possible "stack" input operand.

Thanks,

Martin

Re: Question about always executed info computed in tree-ssa-loop-im.c

2015-07-08 Thread Richard Biener

On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng  wrote:
> Hi,
> Function fill_always_executed_in_1 computes basic blocks' always
> executed information, and it has the code and comment below:
>
>   /* In a loop that is always entered we may proceed anyway.
>      But record that we entered it and stop once we leave it.  */
>   inn_loop = bb->loop_father;
>
> Then in following iterations, it breaks the loop if a basic block not
> belonging to the inner loop is encountered.  This means basic blocks
> after the inner loop won't have always executed information computed,
> even if they dominate the original loop's latch.
>
> Am I missing something?  Why is that?

To improve here it would need to verify that all exits of the inner loop
exit to the same BB of the outer loop and that no exit skips any blocks
in the outer loop.  Like for

  for (;;)
    {
      for (;;) { if (x) goto skip; if (y) break; }
      foo();
    skip:;
    }

so it is just a simple and conservative algorithm it seems.  It's also
quadratic in the number of BBs of the outermost loop.
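Whether simply dominating the latch would be enough (the question Bin raises in reply) can be checked on this example with a small dominator computation. This is a sketch, not GCC's implementation: the block numbering is a hypothetical hand-built CFG for the example above, and dominators are computed with the classic iterative bitset dataflow rather than GCC's dominance machinery. It shows that the block containing `foo()` does not dominate the outer loop's latch, because the `goto skip` path bypasses it.

```c
#include <stdint.h>

/* Hypothetical toy CFG for the example above:
     0: outer loop header                      -> 1
     1: inner header: if (x) goto skip;        -> 4, 2
     2: if (y) break; else next inner iter     -> 3, 1
     3: foo ();                                -> 4
     4: skip: outer latch                      -> 0      */
enum { NBLOCKS = 5, HEADER = 0, LATCH = 4, MAXSUCC = 2 };

static const int succ[NBLOCKS][MAXSUCC] = {
    {1, -1}, {4, 2}, {3, 1}, {4, -1}, {0, -1},
};

/* Iterative dominator computation on bitsets: dom[b] is the set of
   blocks that dominate b.  Every non-header block starts as "all
   blocks" and is narrowed by intersecting over its predecessors until
   nothing changes.  */
static void compute_dominators(uint32_t dom[NBLOCKS])
{
    const uint32_t all = (1u << NBLOCKS) - 1;
    for (int b = 0; b < NBLOCKS; b++)
        dom[b] = (b == HEADER) ? (1u << HEADER) : all;

    int changed = 1;
    while (changed) {
        changed = 0;
        for (int b = 0; b < NBLOCKS; b++) {
            if (b == HEADER)
                continue;
            uint32_t meet = all;
            for (int p = 0; p < NBLOCKS; p++)       /* find predecessors */
                for (int i = 0; i < MAXSUCC; i++)
                    if (succ[p][i] == b)
                        meet &= dom[p];
            uint32_t nd = meet | (1u << b);
            if (nd != dom[b]) {
                dom[b] = nd;
                changed = 1;
            }
        }
    }
}

/* Bitset of the blocks that dominate the outer loop's latch.  */
static uint32_t blocks_dominating_latch(void)
{
    uint32_t dom[NBLOCKS];
    compute_dominators(dom);
    return dom[LATCH];
}
```

For this CFG the result is blocks 0, 1 and 4; block 3 (`foo()`) is excluded by the dominance test alone.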

> Thanks,
> bin

Re: making the new if-converter not mangle IR that is already vectorizer-friendly

2015-07-08 Thread Alan Lawrence


Abe wrote:
[snip]

Predication of instructions can help to remove the burden of the conditional 
branch, but is not available on all relevant architectures.
In some architectures that are typically implemented in modern high-speed
processors (i.e. with high core frequency, caches, speculative execution,
etc.) there is not full support for predication [as there is, for example,
in 32-bit ARM] but there _is_ support for "conditional move" [hereinafter
"cmove"].
If/when the ISA in question supports cmove to/from main memory, perhaps it
would be profitable to use two cmoves back-to-back with opposite conditions
and the same register [destination for load, source for store] to implement
e.g. "temp = c ? X[x] : Y[y]" and/or "temp = C[c] ? X[x] : Y[y]" etc.
Even without cmove to/from main memory, two cmoves back-to-back with opposite 
conditions could still be used, e.g. [not for a real-world ISA]:
   load X[x] -> reg1
   load Y[y] -> reg2
   cmove   c  ? reg1 -> reg3
   cmove (!c) ? reg2 -> reg3

Or even better if the ISA can support something like:
   load X[x] -> reg1
   load Y[y] -> reg2
   cmove (c ? reg1 : reg2) -> reg3
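The sequences above correspond roughly to the following C. This is a sketch with hypothetical function names; whether a compiler turns the final selection into a cmov (x86) or csel (AArch64) depends on the target and optimization level. Note that the if-converted forms perform both loads unconditionally, which is only safe when both `X[x]` and `Y[y]` are valid to read. The masked variant emulates the select with bitwise operations, for a target that has no select instruction at all.

```c
/* Branchy form: only one of the two loads is performed.  */
static int select_branchy(int c, const int *X, int x, const int *Y, int y)
{
    if (c)
        return X[x];
    return Y[y];
}

/* If-converted form: both loads happen unconditionally, then one value
   is selected.  The ternary is what may become cmov/csel.  */
static int select_ifconverted(int c, const int *X, int x, const int *Y, int y)
{
    int reg1 = X[x];            /* load X[x] -> reg1 */
    int reg2 = Y[y];            /* load Y[y] -> reg2 */
    return c ? reg1 : reg2;     /* cmove (c ? reg1 : reg2) -> reg3 */
}

/* Select emulated with a mask: all ones when c is true, zero otherwise.  */
static int select_masked(int c, const int *X, int x, const int *Y, int y)
{
    unsigned reg1 = (unsigned)X[x];
    unsigned reg2 = (unsigned)Y[y];
    unsigned mask = 0u - (unsigned)(c != 0);
    return (int)((reg1 & mask) | (reg2 & ~mask));
}
```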

However, this is a code-gen issue from my POV, and at the present time all the 
if-conversion work is occurring at the GIMPLE level.
If anybody reading this knows how I could make the if converter generate GIMPLE
that leads to code-gen that is better for at least one ISA


Ramana writes that your patch for PR46029 generates more 'csel's (i.e. the 
second form you write) for AArch64: 
https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01721.html


of course this says nothing about whether there is *some* other ISA that gets 
regressed!


HTH
Alan

Re: Question about always executed info computed in tree-ssa-loop-im.c

2015-07-08 Thread Bin.Cheng

On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener
 wrote:
> On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng  wrote:
>> Hi,
>> Function fill_always_executed_in_1 computes basic blocks' always
>> executed information, and it has the code and comment below:
>>
>>   /* In a loop that is always entered we may proceed anyway.
>>      But record that we entered it and stop once we leave it.  */
>>   inn_loop = bb->loop_father;
>>
>> Then in following iterations, it breaks the loop if a basic block not
>> belonging to the inner loop is encountered.  This means basic blocks
>> after the inner loop won't have always executed information computed,
>> even if they dominate the original loop's latch.
>>
>> Am I missing something?  Why is that?
>
> To improve here it would need to verify that all exits of the inner loop
> exit to the same BB of the outer loop and that no exit skips any blocks
> in the outer loop.  Like for
But we are working on the dominator tree anyway.  Won't dominating the latch be enough?

Thanks,
bin
>
>   for (;;)
>     {
>       for (;;) { if (x) goto skip; if (y) break; }
>       foo();
>     skip:;
>     }
>
> so it is just a simple and conservative algorithm it seems.  It's also
> quadratic in the number of BBs of the outermost loop.
>
>> Thanks,
>> bin

Re: Question about always executed info computed in tree-ssa-loop-im.c

2015-07-08 Thread Bin.Cheng

On Wed, Jul 8, 2015 at 5:58 PM, Bin.Cheng  wrote:
> On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener
>  wrote:
>> On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng  wrote:
>>> Hi,
>>> Function fill_always_executed_in_1 computes basic blocks' always
>>> executed information, and it has the code and comment below:
>>>
>>>   /* In a loop that is always entered we may proceed anyway.
>>>      But record that we entered it and stop once we leave it.  */
>>>   inn_loop = bb->loop_father;
>>>
>>> Then in following iterations, it breaks the loop if a basic block not
>>> belonging to the inner loop is encountered.  This means basic blocks
>>> after the inner loop won't have always executed information computed,
>>> even if they dominate the original loop's latch.
>>>
>>> Am I missing something?  Why is that?
>>
>> To improve here it would need to verify that all exits of the inner loop
>> exit to the same BB of the outer loop and that no exit skips any blocks
>> in the outer loop.  Like for
> But we are working on the dominator tree anyway.  Won't dominating the latch be
> enough?
>
> Thanks,
> bin
>>
>>   for (;;)
>>     {
>>       for (;;) { if (x) goto skip; if (y) break; }
>>       foo();
>>     skip:;
>>     }
>>
>> so it is just a simple and conservative algorithm it seems.  It's also
>> quadratic in the number of BBs of the outermost loop.
What we need to do is check whether the inner loop's exit goes to a basic block
not belonging to the outer loop?

Thanks,
bin

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Segher Boessenkool

On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
> > For other archs, e.g. x86-64, you can do
> > 
> > register void *sp asm("%sp");
> > asm volatile("call func" : "+r"(sp));

> Well, I have only had a quick look at where things "go wrong" and have
> not spent much time thinking about all the things that could break by
> moving an asm statement before stack frame setup, so I don't know.

What goes wrong is that the "plain" asm (just the call) does not say it
needs the prologue of this function to be before it.  But it needs some
of its effects: at least it needs the stack pointer changes (by the
prologue) to be done before it.

Writing the asm with a clobber of the stack pointer causes all stack
accesses to go via the frame pointer, which causes pretty horrible
code.

Using just an input of the stack pointer, like in

register void *sp asm("%sp");
asm volatile("call func" : : "r"(sp));

also works, at least in my simple test case; but the version that also
writes to sp prevents the asm's movement more.

> Nevertheless, given that this was discovered only now, it might not be
> a common problem but of course it may become one if we improve
> shrink-wrapping further.

GCC 5 and/or GCC 6 make shrink wrapping more aggressive, quite nice.

> Using a register variable seems like a workaround,

That is only a workaround for the fact that there is no other way to
say "this input should be in this specific register" (in general;
some ports have such constraints for some registers).

Having the asm take the stack pointer as input is hackish yes.

> so having a "stack" input operand is probably worth some
> further thought.

It would be nice if you could just say "this asm has to stay behind
the prologue", yes.  I suggested using a special clobber, not an input,
but that is just syntax quibbles.

The question is if this usage is common enough to warrant a special
syntax at all.

Segher

Re: Question about always executed info computed in tree-ssa-loop-im.c

2015-07-08 Thread Richard Biener

On Wed, Jul 8, 2015 at 12:01 PM, Bin.Cheng  wrote:
> On Wed, Jul 8, 2015 at 5:58 PM, Bin.Cheng  wrote:
>> On Wed, Jul 8, 2015 at 5:51 PM, Richard Biener
>>  wrote:
>>> On Wed, Jul 8, 2015 at 8:52 AM, Bin.Cheng  wrote:
>>>> Hi,
>>>> Function fill_always_executed_in_1 computes basic blocks' always
>>>> executed information, and it has the code and comment below:
>>>>
>>>>   /* In a loop that is always entered we may proceed anyway.
>>>>      But record that we entered it and stop once we leave it.  */
>>>>   inn_loop = bb->loop_father;
>>>>
>>>> Then in following iterations, it breaks the loop if a basic block not
>>>> belonging to the inner loop is encountered.  This means basic blocks
>>>> after the inner loop won't have always executed information computed,
>>>> even if they dominate the original loop's latch.
>>>>
>>>> Am I missing something?  Why is that?
>>>
>>> To improve here it would need to verify that all exits of the inner loop
>>> exit to the same BB of the outer loop and that no exit skips any blocks
>>> in the outer loop.  Like for
>> But we are working on the dominator tree anyway.  Won't dominating the latch be
>> enough?

Hmm, yes.

>> Thanks,
>> bin
>>>
>>>   for (;;)
>>>     {
>>>       for (;;) { if (x) goto skip; if (y) break; }
>>>       foo();
>>>     skip:;
>>>     }
>>>
>>> so it is just a simple and conservative algorithm it seems.  It's also
>>> quadratic in the number of BBs of the outermost loop.
> What we need to do is check whether the inner loop's exit goes to a basic block
> not belonging to the outer loop?

Doesn't

  FOR_EACH_EDGE (e, ei, bb->succs)
    if (!flow_bb_inside_loop_p (loop, e->dest))
      break;
  if (e)
    break;

catch this?
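Dominating the latch is not sufficient on its own when the loop can be left early; that is what the quoted FOR_EACH_EDGE test guards against. Here is a minimal sketch of the same idea on a second, hypothetical toy loop in which block 1 has an edge out of the loop (think `if (z) return;`). Walking the blocks in dominator order and stopping after the first block with an exit edge leaves block 2 unmarked even though it dominates the latch. The CFG and function names are illustrative assumptions, not GCC code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical toy loop: blocks 0..3 form the loop, block 4 is outside.
     0: header                 -> 1
     1: if (z) return;         -> 4 (exit), 2
     2: bar ();                -> 3
     3: latch                  -> 0
   dom_order lists the loop blocks in dominator order.  */
enum { NB = 5, MAXS = 2 };
static const int succs[NB][MAXS] = { {1, -1}, {4, 2}, {3, -1}, {0, -1}, {-1, -1} };
static const uint32_t loop_blocks = 0x0Fu;      /* bitset: blocks 0..3 */
static const int dom_order[] = {0, 1, 2, 3};

/* Same shape as the FOR_EACH_EDGE test: does BB have a successor
   outside the loop?  */
static bool has_exit_edge(int bb)
{
    for (int i = 0; i < MAXS; i++) {
        int d = succs[bb][i];
        if (d >= 0 && !(loop_blocks & (1u << d)))
            return true;
    }
    return false;
}

/* Mark blocks in dominator order, stopping after the first block that
   can leave the loop: blocks after it may be skipped by that exit.  */
static uint32_t always_executed_prefix(void)
{
    uint32_t marked = 0;
    for (unsigned i = 0; i < sizeof dom_order / sizeof dom_order[0]; i++) {
        int bb = dom_order[i];
        marked |= 1u << bb;
        if (has_exit_edge(bb))
            break;
    }
    return marked;
}
```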

Richard.

> Thanks,
> bin

CFG transformation of loops with continue statement inside the loops.

2015-07-08 Thread Ajit Kumar Agarwal

All:

while/for (condition1)
  {
    /* Some code here.  */
    if (condition2)
      continue;
    /* Some code here.  */
  }

Fig(1)

For the loop in Fig(1) there will be two backedges and multiple latches.
It can be transformed as shown below in order to have a single backedge.

while/for (condition1)
  {
    /* Some code here.  */
    if (condition2)
      goto latch;
    /* Some code here.  */
  latch:;
  }

Fig(2).
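At the source level, Fig(1) and Fig(2) describe the same loop: the C standard defines a `continue` that is not inside a nested iteration statement as equivalent to a `goto` to a label at the very end of the loop body. The sketch below, with a hypothetical loop body (summing the values of `i` that are not multiples of `k`), checks that the two shapes compute the same result; it says nothing about which CFG a given compiler builds for either form.

```c
/* Fig(1) shape: continue jumps straight to the loop latch.  */
static int sum_skip_multiples_continue(int n, int k)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % k == 0)
            continue;
        sum += i;
    }
    return sum;
}

/* Fig(2) shape: the continue rewritten as a goto to a label at the end
   of the body, so every path reaches the latch through one point.  */
static int sum_skip_multiples_goto(int n, int k)
{
    int sum = 0;
    for (int i = 0; i < n; i++) {
        if (i % k == 0)
            goto latch;
        sum += i;
    latch:;
    }
    return sum;
}
```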

With the transformation shown in Fig(2), however, the presence of a goto inside
the loop affects and disables many optimizations. Alternatively, the loop in
Fig(1) could be transformed into multiple loops, one per backedge. The
resulting loops would enable many optimizations that are disabled by the
presence of the goto in Fig(2) and by the multiple latches in Fig(1).

Thoughts?

Thanks & Regards
Ajit

Re: [AD] UltraGDB, an alternative tool to debug GCC, GDB, LLDB, etc. on Windows and Linux

2015-07-08 Thread Xu,Chiheng

On Wed, Jul 8, 2015 at 3:18 AM, Sergio Durigan Junior
 wrote:
>
> I tried to find the source code of this package, but I could not find
> it.  Do you have a URL or something you can provide?
>

Source code is now available at https://github.com/ultragdb/org.eclipse.cdt

-- 
Xu,Chiheng(徐持恒)

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Josh Poimboeuf

On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote:
> On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
> > > For other archs, e.g. x86-64, you can do
> > > 
> > >   register void *sp asm("%sp");
> > >   asm volatile("call func" : "+r"(sp));
> 
> 
> 
> > Well, I have only had a quick look at where things "go wrong" and have
> > not spent much time thinking about all the things that could break by
> > moving an asm statement before stack frame setup, so I don't know.
> 
> What goes wrong is that the "plain" asm (just the call) does not say it
> needs the prologue of this function to be before it.  But it needs some
> of its effects: at least it needs the stack pointer changes (by the
> prologue) to be done before it.
> 
> Writing the asm with a clobber of the stack pointer causes all stack
> accesses to go via the frame pointer, which causes pretty horrible
> code.

As far as I can tell, most (but not all) kernel stack accesses already
occur via the frame pointer anyway.  Is that bad?  Can you clarify what
you mean by horrible code?

> Using just an input of the stack pointer, like in
> 
>   register void *sp asm("%sp");
>   asm volatile("call func" : : "r"(sp));
> 
> also works, at least in my simple test case; but the version that also
> writes to sp prevents the asm's movement more.
> 
> > Nevertheless, given that this was discovered only now, it might not be
> > a common problem but of course it may become one if we improve
> > shrink-wrapping further.
> 
> GCC 5 and/or GCC 6 make shrink wrapping more aggressive, quite nice.
> 
> > Using a register variable seems like a workaround,
> 
> That is only a workaround for the fact that there is no other way to
> say "this input should be in this specific register" (in general;
> some ports have such constraints for some registers).
> 
> Having the asm take the stack pointer as input is hackish yes.
> 
> > so having a "stack" input operand is probably worth some
> > further thought.
> 
> It would be nice if you could just say "this asm has to stay behind
> the prologue", yes.  I suggested using a special clobber, not an input,
> but that is just syntax quibbles.
> 
> The question is if this usage is common enough to warrant a special
> syntax at all.

In the kernel it only affects a handful of code locations which are
mostly wrapped in macros, so I don't really see the need for a special
syntax.  IMO, the existing syntax is fine for the kernel.

Thanks!

-- 
Josh

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Segher Boessenkool

On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> > Writing the asm with a clobber of the stack pointer causes all stack
> > accesses to go via the frame pointer, which causes pretty horrible
> > code.
> 
> As far as I can tell, most (but not all) kernel stack accesses already
> occur via the frame pointer anyway.  Is that bad?  Can you clarify what
> you mean by horrible code?

If you already have a frame pointer anyway it won't get much worse,
sure :-)

If you don't have one yet you will get one, and a stack frame, and
all that boilerplate.  You don't want to force that if you don't
need it.


Segher

Re: making the new if-converter not mangle IR that is already vectorizer-friendly

2015-07-08 Thread Abe


[Abe wrote:]
[snip]

Even without cmove to/from main memory, two cmoves back-to-back with opposite 
conditions could still be used, e.g. [not for a real-world ISA]:
   load X[x] -> reg1
   load Y[y] -> reg2
   cmove   c  ? reg1 -> reg3
   cmove (!c) ? reg2 -> reg3

Or even better if the ISA can support something like:
   load X[x] -> reg1
   load Y[y] -> reg2
   cmove (c ? reg1 : reg2) -> reg3

However, this is a code-gen issue from my POV, and at the present time all the 
if-conversion work is occurring at the GIMPLE level.
If anybody reading this knows how I could make the if converter generate GIMPLE
that leads to code-gen that is better for at least one ISA



[Alan wrote:]

Ramana writes that your patch for PR46029 generates more 'csel's (i.e. the 
second form you write)
for AArch64: https://gcc.gnu.org/ml/gcc-patches/2015-06/msg01721.html


Thanks to Ramana for doing that analysis and thanks to Alan for informing me.



of course this says nothing about whether there is *some* other ISA that gets 
regressed!


After finishing fixing the known regressions, I intend/plan to reg-test for
AArch64; after that, I think I'm going to need some community help to reg-test
for other ISAs.

Regards,

Abe

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Josh Poimboeuf

On Wed, Jul 08, 2015 at 11:37:35AM -0500, Segher Boessenkool wrote:
> On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> > > Writing the asm with a clobber of the stack pointer causes all stack
> > > accesses to go via the frame pointer, which causes pretty horrible
> > > code.
> > 
> > As far as I can tell, most (but not all) kernel stack accesses already
> > occur via the frame pointer anyway.  Is that bad?  Can you clarify what
> > you mean by horrible code?
> 
> If you already have a frame pointer anyway it won't get much worse,
> sure :-)
> 
> If you don't have one yet you will get one, and a stack frame, and
> all that boilerplate.  You don't want to force that if you don't
> need it.

Ah.  We're only clobbering the stack pointer when asm() has a call
instruction.  In that case we need to save and set up the frame pointer
anyway.  So I don't think it's a problem.

-- 
Josh

Re: X18 on AArch64

2015-07-08 Thread André Hentschel

On 07.07.2015 at 23:01, pins...@gmail.com wrote:
> What does the ELF ABI say about x18? I thought it was just another temp if
> the target does not use it as a platform reg.  Note there are many assembly
> files which might use x18 also due to that.
> 
> So this means you need wrapper functions when moving between the different
> ABIs. Nothing much can be done in gcc really. Please push this back to wine
> instead.
> 
> Also I think Windows made a bad choice in using x18 when there was already a
> system register for TLS.
> 
> Thanks,
> Andrew

Hi,
I guess I haven't explained the problem well enough.
The problem already starts with Wine's environment, which includes things we
can't do anything about, like the loader/dynamic linker...
Using wrapper functions in Wine is not feasible: that would be a massive
infrastructure change to a huge and old codebase, would need a lot of work,
and in the end won't get upstream.
What's needed is that distributions don't use x18 by default, so we need a
good patch for that; hence I am asking for a usable chain register (X8-X15?).
Truly it would be better to upstream that to gcc, but I don't have much hope
left.
For the ELF ABI thing, I didn't find specific calling conventions or register
usage descriptions in it...
For the Windows thing, good choices are not part of their development process ;)

Re: X18 on AArch64

2015-07-08 Thread pinskia





> On Jul 8, 2015, at 11:39 AM, André Hentschel  wrote:
> 
>> On 07.07.2015 at 23:01, pins...@gmail.com wrote:
>> What does the ELF ABI say about x18? I thought it was just another temp if
>> the target does not use it as a platform reg.  Note there are many assembly
>> files which might use x18 also due to that.
>> 
>> So this means you need wrapper functions when moving between the different
>> ABIs. Nothing much can be done in gcc really. Please push this back to wine
>> instead.
>> 
>> Also I think Windows made a bad choice in using x18 when there was already a
>> system register for TLS.
>> 
>> Thanks,
>> Andrew
> 
> Hi,
> I guess I haven't explained the problem well enough.
> The problem already starts with Wine's environment, which includes things we
> can't do anything about, like the loader/dynamic linker...
> Using wrapper functions in Wine is not feasible: that would be a massive
> infrastructure change to a huge and old codebase, would need a lot of work,
> and in the end won't get upstream.

So what. Changing gcc and distros for one specific thing would be a step
backwards. Changing wine to better support huge differences in ABI would be
better.


> What's needed is that distributions don't use x18 by default, so we need
> a good patch for that; hence I am asking for a usable chain register (X8-X15?).

I think this is a bad idea because it would mean old code will no longer work. 


> Truly it would be better to upstream that to gcc, but I don't have much hope
> left.
> For the ELF ABI thing, I didn't find specific calling conventions or
> register usage descriptions in it...

It is part of the AAPCS calling convention that ARM provides.

Thanks,
Andrew


> For the Windows thing, good choices are not part of their development process
> ;)
>

libgomp: gomp_ptrlock*() specification

2015-07-08 Thread Sebastian Huber

Hello,

from the use cases in libgomp it seems that you may set the value of a ptrlock 
object at most once via gomp_ptrlock_set() if it wasn't already initialized to 
a non-NULL value via gomp_ptrlock_init()?  Is this correct?

Why is there a specialized Linux implementation?  Is this to save one extra 
integer for the mutex?

-- 
Sebastian Huber, embedded brains GmbH

Address : Dornierstr. 4, D-82178 Puchheim, Germany
Phone   : +49 89 189 47 41-16
Fax     : +49 89 189 47 41-09
E-Mail  : sebastian.huber at embedded-brains.de
PGP     : Public key available on request.

This message is not a business communication within the meaning of the EHUG.

Re: libgomp: gomp_ptrlock*() specification

2015-07-08 Thread Jakub Jelinek

On Wed, Jul 08, 2015 at 09:36:43PM +0200, Sebastian Huber wrote:
> from the use cases in libgomp it seems that you may set the value of a
> ptrlock object at most once via gomp_ptrlock_set() if it wasn't already
> initialized to a non-NULL value via gomp_ptrlock_init()?  Is this correct?

Please see https://gcc.gnu.org/ml/gcc-patches/2008-03/msg01278.html
for details.

> Why is there a specialized Linux implementation?  Is this to save one extra 
> integer for the mutex?

Not just that, it is simply significantly more efficient, which is very
important for such a frequently used synchronization primitive.
You are free to define it efficiently for some other target as well.

Jakub
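To make Sebastian's question concrete, here is a minimal sketch of a set-once pointer lock using C11 atomics. It is not libgomp's code: the names `ptrlock_t`, `ptrlock_init`, `ptrlock_get` and `ptrlock_set` are hypothetical, and it assumes that a get on an unset lock hands NULL to exactly one caller, which must then publish the value with a set, while every other caller waits for that value. Spinning stands in for the blocking wait (for example on a futex) that an efficient implementation would use.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical set-once pointer lock.  */
typedef struct {
    _Atomic(void *) ptr;
} ptrlock_t;

/* Sentinel meaning "some thread is computing the value".  */
static char ptrlock_busy_marker;
#define PTRLOCK_BUSY ((void *)&ptrlock_busy_marker)

static void ptrlock_init(ptrlock_t *l, void *ptr)
{
    atomic_init(&l->ptr, ptr);      /* non-NULL: already published */
}

static void *ptrlock_get(ptrlock_t *l)
{
    void *v = atomic_load_explicit(&l->ptr, memory_order_acquire);
    if (v != NULL && v != PTRLOCK_BUSY)
        return v;                   /* fast path: value already set */

    void *expected = NULL;
    if (atomic_compare_exchange_strong_explicit(&l->ptr, &expected,
                                                PTRLOCK_BUSY,
                                                memory_order_acquire,
                                                memory_order_acquire))
        return NULL;                /* this caller must call ptrlock_set */

    /* Another caller is setting the value: wait until it is published.  */
    while ((v = atomic_load_explicit(&l->ptr, memory_order_acquire))
           == PTRLOCK_BUSY)
        ;
    return v;
}

/* Publish the value; only the caller that got NULL should do this.  */
static void ptrlock_set(ptrlock_t *l, void *ptr)
{
    atomic_store_explicit(&l->ptr, ptr, memory_order_release);
}
```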

s390: larl for SImode on 64-bit

2015-07-08 Thread DJ Delorie


Is there any reason that LARL can't be used to load a 32-bit symbolic
value, in 64-bit mode?  On TPF (64-bit) the app has the option of
being loaded in the first 4 GB so that all symbols are also valid
32-bit addresses, for backward compatibility.  (and if not, the linker
would complain)

Index: s390.md
===================================================================
--- s390.md (revision 225579)
+++ s390.md (working copy)
@@ -1845,13 +1845,13 @@
     emit_symbolic_move (operands);
 })
 
 (define_insn "*movsi_larl"
   [(set (match_operand:SI 0 "register_operand" "=d")
         (match_operand:SI 1 "larl_operand" "X"))]
-  "!TARGET_64BIT && TARGET_CPU_ZARCH
+  "TARGET_CPU_ZARCH
    && !FP_REG_P (operands[0])"
   "larl\t%0,%1"
    [(set_attr "op_type" "RIL")
     (set_attr "type"    "larl")
     (set_attr "z10prop" "z10_fwd_A1")])

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Josh Poimboeuf

On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
> On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote:
> > On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
> > > > For other archs, e.g. x86-64, you can do
> > > > 
> > > > register void *sp asm("%sp");
> > > > asm volatile("call func" : "+r"(sp));

I've found that putting "sp" in the clobber list also seems to work:

  asm volatile("call func" : : : "sp");

This syntax is nicer because it doesn't need a local variable associated
with the register.  Do you see any issues with this approach?

-- 
Josh

Re: s390: larl for SImode on 64-bit

2015-07-08 Thread Jeff Law


On 07/08/2015 02:33 PM, DJ Delorie wrote:
> Is there any reason that LARL can't be used to load a 32-bit symbolic
> value, in 64-bit mode?  On TPF (64-bit) the app has the option of
> being loaded in the first 4 GB so that all symbols are also valid
> 32-bit addresses, for backward compatibility.  (and if not, the linker
> would complain)
It would seem that we'd want the compiler to know when the app is going 
to be loaded into that first 4G so that it can use the more efficient 
addressing modes.

Jeff

Re: s390: larl for SImode on 64-bit

2015-07-08 Thread DJ Delorie


In the TPF case, the software has to explicitly mark such pointers as
SImode (such things happen only when structures that contain addresses
can't change size, for backwards compatibility reasons[1]):

int * __attribute__((mode(SImode))) ptr;

  ptr = &some_var;

so I wouldn't consider this the "default" case for those apps, just
*a* case that needs to be handled "well enough", and the user is
already telling the compiler that they assume those addresses are
32-bit (that either the whole app, or at least the part with that
object, will be linked below 4 GB).

The majority of the addresses are handled as 64-bit.


[1] /me refrains from commenting on the worth of such practices, just
that they exist and need to be (and have been) supported.
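As a portable illustration of the size constraint DJ describes, the sketch below keeps a 32-bit address field in a fixed-layout structure and checks that an address fits below 4 GB before narrowing it. It swaps in a plain `uint32_t` field for the `mode(SImode)` pointer, since that attribute on a pointer type is target-dependent; the struct and function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical legacy record whose size must not change: the address
   field stays 32 bits wide even in a 64-bit program.  */
struct legacy_rec {
    uint32_t addr;   /* only meaningful if the object lives below 4 GB */
    uint32_t len;
};
_Static_assert(sizeof(struct legacy_rec) == 8, "layout must stay fixed");

/* Store ADDR into a 32-bit field, refusing addresses at or above 4 GB
   (the situation in which, per the thread, the linker would complain).  */
static bool narrow_addr(uint64_t addr, uint32_t *out)
{
    if (addr > UINT32_MAX)
        return false;
    *out = (uint32_t)addr;
    return true;
}
```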

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Segher Boessenkool

On Wed, Jul 08, 2015 at 03:51:12PM -0500, Josh Poimboeuf wrote:
> > > > > For other archs, e.g. x86-64, you can do
> > > > > 
> > > > >   register void *sp asm("%sp");
> > > > >   asm volatile("call func" : "+r"(sp));
> 
> I've found that putting "sp" in the clobber list also seems to work:
> 
>   asm volatile("call func" : : : "sp");
> 
> This syntax is nicer because it doesn't need a local variable associated
> with the register.  Do you see any issues with this approach?

Like I said, that forces the function to have a frame pointer.  That might
not matter for x86 and your application (because you already have a frame
pointer for other reasons).

Clobbering sp in inline asm isn't a terribly sane thing to do (and neither
is writing to it, as in my code above), of course; we don't *actually*
clobber it here (that is, write an unspecified value to it), but still.
Just reading it would be better for your sanity, and is also enough to
prevent shrink-wrapping the asm.

I do believe it should work though, I see no issues with it _that_ way.
You'll have to test and see (it works fine in trivial tests).

Segher

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Jeff Law


On 07/08/2015 02:51 PM, Josh Poimboeuf wrote:
> On Wed, Jul 08, 2015 at 11:22:34AM -0500, Josh Poimboeuf wrote:
>> On Wed, Jul 08, 2015 at 05:36:31AM -0500, Segher Boessenkool wrote:
>>> On Wed, Jul 08, 2015 at 11:23:09AM +0200, Martin Jambor wrote:
>>>>> For other archs, e.g. x86-64, you can do
>>>>>
>>>>>   register void *sp asm("%sp");
>>>>>   asm volatile("call func" : "+r"(sp));
>
> I've found that putting "sp" in the clobber list also seems to work:
>
>   asm volatile("call func" : : : "sp");
>
> This syntax is nicer because it doesn't need a local variable associated
> with the register.  Do you see any issues with this approach?
Given that SP isn't subject to register allocation, I'd expect it's
fine.  Note that some folks have (loudly) requested that GCC issue an
error if an asm tries to clobber sp.

The call doesn't actually clobber the stack pointer, does it?  ISTM that
a use of sp makes more sense and is better "future proof'd" than
clobbering sp.


Jeff

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Josh Poimboeuf

On Wed, Jul 08, 2015 at 04:14:20PM -0500, Segher Boessenkool wrote:
> On Wed, Jul 08, 2015 at 03:51:12PM -0500, Josh Poimboeuf wrote:
> > > > > > For other archs, e.g. x86-64, you can do
> > > > > > 
> > > > > > register void *sp asm("%sp");
> > > > > > asm volatile("call func" : "+r"(sp));
> > 
> > I've found that putting "sp" in the clobber list also seems to work:
> > 
> >   asm volatile("call func" : : : "sp");
> > 
> > This syntax is nicer because it doesn't need a local variable associated
> > with the register.  Do you see any issues with this approach?
> 
> Like I said, that forces the function to have a frame pointer.  That might
> not matter for x86 and your application (because you already have a frame
> pointer for other reasons).
> 
> Clobbering sp in inline asm isn't a terribly sane thing to do (and neither
> is writing to it, as in my code above), of course; we don't *actually*
> clobber it here (that is, write an unspecified value to it), but still.
> Just reading it would be better for your sanity, and is also enough to
> prevent shrink-wrapping the asm.
> 
> I do believe it should work though, I see no issues with it _that_ way.
> You'll have to test and see (it works fine in trivial tests).

My apologies, I misread your earlier comments.  You did mention
clobbering as a possibility.

I'll do some testing with it as an input operand to see if that
suffices.

Thanks again.

-- 
Josh

Re: Can shrink-wrapping ever move prologue past an ASM statement?

2015-07-08 Thread Segher Boessenkool

On Wed, Jul 08, 2015 at 03:15:12PM -0600, Jeff Law wrote:
> >For other archs, e.g. x86-64, you can do
> >
> > register void *sp asm("%sp");
> > asm volatile("call func" : "+r"(sp));
> >
> >I've found that putting "sp" in the clobber list also seems to work:
> >
> >   asm volatile("call func" : : : "sp");
> >
> >This syntax is nicer because it doesn't need a local variable associated
> >with the register.  Do you see any issues with this approach?
> Given that SP isn't subject to register allocation, I'd expect it's 
> fine.  Note that some folks have (loudly) requested that GCC issue an 
> error if an asm tries to clobber sp.
> 
> The call doesn't actually clobber the stack pointer does it?  ISTM that 
> a use of sp makes more sense and is better "future proof'd" than 
> clobbering sp.

LRA thinks that when an asm clobbers sp, it can no longer eliminate to sp.
When an asm "merely" writes to sp, it is fine with that.

In both cases everything happily pretends sp does not actually change.
So it is a good thing that in fact it doesn't ;-)

All these approaches assume the only thing you really want to happen
before the asm call is the sp changes done in the prologue.  If the
prologue does other things you depend on (setting up fp?) and you want
futureproofitivity, you want to make the asm depend on those as well.

Segher

gcc-4.9-20150708 is now available

2015-07-08 Thread gccadmin

Snapshot gcc-4.9-20150708 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.9-20150708/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.9 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch 
revision 225585

You'll find:

 gcc-4.9-20150708.tar.bz2 Complete GCC

  MD5=1e59d4617757bcd6194c1f2bca6d8673
  SHA1=13cff3f9356d09143a1570607724486d665d646e

Diffs from 4.9-20150701 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.9
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.

Re: s390: larl for Simode on 64-bit

2015-07-08 Thread Jeff Law


On 07/08/2015 03:05 PM, DJ Delorie wrote:

> In the TPF case, the software has to explicitly mark such pointers as
> SImode (such things happen only when structures that contain addresses
> can't change size, for backwards compatibility reasons[1]):
>
>   int * __attribute__((mode(SImode))) ptr;
>
>   ptr = &some_var;

So in effect, we have two pointer sizes, 64 being the default, but we 
can also get a 32 bit pointer via the syntax above?  Wow, I'm surprised 
that works.


And the only time we'd be able to use larl is a dereference of a pointer
declared with the syntax above.  Right?


OK for the trunk with a simple testcase.  I think you can just scan the 
assembler output for the larl instruction.





> so I wouldn't consider this the "default" case for those apps, just
> *a* case that needs to be handled "well enough", and the user is
> already telling the compiler that they assume those addresses are
> 32-bit (that either the whole app, or at least the part with that
> object, will be linked below 4Gb).
>
> The majority of the addresses are handled as 64-bit.
>
> [1] /me refrains from commenting on the worth of such practices, just
>     that they exist and need to be (and have been) supported.

Understood, but we also need to make sure that we don't do something 
that breaks things.  Thus I needed to know the tidbit about explicitly 
declaring those pointers as SImode.




jeff

Re: s390: larl for Simode on 64-bit

2015-07-08 Thread DJ Delorie


> So in effect, we have two pointer sizes, 64 being the default, but
> we can also get a 32 bit pointer via the syntax above?  Wow, I'm
> surprised that works.

Yup, been that way for many years.

> And the only time we'd be able to use larl is a dereference of a
> pointer declared with the syntax above.  Right?

larl would be used to load the address of an object to *initialize*
such a pointer, but yes.  Regular pointers still use larl but as a
DImode operation.  I.e. larl will always load a 64-bit value into a
register, even if gcc will only use the 32 LSBs.

> OK for the trunk with a simple testcase.  I think you can just scan
> the assembler output for the larl instruction.

Will do, but it's part of a bigger patch.  I just wanted to make sure
there wasn't some side-effect of larl that precluded this use.
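The simple testcase Jeff asks for (scan the assembler output for larl) could take roughly the following shape in GCC's DejaGnu testsuite. This is a hypothetical sketch, not the committed test. The target selector and options are assumptions, and `some_var` and `set_ptr` are placeholder names. It is meant to be compiled only for a 64-bit s390 target, because other targets are expected to reject a SImode pointer mode.

```c
/* { dg-do compile { target { s390*-*-* && lp64 } } } */
/* { dg-options "-O2" } */

/* Hypothetical testcase: initialize a 32-bit (SImode) pointer with the
   address of a global, as in the TPF use case.  */

int some_var;
int * __attribute__ ((mode (SImode))) ptr;

void
set_ptr (void)
{
  ptr = &some_var;
}

/* The address of some_var should be loaded with larl.  */
/* { dg-final { scan-assembler "larl" } } */
```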
