On Fri, Sep 26, 2014 at 12:37 PM, David Wohlferd <[email protected]>
wrote:
>
> On 9/25/2014 12:36 AM, Yury Gribov wrote:
>>
>> On 09/24/2014 12:31 PM, Richard Biener wrote:
>>>
>>> On Wed, Sep 24, 2014 at 9:43 AM, David Wohlferd <[email protected]>
>>> wrote:
>>>>
>>>> Hans-Peter Nilsson: I should have listened to you back when you raised
>>>> concerns about this. My apologies for ever doubting you.
>>>>
>>>> In summary:
>>>>
>>>> - The "trick" in the docs for using an arbitrarily sized struct to force
>>>> register flushes for inline asm does not work.
>>>> - Placing the inline asm in a separate routine can sometimes mask the
>>>> problem with the trick not working.
>>>> - The sample that has been in the docs forever performs an unhelpful,
>>>> unexpected, and probably unwanted stack allocation + memcpy.
>>>>
>>>> Details:
>>>>
>>>> Here is the text from the docs:
>>>>
>>>> -----------
>>>> One trick to avoid [using the "memory" clobber] is available if the size
>>>> of
>>>> the memory being accessed is known at compile time. For example, if
>>>> accessing ten bytes of a string, use a memory input like:
>>>>
>>>> "m"( ({ struct { char x[10]; } *p = (void *)ptr ; *p; }) )
>>>
>>>
>>> Well - this can't work because you essentially are using a _value_
>>> here (looking at the GIMPLE - I'm not sure if a statement expression
>>> evaluates to an lvalue.
>>>
>>> It should work if you simply do this without a stmt expression:
>>>
>>> "m" (*(struct { char x[10]; } *)ptr)
>>>
>>> because that's clearly an lvalue (and the GIMPLE correctly says so):
>>>
>>> <bb 2>:
>>> c.a = 1;
>>> c.b = 2;
>>> __asm__ __volatile__("rep; stosb" : "=D" Dest_4, "=c" Count_5 : "0"
>>> &c, "a" 0, "m" MEM[(struct foo *)&c], "1" 8);
>>> printf ("%u %u\n", 1, 2);
>>>
>>> note that we still constant propagated 1 and 2 for the reason that
>>> the asm didn't get any VDEF. That's because you do not have any
>>> memory output! So while it keeps 'c' live it doesn't consider it
>>> modified by the asm. You'd still need to clobber the memory,
>>> but "m" clobbers are not supported, only "memory".
>>>
>>> Thus fixed asm:
>>>
>>>
>>> __asm__ __volatile__ ("rep; stosb"
>>> : "=D" (Dest), "+c" (Count)
>>> : "0" (&c), "a" (0),
>>> "m" (*( struct foo { char x[8]; } *)&c)
>>> : "memory"
>>> );
>>>
>>> where I'm not 100% sure if the "m" input is now pointless (that is,
>>> if a "memory" clobber also constitutes a use of all memory).
>>
>>
>> Or maybe even
>> __asm__ __volatile__ ("rep; stosb"
>> : "=D" (Dest), "+c" (Count), "+m" (*(struct foo { char x[8]; }
>> *)&c)
>> : "0" (&c), "a" (0)
>> );
>> to avoid the big-hammer memory clobber?
>>
>> -Y
>>
>
> Thank you both for the responses. At this point I've started composing some
> replacement text for the docs (below), cuz clearly what's there is both
> inadequate and wrong. All comments are welcome.
>
> While the code in Richard's response will always produce the correct
> results, the intent here is to use memory constraints to *avoid* the
> performance penalties of the memory clobber. The existing docs say this
> should work, and I've seen a number of places using it (linux kernel, glibc,
> etc). If this does work, we should update the docs to say how it's done.
> If this doesn't work, we should say that too.
>
> Looking at Yury's response, that code does work (in c, not c++). At least
> it does sometimes. The problem is that sometimes gcc can "lose track" of
> what a pointer points to. And when it does, gcc can get confused about what
> to flush. Here's a simple example that shows this (4.9.0, x64, -O2, 'c'):
>
> #include <stdio.h>
>
> inline void *memset( void * Dest, int c, size_t Count) {
> void *dummy;
>
> __asm__ (
> "rep stosb"
> : "=D" (dummy), "+c" (Count), "=m" (*( struct foo { char x[8]; }
> *)Dest)
> : "0" (Dest), "a" (c)
> : "cc"//, "memory"
> );
>
> return Dest;
> }
>
> int main() {
> struct {
> int a;
> } c;
>
> void *v;
> asm volatile("#" : "=r" (v) : "0" (&c) );
>
> c.a = 0x30303030;
>
> //memset(&c, 0, sizeof(c));
> memset(v, 0, sizeof(c));
> printf("%x\n", c.a);
> }
>
> This code will work if you pass &c to memset. But it will fail if you use
> v. Oh, the wonders of aliasing.
Yeah, in this case because *(struct foo { char x[8]; } *)Dest uses alias-set
'4' but c.a is accessed with alias-set '3'.
You can't expect a struct with a char array to end up using alias-set
zero (which probably was intended). You want
"=m" (*( struct foo { char x[8]; } __attribute__((may_alias)) *)Dest)
which makes it work. Or use *(int *)Dest as you are later accessing
it with int, so you even do not lose TBAA.
Richard.
> And this is why I'm having a problem doc'ing this. I love the potential
> benefits of using this. But if you are writing general-purpose routines,
> how can you hope to know whether the code calling you will pass you a
> pointer that will work with this trick? This kind of thing can introduce
> *horribly* hard to track down problems. Note that the memory clobber always
> works, but potentially with a performance penalty. <sigh>
>
> I could simply skip describing the whole problem with aliasing, but that
> just hides the problem. I hate when docs do that.
>
> That said, here's what I've written so far. Any improvements or corrections
> are very welcome.
>
> ==========================
> Flushing registers to memory has performance implications and may be an
> issue
> for time-sensitive code. One trick to avoid flushing more registers than is
> absolutely required is to use a memory constraint.
>
> For example this i386 code works for both c and c++:
>
> @example
> inline void *memset(void *Dest, int c, size_t Count)
> @{
> struct _reallybigstruct @{ char x[SSIZE_MAX]; @};
> void *dummy;
>
> asm (
> "rep stosb"
> : "=D" (dummy), "+c" (Count), "=m" (*(struct _reallybigstruct *)Dest)
> : "0" (Dest), "a" (c)
> : "cc"
> );
>
> return Dest;
> @}
> @end example
>
> Similar code can be used if the memory is being read (by changing the memory
> constraint to be an input) or updated (by changing the constraint to use
> '+' instead of '=').
>
> A few noteworthy things about clobbering using memory constraints:
>
> @itemize @minus
> @item
> Note that @code{_reallybigstruct} uses a size of @code{SSIZE_MAX}. This does
> not mean that when casting @code{Dest} to @code{(struct _reallybigstruct
> *)},
> @code{Dest} must be @code{SSIZE_MAX} bytes long or that @code{SSIZE_MAX}
> bytes will be used. Nor does it affect registers associated with items that
> may follow @code{Dest} in memory. This only affects the symbol @code{Dest}.
>
> @item
> It may be that using memory constraints can remove the need for the asm
> @code{volatile} qualifier. This can, under certain circumstances, result in
> more efficient code. Consider the memset sample above. Without the memory
> constraint, the only outputs are @code{dummy} and @code{Count}. However,
> both those symbols go out of scope immediately after the asm statement. As a
> result, gcc optimization could reasonably assume that the asm block isn't
> actually needed for anything. Even adding the "memory" clobber doesn't
> change this. Normally this is resolved by adding the asm @code{volatile}
> qualifier. However, by adding the memory constraint, gcc also considers
> whether updating the @code{Dest} block will affect the subsequent code.
> Only in the event that the @code{Dest} block is not needed will gcc consider
> removing this asm code.
>
> @end itemize
>
> @strong{Warning}: Older versions of this documentation show an example for
> using
> memory constraints like this:
>
> @example
> "m"( (@{ struct @{ char x[10]; @} *p = (void *)ptr ; *p; @}) )
> @end example
>
> This code not only doesn't compile in c++, it has potentially undesirable
> side effects that affect performance, as well as producing incorrect
> results.
>
> ==========================
>
> dw