On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson <[email protected]> wrote:
> On 2012-06-18 13:19, Uros Bizjak wrote:
>> /* ??? The builtin doesn't understand that the PCMPESTRI read from
>> memory need not be aligned. */
>> - __asm ("%vpcmpestri $0, (%1), %2"
>> - : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
>> + sv = __builtin_ia32_loaddqu ((const char *) s);
>> + index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
>> +
>
>
> Surely the comment can be removed too then?
I'm not sure about that. The builtin, as defined, expects a V16QI operand
with an "xm" constraint. Using:
  int test (const char *s1)
  {
    const v16qi *p = (const v16qi *)(unsigned long) s1;
    return __builtin_ia32_pcmpistri128 (*p, ...);
  }
will generate an aligned movdqa load before pcmpistri, since dereferencing
a v16qi pointer asserts 16-byte alignment.
With the x86 pcmp[ie]str patch, we trick gcc into passing unaligned memory
to the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu in front
of __builtin_ia32_pcmpestri128.
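
For illustration only, here is a minimal standalone sketch of the
"unaligned load first, then pcmpestri" pattern. It uses the portable
intrinsics _mm_loadu_si128 and _mm_cmpestri from <nmmintrin.h> rather than
the raw builtins; as far as I know, GCC's headers of this era implement
them on top of __builtin_ia32_loaddqu and __builtin_ia32_pcmpestri128, but
treat that as an assumption. The helper name first_special and the
four-character search set are hypothetical, not taken from the patch.

```c
#include <nmmintrin.h>   /* SSE4.2 string-compare intrinsics */

/* Hypothetical helper: return the index of the first byte among the
   16 bytes at S that equals one of '\n', '\r', '\\' or '?', or 16 if
   there is no such byte.  S need not be 16-byte aligned, but all 16
   bytes must be readable.  */
__attribute__ ((target ("sse4.2")))
static int
first_special (const char *s)
{
  static const char set[16] = { '\n', '\r', '\\', '?' };

  /* Explicit unaligned loads (movdqu), so the compiler cannot assume
     alignment and emit movdqa.  */
  __m128i search = _mm_loadu_si128 ((const __m128i *) set);
  __m128i sv = _mm_loadu_si128 ((const __m128i *) s);

  /* Mode 0: unsigned bytes, "equal any", least significant index.
     Only the first 4 bytes of SEARCH and all 16 bytes of SV are used.  */
  return _mm_cmpestri (search, 4, sv, 16, 0);
}
```

Calling first_special (buf + 1) on a char buffer is safe here even though
buf + 1 is generally misaligned, because the load is explicit and
unaligned.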
Uros.