On 06/29/2015 08:07 PM, Uros Bizjak wrote:
Index: lex.c
===================================================================
--- lex.c       (revision 225138)
+++ lex.c       (working copy)
@@ -450,15 +450,30 @@ search_line_sse42 (const uchar *s, const uchar *en
        s = (const uchar *)((si + 16) & -16);
      }

-  /* Main loop, processing 16 bytes at a time.  By doing the whole loop
-     in inline assembly, we can make proper use of the flags set.  */
-  __asm (      "sub $16, %1\n"
-       "  .balign 16\n"
+  /* Main loop, processing 16 bytes at a time.  */
+#ifdef __GCC_ASM_FLAG_OUTPUTS__
+  while (1)
+    {
+      char f;
+      __asm ("%vpcmpestri\t$0, %2, %3"
+            : "=c"(index), "=@ccc"(f)
+            : "m"(*s), "x"(search), "a"(4), "d"(16));
+      if (f)
+       break;
+
+      s += 16;
+    }

This change looks good. Modulo keeping a comment mentioning why we can't use the builtin.

+#else
+  s -= 16;
+  /* By doing the whole loop in inline assembly,
+     we can make proper use of the flags set.  */
+  __asm (      ".balign 16\n"
        "0:        add $16, %1\n"
-       "  %vpcmpestri $0, (%1), %2\n"
+       "  %vpcmpestri\t$0, (%1), %2\n"
        "  jnc 0b"
        : "=&c"(index), "+r"(s)
        : "x"(search), "a"(4), "d"(16));
+#endif

I do wonder about keeping this bit around. Surely we only really care about the performance of search_line after a full bootstrap, at which point we've got the new path.

I think maybe better to adjust the #ifdef HAVE_SSE4 line above to include the G_A_F_O check.


r~

Reply via email to