https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63464

            Bug ID: 63464
           Summary: compare one character to many: faster
           Product: gcc
           Version: 5.0
            Status: UNCONFIRMED
          Keywords: missed-optimization
          Severity: enhancement
          Priority: P3
         Component: tree-optimization
          Assignee: unassigned at gcc dot gnu.org
          Reporter: glisse at gcc dot gnu.org

This is inspired by reading this post:

http://stackoverflow.com/questions/26124620/why-does-msvc-emit-a-useless-movsx-before-performing-this-bit-test

which shows that bit testing can provide a very efficient lookup table for
small sizes, and in particular when we want to test if a character belongs to a
predefined set.

char*f1(char*s){
  while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n')
    ++s;
  return s;
}
char*f2(char*s){
  long n = (1L << ' ') | (1L << ',') | (1L << '\r') | (1L << '\n');
  int m = max(max(' ',','),max('\r','\n'));
  while(*s <= m && (n & (1L << *s)))
    ++s;
  return s;
}

On x86_64, the first one compiles to a bunch of cmpb+sete, combined with orl.
The second one has one cmpb+jg but uses btq+jc for the main job. A simple
benchmark on a string full of ',' shows running times of 158 vs 32, a very
large win for f2 (suspicious actually, but if I only test ' ', ',' and '\r', I
get the less surprising 50 for f1, which still shows f2 far ahead).

As is, it only works with characters less than 64, but more general versions
are possible with a bit more overhead (like applying the same to *s-64 after
testing its sign, or looking up the (*s/64)th element in a regular lookup table
and bit testing against that, etc).

The running time of 32 is exactly the same I get with a larger lookup table:
char issep[256]={0,0,...};
while(issep[*s])...

In this particular case, vectorization might also be an option, either the loop
kind if the string is likely to be long, but we don't even try because we can't
compute the number of iterations, or the slp kind by broadcasting *s, comparing
to all chars at once and reducing.

This PR is a bit vague and we cannot implement every weird optimization, but I
expect this type of comparison might be common enough that it would be worth
looking at. If you disagree, feel free to close, I never write code like that
myself ;-)

Reply via email to