https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63464
Bug ID: 63464 Summary: compare one character to many: faster Product: gcc Version: 5.0 Status: UNCONFIRMED Keywords: missed-optimization Severity: enhancement Priority: P3 Component: tree-optimization Assignee: unassigned at gcc dot gnu.org Reporter: glisse at gcc dot gnu.org This is inspired by reading this post: http://stackoverflow.com/questions/26124620/why-does-msvc-emit-a-useless-movsx-before-performing-this-bit-test which shows that bit testing can provide a very efficient lookup table for small sizes, and in particular when we want to test if a character belongs to a predefined set. char*f1(char*s){ while (*s == ' ' || *s == ',' || *s == '\r' || *s == '\n') ++s; return s; } char*f2(char*s){ long n = (1L << ' ') | (1L << ',') | (1L << '\r') | (1L << '\n'); int m = max(max(' ',','),max('\r','\n')); while(*s <= m && (n & (1L << *s))) ++s; return s; } On x86_64, the first one compiles to a bunch of cmpb+sete, combined with orl. The second one has one cmpb+jg but uses btq+jc for the main job. A simple benchmark on a string full of ',' shows running times of 158 vs 32, a very large win for f2 (suspicious actually, but if I only test ' ', ',' and '\r', I get the less surprising 50 for f1, which still shows f2 far ahead). As is, it only works with characters less than 64, but more general versions are possible with a bit more overhead (like applying the same to *s-64 after testing its sign, or looking up the (*s/64)th element in a regular lookup table and bit testing against that, etc). The running time of 32 is exactly the same I get with a larger lookup table: char issep[256]={0,0,...}; while(issep[*s])... In this particular case, vectorization might also be an option, either the loop kind if the string is likely to be long, but we don't even try because we can't compute the number of iterations, or the slp kind by broadcasting *s, comparing to all chars at once and reducing. This PR is a bit vague and we cannot implement every weird optimization, but I expect this type of comparison might be common enough that it would be worth looking at. If you disagree, feel free to close, I never write code like that myself ;-)