On Fri, Jul 24, 2015 at 09:09:39AM +0100, Kyrill Tkachov wrote: > This transformation folds (X % C) == N into > X & ((1 << (size - 1)) | (C - 1))) == N > for constants C and N where N is positive and C is a power of 2. > > The idea is similar to the existing X % C -> X & (C - 1) transformation > for unsigned values but this time when we have the comparison we use the > C - 1 mask (all 1s for power of 2 N) orred with the sign bit. > > At runtime, if X is positive then X & ((1 << (size - 1)) | (C - 1))) > calculates X % C, which is compared against the positive N. > > If X is negative then X & ((1 << (size - 1)) | (C - 1))) doesn't calculate > X % C but since we also catch the set sign bit, it will never compare equal > to N (which is positive), so we still get the right comparison result. > > I don't have much experience with writing match.pd patterns, so I appreciate > any feedback on how to write this properly. > > Bootstrapped and tested on arm, aarch64, x86_64.
I think this is another case that, if at all, should be done during or right before RTL expansion and should test rtx costs. Because, ((1 << (size - 1)) | (C - 1))) constant might be very expensive, while C cheap, and % might not be that expensive compared to & to offset that. E.g. on x86_64, for 32-bit and smaller X the constant is cheap as any other (well, if we don't take instruction size into account), but 64-bit constant is at least 3 times more expensive (movabsq is needed with its latency). In the x86_64 case supposedly the divmod is still more expensive, but there are many other targets. On sparc64 for 64-bit constants, you might need many instructions to create the constants, etc. Jakub