https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115025
--- Comment #5 from Haochen Jiang <haochen.jiang at intel dot com> ---
My guess is that the difference comes from the primality-test loop:
    for (i = 5; i < max; i += 6)
      if ((n % i == 0) || (n % (i + 2) == 0))
        return 0;
GCC 13 peels the first iteration, (n % 5 == 0) || (n % 7 == 0), out of the
loop. With constant divisors, that iteration can use imul+cmp instead of div.
On current trunk, however, the div remains, which is slower.
BTW, there is also a codegen regression that does not affect performance: on
current trunk, the sqrt basic blocks are not merged. This increases code size
but has no perf impact.