https://gcc.gnu.org/bugzilla/show_bug.cgi?id=103675
            Bug ID: 103675
           Summary: gather is a loss for floats and win for doubles at zen3
           Product: gcc
           Version: 12.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: target
          Assignee: unassigned at gcc dot gnu.org
          Reporter: hubicka at gcc dot gnu.org
  Target Milestone: ---

Currently we disable gather on zen1/2 and enable it for zen3. This was based
on testing I did before Richard's code for open coding it.

Testing the only kernels using gather on zen3 with double I get:

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=use_gather
jh@ryzen3:~/tsvc/src> ./a.out
Loop    Time(sec)  Checksum
s352    7.529      1.644903
s4112   1.317      1127072.247160
s4114   1.377      32000.000684
s4115   1.250      1.038812
s4116   0.873      0.753265
s4117   1.283      32002.208699
s4121   0.583      196490.281734

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=^use_gather
jh@ryzen3:~/tsvc/src> ./a.out
Loop    Time(sec)  Checksum
s352    7.618      1.644903
s4112   1.223      1127072.247160
s4114   1.523      32000.000684
s4115   2.019      1.038812
s4116   1.531      0.753265
s4117   1.388      32002.208699
s4121   0.549      196490.281734

So using gather is a win.

For floats I get:

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    7.419      1.644808
s4112   1.183      1127128.875000
s4114   1.198      32000.000000
s4115   1.056      1.038788
s4116   0.817      0.753265
s4117   1.179      32002.205078
s4121   0.264      196500.265625

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=^use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    7.435      1.644808
s4112   0.742      1127128.875000
s4114   1.078      32000.000000
s4115   0.673      1.038788
s4116   0.532      0.753264
s4117   1.044      32002.205078
s4121   0.266      196500.265625

So using gather is a loss.

For long I get:

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    4.327      1.000000
s4112   1.576      132000.000000
s4114   1.603      32000.000000
s4115   1.453      0.000000
s4116   1.131      0.000000
s4117   1.432      32001.000000
s4121   0.629      132000.000000

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=^use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    4.532      1.000000
s4112   1.630      132000.000000
s4114   1.808      32000.000000
s4115   1.431      0.000000
s4116   1.130      0.000000
s4117   1.646      32001.000000
s4121   0.638      132000.000000

So gather is a win.

For int I get:

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    1.345      1.000000
s4112   1.057      132000.000000
s4114   1.095      32000.000000
s4115   1.055      0.000000
s4116   0.817      0.000000
s4117   1.022      32001.000000
s4121   0.266      132000.000000

jh@ryzen3:~/tsvc/src> ~/trunk-install/bin/gcc tsvc.c -Ofast -march=native common.c dummy.c -lm -mtune-ctrl=^use_gather ; ./a.out
Loop    Time(sec)  Checksum
s352    1.345      1.000000
s4112   0.730      132000.000000
s4114   1.089      32000.000000
s4115   0.679      0.000000
s4116   0.528      0.000000
s4117   1.013      32001.000000
s4121   0.262      132000.000000

So gather is a loss.

So I suppose we want to disable gathers for 32-bit and smaller data types?
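
For reference, here is a minimal sketch of the kind of indirect-load loop where the vectorizer has to choose between a hardware gather and open-coded loads. It is not copied from tsvc.c. The function name gather_axpy and the ELEM_T macro are hypothetical, introduced only so the element type can be switched among the four types measured above.

```c
#include <stddef.h>

/* Element type under test. ELEM_T is a hypothetical macro for this sketch;
   build with -DELEM_T=double, -DELEM_T=long or -DELEM_T=int to try the
   other widths. */
#ifndef ELEM_T
#define ELEM_T float
#endif

typedef ELEM_T elem_t;

/* b is read through the index array ip, so the loads from b are not
   contiguous. A vectorizer can implement them either with a gather
   instruction or by open-coding them as scalar loads combined into a
   vector. */
void gather_axpy(elem_t *restrict a, const elem_t *restrict b,
                 const int *restrict ip, elem_t s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        a[i] += b[ip[i]] * s;
}
```

Compiling this with -S and each of the two -mtune-ctrl settings used in the report, then comparing the assembly, should show which strategy was picked for a given element type. Whether the result matches the timings above will depend on the compiler revision and the target.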