On Tue, Mar 23, 2021 at 3:41 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
>
> > Hongyu, please collect code size differences on SPEC CPU 2017 and
> > eembc.
>
> Here is the code size difference for this patch

Thanks, nothing too bad although slightly larger impacts than envisioned.

> SPEC CPU 2017
>                    difference  with patch   w/o patch
> 500.perlbench_r        0.051%     1622637     1621805
> 502.gcc_r              0.039%     6930877     6928141
> 505.mcf_r              0.098%       16413       16397
> 520.omnetpp_r          0.083%     1327757     1326653
> 523.xalancbmk_r        0.001%     3575709     3575677
> 525.x264_r            -0.067%      769095      769607
> 531.deepsjeng_r        0.071%       67629       67581
> 541.leela_r           -3.062%      127629      131661
> 548.exchange2_r       -0.338%       66141       66365
> 557.xz_r               0.946%      128061      126861
>
> 503.bwaves_r           0.534%       33117       32941
> 507.cactuBSSN_r        0.004%     2993645     2993517
> 508.namd_r             0.006%      851677      851629
> 510.parest_r           0.488%     6741277     6708557
> 511.povray_r          -0.021%      849290      849466
> 521.wrf_r              0.022%    29682154    29675530
> 526.blender_r          0.054%     7544057     7540009
> 527.cam4_r             0.043%     6102234     6099594
> 538.imagick_r         -0.015%     1625770     1626010
> 544.nab_r              0.155%      155453      155213
> 549.fotonik3d_r        0.000%      351757      351757
> 554.roms_r             0.041%      735837      735533
>
> eembc
>                    difference  with patch   w/o patch
> aifftr01               0.762%       14813       14701
> aiifft01               0.556%       14477       14397
> idctrn01               0.101%       15853       15837
> cjpeg-rose7-preset     0.114%       56125       56061
> nnet_test             -0.848%       35549       35853
> aes                    0.125%       38493       38445
> cjpegv2data            0.108%       59213       59149
> djpegv2data            0.025%       63821       63805
> huffde                -0.104%       30621       30653
> mp2decoddata          -0.047%       68285       68317
> mp2enf32data1          0.018%       86925       86909
> mp2enf32data2          0.018%       89357       89341
> mp2enf32data3          0.018%       88253       88237
> mp3playerfixeddata     0.103%       46877       46829
> ip_pktcheckb1m         0.191%       25213       25165
> nat                    0.527%       45757       45517
> ospfv2                 0.196%       24573       24525
> routelookup            0.189%       25389       25341
> tcpbulk                0.155%       30925       30877
> textv2data             0.055%       29101       29085
>
> H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote on Mon, Mar 22, 2021 at 9:39 PM:
> >
> > On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Simplify memcpy and memset inline strategies to avoid branches for
> > > > -mtune=generic:
> > > >
> > > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > > >    load and store for up to 16 * 16 (256) bytes when the data size is
> > > >    fixed and known.
> > > > 2. Inline only if data size is known to be <= 256.
> > > >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> > > >       is a constant.
> > > >    b. Use loop if data size is not a constant.
> > > > 3. Use memcpy/memset library function if data size is unknown or > 256.
> > > >
> > > > With -mtune=generic -O2,
> > >
> > > Is there any visible code-size effect of increasing CLEAR_RATIO on
> >
> > Hongyu, please collect code size differences on SPEC CPU 2017 and
> > eembc.
> >
> > > SPEC/eembc?  Did you play with other values of MOVE/CLEAR_RATIO?
> > > 17 memory-to-memory/memory-clear insns looks like quite a lot.
> > >
> >
> > Yes, we did.  256 bytes is the threshold above which memcpy/memset in libc
> > win.  Below 256 bytes, 16 by_pieces moves/stores are faster.
> >
> > --
> > H.J.
>
> --
> Regards,
>
> Hongyu, Wang
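
The inlining rules quoted above (items 1 to 3) can be read as a small decision procedure. The C sketch below is one possible reading, not GCC's implementation. The names `choose`, `count_pieces`, `self_check` and the enum values are hypothetical, and the greedy 16/8/4/2/1-byte piece count is an assumption about how "by pieces" expansion might be counted. Only the ratio of 17 and the 256-byte limit come from the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RATIO        17   /* MOVE_RATIO / CLEAR_RATIO value quoted in the thread */
#define MAX_PIECE    16   /* assumed widest single load/store, in bytes */
#define INLINE_LIMIT 256  /* 16 * 16 bytes, the threshold quoted in the thread */

/* Hypothetical labels for the four outcomes described in items 1 to 3. */
enum strategy { BY_PIECES, REP_STRING, LOOP, LIBCALL };

/* Count the loads/stores needed for SIZE bytes when it is split greedily
   into 16-, 8-, 4-, 2- and 1-byte pieces (an assumed cost model). */
static unsigned count_pieces(size_t size)
{
    unsigned n = 0;
    for (size_t w = MAX_PIECE; w >= 1; w /= 2) {
        n += (unsigned)(size / w);
        size %= w;
    }
    return n;
}

/* is_constant: the size is a compile-time constant equal to max_size.
   bound_known: an upper bound max_size is known even if the size varies. */
static enum strategy choose(bool is_constant, bool bound_known, size_t max_size)
{
    if (!(is_constant || bound_known) || max_size > INLINE_LIMIT)
        return LIBCALL;        /* item 3: unknown size or > 256 bytes */
    if (is_constant && count_pieces(max_size) < RATIO)
        return BY_PIECES;      /* item 1: few enough loads/stores */
    if (is_constant)
        return REP_STRING;     /* item 2a: "rep movsb/stosb" */
    return LOOP;               /* item 2b: bounded but not constant */
}

/* Usage example: a few sizes and the strategy this sketch picks for them. */
static void self_check(void)
{
    assert(choose(true, true, 256) == BY_PIECES);   /* 16 pieces of 16 bytes */
    assert(choose(true, true, 255) == REP_STRING);  /* 19 greedy pieces */
    assert(choose(false, true, 100) == LOOP);
    assert(choose(false, false, 0) == LIBCALL);
}
```

Under this cost model a constant 256-byte copy needs exactly 16 pieces and stays below the ratio of 17, while an awkward constant such as 255 bytes does not and falls through to the string instruction.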