> Hongyue, please collect code size differences on SPEC CPU 2017 and
> eembc.

Here are the code size differences for this patch. Difference is
(with patch - without patch) / without patch.

SPEC CPU 2017     Difference  With patch  Without patch
500.perlbench_r       0.051%     1622637        1621805
502.gcc_r             0.039%     6930877        6928141
505.mcf_r             0.098%       16413          16397
520.omnetpp_r         0.083%     1327757        1326653
523.xalancbmk_r       0.001%     3575709        3575677
525.x264_r           -0.067%      769095         769607
531.deepsjeng_r       0.071%       67629          67581
541.leela_r          -3.062%      127629         131661
548.exchange2_r      -0.338%       66141          66365
557.xz_r              0.946%      128061         126861
503.bwaves_r          0.534%       33117          32941
507.cactuBSSN_r       0.004%     2993645        2993517
508.namd_r            0.006%      851677         851629
510.parest_r          0.488%     6741277        6708557
511.povray_r         -0.021%      849290         849466
521.wrf_r             0.022%    29682154       29675530
526.blender_r         0.054%     7544057        7540009
527.cam4_r            0.043%     6102234        6099594
538.imagick_r        -0.015%     1625770        1626010
544.nab_r             0.155%      155453         155213
549.fotonik3d_r       0.000%      351757         351757
554.roms_r            0.041%      735837         735533

eembc               Difference  With patch  Without patch
aifftr01                0.762%       14813          14701
aiifft01                0.556%       14477          14397
idctrn01                0.101%       15853          15837
cjpeg-rose7-preset      0.114%       56125          56061
nnet_test              -0.848%       35549          35853
aes                     0.125%       38493          38445
cjpegv2data             0.108%       59213          59149
djpegv2data             0.025%       63821          63805
huffde                 -0.104%       30621          30653
mp2decoddata           -0.047%       68285          68317
mp2enf32data1           0.018%       86925          86909
mp2enf32data2           0.018%       89357          89341
mp2enf32data3           0.018%       88253          88237
mp3playerfixeddata      0.103%       46877          46829
ip_pktcheckb1m          0.191%       25213          25165
nat                     0.527%       45757          45517
ospfv2                  0.196%       24573          24525
routelookup             0.189%       25389          25341
tcpbulk                 0.155%       30925          30877
textv2data              0.055%       29101          29085

On Mon, Mar 22, 2021 at 9:39 PM H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
>
> On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
> <richard.guent...@gmail.com> wrote:
> >
> > On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> > <gcc-patches@gcc.gnu.org> wrote:
> > >
> > > Simplify memcpy and memset inline strategies to avoid branches for
> > > -mtune=generic:
> > >
> > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > >    load and store for up to 16 * 16 (256) bytes when the data size is
> > >    fixed and known.
> > > 2. Inline only if data size is known to be <= 256.
> > >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> > >       is a constant.
> > >    b. Use loop if data size is not a constant.
> > > 3. Use memcpy/memset library function if data size is unknown or > 256.
> > >
> > > With -mtune=generic -O2,
> >
> > Is there any visible code-size effect of increasing CLEAR_RATIO on
>
> Hongyue, please collect code size differences on SPEC CPU 2017 and
> eembc.
>
> > SPEC/eembc? Did you play with other values of MOVE/CLEAR_RATIO?
> > 17 memory-to-memory/memory-clear insns looks quite a lot.
> >
>
> Yes, we did. 256 bytes is the threshold above which memcpy/memset in libc
> win. Below 256 bytes, 16 by_pieces move/store is faster.
>
> --
> H.J.

--
Regards,
Hongyu, Wang