memset inline strategies for -mtune=generic

Richard Biener via Gcc-patches Tue, 23 Mar 2021 01:19:56 -0700

On Tue, Mar 23, 2021 at 3:41 AM Hongyu Wang <wwwhhhyyy...@gmail.com> wrote:
>
> > Hongyue, please collect code size differences on SPEC CPU 2017 and
> > eembc.
>
> Here is code size difference for this patch


Thanks, nothing too bad although slightly larger impacts than envisioned.

> SPEC CPU 2017
> Benchmark        Difference   w/ patch  w/o patch
> 500.perlbench_r      0.051%    1622637    1621805
> 502.gcc_r            0.039%    6930877    6928141
> 505.mcf_r            0.098%      16413      16397
> 520.omnetpp_r        0.083%    1327757    1326653
> 523.xalancbmk_r      0.001%    3575709    3575677
> 525.x264_r          -0.067%     769095     769607
> 531.deepsjeng_r      0.071%      67629      67581
> 541.leela_r         -3.062%     127629     131661
> 548.exchange2_r     -0.338%      66141      66365
> 557.xz_r             0.946%     128061     126861
>
> 503.bwaves_r         0.534%      33117      32941
> 507.cactuBSSN_r      0.004%    2993645    2993517
> 508.namd_r           0.006%     851677     851629
> 510.parest_r         0.488%    6741277    6708557
> 511.povray_r        -0.021%     849290     849466
> 521.wrf_r            0.022%   29682154   29675530
> 526.blender_r        0.054%    7544057    7540009
> 527.cam4_r           0.043%    6102234    6099594
> 538.imagick_r       -0.015%    1625770    1626010
> 544.nab_r            0.155%     155453     155213
> 549.fotonik3d_r      0.000%     351757     351757
> 554.roms_r           0.041%     735837     735533
>
> eembc
> Benchmark           Difference   w/ patch  w/o patch
> aifftr01                0.762%      14813      14701
> aiifft01                0.556%      14477      14397
> idctrn01                0.101%      15853      15837
> cjpeg-rose7-preset      0.114%      56125      56061
> nnet_test              -0.848%      35549      35853
> aes                     0.125%      38493      38445
> cjpegv2data             0.108%      59213      59149
> djpegv2data             0.025%      63821      63805
> huffde                 -0.104%      30621      30653
> mp2decoddata           -0.047%      68285      68317
> mp2enf32data1           0.018%      86925      86909
> mp2enf32data2           0.018%      89357      89341
> mp2enf32data3           0.018%      88253      88237
> mp3playerfixeddata      0.103%      46877      46829
> ip_pktcheckb1m          0.191%      25213      25165
> nat                     0.527%      45757      45517
> ospfv2                  0.196%      24573      24525
> routelookup             0.189%      25389      25341
> tcpbulk                 0.155%      30925      30877
> textv2data              0.055%      29101      29085
>
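The "difference" column in both tables appears to be the relative change (w patch - w/o patch) / w/o patch, expressed as a percentage; for 541.leela_r that is (127629 - 131661) / 131661, about -3.062%. A minimal C helper, assuming that definition (the name `size_delta_pct` is hypothetical), reproduces the reported figures:

```c
#include <assert.h>

/* Relative code-size change in percent, assuming the tables define it as
 * (size with patch - size without patch) / size without patch. */
static double size_delta_pct(long with_patch, long without_patch)
{
    assert(without_patch > 0); /* avoid dividing by zero */
    return 100.0 * (double)(with_patch - without_patch) / (double)without_patch;
}
```

For example, `size_delta_pct(1622637, 1621805)` is about 0.0513, which rounds to the 0.051% reported for 500.perlbench_r.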
> On Mon, Mar 22, 2021 at 9:39 PM H.J. Lu via Gcc-patches <gcc-patches@gcc.gnu.org> wrote:
> >
> > On Mon, Mar 22, 2021 at 6:29 AM Richard Biener
> > <richard.guent...@gmail.com> wrote:
> > >
> > > On Mon, Mar 22, 2021 at 2:19 PM H.J. Lu via Gcc-patches
> > > <gcc-patches@gcc.gnu.org> wrote:
> > > >
> > > > Simplify memcpy and memset inline strategies to avoid branches for
> > > > -mtune=generic:
> > > >
> > > > 1. With MOVE_RATIO and CLEAR_RATIO == 17, GCC will use integer/vector
> > > >    load and store for up to 16 * 16 (256) bytes when the data size is
> > > >    fixed and known.
> > > > 2. Inline only if data size is known to be <= 256.
> > > >    a. Use "rep movsb/stosb" with simple code sequence if the data size
> > > >       is a constant.
> > > >    b. Use loop if data size is not a constant.
> > > > 3. Use the memcpy/memset library function if data size is unknown or > 256.
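The policy in items 1 to 3 can be modeled as a small decision function. This is a minimal sketch, not GCC's implementation: the names `choose_strategy` and `count_pieces`, the assumption that pieces are non-overlapping 16/8/4/2/1-byte stores, and the assumption that pieces are tried before the string-op strategy are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative model only; these names are hypothetical, not GCC internals. */
enum strategy { BY_PIECES, REP_STRING, LOOP, LIBCALL };

#define CLEAR_RATIO  17   /* value proposed for -mtune=generic */
#define PIECE_BYTES  16   /* widest store assumed: one 16-byte vector */
#define INLINE_LIMIT 256  /* 16 pieces * 16 bytes */

/* Stores needed to cover n bytes with non-overlapping 16/8/4/2/1-byte
 * pieces, widest first (an assumed simplification of by_pieces). */
static size_t count_pieces(size_t n)
{
    size_t count = 0;
    for (size_t width = PIECE_BYTES; width > 0; width /= 2) {
        count += n / width;
        n %= width;
    }
    return count;
}

/* is_constant: the size is a compile-time constant (then bound == size).
 * bound_known: some upper bound on the size is known at compile time. */
static enum strategy choose_strategy(bool is_constant, bool bound_known,
                                     size_t bound)
{
    assert(!is_constant || bound_known); /* a constant is its own bound */

    /* Item 3: size unknown or larger than 256 bytes -> library call. */
    if (!bound_known || bound > INLINE_LIMIT)
        return LIBCALL;

    if (is_constant) {
        /* Item 1: fewer than CLEAR_RATIO stores -> straight-line pieces. */
        if (count_pieces(bound) < CLEAR_RATIO)
            return BY_PIECES;
        /* Item 2a: other constant sizes -> rep stosb/movsb. */
        return REP_STRING;
    }

    /* Item 2b: non-constant size known to be <= 256 -> loop. */
    return LOOP;
}
```

Under this non-overlapping model a constant 256-byte clear needs exactly 16 stores and stays inline as pieces, while a constant 255-byte clear needs 15 + 4 = 19 stores and falls through to `rep stosb`, which is one way items 1 and 2a can coexist.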
> > > >
> > > > With -mtune=generic -O2,
> > >
> > > Is there any visible code-size effect of increasing CLEAR_RATIO on
> >
> > Hongyue, please collect code size differences on SPEC CPU 2017 and
> > eembc.
> >
> > > SPEC/eembc?  Did you play with other values of MOVE/CLEAR_RATIO?
> > > 17 memory-to-memory/memory-clear insns look like quite a lot.
> > >
> >
> > Yes, we did.  256 bytes is the threshold above which the libc memcpy/memset
> > win.  Below 256 bytes, using up to 16 by_pieces moves/stores is faster.
> >
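To make the "16 by_pieces stores" figure concrete, the following C function clears a 256-byte buffer as sixteen fixed-size 16-byte stores with no size-dependent branching. It is only a source-level picture under the assumption of 16-byte pieces (the name `clear_256_by_pieces` is hypothetical); the real by_pieces expansion happens inside GCC, and the compiler may rewrite this loop, for example back into a `memset` call.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PIECE_BYTES 16
#define NUM_PIECES  16  /* 16 * 16 = 256 bytes */

/* Clear exactly 256 bytes at dst using sixteen fixed-size 16-byte copies
 * from a zero block; no branch depends on a runtime size. */
static void clear_256_by_pieces(unsigned char *dst)
{
    static const unsigned char zero[PIECE_BYTES]; /* statically zeroed */
    assert(dst != NULL);
    for (size_t i = 0; i < NUM_PIECES; i++)
        memcpy(dst + i * PIECE_BYTES, zero, PIECE_BYTES);
}
```

Because every copy has a constant 16-byte length, each `memcpy` can typically be lowered to a single load/store pair rather than a library call.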
> > --
> > H.J.
>
> --
> Regards,
>
> Hongyu, Wang

Re: [PATCH 3/3] x86: Update memcpy/memset inline strategies for -mtune=generic
