[PATCH] Refactoring masked built-in decls
Hello, This patch converts the mask type for masked builtins from signed to unsigned. Furthermore, several redundant builtin definitions were removed. Please have a look. Is it OK for trunk? Thanks, Petr
2015-07-27 Petr Murzin * config/i386/i386.c (bdesc_special_args): Convert mask type from signed to unsigned for masked builtins. (ix86_expand_args_builtin): Do not handle UINT_FTYPE_V2DF, UINT64_FTYPE_V2DF, UINT64_FTYPE_V4SF, V16QI_FTYPE_V8DI, V16HI_FTYPE_V16SI, V16SI_FTYPE_V16SI, V16SF_FTYPE_FLOAT, V8HI_FTYPE_V8DI, V8UHI_FTYPE_V8UHI, V8SI_FTYPE_V8DI, V8SF_FTYPE_V8DF, V8DI_FTYPE_INT64, V8DI_FTYPE_V4DI, V8DI_FTYPE_V8DI, V8DF_FTYPE_DOUBLE, V8DF_FTYPE_V8SI, V16SI_FTYPE_V16SI_V16SI, V16SF_FTYPE_V16SF_V16SI, V8DI_FTYPE_V8DI_V8DI, V8DF_FTYPE_V8DF_V8DI, V4SI_FTYPE_V4SF_V4SF, V4SF_FTYPE_V4SF_UINT64, V2UDI_FTYPE_V4USI_V4USI, V2DI_FTYPE_V2DF_V2DF, V2DF_FTYPE_V2DF_UINT64, V4UDI_FTYPE_V8USI_V8USI, QI_FTYPE_V8DI_V8DI, HI_FTYPE_V16SI_V16SI, HI_FTYPE_HI_INT, V16SF_FTYPE_V16SF_V16SF_V16SF, V16SF_FTYPE_V16SF_V16SI_V16SF, V16SF_FTYPE_V16SI_V16SF_HI, V16SF_FTYPE_V16SI_V16SF_V16SF, V16SI_FTYPE_V16SF_V16SI_HI, V8DI_FTYPE_V8SF_V8DI_QI, V8SF_FTYPE_V8DI_V8SF_QI, V8DI_FTYPE_PV4DI, V8DF_FTYPE_V8DI_V8DF_QI, V16SI_FTYPE_V16SI_V16SI_V16SI, V2DI_FTYPE_V2DI_V2DI_V2DI, V8DI_FTYPE_V8DF_V8DI_QI, V8DF_FTYPE_PV4DF, V8SI_FTYPE_V8SI_V8SI_V8SI, V8DF_FTYPE_V8DF_V8DF_V8DF, UINT_FTYPE_V4SF, V8DF_FTYPE_V8DF_V8DI_V8DF, V8DF_FTYPE_V8DI_V8DF_V8DF, V8DF_FTYPE_V8SF_V8DF_QI, V8DI_FTYPE_V8DI_V8DI_V8DI, V16SF_FTYPE_PV4SF, V8SF_FTYPE_V8DF_V8SF_QI, V8SI_FTYPE_V8DF_V8SI_QI, V16SI_FTYPE_PV4SI, V2DF_FTYPE_V2DF_V4SF_V2DF_QI, V4SF_FTYPE_V4SF_V2DF_V4SF_QI, V8DI_FTYPE_V8DI_SI_V8DI_V8DI, QI_FTYPE_V8DF_V8DF_INT_QI, HI_FTYPE_V16SF_V16SF_INT_HI, V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI, VOID_FTYPE_PDOUBLE_V2DF_QI, VOID_FTYPE_PFLOAT_V4SF_QI, V2DF_FTYPE_PCDOUBLE_V2DF_QI, V4SF_FTYPE_PCFLOAT_V4SF_QI. * config/i386/i386-builtin-types.def (V16QI_FTYPE_V16SI): Remove. (V8DF_FTYPE_V8SI): Ditto. (V8HI_FTYPE_V8DI): Ditto. 
(V8SI_FTYPE_V8DI): Ditto. (V8SF_FTYPE_V8DF): Ditto. (V8SF_FTYPE_V8DF_V8SF_QI): Ditto. (V16HI_FTYPE_V16SI): Ditto. (V16SF_FTYPE_V16HI): Ditto. (V16SF_FTYPE_V16HI_V16SF_HI): Ditto. (V16SF_FTYPE_V16SI): Ditto. (V4DI_FTYPE_V4DI): Ditto. (V16SI_FTYPE_V16SF): Ditto. (V16SF_FTYPE_FLOAT): Ditto. (V8DF_FTYPE_DOUBLE): Ditto. (V8DI_FTYPE_INT64): Ditto. (V8DI_FTYPE_V4DI): Ditto. (V16QI_FTYPE_V8DI): Ditto. (UINT_FTYPE_V4SF): Ditto. (UINT64_FTYPE_V4SF): Ditto. (UINT_FTYPE_V2DF): Ditto. (UINT64_FTYPE_V2DF): Ditto. (V16SI_FTYPE_V16SI): Ditto. (V8DI_FTYPE_V8DI): Ditto. (V16SI_FTYPE_PV4SI): Ditto. (V16SF_FTYPE_PV4SF): Ditto. (V8DI_FTYPE_PV2DI): Ditto. (V8DF_FTYPE_PV2DF): Ditto. (V4DI_FTYPE_PV2DI): Ditto. (V4DF_FTYPE_PV2DF): Ditto. (V16SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV2SF): Ditto. (V8DI_FTYPE_PV4DI): Ditto. (V8DF_FTYPE_PV4DF): Ditto. (V8SF_FTYPE_FLOAT): Ditto. (V4SF_FTYPE_FLOAT): Ditto. (V4DF_FTYPE_DOUBLE): Ditto. (V8SF_FTYPE_PV4SF): Ditto. (V8SI_FTYPE_PV4SI): Ditto. (V4SI_FTYPE_PV2SI): Ditto. (V8SF_FTYPE_PV2SF): Ditto. (V8SI_FTYPE_PV2SI): Ditto. (V16SF_FTYPE_PV8SF): Ditto. (V16SI_FTYPE_PV8SI): Ditto. (V8DI_FTYPE_V8SF): Ditto. (V4DI_FTYPE_V4SF): Ditto. (V2DI_FTYPE_V4SF): Ditto. (V64QI_FTYPE_QI): Ditto. (V32HI_FTYPE_HI): Ditto. (V8UHI_FTYPE_V8UHI): Ditto. (V16UHI_FTYPE_V16UHI): Ditto. (V32UHI_FTYPE_V32UHI): Ditto. (V2UDI_FTYPE_V2UDI): Ditto. (V4UDI_FTYPE_V4UDI): Ditto. (V8UDI_FTYPE_V8UDI): Ditto. (V4USI_FTYPE_V4USI): Ditto. (V8USI_FTYPE_V8USI): Ditto. (V16USI_FTYPE_V16USI): Ditto. (V2DF_FTYPE_V2DF_UINT64): Ditto. (V2DI_FTYPE_V2DF_V2DF): Ditto. (V2UDI_FTYPE_V4USI_V4USI): Ditto. (V8DF_FTYPE_V8DF_V8DI): Ditto. (V4SF_FTYPE_V4SF_UINT64): Ditto. (V4SI_FTYPE_V4SF_V4SF): Ditto. (V16SF_FTYPE_V16SF_V16SI): Ditto. (V64QI_FTYPE_V32HI_V32HI): Ditto. (V32HI_FTYPE_V16SI_V16SI): Ditto. (V8DF_FTYPE_V8DF_V8DF_V8DI_INT_QI): Ditto. (V16SF_FTYPE_V16SF_V16SF_V16SI_INT_HI): Ditto. (V32HI_FTYPE_V64QI_V64QI): Ditto. (V32HI_FTYPE_V32HI_V32HI): Ditto. (V16HI_FTYPE_V16HI_V16HI_INT_V16HI_HI): Ditto. 
(V16SI_FTYPE_V16SI_V4SI): Ditto. (V16SI_FTYPE_V16SI_V16SI): Ditto. (V16SI_FTYPE_V32HI_V32HI): Ditto. (V16SI_FTYPE_V16SI_SI): Ditto. (V8DI_FTYPE_V8DI_V8DI): Ditto. (V4UDI_FTYPE_V8USI_V8USI): Ditto. (V8DI_FTYPE_V16SI_V16SI): Ditto. (V8DI_FTYPE_V8DI_V2DI): Ditto. (QI_FTYPE_QI): Ditto. (SI_FTYPE_SI): Ditto. (DI_FTYPE_DI): Ditto. (QI_FTYPE_QI_QI): Ditto. (QI_FTYPE_QI_INT): Ditto. (HI_FTYPE_HI_INT): Ditto. (SI_FTYPE_SI_INT): Ditto. (DI_FTYPE_DI_INT): Ditto. (HI_FTYPE_V16QI_V16QI): Ditto. (SI_FTYPE_V32QI_V32QI): Ditto. (DI_FTYPE_V64QI_V64QI): Ditto. (QI_FTYPE_V8HI_V8HI): Ditto. (HI_FTYPE_V16HI_V16HI): Ditto. (SI_FTYPE_V32HI_V32HI): Ditto. (QI_FTYPE_V4SI_V4SI): Ditto. (QI_FTYPE_V8SI_V8SI): Ditto. (QI_FTYPE_V2DI_V2DI): Ditto. (QI_FTYPE_V4DI_V4DI): Ditto. (QI_FTYPE_V8DI_V8DI): Ditto. (HI_FTYPE_V16SI_V16SI): Ditto. (HI_FTYPE_V16SI_V16SI_INT_HI): Ditto. (QI_FTYPE_V8DF_V8DF_INT_QI): Ditto. (HI_FTYPE_V16SF_V16SF_INT_HI): Ditto. (V32HI_FTYPE_V32HI_V32HI_V32HI): Ditto. (V4SF_FTYPE_V4SF_V2DF_V4SF_QI): Ditto
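The message does not say why the mask type is switched from signed to unsigned. One plausible reason, shown here only as a sketch, is that a signed 8-bit mask with its top lane bit set becomes negative when widened, while an unsigned one keeps its bit pattern. The helper names below are hypothetical and are not part of GCC.

```c
#include <stdint.h>

/* Hypothetical helpers, not GCC code: widen an 8-bit lane mask to int
   as a signed or as an unsigned quantity.  */

/* Treat the bits as a signed 8-bit value.  Converting an out-of-range
   value to int8_t is implementation-defined; GCC reduces it modulo 2^8.  */
static int widen_as_signed (uint8_t bits)
{
  return (int8_t) bits;
}

/* Treat the bits as an unsigned 8-bit value: the pattern is preserved.  */
static int widen_as_unsigned (uint8_t bits)
{
  return bits;
}
```

With the mask 0x80 (only the highest of eight lanes selected), the unsigned variant yields 128, whereas the signed variant yields -128 on GCC.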
[PATCH] [AVX512F] Add scatter support for vectorizer
Hello, This patch adds scatter support for the vectorizer (for AVX512F instructions). Please have a look. Is it OK for trunk? Thanks, Petr
2015-07-31 Andrey Turetskiy Petr Murzin gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New. (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new checks for STMT_VINFO_SCATTER_P. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this and enhance number of arguments. (vect_analyze_data_refs): Add scatter and maybe_scatter variables and new checks for them accordingly. * tree-vectorizer.h (STMT_VINFO_SCATTER_P(S)): Define. (STMT_VINFO_STRIDE_LOAD_P(S)): Ditto. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-stmts.c (vectorizable_mask_load_store): Ditto. (vectorizable_store): Add checks for STMT_VINFO_SCATTER_P. (vect_mark_stmts_to_be_vectorized): Ditto. gcc/testsuite/ * gcc.target/i386/avx512f-scatter-1.c: New. * gcc.target/i386/avx512f-scatter-2.c: Ditto. * gcc.target/i386/avx512f-scatter-3.c: Ditto. scatter.patch Description: Binary data
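For readers unfamiliar with the term, a scatter is a store whose destination index is itself loaded from memory. The loop below is a minimal illustration of that access pattern. It is not taken from the avx512f-scatter-*.c tests, whose contents are not shown here, and the function name is made up. Whether the vectorizer turns such a loop into AVX512F scatter instructions is assumed to depend on the options used (for example -O3 -mavx512f).

```c
#include <stddef.h>

/* Scatter store: iteration i writes in[i] to the data-dependent
   location out[idx[i]].  The caller must ensure every idx[i] is a
   valid index into out.  */
void scatter_store (double *restrict out, const double *restrict in,
                    const int *restrict idx, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[idx[i]] = in[i];
}
```

The matching gather pattern would read `in[idx[i]]` and store to `out[i]`, which the vectorizer already handled before this patch.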
[patch] Macroize logic patterns
Hi, I've macroized logic patterns. Please have a look. Is it ok for trunk? 2014-08-25 Petr Murzin * config/i386/i386.md: Macroize logic patterns. logic_patterns_patch Description: Binary data
Re: [patch] Macroize logic patterns
Done. 2014-08-25 Petr Murzin * config/i386/i386.md (SWI1248_AVX512BW): New mode iterator. (*k): Add *kqi and *khi and use SWI1248_AVX512BW mode iterator. Best regards, Petr Murzin On Mon, Aug 25, 2014 at 1:42 PM, Uros Bizjak wrote: > On Mon, Aug 25, 2014 at 11:00 AM, Petr Murzin wrote: > >> I've macroized logic patterns. Please have a look. Is it ok for trunk? >> >> 2014-08-25 Petr Murzin >> >> * config/i386/i386.md: Macroize logic patterns. > > Please write ChangeLog entry like: > > *config/i386/i386.md (SWI1248_AVX512BW): New mode iterator. > (*k): Add *kqi and *khi and use > SWI1248_AVX512BW mode iterator. > > +;; All integer modes with avx512bw. > +(define_mode_iterator SWI1248_avx512bw > + [QI HI (SI "TARGET_AVX512BW") (DI "TARGET_AVX512BW")]) > > Mode iterator should be named in all caps. > > OK with this change. > > Thanks, > Uros. macroize_logic_patterns_patch Description: Binary data
Re: Extract and insert merging patch
Hi, Please have a look at updated patch. 2014-10-22 Petr Murzin gcc/ * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge (vec_duplicate (vec_select)). gcc/testsuite/ * gcc.target/i386/extract-insert-combining.c: New. On Fri, Sep 19, 2014 at 1:43 AM, Jeff Law wrote: > On 09/16/14 13:40, Andrew Pinski wrote: >> >> On Tue, Sep 16, 2014 at 4:40 AM, Petr Murzin >> wrote: >>> >>> Hi, >>> This patch allows merging of extract and insert. Please have a look. >>> >>> 2014-09-16 Petr Murzin >>> >>> * simplify-rtx.c (simplify_ternary_operation): Allow extract and >>> insert merging. >> >> >> Besides no testcase. Can your changelog mention vectors because I >> thought from the description you were working on bits. > > Similarly :-) > > So a few more nits. ChangeLog format is > > * file (function): What changed. > > So something like > > * simplify-rtx.c (simplify_ternary_operation): Simplify > (vec_merge (vec_duplicate (vec_select ...)) in some cases. > > > + /* Replace (vec_merge (vec_duplicate (vec_select a parallel (0))) > a 1) > + with a. */ > + if (GET_CODE (op0) == VEC_DUPLICATE > + && GET_CODE (XEXP (op0, 0)) == VEC_SELECT > + && GET_CODE (XEXP (XEXP (op0, 0), 1)) == PARALLEL) > + { > + tem = XVECEXP ((XEXP (XEXP (op0, 0), 1)), 0, 0); > + if (CONST_INT_P (tem) && CONST_INT_P (op2)) > + { > + if (XEXP (XEXP (op0, 0), 0) == op1 && UINTVAL (tem) == 0 > + && UINTVAL (op2) == 1) > > Line break before the first && UINTVAL. ie, format it like this: > > if (XEXP (XEXP ...) > && UINTVAL (tem) == 0 > && UINTVAL (op2) == 1 > > > And definitely include a testcase and repost for further review. > > Thanks, > Jeff extract_insert_patch Description: Binary data
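The simplification discussed above can be modelled on plain arrays to see why it is safe: extracting a lane, broadcasting it, and merging it back into the same lane of the original vector gives the original vector. This is only a sketch; the `vec4` type and helper functions are hypothetical stand-ins for the RTL codes, and the general case (lane i with mask `1 << i`) is an assumed extension of the lane 0 / mask 1 case in the quoted comment.

```c
#include <stdint.h>

#define NELTS 4

/* Hypothetical 4-lane integer vector; real RTL vectors are not C structs.  */
typedef struct { int32_t e[NELTS]; } vec4;

/* Models (vec_select a (parallel [i])): extract lane i.  */
static int32_t vec_select1 (vec4 a, unsigned i)
{
  return a.e[i];
}

/* Models (vec_duplicate x): broadcast x into every lane.  */
static vec4 vec_duplicate (int32_t x)
{
  vec4 r;
  for (unsigned j = 0; j < NELTS; j++)
    r.e[j] = x;
  return r;
}

/* Models (vec_merge v1 v2 mask): lane j comes from v1 when bit j of
   mask is set, otherwise from v2.  */
static vec4 vec_merge (vec4 v1, vec4 v2, uint64_t mask)
{
  vec4 r;
  for (unsigned j = 0; j < NELTS; j++)
    r.e[j] = ((mask >> j) & 1) ? v1.e[j] : v2.e[j];
  return r;
}

/* Extract lane i of a and insert it back into lane i of a.  */
static vec4 extract_then_insert (vec4 a, unsigned i)
{
  return vec_merge (vec_duplicate (vec_select1 (a, i)), a, (uint64_t) 1 << i);
}
```

For every lane i the result equals `a`, which is the property the simplification relies on when it replaces the whole expression with `a`.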
Re: [patch] Excessive alignment in ix86_data_alignment
On 09 Oct 08:25, H.J. Lu wrote: > On Thu, Oct 9, 2014 at 1:37 AM, Uros Bizjak wrote: > > On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin > > wrote: > >> On 08 Oct 23:02, Petr Murzin wrote: > >>> Hi, > >>> I have measured performance impact on Haswell platform according to this > >>> input: > >>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html > > Kirill, please mention: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61296 > > in your ChangeLog. > > > What about older processors? > > Kirill, please collect data on Nehalem/Westmere, Sandybridge/Ivybridge > and Silvermont. > > > The optimization was introduced well before Haswell for then current > > processors, and it was based on the recommendation from Intel > > optimization guide. If this optimization doesn't apply for new > > processors, then tune option should be introduced and set accordingly. > > > > I believe the original excessive alignment was introduced by cut/paste > from > > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=ed45e834f305d1f2709bf200a13d5beebc2fcfee > > to improve x86 FP performance, which might be partially copied from > CONSTANT_ALIGNMENT: > > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=f7d6703c5d83fc9fb06246d6eb49e9b61098045c > > > -- > H.J.
Hello, Please have a look at the collected data.
SLM -O2:
Test            Previous  Current   Ratio(%)
400.perlbench   11.5000   11.5000   +0%
401.bzip2       8.7800    8.7500    -0.34%
403.gcc         9.7700    9.8200    +0.51%
429.mcf         9.9000    10.1000   +2.02%
445.gobmk       10.4000   10.4000   +0%
456.hmmer       12.7000   12.7000   +0%
458.sjeng       10.6000   10.6000   +0%
462.libquantum  25.       24.7000   -1.20%
464.h264ref     17.5000   17.4000   -0.57%
471.omnetpp     7.2700    7.2100    -0.82%
473.astar       8.5700    8.5600    -0.11%
483.xalancbmk   10.4000   10.4000   +0%
410.bwaves      24.1000   24.1000   +0%
416.gamess      9.6900    9.6700    -0.20%
433.milc        9.5400    9.7300    +1.99%
434.zeusmp      8.7000    8.6900    -0.11%
435.gromacs     7.7800    7.7700    -0.12%
436.cactusADM   12.4000   12.3000   -0.80%
437.leslie3d    10.5000   10.4000   -0.95%
444.namd        9.0100    9.0100    +0%
447.dealII      17.8000   17.8000   +0%
450.soplex      11.7000   11.7000   +0%
453.povray      11.7000   11.7000   +0%
454.calculix    5.8700    5.8700    +0%
459.GemsFDTD    12.1000   12.1000   +0%
465.tonto       8.4700    8.4700    +0%
470.lbm         17.8000   17.8000   +0%
481.wrf         13.5000   13.6000   +0.74%
482.sphinx3     12.6000   12.6000   +0%
Geomeans:
INT             11.20     11.19     -0.05%
FP              11.29     11.29     +0.03%
ALL             11.25     11.25     -0.00%

SLM -O3:
Test            Previous  Current   Ratio(%)
400.perlbench   11.5000   11.5000   +0%
401.bzip2       8.7400    8.7400    +0%
403.gcc         9.7800    9.8000    +0.20%
429.mcf         9.8900    10.2000   +3.13%
445.gobmk       10.4000   10.4000   +0%
456.hmmer       12.7000   12.7000   +0%
458.sjeng       10.6000   10.6000   +0%
462.libquantum  24.8000   25.       +0.80%
464.h264ref     17.4000   17.4000   +0%
471.omnetpp     7.1900    7.3100    +1.66%
473.astar       8.6000    8.5800    -0.23%
483.xalancbmk   10.4000   10.4000   +0%
410.bwaves      24.2000   24.2000   +0%
416.gamess      9.7000    9.6700    -0.30%
433.milc        9.7300    9.7500    +0.20%
434.zeusmp      8.7000    8.7000    +0%
435.gromacs     7.7700    7.7700    +0%
436.cactusADM   12.4000   12.3000   -0.80%
437.leslie3d    10.4000   10.4000   +0%
444.namd        9.0100    9.0100    +0%
447.dealII      17.8000   17.9000   +0.56%
450.soplex      11.9000   11.8000   -0.84%
453.povray      11.7000   11.7000   +0%
454.calculix    5.8600    5.8700    +0.17%
459.GemsFDTD    12.1000   12.       -0.82%
465.tonto       8.4800    8.4700    -0.11%
470.lbm         17.8000   17.8000   +0%
481.wrf         13.5000   13.5000   +0%
482.sphinx3     12.7000   12.7000   +0%
Geomeans:
INT             11.17     11.22     +0.46%
FP              11.31     11.30     -0.12%
ALL             11.25     11.27     +0.12%

SNB -O2:
Test            Previous  Current   Ratio(%)
400.perlbench   31.3000   31.3000   +0%
401.bzip2       21.7000   21.7000   +0%
403.gcc         30.6000   30.6000   +0%
429.mcf         43.2000   43.3000   +0.23%
445.gobmk       24.9000   24.9000   +0%
456.hmmer       23.8000   23.8000   +0%
458.sjeng       26.       26.1000   +0.38%
462.libquantum  63.2000   63.6000   +0.63%
464.h264ref     46.7000   46.9000   +0.42%
471.omnetpp     23.9000   23.7000   -0.83%
473.astar       22.8000   22.8000   +0%
483.xalancbmk   38.9000   38.7000   -0.51%
410.bwaves      55.       55.2000   +0.36%
416.gamess      28.3000   28.
Re: [patch] Excessive alignment in ix86_data_alignment
On 09 Oct 08:25, H.J. Lu wrote: > On Thu, Oct 9, 2014 at 1:37 AM, Uros Bizjak wrote: > > On Thu, Oct 9, 2014 at 10:25 AM, Kirill Yukhin > > wrote: > >> On 08 Oct 23:02, Petr Murzin wrote: > >>> Hi, > >>> I have measured performance impact on Haswell platform according to this > >>> input: > >>> https://gcc.gnu.org/ml/gcc-patches/2014-06/msg00978.html > > Kirill, please mention: > > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61296 > > in your ChangeLog. > > > What about older processors? > > Kirill, please collect data on Nehalem/Westmere, Sandybridge/Ivybridge > and Silvermont. > > > The optimization was introduced well before Haswell for then current > > processors, and it was based on the recommendation from Intel > > optimization guide. If this optimization doesn't apply for new > > processors, then tune option should be introduced and set accordingly. > > > > I believe the original excessive alignment was introduced by cut/paste > from > > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=ed45e834f305d1f2709bf200a13d5beebc2fcfee > > to improve x86 FP performance, which might be partially copied from > CONSTANT_ALIGNMENT: > > https://gcc.gnu.org/git/?p=gcc.git;a=patch;h=f7d6703c5d83fc9fb06246d6eb49e9b61098045c > > > -- > H.J.
Hi, Here is the info about code and data size.
CODE SEGMENTS

SLM
Benchmark  | Trunk    | Trunk w/ patch | Diff, %
soplex     | 1970381  | 1969805        | -0.0292329250028
hmmer      | 1244935  | 1244775        | -0.0128520766144
Xalan      | 6464124  | 6448572        | -0.240589444138
gcc        | 5613433  | 5611001        | -0.0433246464329
gamess     | 16775109 | 16744869       | -0.180267085001
h264ref    | 2004787  | 2003155        | -0.0814051567573
libquantum | 915693   | 915693         | 0.0
astar      | 956881   | 956849         | -0.0033441984949
gobmk      | 3000807  | 3000263        | -0.0181284567785
namd       | 1525077  | 1525045        | -0.00209825471107
lbm        | 777141   | 777141         | 0.0
milc       | 1151431  | 1151303        | -0.066018632
sjeng      | 994163   | 993555         | -0.0611569732529
GemsFDTD   | 2361733  | 2360773        | -0.0406481172935
leslie3d   | 1474453  | 1473909        | -0.0368950383634
bwaves     | 1076385  | 1075777        | -0.0564853653665
mcf        | 778835   | 778835         | 0.0
gromacs    | 2348746  | 2348106        | -0.0272485828608
wrf        | 7742741  | 7739637        | -0.040089162223
cactusADM  | 2820689  | 2820049        | -0.0226894918227
zeusmp     | 1676605  | 1675997        | -0.0362637592039
sphinx     | 1379854  | 1379822        | -0.00231908593228
omnetpp    | 2872183  | 2871511        | -0.0233968378756
dealII     | 2514404  | 2512324        | -0.0827233809682
povray     | 3212579  | 3211395        | -0.0368551248078
perlbench  | 3123490  | 3122178        | -0.0420042964761
calculix   | 3914026  | 3909866        | -0.106284424273
bzip2      | 887974   | 887910         | -0.00720741823522
tonto      | 7213573  | 7206981        | -0.0913832853705

SNB and IVB
Benchmark  | Trunk    | Trunk w/ patch | Diff, %
soplex     | 1969317  | 1968741        | -0.029248719226
hmmer      | 1252815  | 1252655        | -0.0127712391694
Xalan      | 6517380  | 6501828        | -0.238623495945
gcc        | 5634249  | 5631817        | -0.0431645814731
gamess     | 16555117 | 16525197       | -0.18072961973
h264ref    | 2062507  | 2060779        | -0.0837815338324
libquantum | 915085   | 915085         | 0.0
astar      | 957025   | 956993         | -0.00334369530577
gobmk      | 3010535  | 3009991        | -0.0180698779453
namd       | 1505805  | 1505773        | -0.00212510916088
lbm        | 775733   | 775733         | 0.0
milc       | 1140567  | 1140471        | -0.00841686634805
sjeng      | 997235   | 996627         | -0.0609685781185
GemsFDTD   | 2451645  | 2450653        | -0.0404626281537
leslie3d   | 1479125  | 1478581        | -0.0367785008028
bwaves     | 1072497  | 1071953        | -0.0507227526044
mcf        | 778803   | 778803         | 0.0
gromacs    | 2322770  | 2322130        | -0.0275533091955
wrf        | 8092645  | 8090693        | -0.0241206675938
cactusADM  | 2808633  | 2807993        | -0.0227868860047
zeusmp     | 1733717  | 1733077        | -0.0369149059506
sphinx     | 1388206  | 1388174        | -0.00230513338798
omnetpp    | 2875179  | 2874475        | -0.0244854320374
dealII     | 2527560  | 2525576
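The "Diff, %" column in the code-size data appears to be the relative size change, (patched - trunk) / trunk * 100. The message does not state the formula, so this is inferred from the numbers, and the function name below is made up. A small sketch for recomputing a row:

```c
/* Relative change of the patched size against trunk, in percent.
   Negative values mean the patched binary is smaller.  */
static double percent_diff (long trunk, long patched)
{
  return 100.0 * (double) (patched - trunk) / (double) trunk;
}
```

For the SLM soplex row, `percent_diff (1970381, 1969805)` gives roughly -0.02923, which agrees with the listed value. Rows with identical sizes, such as libquantum, give 0.0.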
[PATCH] Fix regexps in avx512* tests
Hello, This patch follows up on this input: https://gcc.gnu.org/ml/gcc-patches/2014-10/msg02441.html Furthermore, the following regexps were fixed to make assembler scanning more precise. Endings like (?:\n|\[\ \t\]+#) are used because ICC may generate comments. Please have a look. Is it ok for trunk? Thanks, Petr
2014-11-28 Petr Murzin gcc/testsuite/ * gcc.target/i386/avx512bw-kunpckdq-1.c: Fix regexps for assembler scanning. * gcc.target/i386/avx512bw-kunpckwd-1.c: Ditto. * gcc.target/i386/avx512bw-vdbpsadbw-1.c: Ditto. * gcc.target/i386/avx512bw-vmovdqu16-1.c: Ditto. * gcc.target/i386/avx512bw-vmovdqu8-1.c: Ditto. * gcc.target/i386/avx512bw-vpabsb-1.c: Ditto. * gcc.target/i386/avx512bw-vpabsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpackssdw-1.c: Ditto. * gcc.target/i386/avx512bw-vpacksswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpackusdw-1.c: Ditto. * gcc.target/i386/avx512bw-vpackuswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddb-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddsb-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddusb-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddusw-1.c: Ditto. * gcc.target/i386/avx512bw-vpaddw-1.c: Ditto. * gcc.target/i386/avx512bw-vpalignr-1.c: Ditto. * gcc.target/i386/avx512bw-vpavgb-1.c: Ditto. * gcc.target/i386/avx512bw-vpavgw-1.c: Ditto. * gcc.target/i386/avx512bw-vpblendmb-1.c: Ditto. * gcc.target/i386/avx512bw-vpblendmw-1.c: Ditto. * gcc.target/i386/avx512bw-vpbroadcastb-1.c: Ditto. * gcc.target/i386/avx512bw-vpbroadcastw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpeqb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpequb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpequw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpeqw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgeb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgeub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgeuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgew-1.c: Ditto. 
* gcc.target/i386/avx512bw-vpcmpgtb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgtub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgtuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpgtw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpleb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpleub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpleuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmplew-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpltb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpltub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpltuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpltw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpneqb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpnequb-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpnequw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpneqw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpub-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpcmpw-1.c: Ditto. * gcc.target/i386/avx512bw-vpermi2w-1.c: Ditto. * gcc.target/i386/avx512bw-vpermt2w-1.c: Ditto. * gcc.target/i386/avx512bw-vpermw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaddubsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaddwd-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaxsb-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaxsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaxub-1.c: Ditto. * gcc.target/i386/avx512bw-vpmaxuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpminsb-1.c: Ditto. * gcc.target/i386/avx512bw-vpminsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpminub-1.c: Ditto. * gcc.target/i386/avx512bw-vpminuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovb2m-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovm2b-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovm2w-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovsxbw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovuswb-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovw2m-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovwb-1.c: Ditto. * gcc.target/i386/avx512bw-vpmovzxbw-1.c: Ditto. 
* gcc.target/i386/avx512bw-vpmulhrsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmulhuw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmulhw-1.c: Ditto. * gcc.target/i386/avx512bw-vpmullw-1.c: Ditto. * gcc.target/i386/avx512bw-vpshufb-1.c: Ditto. * gcc.target/i386/avx512bw-vpshufhw-1.c: Ditto. * gcc.target/i386/avx512bw-vpshuflw-1.c: Ditto. * gcc.target/i386/avx512bw-vpslldq-1.c: Ditto. * gcc.target/i386/avx512bw-vpsllvw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsllw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsllwi-1.c: Ditto. * gcc.target/i386/avx512bw-vpsravw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsraw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsrawi-1.c: Ditto. * gcc.target/i386/avx512bw-vpsrldq-1.c: Ditto. * gcc.target/i386/avx512bw-vpsrlvw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsrlw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsrlwi-1.c: Ditto. * gcc.target/i386/avx512bw-vpsubb-1.c: Ditto. * gcc.target/i386/avx512bw-vpsubsb-1.c: Ditto. * gcc.target/i386/avx512bw-vpsubsw-1.c: Ditto. * gcc.target/i386/avx512bw-vpsubusb-1.c: Ditto
[PATCH] [AVX512F] Add scatter support for vectorizer
Hello, This patch adds scatter support for vectorizer (for AVX512F instructions). Please have a look. Is it ok for stage 1? 2015-03-05 Andrey Turetskiy * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New. (ix86_initialize_bounds): (TARGET_VECTORIZE_BUILTIN_SCATTER): Ditto. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): Ditto. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new checkings for STMT_VINFO_SCATTER_P. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this and enhance number of arguments. (vect_analyze_data_ref_access): Update comment and returnable values. (vect_analyze_data_refs): Add maybe_scatter and new checking for it accordingly. * tree-vectorizer.h (STMT_VINFO_SCATTER_P(S)): Define. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-stmts.c: Ditto. (vectorizable_store): Add checkings for STMT_VINFO_SCATTER_P. 2015-03-05 Andrey Turetskiy testsuite/ * gcc.target/i386/avx512f-scatter-1.c: New. * gcc.target/i386/avx512f-scatter-2.c: Ditto. * gcc.target/i386/avx512f-scatter-3.c: Ditto. * gcc.target/i386/avx512f-scatter-4.c: Ditto. * gcc.target/i386/avx512f-scatter-5.c: Ditto. Thanks, Petr scatter_patch Description: Binary data
Re: Extract and insert merging patch
Hi, Bootstrapped. No regressions detected. Please have a look. Is it ok for trunk? 2014-11-05 Petr Murzin gcc/ * simplify-rtx.c (simplify_ternary_operation): Simplify vec_merge (vec_duplicate (vec_select)). gcc/testsuite/ * gcc.target/i386/extract-insert-combining.c: New. Thanks, Petr On Wed, Oct 22, 2014 at 9:35 PM, Marc Glisse wrote: > On Wed, 22 Oct 2014, Petr Murzin wrote: > >> + && UINTVAL (op2) == 1 << UINTVAL (tem)) > > > With modes like V64QI around, it is better to replace 1 with > HOST_WIDE_INT_1U, though we are not consistent about it. > > -- > Marc Glisse extract_insert_patch Description: Binary data
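Marc Glisse's point about V64QI can be seen in isolation: a 64-lane vector needs mask bits up to bit 63, and in C a plain `1` has type `int`, so shifting it by 32 or more is undefined. The sketch below uses a hypothetical `ONE_U64` macro as a stand-in for HOST_WIDE_INT_1U, assuming that macro denotes an unsigned 64-bit one.

```c
#include <stdint.h>

/* Hypothetical stand-in for HOST_WIDE_INT_1U, assumed here to be an
   unsigned 64-bit constant one.  */
#define ONE_U64 ((uint64_t) 1)

/* Single-lane merge mask for the given lane (0..63).  Shifting a
   64-bit unsigned one is well defined for every lane of a 64-lane
   vector, unlike shifting the int constant 1.  */
static uint64_t lane_mask (unsigned lane)
{
  return ONE_U64 << lane;
}
```

With this form, the mask for lane 40 is 0x10000000000, a value that cannot be produced by shifting a 32-bit `int`.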
[PATCH, i386, AVX-512] Fix sse-14.c (Intel assembly)
Hello, The attached patch fixes sse-14.c to compile with -masm=intel. Bootstrapped. No regressions detected. Please have a look. Is it ok for trunk? 2016-05-05 Petr Murzin gcc/ * config/i386/sse.md: Use proper operand modifiers. * config/i386/i386.c (ix86_print_operand): Expand check for size override codes for Intel syntax. Thanks, Petr fix_intel_syntax.patch Description: Binary data
Re: [PATCH] [AVX512F] Add scatter support for vectorizer
Hello, Please have a look at updated patch. On Tue, Aug 4, 2015 at 3:15 PM, Richard Biener wrote: > On Fri, 31 Jul 2015, Petr Murzin wrote: > @@ -5586,8 +5770,6 @@ vectorizable_store (gimple stmt, > gimple_stmt_iterator *gsi, gimple *vec_stmt, >prev_stmt_info = NULL; >for (j = 0; j < ncopies; j++) > { > - gimple new_stmt; > - >if (j == 0) > { >if (slp) > > spurious change? I have increased the scope of this variable to use it in checking for STMT_VINFO_SCATTER_P (stmt_info). Thanks, Petr 2015-08-21 Andrey Turetskiy Petr Murzin gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New. (TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Add new checkings for STMT_VINFO_SCATTER_P. (vect_check_gather): Rename to ... (vect_check_gather_scatter): this and enhance number of arguments. (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable and new checkings for it accordingly. * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it for loads/stores in case of gather/scatter accordingly. (STMT_VINFO_SCATTER_P(S)): Define. (vect_check_gather): Rename to ... 
(vect_check_gather_scatter): this. * tree-vect-stmts.c (vectorizable_mask_load_store): Ditto. (vectorizable_store): Add checkings for STMT_VINFO_SCATTER_P. (vect_mark_stmts_to_be_vectorized): Ditto. scatter_patch_upd Description: Binary data
Re: [PATCH] [AVX512F] Add scatter support for vectorizer
On Wed, Aug 26, 2015 at 10:41 AM, Richard Biener wrote: > @@ -3763,32 +3776,46 @@ again: >if (vf > *min_vf) > *min_vf = vf; > > - if (gather) > + if (gatherscatter != SG_NONE) > { > tree off; > + if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, &off, > NULL, true) != 0) > + gatherscatter = GATHER; > + else if (vect_check_gather_scatter (stmt, loop_vinfo, NULL, > &off, NULL, false) > + != 0) > + gatherscatter = SCATTER; > + else > + gatherscatter = SG_NONE; > > as I said vect_check_gather_scatter already knows whether the DR is a read or > a write and thus whether it needs to check for gather or scatter. Remove > the new argument. And simply do > >if (!vect_check_gather_scatter (stmt)) > gatherscatter = SG_NONE; > > - STMT_VINFO_GATHER_P (stmt_info) = true; > + if (gatherscatter == GATHER) > + STMT_VINFO_GATHER_P (stmt_info) = true; > + else > + STMT_VINFO_SCATTER_P (stmt_info) = true; > } > > and as suggested merge STMT_VINFO_GATHER_P and STMT_VINFO_SCATTER_P > using the enum so you can simply do > > STMT_VINFO_SCATTER_GATHER_P (smt_info) = gatherscatter; > Otherwise the patch looks ok to me. Fixed. Uros, could you please have a look at target part of patch? Thanks, Petr 2015-08-26 Andrey Turetskiy Petr Murzin gcc/ * config/i386/i386-builtin-types.def (VOID_PFLOAT_HI_V8DI_V16SF_INT): New. (VOID_PDOUBLE_QI_V16SI_V8DF_INT): Ditto. (VOID_PINT_HI_V8DI_V16SI_INT): Ditto. (VOID_PLONGLONG_QI_V16SI_V8DI_INT): Ditto. * config/i386/i386.c (ix86_builtins): Add IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_init_mmx_sse_builtins): Define __builtin_ia32_scatteraltsiv8df, __builtin_ia32_scatteraltdiv8sf, __builtin_ia32_scatteraltsiv8di, __builtin_ia32_scatteraltdiv8si. (ix86_expand_builtin): Handle IX86_BUILTIN_SCATTERALTSIV8DF, IX86_BUILTIN_SCATTERALTDIV16SF, IX86_BUILTIN_SCATTERALTSIV8DI, IX86_BUILTIN_SCATTERALTDIV16SI. (ix86_vectorize_builtin_scatter): New. 
(TARGET_VECTORIZE_BUILTIN_SCATTER): Define as ix86_vectorize_builtin_scatter. * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_SCATTER): New. * doc/tm.texi: Regenerate. * target.def: Add scatter builtin. * tree-vectorizer.h: Rename gather_p to gather_scatter_p and use it for loads/stores in case of gather/scatter accordingly. (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather): Rename to ... (vect_check_gather_scatter): this. * tree-vect-data-refs.c (vect_analyze_data_ref_dependence): Use STMT_VINFO_GATHER_SCATTER_P instead of STMT_VINFO_SCATTER_P. (vect_check_gather_scatter): Use it instead of vect_check_gather. (vect_analyze_data_refs): Add gatherscatter enum and maybe_scatter variable and new checkings for it accordingly. * tree-vect-stmts.c (STMT_VINFO_GATHER_SCATTER_P(S)): Use it instead of STMT_VINFO_GATHER_P(S). (vect_check_gather_scatter): Use it instead of vect_check_gather. (vectorizable_store): Add checkings for STMT_VINFO_GATHER_SCATTER_P. gcc/testsuite/ * gcc.target/i386/avx512f-scatter-1.c: New. * gcc.target/i386/avx512f-scatter-2.c: Ditto. * gcc.target/i386/avx512f-scatter-3.c: Ditto. scatter Description: Binary data tests Description: Binary data
Extract and insert merging patch
Hi, This patch allows merging of extract and insert. Please have a look. 2014-09-16 Petr Murzin * simplify-rtx.c (simplify_ternary_operation): Allow extract and insert merging. extract_insert_patch Description: Binary data
[patch] fix the uninitialized variable problem in avx512f-vbroadcastf64x4-2.c
Hi, I've fixed the uninitialized variable problem. Please have a look. Is it ok for trunk? 2014-07-15 Petr Murzin * gcc.target/i386/avx512f-vbroadcastf64x4-2.c: Fix the uninitialized variable problem. patch Description: Binary data
[patch] fix AVX512F tests
Hello, I've fixed the AVX512F tests. These tests failed on Android because they were using <values.h>, which seems to be obsolete and is not present in the Android sysroot. Here is the quote from <values.h>: /* This interface is obsolete. New programs should use <limits.h> and/or <float.h> instead of <values.h>. */ So now the tests run with the Android compiler. Please have a look. Is it ok for trunk?
2014-07-18 Petr Murzin * gcc.target/i386/avx512f-vfixupimmpd-2.c: Include float.h instead of values.h, replace MAXDOUBLE with DBL_MAX. * gcc.target/i386/avx512f-vfixupimmsd-2.c: Ditto. * gcc.target/i386/avx512f-vfixupimmps-2.c: Include float.h instead of values.h, replace MAXFLOAT with FLT_MAX. * gcc.target/i386/avx512f-vfixupimmss-2.c: Ditto. * gcc.target/i386/avx512f-vpermi2d-2.c: Remove values.h include. * gcc.target/i386/avx512f-vpermi2pd-2.c: Ditto. * gcc.target/i386/avx512f-vpermi2ps-2.c: Ditto. * gcc.target/i386/avx512f-vpermi2q-2.c: Ditto. * gcc.target/i386/avx512f-vpermt2d-2.c: Ditto. * gcc.target/i386/avx512f-vpermt2pd-2.c: Ditto. * gcc.target/i386/avx512f-vpermt2ps-2.c: Ditto. * gcc.target/i386/avx512f-vpermt2q-2.c: Ditto. patch Description: Binary data
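The header swap described in this ChangeLog amounts to using the standard C limits instead of the legacy macros. The sketch below only illustrates that replacement; it is not the content of the modified tests, and the function names are made up.

```c
#include <float.h>

/* Standard replacement for MAXDOUBLE from the obsolete values.h.  */
static double largest_double (void)
{
  return DBL_MAX;
}

/* Standard replacement for MAXFLOAT from the obsolete values.h.  */
static float largest_float (void)
{
  return FLT_MAX;
}
```

Because float.h is part of standard C, this form should build against any conforming sysroot, including ones that do not ship values.h.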