Re: [1/2] OpenACC routine support
On Mon, Nov 09, 2015 at 09:28:47PM -0800, Cesar Philippidis wrote:
> Here's the patch that Nathan was referring to. I ended up introducing a
> boolean variable named first in the various functions which call
> finalize_oacc_routines. The problem the original approach was having was
> that the routine clause is only applied to the first function
> declarator in a declaration list. By using 'first', which is set to true
> if the current declarator is the first in a sequence of declarators, I
> was able to defer setting parser->oacc_routine to NULL.

#pragma omp declare simd has identical restrictions, but doesn't need to add any of the first parameters to the C++ parser. So, what are you doing differently that you need it? Handling both differently is a consistency issue, and unnecessary additional complexity to the parser.

Jakub
Re: [PATCH PR52272]Be smart when adding iv candidates
On Tue, Nov 10, 2015 at 9:26 AM, Bin.Cheng wrote: > On Mon, Nov 9, 2015 at 11:24 PM, Bernd Schmidt wrote: >> On 11/08/2015 10:11 AM, Richard Biener wrote: >>> >>> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng" >>> wrote: > > +inline bool > +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, > + const iv_common_cand *ccand2) > +{ > + return ccand1->hash == ccand2->hash > +&& operand_equal_p (ccand1->base, ccand2->base, 0) > +&& operand_equal_p (ccand1->step, ccand2->step, 0) > +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) > + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); > >>> Yes. Patch is OK then. >> >> >> Doesn't follow the formatting rules though in the quoted piece. > > Hi Bernd, > Thanks for reviewing. I haven't committed it yet, could you please > point out which quoted piece is so that I can update patch? Ah, the part quoted in review message, I was stupid and tried to find quoted part in my patch... I can see the problem now, here is the updated patch. Thanks, bin diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 1f952a7..aecba12 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -247,6 +247,45 @@ struct iv_cand smaller type. */ }; +/* Hashtable entry for common candidate derived from iv uses. */ +struct iv_common_cand +{ + tree base; + tree step; + /* IV uses from which this common candidate is derived. */ + vec uses; + hashval_t hash; +}; + +/* Hashtable helpers. */ + +struct iv_common_cand_hasher : free_ptr_hash +{ + static inline hashval_t hash (const iv_common_cand *); + static inline bool equal (const iv_common_cand *, const iv_common_cand *); +}; + +/* Hash function for possible common candidates. */ + +inline hashval_t +iv_common_cand_hasher::hash (const iv_common_cand *ccand) +{ + return ccand->hash; +} + +/* Hash table equality function for common candidates. 
*/ + +inline bool +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, + const iv_common_cand *ccand2) +{ + return ccand1->hash == ccand2->hash +&& operand_equal_p (ccand1->base, ccand2->base, 0) +&& operand_equal_p (ccand1->step, ccand2->step, 0) +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); +} + /* Loop invariant expression hashtable entry. */ struct iv_inv_expr_ent { @@ -255,8 +294,6 @@ struct iv_inv_expr_ent hashval_t hash; }; -/* The data used by the induction variable optimizations. */ - /* Hashtable helpers. */ struct iv_inv_expr_hasher : free_ptr_hash @@ -323,6 +360,12 @@ struct ivopts_data /* Cache used by tree_to_aff_combination_expand. */ hash_map *name_expansion_cache; + /* The hashtable of common candidates derived from iv uses. */ + hash_table *iv_common_cand_tab; + + /* The common candidates. */ + vec iv_common_cands; + /* The maximum invariant id. */ unsigned max_inv_id; @@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data) data->inv_expr_tab = new hash_table (10); data->inv_expr_id = 0; data->name_expansion_cache = NULL; + data->iv_common_cand_tab = new hash_table (10); + data->iv_common_cands.create (20); decl_rtl_to_reset.create (20); gcc_obstack_init (&data->iv_obstack); } @@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data) } } +/* Record common candidate {BASE, STEP} derived from USE in hashtable. 
*/ + +static void +record_common_cand (struct ivopts_data *data, tree base, + tree step, struct iv_use *use) +{ + struct iv_common_cand ent; + struct iv_common_cand **slot; + + gcc_assert (use != NULL); + + ent.base = base; + ent.step = step; + ent.hash = iterative_hash_expr (base, 0); + ent.hash = iterative_hash_expr (step, ent.hash); + + slot = data->iv_common_cand_tab->find_slot (&ent, INSERT); + if (*slot == NULL) +{ + *slot = XNEW (struct iv_common_cand); + (*slot)->base = base; + (*slot)->step = step; + (*slot)->uses.create (8); + (*slot)->hash = ent.hash; + data->iv_common_cands.safe_push ((*slot)); +} + (*slot)->uses.safe_push (use); + return; +} + +/* Comparison function used to sort common candidates. */ + +static int +common_cand_cmp (const void *p1, const void *p2) +{ + unsigned n1, n2; + const struct iv_common_cand *const *const ccand1 += (const struct iv_common_cand *const *)p1; + const struct iv_common_cand *const *const ccand2 += (const struct iv_common_cand *const *)p2; + + n1 = (*ccand1)->uses.length (); + n2 = (*ccand2)->uses.length (); + return n2 - n1; +} + +/* Adds IV candidates based on common candidated recorded. */ + +static void +add_iv_candidate_derived_from_uses (struct ivopts_data *data) +{ + unsigned i, j; + struct iv_cand *cand_1, *cand_2; + + data->iv_commo
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
Hi Jakub, On 29 Oct 09:54, Jakub Jelinek wrote: > On Wed, Oct 28, 2015 at 12:16:04PM +0300, Kirill Yukhin wrote: > > Bootstrapped. Regtested. Is it ok for trunk? > > > > > > gcc/ > > * omp-low.c (pass_omp_simd_clone::gate): If target allows - call > > without additional conditions. > > * doc/extend.texi (simd): Document new attribute. > > gcc/cp/ > > * parser.h (cp_parser): Add simd_attr_present. > > * parser.c (cp_parser_late_return_type_opt): Handle > > simd_attr_present, > > require comman in __vector__ attribute. > > (cp_parser_gnu_attribute_list): Ditto. > > gcc/c/ > > * c-parser.c (c_parser): Add simd_attr_present flag. > > (c_parser_declaration_or_fndef): Call c_parser_declaration_or_fndef > > if simd_attr_present is set. > > (c_finish_omp_declare_simd): Handle simd_attr_present. > > Actually, do you plan to eventually add some clauses/operands to the simd > attribute, or is the plan to just say that simd attribute is > #pragma omp declare simd > with no clauses as if -fopenmp-simd has been enabled? I think so/ > If you don't plan to add any clauses, I wonder whether you really need to > add any parser changes at all, whether this couldn't be all handled in > c-family/c-common.c - handle_simd_attribute, adding simd to the attribute > table in there as a function decl attribute, and simply when processing it > add > tree c = build_tree_list (get_identifier ("omp declare simd"), > NULL_TREE); > TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); > DECL_ATTRIBUTES (fndecl) = c; > (after checking whether the attribute isn't already present and erroring out > if there is "cilk simd function" attribute). > The reason for the (admittedly ugly) parser changes for #pragma omp declare > simd is > that the clauses on the directive refer to parameters that will be declared > later, so we need to save the tokens of the pragma and then after parsing > the parameter declarations actually parse the clauses. 
But, in the simd > attribute case, there are no clauses, there is nothing to parse later. I've refactored the patch. New tests pass except one, which fails due to PR68158. Bootstrapped and reg-tested. Is it ok for trunk? gcc/ * omp-low.c (pass_omp_simd_clone::gate): If target allows - call without additional conditions. * doc/extend.texi (@item simd): New. gcc/c-family/ * c-common.c (handle_simd_attribute): New. (struct attribute_spec): Add entry for "simd". (handle_simd_attribute): New gcc/c/ * c-parser.c (c_finish_omp_declare_simd): Look for "simd" attribute as well. Update error message. gcc/cp/ * parser.c (cp_parser_late_parsing_cilk_simd_fn_info): Look for "simd" attribute as well. Update error message. gcc/testsuite/ * c-c++-common/attr-simd.c: New test. * c-c++-common/attr-simd-2.c: Ditto. * c-c++-common/attr-simd-3.c: Ditto. > Jakub -- Thanks, K diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c index 1c75921..08ab220 100644 --- a/gcc/c-family/c-common.c +++ b/gcc/c-family/c-common.c @@ -392,6 +392,7 @@ static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *); static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *); static tree handle_omp_declare_simd_attribute (tree *, tree, tree, int, bool *); +static tree handle_simd_attribute (tree *, tree, tree, int, bool *); static tree handle_omp_declare_target_attribute (tree *, tree, tree, int, bool *); static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *); @@ -818,6 +819,8 @@ const struct attribute_spec c_common_attribute_table[] = handle_omp_declare_simd_attribute, false }, { "cilk simd function", 0, -1, true, false, false, handle_omp_declare_simd_attribute, false }, + { "simd", 0, -1, true, false, false, + handle_simd_attribute, false }, { "omp declare target", 0, 0, true, false, false, handle_omp_declare_target_attribute, false }, { "alloc_align", 1, 1, false, true, true, @@ -8955,6 +8958,37 @@ handle_omp_declare_simd_attribute (tree 
*, tree, tree, int, bool *) return NULL_TREE; } +/* Handle an "simd" attribute. */ + +static tree +handle_simd_attribute (tree *node, tree name, tree ARG_UNUSED (args), + int ARG_UNUSED (flags), bool *no_add_attrs) +{ + if (TREE_CODE (*node) == FUNCTION_DECL) +{ + if (lookup_attribute ("cilk simd function", DECL_ATTRIBUTES (*node)) != NULL) + { + error_at (DECL_SOURCE_LOCATION (*node), + "%<__simd__%> attribute cannot be " + "used in the same function marked as a Cilk Plus SIMD-ena
Re: [hsa 9/12] Small alloc-pool fix
On 11/06/2015 10:57 AM, Richard Biener wrote: > On Fri, 6 Nov 2015, Martin Liška wrote: > >> On 11/06/2015 10:00 AM, Richard Biener wrote: >>> On Thu, 5 Nov 2015, Martin Jambor wrote: >>> Hi, we use C++ new operators based on alloc-pools a lot in the subsequent patches and realized that on the current trunk, such new operators would needlessly call the placement ::new operator within the allocate method of pool-alloc. Fixed below by providing a new allocation method which does not call placement new, which is only safe to use from within a new operator. The patch also fixes the slightly weird two parameter operator new (which we do not use in HSA backend) so that it does not do the same. >>> >> >> Hi. >> >>> Why do you need to add the pointer variant then? >> >> You are right, we originally used the variant in the branch, but it was >> eventually >> left. >> >>> >>> Also isn't the issue with allocate() that it does >>> >>> return ::new (m_allocator.allocate ()) T (); >>> >>> which 1) value-initializes and 2) doesn't even work with types like >>> >>> struct T { T(int); }; >>> >>> thus types without a default constructor. >> >> You are right, it produces compilation error. >> >>> >>> I think the allocator was poorly C++-ified without updating the >>> specification for the cases it is supposed to handle. And now >>> we have C++ uses that are not working because the allocator is >>> broken. 
>>> >>> An incrementally better version (w/o fixing the issue with >>> types w/o default constructor) is >>> >>> return ::new (m_allocator.allocate ()) T; >> >> I've tried that, and it also calls default ctor: >> >> ../../gcc/alloc-pool.h: In instantiation of ‘T* >> object_allocator::allocate() [with T = et_occ]’: >> ../../gcc/alloc-pool.h:531:22: required from ‘void* operator new(size_t, >> object_allocator&) [with T = et_occ; size_t = long unsigned int]’ >> ../../gcc/et-forest.c:449:46: required from here >> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private >>et_occ (); >>^ >> In file included from ../../gcc/et-forest.c:28:0: >> ../../gcc/alloc-pool.h:483:44: error: within this context >> return ::new (m_allocator.allocate ()) T; > > Yes, but it does slightly cheaper initialization of PODs > >> >>> >>> thus default-initialize which does no initialization for PODs (without >>> array members...) which is what the old pool allocator did. >> >> I'm not so familiar with differences related to PODs. >> >>> >>> To fix the new operator (how do you even call that? does it allow >>> specifying constructor args and thus work without a default constructor?) >>> it should indeed use an allocation method not performing the placement >>> new. But I'd call it allocate_raw rather than vallocate. >> >> For situations where do not have a default ctor, one should you the >> helper method defined at the end of alloc-pool.h: >> >> template >> inline void * >> operator new (size_t, object_allocator &a) >> { >> return a.allocate (); >> } >> >> For instance: >> et_occ *nw = new (et_occurrences) et_occ (2); > > Oh, so it uses placement new syntax... works for me. > >> or as used in the HSA branch: >> >> /* New operator to allocate convert instruction from pool alloc. 
*/ >> >> void * >> hsa_insn_cvt::operator new (size_t) >> { >> return hsa_allocp_inst_cvt->allocate_raw (); >> } >> >> and >> >> cvtinsn = new hsa_insn_cvt (reg, *ptmp2); >> >> >> I attached patch where I rename the method as suggested. > > Ok. Hi. I'm sending suggested patch that survives regression tests and bootstrap on x86_64-linux-gnu. Can I install the patch to trunk? Thanks, Martin > > Thanks, > Richard. > >> Thanks, >> Martin >> >>> >>> Thanks. >>> Richard. >>> Thanks, Martin 2015-11-05 Martin Liska Martin Jambor * alloc-pool.h (object_allocator::vallocate): New method. (operator new): Call vallocate instead of allocate. (operator new): New operator. diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h index 0dc05cd..46b6550 100644 --- a/gcc/alloc-pool.h +++ b/gcc/alloc-pool.h @@ -483,6 +483,12 @@ public: return ::new (m_allocator.allocate ()) T (); } + inline void * + vallocate () ATTRIBUTE_MALLOC + { +return m_allocator.allocate (); + } + inline void remove (T *object) { @@ -523,12 +529,19 @@ struct alloc_pool_descriptor }; /* Helper for classes that do not provide default ctor. */ - template inline void * operator new (size_t, object_allocator &a) { - return a.allocate (); + return a.vallocate (); +} + +/* Helper for classes that do not provide default ctor. */ +template +inline void * +operator new (size_t, object_allocator *a) +{ + return a->vallocate ()
Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.
On Tue, Nov 10, 2015 at 11:44:18AM +0300, Kirill Yukhin wrote: > gcc/ > * omp-low.c (pass_omp_simd_clone::gate): If target allows - call > without additional conditions. Please make sure CHangeLog entries are tab indented. I would just use a comma instead of " -" above. > * c-c++-common/attr-simd.c: New test. > * c-c++-common/attr-simd-2.c: Ditto. > * c-c++-common/attr-simd-3.c: Ditto. New test is IMHO short enough that it is better to spell it in each case. > bool *); > +static tree handle_simd_attribute (tree *, tree, tree, int, bool *); > static tree handle_omp_declare_target_attribute (tree *, tree, tree, int, >bool *); > static tree handle_designated_init_attribute (tree *, tree, tree, int, bool > *); > @@ -818,6 +819,8 @@ const struct attribute_spec c_common_attribute_table[] = > handle_omp_declare_simd_attribute, false }, >{ "cilk simd function", 0, -1, true, false, false, > handle_omp_declare_simd_attribute, false }, > + { "simd",0, -1, true, false, false, > + handle_simd_attribute, false }, Why the -1 in there? I thought the simd attribute has exactly zero arguments, so 0, 0. > +static tree > +handle_simd_attribute (tree *node, tree name, tree ARG_UNUSED (args), > +int ARG_UNUSED (flags), bool *no_add_attrs) As per recent discussion, please leave ARG_UNUSED (args) etc. out, just use unnamed arguments. > +{ > + if (TREE_CODE (*node) == FUNCTION_DECL) > +{ > + if (lookup_attribute ("cilk simd function", DECL_ATTRIBUTES (*node)) > != NULL) Too long line. > + { > + error_at (DECL_SOURCE_LOCATION (*node), > + "%<__simd__%> attribute cannot be " > + "used in the same function marked as a Cilk Plus > SIMD-enabled function"); Too long line. You should just move "use in the same " one line above. > + *no_add_attrs = true; > + } > + else > + { > + DECL_ATTRIBUTES (*node) > + = tree_cons (get_identifier ("omp declare simd"), > + NULL_TREE, DECL_ATTRIBUTES (*node)); > + } Please avoid {}s around single statement in the body. 
> { > - error ("%<#pragma omp declare simd%> cannot be used in the same " > + error ("%<#pragma omp declare simd%> or % attribute cannot be > used in the same " >"function marked as a Cilk Plus SIMD-enabled function"); Too long line. > + if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) != NULL) > + { > + error_at (DECL_SOURCE_LOCATION (fndecl), > + "%<__simd__%> attribute cannot be " > + "used in the same function marked as a Cilk Plus > SIMD-enabled function"); Too long line, see above. >c = build_tree_list (get_identifier ("omp declare simd"), c); >TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl); >DECL_ATTRIBUTES (fndecl) = c; > diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c > index 7555bf3..f3831b9 100644 > --- a/gcc/cp/parser.c > +++ b/gcc/cp/parser.c > @@ -34534,10 +34534,11 @@ cp_parser_late_parsing_cilk_simd_fn_info (cp_parser > *parser, tree attrs) >cp_omp_declare_simd_data *info = parser->cilk_simd_fn_info; >int ii = 0; > > - if (parser->omp_declare_simd != NULL) > + if (parser->omp_declare_simd != NULL > + || lookup_attribute ("simd", attrs)) > { > - error ("%<#pragma omp declare simd%> cannot be used in the same > function" > - " marked as a Cilk Plus SIMD-enabled function"); > + error ("%<#pragma omp declare simd%> of % attribute cannot be > used " > + "in the same function marked as a Cilk Plus SIMD-enabled > function"); Too long lines. >XDELETE (parser->cilk_simd_fn_info); >parser->cilk_simd_fn_info = NULL; >return attrs; > diff --git a/gcc/doc/extend.texi b/gcc/doc/extend.texi > index fdb1547..32994a2 100644 > --- a/gcc/doc/extend.texi > +++ b/gcc/doc/extend.texi > @@ -3066,6 +3066,21 @@ This function attribute make a stack protection of the > function if > flags @option{fstack-protector} or @option{fstack-protector-strong} > or @option{fstack-protector-explicit} are set. > > +@item simd > +@cindex @code{simd} function attribute. 
> +This attribute enables creation of one or more function versions that > +can process multiple arguments using SIMD instructions from a > +single invocation. Specifying this attribute allows compiler to > +assume that such a versions are available at link time (provided > +in the same or another translation unit). Generated versions are > +target dependent and described in corresponding Vector ABI document. For > +x86_64 target this document can be found > +@w{@uref{https://sourceware.org/glibc/wiki/libmvec?action=AttachFile&do=view&target=VectorABI.txt,here}}. Are you
Re: [hsa 5/12] New HSA-related GCC options
On Mon, 9 Nov 2015, Martin Jambor wrote: > Hi, > > On Fri, Nov 06, 2015 at 09:42:25AM +0100, Richard Biener wrote: > > On Thu, 5 Nov 2015, Martin Jambor wrote: > > > > > Hi, > > > > > > the following small part of the merge deals with new options. It adds > > > four independent things: > > > > > > 1) flag_disable_hsa is used by code in opts.c (in the first patch) to > > >remember whether HSA has been explicitely disabled on the compiler > > >command line. > > > > But I don't see any way to disable it on the command line? (no switch?) > > No, the switch is -foffload, which has missing documentation (PR > 67300) and is only described at https://gcc.gnu.org/wiki/Offloading > Nevertheless, the option allows the user to specify compiler option > -foffload=disable and no offloading should happen, not even HSA. The > user can also enumerate just the offload targets they want (and pass > them special command line stuff). > > It seems I have misplaced a hunk in the patch series. Nevertheless, > in the first patch (with configuration stuff), there is a change to > opts.c which scans the -foffload= contents and sets the flag variable > if hsa is not present. > > Whenever the compiler has to decide whether HSA is enabled for the > given compilation or not, it has to look at this variable (if > configured for HSA). > > > > > > 2) -Whsa is a new warning we emit whenever we fail to produce HSAIL > > >for some source code. It is on by default but of course only > > >emitted by HSAIL generating code so should never affect anybody who > > >does not use HSA-enabled compiler and OpenMP 4 device constructs. > > > > > > We have found the following two additions very useful for debugging on > > > the branch but will understand if they are not deemed suitable for > > > trunk and will gladly remove them: > > > > > > 3) -fdisable-hsa-gridification disables the gridification process to > > >ease experimenting with dynamic parallelism. 
With this option, > > >HSAIL is always generated from the CPU-intended gimple. > > > > So this sounds like sth a user should never do which means > > it shouln't be a switch (but a parameter or removed). > > Martin said he likes the capability to switch gridification off so I > turned it into a parameter. > > > > > > 4) Parameter hsa-gen-debug-stores will be obsolete once HSA run-time > > >supports debugging traps. Before that, we have to do with > > >debugging stores to memory at defined places, which however can > > >cost speed in benchmarks. So they are only enabled with this > > >parameter. We decided to make it a parameter rather than a switch > > >to emphasize the fact it will go away and to possibly allow us > > >select different levels of verbosity of the stores in the future). > > > > You miss documentation in invoke.texi for new switches and parameters. > > Right, I have added that together with other changes addressing the > above comments and am about to commit the following to the branch: Looks good to me. Thanks, Richard. > > 2015-11-09 Martin Jambor > > * common.opt (-fdisable-hsa-gridification): Removed. > * params.def (PARAM_OMP_GPU_GRIDIFY): New. > * omp-low.c: Include params.h. > (execute_lower_omp): Check parameter PARAM_OMP_GPU_GRIDIFY instead of > flag_disable-hsa-gridification. > * doc/invoke.texi (Optimize Options): Add description of > omp-gpu-gridify and hsa-gen-debug-stores parameters. > > diff --git a/gcc/common.opt b/gcc/common.opt > index 9cb52db..8bee504 100644 > --- a/gcc/common.opt > +++ b/gcc/common.opt > @@ -1115,10 +1115,6 @@ fdiagnostics-show-location= > Common Joined RejectNegative Enum(diagnostic_prefixing_rule) > -fdiagnostics-show-location=[once|every-line]How often to emit > source location at the beginning of line-wrapped diagnostics. > > -fdisable-hsa-gridification > -Common Report Var(flag_disable_hsa_gridification) > -Disable HSA gridification for OMP pragmas > - > ; Required for these enum values. 
> SourceInclude > pretty-print.h > diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi > index 4fc7d88..b9fb1e1 100644 > --- a/gcc/doc/invoke.texi > +++ b/gcc/doc/invoke.texi > @@ -11171,6 +11171,17 @@ dynamic, guided, auto, runtime). The default is > static. > Maximum depth of recursion when querying properties of SSA names in things > like fold routines. One level of recursion corresponds to following a > use-def chain. > + > +@item omp-gpu-gridify > +Enable creation of gridified GPU kernels out of loops within target > +OpenMP constructs. This conversion is enabled by default when > +offloading to HSA, to disable it, use @option{--param omp-gpu-gridify=0} > + > +@item hsa-gen-debug-stores > +Enable emission of special debug stores within HSA kernels which are > +then read and reported by libgomp plugin. Generation of these stores > +is disabled by default, use @option{--param hsa-gen-debug-stores=1} to >
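[Editor's sketch] Putting the options from this thread together — these invocations assume a GCC actually configured with HSA offloading, so the --param names below are only available on such a build:

```shell
# Disable all offloading, HSA included (the -foffload= mechanism
# discussed above; sets the flag_disable_hsa variable internally).
gcc -fopenmp -foffload=disable test.c

# Keep HSA offloading but turn off gridification of target constructs.
gcc -fopenmp --param omp-gpu-gridify=0 test.c

# Enable the (default-off) debug stores in generated HSA kernels.
gcc -fopenmp --param hsa-gen-debug-stores=1 test.c
```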
Re: RFC: C++ delayed folding merge
On Mon, 9 Nov 2015, Jason Merrill wrote: > On 11/09/2015 02:28 PM, Jason Merrill wrote: > > On 11/09/2015 04:08 AM, Richard Biener wrote: > > > On Mon, 9 Nov 2015, Jason Merrill wrote: > > > > > > > I'm planning to merge the C++ delayed folding branch this week, but I > > > > need to > > > > get approval of the back end changes (the first patch attached). > > > > Most of > > > > these are the introduction of non-folding variants of convert_to_*, > > > > but there > > > > are a few others. > > > > > > > > One question: The branch changes 'convert' to not fold its result, > > > > and it's > > > > not clear to me whether that's part of the expected behavior of a > > > > front end > > > > 'convert' function or not. > > > > > > History. convert is purely frontend (but shared, unfortunately between > > > all frontends). I would expect that FEs that do not do delayed folding > > > expect convert to fold. > > > > > > > Also, I'm a bit uncertain about merging this at the end of stage 1, > > > > since it's > > > > a large internal change with relatively small user impact; it just > > > > improves > > > > handling of constant expression corner cases. I'm inclined to go > > > > ahead with > > > > it at this point, but I'm interested in contrary opinions. > > > > > > I welcome this change as it should allow cleaning up the FE-middle-end > > > interface a bit more. It should be possible to remove all > > > NON_LVALUE_EXPR adding/removal from the middle-end folders. 
> > > > > > Looks like the backend patch included frontend parts but as far as I > > > skimmed it only > > > > > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c > > > index 5e32901..d754a90 100644 > > > --- a/gcc/fold-const.c > > > +++ b/gcc/fold-const.c > > > @@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree > > > type, > > > tree arg1) > > > else if (TREE_CODE (arg1) == REAL_CST) > > > return fold_convert_const_fixed_from_real (type, arg1); > > > } > > > + else if (TREE_CODE (type) == VECTOR_TYPE) > > > +{ > > > + if (TREE_CODE (arg1) == VECTOR_CST > > > + && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE > > > (arg1)) > > > + && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1)) > > > + { > > > + tree r = copy_node (arg1); > > > + TREE_TYPE (arg1) = type; > > > + return r; > > > + } > > > +} > > > > > > > > > looks suspicious. The issue here is that the vector elements will > > > have the wrong type after this simple handling. > > > > I was aiming to just handle simple cv-qualifier changes; that's why the > > TYPE_MAIN_VARIANT comparison is there. > > > > > If you fix that you can as well handle all kind of element type > > > changes via recursing to fold_convert_const (that includes > > > float to int / int to float changes). > > > > But I'll try this. > > Like so? Yes. Thanks, Richard.
Re: [patch] Fix PR middle-end/68251
> Tested on x86_64-suse-linux, OK for the mainline? I'll install the Fortran > testcase once it is reduced because it takes a while to compile ATM. Here it is, as reduced by Joost, installed on the mainline. 2015-11-10 Eric Botcazou * gfortran.dg/pr68251.f90: New test. -- Eric Botcazou! PR middle-end/68251 ! Reduced testcase by Joost VandeVondele ! { dg-do compile } ! { dg-options "-O3" } MODULE hfx_contract_block INTEGER, PARAMETER :: dp=8 CONTAINS SUBROUTINE contract_block(ma_max,mb_max,mc_max,md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) REAL(KIND=dp) :: kbd(mb_max*md_max), kbc(mb_max*mc_max), & kad(ma_max*md_max), kac(ma_max*mc_max), pbd(mb_max*md_max), & pbc(mb_max*mc_max), pad(ma_max*md_max), pac(ma_max*mc_max), & prim(ma_max*mb_max*mc_max*md_max), scale SELECT CASE(ma_max) CASE(1) SELECT CASE(mb_max) CASE(1) SELECT CASE(mc_max) CASE(1) SELECT CASE(md_max) CASE(1) CALL block_1_1_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_1_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_1_11(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT END SELECT SELECT CASE(mc_max) CASE(1) SELECT CASE(md_max) CASE(2) CALL block_1_2_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_1_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_1_5(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_1_6(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_1_7(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_2_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_4_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_2_6_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT SELECT CASE(md_max) CASE(1) CALL block_1_2_7_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT END SELECT SELECT CASE(mc_max) CASE(1) SELECT CASE(md_max) CASE(1) CALL 
block_1_3_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_1_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_1_5(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_1_6(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_1(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_2_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_2_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT SELECT CASE(md_max) CASE(1) CALL block_1_3_3_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_3_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT SELECT CASE(md_max) CASE(1) CALL block_1_3_5(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_3_5(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT END SELECT SELECT CASE(mc_max) CASE(1) SELECT CASE(md_max) CASE(1) CALL block_1_4_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT SELECT CASE(md_max) CASE(1) CALL block_1_4_2_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL 
block_1_4_4_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) CALL block_1_4_4(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale) END SELECT SELECT CASE(md
[PATCH][ARM][3/3][v2] Implement negsicc, notsicc optabs
Hi all, This is a slight respin of https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00075.html. This had been ok'd but I've encountered a bug with the *if__move pattern For some reason, after reload operands[1] doesn't end up living in the same register as operands[0] even though it has the constraint '0'. Maybe I misunderstood the semantics of the '0' constraints. In any case, telling the splitter to explicitly emit the move before the cond_exec if the registers don't match fixes this. Bootstrapped and tested on arm. Ok to commit this updated version instead? Thanks, Kyrill 2015-11-10 Kyrylo Tkachov * config/arm/arm.md (sicc): New define_expand. (*if_neg_move): Rename to... (*if__move): ... This. Use NOT_NEG code iterator. Move operands[1] into operands[0] if they don't match up. * config/arm/iterators.md (NOT_NEG): New code iterator. (NOT_NEG_op): New code attribute. 2015-11-10 Kyrylo Tkachov * gcc.target/arm/cond_op_imm_1.c: New test. commit c5a3ade022a18dad02d3391aab7af9ddf7e26340 Author: Kyrylo Tkachov Date: Fri Aug 14 13:42:51 2015 +0100 [ARM][3/3] Implement negsicc, notsicc optabs diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md index 8ebb1bf..ab7ece9 100644 --- a/gcc/config/arm/arm.md +++ b/gcc/config/arm/arm.md @@ -10079,19 +10079,43 @@ (define_insn "*ifcompare_neg_move" (set_attr "type" "multiple")] ) -(define_insn_and_split "*if_neg_move" +;; The negsicc and notsicc optabs. 
+(define_expand "<NOT_NEG_op>sicc" + [(set (match_operand:SI 0 "s_register_operand" "") + (if_then_else:SI (match_operand 1 "arm_comparison_operator" "") + (NOT_NEG:SI (match_operand:SI 2 "s_register_operand" "")) + (match_operand:SI 3 "s_register_operand" "")))] + "TARGET_32BIT" + { +rtx ccreg; +enum rtx_code code = GET_CODE (operands[1]); + +if (code == UNEQ || code == LTGT) + FAIL; + +ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0), + XEXP (operands[1], 1), NULL); +operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx); + } +) + + +(define_insn_and_split "*if_<NOT_NEG_op>_move" [(set (match_operand:SI 0 "s_register_operand" "=l,r") (if_then_else:SI (match_operator 4 "arm_comparison_operator" [(match_operand 3 "cc_register" "") (const_int 0)]) - (neg:SI (match_operand:SI 2 "s_register_operand" "l,r")) + (NOT_NEG:SI (match_operand:SI 2 "s_register_operand" "l,r")) (match_operand:SI 1 "s_register_operand" "0,0")))] "TARGET_32BIT" "#" "&& reload_completed" [(cond_exec (match_op_dup 4 [(match_dup 3) (const_int 0)]) - (set (match_dup 0) (neg:SI (match_dup 2))))] - "" + (set (match_dup 0) (NOT_NEG:SI (match_dup 2))))] + { +if (!rtx_equal_p (operands[0], operands[1])) + emit_move_insn (operands[0], operands[1]); + } [(set_attr "conds" "use") (set_attr "length" "4") (set_attr "arch" "t2,32") diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md index 6a54125..2f4bc5c 100644 --- a/gcc/config/arm/iterators.md +++ b/gcc/config/arm/iterators.md @@ -209,6 +209,9 @@ (define_code_iterator COMPARISONS [eq gt ge le lt]) ;; A list of ... (define_code_iterator IOR_XOR [ior xor]) +;; Bitwise complement and negation +(define_code_iterator NOT_NEG [not neg]) + ;; Operations on two halves of a quadword vector.
(define_code_iterator VQH_OPS [plus smin smax umin umax]) @@ -656,6 +659,8 @@ (define_code_attr VQH_type [(plus "add") (smin "minmax") (smax "minmax") (define_code_attr VQH_sign [(plus "i") (smin "s") (smax "s") (umin "u") (umax "u")]) +(define_code_attr NOT_NEG_op [(not "not") (neg "neg")]) + (define_code_attr cnb [(ltu "CC_C") (geu "CC")]) (define_code_attr optab [(ltu "ltu") (geu "geu")]) diff --git a/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c b/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c new file mode 100644 index 000..9d335e2 --- /dev/null +++ b/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c @@ -0,0 +1,42 @@ +/* { dg-do run } */ +/* { dg-options "-save-temps -O2 -fno-inline" } */ +/* { dg-require-effective-target arm_cond_exec } */ + +extern void abort (void); + +#define N 25089992 + +int +foonegsi (int a) +{ + return a ? N : -N; +} + +int +fooinvsi (int a) +{ + return a ? N : ~N; +} + + + +int +main (void) +{ + if (foonegsi (1) != N) +abort (); + + if (foonegsi (0) != -N) +abort (); + + if (fooinvsi (1) != N) +abort (); + + if (fooinvsi (0) != ~N) +abort (); + + return 0; +} + +/* { dg-final { scan-assembler "rsbne" } } */ +/* { dg-final { scan-assembler "mvnne" } } */
[PATCH] Fix PR56118
The following fixes PR56118 by adjusting the cost model handling of basic-block vectorization to favor the vectorized version in case the estimated cost is the same as the estimated cost of the scalar version. This makes sense because we over-estimate the vectorized cost in several places. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-11-10 Richard Biener PR tree-optimization/56118 * tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal cost favor vectorized version. * gcc.target/i386/pr56118.c: New testcase. Index: gcc/tree-vect-slp.c === --- gcc/tree-vect-slp.c (revision 230020) +++ gcc/tree-vect-slp.c (working copy) @@ -2317,9 +2317,12 @@ vect_bb_vectorization_profitable_p (bb_v dump_printf (MSG_NOTE, " Scalar cost of basic block: %d\n", scalar_cost); } - /* Vectorization is profitable if its cost is less than the cost of scalar - version. */ - if (vec_outside_cost + vec_inside_cost >= scalar_cost) + /* Vectorization is profitable if its cost is more than the cost of scalar + version. Note that we err on the vector side for equal cost because + the cost estimate is otherwise quite pessimistic (constant uses are + free on the scalar side but cost a load on the vector side for + example). */ + if (vec_outside_cost + vec_inside_cost > scalar_cost) return false; return true; Index: gcc/testsuite/gcc.target/i386/pr56118.c === --- gcc/testsuite/gcc.target/i386/pr56118.c (revision 0) +++ gcc/testsuite/gcc.target/i386/pr56118.c (working copy) @@ -0,0 +1,17 @@ +/* { dg-do compile } */ +/* { dg-options "-O3 -msse2" } */ + +#include <emmintrin.h> + +__m128d f() +{ + __m128d r={3,4}; + r[0]=1; + r[1]=2; + return r; +} + +/* We want to "vectorize" this to an aligned vector load from the + constant pool. */ + +/* { dg-final { scan-assembler "movapd" } } */
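The effect of the change is easiest to see on the final comparison in isolation. The sketch below models only that test with made-up cost numbers, not GCC's cost hooks:

```c
#include <stdbool.h>

/* Model of the final test in vect_bb_vectorization_profitable_p: the
   old condition (>= scalar_cost) rejected ties; the new one (>) lets
   a tie go to the vectorized version, since the vector-side estimate
   is pessimistic.  */
static bool
profitable_p (unsigned vec_outside_cost, unsigned vec_inside_cost,
              unsigned scalar_cost)
{
  if (vec_outside_cost + vec_inside_cost > scalar_cost)
    return false;
  return true;
}
```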
Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
On November 9, 2015 6:35:20 PM GMT+01:00, Bernd Schmidt wrote: >On 11/07/2015 03:44 PM, Kelvin Nilsen wrote: >> +bool >> +in_loop_p (basic_block block, struct loop *loop_ptr) >> +{ >> + basic_block *bbs = get_loop_body (loop_ptr); >> + bool result = false; >> + >> + for (unsigned int i = 0; i < loop_ptr->num_nodes; i++) >> +{ >> + if (bbs[i] == block) >> +result = true; >> +} > >I think something that starts with bb->loop_father and iterates >outwards >would be more efficient. flow_bb_inside_loop_p() ? Cheers,
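For reference, the loop_father-based check runs in O(nesting depth) rather than scanning every block returned by get_loop_body. A stand-alone model of the idea (toy structs and names, not GCC's actual types):

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy model of the suggestion above: each loop records its parent and
   each basic block records its innermost containing loop, like GCC's
   bb->loop_father.  Membership then walks outwards from the block's
   innermost loop, which is what flow_bb_inside_loop_p does.  */
struct toy_loop { struct toy_loop *parent; };
struct toy_block { struct toy_loop *loop_father; };

static bool
toy_bb_inside_loop_p (const struct toy_loop *loop, const struct toy_block *bb)
{
  for (const struct toy_loop *l = bb->loop_father; l != NULL; l = l->parent)
    if (l == loop)
      return true;
  return false;
}

/* Self-check: a block in an inner loop is inside both the inner and
   the outer loop; a block only in the outer loop is not in the inner.  */
static bool
toy_model_ok (void)
{
  struct toy_loop outer = { NULL };
  struct toy_loop inner = { &outer };
  struct toy_block in_inner = { &inner };
  struct toy_block in_outer = { &outer };
  return (toy_bb_inside_loop_p (&inner, &in_inner)
          && toy_bb_inside_loop_p (&outer, &in_inner)
          && !toy_bb_inside_loop_p (&inner, &in_outer));
}
```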
Re: Extend tree-call-cdce to calls whose result is used
On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz wrote: > Hi, > > On Mon, 9 Nov 2015, Richard Sandiford wrote: > >> +static bool >> +can_use_internal_fn (gcall *call) >> +{ >> + /* Only replace calls that set errno. */ >> + if (!gimple_vdef (call)) >> +return false; > > Oh, I managed to confuse this in my head while reading the patch. So, > hmm, you don't actually replace the builtin with an internal function > (without the condition) under no-errno-math? Does something else do that? > Because otherwise that seems an unnecessary restriction? > >> >> r229916 fixed that for the non-EH case. >> > >> > Ah, missed it. Even the EH case shouldn't be difficult. If the >> > original dominator of the EH destination was the call block it moves, >> > otherwise it remains unchanged. >> >> The target of the edge is easy in itself, I agree, but that isn't >> necessarily the only affected block, if the EH handler doesn't >> exit or rethrow. > > You're worried the non-EH and the EH regions merge again, right? Like so: > > before change: > > BB1: throwing-call > fallthru/ \EH > BB2 BBeh > | /\ (stuff in EH-region) > | /some path out of EH region > | /--/ > BB3 > > Here, BB3 must at least be dominated by BB1 (the throwing block), or by > something further up (when there are other side-entries to the path > BB2->BB3 or into the EH region). When further up, nothing changes, when > it's BB1, then it's afterwards dominated by the BB containing the > condition. So everything with idom==BB1 gets idom=Bcond, except for BBeh, > which gets idom=Bcall. Depending on how you split BB1, either Bcond or > BBcall might still be BB1 and doesn't lead to changes in the dom tree. > >> > Currently we have quite some of such passes (reassoc, forwprop, >> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap >> > and others), but they are all handling only special situations in one >> > way or the other. 
pass_fold_builtins is another one, but it seems >> > most related to what you want (replacing a call with something else), >> > so I thought that'd be the natural choice. >> >> Well, to be pedantic, it's not really replacing the call. Except for >> the special case of targets that support direct assignments to errno, >> it keeps the original call but ensures that it isn't usually executed. >> From that point of view it doesn't really seem like a fold. >> >> But I suppose that's just naming again :-). And it's easily solved with >> s/fold/rewrite/. > > Exactly, in my mind pass_fold_builtin (like many of the others I > mentioned) doesn't do folding but rewriting :) So I am replying here to the issue of where to do the transform call_cdce does and the one Richard wants to add. For example we "lower" posix_memalign as early as GIMPLE lowering (that's before CFG construction). We also lower sincos to cexpi during GENERIC folding (or if that is dropped either GIMPLE lowering or GIMPLE folding during gimplification would be appropriate). Now, with offloading we have to avoid creating target dependencies before LTO stream-out (thus no IFN replacements before that - not sure if Richards patches have an issue there already). Which would leave us with a lowering stage early in the main optimization pipeline - I think fold_builtins pass is way too late but any "folding" pass will do (like forwprop or backprop where the latter might be better because it might end up computing FP "ranges" to improve the initial lowering code). Of course call_cdce is as good as long as it still exists. >> > call_cdce is also such a pass, but I think it's simply not the >> > appropriate one (only in so far as its source file contains the helper >> > routines you need), and in addition I think it shouldn't exist at all >> > (and wouldn't need to if it had been part of DCE from the start, or if >> > you implemented the conditionalizing as part of another pass). 
Hey, >> > you could be one to remove a pass! ;-) >> >> It still seems a bit artificial to me to say that the transformation >> with a null lhs is "DCE enough" to go in the main DCE pass (even though >> like I say it doesn't actually eliminate any code from the IR, it just >> adds more code) and should be kept in a separate pass from the one that >> does the transformation on a non-null lhs. > > Oh, I agree, I might not have been clear: I'm not arguing that the normal > DCE should now be changed to do the conditionalizing when it removes an > call LHS; I was saying that it _would_ have been good instead of adding > the call_cdce pass in the past, when it was for DCE purposes only. Yes, I also argued that. > But > now your proposal is on the plate, namely doing the conditionalizing also > with an LHS. So that conditionalizing should take place in some rewriting > pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be > changed (it will sti
Re: Use combined_fn in tree-vrp.c
On Tue, Nov 10, 2015 at 1:09 AM, Bernd Schmidt wrote: > On 11/07/2015 01:46 PM, Richard Sandiford wrote: >> >> @@ -3814,8 +3817,8 @@ extract_range_basic (value_range *vr, gimple *stmt) >> break; >> /* Both __builtin_ffs* and __builtin_popcount return >> [0, prec]. */ >> - CASE_INT_FN (BUILT_IN_FFS): >> - CASE_INT_FN (BUILT_IN_POPCOUNT): >> + CASE_CFN_FFS: >> + CASE_CFN_POPCOUNT: >> arg = gimple_call_arg (stmt, 0); >> prec = TYPE_PRECISION (TREE_TYPE (arg)); >> mini = 0; > > > So let me see if I understood this. From what we discussed the purpose of > these new internal functions is that they can have vector types. If so, > isn't this code (here and elsewhere) which expects integers potentially > going to be confused? We indeed need to add additional checks to most users of CASE_CFN_* to cover the bigger freedom that exists with respect to types. Richard, please audit all the cases you change for that. Thanks, Richard. > > > Bernd
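The [0, prec] range recorded for these calls is tight at both ends; a quick spot check with the scalar GCC/Clang builtins (the vector-typed internal-function variants discussed above are exactly where the extra type checks are needed):

```c
#include <limits.h>

/* Endpoints of the [0, prec] range from extract_range_basic:
   ffs (0) == 0 and ffs of the sign bit == prec; popcount (0) == 0
   and popcount (-1) == prec, where prec == sizeof (int) * CHAR_BIT.  */
static int int_prec (void) { return (int) (sizeof (int) * CHAR_BIT); }
static int ffs_of (int x) { return __builtin_ffs (x); }
static int popcount_of (int x) { return __builtin_popcount (x); }
```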
Re: [PATCH PR52272]Be smart when adding iv candidates
On 11/10/2015 09:25 AM, Bin.Cheng wrote: Thanks for reviewing. I haven't committed it yet, could you please point out which quoted piece is so that I can update patch? Sorry, I thought it was pretty obvious... +{ + return ccand1->hash == ccand2->hash +&& operand_equal_p (ccand1->base, ccand2->base, 0) +&& operand_equal_p (ccand1->step, ccand2->step, 0) +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); +} + Multi-line expressions should be wrapped in parentheses so that emacs/indent can format them automatically. Two sets of parens are needed for this. Operators should then line up appropriately. Bernd
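The convention Bernd describes, shown on a stand-alone comparison function (struct and field names invented for the example, simpler than the iv_common_cand fields in the patch):

```c
#include <stdbool.h>

/* GNU style for a multi-line boolean expression: the whole expression
   gets an outer pair of parentheses, and the comparison that continues
   onto the next line gets its own inner pair, so the operators line up
   and emacs/indent reproduce the layout automatically.  */
struct cand { unsigned hash; int base, step, prec; };

static bool
cand_equal (const struct cand *c1, const struct cand *c2)
{
  return (c1->hash == c2->hash
          && c1->base == c2->base
          && c1->step == c2->step
          && (c1->prec
              == c2->prec));
}

/* Self-check of the comparison itself.  */
static bool
cand_equal_ok (void)
{
  struct cand a = { 1u, 2, 3, 32 };
  struct cand b = { 1u, 2, 3, 32 };
  struct cand c = { 1u, 2, 3, 64 };
  return cand_equal (&a, &b) && !cand_equal (&a, &c);
}
```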
Re: [hsa 9/12] Small alloc-pool fix
On Tue, Nov 10, 2015 at 9:47 AM, Martin Liška wrote: > On 11/06/2015 10:57 AM, Richard Biener wrote: >> On Fri, 6 Nov 2015, Martin Liška wrote: >> >>> On 11/06/2015 10:00 AM, Richard Biener wrote: On Thu, 5 Nov 2015, Martin Jambor wrote: > Hi, > > we use C++ new operators based on alloc-pools a lot in the subsequent > patches and realized that on the current trunk, such new operators > would needlessly call the placement ::new operator within the allocate > method of pool-alloc. Fixed below by providing a new allocation > method which does not call placement new, which is only safe to use > from within a new operator. > > The patch also fixes the slightly weird two parameter operator new > (which we do not use in HSA backend) so that it does not do the same. >>> >>> Hi. >>> Why do you need to add the pointer variant then? >>> >>> You are right, we originally used the variant in the branch, but it was >>> eventually >>> left. >>> Also isn't the issue with allocate() that it does return ::new (m_allocator.allocate ()) T (); which 1) value-initializes and 2) doesn't even work with types like struct T { T(int); }; thus types without a default constructor. >>> >>> You are right, it produces compilation error. >>> I think the allocator was poorly C++-ified without updating the specification for the cases it is supposed to handle. And now we have C++ uses that are not working because the allocator is broken. 
An incrementally better version (w/o fixing the issue with types w/o default constructor) is return ::new (m_allocator.allocate ()) T; >>> >>> I've tried that, and it also calls default ctor: >>> >>> ../../gcc/alloc-pool.h: In instantiation of ‘T* >>> object_allocator::allocate() [with T = et_occ]’: >>> ../../gcc/alloc-pool.h:531:22: required from ‘void* operator new(size_t, >>> object_allocator&) [with T = et_occ; size_t = long unsigned int]’ >>> ../../gcc/et-forest.c:449:46: required from here >>> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private >>>et_occ (); >>>^ >>> In file included from ../../gcc/et-forest.c:28:0: >>> ../../gcc/alloc-pool.h:483:44: error: within this context >>> return ::new (m_allocator.allocate ()) T; >> >> Yes, but it does slightly cheaper initialization of PODs >> >>> thus default-initialize which does no initialization for PODs (without array members...) which is what the old pool allocator did. >>> >>> I'm not so familiar with differences related to PODs. >>> To fix the new operator (how do you even call that? does it allow specifying constructor args and thus work without a default constructor?) it should indeed use an allocation method not performing the placement new. But I'd call it allocate_raw rather than vallocate. >>> >>> For situations where do not have a default ctor, one should you the >>> helper method defined at the end of alloc-pool.h: >>> >>> template >>> inline void * >>> operator new (size_t, object_allocator &a) >>> { >>> return a.allocate (); >>> } >>> >>> For instance: >>> et_occ *nw = new (et_occurrences) et_occ (2); >> >> Oh, so it uses placement new syntax... works for me. >> >>> or as used in the HSA branch: >>> >>> /* New operator to allocate convert instruction from pool alloc. 
*/ >>> >>> void * >>> hsa_insn_cvt::operator new (size_t) >>> { >>> return hsa_allocp_inst_cvt->allocate_raw (); >>> } >>> >>> and >>> >>> cvtinsn = new hsa_insn_cvt (reg, *ptmp2); >>> >>> >>> I attached patch where I rename the method as suggested. >> >> Ok. > > Hi. > > I'm sending suggested patch that survives regression tests and bootstrap > on x86_64-linux-gnu. > > Can I install the patch to trunk? Ok. Thanks, Richard. > Thanks, > Martin > >> >> Thanks, >> Richard. >> >>> Thanks, >>> Martin >>> Thanks. Richard. > Thanks, > > Martin > > > 2015-11-05 Martin Liska >Martin Jambor > >* alloc-pool.h (object_allocator::vallocate): New method. >(operator new): Call vallocate instead of allocate. >(operator new): New operator. > > > diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h > index 0dc05cd..46b6550 100644 > --- a/gcc/alloc-pool.h > +++ b/gcc/alloc-pool.h > @@ -483,6 +483,12 @@ public: > return ::new (m_allocator.allocate ()) T (); >} > > + inline void * > + vallocate () ATTRIBUTE_MALLOC > + { > +return m_allocator.allocate (); > + } > + >inline void >remove (T *object) >{ > @@ -523,12 +529,19 @@ struct alloc_pool_descriptor > }; > > /* Helper for classes that do not provide default ctor. */ > - > template > inline void * > operator new (size_t, object_allocator &a) > { > - return a.al
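The distinction the thread settles on can be sketched outside GCC, on a malloc-backed stand-in for the pool (all names here are invented, not GCC's API):

```cpp
#include <cstdlib>
#include <new>

/* allocate () placement-news a default-initialized T and so requires a
   default constructor; allocate_raw () only hands back storage, which
   is exactly what a class-local operator new needs, because the
   compiler runs the chosen constructor on the returned memory
   afterwards -- no redundant placement new.  */
template <typename T>
struct toy_allocator
{
  T *allocate () { return ::new (std::malloc (sizeof (T))) T; }
  void *allocate_raw () { return std::malloc (sizeof (T)); }
};

/* A type without a default constructor, like et_occ: allocate () would
   not even compile for it, allocate_raw () works.  */
struct occ
{
  int depth;
  occ (int d) : depth (d) {}
  static toy_allocator<occ> pool;
  static void *operator new (std::size_t) { return pool.allocate_raw (); }
  static void operator delete (void *p) { std::free (p); }
};

toy_allocator<occ> occ::pool;

/* Self-check: new occ (2) takes raw pool memory and then runs the
   occ (int) constructor exactly once.  */
static bool
occ_ok ()
{
  occ *o = new occ (2);
  bool r = o->depth == 2;
  delete o;
  return r;
}
```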
[PATCH] vect_slp_analyze_node_dependences TLC
Some TLC also preparing for further enhancements. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2015-11-10 Richard Biener * tree-vect-data-refs.c (vect_slp_analyze_node_dependences): Handle memory using/clobbering stmts without a STMT_VINFO_DATA_REF conservatively. Index: gcc/tree-vect-data-refs.c === --- gcc/tree-vect-data-refs.c (revision 230020) +++ gcc/tree-vect-data-refs.c (working copy) @@ -573,21 +595,22 @@ vect_slp_analyze_node_dependences (slp_i gimple *access = SLP_TREE_SCALAR_STMTS (node)[k]; if (access == last_access) continue; - stmt_vec_info access_stmt_info = vinfo_for_stmt (access); - gimple_stmt_iterator gsi = gsi_for_stmt (access); - gsi_next (&gsi); - for (; gsi_stmt (gsi) != last_access; gsi_next (&gsi)) + data_reference *dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (access)); + for (gimple_stmt_iterator gsi = gsi_for_stmt (access); + gsi_stmt (gsi) != last_access; gsi_next (&gsi)) { gimple *stmt = gsi_stmt (gsi); - stmt_vec_info stmt_info = vinfo_for_stmt (stmt); - if (!STMT_VINFO_DATA_REF (stmt_info) - || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info)) - && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info + if (! gimple_vuse (stmt) + || (DR_IS_READ (dr_a) && ! gimple_vdef (stmt))) continue; - ddr_p ddr = initialize_data_dependence_relation - (STMT_VINFO_DATA_REF (access_stmt_info), - STMT_VINFO_DATA_REF (stmt_info), vNULL); + /* If we couldn't record a (single) data reference for this +stmt we have to give up. */ + data_reference *dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt)); + if (!dr_b) + return false; + + ddr_p ddr = initialize_data_dependence_relation (dr_a, dr_b, vNULL); if (vect_slp_analyze_data_ref_dependence (ddr)) { /* ??? If the dependence analysis failed we can resort to the
[Patch GCC 5/Vect] Partial backport of r228751 (pr68238)
Hi, As requested in the PR, this patch is a partial backport of r228751. I can't claim any responsibility for it, but I did take it through the paces on an aarch64-none-linux-gnu and x86_64-none-linux-gnu bootstrap/ test run and found no issues. Applied as r230092 on gcc-5-branch (pre-approved in the PR) after checking that it gives the right results for the code I derived the PR from. I'll start a test cycle for a 4.9 backport. Thanks, James --- 2015-11-09 James Greenhalgh Partial backport from trunk r228751. PR tree-optimization/68238 2015-10-13 Richard Biener * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use LOOP_VINFO_COMP_ALIAS_DDRS to estimate alias versioning cost. diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c index 88ef251..05515b5 100644 --- a/gcc/tree-vect-loop.c +++ b/gcc/tree-vect-loop.c @@ -2825,7 +2825,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo, if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) { /* FIXME: Make cost depend on complexity of individual check. */ - unsigned len = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).length (); + unsigned len = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).length (); (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0, vect_prologue); dump_printf (MSG_NOTE,
[PATCH] Fix PR68240
Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk. Richard. 2015-11-10 Richard Biener PR tree-optimization/68240 * tree-ssa-sccvn.c (cond_stmts_equal_p): Handle commutative compares properly. (visit_phi): For PHIs with just a single executable edge take its value directly. (expressions_equal_p): Handle VN_TOP properly. * gcc.dg/torture/pr68240.c: New testcase. Index: gcc/testsuite/gcc.dg/torture/pr68240.c === *** gcc/testsuite/gcc.dg/torture/pr68240.c (revision 0) --- gcc/testsuite/gcc.dg/torture/pr68240.c (working copy) *** *** 0 --- 1,12 + /* { dg-do compile } */ + + int a, b, f; + + void + fn1 () + { + int c = 1, d, e = 1; + a = 1; + for (; f;) + b = (c && (d = (e && a))); + } Index: gcc/tree-ssa-sccvn.c === *** gcc/tree-ssa-sccvn.c(revision 230020) --- gcc/tree-ssa-sccvn.c(working copy) *** cond_stmts_equal_p (gcond *cond1, gcond *** 2760,2770 else return false; ! if (! expressions_equal_p (vn_valueize (lhs1), vn_valueize (lhs2)) ! || ! expressions_equal_p (vn_valueize (rhs1), vn_valueize (rhs2))) ! return false; ! ! return true; } /* Compare two phi entries for equality, ignoring VN_TOP arguments. */ --- 2806,2820 else return false; ! lhs1 = vn_valueize (lhs1); ! rhs1 = vn_valueize (rhs1); ! lhs2 = vn_valueize (lhs2); ! rhs2 = vn_valueize (rhs2); ! return ((expressions_equal_p (lhs1, lhs2) ! && expressions_equal_p (rhs1, rhs2)) ! || (commutative_tree_code (code1) ! && expressions_equal_p (lhs1, rhs2) ! && expressions_equal_p (rhs1, lhs2))); } /* Compare two phi entries for equality, ignoring VN_TOP arguments. */ *** visit_phi (gimple *phi) *** 3379,3384 --- 3428,3434 tree result; tree sameval = VN_TOP; bool allsame = true; + unsigned n_executable = 0; /* TODO: We could check for this in init_sccvn, and replace this with a gcc_assert. 
*/ *** visit_phi (gimple *phi) *** 3394,3399 --- 3444,3450 { tree def = PHI_ARG_DEF_FROM_EDGE (phi, e); + ++n_executable; if (TREE_CODE (def) == SSA_NAME) def = SSA_VAL (def); if (def == VN_TOP) *** visit_phi (gimple *phi) *** 3408,3416 } /* If none of the edges was executable or all incoming values are ! undefined keep the value-number at VN_TOP. */ ! if (sameval == VN_TOP) ! return set_ssa_val_to (PHI_RESULT (phi), VN_TOP); /* First see if it is equivalent to a phi node in this block. We prefer this as it allows IV elimination - see PRs 66502 and 67167. */ --- 3459,3469 } /* If none of the edges was executable or all incoming values are ! undefined keep the value-number at VN_TOP. If only a single edge ! is exectuable use its value. */ ! if (sameval == VN_TOP ! || n_executable == 1) ! return set_ssa_val_to (PHI_RESULT (phi), sameval); /* First see if it is equivalent to a phi node in this block. We prefer this as it allows IV elimination - see PRs 66502 and 67167. */ *** expressions_equal_p (tree e1, tree e2) *** 4610,4615 --- 4663,4672 if (e1 == e2) return true; + /* If either one is VN_TOP consider them equal. */ + if (e1 == VN_TOP || e2 == VN_TOP) + return true; + /* If only one of them is null, they cannot be equal. */ if (!e1 || !e2) return false;
Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
On 11/10/2015 10:56 AM, Bernhard Reutner-Fischer wrote: On November 9, 2015 6:35:20 PM GMT+01:00, Bernd Schmidt wrote: I think something that starts with bb->loop_father and iterates outwards would be more efficient. flow_bb_inside_loop_p() ? Ah thanks. I knew there must be one but I couldn't find it. Bernd
Re: [PATCH PR52272]Be smart when adding iv candidates
On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt wrote: > On 11/10/2015 09:25 AM, Bin.Cheng wrote: >>> >>> Thanks for reviewing. I haven't committed it yet, could you please >>> point out which quoted piece is so that I can update patch? > > > Sorry, I thought it was pretty obvious... > >> +{ >> + return ccand1->hash == ccand2->hash >> +&& operand_equal_p (ccand1->base, ccand2->base, 0) >> +&& operand_equal_p (ccand1->step, ccand2->step, 0) >> +&& TYPE_PRECISION (TREE_TYPE (ccand1->base)) >> + == TYPE_PRECISION (TREE_TYPE (ccand2->base)); >> +} >> + > > > Multi-line expressions should be wrapped in parentheses so that emacs/indent > can format them automatically. Two sets of parens are needed for this. > Operators should then line up appropriately. Ah, thanks for teaching. Here is the updated patch, hoping it's correct. Thanks, bin > > > Bernd diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c index 1f952a7..a00e33c 100644 --- a/gcc/tree-ssa-loop-ivopts.c +++ b/gcc/tree-ssa-loop-ivopts.c @@ -247,6 +247,45 @@ struct iv_cand smaller type. */ }; +/* Hashtable entry for common candidate derived from iv uses. */ +struct iv_common_cand +{ + tree base; + tree step; + /* IV uses from which this common candidate is derived. */ + vec uses; + hashval_t hash; +}; + +/* Hashtable helpers. */ + +struct iv_common_cand_hasher : free_ptr_hash +{ + static inline hashval_t hash (const iv_common_cand *); + static inline bool equal (const iv_common_cand *, const iv_common_cand *); +}; + +/* Hash function for possible common candidates. */ + +inline hashval_t +iv_common_cand_hasher::hash (const iv_common_cand *ccand) +{ + return ccand->hash; +} + +/* Hash table equality function for common candidates. 
*/ + +inline bool +iv_common_cand_hasher::equal (const iv_common_cand *ccand1, + const iv_common_cand *ccand2) +{ + return (ccand1->hash == ccand2->hash + && operand_equal_p (ccand1->base, ccand2->base, 0) + && operand_equal_p (ccand1->step, ccand2->step, 0) + && (TYPE_PRECISION (TREE_TYPE (ccand1->base)) + == TYPE_PRECISION (TREE_TYPE (ccand2->base; +} + /* Loop invariant expression hashtable entry. */ struct iv_inv_expr_ent { @@ -255,8 +294,6 @@ struct iv_inv_expr_ent hashval_t hash; }; -/* The data used by the induction variable optimizations. */ - /* Hashtable helpers. */ struct iv_inv_expr_hasher : free_ptr_hash @@ -323,6 +360,12 @@ struct ivopts_data /* Cache used by tree_to_aff_combination_expand. */ hash_map *name_expansion_cache; + /* The hashtable of common candidates derived from iv uses. */ + hash_table *iv_common_cand_tab; + + /* The common candidates. */ + vec iv_common_cands; + /* The maximum invariant id. */ unsigned max_inv_id; @@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data) data->inv_expr_tab = new hash_table (10); data->inv_expr_id = 0; data->name_expansion_cache = NULL; + data->iv_common_cand_tab = new hash_table (10); + data->iv_common_cands.create (20); decl_rtl_to_reset.create (20); gcc_obstack_init (&data->iv_obstack); } @@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data) } } +/* Record common candidate {BASE, STEP} derived from USE in hashtable. 
*/ + +static void +record_common_cand (struct ivopts_data *data, tree base, + tree step, struct iv_use *use) +{ + struct iv_common_cand ent; + struct iv_common_cand **slot; + + gcc_assert (use != NULL); + + ent.base = base; + ent.step = step; + ent.hash = iterative_hash_expr (base, 0); + ent.hash = iterative_hash_expr (step, ent.hash); + + slot = data->iv_common_cand_tab->find_slot (&ent, INSERT); + if (*slot == NULL) +{ + *slot = XNEW (struct iv_common_cand); + (*slot)->base = base; + (*slot)->step = step; + (*slot)->uses.create (8); + (*slot)->hash = ent.hash; + data->iv_common_cands.safe_push ((*slot)); +} + (*slot)->uses.safe_push (use); + return; +} + +/* Comparison function used to sort common candidates. */ + +static int +common_cand_cmp (const void *p1, const void *p2) +{ + unsigned n1, n2; + const struct iv_common_cand *const *const ccand1 += (const struct iv_common_cand *const *)p1; + const struct iv_common_cand *const *const ccand2 += (const struct iv_common_cand *const *)p2; + + n1 = (*ccand1)->uses.length (); + n2 = (*ccand2)->uses.length (); + return n2 - n1; +} + +/* Adds IV candidates based on common candidated recorded. */ + +static void +add_iv_candidate_derived_from_uses (struct ivopts_data *data) +{ + unsigned i, j; + struct iv_cand *cand_1, *cand_2; + + data->iv_common_cands.qsort (common_cand_cmp); + for (i = 0; i < data->iv_common_cands.length (); i++) +{ + struct iv_common_cand *ptr = data->iv_common_cands[i]; + + /* Only add IV candidate if it's derived from
Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford wrote: > In practice all targets that can vectorise sqrt define the appropriate > sqrt2 optab. The only case where this isn't immediately obvious > is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't > be exercised for sqrt. > > This patch therefore uses the internal function interface instead of > going via the target hook. > > > gcc/ > * tree-vect-patterns.c: Include internal-fn.h. > (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*. > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c > index bab9a4f..a803e8c 100644 > --- a/gcc/tree-vect-patterns.c > +++ b/gcc/tree-vect-patterns.c > @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-vectorizer.h" > #include "dumpfile.h" > #include "builtins.h" > +#include "internal-fn.h" > #include "case-cfn-macros.h" > > /* Pattern recognition functions */ > @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree > *type_in, >if (TREE_CODE (exp) == REAL_CST >&& real_equal (&TREE_REAL_CST (exp), &dconsthalf)) > { > - tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT); >*type_in = get_vectype_for_scalar_type (TREE_TYPE (base)); > - if (*type_in) > + if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in)) > { > - gcall *stmt = gimple_build_call (newfn, 1, base); > - if (vectorizable_function (stmt, *type_in, *type_in) > - != NULL_TREE) > - { > - var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); > - gimple_call_set_lhs (stmt, var); > - return stmt; > - } > + gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base); > + var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); > + gimple_call_set_lhs (stmt, var); > + return stmt; Looks ok but I wonder if this is dead code with (for pows (POW) sqrts (SQRT) cbrts (CBRT) (simplify (pows @0 REAL_CST@1) (with { const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1); REAL_VALUE_TYPE tmp; } (switch ... 
/* pow(x,0.5) -> sqrt(x). */ (if (flag_unsafe_math_optimizations && canonicalize_math_p () && real_equal (value, &dconsthalf)) (sqrts @0)) also wondering here about canonicalize_math_p (), I'd expected the reverse transform as canonicalization. Also wondering about flag_unsafe_math_optimizations (missing from the vectorizer pattern). Anyway, patch is ok. Thanks, Richard. > } > } > >
Re: [mask conversion, patch 2/2, i386] Add pack/unpack patterns for scalar masks
On 19 Oct 15:30, Ilya Enkovich wrote: > Hi, > > This patch adds patterns to be used for vector masks pack/unpack for AVX512. > Bootstrapped and tested on x86_64-unknown-linux-gnu. Does it look OK? > > Thanks, > Ilya Here is a modified version which reflects changes in boolean type sign. Only pattern names were changed. Bootstrapped and tested on x86_64-unknown-linux-gnu. Does it look OK? Thanks, Ilya -- gcc/ 2015-11-10 Ilya Enkovich * config/i386/sse.md (HALFMASKMODE): New attribute. (DOUBLEMASKMODE): New attribute. (vec_pack_trunc_qi): New. (vec_pack_trunc_): New. (vec_unpacks_lo_hi): New. (vec_unpacks_lo_si): New. (vec_unpacks_lo_di): New. (vec_unpacks_hi_hi): New. (vec_unpacks_hi_): New. gcc/testsuite/ 2015-11-10 Ilya Enkovich * gcc.target/i386/mask-pack.c: New test. * gcc.target/i386/mask-unpack.c: New test. diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md index 452629f..aad6a0d 100644 --- a/gcc/config/i386/sse.md +++ b/gcc/config/i386/sse.md @@ -799,6 +799,14 @@ [(V32QI "t") (V16HI "t") (V8SI "t") (V4DI "t") (V8SF "t") (V4DF "t") (V64QI "g") (V32HI "g") (V16SI "g") (V8DI "g") (V16SF "g") (V8DF "g")]) +;; Half mask mode for unpacks +(define_mode_attr HALFMASKMODE + [(DI "SI") (SI "HI")]) + +;; Double mask mode for packs +(define_mode_attr DOUBLEMASKMODE + [(HI "SI") (SI "DI")]) + ;; Include define_subst patterns for instructions with mask (include "subst.md") @@ -11578,6 +11586,23 @@ DONE; }) +(define_expand "vec_pack_trunc_qi" + [(set (match_operand:HI 0 ("register_operand")) +(ior:HI (ashift:HI (zero_extend:HI (match_operand:QI 1 ("register_operand"))) + (const_int 8)) +(zero_extend:HI (match_operand:QI 2 ("register_operand")] + "TARGET_AVX512F") + +(define_expand "vec_pack_trunc_" + [(set (match_operand: 0 ("register_operand")) +(ior: (ashift: (zero_extend: (match_operand:SWI24 1 ("register_operand"))) + (match_dup 3)) +(zero_extend: (match_operand:SWI24 2 ("register_operand")] + "TARGET_AVX512BW" +{ + operands[3] = GEN_INT (GET_MODE_BITSIZE 
(mode)); +}) + (define_insn "_packsswb" [(set (match_operand:VI1_AVX512 0 "register_operand" "=x,x") (vec_concat:VI1_AVX512 @@ -13474,12 +13499,42 @@ "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;") +(define_expand "vec_unpacks_lo_hi" + [(set (match_operand:QI 0 "register_operand") +(subreg:QI (match_operand:HI 1 "register_operand") 0))] + "TARGET_AVX512DQ") + +(define_expand "vec_unpacks_lo_si" + [(set (match_operand:HI 0 "register_operand") +(subreg:HI (match_operand:SI 1 "register_operand") 0))] + "TARGET_AVX512F") + +(define_expand "vec_unpacks_lo_di" + [(set (match_operand:SI 0 "register_operand") +(subreg:SI (match_operand:DI 1 "register_operand") 0))] + "TARGET_AVX512BW") + (define_expand "vec_unpacku_hi_" [(match_operand: 0 "register_operand") (match_operand:VI124_AVX2_24_AVX512F_1_AVX512BW 1 "register_operand")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;") +(define_expand "vec_unpacks_hi_hi" + [(set (subreg:HI (match_operand:QI 0 "register_operand") 0) +(lshiftrt:HI (match_operand:HI 1 "register_operand") + (const_int 8)))] + "TARGET_AVX512F") + +(define_expand "vec_unpacks_hi_" + [(set (subreg:SWI48x (match_operand: 0 "register_operand") 0) +(lshiftrt:SWI48x (match_operand:SWI48x 1 "register_operand") + (match_dup 2)))] + "TARGET_AVX512BW" +{ + operands[2] = GEN_INT (GET_MODE_BITSIZE (mode)); +}) + ; ;; ;; Miscellaneous diff --git a/gcc/testsuite/gcc.target/i386/mask-pack.c b/gcc/testsuite/gcc.target/i386/mask-pack.c new file mode 100644 index 000..0b564ef --- /dev/null +++ b/gcc/testsuite/gcc.target/i386/mask-pack.c @@ -0,0 +1,100 @@ +/* { dg-do compile } */ +/* { dg-options "-mavx512bw -O3 -fopenmp-simd -fdump-tree-vect-details" } */ +/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 10 "vect" } } */ +/* { dg-final { scan-assembler-not "maskmov" } } */ + +#define LENGTH 1000 + +long l1[LENGTH], l2[LENGTH]; +int i1[LENGTH], i2[LENGTH]; +short s1[LENGTH], s2[LENGTH]; 
+char c1[LENGTH], c2[LENGTH]; +double d1[LENGTH], d2[LENGTH]; + +int test1 (int n) +{ + int i; + #pragma omp simd safelen(16) + for (i = 0; i < LENGTH; i++) +if (l1[i] > l2[i]) + i1[i] = 1; +} + +int test2 (int n) +{ + int i; + #pragma omp simd safelen(16) + for (i = 0; i < LENGTH; i++) +if (i1[i] > i2[i]) + s1[i] = 1; +} + +int test3 (int n) +{ + int i; + #pragma omp simd safelen(16) + for (i = 0; i < LENGTH; i++) +if (s1[i] > s2[i]) + c1[i] = 1; +} + +int test4 (int n) +{ + int i; + #pragma omp simd saf
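Read scalar-wise, the pack expander above computes `(op1 << 8) | op2` on the mask bits, and the unpack expanders recover the halves via a subreg (low part) or a logical right shift (high part). A plain-C sketch of that arithmetic follows; the function names are illustrative only, not from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of vec_pack_trunc_qi: concatenate two QImode masks
   into one HImode mask, operand 1 in the high byte.
   (ior (ashift (zero_extend op1) 8) (zero_extend op2))  */
static uint16_t
pack_qi_masks (uint8_t op1, uint8_t op2)
{
  return (uint16_t) (((uint16_t) op1 << 8) | op2);
}

/* Scalar model of vec_unpacks_lo_*: (subreg:QI mask 0), the low half. */
static uint8_t
unpack_lo (uint16_t mask)
{
  return (uint8_t) mask;
}

/* Scalar model of vec_unpacks_hi_*: (lshiftrt mask 8), the high half. */
static uint8_t
unpack_hi (uint16_t mask)
{
  return (uint8_t) (mask >> 8);
}
```

Unpacking a packed pair returns the original operands, which is the round-trip property the testcases exercise on vector masks.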
Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn
On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford wrote: > This patch replaces the fndecl argument to builtin_vectorized_function > with a combined_fn and gets the vectoriser to call it for internal > functions too. The patch also moves vectorisation of machine-specific > built-ins to a new hook, builtin_md_vectorized_function. > > I've attached a -b version too since that's easier to read. @@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl, tree type_out, /* Dispatch to a handler for a vectorization library. */ if (ix86_veclib_handler) -return ix86_veclib_handler ((enum built_in_function) fn, type_out, - type_in); +return ix86_veclib_handler (combined_fn (fn), type_out, type_in); return NULL_TREE; } fn is already a combined_fn? Why does the builtin_vectorized_function not take one but an unsigned int? @@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in) return NULL_TREE; } - bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn))); + tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn); + bname = IDENTIFIER_POINTER (DECL_NAME (fndecl)); - if (fn == BUILT_IN_LOGF) + if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF) with 'fn' now a combined_fn how is this going to work with IFNs? @@ -42194,9 +42096,7 @@ ix86_veclibabi_svml (enum built_in_function fn, tree type_out, tree type_in) name[4] &= ~0x20; arity = 0; - for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn)); - args; - args = TREE_CHAIN (args)) + for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args)) arity++; or this? Did you try this out? We have only two basic testcases for all this code using sin() which may not end up as IFN even with -ffast-math(?). +/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION. */ + +static tree +rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out, + tree type_in) +{ any reason you are using a fndecl for this hook instead of the function code? 
@@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt, gimple *vec_stmt, tree vectorizable_function (gcall *call, tree vectype_out, tree vectype_in) { - tree fndecl = gimple_call_fndecl (call); - - /* We only handle functions that do not read or clobber memory -- i.e. - const or novops ones. */ - if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS))) + /* We only handle functions that do not read or clobber memory. */ + if (gimple_vuse (call)) return NULL_TREE; - if (!fndecl - || TREE_CODE (fndecl) != FUNCTION_DECL - || !DECL_BUILT_IN (fndecl)) -return NULL_TREE; + combined_fn fn = gimple_call_combined_fn (call); + if (fn != CFN_LAST) +return targetm.vectorize.builtin_vectorized_function + (fn, vectype_out, vectype_in); - return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out, - vectype_in); + if (gimple_call_builtin_p (call, BUILT_IN_MD)) +return targetm.vectorize.builtin_md_vectorized_function + (gimple_call_fndecl (call), vectype_out, vectype_in); + + return NULL_TREE; Looking at this and the issues above wouldn't it be easier to simply pass the call stmt to the hook (which then can again handle both normal and target builtins)? And it has context available (actual arguments and number of arguments for IFN calls). Richard. > > gcc/ > * target.def (builtin_vectorized_function): Take a combined_fn (in > the form of an unsigned int) rather than a function decl. > (builtin_md_vectorized_function): New. > * targhooks.h (default_builtin_vectorized_function): Replace the > fndecl argument with an unsigned int. > (default_builtin_md_vectorized_function): Declare. > * targhooks.c (default_builtin_vectorized_function): Replace the > fndecl argument with an unsigned int. > (default_builtin_md_vectorized_function): New function. > * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION): > New hook. > * doc/tm.texi: Regenerate. 
> * tree-vect-stmts.c (vectorizable_function): Update call to > builtin_vectorized_function, also passing internal functions. > Call builtin_md_vectorized_function for target-specific builtins. > * config/aarch64/aarch64-protos.h > (aarch64_builtin_vectorized_function): Replace fndecl argument > with an unsigned int. > * config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h. > (aarch64_builtin_vectorized_function): Update after above changes. > Use CASE_CFN_*. > * config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace > fndecl argument with an unsigned int. > * config/arm/arm-builtins.c: Include case-cfn-macros.h > (arm_builtin_v
Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX
On Tue, Nov 03, 2015 at 05:25:53PM +0300, Alexander Monakov wrote: > Here's an alternative patch that does not depend on exposure of shared-memory > address space, and does not try to use pass_late_lower_omp. It's based on > Bernd's suggestion to transform FYI, I've committed a new testcase to gomp-4_5-branch that covers various target data sharing/team sharing/privatization parallel sharing/privatization offloading cases. 2015-11-10 Jakub Jelinek * testsuite/libgomp.c/target-31.c: New test. --- libgomp/testsuite/libgomp.c/target-31.c.jj 2015-11-09 19:05:50.439644694 +0100 +++ libgomp/testsuite/libgomp.c/target-31.c 2015-11-10 11:12:12.930286760 +0100 @@ -0,0 +1,163 @@ +#include +#include + +int a = 1, b = 2, c = 3, d = 4; +int e[2] = { 5, 6 }, f[2] = { 7, 8 }, g[2] = { 9, 10 }, h[2] = { 11, 12 }; + +__attribute__((noinline, noclone)) void +use (int *k, int *l, int *m, int *n, int *o, int *p, int *q, int *r) +{ + asm volatile ("" : : "r" (k) : "memory"); + asm volatile ("" : : "r" (l) : "memory"); + asm volatile ("" : : "r" (m) : "memory"); + asm volatile ("" : : "r" (n) : "memory"); + asm volatile ("" : : "r" (o) : "memory"); + asm volatile ("" : : "r" (p) : "memory"); + asm volatile ("" : : "r" (q) : "memory"); + asm volatile ("" : : "r" (r) : "memory"); +} + +#pragma omp declare target to (use) + +int +main () +{ + int err = 0, r = -1, t[4]; + long s[4] = { -1, -2, -3, -4 }; + int j = 13, k = 14, l[2] = { 15, 16 }, m[2] = { 17, 18 }; + #pragma omp target private (a, b, e, f) firstprivate (c, d, g, h) map(from: r, s, t) \ +map(tofrom: err, j, l) map(to: k, m) + #pragma omp teams num_teams (4) thread_limit (8) private (b, f) firstprivate (d, h, k, m) + { +int u1 = k, u2[2] = { m[0], m[1] }; +int u3[64]; +int i; +for (i = 0; i < 64; i++) + u3[i] = k + i; +#pragma omp parallel num_threads (1) +{ + if (c != 3 || d != 4 || g[0] != 9 || g[1] != 10 || h[0] != 11 || h[1] != 12 || k != 14 || m[0] != 17 || m[1] != 18) + #pragma omp atomic write + err = 1; + b = 
omp_get_team_num (); + if (b >= 4) + #pragma omp atomic write + err = 1; + if (b == 0) + { + a = omp_get_num_teams (); + e[0] = 2 * a; + e[1] = 3 * a; + } + f[0] = 2 * b; + f[1] = 3 * b; + #pragma omp atomic update + c++; + #pragma omp atomic update + g[0] += 2; + #pragma omp atomic update + g[1] += 3; + d++; + h[0] += 2; + h[1] += 3; + k += b; + m[0] += 2 * b; + m[1] += 3 * b; +} +use (&a, &b, &c, &d, e, f, g, h); +#pragma omp parallel firstprivate (u1, u2) +{ + int w = omp_get_thread_num (); + int x = 19; + int y[2] = { 20, 21 }; + int v = 24; + int ll[64]; + if (u1 != 14 || u2[0] != 17 || u2[1] != 18) + #pragma omp atomic write + err = 1; + u1 += w; + u2[0] += 2 * w; + u2[1] += 3 * w; + use (&u1, u2, &t[b], l, &k, m, &j, h); + #pragma omp master + t[b] = omp_get_num_threads (); + #pragma omp atomic update + j++; + #pragma omp atomic update + l[0] += 2; + #pragma omp atomic update + l[1] += 3; + #pragma omp atomic update + k += 4; + #pragma omp atomic update + m[0] += 5; + #pragma omp atomic update + m[1] += 6; + x += w; + y[0] += 2 * w; + y[1] += 3 * w; + #pragma omp simd safelen(32) private (v) + for (i = 0; i < 64; i++) + { + v = 3 * i; + ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i]; + } + #pragma omp barrier + use (&u1, u2, &t[b], l, &k, m, &x, y); + if (w < 0 || w > 8 || w != omp_get_thread_num () || u1 != 14 + w + || u2[0] != 17 + 2 * w || u2[1] != 18 + 3 * w + || x != 19 + w || y[0] != 20 + 2 * w || y[1] != 21 + 3 * w + || v != 24) + #pragma omp atomic write + err = 1; + for (i = 0; i < 64; i++) + if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i) + #pragma omp atomic write + err = 1; +} +#pragma omp parallel num_threads (1) +{ + if (b == 0) + { + r = a; + if (a != omp_get_num_teams () + || e[0] != 2 * a + || e[1] != 3 * a) + #pragma omp atomic write + err = 1; + } + int v1, v2, v3; + #pragma omp atomic read + v1 = c; + #pragma omp atomic read + v2 = g[0]; + #pragma omp atomic read + v3 = g[1]; + s[b] = 
v1 * 65536L + v2 * 256L + v3; + if (d != 5 || h[0] != 13 || h[1] != 15 + || k != 14 + b + 4 * t[b] + || m[0] != 17 + 2 * b + 5 * t[b] + || m[1] != 18 + 3 * b + 6 * t[b] + || b != omp_get_team_num () + || f[0] != 2 * b || f[1] != 3 * b) + #pragma omp atomic write + err = 1; +
Re: Add internal math functions
Thanks for the review. Jeff Law writes: > On 11/07/2015 05:30 AM, Richard Sandiford wrote: >> This patch adds internal functions for simple FLT_FN built-in functions, >> in cases where an associated optab already exists. Unlike some of the >> built-in functions, these internal functions never set errno. >> >> LDEXP is an odd-one out in that its second operand is an integer. >> All the others operate on uniform types. >> >> The patch also adds a function to query the internal function associated >> with a built-in function (if any), and another to test whether a given >> gcall could be replaced by a call to an internal function on the current >> target (as long as the caller deals with errno appropriately). >> >> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. >> OK to install? >> >> Thanks, >> Richard >> >> >> gcc/ >> * builtins.h (associated_internal_fn): Declare. >> (replacement_internal_fn): Likewise. >> * builtins.c: Include internal-fn.h >> (associated_internal_fn, replacement_internal_fn): New functions. >> * internal-fn.def (DEF_INTERNAL_FLT_FN): New macro. >> (ACOS, ASIN, ATAN, COS, EXP, EXP10, EXP2, EXPM1, LOG, LOG10, LOG1P) >> (LOG2, LOGB, SIGNIFICAND, SIN, SQRT, TAN, CEIL, FLOOR, NEARBYINT) >> (RINT, ROUND, TRUNC, ATAN2, COPYSIGN, FMOD, POW, REMAINDER, SCALB) >> (LDEXP): New functions. >> * internal-fn.c: Include recog.h. >> (unary_direct, binary_direct): New macros. >> (expand_direct_optab_fn): New function. >> (expand_unary_optab_fn): New macro. >> (expand_binary_optab_fn): Likewise. >> (direct_unary_optab_supported_p): Likewise. >> (direct_binary_optab_supported_p): Likewise. >> > > >> diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c >> index 72536da..9f9f9cf 100644 >> --- a/gcc/internal-fn.c >> +++ b/gcc/internal-fn.c >> @@ -38,6 +38,7 @@ along with GCC; see the file COPYING3. If not see >> #include "dojump.h" >> #include "expr.h" >> #include "ubsan.h" >> +#include "recog.h" > recog.h? 
I don't see anything that would require recognition in this > patch. Is there something in a later patch that needs the recog.h header? It's needed for: + create_output_operand (&ops[0], lhs_rtx, insn_data[icode].operand[0].mode); I did wonder about adding a comment, but I think it's likely to get out of date. I wouldn't be surprised if we add more uses of recog.h stuff in future. Richard
Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
On Mon, Nov 9, 2015 at 6:35 PM, Bernd Schmidt wrote: > On 11/07/2015 03:44 PM, Kelvin Nilsen wrote: >> >> This is a draft patch to partially address the concerns described in >> bugzilla problem report >> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68212). The patch is >> incomplete in the sense that there are some known shortcomings with >> nested loops which I am still working on. I am sending this out for >> comments at this time because we would like these patches to be >> integrated into the GCC 6 release and want to begin responding to >> community feedback as soon as possible in order to make the integration >> possible. > > > I'll mainly comment on style points right now. Your code generally looks > mostly good but doesn't quite follow our guidelines. In terms of logic I'm > sure there will be followup questions after the first round of points is > addressed. Others might be better qualified to review the frequencies > manipulation; for this first round I'm just assuming that the general idea > is sound (but would appreciate input). > >> 1. Before a loop body is unpeeled into a pre-header location, we >> temporarily adjust the loop body frequencies to represent the values >> appropriate for the context into which the loop body is to be >> copied. >> >> 2. After unrolling the loop body (by replicating the loop body (N-1) >> times within the loop), we recompute all frequencies associated with >> blocks contained within the loop. > > > If these are independent from each other it might be better to split up the > patch. > >> (check_loop_frequency_integrity): new helper routine >> (set_zero_probability): added another parameter >> (duplicate_loop_to_header_edge): Add code to recompute loop >> body frequencies after blocks are replicated (unrolled) into the loop >> body. Introduce certain help routines because existing infrastructure >> routines are not reliable during typical executions of >> duplicate_loop_to_header_edge(). 
> > > Please review our guidelines how to write ChangeLogs - capitalize, > punctuate, and document only the what, not the why. Also, linewrap manually. > >> opt_info_start_duplication (opt_info); >> + >> ok = duplicate_loop_to_header_edge (loop, loop_latch_edge (loop), > > > Please make sure you don't change whitespace unnecessarily. There are a few > other occurrences in the patch, and also cases where you seem to be adding > spaces to otherwise blank lines. > >> @@ -1015,14 +1041,44 @@ unroll_loop_runtime_iterations (struct loop *loop) >> bitmap_clear_bit (wont_exit, may_exit_copy); >> opt_info_start_duplication (opt_info); >> >> + { >> +/* Recompute the loop body frequencies. */ >> +zero_loop_frequencies (loop); > > > No reason to start a braced block here. > >> +/* Scale the incoming frequencies according to the heuristic that >> + * the loop frequency is the incoming edge frequency divided by >> + * 0.09. This heuristic applies only to loops that iterate over a >> + * run-time value that is not known at compile time. Note that >> + * 1/.09 equals 11.. We'll use integer arithmetic on ten >> + * thousandths, and then divide by 10,000 after we've "rounded". >> + */ > > > Please examine the comment style in gcc - no asterisks to start new lines, > and comment terminators don't go on their own line. > >> +sum_incoming_frequencies *= 11; /* multiply by 11. */ >> +sum_incoming_frequencies += 5000;/* round by adding 0.5 */ >> +sum_incoming_frequencies /= 1; /* convert ten thousandths >> + to ones >> +*/ > > > These comments could also be improved, but really they should just be > removed since they're pretty obvious and redundant with the one before. > >> +/* Define ENABLE_CHECKING to enforce the following run-time checks. > > > "With checking enabled, the following run-time checks are performed:" > >> + * This may report false-positive errors due to round-off errors. > > > That doesn't sound good as it could lead to bootstrap failures when checking > is enabled. 
> >> @@ -44,6 +55,543 @@ static void fix_loop_placements (struct loop *, bo >> static bool fix_bb_placement (basic_block); >> static void fix_bb_placements (basic_block, bool *, bitmap); >> >> +/* >> + * Return true iff block is considered to reside within the loop >> + * represented by loop_ptr. >> + */ > > > Arguments are capitalized in function comments. > >> +bool >> +in_loop_p (basic_block block, struct loop *loop_ptr) >> +{ >> + basic_block *bbs = get_loop_body (loop_ptr); >> + bool result = false; >> + >> + for (unsigned int i = 0; i < loop_ptr->num_nodes; i++) >> +{ >> + if (bbs[i] == block) >> + result = true; >> +} > > > I think something that starts with bb->loop_father and iterates outwards > would be more efficient. Well, either bb->loop_father == loop_ptr or that with || flow_loop_nested_
Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
Richard Biener writes: > On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford > wrote: >> In practice all targets that can vectorise sqrt define the appropriate >> sqrt2 optab. The only case where this isn't immediately obvious >> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't >> be exercised for sqrt. >> >> This patch therefore uses the internal function interface instead of >> going via the target hook. >> >> >> gcc/ >> * tree-vect-patterns.c: Include internal-fn.h. >> (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*. >> >> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >> index bab9a4f..a803e8c 100644 >> --- a/gcc/tree-vect-patterns.c >> +++ b/gcc/tree-vect-patterns.c >> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see >> #include "tree-vectorizer.h" >> #include "dumpfile.h" >> #include "builtins.h" >> +#include "internal-fn.h" >> #include "case-cfn-macros.h" >> >> /* Pattern recognition functions */ >> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree >> *type_in, >>if (TREE_CODE (exp) == REAL_CST >>&& real_equal (&TREE_REAL_CST (exp), &dconsthalf)) >> { >> - tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT); >>*type_in = get_vectype_for_scalar_type (TREE_TYPE (base)); >> - if (*type_in) >> + if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in)) >> { >> - gcall *stmt = gimple_build_call (newfn, 1, base); >> - if (vectorizable_function (stmt, *type_in, *type_in) >> - != NULL_TREE) >> - { >> - var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); >> - gimple_call_set_lhs (stmt, var); >> - return stmt; >> - } >> + gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base); >> + var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); >> + gimple_call_set_lhs (stmt, var); >> + return stmt; > > Looks ok but I wonder if this is dead code with > > (for pows (POW) > sqrts (SQRT) > cbrts (CBRT) > (simplify > (pows @0 REAL_CST@1) > (with { > const 
REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1); > REAL_VALUE_TYPE tmp; >} >(switch > ... > /* pow(x,0.5) -> sqrt(x). */ > (if (flag_unsafe_math_optimizations > && canonicalize_math_p () > && real_equal (value, &dconsthalf)) > (sqrts @0)) Yeah, I wondered that too, although I think it's more likely to be dead because of sincos. In the end it just seemed like a rabbit hole too far though. > Anyway, patch is ok. Thanks, Richard
Re: [mask-load, patch 2/2, i386] Add/modify mask load/store patterns
Hi Ilya, On 08 Oct 18:42, Ilya Enkovich wrote: > Hi, > > This patch reflects changes in maskload and maskstore optabs and adds > patterns for AVX-512. The patch is OK for trunk. -- Thanks, K > > Thanks, > Ilya > -- > 2015-10-08 Ilya Enkovich > > * config/i386/sse.md (maskload): Rename to ... > (maskload): ... this. > (maskstore): Rename to ... > (maskstore): ... this. > (maskload): New. > (maskstore): New.
Re: [mask-vec_cond, patch 2/2, i386] Add patterns for vcond_mask_optab
Hello Ilya On 08 Oct 18:53, Ilya Enkovich wrote: > Hi, > > This patch add patterns for vcond_mask_optab. No new expand code is > required, existing ix86_expand_sse_movcc is used. The patch is OK for trunk. -- Thanks, K > > Thanks, > Ilya > -- > gcc/ChangeLog: > > 2015-10-08 Ilya Enkovich > > * config/i386/i386-protos.h (ix86_expand_sse_movcc): New. > * config/i386/i386.c (ix86_expand_sse_movcc): Make public. > Cast mask to FP mode if required. > * config/i386/sse.md (vcond_mask_): New. > (vcond_mask_): New. > (vcond_mask_): New. > (vcond_mask_): New. > (vcond_mask_v2div2di): New. > (vcond_mask_): New. > (vcond_mask_): New.
Re: [mask conversion, patch 2/2, i386] Add pack/unpack patterns for scalar masks
Hello Ilya, On 10 Nov 13:25, Ilya Enkovich wrote: > On 19 Oct 15:30, Ilya Enkovich wrote: > > Hi, > > > > This patch adds patterns to be used for vector masks pack/unpack for > > AVX512. Bootstrapped and tested on x86_64-unknown-linux-gnu. Does it > > look OK? The patch is OK for trunk. -- Thanks, K > > > > Thanks, > > Ilya > > 2015-11-10 Ilya Enkovich > > * config/i386/sse.md (HALFMASKMODE): New attribute. > (DOUBLEMASKMODE): New attribute. > (vec_pack_trunc_qi): New. > (vec_pack_trunc_): New. > (vec_unpacks_lo_hi): New. > (vec_unpacks_lo_si): New. > (vec_unpacks_lo_di): New. > (vec_unpacks_hi_hi): New. > (vec_unpacks_hi_): New. > > gcc/testsuite/ > > 2015-11-10 Ilya Enkovich > > * gcc.target/i386/mask-pack.c: New test. > * gcc.target/i386/mask-unpack.c: New test.
[PATCH 0/2] Fix invalid left shift of negative value.
The following series of patches fixes all occurrences of left-shifting negative constants in C code, which is undefined by the C standard. The patches have been tested on s390x, covering only a small subset of the changes. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
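The rewrite pattern used throughout the series is mechanical: `-1 << n` left-shifts a negative value, which is undefined behavior in C, while `-(1 << n)` is well defined and produces the same bit pattern on two's-complement targets. A minimal illustration, modeled on the kind of sign-extension code the series touches (the helper name is mine, not from the patches):

```c
#include <assert.h>

/* Sign-extend the low BITS bits of VAL.  The pre-patch form used
   (-1 << bits) | val, invoking undefined behavior; the patched form
   computes the same two's-complement result without UB.  */
static int
sign_extend_fixed (unsigned val, unsigned bits)
{
  /* If the top bit of the BITS-wide field is set, OR in the sign bits. */
  return (int) (val >> (bits - 1) ? (-(1 << bits)) | val : val);
}
```

For example, the 5-bit value 0x1f sign-extends to -1, while 0x0f stays 15.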
[patch] Tweak comments in libstdc++ header
Minor grammatical fix. Committed to trunk. commit 1d82192ff38f4d0777ed57e8c05f3d1659e98649 Author: Jonathan Wakely Date: Tue Nov 10 08:41:57 2015 + * include/bits/functional_hash.h: Fix grammar in comment. diff --git a/libstdc++-v3/include/bits/functional_hash.h b/libstdc++-v3/include/bits/functional_hash.h index 88937bd..bc192c8 100644 --- a/libstdc++-v3/include/bits/functional_hash.h +++ b/libstdc++-v3/include/bits/functional_hash.h @@ -239,9 +239,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION // @} group hashes - // Hint about performance of hash functor. If not fast the hash based + // Hint about performance of hash functor. If not fast the hash-based // containers will cache the hash code. - // Default behavior is to consider that hasher are fast unless specified + // Default behavior is to consider that hashers are fast unless specified // otherwise. template struct __is_fast_hash : public std::true_type
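The comment being fixed documents a libstdc++-specific trait: hashers are assumed fast unless specified otherwise, and the unordered containers cache the hash code for hashers marked slow. A sketch of how a user opts a custom hasher into caching by specializing `__is_fast_hash` (a libstdc++ extension; the hasher here is illustrative, and this compiles only against libstdc++):

```cpp
#include <cstddef>
#include <string>
#include <unordered_set>

// A deliberately "expensive" FNV-1a style hasher.  Marking it as not
// fast tells libstdc++'s unordered containers to cache each element's
// hash code instead of recomputing it.
struct ExpensiveHasher
{
  std::size_t operator()(const std::string& s) const
  {
    std::size_t h = 1469598103934665603ull;
    for (unsigned char c : s)
      {
        h ^= c;
        h *= 1099511628211ull;
      }
    return h;
  }
};

namespace std
{
  // libstdc++-specific: opt this hasher into hash-code caching.
  template<>
  struct __is_fast_hash<ExpensiveHasher> : std::false_type { };
}
```

With the specialization in place, `std::unordered_set<std::string, ExpensiveHasher>` behaves as usual but stores each cached hash alongside the element.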
Re: Add basic support for direct_optab internal functions
On Sat, Nov 7, 2015 at 1:26 PM, Richard Sandiford wrote: > This patch adds a concept of internal functions that map directly to an > optab (here called "direct internal functions"). The function can only > be used if the associated optab can be used. > > We currently have four functions like that: > > - LOAD_LANES > - STORE_LANES > - MASK_LOAD > - MASK_STORE > > so the patch converts them to the new infrastructure. These four > all need different types of optabs, but future patches will add > regular unary and binary ones. > > In general we need one or two modes to decide whether an optab is > supported, depending on whether it's a convert_optab or not. > This in turn means that we need up to two types to decide whether > an internal function is supported. The patch records which types > are needed for each internal function, using -1 if the return type > should be used and N>=0 if the type of argument N should be used. > > (LOAD_LANES and STORE_LANES are unusual in that both optab modes > come from the same array type.) > > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. > OK to install? Ok. Thanks, Richard. > Thanks, > Richard > > > gcc/ > * coretypes.h (tree_pair): New type. > * internal-fn.def (DEF_INTERNAL_OPTAB_FN): New macro. Use it > for MASK_LOAD, LOAD_LANES, MASK_STORE and STORE_LANES. > * internal-fn.h (direct_internal_fn_info): New structure. > (direct_internal_fn_array): Declare. > (direct_internal_fn_p, direct_internal_fn): New functions. > (direct_internal_fn_types, direct_internal_fn_supported_p): Declare. > * internal-fn.c (not_direct, mask_load_direct, load_lanes_direct) > (mask_store_direct, store_lanes_direct): New macros. > (direct_internal_fn_array) New array. > (get_multi_vector_move): Return the optab handler without asserting > that it is available. > (expand_LOAD_LANES): Rename to... > (expand_load_lanes_optab_fn): ...this and add an optab argument. > (expand_STORE_LANES): Rename to... 
> (expand_store_lanes_optab_fn): ...this and add an optab argument. > (expand_MASK_LOAD): Rename to... > (expand_mask_load_optab_fn): ...this and add an optab argument. > (expand_MASK_STORE): Rename to... > (expand_mask_store_optab_fn): ...this and add an optab argument. > (direct_internal_fn_types, direct_optab_supported_p) > (multi_vector_optab_supported_p, direct_internal_fn_supported_p) > (direct_internal_fn_supported_p): New functions. > (direct_mask_load_optab_supported_p): New macro. > (direct_load_lanes_optab_supported_p): Likewise. > (direct_mask_store_optab_supported_p): Likewise. > (direct_store_lanes_optab_supported_p): Likewise. > > diff --git a/gcc/coretypes.h b/gcc/coretypes.h > index 3439c38..d4a75db 100644 > --- a/gcc/coretypes.h > +++ b/gcc/coretypes.h > @@ -251,6 +251,8 @@ namespace gcc { >class context; > } > > +typedef std::pair tree_pair; > + > #else > > struct _dont_use_rtx_here_; > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index afbfae8..72536da 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -66,13 +66,27 @@ init_internal_fns () >internal_fn_fnspec_array[IFN_LAST] = 0; > } > > +/* Create static initializers for the information returned by > + direct_internal_fn. */ > +#define not_direct { -2, -2 } > +#define mask_load_direct { -1, -1 } > +#define load_lanes_direct { -1, -1 } > +#define mask_store_direct { 3, 3 } > +#define store_lanes_direct { 0, 0 } > + > +const direct_internal_fn_info direct_internal_fn_array[IFN_LAST + 1] = { > +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) not_direct, > +#define DEF_INTERNAL_OPTAB_FN(CODE, FLAGS, OPTAB, TYPE) TYPE##_direct, > +#include "internal-fn.def" > + not_direct > +}; > + > /* ARRAY_TYPE is an array of vector modes. Return the associated insn > - for load-lanes-style optab OPTAB. The insn must exist. */ > + for load-lanes-style optab OPTAB, or CODE_FOR_nothing if none. 
*/ > > static enum insn_code > get_multi_vector_move (tree array_type, convert_optab optab) > { > - enum insn_code icode; >machine_mode imode; >machine_mode vmode; > > @@ -80,15 +94,13 @@ get_multi_vector_move (tree array_type, convert_optab > optab) >imode = TYPE_MODE (array_type); >vmode = TYPE_MODE (TREE_TYPE (array_type)); > > - icode = convert_optab_handler (optab, imode, vmode); > - gcc_assert (icode != CODE_FOR_nothing); > - return icode; > + return convert_optab_handler (optab, imode, vmode); > } > > -/* Expand LOAD_LANES call STMT. */ > +/* Expand LOAD_LANES call STMT using optab OPTAB. */ > > static void > -expand_LOAD_LANES (gcall *stmt) > +expand_load_lanes_optab_fn (gcall *stmt, convert_optab optab) > { >struct expand_operand ops[2]; >tree type, lhs, rhs; > @@ -106,13 +118,13 @@ expand_LOAD_LANES (gcall *stmt) > >create_o
Re: [PATCH 1/2] Fix invalid left shift of negative value.
On Tue, Nov 10, 2015 at 12:11:23PM +0100, Dominik Vogt wrote: > The following series of patches fixes all occurences of > left-shifting negative constants in C code which is undefined by > the C standard. The patches have been tested on s390x, covering > only a small subset of the changes. Changes in gdb/. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From f0480d41f3036d193513fa4dfbba414201b610ec Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Fri, 30 Oct 2015 15:17:22 +0100 Subject: [PATCH 1/2] gdb: Fix left shift of negative value. This patch fixes all occurences of left-shifting negative constants in C cod which is undefined by the C standard. gdb/ChangeLog: * hppa-tdep.c (hppa_sign_extend, hppa_low_hppa_sign_extend) (prologue_inst_adjust_sp, hppa_frame_cache): Fix left shift of negative value. * dwarf2read.c (read_subrange_type): Likewise. --- gdb/ChangeLog| 7 +++ gdb/dwarf2read.c | 2 +- gdb/hppa-tdep.c | 8 2 files changed, 5 insertions(+), 5 deletions(-) diff --git a/gdb/dwarf2read.c b/gdb/dwarf2read.c index 87dc8b4..48921e7 100644 --- a/gdb/dwarf2read.c +++ b/gdb/dwarf2read.c @@ -15048,7 +15048,7 @@ read_subrange_type (struct die_info *die, struct dwarf2_cu *cu) the bounds as signed, and thus sign-extend their values, when the base type is signed. */ negative_mask = -(LONGEST) -1 << (TYPE_LENGTH (base_type) * TARGET_CHAR_BIT - 1); +-((LONGEST) 1 << (TYPE_LENGTH (base_type) * TARGET_CHAR_BIT - 1)); if (low.kind == PROP_CONST && !TYPE_UNSIGNED (base_type) && (low.data.const_val & negative_mask)) low.data.const_val |= negative_mask; diff --git a/gdb/hppa-tdep.c b/gdb/hppa-tdep.c index ba7f946..3206729 100644 --- a/gdb/hppa-tdep.c +++ b/gdb/hppa-tdep.c @@ -104,7 +104,7 @@ static const struct objfile_data *hppa_objfile_priv_data = NULL; static int hppa_sign_extend (unsigned val, unsigned bits) { - return (int) (val >> (bits - 1) ? (-1 << bits) | val : val); + return (int) (val >> (bits - 1) ? 
(-(1 << bits)) | val : val); } /* For many immediate values the sign bit is the low bit! */ @@ -112,7 +112,7 @@ hppa_sign_extend (unsigned val, unsigned bits) static int hppa_low_hppa_sign_extend (unsigned val, unsigned bits) { - return (int) ((val & 0x1 ? (-1 << (bits - 1)) : 0) | val >> 1); + return (int) ((val & 0x1 ? (-(1 << (bits - 1))) : 0) | val >> 1); } /* Extract the bits at positions between FROM and TO, using HP's numbering @@ -1357,7 +1357,7 @@ prologue_inst_adjust_sp (unsigned long inst) /* std,ma X,D(sp) */ if ((inst & 0xffe8) == 0x73c8) -return (inst & 0x1 ? -1 << 13 : 0) | (((inst >> 4) & 0x3ff) << 3); +return (inst & 0x1 ? -(1 << 13) : 0) | (((inst >> 4) & 0x3ff) << 3); /* addil high21,%r30; ldo low11,(%r1),%r30) save high bits in save_high21 for later use. */ @@ -2066,7 +2066,7 @@ hppa_frame_cache (struct frame_info *this_frame, void **this_cache) CORE_ADDR offset; if ((inst >> 26) == 0x1c) - offset = (inst & 0x1 ? -1 << 13 : 0) + offset = (inst & 0x1 ? -(1 << 13) : 0) | (((inst >> 4) & 0x3ff) << 3); else if ((inst >> 26) == 0x03) offset = hppa_low_hppa_sign_extend (inst & 0x1f, 5); -- 2.3.0
Re: [PATCH 0/2] Fix invalid left shift of negative value.
Sorry, wrong mailing list, please ignore. On Tue, Nov 10, 2015 at 12:11:23PM +0100, Dominik Vogt wrote: > The following series of patches fixes all occurrences of > left-shifting negative constants in C code which is undefined by > the C standard. The patches have been tested on s390x, covering > only a small subset of the changes. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany
Re: [PATCH 2/2] Fix invalid left shift of negative value.
On Tue, Nov 10, 2015 at 12:11:23PM +0100, Dominik Vogt wrote: > The following series of patches fixes all occurrences of > left-shifting negative constants in C code which is undefined by > the C standard. The patches have been tested on s390x, covering > only a small subset of the changes. Changes in gdb/testsuite/. I'm not sure whether these are good or not. Ciao Dominik ^_^ ^_^ -- Dominik Vogt IBM Germany >From d1cbc6f371e2720fe4bf4975d8ad9c81a9f01351 Mon Sep 17 00:00:00 2001 From: Dominik Vogt Date: Fri, 30 Oct 2015 15:18:06 +0100 Subject: [PATCH 2/2] gdb/testsuite: Fix left shift of negative value. This patch fixes all occurrences of left-shifting negative constants in C code which is undefined by the C standard. gdb/testsuite/ChangeLog: * lib/dwarf.exp (_note): Fix left shift of negative value. * gdb.trace/trace-condition.exp: Likewise. --- gdb/testsuite/gdb.trace/trace-condition.exp | 2 +- gdb/testsuite/lib/dwarf.exp | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/gdb/testsuite/gdb.trace/trace-condition.exp b/gdb/testsuite/gdb.trace/trace-condition.exp index aec0401..46fa5ed 100644 --- a/gdb/testsuite/gdb.trace/trace-condition.exp +++ b/gdb/testsuite/gdb.trace/trace-condition.exp @@ -152,7 +152,7 @@ foreach trace_command { "trace" "ftrace" } { test_tracepoints $trace_command "21 * 2 == 42" 10 test_tracepoints $trace_command "21 << 1 == 42" 10 test_tracepoints $trace_command "42 >> 1 == 21" 10 -test_tracepoints $trace_command "-21 << 1 == -42" 10 +test_tracepoints $trace_command "-(21 << 1) == -42" 10 test_tracepoints $trace_command "-42 >> 1 == -21" 10 test_tracepoints $trace_command "(0xabababab & 0x) == 0xabab" 10 test_tracepoints $trace_command "(0xabababab | 0x) == 0xabab" 10 diff --git a/gdb/testsuite/lib/dwarf.exp b/gdb/testsuite/lib/dwarf.exp index 9716795..c87da87 100644 --- a/gdb/testsuite/lib/dwarf.exp +++ b/gdb/testsuite/lib/dwarf.exp @@ -1289,7 +1289,7 @@ namespace eval Dwarf { _op .ascii [_quote $name] # Alignment. 
set align 2 - set total [expr {($namelen + (1 << $align) - 1) & (-1 << $align)}] + set total [expr {($namelen + (1 << $align) - 1) & -(1 << $align)}] for {set i $namelen} {$i < $total} {incr i} { _op .byte 0 } -- 2.3.0
Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt
On 11/09/2015 08:33 AM, Segher Boessenkool wrote: If we have (truncate:M1 (and:M2 (lshiftrt:M2 (x:M2) C) C2)) we can write it instead as (and:M1 (lshiftrt:M1 (truncate:M1 (x:M2)) C) C2) + /* Likewise (truncate:QI (and:SI (lshiftrt:SI (x:SI) C) C2)) into + (and:QI (lshiftrt:QI (truncate:QI (x:SI)) C) C2) for suitable C + and C2. */ + if (GET_CODE (op) == AND + && (GET_CODE (XEXP (op, 0)) == LSHIFTRT + || GET_CODE (XEXP (op, 0)) == ASHIFTRT) + && CONST_INT_P (XEXP (XEXP (op, 0), 1)) + && CONST_INT_P (XEXP (op, 1)) + && UINTVAL (XEXP (XEXP (op, 0), 1)) < precision + && ((GET_MODE_MASK (mode) >> UINTVAL (XEXP (XEXP (op, 0), 1))) + & UINTVAL (XEXP (op, 1))) +== ((GET_MODE_MASK (op_mode) >> UINTVAL (XEXP (XEXP (op, 0), 1))) +& UINTVAL (XEXP (op, 1 In general this would be easier to read if there were intermediate variables called shift_amount and mask. I'm not entirely sure what the last condition here is supposed to test. Is it related to... + return simplify_gen_binary (AND, mode, op0, XEXP (op, 1)); ... the fact that here I think you'd have to trunc_int_for_mode the AND amount for the smaller mode? Bernd
Re: Add genmatch support for internal functions
On Sat, Nov 7, 2015 at 2:17 PM, Richard Sandiford wrote: > Richard Sandiford writes: >> This patch makes genmatch match calls based on combined_fn rather >> than built_in_function and extends the matching to internal functions. >> It also uses fold_const_call to fold the calls to a constant, rather >> than going through fold_builtin_n. >> >> In order to slightly simplify the code and remove potential >> ambiguity, the patch enforces lower case for tree codes >> (foo->FOO_EXPR), caps for functions (no built_in_hypot->BUILT_IN_HYPOT) >> and requires an exact match for user-defined identifiers. The first two >> were already met in practice but there were a couple of cases where >> operator lists were defined in one case and used in another. >> >> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. >> OK to install? >> >> Thanks, >> Richard > > Sorry, I'd originally treated converting match.pd to CASE_CFN_* as part > of a later patch, but really it belongs here. (In practice it doesn't > matter because we can use BUILT_IN_* cases to match a combined_fn without > a warning from either GCC or clang.) Ok. Thanks, Richard. > Thanks, > Richard > > > gcc/ > * match.pd: Use HYPOT and COS rather than hypot and cos. > Use CASE_CFN_* macros. > * genmatch.c (internal_fn): New enum. > (fn_id::fn): Change to an unsigned int. > (fn_id::fn_id): Accept internal_fn too. > (add_builtin): Rename to... > (add_function): ...this and turn into a template. > (get_operator): Only try one variation if the original name fails. > Only add _EXPR if the original name was all lower case. > Try converting internal and built-in function names to their > CFN equivalents. > (expr::gen_transform): Use maybe_build_call_expr_loc for generic. > (dt_simplify::gen_1): Likewise. > (dt_node::gen_kids_1): Use gimple_call_combined_fn for gimple > and get_call_combined_fn for generic. > (dt_simplify::gen): Use combined_fn as the type of fn_ids. > (decision_tree::gen): Likewise. 
> (main): Use lower case in the strings for {VIEW_,}CONVERT[012]. > Use add_function rather than add_builtin. Register internal > functions too. > * generic-match-head.c: Include case-cfn-macros.h. > * gimple-fold.c (replace_stmt_with_simplification): Use > gimple_call_combined_fn to test whether we can keep an > existing call. > * gimple-match.h (code_helper): Replace built_in_function > with combined_fn. > * gimple-match-head.c: Include fold-const-call.h, internal-fn.h > and case-fn-macros.h. > (gimple_resimplify1): Use fold_const_call. > (gimple_resimplify2, gimple_resimplify3): Likewise. > (build_call_internal, build_call): New functions. > (maybe_push_res_to_seq): Use them. > (gimple_simplify): Use fold_const_call. Set *rcode to a combined_fn > rather than a built-in function. > * tree.h (build_call_expr_internal_loc): Declare. > (maybe_build_call_expr_loc): Likewise. > * tree.c (build_call_expr_internal_loc_array): New function. > (maybe_build_call_expr_loc): Likewise. > > diff --git a/gcc/generic-match-head.c b/gcc/generic-match-head.c > index f2e08ed..f55f91e 100644 > --- a/gcc/generic-match-head.c > +++ b/gcc/generic-match-head.c > @@ -32,6 +32,7 @@ along with GCC; see the file COPYING3. If not see > #include "tree-dfa.h" > #include "builtins.h" > #include "dumpfile.h" > +#include "case-cfn-macros.h" > > > /* Routine to determine if the types T1 and T2 are effectively > diff --git a/gcc/genmatch.c b/gcc/genmatch.c > index 1eb8c24..c7ab4a4 100644 > --- a/gcc/genmatch.c > +++ b/gcc/genmatch.c > @@ -230,6 +230,12 @@ enum built_in_function { > END_BUILTINS > }; > > +#define DEF_INTERNAL_FN(CODE, FLAGS, FNSPEC) IFN_##CODE, > +enum internal_fn { > +#include "internal-fn.def" > + IFN_LAST > +}; > + > /* Return true if CODE represents a commutative tree code. Otherwise > return false. */ > bool > @@ -341,13 +347,15 @@ struct operator_id : public id_base >const char *tcc; > }; > > -/* Identifier that maps to a builtin function code. 
*/ > +/* Identifier that maps to a builtin or internal function code. */ > > struct fn_id : public id_base > { >fn_id (enum built_in_function fn_, const char *id_) >: id_base (id_base::FN, id_), fn (fn_) {} > - enum built_in_function fn; > + fn_id (enum internal_fn fn_, const char *id_) > + : id_base (id_base::FN, id_), fn (int (END_BUILTINS) + int (fn_)) {} > + unsigned int fn; > }; > > struct simplify; > @@ -447,10 +455,12 @@ add_operator (enum tree_code code, const char *id, >*slot = op; > } > > -/* Add a builtin identifier to the hash. */ > +/* Add a built-in or internal function identifier to the hash. ID is > + the name of its CFN_* enumeration value. */ > > +templat
Re: [RFC][PATCH] Preferred rename register in regrename pass
On 9 November 2015 at 18:01, Robert Suchanek wrote: > Hi, > >> On 11/09/2015 02:32 PM, Robert Suchanek wrote: >> > The results below were generated for CSiBE benchmark and the numbers in >> > columns express bytes in format 'net (gain/loss)' to show the difference >> > with and without the patch when -frename-registers switch is used. >> >> I'm not entirely sure what the numbers represent. I can see how you'd >> measure at a net size change (I assume a negative net is the intended >> goal), but how did you arrive at gain/loss numbers? >> >> In any case, assuming negative is good, the results seem pretty decent. > > The gain/loss was calculated based on per function analysis. > Each flavour e.g. MIPS n64 -Os was run with/without the patch and compared to > the base i.e. without the patch. The patched version of each function may > show either positive (larger code size), negative or no difference to > the code size. The gain/loss in a cell is the sum of all positive/negative > numbers for a test. The negatives, as you said, are the good ones. > >> >> > + gcc_assert >> > + (terminated_this_insn->regno == REGNO (recog_data.operand[1])); >> >> Maybe break the line before the == so that you can start the arguments >> on the same line as the assert. >> >> > + /* Nonzero if the chain is renamed. */ >> > + unsigned int renamed:1; >> >> I'd write "has already been renamed" since that is maybe slightly less >> ambiguous. >> >> Ok with those changes. >> >> >> Bernd > > Will do the changes and apply. > Hi, Since you committed this (r230087 if I'm correct), I can see that GCC fails to build libgfortran for target arm-none-linux-gnueabi --with-cpu=cortex-a9. 
The backtrace is: /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgfortran/generated/matmul_i8.c: In function 'matmul_i8': /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/libgfortran/generated/matmul_i8.c:374:1: internal compiler error: in scan_rtx_reg, at regrename.c:1074 } ^ 0xa13940 scan_rtx_reg /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:1074 0xa1451d record_out_operands /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:1554 0xa14d12 build_def_use /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:1802 0xa1533e regrename_analyze(bitmap_head*) /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:726 0xa161f9 regrename_optimize /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:1871 0xa161f9 execute /tmp/8079076_3.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/regrename.c:1908 Please submit a full bug report, Can you have a look? > Regards, > Robert >
[PATCH] Improve C++ loop's backward-jump location
While fixing PR67192 it turned out that there are multiple potential issues with the location of the backward-jump statement generated by the C and C++ front-ends for an unconditional loop. First, due to a bug the C front-end had sometimes set the location to the token following the loop. This is what triggered PR67192. The C front-end's intention was to set the location to the last line of the loop instead. This is not perfect either; it may be slightly confusing in the special case where the loop body consists of an if-else statement: Breaking on that line then causes a breakpoint hit on every iteration, even if the else-path is never executed. The C++ front-end does not have these issues; it sets the location to the "while" or "for" token. But this can cause confusion when setting a breakpoint on "while (1)": When hitting it for the first time, one loop iteration will already have executed. To fix this as well, this patch mimics the fix applied to the C front-end, making the front-ends behave identically in this regard. gcc/cp/ChangeLog: * cp-gimplify.c (genericize_cp_loop): Change LOOP_EXPR's location to start of loop body instead of start of loop. gcc/testsuite/ChangeLog: * g++.dg/guality/pr67192.C: New test. 
--- gcc/cp/cp-gimplify.c | 7 ++- gcc/testsuite/g++.dg/guality/pr67192.C | 79 ++ 2 files changed, 85 insertions(+), 1 deletion(-) create mode 100644 gcc/testsuite/g++.dg/guality/pr67192.C diff --git a/gcc/cp/cp-gimplify.c b/gcc/cp/cp-gimplify.c index e4b50e5..d9bb708 100644 --- a/gcc/cp/cp-gimplify.c +++ b/gcc/cp/cp-gimplify.c @@ -266,7 +266,12 @@ genericize_cp_loop (tree *stmt_p, location_t start_locus, tree cond, tree body, loop = stmt_list; } else -loop = build1_loc (start_locus, LOOP_EXPR, void_type_node, stmt_list); +{ + location_t loc = EXPR_LOCATION (expr_first (body)); + if (loc == UNKNOWN_LOCATION) + loc = start_locus; + loop = build1_loc (loc, LOOP_EXPR, void_type_node, stmt_list); +} stmt_list = NULL; append_to_statement_list (loop, &stmt_list); diff --git a/gcc/testsuite/g++.dg/guality/pr67192.C b/gcc/testsuite/g++.dg/guality/pr67192.C new file mode 100644 index 000..c09ecf8 --- /dev/null +++ b/gcc/testsuite/g++.dg/guality/pr67192.C @@ -0,0 +1,79 @@ +/* PR debug/67192 */ +/* { dg-do run } */ +/* { dg-options "-x c++ -g -Wmisleading-indentation" } */ + +volatile int cnt = 0; + +__attribute__((noinline, noclone)) static int +last (void) +{ + return ++cnt % 5 == 0; +} + +__attribute__((noinline, noclone)) static void +do_it (void) +{ + asm volatile ("" : : "r" (&cnt) : "memory"); +} + +__attribute__((noinline, noclone)) static void +f1 (void) +{ + for (;; do_it()) +{ + if (last ()) + break; +} + do_it (); /* { dg-final { gdb-test 27 "cnt" "5" } } */ +} + +__attribute__((noinline, noclone)) static void +f2 (void) +{ + while (1) +{ + if (last ()) + break; + do_it (); +} + do_it (); /* { dg-final { gdb-test 39 "cnt" "10" } } */ +} + +__attribute__((noinline, noclone)) static void +f3 (void) +{ + for (;; do_it()) +if (last ()) + break; + do_it (); /* { dg-final { gdb-test 48 "cnt" "15" } } */ +} + +__attribute__((noinline, noclone)) static void +f4 (void) +{ + while (1) /* { dg-final { gdb-test 54 "cnt" "15" } } */ +if (last ()) + break; +else + do_it (); + 
do_it (); /* { dg-final { gdb-test 59 "cnt" "20" } } */ +} + +void (*volatile fnp1) (void) = f1; +void (*volatile fnp2) (void) = f2; +void (*volatile fnp3) (void) = f3; +void (*volatile fnp4) (void) = f4; + +int +main () +{ + asm volatile ("" : : "r" (&fnp1) : "memory"); + asm volatile ("" : : "r" (&fnp2) : "memory"); + asm volatile ("" : : "r" (&fnp3) : "memory"); + asm volatile ("" : : "r" (&fnp4) : "memory"); + fnp1 (); + fnp2 (); + fnp3 (); + fnp4 (); + return 0; +} -- 2.3.0
Re: Replace match.pd DEFINE_MATH_FNs with auto-generated lists
On Sat, Nov 7, 2015 at 2:23 PM, Richard Sandiford wrote: > This patch autogenerates the operator lists for maths functions > like SQRT, adding an additional entry for internal functions. > E.g.: > > (define_operator_list SQRT > BUILT_IN_SQRTF > BUILT_IN_SQRT > BUILT_IN_SQRTL > IFN_SQRT) > > and: > > (define_operator_list CABS > BUILT_IN_CABSF > BUILT_IN_CABS > BUILT_IN_CABSL > null) > > (since there's no internal function for CABS). > > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. > OK to install? > > Thanks, > Richard > > > gcc/ > * Makefile.in (MOSTLYCLEANFILES): Add cfn-operators.pd. > (generated_files): Likewise. > (s-cfn-operators, cfn-operators.pd): New rules. > (s-match): Depend on cfn-operators.pd. > * gencfn-macros.c: Expand comment to describe -o behavior. > (print_define_operator_list): New function. > (main): Accept -o. Call print_define_operator_list. > * genmatch.c (main): Add "." to the include path. > * match.pd (DEFINE_MATH_FN): Delete. Include cfn-operators.pd > instead. 
> > diff --git a/gcc/Makefile.in b/gcc/Makefile.in > index 298bb38..a21aaf5 100644 > --- a/gcc/Makefile.in > +++ b/gcc/Makefile.in > @@ -1566,7 +1566,7 @@ MOSTLYCLEANFILES = insn-flags.h insn-config.h > insn-codes.h \ > tm-preds.h tm-constrs.h checksum-options gimple-match.c generic-match.c \ > tree-check.h min-insn-modes.c insn-modes.c insn-modes.h \ > genrtl.h gt-*.h gtype-*.h gtype-desc.c gtyp-input.list \ > - case-cfn-macros.h \ > + case-cfn-macros.h cfn-operators.pd \ > xgcc$(exeext) cpp$(exeext) $(FULL_DRIVER_NAME) \ > $(EXTRA_PROGRAMS) gcc-cross$(exeext) \ > $(SPECS) collect2$(exeext) gcc-ar$(exeext) gcc-nm$(exeext) \ > @@ -2252,6 +2252,14 @@ s-case-cfn-macros: build/gencfn-macros$(build_exeext) > $(STAMP) s-case-cfn-macros > case-cfn-macros.h: s-case-cfn-macros; @true > > +s-cfn-operators: build/gencfn-macros$(build_exeext) > + $(RUN_GEN) build/gencfn-macros$(build_exeext) -o \ > + > tmp-cfn-operators.pd > + $(SHELL) $(srcdir)/../move-if-change tmp-cfn-operators.pd \ > + cfn-operators.pd > + $(STAMP) s-cfn-operators > +cfn-operators.pd: s-cfn-operators; @true > + > target-hooks-def.h: s-target-hooks-def-h; @true > # make sure that when we build info files, the used tm.texi is up to date. 
> $(srcdir)/doc/tm.texi: s-tm-texi; @true > @@ -2318,7 +2326,7 @@ s-tm-texi: build/genhooks$(build_exeext) > $(srcdir)/doc/tm.texi.in > gimple-match.c: s-match gimple-match-head.c ; @true > generic-match.c: s-match generic-match-head.c ; @true > > -s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd > +s-match: build/genmatch$(build_exeext) $(srcdir)/match.pd cfn-operators.pd > $(RUN_GEN) build/genmatch$(build_exeext) --gimple $(srcdir)/match.pd \ > > tmp-gimple-match.c > $(RUN_GEN) build/genmatch$(build_exeext) --generic $(srcdir)/match.pd > \ > @@ -2439,7 +2447,8 @@ generated_files = config.h tm.h $(TM_P_H) $(TM_H) > multilib.h \ > $(ALL_GTFILES_H) gtype-desc.c gtype-desc.h gcov-iov.h \ > options.h target-hooks-def.h insn-opinit.h \ > common/common-target-hooks-def.h pass-instances.def \ > - c-family/c-target-hooks-def.h params.list case-cfn-macros.h > + c-family/c-target-hooks-def.h params.list case-cfn-macros.h \ > + cfn-operators.pd > > # > # How to compile object files to run on the build machine. > diff --git a/gcc/gencfn-macros.c b/gcc/gencfn-macros.c > index 5ee3af0..401c429 100644 > --- a/gcc/gencfn-macros.c > +++ b/gcc/gencfn-macros.c > @@ -40,7 +40,27 @@ along with GCC; see the file COPYING3. If not see >case CFN_BUILT_IN_SQRTL: >case CFN_SQRT: > > - The macros for groups with no internal function drop the last line. */ > + The macros for groups with no internal function drop the last line. > + > + When run with -o, the generator prints a similar list of > + define_operator_list directives, for use by match.pd. Each operator > + list starts with the built-in functions, in order of ascending type width. > + This is followed by an entry for the internal function, or "null" if there > + is no internal function for the group. 
For example: > + > + (define_operator_list SQRT > +BUILT_IN_SQRTF > +BUILT_IN_SQRT > +BUILT_IN_SQRTL > +IFN_SQRT) > + > + and: > + > + (define_operator_list CABS > +BUILT_IN_CABSF > +BUILT_IN_CABS > +BUILT_IN_CABSL > +null) */ > > #include "bconfig.h" > #include "system.h" > @@ -89,6 +109,23 @@ print_case_cfn (const char *name, bool internal_p, >printf ("\n"); > } > > +/* Print an operator list for all combined functions related to NAME, > + with the null-terminated list of suffixes in SUFFIXES. INTERNAL_P > + says whether CFN_ also exists. */ > + > +static void > +p
Re: Short-cut generation of simple built-in functions
On Sat, Nov 7, 2015 at 2:31 PM, Richard Sandiford wrote: > This patch short-circuits the builtins.c expansion code for a particular > gimple call if: > > - the function has an associated internal function > - the target implements that internal function > - the call has no side effects > > This allows a later patch to remove the builtins.c code, once calls with > side effects have been handled. > > Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi. > OK to install? > > Thanks, > Richard > > > gcc/ > * builtins.h (called_as_built_in): Declare. > * builtins.c (called_as_built_in): Make external. > * internal-fn.h (expand_internal_call): Define a variant that > specifies the internal function explicitly. > * internal-fn.c (expand_load_lanes_optab_fn) > (expand_store_lanes_optab_fn, expand_ANNOTATE, expand_GOMP_SIMD_LANE) > (expand_GOMP_SIMD_VF, expand_GOMP_SIMD_LAST_LANE) > (expand_GOMP_SIMD_ORDERED_START, expand_GOMP_SIMD_ORDERED_END) > (expand_UBSAN_NULL, expand_UBSAN_BOUNDS, expand_UBSAN_VPTR) > (expand_UBSAN_OBJECT_SIZE, expand_ASAN_CHECK, expand_TSAN_FUNC_EXIT) > (expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB) > (expand_UBSAN_CHECK_MUL, expand_ADD_OVERFLOW, expand_SUB_OVERFLOW) > (expand_MUL_OVERFLOW, expand_LOOP_VECTORIZED) > (expand_mask_load_optab_fn, expand_mask_store_optab_fn) > (expand_ABNORMAL_DISPATCHER, expand_BUILTIN_EXPECT, expand_VA_ARG) > (expand_UNIQUE, expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS) > (expand_GOACC_LOOP, expand_GOACC_REDUCTION, expand_direct_optab_fn) > (expand_unary_optab_fn, expand_binary_optab_fn): Add an internal_fn > argument. > (internal_fn_expanders): Update prototype. > (expand_internal_call): Define a variant that specifies the > internal function explicitly. Use it to implement the previous > interface. > * cfgexpand.c (expand_call_stmt): Try to expand calls to built-in > functions as calls to internal functions. 
> > diff --git a/gcc/builtins.c b/gcc/builtins.c > index f65011e..bbcc7dc3 100644 > --- a/gcc/builtins.c > +++ b/gcc/builtins.c > @@ -222,7 +222,7 @@ is_builtin_fn (tree decl) > of the optimization level. This means whenever a function is invoked with > its "internal" name, which normally contains the prefix "__builtin". */ > > -static bool > +bool > called_as_built_in (tree node) > { >/* Note that we must use DECL_NAME, not DECL_ASSEMBLER_NAME_SET_P since > diff --git a/gcc/builtins.h b/gcc/builtins.h > index 917eb90..1d00068 100644 > --- a/gcc/builtins.h > +++ b/gcc/builtins.h > @@ -50,6 +50,7 @@ extern struct target_builtins *this_target_builtins; > extern bool force_folding_builtin_constant_p; > > extern bool is_builtin_fn (tree); > +extern bool called_as_built_in (tree); > extern bool get_object_alignment_1 (tree, unsigned int *, > unsigned HOST_WIDE_INT *); > extern unsigned int get_object_alignment (tree); > diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c > index bfbc958..dc7d4f5 100644 > --- a/gcc/cfgexpand.c > +++ b/gcc/cfgexpand.c > @@ -2551,10 +2551,25 @@ expand_call_stmt (gcall *stmt) >return; > } > > + /* If this is a call to a built-in function and it has no effect other > + than setting the lhs, try to implement it using an internal function > + instead. */ > + decl = gimple_call_fndecl (stmt); > + if (gimple_call_lhs (stmt) > + && !gimple_vdef (stmt) I think you want && ! gimple_has_side_effects (stmt) instead of checking !gimple_vdef (stmt). 
> + && (optimize || (decl && called_as_built_in (decl > +{ > + internal_fn ifn = replacement_internal_fn (stmt); > + if (ifn != IFN_LAST) > + { > + expand_internal_call (ifn, stmt); > + return; > + } > +} > + >exp = build_vl_exp (CALL_EXPR, gimple_call_num_args (stmt) + 3); > >CALL_EXPR_FN (exp) = gimple_call_fn (stmt); > - decl = gimple_call_fndecl (stmt); >builtin_p = decl && DECL_BUILT_IN (decl); > >/* If this is not a builtin function, the function type through which the > diff --git a/gcc/internal-fn.c b/gcc/internal-fn.c > index 9f9f9cf..c03c0fc 100644 > --- a/gcc/internal-fn.c > +++ b/gcc/internal-fn.c > @@ -103,7 +103,7 @@ get_multi_vector_move (tree array_type, convert_optab > optab) > /* Expand LOAD_LANES call STMT using optab OPTAB. */ > > static void > -expand_load_lanes_optab_fn (gcall *stmt, convert_optab optab) > +expand_load_lanes_optab_fn (internal_fn, gcall *stmt, convert_optab optab) I'm somewhat confused by all these unused internal_fn args. I'm sure we can avoid them? > { >struct expand_operand ops[2]; >tree type, lhs, rhs; > @@ -127,7 +127,7 @@ expand_load_lanes_optab_fn (gcall *stmt, convert_optab > optab) > /* Expand STORE_LANES call STMT using optab OPTAB. */ > > static void > -expand_st
RE: [RFC][PATCH] Preferred rename register in regrename pass
Hi Christophe, > Hi, > > Since you committed this (r230087 if I'm correct), I can see that GCC > fails to build > ligfortran for target arm-none-linuxgnueabi --with-cpu=cortex-a9. ... > > Can you have a look? Sorry for the breakage. I see that my assertion is being triggered. I'll investigate this and check whether the assertion is correct or something else needs to be done. Robert
[Patch ARM] Switch ARM to unified asm.
[Resending as I managed to muck this up with my mail client] Hi, I held off committing a previous version of this patch that I posted in July to be nice to folks backporting fixes and to watch for any objections to move the ARM backend completely over into the unified assembler. The patch does the following. * The compiler now generates code in all ISA modes in unified asm. * We've had unified asm only for the last 10 years, ever since the first Thumb2 support was put in, the disassembler generates output in unified assembler, while the compiler output is always in divided syntax for ARM state. * This means patterns get simpler not having to worry about the position of the condition in a conditional instruction. For example we now consistently use a. ldrbeq rather than ldreqb b. movseq rather than moveqs c. Or indeed the appropriate push / pop instructions wherever appropriate. The compiler behaviour has not changed in terms of what it does with inline assembler, that still remains in divided syntax and over time we need to move all of this over to unified syntax if we can do so as all the official documentation is now in terms of unified asm. I've been carrying this in my tree for quite a while and am reasonably happy that it is stable. I will watch out for any fallout in the coming weeks with this but it is better to take this now rather than later given we are hitting the end of stage1. Tested on arm-none-eabi - applied to trunk. regards Ramana gcc/ChangeLog: 2015-11-06 Ramana Radhakrishnan * config/arm/arm-ldmstm.ml: Rewrite to generate unified asm templates. * config/arm/arm.c (arm_asm_trampoline_template): Make unified asm safe. (arm_output_multireg_pop): Likewise. (output_move_double): Likewise. (output_move_quad): Likewise. (output_return_instruction): Likewise. (arm_print_operand): Remove support for %( and %. print modifiers. (arm_output_shift): Make unified asm. (arm_declare_function_name): Likewise. * config/arm/arm.h (TARGET_UNIFIED_ASM): Delete. 
(ASM_APP_OFF): Adjust. (ASM_OUTPUT_REG_PUSH): Undo special casing for TARGET_ARM. (ASM_OUTPUT_REG_POP): Likewise. * config/arm/arm.md: Adjust uses of %., %(, %) * config/arm/sync.md: Likewise. * config/arm/thumb2.md: Likewise. * config/arm/ldmstm.md: Regenerate. * config/arm/arm.opt (masm-unified-syntax): Do not special case Thumb. * doc/invoke.texi (masm-unified-syntax): Update documentation. gcc/testsuite/ChangeLog: 2015-11-06 Ramana Radhakrishnan * gcc.target/arm/combine-movs.c: * gcc.target/arm/interrupt-1.c: * gcc.target/arm/interrupt-2.c: * gcc.target/arm/unaligned-memcpy-4.c: >From 265ab44e7d5a9d3807951d0b768b7115aecfe6d6 Mon Sep 17 00:00:00 2001 From: Ramana Radhakrishnan Date: Thu, 5 Nov 2015 10:04:50 + Subject: [PATCH] uasm jumbo --- gcc/config/arm/arm-ldmstm.ml | 23 ++- gcc/config/arm/arm.c | 157 - gcc/config/arm/arm.h | 19 +-- gcc/config/arm/arm.md | 162 ++ gcc/config/arm/arm.opt| 2 +- gcc/config/arm/ldmstm.md | 126 - gcc/config/arm/sync.md| 4 +- gcc/config/arm/thumb2.md | 12 +- gcc/doc/invoke.texi | 7 +- gcc/testsuite/gcc.target/arm/combine-movs.c | 3 +- gcc/testsuite/gcc.target/arm/interrupt-1.c| 4 +- gcc/testsuite/gcc.target/arm/interrupt-2.c| 4 +- gcc/testsuite/gcc.target/arm/unaligned-memcpy-4.c | 2 +- 13 files changed, 240 insertions(+), 285 deletions(-) diff --git a/gcc/config/arm/arm-ldmstm.ml b/gcc/config/arm/arm-ldmstm.ml index bb90192..62982df 100644 --- a/gcc/config/arm/arm-ldmstm.ml +++ b/gcc/config/arm/arm-ldmstm.ml @@ -33,9 +33,20 @@ type amode = IA | IB | DA | DB type optype = IN | OUT | INOUT -let rec string_of_addrmode addrmode = +let rec string_of_addrmode addrmode thumb update = + if thumb || update +then match addrmode with -IA -> "ia" | IB -> "ib" | DA -> "da" | DB -> "db" +IA -> "ia" + | IB -> "ib" + | DA -> "da" + | DB -> "db" +else + match addrmode with +IA -> "" + | IB -> "ib" + | DA -> "da" + | DB -> "db" let rec initial_offset addrmode nregs = match addrmode with @@ -160,7 +171,7 @@ let target addrmode thumb = | _, _ -> raise 
(InvalidAddrMode "ERROR: Invalid Addressing mode for Thumb1.") let write_pattern_1 name ls addrmode nregs write_set_fn update thumb = - let astr = string_of_addrmode addrmode in + let astr = string_of_addrmode addrmode thumb update in Printf.printf "(define_insn \"*%s%s%d_%s%s\"\n" (if thumb then "thumb_" else "") name nregs astr (if update then "_update
[PATCH] Fix target arch attribute for skylake
Hi, This patch fixes the issue, when target(arch="haswell") function was chosen instead of target(arch="skylake") on skylake. Ok for trunk? gcc/ * config/i386/i386.c: Handle "skylake" and "skylake-avx512". gcc/testsuite/ * g++.dg/ext/mv16.C: New functions. Yulia patch_skylake Description: Binary data
Re: [RFC] [PATCH V2]: RE: [RFC] [Patch] Relax tree-if-conv.c trap assumptions.
On Sat, Nov 7, 2015 at 12:41 PM, Kumar, Venkataramanan wrote: > Hi Richard, > > I have now implemented storing of DR and references using hash maps. > Please find attached patch. > > As discussed, I am now storing the ref, DR and baseref, DR pairs along with > unconditional read/write information in hash tables while iterating over DR > during its initialization. > Then during checking for possible traps for if-converting, just check if the > memory reference for a gimple statement is read/written unconditionally by > querying the hash table instead of quadratic walk. > > Boot strapped and regression tested on x86_64. @@ -592,137 +598,153 @@ struct ifc_dr { /* -1 when not initialized, 0 when false, 1 when true. */ int rw_unconditionally; + + tree ored_result; + excess vertical space at the end. A better name would be simply "predicate". + if (!exsist1) +{ + if (is_true_predicate (ca)) + { + DR_RW_UNCONDITIONALLY (a) = 1; + } -while (TREE_CODE (ref_base_a) == COMPONENT_REF - || TREE_CODE (ref_base_a) == IMAGPART_EXPR - || TREE_CODE (ref_base_a) == REALPART_EXPR) - ref_base_a = TREE_OPERAND (ref_base_a, 0); + IFC_DR (a)->ored_result = ca; + *master_dr = a; + } + else +{ + IFC_DR (*master_dr)->ored_result += fold_or_predicates + (EXPR_LOCATION (IFC_DR (*master_dr)->ored_result), +ca, IFC_DR (*master_dr)->ored_result); -while (TREE_CODE (ref_base_b) == COMPONENT_REF - || TREE_CODE (ref_base_b) == IMAGPART_EXPR - || TREE_CODE (ref_base_b) == REALPART_EXPR) - ref_base_b = TREE_OPERAND (ref_base_b, 0); + if (is_true_predicate (ca) + || is_true_predicate (IFC_DR (*master_dr)->ored_result)) + { + DR_RW_UNCONDITIONALLY (*master_dr) = 1; + } + } please common the predicate handling from both cases, that is, do if (is_true_predicate (IFC_DR (*master_dr)->ored_result)) DR_RW_... after the if. Note no {}s around single stmts. Likewise for the base_master_dr code. 
+ if (!found) + { + DR_WRITTEN_AT_LEAST_ONCE (a) =0; + DR_RW_UNCONDITIONALLY (a) = 0; + return false; no need to update the flags on non-"master" DRs. Please remove the 'found' variable and simply return 'true' whenever found. static bool ifcvt_memrefs_wont_trap (gimple *stmt, vec refs) { - return write_memrefs_written_at_least_once (stmt, refs) -&& memrefs_read_or_written_unconditionally (stmt, refs); + return memrefs_read_or_written_unconditionally (stmt, refs); please simply inline memrefs_read_or_written_unconditionally here. + if (ref_DR_map) +delete ref_DR_map; + ref_DR_map = NULL; + + if (baseref_DR_map) +delete baseref_DR_map; + baseref_DR_map = NULL; at this point the pointers will never be NULL. Ok with those changes. Note the tree-hash-traits.h part is already committed. > gcc/ChangeLog > 2015-11-07 Venkataramanan > > *tree-hash-traits.h (struct tree_operand_hash). Use compare_type and > value_type. > (equal_keys): Rename to equal and use compare_type and value_type. > * tree-if-conv.c (ref_DR_map): Define. >(baseref_DR_map): Likewise. >(struct ifc_dr): New tree predicate field. >(hash_memrefs_baserefs_and_store_DRs_read_written_info): New > function. >(memrefs_read_or_written_unconditionally): Use hashmap to query > unconditional read/written information. > (write_memrefs_written_at_least_once) : Remove. > (ifcvt_memrefs_wont_trap): Remove call to > write_memrefs_written_at_least_once. > (if_convertible_loop_p_1): Initialize/delete hash maps and > initialize predicates > before the call to > hash_memrefs_baserefs_and_store_DRs_read_written_info. Watch changelog formatting. Thanks, Richard. > gcc/testsuite/ChangeLog > 2015-11-07 Venkataramanan >*gcc.dg/tree-ssa/ifc-8.c: Add new. > > Ok for trunk? > > Regards, > Venkat. >> >> >> > + } >> >> > + } >> >> > +} >> >> > + return found; >> >> > +} >> >> > + >> >> > /* Return true when the memory references of STMT won't trap in the >> >> > if-converted code. 
There are two things that we have to check for: >> >> > >> >> > @@ -748,8 +776,20 @@ write_memrefs_written_at_least_once >> (gimple >> >> > *stmt, static bool ifcvt_memrefs_wont_trap (gimple *stmt, >> >> > vec refs) { >> >> > - return write_memrefs_written_at_least_once (stmt, refs) >> >> > -&& memrefs_read_or_written_unconditionally (stmt, refs); >> >> > + bool memrefs_written_once, >> memrefs_read_written_unconditionally; >> >> > + bool memrefs_safe_to_access; >> >> > + >> >> > + memrefs_written_once >> >> > + = write_memrefs_written_at_least_once (stmt, refs); >> >> > + >> >> > + memrefs_read_written_uncon
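The hash-map scheme Richard approves above can be sketched outside of GCC roughly as follows. This is a simplified model, not the patch itself: the real code keys on `ref`/`DR` trees and combines `tree` predicates with `fold_or_predicates`, while here a reference is a plain string and an empty predicate string stands in for the always-true predicate.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for one memory access and its guarding predicate;
// an empty pred models GCC's is_true_predicate case.
struct access { std::string ref; std::string pred; };

// One pass over the accesses: remember, per reference, whether the OR of
// all guarding predicates is known true (i.e. the ref is read/written
// unconditionally).  A later trap check is then a single hash lookup
// instead of a quadratic walk over all other data references.
std::unordered_map<std::string, bool>
build_rw_unconditionally (const std::vector<access> &accesses)
{
  std::unordered_map<std::string, bool> map;
  for (const access &a : accesses)
    {
      bool now_true = a.pred.empty ();         // is_true_predicate analogue
      auto r = map.emplace (a.ref, now_true);  // first access to this ref?
      if (!r.second)                           // no: OR in the new predicate
        r.first->second = r.first->second || now_true;
    }
  return map;
}
```

The point of the design is that the OR-folding happens once, while the map is built, so each `ifcvt_memrefs_wont_trap` query is O(1).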
Re: [RFC][PATCH] Preferred rename register in regrename pass
On 11/09/2015 02:32 PM, Robert Suchanek wrote:

@@ -1707,6 +1749,8 @@ build_def_use (basic_block bb)
 	    scan_rtx (insn, &XEXP (note, 0), ALL_REGS, mark_read,
 		      OP_INOUT);

+	  terminated_this_insn = NULL;
+
 	  /* Step 4: Close chains for registers that die here, unless
 	     the register is mentioned in a REG_UNUSED note.  In that
 	     case we keep the chain open until step #7 below to ensure

I suspect you'll want to move this earlier, just before step 1. My guess
would be that the reported failure was for an earlyclobber operand.

Bernd
Re: [PATCH] Fix target arch attribute for skylake
On Tue, Nov 10, 2015 at 12:54 PM, Yulia Koval wrote:
> Hi,
>
> This patch fixes the issue where a target(arch="haswell") function was
> chosen instead of target(arch="skylake") on skylake. Ok for trunk?
>
> gcc/
> * config/i386/i386.c: Handle "skylake" and
> "skylake-avx512".
>
> gcc/testsuite/
> * g++.dg/ext/mv16.C: New functions.

OK.

Thanks,
Uros.
Re: Enable pointer TBAA for LTO
On Sun, 8 Nov 2015, Jan Hubicka wrote:
> Hi,
> this patch adds basic TBAA for pointers to LTO. The basic scheme is simple;
> because TYPE_CANONICAL is not really needed by get_alias_set, we completely
> drop the calculation of these (which also saves about 50% of canonical type hash
> searches) and update get_alias_set to not punt on pointers with
> TYPE_STRUCTURAL_EQUALITY.
>
> The patch makes quite nice improvements (32%) on the number of disambiguations on
> dealII (that is my random C++ testbed):
>
> Before:
> [WPA] GIMPLE canonical type table: size 16381, 817 elements, 35453 searches,
> 91 collisions (ratio: 0.002567)
> [WPA] GIMPLE canonical type pointer-map: 817 elements, 15570 searches
>
> after:
> [WPA] GIMPLE canonical type table: size 16381, 822 elements, 14863 searches,
> 114 collisions (ratio: 0.007670)
> [WPA] GIMPLE canonical type pointer-map: 822 elements, 12663 searches
>
> The number of disambiguations goes 1713472->2331078 (32%)
> and the number of queries goes 3387753->3669698 (8%).
> We get code size growth 677527->701782 (3%).
>
> Also a query is disambiguated 63% of the time instead of the 50% we had before.
>
> Clearly there are many areas for improvement (since functions are
> TYPE_STRUCTURAL_EQUALITY in LTO we use the ptr_type_node alias set on them),
> but that can wait for next stage1.
>
> lto-bootstrapped/regtested x86_64-linux and also used it in my tree for quite
> a while, so the patch was tested on Firefox and other applications.
>
> OK?
> Honza
>
> * alias.c (get_alias_set): Do structural equality for pointer types;
> drop LTO specific path.
> * lto.c (iterative_hash_canonical_type): Do not compute TYPE_CANONICAL
> for pointer types.
> (gimple_register_canonical_type_1): Likewise.
> (gimple_register_canonical_type): Likewise.
> (lto_register_canonical_types): Do not clear canonical types of pointer
> types.
> Index: lto/lto.c
> ===
> --- lto/lto.c (revision 229968)
> +++ lto/lto.c (working copy)
> @@ -396,8 +396,13 @@ iterative_hash_canonical_type (tree type
>
>    /* All type variants have same TYPE_CANONICAL. */
>    type = TYPE_MAIN_VARIANT (type);
> +
> +  /* We do not compute TYPE_CANONICAL of POINTER_TYPE because the aliasing
> +     code never uses it anyway. */
> +  if (POINTER_TYPE_P (type))
> +    v = hash_canonical_type (type);
>    /* An already processed type. */
> -  if (TYPE_CANONICAL (type))
> +  else if (TYPE_CANONICAL (type))
>      {
>        type = TYPE_CANONICAL (type);
>        v = gimple_canonical_type_hash (type);
> @@ -445,7 +450,9 @@ gimple_register_canonical_type_1 (tree t
>  {
>    void **slot;
>
> -  gcc_checking_assert (TYPE_P (t) && !TYPE_CANONICAL (t));
> +  gcc_checking_assert (TYPE_P (t) && !TYPE_CANONICAL (t)
> +		       && type_with_alias_set_p (t)
> +		       && TREE_CODE (t) != POINTER_TYPE);

POINTER_TYPE_P

>
>    slot = htab_find_slot_with_hash (gimple_canonical_types, t, hash, INSERT);
>    if (*slot)
> @@ -478,7 +485,7 @@ gimple_register_canonical_type_1 (tree t
>  static void
>  gimple_register_canonical_type (tree t)
>  {
> -  if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t))
> +  if (TYPE_CANONICAL (t) || !type_with_alias_set_p (t) || POINTER_TYPE_P (t))
>      return;
>
>    /* Canonical types are same among all complete variants. */
> @@ -498,14 +505,13 @@ static void
>  lto_register_canonical_types (tree node, bool first_p)
>  {
>    if (!node
> -      || !TYPE_P (node))
> +      || !TYPE_P (node) || POINTER_TYPE_P (node))
>      return;
>
>    if (first_p)
>      TYPE_CANONICAL (node) = NULL_TREE;
>
> -  if (POINTER_TYPE_P (node)
> -      || TREE_CODE (node) == COMPLEX_TYPE
> +  if (TREE_CODE (node) == COMPLEX_TYPE
>        || TREE_CODE (node) == ARRAY_TYPE)
>      lto_register_canonical_types (TREE_TYPE (node), first_p);

Hmm, here we'll miss registering canonical types for the pointed-to type.
I believe this will miss FILE for example but also some va_list types?
Please drop this change.
> Index: tree.c
> ===
> --- tree.c (revision 229968)
> +++ tree.c (working copy)
> @@ -13198,6 +13198,7 @@ gimple_canonical_types_compatible_p (con
>    /* If the types have been previously registered and found equal
>       they still are. */
>    if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> +      && !POINTER_TYPE_P (t1) && !POINTER_TYPE_P (t2)

But TYPE_CANONICAL (t1) should be NULL_TREE for POINTER_TYPE_P?

>        && trust_type_canonical)
>      return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
>
> Index: alias.c
> ===
> --- alias.c (revision 229968)
> +++ alias.c (working copy)
> @@ -869,13 +874,19 @@ get_alias_set (tree t)
>    set = lang_hooks.get_alias_set (t);
>    if (set != -1)
>      return set;
> -
Re: [RFC] Combine vectorized loops with its scalar remainder.
On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev wrote:
> Richard,
>
> It looks like a misunderstanding - we assume that for GCCv6 the simple
> scheme of remainder will be used through introducing a new IV :
> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>
> Is it true or did we miss something?
>
> > Do you have an idea how "masking" is better organized to be usable
> > for both 4b and 4c?
>
> Do 2a ...

Okay.

Richard.

> Now we are testing vectorization of loops with small non-constant trip count.
> Yuri.
>
> 2015-11-03 14:47 GMT+03:00 Richard Biener :
>> On Wed, Oct 28, 2015 at 11:45 AM, Yuri Rumyantsev wrote:
>>> Hi All,
>>>
>>> Here is a preliminary patch to combine a vectorized loop with its scalar
>>> remainder, a draft of which was proposed by Kirill Yukhin a month ago:
>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>> It was tested with the '-mavx2' option to run on a Haswell processor.
>>> The main goal of it is to improve performance of vectorized loops for
>>> AVX512.
>>> Note that only loads/stores and simple reductions with binary operations are
>>> converted to masked form, e.g. load --> masked load and reduction like
>>> r1 = f r2 --> t = f r2; r1 = m ? t : r2. Masking is performed through
>>> creation of a new vector induction variable initialized with consecutive
>>> values from 0..VF-1, a new const vector upper bound which contains the
>>> number of iterations, and the result of the comparison, which is considered
>>> as the mask vector.
>>> This implementation has several restrictions:
>>>
>>> 1. Multiple types are not supported.
>>> 2. SLP is not supported.
>>> 3. Gather/Scatters are also not supported.
>>> 4. Vectorization of loops with a low trip count is not implemented yet since
>>>    it requires additional design and tuning.
>>>
>>> We are planning to eliminate all these restrictions in GCCv7.
>>>
>>> This patch will be extended to include a cost model to reject unprofitable
>>> transformations, e.g.
>>> new vector body cost will be evaluated through a new
>>> target hook which estimates the cost of masking different vector statements.
>>> A new threshold parameter will be introduced which determines the permissible
>>> cost increase and will be tuned on an AVX512 machine.
>>> This patch is not in sync with the changes of Ilya Enkovich for AVX512 masked
>>> load/store support since only part of them is in the trunk compiler.
>>>
>>> Any comments will be appreciated.
>>
>> As stated in the previous discussion I don't think the extra mask IV
>> is a good idea
>> and we instead should have a masked final iteration for the epilogue
>> (yes, that's
>> not really "combined" then). This is because in the end we'd not only
>> want AVX512
>> to benefit from this work but also other ISAs that can do unaligned or masked
>> operations (we can overlap the epilogue work with the vectorized work or use
>> masked loads/stores available with AVX). Note that the same applies to
>> the alignment prologue if present, I can't see how you can handle that with
>> the in-loop approach.
>>
>> Richard.
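For readers following along, the masking scheme Yuri describes (a vector IV holding {0..VF-1}, compared against the trip count, with the compare result used as the mask for loads and for the `r1 = m ? t : r2` reduction select) can be modelled in scalar code. This is only a sketch of the idea, with `VF` as an assumed vectorization factor; real masked hardware does all VF lanes at once:

```cpp
#include <cassert>

const int VF = 4;  // assumed vectorization factor

// Scalar model of a fully-masked reduction: the loop runs a whole number
// of "vector" iterations; lanes whose IV value is >= n are masked off,
// so no separate scalar remainder loop is needed.
int
masked_sum (const int *f, int n)
{
  int s = 0;
  int iters = ((n + VF - 1) / VF) * VF;      // rounded-up trip count
  for (int i = 0; i < iters; i += VF)
    for (int lane = 0; lane < VF; lane++)
      {
        bool m = (i + lane) < n;             // vector IV < bound -> mask
        int t = s + (m ? f[i + lane] : 0);   // masked load feeding the add
        s = m ? t : s;                       // r1 = m ? t : r2
      }
  return s;
}
```

Richard's counter-proposal keeps the main loop unmasked and applies this masking only to one final epilogue iteration, which is why it also helps ISAs without full per-statement masking.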
Re: [PATCH] Simple optimization for MASK_STORE.
On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote: > Richard, > > I tried it but 256-bit precision integer type is not yet supported. What's the symptom? The compare cannot be expanded? Just add a pattern then. After all we have modes up to XImode. Richard. > Yuri. > > > 2015-11-06 15:56 GMT+03:00 Richard Biener : >> On Mon, Nov 2, 2015 at 4:24 PM, Yuri Rumyantsev wrote: >>> Hi Richard, >>> >>> I've come back to this optimization and try to implement your proposal >>> for comparison: Btw, you didn't try the simpler alternative of tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); build2 (EQ_EXPR, boolean_type_node, build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); ? That is, use the GIMPLE level equivalent of (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) >>> >>> using the following code: >>> >>> vectype = TREE_TYPE (mask); >>> ext_mode = mode_for_size (GET_MODE_BITSIZE (TYPE_MODE (vectype)), >>> MODE_INT, 0); >>> ext_type = lang_hooks.types.type_for_mode (ext_mode , 1); >>> >>> but I've got zero type for it. Should I miss something? >> >> Use ext_type = build_nonstandard_integer_type (GET_MODE_PRECISION >> (ext_mode), 1); >> >> Richard. >> >>> Any help will be appreciated. >>> Yuri. >>> >>> >>> 2015-08-13 14:40 GMT+03:00 Richard Biener : On Thu, Aug 13, 2015 at 1:32 PM, Yuri Rumyantsev wrote: > Hi Richard, > > Did you have a chance to look at updated patch? Having a quick look now. Btw, you didn't try the simpler alternative of tree type = type_for_mode (int_mode_for_mode (TYPE_MODE (vectype))); build2 (EQ_EXPR, boolean_type_node, build1 (VIEW_CONVERT, type, op0), build1 (VIEW_CONVERT, type, op1)); ? That is, use the GIMPLE level equivalent of (cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI)) ? That should be supported by the expander already, though again not sure if the target(s) have compares that match this. Btw, the tree-cfg.c hook wasn't what was agreed on - the restriction on EQ/NE_EXPR is missing. 
Operand type equality is tested anyway. Why do you need to restrict
forward_propagate_into_comparison_1?

Otherwise this looks better, but can you try with the VIEW_CONVERT as well?

Thanks,
Richard.

> Thanks.
> Yuri.
>
> 2015-08-06 14:07 GMT+03:00 Yuri Rumyantsev :
>> Hi All,
>>
>> Here is an updated patch which implements Richard's proposal to use vector
>> comparison with boolean result instead of a target hook. Support for it
>> was added to ix86_expand_branch.
>>
>> Any comments will be appreciated.
>>
>> Bootstrap and regression testing did not show any new failures.
>>
>> ChangeLog:
>> 2015-08-06 Yuri Rumyantsev
>>
>> * config/i386/i386.c (ix86_expand_branch): Implement vector
>> comparison with boolean result.
>> * config/i386/sse.md (define_expand "cbranch4): Add define
>> for vector comparison.
>> * fold-const.c (fold_relational_const): Add handling of vector
>> comparison with boolean result.
>> * params.def (PARAM_ZERO_TEST_FOR_STORE_MASK): New DEFPARAM.
>> * params.h (ENABLE_ZERO_TEST_FOR_STORE_MASK): New macro.
>> * tree-cfg.c (verify_gimple_comparison): Add test for vector
>> comparison with boolean result.
>> * tree-ssa-forwprop.c (forward_propagate_into_comparison_1): Do not
>> propagate vector comparison with boolean result.
>> * tree-vect-stmts.c (vectorizable_mask_load_store): Initialize
>> has_mask_store field of vect_info.
>> * tree-vectorizer.c: Include files ssa.h, cfghooks.h and params.h.
>> (is_valid_sink): New function.
>> (optimize_mask_stores): New function.
>> (vectorize_loops): Invoke optimize_mask_stores for loops having masked
>> stores.
>> * tree-vectorizer.h (loop_vec_info): Add new has_mask_store field and
>> corresponding macros.
>>
>> gcc/testsuite/ChangeLog:
>> * gcc.target/i386/avx2-vect-mask-store-move1.c: New test.
>> >> >> 2015-07-27 11:48 GMT+03:00 Richard Biener : >>> On Fri, Jul 24, 2015 at 9:11 PM, Jeff Law wrote: On 07/24/2015 03:16 AM, Richard Biener wrote: >> >> Is there any rationale given anywhere for the transformation into >> conditional expressions? ie, is there any reason why we can't have a >> GIMPLE_COND where the expression is a vector condition? > > > No rationale for equality compare which would have the semantic of > having all elements equal or not equal. But you can't define a > sensible > ordering (that HW implements) for other compare operators and you > obviously need a single boolean result, not a vector of element > comparison > results. Right. EQ/NE only as ot
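The VIEW_CONVERT idea being discussed - test two vectors for whole-vector equality by reinterpreting each as one wide integer, the GIMPLE analogue of `(cmp (subreg:TI reg:V4SI) (subreg:TI reg:V4SI))` - amounts to the following in plain C++. This is a sketch of the concept, not GCC code; `memcmp` plays the role of the single full-width integer compare:

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stand-in for a V4SI vector value.
struct v4si { std::int32_t lane[4]; };

// Whole-vector equality: instead of four element compares plus a
// reduction of the results, view both operands as one 128-bit value and
// do a single full-width compare.  As noted in the thread, only EQ/NE
// make sense this way - hardware implements no sensible ordering for
// the other compare codes on the reinterpreted bits.
bool
all_lanes_equal (const v4si &a, const v4si &b)
{
  return std::memcmp (&a, &b, sizeof (v4si)) == 0;
}
```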
Re: [PATCH] PR/67682, break SLP groups up if only some elements match
On Mon, Nov 9, 2015 at 3:59 PM, Alan Lawrence wrote: > On 06/11/15 12:55, Richard Biener wrote: >> >>> + /* GROUP_GAP of the first group now has to skip over the second group >>> too. */ >>> + GROUP_GAP (first_vinfo) += group2_size; >> >> Please add a MSG_NOTE debug printf stating that we split the group and >> at which element. > > Done. > >> I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)) >> otherwise this could be SLP reductions where there is no way the split >> group would enable vectorization. > > Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) check > sufficed for that, as per e.g. the section above > /* Create a node (a root of the SLP tree) for the packed grouped stores. */ > But done. > >> Note that BB vectorization now also very aggressively falls back to >> considering >> non-matches being "external". >> >> Not sure why that doesn't trigger for your testcases ;) > > I tested against trunk r229944, on which all of my scan-tree-dump's were > failing > >> I'm comfortable with the i < group_size half of the patch. For the other >> piece >> I'd like to see more compile-time / success data from, say, building >> SPEC CPU 2006. > > Well, as a couple of quick data points, a compile of SPEC2000 on > aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I have: > > 3080 successes without patch; > +79 successes from the "i < vectorization_factor" part of the patch (on its > own) > +90 successes from the (i>=vectorization_factor) && "i < group_size" part (on > its own) > +(79 from first) +(90 from second) + (an additional 62) from both parts > together. > > And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer > -mcpu=cortex-a57): > 11979 successes without patch; > + 499 from the "i < vectorization_factor" part > + 264 from the (i >= vectorization factor) && (i < group_size)" part > + extra 336 if both parts combined. 
Thanks > I haven't done any significant measurements of compile-time yet. > > (snipping this bit out-of-order) >> Hmm. This is of course pretty bad for compile-time for the non-matching >> case as that effectively will always split into N pieces if we feed it >> garbage (that is, without being sure that at least one pice _will_ >> vectorize). >> >> OTOH with the current way of computing "matches" we'd restrict ourselves >> to cases where the first stmt we look at (and match to) happens to be >> the operation that in the end will form a matching group. > ... >> Eventually we'd want to improve the "matches" return >> to include the affected stmts (ok, that'll be not very easy) so we can do >> a cheap "if we split here will it fix that particular mismatch" check. > > Yes, I think there are a bunch of things we can do here, that would be more > powerful than the simple approach I used here. The biggest limiting factor > will > probably be (lack of) permutation, i.e. if we only SLP stores to consecutive > addresses. Yeah, doing anything else requires us to use masked or strided stores though. >> So, please split the patch and I suggest to leave the controversical part >> for next stage1 together with some improvement on the SLP build process >> itself? > > Here's a reduced version with just the second case, > bootstrapped+check-gcc/g++ on x86_64. > > gcc/ChangeLog: > > * tree-vect-slp.c (vect_split_slp_store_group): New. > (vect_analyze_slp_instance): During basic block SLP, recurse on > subgroups if vect_build_slp_tree fails after 1st vector. > > gcc/testsuite/ChangeLog: > > * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic. > * gcc.dg/vect/bb-slp-subgroups-1.c: New. > * gcc.dg/vect/bb-slp-subgroups-2.c: New. > * gcc.dg/vect/bb-slp-subgroups-3.c: New. > * gcc.dg/vect/bb-slp-subgroups-4.c: New. 
> --- > gcc/testsuite/gcc.dg/vect/bb-slp-7.c | 10 ++-- > gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++ > gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++ > gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++ > gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++ > gcc/tree-vect-slp.c| 74 > +- > 6 files changed, 246 insertions(+), 6 deletions(-) > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c > create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c > > diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c > b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c > index ab54a48..b8bef8c 100644 > --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c > +++ b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c > @@ -16,12 +16,12 @@ main1 (unsigned int x, unsigned int y) >unsigned int *pout = &out[0]; >unsigned int a0, a1, a2, a3; > > - /* Non isomorphic. */ > +
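The splitting strategy under discussion can be sketched independently of the vectorizer. In this deliberately simplified model (my construction, not the patch), each store in the group is reduced to a single opcode character, and a subgroup is counted as vectorizable only if all its opcodes match and it is at least one vector wide:

```cpp
#include <cassert>
#include <string>

// Split the store group at the first mismatching element and recurse on
// both halves, counting subgroups that could still be vectorized.  This
// mirrors the idea behind vect_split_slp_store_group and the recursion
// added to vect_analyze_slp_instance, with all real matching logic
// replaced by a character comparison.
int
count_vectorizable_subgroups (const std::string &ops,
                              std::size_t begin, std::size_t end,
                              std::size_t vf)
{
  if (end - begin < vf)
    return 0;                            // too small for even one vector
  std::size_t split = begin + 1;
  while (split < end && ops[split] == ops[begin])
    split++;
  if (split == end)
    return 1;                            // whole subgroup is isomorphic
  return count_vectorizable_subgroups (ops, begin, split, vf)
         + count_vectorizable_subgroups (ops, split, end, vf);
}
```

This also illustrates Richard's compile-time concern: on garbage input the recursion keeps splitting into pieces that never vectorize.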
Re: [PATCH] Simple optimization for MASK_STORE.
2015-11-10 15:33 GMT+03:00 Richard Biener :
> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote:
>> Richard,
>>
>> I tried it but the 256-bit precision integer type is not yet supported.
>
> What's the symptom? The compare cannot be expanded? Just add a pattern then.
> After all we have modes up to XImode.

I suppose the problem may be in:

gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)

which doesn't allow creating constants of bigger size. Changing it to the
maximum vector size (512) would mean we increase the wide_int structure
size significantly. New patterns are probably also needed.

Ilya

> Richard.
>
>> Yuri.
>>
Re: [PATCH] PR/67682, break SLP groups up if only some elements match
On Tue, Nov 10, 2015 at 1:45 PM, Richard Biener wrote: > On Mon, Nov 9, 2015 at 3:59 PM, Alan Lawrence wrote: >> On 06/11/15 12:55, Richard Biener wrote: >>> + /* GROUP_GAP of the first group now has to skip over the second group too. */ + GROUP_GAP (first_vinfo) += group2_size; >>> >>> Please add a MSG_NOTE debug printf stating that we split the group and >>> at which element. >> >> Done. >> >>> I think you want to add && STMT_VINFO_GROUPED_ACCESS (vinfo_for_stmt (stmt)) >>> otherwise this could be SLP reductions where there is no way the split >>> group would enable vectorization. >> >> Ah, I had thought that the (GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt))) >> check >> sufficed for that, as per e.g. the section above >> /* Create a node (a root of the SLP tree) for the packed grouped stores. */ >> But done. >> >>> Note that BB vectorization now also very aggressively falls back to >>> considering >>> non-matches being "external". >>> >>> Not sure why that doesn't trigger for your testcases ;) >> >> I tested against trunk r229944, on which all of my scan-tree-dump's were >> failing >> >>> I'm comfortable with the i < group_size half of the patch. For the other >>> piece >>> I'd like to see more compile-time / success data from, say, building >>> SPEC CPU 2006. >> >> Well, as a couple of quick data points, a compile of SPEC2000 on >> aarch64-none-linux-gnu (-Ofast -fomit-frame-pointer -mcpu=cortex-a57), I >> have: >> >> 3080 successes without patch; >> +79 successes from the "i < vectorization_factor" part of the patch (on its >> own) >> +90 successes from the (i>=vectorization_factor) && "i < group_size" part >> (on its own) >> +(79 from first) +(90 from second) + (an additional 62) from both parts >> together. 
>> >> And for SPEC2006, aarch64-linux-gnu (-O3 -fomit-frame-pointer >> -mcpu=cortex-a57): >> 11979 successes without patch; >> + 499 from the "i < vectorization_factor" part >> + 264 from the (i >= vectorization factor) && (i < group_size)" part >> + extra 336 if both parts combined. > > Thanks > >> I haven't done any significant measurements of compile-time yet. >> >> (snipping this bit out-of-order) >>> Hmm. This is of course pretty bad for compile-time for the non-matching >>> case as that effectively will always split into N pieces if we feed it >>> garbage (that is, without being sure that at least one pice _will_ >>> vectorize). >>> >>> OTOH with the current way of computing "matches" we'd restrict ourselves >>> to cases where the first stmt we look at (and match to) happens to be >>> the operation that in the end will form a matching group. >> ... >>> Eventually we'd want to improve the "matches" return >>> to include the affected stmts (ok, that'll be not very easy) so we can do >>> a cheap "if we split here will it fix that particular mismatch" check. >> >> Yes, I think there are a bunch of things we can do here, that would be more >> powerful than the simple approach I used here. The biggest limiting factor >> will >> probably be (lack of) permutation, i.e. if we only SLP stores to consecutive >> addresses. > > Yeah, doing anything else requires us to use masked or strided stores though. > >>> So, please split the patch and I suggest to leave the controversical part >>> for next stage1 together with some improvement on the SLP build process >>> itself? >> >> Here's a reduced version with just the second case, >> bootstrapped+check-gcc/g++ on x86_64. >> >> gcc/ChangeLog: >> >> * tree-vect-slp.c (vect_split_slp_store_group): New. >> (vect_analyze_slp_instance): During basic block SLP, recurse on >> subgroups if vect_build_slp_tree fails after 1st vector. >> >> gcc/testsuite/ChangeLog: >> >> * gcc.dg/vect/bb-slp-7.c (main1): Make subgroups non-isomorphic. 
>> * gcc.dg/vect/bb-slp-subgroups-1.c: New. >> * gcc.dg/vect/bb-slp-subgroups-2.c: New. >> * gcc.dg/vect/bb-slp-subgroups-3.c: New. >> * gcc.dg/vect/bb-slp-subgroups-4.c: New. >> --- >> gcc/testsuite/gcc.dg/vect/bb-slp-7.c | 10 ++-- >> gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c | 44 +++ >> gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c | 42 +++ >> gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c | 41 ++ >> gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c | 41 ++ >> gcc/tree-vect-slp.c| 74 >> +- >> 6 files changed, 246 insertions(+), 6 deletions(-) >> create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-1.c >> create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-2.c >> create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-3.c >> create mode 100644 gcc/testsuite/gcc.dg/vect/bb-slp-subgroups-4.c >> >> diff --git a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c >> b/gcc/testsuite/gcc.dg/vect/bb-slp-7.c >> index ab54a48..b8bef8c 100644 >> --- a/gcc/testsuite/gcc.dg/vect/bb-slp-7.c >> +++ b/gcc/testsuite/gcc
Re: [PATCH] Make BB vectorizer work on sub-BBs
On 6 November 2015 at 12:11, Kyrill Tkachov wrote: > Hi Richard, > > > On 06/11/15 11:09, Richard Biener wrote: >> >> On Fri, 6 Nov 2015, Richard Biener wrote: >> >>> The following patch makes the BB vectorizer not only handle BB heads >>> (until the first stmt with a data reference it cannot handle) but >>> arbitrary regions in a BB separated by such stmts. >>> >>> This improves the number of BB vectorizations from 469 to 556 >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray >>> 1x481.wrf failing both patched and unpatched (have to update my >>> config used for such experiments it seems ...) >>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. >>> >>> I'm currently re-testing for a cosmetic change I made when writing >>> the changelog. >>> >>> I expected (and there are) some issues with compile-time. Left >>> is unpatched and right is patched. >>> >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) >>> >>> other benchmarks are unchanged. I'm double-checking now that a followup >>> patch I have which re-implements BB vectorization dependence checking >>> fixes this (that's the only quadraticness I know of). 
>> >> Fixes all but >> >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406) > > > Note that povray is currently suffering from PR 68198 > Hi, I've also noticed that the new test bb-slp-38 fails on armeb: FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects scan-tree-dump-times slp2 "basic block part vectorized" 2 FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block part vectorized" 2 I haven't checked in more detail, maybe it's similar to what we discussed in PR65962 > Kyrill > > >> >> it even improves compile-time on some: >> >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21) >> >> it also increases the number of vectorized BBs to 722. >> >> Needs some work still though. >> >> Richard. >> >>> Richard. >>> >>> 2015-11-06 Richard Biener >>> >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end >>> members. >>> (vect_stmt_in_region_p): Declare. >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region. >>> (destroy_bb_vec_info): Likewise. >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p. >>> (vect_get_and_check_slp_defs): Likewise. >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs. >>> (vect_slp_bb): Likewise. >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement >>> in terms of vect_stmt_in_region_p. >>> (vect_pattern_recog): Iterate over the BB region. >>> * tree-vect-stmts.c (vect_is_simple_use): Use >>> vect_stmt_in_region_p. >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function. >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1. >>> >>> * config/i386/i386.c: Include gimple-iterator.h. >>> * config/aarch64/aarch64.c: Likewise. >>> >>> * gcc.dg/vect/bb-slp-38.c: New testcase. 
>>> >>> Index: gcc/tree-vectorizer.h >>> === >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 >>> *** nested_in_vect_loop_p (struct loop *loop >>> *** 390,395 >>> --- 390,397 >>>typedef struct _bb_vec_info : public vec_info >>>{ >>> basic_block bb; >>> + gimple_stmt_iterator region_begin; >>> + gimple_stmt_iterator region_end; >>>} *bb_vec_info; >>> #define BB_VINFO_BB(B) (B)->bb >>> *** void vect_pattern_recog (vec_info *); >>> *** 1085,1089 >>> --- 1087,1092 >>>/* In tree-vectorizer.c. */ >>>unsigned vectorize_loops (void); >>>void vect_destroy_datarefs (vec_info *); >>> + bool vect_stmt_in_region_p (vec_info *, gimple *); >>> #endif /* GCC_TREE_VECTORIZER_H */ >>> Index: gcc/tree-vect-slp.c >>> === >>> *** gcc/tree-vect-slp.c.orig2015-11-05 09:52:00.640227178 +0100 >>> --- gcc/tree-vect-slp.c 2015-11-06 10:22:56.707880233 +0100 >>> *** vect_get_and_check_slp_defs (vec_info *v >>> *** 209,215 >>> unsigned int i, number_of_oprnds; >>> gimple *def_stmt; >>>
Re: [RFC] Combine vectorized loops with its scalar remainder.
2015-11-10 15:30 GMT+03:00 Richard Biener :
> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev wrote:
>> Richard,
>>
>> It looks like a misunderstanding - we assume that for GCCv6 the simple
>> scheme of remainder will be used through introducing a new IV :
>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>
>> Is it true or did we miss something?
>
>>> Do you have an idea how "masking" is better organized to be usable
>>> for both 4b and 4c?
>>
>> Do 2a ...
> Okay.
>

2a was 'transform already vectorized loop as a separate post-processing'.
Isn't that what this prototype patch implements? The current version only
masks the loop body, which in practice is applicable to AVX-512 only in
most cases. With AVX-512 it's easier to see how profitable masking might
be, and it is the main target for the first masking version. Extending it
to prologues/epilogues and thus making it more profitable for other
targets is the next step and is out of the scope of this patch.

Thanks,
Ilya

> Richard.
>
Re: [PATCH] Make BB vectorizer work on sub-BBs
On Tue, 10 Nov 2015, Christophe Lyon wrote: > On 6 November 2015 at 12:11, Kyrill Tkachov wrote: > > Hi Richard, > > > > > > On 06/11/15 11:09, Richard Biener wrote: > >> > >> On Fri, 6 Nov 2015, Richard Biener wrote: > >> > >>> The following patch makes the BB vectorizer not only handle BB heads > >>> (until the first stmt with a data reference it cannot handle) but > >>> arbitrary regions in a BB separated by such stmts. > >>> > >>> This improves the number of BB vectorizations from 469 to 556 > >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and > >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray > >>> 1x481.wrf failing both patched and unpatched (have to update my > >>> config used for such experiments it seems ...) > >>> > >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. > >>> > >>> I'm currently re-testing for a cosmetic change I made when writing > >>> the changelog. > >>> > >>> I expected (and there are) some issues with compile-time. Left > >>> is unpatched and right is patched. > >>> > >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) > >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) > >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) > >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) > >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) > >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) > >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) > >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) > >>> > >>> other benchmarks are unchanged. I'm double-checking now that a followup > >>> patch I have which re-implements BB vectorization dependence checking > >>> fixes this (that's the only quadraticness I know of). 
> >> > >> Fixes all but > >> > >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406) > > > > > > Note that povray is currently suffering from PR 68198 > > > > Hi, > > I've also noticed that the new test bb-slp-38 fails on armeb: > FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects > scan-tree-dump-times slp2 "basic block part vectorized" 2 > FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block > part vectorized" 2 > > I haven't checked in more detail, maybe it's similar to what we > discussed in PR65962 Maybe though there is no misalignment involved as far as I can see. Please open a bug and attach vectorizer dumps. Richard. > > Kyrill > > > > > >> > >> it even improves compile-time on some: > >> > >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21) > >> > >> it also increases the number of vectorized BBs to 722. > >> > >> Needs some work still though. > >> > >> Richard. > >> > >>> Richard. > >>> > >>> 2015-11-06 Richard Biener > >>> > >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end > >>> members. > >>> (vect_stmt_in_region_p): Declare. > >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region. > >>> (destroy_bb_vec_info): Likewise. > >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p. > >>> (vect_get_and_check_slp_defs): Likewise. > >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs. > >>> (vect_slp_bb): Likewise. > >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement > >>> in terms of vect_stmt_in_region_p. > >>> (vect_pattern_recog): Iterate over the BB region. > >>> * tree-vect-stmts.c (vect_is_simple_use): Use > >>> vect_stmt_in_region_p. > >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function. > >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1. > >>> > >>> * config/i386/i386.c: Include gimple-iterator.h. > >>> * config/aarch64/aarch64.c: Likewise. > >>> > >>> * gcc.dg/vect/bb-slp-38.c: New testcase. 
> >>> > >>> Index: gcc/tree-vectorizer.h > >>> === > >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 > >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 > >>> *** nested_in_vect_loop_p (struct loop *loop > >>> *** 390,395 > >>> --- 390,397 > >>>typedef struct _bb_vec_info : public vec_info > >>>{ > >>> basic_block bb; > >>> + gimple_stmt_iterator region_begin; > >>> + gimple_stmt_iterator region_end; > >>>} *bb_vec_info; > >>> #define BB_VINFO_BB(B) (B)->bb > >>> *** void vect_pattern_recog (vec_info *); > >>> *** 1085,1089 > >>> --- 1087,1092 > >>>/* In tree-vectorizer.c. */ > >>>unsigned vectorize_loops (void); > >>>void vect_destroy_datarefs (vec_info *); > >>> + bool vect_stmt_in_region_p (vec_info *, gimple *); > >>> #endif /* GCC_TREE_V
Re: Remove instantiations when no concept check
On 09/11/15 22:14 +0100, François Dumont wrote: Hi I just committed this trivial cleanup. 2015-11-09 François Dumont * include/bits/stl_algo.h (partial_sort_copy): Instantiate std::iterator_traits only if concept checks. (lower_bound): Likewise. (upper_bound): Likewise. (equal_range): Likewise. (binary_search): Likewise. * include/bits/stl_heap.h (pop_heap): Likewise. Thanks! The unused-local-typedefs warnings are quite noisy, so it's good to get rid of them.
Re: [PATCH] Add configure flag for operator new (std::nothrow)
On 06/11/15 09:59 +, Pedro Alves wrote: On 11/06/2015 01:56 AM, Jonathan Wakely wrote: On 5 November 2015 at 23:31, Daniel Gutson wrote: The issue is, as I understand it, to do the actual work of operator new, i.e. allocate memory. It should force us to copy most of the original code of operator new, which may change on new versions of the STL, forcing us to keep updated. It can just call malloc, and the replacement operator delete can call free. That is very unlikely to need to change (which is corroborated by the fact that the default definitions in libsupc++ change very rarely). Or perhaps libsupc++ could provide the default operator new under a __default_operator_new alias or some such, so that the user-defined replacement can fall back to calling it. Likewise for op delete. That could be useful, please file an enhancement request in bugzilla if you'd like that done.
[PATCH] Fix IRA register preferencing
Ping of https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00829.html: This fixes a bug in register preferencing. When live range splitting creates a new register from another, it copies most fields except for the register preferences. The preference GENERAL_REGS is used as reg_pref[i].prefclass is initialized with GENERAL_REGS in allocate_reg_info () and resize_reg_info (). This initialization value is not innocuous like the comment suggests - if a new register has a non-integer mode, it is forced to prefer GENERAL_REGS. This changes the register costs in pass 2 so that they are incorrect. As a result the live range is either spilled or allocated to an integer register: void g(double); void f(double x) { if (x == 0) return; g (x); g (x); } f: fcmp d0, #0.0 bne .L6 ret .L6: stp x19, x30, [sp, -16]! fmov x19, d0 bl g fmov d0, x19 ldp x19, x30, [sp], 16 b g With the fix it uses a floating point register as expected. Given a similar issue in https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02253.html, would it not be better to change the initialization values of reg_pref to illegal register classes so this kind of issue can be trivially found with an assert? Also would it not be a good idea to have a single register copy function that ensures all data is copied? ChangeLog: 2014-12-09 Wilco Dijkstra wdijk...@arm.com * gcc/ira-emit.c (ira_create_new_reg): Copy preference classes. 
--- gcc/ira-emit.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/gcc/ira-emit.c b/gcc/ira-emit.c index d246b7f..d736836 100644 --- a/gcc/ira-emit.c +++ b/gcc/ira-emit.c @@ -348,6 +348,7 @@ rtx ira_create_new_reg (rtx original_reg) { rtx new_reg; + int original_regno = REGNO (original_reg); new_reg = gen_reg_rtx (GET_MODE (original_reg)); ORIGINAL_REGNO (new_reg) = ORIGINAL_REGNO (original_reg); @@ -356,8 +357,16 @@ ira_create_new_reg (rtx original_reg) REG_ATTRS (new_reg) = REG_ATTRS (original_reg); if (internal_flag_ira_verbose > 3 && ira_dump_file != NULL) fprintf (ira_dump_file, " Creating newreg=%i from oldreg=%i\n", - REGNO (new_reg), REGNO (original_reg)); + REGNO (new_reg), original_regno); ira_expand_reg_equiv (); + + /* Copy the preference classes to new_reg. */ + resize_reg_info (); + setup_reg_classes (REGNO (new_reg), + reg_preferred_class (original_regno), + reg_alternate_class (original_regno), + reg_allocno_class (original_regno)); + return new_reg; } -- 1.9.1
remove inadvertent testcase commit
I've removed a testcase I inadvertently committed along with the routine testcases: 2015-11-09 Nathan Sidwell * testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: Remove inadvertent commit. nathan
Re: [PATCH 1/2][ARC] Add support for ARCv2 CPUs
On 30/10/15 11:19, Claudiu Zissulescu wrote: Hi, Please find the updated patch. I will defer the secondary reload optimization which will use the ld instructions with LIMM, for the time being. Apart from the gen_compare_reg change, the patch is OK. If the v2 support mostly works like support for the other subtargets, you may check it in without the gen_compare_reg change. If that change is required because of particular code paths taken with the v2 port, you may check in the whole patch. The operand-swapping in gen_compare_reg was not expected to be triggered when re-generating a comparison, as comparisons gleaned from existing instructions are supposed to already have the operands in the right order. Do you have a testcase that triggers the assert independently from the v2 support? If you can name a pre-existing testcase to trigger the assert, the patch is approved for separate check-in. If you have a new testcase, is it in a form and of a legal status that it can be submitted for inclusion in the gcc regression tests suite?
Re: [PATCH 2/2][ARC] Add support for ARCv2 CPUs
On 30/10/15 11:22, Claudiu Zissulescu wrote: Hi, Please find the updated patch. Both ARC patches were tested using dg.exp. The ChangeLog entry is unchanged. This is OK.
[patch] Update C++17 library implementation status
Committed to trunk. commit 49c7ea7c21cb4592233904cc1a4eb293ac6d851c Author: Jonathan Wakely Date: Tue Nov 10 14:03:39 2015 + Update C++17 library implementation status * doc/xml/manual/status_cxx2017.xml: Update. * doc/html/*: Regenerate. diff --git a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml index fc2ebd2..4ea0d1e 100644 --- a/libstdc++-v3/doc/xml/manual/status_cxx2017.xml +++ b/libstdc++-v3/doc/xml/manual/status_cxx2017.xml @@ -103,15 +103,14 @@ not in any particular release. - http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4258.pdf";> N4258 Cleaning-up noexcept in the Library - Partial - Changes to basic_string not complete. + Y + @@ -137,15 +136,14 @@ not in any particular release. - http://www.w3.org/1999/xlink"; xlink:href="http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2014/n4169.html";> N4169 A proposal to add invoke function template - N - In progress + Y +
Re: OpenACC Firstprivate
On 11/09/15 09:46, Nathan Sidwell wrote: I'm going to try and get clarification, but I think the intent is to initialize with the value seen on the device. Consider: My thinking is that the intent of the firstprivate is to initialize with the value known on the device (and behave as-if copyin, if it's not there). Not the value most recently seen on the host -- the update clause could change that, and may well be being used as a debugging aid, so it seems bizarre that it can change program semantics in such a way. We believe my example is well formed. The data clauses transfer liveness of the data from host to device (and vice versa). It is ill formed to manipulate the data on the non-live system. firstprivate's initial value is taken from the (statically determined) live location. Unless I'm misunderstanding something about GOMP_MAP_FIRSTPRIVATE, using regular target mapping is the right thing. Here's an updated patch with the other two issues you noted fixed. nathan 2015-11-10 Nathan Sidwell Cesar Philippidis gcc/ * gcc/gimplify.c (enum omp_region_type): Add ORT_ACC, ORT_ACC_DATA, ORT_ACC_PARALLEL, ORT_ACC_KERNELS. Adjust ORT_NONE. (gimple_add_tmp_var): Add ORT_ACC checks. (gimplify_var_or_parm_decl): Likewise. (omp_firstprivatize_variable): Likewise. Use ORT_TARGET_DATA as a mask. (omp_add_variable): Look in outer contexts for openacc and allow reductions with other sharing. Add ORT_ACC and ORT_TARGET_DATA checks. (omp_notice_variable, omp_is_private, omp_check_private): Add ORT_ACC checks. (gimplify_scan_omp_clauses): Treat ORT_ACC as ORT_WORKSHARE. Permit private openacc reductions. (gimplify_oacc_cache): Specify ORT_ACC. (gimplify_omp_workshare): Adjust OpenACC region types. (gimplify_omp_target_update): Likewise. * gcc/omp-low.c (scan_sharing_clauses): Remove OpenACC firstprivate sorry. (lower_rec_input_clauses): Don't handle openacc firstprivate references here. (lower_omp_target): Emit initializers for openacc firstprivate vars. 
gcc/testsuite/ * gfortran.dg/goacc/private-3.f95: Remove xfail. * gfortran.dg/goacc/combined_loop.f90: Remove xfail. libgomp/ * testsuite/libgomp.oacc-c-c++-common/loop-red-v-2.c: Remove xfail. * testsuite/libgomp.oacc-c-c++-common/loop-red-w-2.c: Remove xfail. * testsuite/libgomp.oacc-c-c++-common/firstprivate-1.c: New. Index: gcc/gimplify.c === --- gcc/gimplify.c (revision 230107) +++ gcc/gimplify.c (working copy) @@ -95,22 +95,34 @@ enum gimplify_omp_var_data enum omp_region_type { - ORT_WORKSHARE = 0, - ORT_SIMD = 1, - ORT_PARALLEL = 2, - ORT_COMBINED_PARALLEL = 3, - ORT_TASK = 4, - ORT_UNTIED_TASK = 5, - ORT_TEAMS = 8, - ORT_COMBINED_TEAMS = 9, + ORT_WORKSHARE = 0x00, + ORT_SIMD = 0x01, + + ORT_PARALLEL = 0x02, + ORT_COMBINED_PARALLEL = 0x03, + + ORT_TASK = 0x04, + ORT_UNTIED_TASK = 0x05, + + ORT_TEAMS = 0x08, + ORT_COMBINED_TEAMS = 0x09, + /* Data region. */ - ORT_TARGET_DATA = 16, + ORT_TARGET_DATA = 0x10, + /* Data region with offloading. */ - ORT_TARGET = 32, - ORT_COMBINED_TARGET = 33, + ORT_TARGET = 0x20, + ORT_COMBINED_TARGET = 0x21, + + /* OpenACC variants. */ + ORT_ACC = 0x40, /* A generic OpenACC region. */ + ORT_ACC_DATA = ORT_ACC | ORT_TARGET_DATA, /* Data construct. */ + ORT_ACC_PARALLEL = ORT_ACC | ORT_TARGET, /* Parallel construct */ + ORT_ACC_KERNELS = ORT_ACC | ORT_TARGET | 0x80, /* Kernels construct. */ + /* Dummy OpenMP region, used to disable expansion of DECL_VALUE_EXPRs in taskloop pre body. */ - ORT_NONE = 64 + ORT_NONE = 0x100 }; /* Gimplify hashtable helper. 
*/ @@ -689,7 +701,8 @@ gimple_add_tmp_var (tree tmp) struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; while (ctx && (ctx->region_type == ORT_WORKSHARE - || ctx->region_type == ORT_SIMD)) + || ctx->region_type == ORT_SIMD + || ctx->region_type == ORT_ACC)) ctx = ctx->outer_context; if (ctx) omp_add_variable (ctx, tmp, GOVD_LOCAL | GOVD_SEEN); @@ -1804,7 +1817,8 @@ gimplify_var_or_parm_decl (tree *expr_p) struct gimplify_omp_ctx *ctx = gimplify_omp_ctxp; while (ctx && (ctx->region_type == ORT_WORKSHARE - || ctx->region_type == ORT_SIMD)) + || ctx->region_type == ORT_SIMD + || ctx->region_type == ORT_ACC)) ctx = ctx->outer_context; if (!ctx && !nonlocal_vlas->add (decl)) { @@ -5579,7 +5593,8 @@ omp_firstprivatize_variable (struct gimp } else if (ctx->region_type != ORT_WORKSHARE && ctx->region_type != ORT_SIMD - && ctx->region_type != ORT_TARGET_DATA) + && ctx->region_type != ORT_ACC + && !(ctx->region_type & ORT_TARGET_DATA)) omp_add_variable (ctx, decl, GOVD_FIRSTPRIVATE); ctx = ctx->outer_context; @@ -5667,11 +5682,13 @@ omp_add_variable (struct gimplify_omp_ct /* We shouldn't be re-adding the decl with the same data sharing class. */ gcc_
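The new enum layout above turns region-kind checks into bitwise mask tests. As a self-contained illustration (standalone C with hypothetical RT_* names mirroring the patch's ORT_* values, not GCC code), the OpenACC variants all share the 0x40 bit, and the OpenACC data construct also carries the target-data bit:

```c
#include <assert.h>
#include <stdbool.h>

/* Region kinds as OR-able bits, mirroring the hex values in the patch
   (names here are illustrative, not the real ORT_* enumerators).  */
enum region_type {
  RT_WORKSHARE    = 0x00,
  RT_TARGET_DATA  = 0x10,
  RT_TARGET       = 0x20,
  RT_ACC          = 0x40,                      /* generic OpenACC bit */
  RT_ACC_DATA     = RT_ACC | RT_TARGET_DATA,   /* 0x50: data construct */
  RT_ACC_PARALLEL = RT_ACC | RT_TARGET,        /* 0x60: parallel construct */
  RT_ACC_KERNELS  = RT_ACC | RT_TARGET | 0x80  /* 0xe0: kernels construct */
};

/* Any OpenACC region?  One mask test covers all three variants.  */
static bool
is_acc (enum region_type rt)
{
  return (rt & RT_ACC) != 0;
}

/* A data region of either flavor.  This is why the patch rewrites
   "!= ORT_TARGET_DATA" equality tests into "& ORT_TARGET_DATA" masks:
   the combined RT_ACC_DATA value must be caught too.  */
static bool
is_target_data (enum region_type rt)
{
  return (rt & RT_TARGET_DATA) != 0;
}
```

With the old sequential numbering, an equality test against the data-region value would have missed the OpenACC data construct entirely; the mask form catches both.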
Re: [0/7] Type promotion pass and elimination of zext/sext
On Sun, Nov 8, 2015 at 10:43 AM, Kugan wrote: > > Thanks Richard for the comments. Please find the attached patches which > now passes bootstrap with x86_64-none-linux-gnu, aarch64-linux-gnu and > ppc64-linux-gnu. Regression testing is ongoing. Please find the comments > for your questions/suggestions below. > >> >> I notice >> >> diff --git a/gcc/tree-ssanames.c b/gcc/tree-ssanames.c >> index 82fd4a1..80fcf70 100644 >> --- a/gcc/tree-ssanames.c >> +++ b/gcc/tree-ssanames.c >> @@ -207,7 +207,8 @@ set_range_info (tree name, enum value_range_type >> range_type, >>unsigned int precision = TYPE_PRECISION (TREE_TYPE (name)); >> >>/* Allocate if not available. */ >> - if (ri == NULL) >> + if (ri == NULL >> + || (precision != ri->get_min ().get_precision ())) >> >> and I think you need to clear range info on promoted SSA vars in the >> promotion pass. > > Done. > >> >> The basic "structure" thing still remains. You walk over all uses and >> defs in all stmts >> in promote_all_stmts which ends up calling promote_ssa_if_not_promoted on all >> uses and defs which in turn promotes (the "def") and then fixes up all >> uses in all stmts. > > Done. Not exactly. I still see /* Promote all the stmts in the basic block. */ static void promote_all_stmts (basic_block bb) { gimple_stmt_iterator gsi; ssa_op_iter iter; tree def, use; use_operand_p op; for (gphi_iterator gpi = gsi_start_phis (bb); !gsi_end_p (gpi); gsi_next (&gpi)) { gphi *phi = gpi.phi (); def = PHI_RESULT (phi); promote_ssa (def, &gsi); FOR_EACH_PHI_ARG (op, phi, iter, SSA_OP_USE) { use = USE_FROM_PTR (op); if (TREE_CODE (use) == SSA_NAME && gimple_code (SSA_NAME_DEF_STMT (use)) == GIMPLE_NOP) promote_ssa (use, &gsi); fixup_uses (phi, &gsi, op, use); } you still call promote_ssa on both DEFs and USEs and promote_ssa looks at SSA_NAME_DEF_STMT of the passed arg. Please call promote_ssa just on DEFs and fixup_uses on USEs. Any reason you do not promote debug stmts during the DOM walk? 
So for each DEF you record in ssa_name_info struct ssa_name_info { tree ssa; tree type; tree promoted_type; }; (the fields need documenting). Add a tree promoted_def to it which you can replace any use of the DEF with. Currently as you call promote_ssa for DEFs and USEs you repeatedly overwrite the entry in ssa_name_info_map with a new copy. So you should assert it wasn't already there. switch (gimple_code (def_stmt)) { case GIMPLE_PHI: { the last { is indented too much it should be indented 2 spaces relative to the 'case' SSA_NAME_RANGE_INFO (def) = NULL; only needed in the case 'def' was promoted itself. Please use reset_flow_sensitive_info (def). >> >> Instead of this you should, in promote_all_stmts, walk over all uses doing >> what >> fixup_uses does and then walk over all defs, doing what promote_ssa does. >> >> +case GIMPLE_NOP: >> + { >> + if (SSA_NAME_VAR (def) == NULL) >> + { >> + /* Promote def by fixing its type for anonymous def. */ >> + TREE_TYPE (def) = promoted_type; >> + } >> + else >> + { >> + /* Create a promoted copy of parameters. */ >> + bb = single_succ (ENTRY_BLOCK_PTR_FOR_FN (cfun)); >> >> I think the uninitialized vars are somewhat tricky and it would be best >> to create a new uninit anonymous SSA name for them. You can >> have SSA_NAME_VAR != NULL and def _not_ being a parameter >> btw. > > Done. I also had to do some changes to in couple of other places to > reflect this. 
> They are: > --- a/gcc/tree-ssa-reassoc.c > +++ b/gcc/tree-ssa-reassoc.c > @@ -302,6 +302,7 @@ phi_rank (gimple *stmt) > { >tree arg = gimple_phi_arg_def (stmt, i); >if (TREE_CODE (arg) == SSA_NAME > + && SSA_NAME_VAR (arg) > && !SSA_NAME_IS_DEFAULT_DEF (arg)) > { > gimple *def_stmt = SSA_NAME_DEF_STMT (arg); > @@ -434,7 +435,8 @@ get_rank (tree e) >if (gimple_code (stmt) == GIMPLE_PHI) > return phi_rank (stmt); > > - if (!is_gimple_assign (stmt)) > + if (!is_gimple_assign (stmt) > + && !gimple_nop_p (stmt)) > return bb_rank[gimple_bb (stmt)->index]; > > and > > --- a/gcc/tree-ssa.c > +++ b/gcc/tree-ssa.c > @@ -752,7 +752,8 @@ verify_use (basic_block bb, basic_block def_bb, > use_operand_p use_p, >TREE_VISITED (ssa_name) = 1; > >if (gimple_nop_p (SSA_NAME_DEF_STMT (ssa_name)) > - && SSA_NAME_IS_DEFAULT_DEF (ssa_name)) > + && (SSA_NAME_IS_DEFAULT_DEF (ssa_name) > + || SSA_NAME_VAR (ssa_name) == NULL)) > ; /* Default definitions have empty statements. Nothing to do. */ >else if (!def_bb) > { > > Does this look OK? Hmm, no, this looks bogus. I think the best thing to do is not promoting default defs at all and instead promote at the uses. /* Cr
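The set_range_info precision check quoted at the top of the thread, and the SSA_NAME_RANGE_INFO resets discussed above, both exist because range info computed for a narrow value is meaningless once the name is widened. A standalone sketch (plain C, not the pass itself) of the widen-then-sign-extend step such a promotion must emit, and of why the wide intermediate's range differs from the narrow result:

```c
#include <assert.h>
#include <stdint.h>

/* Portable sign extension from 8 bits: the explicit re-narrowing step a
   promotion pass must insert after widening an 8-bit name to 32 bits.  */
static int32_t
sext8 (int32_t x)
{
  return ((x & 0xff) ^ 0x80) - 0x80;
}

/* 8-bit signed addition modeled in the promoted 32-bit type.  The raw
   32-bit sum of 100 + 100 is 200, but the value the original 8-bit
   program computes is -56; any range derived for the wide sum (here
   within [0, 200]) is therefore wrong for the re-narrowed result and
   must be dropped, which is what the precision check enforces.  */
static int32_t
add8_promoted (int32_t a, int32_t b)
{
  return sext8 (a + b);
}
```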
Re: [PATCH][ARM] Fix costing of vmul+vcvt combine pattern
On Tue, Oct 27, 2015 at 1:55 PM, Kyrill Tkachov wrote: > Hi all, > > This patch allows us to handle the *combine_vcvtf2i pattern in rtx costs by > properly identifying it > as a toint conversion. Before this I saw a pattern like: > (set (reg/i:SI 0 r0) > (fix:SI (fix:SF (mult:SF (reg:SF 16 s0 [ a ]) > (const_double:SF 3.2e+1 [0x0.8p+6]) > > being assigned a cost of 40 because the costs blindly recursed into the > operands. > With this patch for -mcpu=cortex-a57 I see it being assigned a cost of 4. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? > > Thanks, > Kyrill > > 2015-10-27 Kyrylo Tkachov > > * config/arm/arm.c (arm_new_rtx_costs, FIX case): Handle > combine_vcvtf2i pattern. OK. Ramana
Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)
Hi, On Tue, 10 Nov 2015, Richard Biener wrote: > >> +static bool > >> +same_edge_p (edge an_edge, edge another_edge) > >> +{ > >> + return ((an_edge->src == another_edge->src) > >> + && (an_edge->dest == another_edge->dest)); > >> +} > > > > > > Formatting aside (extra parentheses), I wonder why you can't just test > > edges for pointer equality? Does anything make duplicate edges? > > We don't allow duplicate edges apart from edges with different flags (an > EH edge may appear alongside a FALLTHRU one for example). Actually, no. The flags are merged, the edge structure itself will never be duplicated (so one such structure might represent several logical edges, e.g. a fall-thru and an abnormal edge). So, yes, pointer equality is enough. Ciao, Michael.
[gomp4] Random omp-low.c backporting
I've committed this to backport a bunch of random bits from trunk to gomp4, and thereby reduce divergence. nathan 2015-11-10 Nathan Sidwell * omp-low.c: Backport whitespace & formatting changes from trunk. (is_atomic_compatible_p): Delete. (lower_reduction_clauses): Don't call it. (expand_omp_for_static_nochunk): Revert no-longer needed thread numbering expansion. (set_oacc_fn_attrib): Make static, fixup comment. (expand_omp_for_static_chunk): Likewise. (expand_omp_target): Adjust args initial size. (lower_omp_target): Rename label to oacc_firstprivate_map. Tweak assertions. Index: omp-low.c === --- omp-low.c (revision 230080) +++ omp-low.c (working copy) @@ -3950,6 +3950,7 @@ maybe_lookup_ctx (gimple *stmt) return n ? (omp_context *) n->value : NULL; } + /* Find the mapping for DECL in CTX or the immediately enclosing context that has a mapping for DECL. @@ -5446,24 +5447,6 @@ lower_lastprivate_clauses (tree clauses, gimple_seq_add_stmt (stmt_list, gimple_build_label (label)); } -/* Returns true if this reduction operation is compatible with atomics. */ - -static bool -is_atomic_compatible_reduction (tree var, omp_context *ctx) -{ - if (!is_gimple_omp_oacc (ctx->stmt)) -return true; - - /* OpenACC requires an additional level of indirection due to the use - of shared memory. Because of that, it's better to surround the - reduction inside a atomic start/stop region. */ - if (is_reference (var)) -return false; - - return true; -} - - /* Lower the OpenACC reductions of CLAUSES for compute axis LEVEL (which might be a placeholder). INNER is true if this is an inner axis of a multi-axis loop. 
FORK and JOIN are (optional) fork and @@ -5718,18 +5701,15 @@ lower_reduction_clauses (tree clauses, g if (code == MINUS_EXPR) code = PLUS_EXPR; - if (count == 1 && is_atomic_compatible_reduction (var, ctx)) + if (count == 1) { tree addr = build_fold_addr_expr_loc (clause_loc, ref); addr = save_expr (addr); - ref = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (addr)), addr); - x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref, - new_var); + x = fold_build2_loc (clause_loc, code, TREE_TYPE (ref), ref, new_var); x = build2 (OMP_ATOMIC, void_type_node, addr, x); gimplify_and_add (x, stmt_seqp); - return; } else if (TREE_CODE (OMP_CLAUSE_DECL (c)) == MEM_REF) @@ -9135,7 +9115,6 @@ expand_omp_for_static_nochunk (struct om t = fold_binary (fd->loop.cond_code, boolean_type_node, fold_convert (type, fd->loop.n1), fold_convert (type, fd->loop.n2)); - if (fd->collapse == 1 && TYPE_UNSIGNED (type) && (t == NULL_TREE || !integer_onep (t))) @@ -9181,22 +9160,20 @@ expand_omp_for_static_nochunk (struct om { case GF_OMP_FOR_KIND_FOR: nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS); - nthreads = build_call_expr (nthreads, 0); threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM); - threadid = build_call_expr (threadid, 0); break; case GF_OMP_FOR_KIND_DISTRIBUTE: nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_TEAMS); - nthreads = build_call_expr (nthreads, 0); threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_TEAM_NUM); - threadid = build_call_expr (threadid, 0); break; default: gcc_unreachable (); } + nthreads = build_call_expr (nthreads, 0); nthreads = fold_convert (itype, nthreads); nthreads = force_gimple_operand_gsi (&gsi, nthreads, true, NULL_TREE, true, GSI_SAME_STMT); + threadid = build_call_expr (threadid, 0); threadid = fold_convert (itype, threadid); threadid = force_gimple_operand_gsi (&gsi, threadid, true, NULL_TREE, true, GSI_SAME_STMT); @@ -9588,7 +9565,6 @@ expand_omp_for_static_chunk (struct omp_ cont_bb = region->cont; 
gcc_assert (EDGE_COUNT (iter_part_bb->succs) == 2); fin_bb = BRANCH_EDGE (iter_part_bb)->dest; - gcc_assert (broken_loop || fin_bb == FALLTHRU_EDGE (cont_bb)->dest); seq_start_bb = split_edge (FALLTHRU_EDGE (iter_part_bb)); @@ -9623,7 +9599,6 @@ expand_omp_for_static_chunk (struct omp_ t = fold_binary (fd->loop.cond_code, boolean_type_node, fold_convert (type, fd->loop.n1), fold_convert (type, fd->loop.n2)); - if (fd->collapse == 1 && TYPE_UNSIGNED (type) && (t == NULL_TREE || !integer_onep (t))) @@ -9669,22 +9644,20 @@ expand_omp_for_static_chunk (struct omp_ { case GF_OMP_FOR_KIND_FOR: nthreads = builtin_decl_explicit (BUILT_IN_OMP_GET_NUM_THREADS); - nthreads = build_call_expr (nthreads, 0); threadid = builtin_decl_explicit (BUILT_IN_OMP_GET_THREAD_NUM); - threadid = build_call_expr (threadid, 0); break; case GF_OMP_FOR_KIND_DISTRIBUTE: nthreads = builtin_decl_explicit (BU
Re: [PATCH][ARM][cleanup] Remove uses of CONST_DOUBLE_HIGH/LOW
On Thu, Nov 5, 2015 at 9:32 AM, Kyrill Tkachov wrote: > Hi all, > > This cleanup patch removes handling of CONST_DOUBLE rtxes that carry large > integers. > These should never be passed down from the midend and the arm backend > doesn't create them. > The code has been there since 2007 but the arm backend was moved to > TARGET_SUPPORTS_WIDE_INT > in 2014, so this path should never be taken. > > Bootstrapped and tested on arm-none-linux-gnueabihf. > > Ok for trunk? This is OK - Ramana > > Thanks, > Kyrill >
Re: [1/2] OpenACC routine support
On 11/10/2015 12:16 AM, Jakub Jelinek wrote: > On Mon, Nov 09, 2015 at 09:28:47PM -0800, Cesar Philippidis wrote: >> Here's the patch that Nathan was referring to. I ended up introducing a >> boolean variable named first in the various functions which call >> finalize_oacc_routines. The problem the original approach was having was >> that the routine clauses is only applied to the first function >> declarator in a declaration list. By using 'first', which is set to true >> if the current declarator is the first in a sequence of declarators, I >> was able to defer setting parser->oacc_routine to NULL. > > The #pragma omp declare simd has identical restrictions, but doesn't need > to add any of the first parameters to the C++ parser. > So, what are you doing differently that you need it? Handling both > differently is a consistency issue, and unnecessary additional complexity to > the parser. I see that you added an omp_declare_simd->fndecl_seen field to cp_parser. My objective was to try and make the c++ routine parsing somewhat consistent with the c front end. I could probably add a similar oacc_routine field, but I wonder if it would be better to share omp_declare_simd. There was talk about the next version of openacc adding support for -fopenacc and -fopenmp together. So maybe there needs to be a separate oacc_routine field. Cesar
[PATCH] PR47266
I’d like to close PR47266 as FIXED after the commit on trunk and 5.2.1 of the following test extracted from https://gcc.gnu.org/ml/fortran/2011-01/msg00094.html. The test succeeds on trunk and 5.2.1 (see https://gcc.gnu.org/ml/gcc-testresults/2015-11/msg00965.html in which it was included). OK for trunk and 5.2.1? Dominique Index: gcc/testsuite/ChangeLog === --- gcc/testsuite/ChangeLog (revision 229793) +++ gcc/testsuite/ChangeLog (working copy) @@ -1,3 +1,9 @@ +2015-11-10 Dominique d'Humieres + + PR fortran/47266 + * gfortran.dg/warn_unused_function_2.f90: Add a new test. + 2015-11-10 Ilya Enkovich + * gcc.target/i386/mask-pack.c: New test. Index: gcc/testsuite/gfortran.dg/module_private_2.f90 === --- gcc/testsuite/gfortran.dg/module_private_2.f90 (nonexistent) +++ gcc/testsuite/gfortran.dg/module_private_2.f90 (working copy) @@ -0,0 +1,34 @@ +! { dg-do compile } +! { dg-options "-O2 -fdump-tree-optimized" } +! +! PR fortran/47266 +! +! Check whether the private procedure "priv" is optimized away +! +module m + implicit none + private :: priv + private :: export1, export2 + public :: pub +contains + integer function priv() +priv = 44 + end function priv + integer function export1() +export1 = 45 + end function export1 + function export2() bind(C) ! { dg-warning "is marked PRIVATE" } +use iso_c_binding, only: c_int +integer(c_int) :: export2 +export2 = 46 + end function export2 + subroutine pub(a,b) +integer :: a +procedure(export1), pointer :: b +a = priv() +b => export1 + end subroutine pub +end module m +! { dg-final { scan-tree-dump-times "priv" 0 "optimized" } } +! { dg-final { scan-tree-dump-times "export1 \\(\\)" 1 "optimized" } } +! { dg-final { scan-tree-dump-times "export2 \\(\\)" 1 "optimized" } }
Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
On Tue, Nov 10, 2015 at 11:57 AM, Richard Sandiford wrote: > Richard Biener writes: >> On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford >> wrote: >>> In practice all targets that can vectorise sqrt define the appropriate >>> sqrt2 optab. The only case where this isn't immediately obvious >>> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't >>> be exercised for sqrt. >>> >>> This patch therefore uses the internal function interface instead of >>> going via the target hook. >>> >>> >>> gcc/ >>> * tree-vect-patterns.c: Include internal-fn.h. >>> (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*. >>> >>> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c >>> index bab9a4f..a803e8c 100644 >>> --- a/gcc/tree-vect-patterns.c >>> +++ b/gcc/tree-vect-patterns.c >>> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3. If not see >>> #include "tree-vectorizer.h" >>> #include "dumpfile.h" >>> #include "builtins.h" >>> +#include "internal-fn.h" >>> #include "case-cfn-macros.h" >>> >>> /* Pattern recognition functions */ >>> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree >>> *type_in, >>>if (TREE_CODE (exp) == REAL_CST >>>&& real_equal (&TREE_REAL_CST (exp), &dconsthalf)) >>> { >>> - tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT); >>>*type_in = get_vectype_for_scalar_type (TREE_TYPE (base)); >>> - if (*type_in) >>> + if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in)) >>> { >>> - gcall *stmt = gimple_build_call (newfn, 1, base); >>> - if (vectorizable_function (stmt, *type_in, *type_in) >>> - != NULL_TREE) >>> - { >>> - var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); >>> - gimple_call_set_lhs (stmt, var); >>> - return stmt; >>> - } >>> + gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base); >>> + var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt); >>> + gimple_call_set_lhs (stmt, var); >>> + return stmt; >> >> Looks ok but I wonder if this is dead 
code with >> >> (for pows (POW) >> sqrts (SQRT) >> cbrts (CBRT) >> (simplify >> (pows @0 REAL_CST@1) >> (with { >> const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1); >> REAL_VALUE_TYPE tmp; >>} >>(switch >> ... >> /* pow(x,0.5) -> sqrt(x). */ >> (if (flag_unsafe_math_optimizations >> && canonicalize_math_p () >> && real_equal (value, &dconsthalf)) >> (sqrts @0)) > > Yeah, I wondered that too, although I think it's more likely to be dead > because of sincos. In the end it just seemed like a rabbit hole too far > though. Indeed. sincos should also fix the pow(x, 2.) case handled. Richard. >> Anyway, patch is ok. > Thanks, > Richard >
Re: [Patch GCC 5/Vect] Partial backport of r228751 (pr68238)
On Tue, Nov 10, 2015 at 10:11:30AM +, James Greenhalgh wrote: > > Hi, > > As requested in the PR, this patch is a partial backport of r228751. > > I can't claim any responsibility for it, but I did take it through the > paces on an aarch64-none-linux-gnu and x86_64-none-linux-gnu bootstrap/ > test run and found no issues. > > Applied as r230092 on gcc-5-branch (pre-approved in the PR) after checking > that it gives the right results for the code I derived the PR from. > > I'll start a test cycle for a 4.9 backport. Now done and committed as r230110, after a clean run on aarch64-none-linux-gnu and x86_64-none-linux-gnu. Thanks, James > 2015-11-10 James Greenhalgh > > Partial backport from trunk r228751. > PR tree-optimization/68238 > 2015-10-13 Richard Biener > > * tree-vect-loop.c (vect_estimate_min_profitable_iters): Use > LOOP_VINFO_COMP_ALIAS_DDRS to estimate alias versioning cost. > > diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c > index 88ef251..05515b5 100644 > --- a/gcc/tree-vect-loop.c > +++ b/gcc/tree-vect-loop.c > @@ -2825,7 +2825,7 @@ vect_estimate_min_profitable_iters (loop_vec_info > loop_vinfo, >if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo)) > { >/* FIXME: Make cost depend on complexity of individual check. */ > - unsigned len = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).length (); > + unsigned len = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).length (); >(void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0, > vect_prologue); >dump_printf (MSG_NOTE,
Re: [PATCH] Simple optimization for MASK_STORE.
On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich wrote: > 2015-11-10 15:33 GMT+03:00 Richard Biener : >> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> I tried it but 256-bit precision integer type is not yet supported. >> >> What's the symptom? The compare cannot be expanded? Just add a pattern >> then. >> After all we have modes up to XImode. > > I suppose problem may be in: > > gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) > > which doesn't allow to create constants of bigger size. Changing it > to maximum vector size (512) would mean we increase wide_int structure > size significantly. New patterns are probably also needed. Yes, new patterns are needed but wide-int should be fine (we only need to create a literal zero AFAICS). The "new pattern" would be equality/inequality against zero compares only. Richard. > Ilya > >> >> Richard. >> >>> Yuri. >>> >>>
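The point about only needing equality/inequality against zero is what keeps MAX_BITSIZE_MODE_ANY_INT out of the picture: such a compare never has to materialize a wide constant. A standalone sketch with a hypothetical four-limb 256-bit type (illustrative only, not the i386 backend):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* A 256-bit value as four 64-bit limbs, least-significant limb first
   (a stand-in for the wide vector-mask value under discussion).  */
typedef struct { uint64_t limb[4]; } u256;

/* Equality against literal zero needs no 256-bit constant at all:
   the value is zero exactly when the OR of its limbs is zero.  */
static bool
u256_is_zero (const u256 *v)
{
  return (v->limb[0] | v->limb[1] | v->limb[2] | v->limb[3]) == 0;
}
```

A backend pattern for such a compare can likewise reduce to OR-ing the pieces and testing the flags, without ever widening the constant representation.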
Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions
On Fri, Nov 06, 2015 at 09:38:21AM +0100, Richard Biener wrote: > On Thu, 5 Nov 2015, Martin Jambor wrote: > > > Hi, > > > > in the previous email I wrote we need to "change behavior" of a few > > optimization passes. One was the flattening of GPU functions and the > > other two are in the patch below. It all comes to that, at the > > moment, we need to switch off the vectorizer (only for the GPU > > functions, of course). > > > > We are actually quite close to being able to handle gimple vector > > input in HSA back-end but not all the way yet, and before allowing the > > vectorizer again, we will have to make sure it never produces vectors > > bigger than 128bits (in GPU functions). > > Hmm. I'd rather have this modify > DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this > effect. I think I mentioned this to the OACC guys as well for a > similar needs of them. I see, that is a good idea. I have reverted changes to tree-ssa-loop.c and tree-vectorizer.c and on top of that committed the following patch to the branch which makes modifications to HSA fndecls at a more convenient spot and disables vectorization in the following way: tree gdecl = gpu->decl; tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl); if (fn_opts == NULL_TREE) fn_opts = optimization_default_node; fn_opts = copy_node (fn_opts); TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; I hope that is what you meant. I have also verified that it works. Thanks, Martin 2015-11-10 Martin Jambor * hsa.h (hsa_summary_t): Add a comment to method link_functions. (hsa_summary_t::link_functions): Moved... * hsa.c (hsa_summary_t::link_functions): ...here. Added common fndecl modifications. Include stringpool.h. * ipa-hsa.c (process_hsa_functions): Do not add flatten attribute here. Fixed comments. 
diff --git a/gcc/hsa.c b/gcc/hsa.c index ab05a1d..e63be95 100644 --- a/gcc/hsa.c +++ b/gcc/hsa.c @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see #include "alloc-pool.h" #include "cgraph.h" #include "print-tree.h" +#include "stringpool.h" #include "symbol-summary.h" #include "hsa.h" @@ -693,6 +694,40 @@ hsa_get_declaration_name (tree decl) return NULL; } +/* Couple GPU and HOST as gpu-specific and host-specific implementation of the + same function. KIND determines whether GPU is a host-invokable kernel or + gpu-callable function. */ + +inline void +hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host, + hsa_function_kind kind) +{ + hsa_function_summary *gpu_summary = get (gpu); + hsa_function_summary *host_summary = get (host); + + gpu_summary->m_kind = kind; + host_summary->m_kind = kind; + + gpu_summary->m_gpu_implementation_p = true; + host_summary->m_gpu_implementation_p = false; + + gpu_summary->m_binded_function = host; + host_summary->m_binded_function = gpu; + + tree gdecl = gpu->decl; + DECL_ATTRIBUTES (gdecl) += tree_cons (get_identifier ("flatten"), NULL_TREE, +DECL_ATTRIBUTES (gdecl)); + + tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl); + if (fn_opts == NULL_TREE) +fn_opts = optimization_default_node; + fn_opts = copy_node (fn_opts); + TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; + TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; +} + /* Add a HOST function to HSA summaries. */ void diff --git a/gcc/hsa.h b/gcc/hsa.h index 025de67..b6855ea 100644 --- a/gcc/hsa.h +++ b/gcc/hsa.h @@ -1161,27 +1161,14 @@ public: hsa_summary_t (symbol_table *table): function_summary (table) { } + /* Couple GPU and HOST as gpu-specific and host-specific implementation of + the same function. KIND determines whether GPU is a host-invokable kernel + or gpu-callable function. 
*/ + void link_functions (cgraph_node *gpu, cgraph_node *host, hsa_function_kind kind); }; -inline void -hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host, - hsa_function_kind kind) -{ - hsa_function_summary *gpu_summary = get (gpu); - hsa_function_summary *host_summary = get (host); - - gpu_summary->m_kind = kind; - host_summary->m_kind = kind; - - gpu_summary->m_gpu_implementation_p = true; - host_summary->m_gpu_implementation_p = false; - - gpu_summary->m_binded_function = host; - host_summary->m_binded_function = gpu; -} - /* in hsa.c */ extern struct hsa_function_representation *hsa_cfun; extern hash_map *> *hsa_decl_kernel_dependencies; diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c index b4cb58e..d77fa6b 100644 --- a/gcc/ipa-hsa.c +++ b/gcc/ipa-hsa.c @@ -90,16 +90,12 @@ process_hsa_functions (void) cgraph_no
Re: [RFC] Combine vectorized loops with its scalar remainder.
On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich wrote: > 2015-11-10 15:30 GMT+03:00 Richard Biener : >> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev wrote: >>> Richard, >>> >>> It looks like misunderstanding - we assume that for GCCv6 the simple >>> scheme of remainder will be used through introducing new IV : >>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html >>> >>> Is it true or we missed something? >> >> >>> > Do you have an idea how "masking" is better be organized to be usable >>> > for both 4b and 4c? >>> >>> Do 2a ... >> Okay. >> > > 2a was 'transform already vectorized loop as a separate > post-processing'. Isn't it what this prototype patch implements? > Current version only masks loop body which is in practice applicable > for AVX-512 only in the most cases. With AVX-512 it's easier to see > how profitable masking might be and it is a main target for the first > masking version. Extending it to prologues/epilogues and thus making > it more profitable for other targets is the next step and is out of > the scope of this patch. Ok, technically the prototype transforms the already vectorized loop. Of course I meant the vectorized loop be copied, masked and that result used as epilogue... I'll queue a more detailed look into the patch for this week. Did you perform any measurements with this patch like # of masked epilogues in SPEC 2006 FP (and any speedup?) Thanks, Richard. > Thanks, > Ilya > >> >> Richard. >>
Re: [PATCH] Simple optimization for MASK_STORE.
2015-11-10 17:46 GMT+03:00 Richard Biener : > On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich wrote: >> 2015-11-10 15:33 GMT+03:00 Richard Biener : >>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote: Richard, I tried it but 256-bit precision integer type is not yet supported. >>> >>> What's the symptom? The compare cannot be expanded? Just add a pattern >>> then. >>> After all we have modes up to XImode. >> >> I suppose problem may be in: >> >> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >> >> which doesn't allow to create constants of bigger size. Changing it >> to maximum vector size (512) would mean we increase wide_int structure >> size significantly. New patterns are probably also needed. > > Yes, new patterns are needed but wide-int should be fine (we only need to > create > a literal zero AFACS). The "new pattern" would be equality/inequality > against zero > compares only. Currently 256bit integer creation fails because wide_int for max and min values cannot be created. It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases WIDE_INT_MAX_ELTS and thus increases wide_int structure. If we use 512 for MAX_BITSIZE_MODE_ANY_INT then wide_int structure would grow by 48 bytes (16 bytes if use 256 for MAX_BITSIZE_MODE_ANY_INT). Is it OK for such narrow usage? Ilya > > Richard. > >> Ilya >> >>> >>> Richard. >>> Yuri.
Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions
On Tue, 10 Nov 2015, Martin Jambor wrote: > On Fri, Nov 06, 2015 at 09:38:21AM +0100, Richard Biener wrote: > > On Thu, 5 Nov 2015, Martin Jambor wrote: > > > > > Hi, > > > > > > in the previous email I wrote we need to "change behavior" of a few > > > optimization passes. One was the flattening of GPU functions and the > > > other two are in the patch below. It all comes to that, at the > > > moment, we need to switch off the vectorizer (only for the GPU > > > functions, of course). > > > > > > We are actually quite close to being able to handle gimple vector > > > input in HSA back-end but not all the way yet, and before allowing the > > > vectorizer again, we will have to make sure it never produces vectors > > > bigger than 128bits (in GPU functions). > > > > Hmm. I'd rather have this modify > > DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this > > effect. I think I mentioned this to the OACC guys as well for a > > similar needs of them. > > I see, that is a good idea. I have reverted changes to > tree-ssa-loop.c and tree-vectorizer.c and on top of that committed the > following patch to the branch which makes modifications to HSA fndecls > at a more convenient spot and disables vectorization in the following > way: > > tree gdecl = gpu->decl; > tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl); > if (fn_opts == NULL_TREE) > fn_opts = optimization_default_node; > fn_opts = copy_node (fn_opts); > TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; > TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; > DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; > > I hope that is what you meant. I have also verified that it works. Yes, that's what I meant. Thanks, Richard. > Thanks, > > Martin > > > 2015-11-10 Martin Jambor > > * hsa.h (hsa_summary_t): Add a comment to method link_functions. > (hsa_summary_t::link_functions): Moved... > * hsa.c (hsa_summary_t::link_functions): ...here. 
Added common fndecl > modifications. > Include stringpool.h. > * ipa-hsa.c (process_hsa_functions): Do not add flatten attribute > here. Fixed comments. > > diff --git a/gcc/hsa.c b/gcc/hsa.c > index ab05a1d..e63be95 100644 > --- a/gcc/hsa.c > +++ b/gcc/hsa.c > @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3. If not see > #include "alloc-pool.h" > #include "cgraph.h" > #include "print-tree.h" > +#include "stringpool.h" > #include "symbol-summary.h" > #include "hsa.h" > > @@ -693,6 +694,40 @@ hsa_get_declaration_name (tree decl) >return NULL; > } > > +/* Couple GPU and HOST as gpu-specific and host-specific implementation of > the > + same function. KIND determines whether GPU is a host-invokable kernel or > + gpu-callable function. */ > + > +inline void > +hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host, > +hsa_function_kind kind) > +{ > + hsa_function_summary *gpu_summary = get (gpu); > + hsa_function_summary *host_summary = get (host); > + > + gpu_summary->m_kind = kind; > + host_summary->m_kind = kind; > + > + gpu_summary->m_gpu_implementation_p = true; > + host_summary->m_gpu_implementation_p = false; > + > + gpu_summary->m_binded_function = host; > + host_summary->m_binded_function = gpu; > + > + tree gdecl = gpu->decl; > + DECL_ATTRIBUTES (gdecl) > += tree_cons (get_identifier ("flatten"), NULL_TREE, > + DECL_ATTRIBUTES (gdecl)); > + > + tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl); > + if (fn_opts == NULL_TREE) > +fn_opts = optimization_default_node; > + fn_opts = copy_node (fn_opts); > + TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false; > + TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false; > + DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts; > +} > + > /* Add a HOST function to HSA summaries. 
*/ > > void > diff --git a/gcc/hsa.h b/gcc/hsa.h > index 025de67..b6855ea 100644 > --- a/gcc/hsa.h > +++ b/gcc/hsa.h > @@ -1161,27 +1161,14 @@ public: >hsa_summary_t (symbol_table *table): > function_summary (table) { } > > + /* Couple GPU and HOST as gpu-specific and host-specific implementation of > + the same function. KIND determines whether GPU is a host-invokable > kernel > + or gpu-callable function. */ > + >void link_functions (cgraph_node *gpu, cgraph_node *host, > hsa_function_kind kind); > }; > > -inline void > -hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host, > -hsa_function_kind kind) > -{ > - hsa_function_summary *gpu_summary = get (gpu); > - hsa_function_summary *host_summary = get (host); > - > - gpu_summary->m_kind = kind; > - host_summary->m_kind = kind; > - > - gpu_summary->m_gpu_implementation_p = true; > - host_summary->m_gpu_implementation_p = false; > - > - gpu_summary->m_binded_function = host; > - hos
[patch] libstdc++/68190 Fix return type of heterogeneous find for sets
This converts the return type of heterogeneous find members to the correct set iterator type. Tested powerpc64le-linux, committed to trunk. Will commit to the gcc-5-branch too. commit d84e13dd8a7d47016bdfc5a9f45d8658a9d16ed9 Author: Jonathan Wakely Date: Tue Nov 10 14:59:00 2015 + Fix return type of heterogeneous find for sets PR libstdc++/68190 * include/bits/stl_multiset.h (multiset::find): Fix return types. * include/bits/stl_set.h (set::find): Likewise. * testsuite/23_containers/map/operations/2.cc: Test find return types. * testsuite/23_containers/multimap/operations/2.cc: Likewise. * testsuite/23_containers/multiset/operations/2.cc: Likewise. * testsuite/23_containers/set/operations/2.cc: Likewise. diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h index 5ccc6dd..e6e2337 100644 --- a/libstdc++-v3/include/bits/stl_multiset.h +++ b/libstdc++-v3/include/bits/stl_multiset.h @@ -680,13 +680,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus > 201103L template auto - find(const _Kt& __x) -> decltype(_M_t._M_find_tr(__x)) - { return _M_t._M_find_tr(__x); } + find(const _Kt& __x) + -> decltype(iterator{_M_t._M_find_tr(__x)}) + { return iterator{_M_t._M_find_tr(__x)}; } template auto - find(const _Kt& __x) const -> decltype(_M_t._M_find_tr(__x)) - { return _M_t._M_find_tr(__x); } + find(const _Kt& __x) const + -> decltype(const_iterator{_M_t._M_find_tr(__x)}) + { return const_iterator{_M_t._M_find_tr(__x)}; } #endif //@} diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h index cf74368..8bea61a 100644 --- a/libstdc++-v3/include/bits/stl_set.h +++ b/libstdc++-v3/include/bits/stl_set.h @@ -699,13 +699,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER #if __cplusplus > 201103L template auto - find(const _Kt& __x) -> decltype(_M_t._M_find_tr(__x)) - { return _M_t._M_find_tr(__x); } + find(const _Kt& __x) + -> decltype(iterator{_M_t._M_find_tr(__x)}) + { return iterator{_M_t._M_find_tr(__x)}; } 
template auto - find(const _Kt& __x) const -> decltype(_M_t._M_find_tr(__x)) - { return _M_t._M_find_tr(__x); } + find(const _Kt& __x) const + -> decltype(const_iterator{_M_t._M_find_tr(__x)}) + { return const_iterator{_M_t._M_find_tr(__x)}; } #endif //@} diff --git a/libstdc++-v3/testsuite/23_containers/map/operations/2.cc b/libstdc++-v3/testsuite/23_containers/map/operations/2.cc index 6cc277a..ef301ef 100644 --- a/libstdc++-v3/testsuite/23_containers/map/operations/2.cc +++ b/libstdc++-v3/testsuite/23_containers/map/operations/2.cc @@ -54,6 +54,11 @@ test01() VERIFY( cit == cx.end() ); VERIFY( Cmp::count == 0); + + static_assert(std::is_same::value, + "find returns iterator"); + static_assert(std::is_same::value, + "const find returns const_iterator"); } void diff --git a/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc b/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc index 67c3bfd..eef6ee4 100644 --- a/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc +++ b/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc @@ -54,6 +54,11 @@ test01() VERIFY( cit == cx.end() ); VERIFY( Cmp::count == 0); + + static_assert(std::is_same::value, + "find returns iterator"); + static_assert(std::is_same::value, + "const find returns const_iterator"); } void diff --git a/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc b/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc index ff2748f..4bea719 100644 --- a/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc +++ b/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc @@ -54,6 +54,11 @@ test01() VERIFY( cit == cx.end() ); VERIFY( Cmp::count == 0); + + static_assert(std::is_same::value, + "find returns iterator"); + static_assert(std::is_same::value, + "const find returns const_iterator"); } void diff --git a/libstdc++-v3/testsuite/23_containers/set/operations/2.cc b/libstdc++-v3/testsuite/23_containers/set/operations/2.cc index 84ddd1f..6a68453 
100644 --- a/libstdc++-v3/testsuite/23_containers/set/operations/2.cc +++ b/libstdc++-v3/testsuite/23_containers/set/operations/2.cc @@ -54,6 +54,11 @@ test01() VERIFY( cit == cx.end() ); VERIFY( Cmp::count == 0); + + static_assert(std::is_same::value, + "find returns iterator"); + static_assert(std::is_same::value, + "const find returns const_iterator"); } void
Re: [PATCH] Make BB vectorizer work on sub-BBs
On 10 November 2015 at 14:02, Richard Biener wrote: > On Tue, 10 Nov 2015, Christophe Lyon wrote: > >> On 6 November 2015 at 12:11, Kyrill Tkachov wrote: >> > Hi Richard, >> > >> > >> > On 06/11/15 11:09, Richard Biener wrote: >> >> >> >> On Fri, 6 Nov 2015, Richard Biener wrote: >> >> >> >>> The following patch makes the BB vectorizer not only handle BB heads >> >>> (until the first stmt with a data reference it cannot handle) but >> >>> arbitrary regions in a BB separated by such stmts. >> >>> >> >>> This improves the number of BB vectorizations from 469 to 556 >> >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and >> >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray >> >>> 1x481.wrf failing both patched and unpatched (have to update my >> >>> config used for such experiments it seems ...) >> >>> >> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built. >> >>> >> >>> I'm currently re-testing for a cosmetic change I made when writing >> >>> the changelog. >> >>> >> >>> I expected (and there are) some issues with compile-time. Left >> >>> is unpatched and right is patched. >> >>> >> >>> '403.gcc': 00:00:54 (54) | '403.gcc': 00:00:55 (55) >> >>> '483.xalancbmk': 00:02:20 (140) | '483.xalancbmk': 00:02:24 (144) >> >>> '416.gamess': 00:02:36 (156) | '416.gamess': 00:02:37 (157) >> >>> '435.gromacs': 00:00:18 (18) | '435.gromacs': 00:00:19 (19) >> >>> '447.dealII': 00:01:31 (91) | '447.dealII': 00:01:33 (93) >> >>> '453.povray': 00:04:54 (294) | '453.povray': 00:08:54 (534) >> >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52) >> >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119) >> >>> >> >>> other benchmarks are unchanged. I'm double-checking now that a followup >> >>> patch I have which re-implements BB vectorization dependence checking >> >>> fixes this (that's the only quadraticness I know of). 
>> >> >> >> Fixes all but >> >> >> >> '453.povray': 00:04:54 (294) | '453.povray': 00:06:46 (406) >> > >> > >> > Note that povray is currently suffering from PR 68198 >> > >> >> Hi, >> >> I've also noticed that the new test bb-slp-38 fails on armeb: >> FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects >> scan-tree-dump-times slp2 "basic block part vectorized" 2 >> FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block >> part vectorized" 2 >> >> I haven't checked in more detail, maybe it's similar to what we >> discussed in PR65962 > > Maybe though there is no misalignment involved as far as I can see. > > Please open a bug and attach vectorizer dumps. > OK, this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68275 > Richard. > >> > Kyrill >> > >> > >> >> >> >> it even improves compile-time on some: >> >> >> >> '464.h264ref': 00:00:26 (26) | '464.h264ref': 00:00:21 (21) >> >> >> >> it also increases the number of vectorized BBs to 722. >> >> >> >> Needs some work still though. >> >> >> >> Richard. >> >> >> >>> Richard. >> >>> >> >>> 2015-11-06 Richard Biener >> >>> >> >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end >> >>> members. >> >>> (vect_stmt_in_region_p): Declare. >> >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region. >> >>> (destroy_bb_vec_info): Likewise. >> >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p. >> >>> (vect_get_and_check_slp_defs): Likewise. >> >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs. >> >>> (vect_slp_bb): Likewise. >> >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement >> >>> in terms of vect_stmt_in_region_p. >> >>> (vect_pattern_recog): Iterate over the BB region. >> >>> * tree-vect-stmts.c (vect_is_simple_use): Use >> >>> vect_stmt_in_region_p. >> >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function. >> >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1. >> >>> >> >>> * config/i386/i386.c: Include gimple-iterator.h. 
>> >>> * config/aarch64/aarch64.c: Likewise. >> >>> >> >>> * gcc.dg/vect/bb-slp-38.c: New testcase. >> >>> >> >>> Index: gcc/tree-vectorizer.h >> >>> === >> >>> *** gcc/tree-vectorizer.h.orig 2015-11-05 09:52:00.640227178 +0100 >> >>> --- gcc/tree-vectorizer.h 2015-11-05 13:20:58.385786476 +0100 >> >>> *** nested_in_vect_loop_p (struct loop *loop >> >>> *** 390,395 >> >>> --- 390,397 >> >>>typedef struct _bb_vec_info : public vec_info >> >>>{ >> >>> basic_block bb; >> >>> + gimple_stmt_iterator region_begin; >> >>> + gimple_stmt_iterator region_end; >> >>>} *bb_vec_info; >> >>> #define BB_VINFO_BB(B) (B)->bb >> >>> *** void vect_pattern_recog (vec_info *); >> >>> *** 1085,1089 >> >>>
Re: [PR64164] drop copyrename, integrate into expand
On 05/11/15 05:08, Alexandre Oliva wrote: [PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg for gcc/ChangeLog PR rtl-optimization/67753 PR rtl-optimization/64164 * function.c (assign_parm_setup_block): Avoid allocating a stack slot if we don't have an ABI-reserved one. Emit the copy to target_reg in the conversion seq if the copy from entry_parm is in it too. Don't use the conversion seq to copy a PARALLEL to a REG or a CONCAT. Since this change, we have on aarch64_be: FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O1 FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O2 FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -O3 -g FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -Os FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution, -Og -g The difference in the assembler looks as follows (this is at -Og):
func_return_val_10:
-	sub	sp, sp, #16
-	lsr	x2, x1, 48
-	lsr	x1, x1, 32
+	ubfx	x2, x1, 16, 16
	fmov	x3, d0
	// Start of user assembly
	// 23 "func-ret-4.c" 1
	mov	x0, x30
	// 0 "" 2
	// End of user assembly
	adrp	x3, saved_return_address
	str	x0, [x3, #:lo12:saved_return_address]
	adrp	x0, myfunc
	add	x0, x0, :lo12:myfunc
	// Start of user assembly
	// 23 "func-ret-4.c" 1
	mov	x30, x0
	// 0 "" 2
	// End of user assembly
	bfi	w0, w2, 16, 16
	bfi	w0, w1, 0, 16
	lsl	x0, x0, 32
-	add	sp, sp, 16
(ubfx is a bitfield extract, the first immediate is the lsbit, the second the width. lsr = logical shift right.)
And in the RTL dump, this (before the patch): (insn 4 3 5 2 (set (mem/c:DI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -8 [0xfff8])) [0 t+0 S8 A64]) (reg:DI 1 x1)) func-ret-4.c:23 -1 (nil)) (insn 5 4 6 2 (set (reg:HI 78 [ t ]) (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -8 [0xfff8])) [0 t+0 S2 A64])) func-ret-4.c:23 -1 (nil)) (insn 6 5 7 2 (set (reg:HI 79 [ t+2 ]) (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars) (const_int -6 [0xfffa])) [0 t+2 S2 A16])) func-ret-4.c:23 -1 (nil)) becomes (after the patch): (insn 4 3 5 2 (set (subreg:SI (reg:CHI 80) 0) (reg:SI 1 x1 [ t ])) func-ret-4.c:23 -1 (nil)) (insn 5 4 6 2 (set (reg:SI 81) (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1 (nil)) (insn 6 5 7 2 (set (subreg:DI (reg:HI 82) 0) (zero_extract:DI (subreg:DI (reg:SI 81) 0) (const_int 16 [0x10]) (const_int 16 [0x10]))) func-ret-4.c:23 -1 (nil)) (insn 7 6 8 2 (set (reg:HI 78 [ t ]) (reg:HI 82)) func-ret-4.c:23 -1 (nil)) (insn 8 7 9 2 (set (reg:SI 83) (subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1 (nil)) (insn 9 8 10 2 (set (reg:HI 79 [ t+2 ]) (subreg:HI (reg:SI 83) 2)) func-ret-4.c:23 -1 (nil)) --Alan
Re: [AArch64] Move iterators from atomics.md to iterators.md
On Mon, Nov 02, 2015 at 11:44:02AM +, Matthew Wahab wrote: > Hello > > One of the review comments for the v8.1 atomics patches was that the > iterators and unspec declarations should be moved out of the atomics.md > file (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01375.html). > > The iterators in atomics.md are tied to the unspecv definition in the > same file. This patch moves both into iterators.md. > > Tested aarch64-none-elf with cross-compiled check-gcc and > aarch64-none-linux-gnu with native bootstrap and make check. > > Ok for trunk? OK. Thanks, James > Matthew > > gcc/ > 2015-11-02 Matthew Wahab > > * config/aarch64/atomics.md (unspecv): Move to iterators.md. > (ATOMIC_LDOP): Likewise. > (atomic_ldop): Likewise. > * config/aarch64/iterators.md (unspecv): Moved from atomics.md. > (ATOMIC_LDOP): Likewise. > (atomic_ldop): Likewise.
[PATCH 02/02] C FE: add fix-it hint for . vs ->
This is the most trivial example of a real fix-it example I could think of: if the user writes ptr.field rather than ptr->field. gcc/c/ChangeLog: * c-typeck.c (build_component_ref): Special-case POINTER_TYPE when generating a "not a structure of union" error message, and suggest a "->" rather than a ".", providing a fix-it hint. gcc/testsuite/ChangeLog: * gcc.dg/fixits.c: New file. --- gcc/c/c-typeck.c | 15 +++ gcc/testsuite/gcc.dg/fixits.c | 14 ++ 2 files changed, 29 insertions(+) create mode 100644 gcc/testsuite/gcc.dg/fixits.c diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c index c2e16c6..6fe1ca8 100644 --- a/gcc/c/c-typeck.c +++ b/gcc/c/c-typeck.c @@ -2336,6 +2336,21 @@ build_component_ref (location_t loc, tree datum, tree component) return ref; } + else if (code == POINTER_TYPE && !c_dialect_objc ()) +{ + /* Special-case the error message for "ptr.field" for the case +where the user has confused "." vs "->". +We don't do it for Objective-C, since Objective-C 2.0 dot-syntax +allows "." for ptrs; we could be handling a failed attempt +to access a property. */ + rich_location richloc (line_table, loc); + /* "loc" should be the "." token. */ + richloc.add_fixit_replace (source_range::from_location (loc), "->"); + error_at_rich_loc (&richloc, +"%qE is a pointer; did you mean to use %<->%>?", +datum); + return error_mark_node; +} else if (code != ERROR_MARK) error_at (loc, "request for member %qE in something not a structure or union", diff --git a/gcc/testsuite/gcc.dg/fixits.c b/gcc/testsuite/gcc.dg/fixits.c new file mode 100644 index 000..3b8c8a8 --- /dev/null +++ b/gcc/testsuite/gcc.dg/fixits.c @@ -0,0 +1,14 @@ +/* { dg-do compile } */ +/* { dg-options "-fdiagnostics-show-caret" } */ + +struct foo { int x; }; + +int test (struct foo *ptr) +{ + return ptr.x; /* { dg-error "'ptr' is a pointer; did you mean to use '->'?" } */ +/* { dg-begin-multiline-output "" } + return ptr.x; + ^ + -> + { dg-end-multiline-output "" } */ +} -- 1.8.5.3
[PATCH 01/02] PR/62314: add ability to add fixit-hints
This patch adds the ability to add "fix-it hints" to a rich_location, which will be displayed when the corresponding diagnostic is printed. It does not actually add any fix-it hints (that comes in the second patch), but it adds test coverage of the machinery and printing, by using the existing diagnostic_plugin_test_show_locus to inject some meaningless fixit hints, and to verify the output. For now, add a nasty linker kludge in layout::print_any_fixits for the sake of diagnostic_plugin_test_show_locus. Successfully bootstrapped & regtested the pair of patches on x86_64-pc-linux-gnu (on top of the 10-patch diagnostics kit). OK for trunk? gcc/ChangeLog: PR/62314 * diagnostic-show-locus.c (colorizer::set_fixit_hint): New. (class layout): Update comment (layout::print_any_fixits): New method. (layout::move_to_column): New method. (diagnostic_show_locus): Add call to layout.print_any_fixits. gcc/testsuite/ChangeLog: PR/62314 * gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c (test_fixit_insert): New. (test_fixit_remove): New. (test_fixit_replace): New. * gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c (test_fixit_insert): New. (test_fixit_remove): New. (test_fixit_replace): New. * gcc.dg/plugin/diagnostic_plugin_test_show_locus.c (test_show_locus): Add tests of rendering fixit hints. libcpp/ChangeLog: PR/62314 * include/line-map.h (source_range::intersects_line_p): New method. (rich_location::~rich_location): New. (rich_location::add_fixit_insert): New method. (rich_location::add_fixit_remove): New method. (rich_location::add_fixit_replace): New method. (rich_location::get_num_fixit_hints): New accessor. (rich_location::get_fixit_hint): New accessor. (rich_location::MAX_FIXIT_HINTS): New constant. (rich_location::m_num_fixit_hints): New field. (rich_location::m_fixit_hints): New field. (class fixit_hint): New class. (class fixit_insert): New class. (class fixit_remove): New class. (class fixit_replace): New class. 
* line-map.c (source_range::intersects_line_p): New method. (rich_location::rich_location): Add initialization of m_num_fixit_hints to both ctors. (rich_location::~rich_location): New. (rich_location::add_fixit_insert): New method. (rich_location::add_fixit_remove): New method. (rich_location::add_fixit_replace): New method. (fixit_insert::fixit_insert): New. (fixit_insert::~fixit_insert): New. (fixit_insert::affects_line_p): New. (fixit_remove::fixit_remove): New. (fixit_remove::affects_line_p): New. (fixit_replace::fixit_replace): New. (fixit_replace::~fixit_replace): New. (fixit_replace::affects_line_p): New. --- gcc/diagnostic-show-locus.c| 125 ++- .../gcc.dg/plugin/diagnostic-test-show-locus-bw.c | 43 +++ .../plugin/diagnostic-test-show-locus-color.c | 43 +++ .../plugin/diagnostic_plugin_test_show_locus.c | 35 ++ libcpp/include/line-map.h | 96 +++ libcpp/line-map.c | 136 - 6 files changed, 471 insertions(+), 7 deletions(-) diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c index 22203cd..f3d4a0e 100644 --- a/gcc/diagnostic-show-locus.c +++ b/gcc/diagnostic-show-locus.c @@ -78,6 +78,7 @@ class colorizer void set_range (int range_idx) { set_state (range_idx); } void set_normal_text () { set_state (STATE_NORMAL_TEXT); } + void set_fixit_hint () { set_state (0); } private: void set_state (int state); @@ -139,8 +140,8 @@ struct line_bounds /* A class to control the overall layout when printing a diagnostic. The layout is determined within the constructor. - It is then printed by repeatedly calling the "print_source_line" - and "print_annotation_line" methods. + It is then printed by repeatedly calling the "print_source_line", + "print_annotation_line" and "print_any_fixits" methods. We assume we have disjoint ranges. 
*/ @@ -155,6 +156,7 @@ class layout bool print_source_line (int row, line_bounds *lbounds_out); void print_annotation_line (int row, const line_bounds lbounds); + void print_any_fixits (int row, const rich_location *richloc); private: bool @@ -168,6 +170,9 @@ class layout get_x_bound_for_row (int row, int caret_column, int last_non_ws); + void + move_to_column (int *column, int dest_column); + private: diagnostic_context *m_context; pretty_printer *m_pp; @@ -593,6 +598,92 @@ layout::print_annotation_line (int row, const line_bounds lbounds) pp_newline (m_pp); } +/* If there are any fixit hints on source line ROW within
Re: [RFC][PATCH] Preferred rename register in regrename pass
On 10 November 2015 at 12:41, Robert Suchanek wrote: > Hi Christophe, > >> Hi, >> >> Since you committed this (r230087 if I'm correct), I can see that GCC >> fails to build >> ligfortran for target arm-none-linuxgnueabi --with-cpu=cortex-a9. > ... >> >> Can you have a look? > > Sorry for the breakage. I see that my assertion is being triggered. > I'll investigate this and check whether the assertion is correct or > something else needs to be done. > Now that 'make check' has had enough time to run, I can see several regressions in the configurations where GCC still builds. For more details: http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/230087/report-build-info.html > Robert
[PATCH] libcpp: add examples to source_location description
This is a followup to: [PATCH 10/10] Compress short ranges into source_location which adds some worked examples of what a source_location/location_t can encode. Successfully bootstrapped & regtested on x86_64-pc-linux-gnu (although it only touches a comment). OK for trunk? libcpp/ChangeLog: * include/line-map.h (source_location): Add worked examples of location encoding to the leading comment. --- libcpp/include/line-map.h | 97 ++- 1 file changed, 95 insertions(+), 2 deletions(-) diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h index 36247b2..e7169af 100644 --- a/libcpp/include/line-map.h +++ b/libcpp/include/line-map.h @@ -163,8 +163,101 @@ typedef unsigned int linenum_type; 0x | UINT_MAX | ---+---+--- - To see how this works in practice, see the worked example in - libcpp/location-example.txt. */ + Examples of location encoding. + + Packed ranges + = + + Consider encoding the location of a token "foo", seen underlined here + on line 523, within an ordinary line_map that starts at line 500: + + 112 +12345678901234567890 + 522 + 523 return foo + bar; + ^~~ + 524 + + The location's caret and start are both at line 523, column 11; the + location's finish is on the same line, at column 13 (an offset of 2 + columns, for length 3). + + Line 523 is offset 23 from the starting line of the ordinary line_map. + + caret == start, and the offset of the finish fits within 5 bits, so + this can be stored as a packed range. + + This is encoded as: + ordmap->start + + (line_offset << ordmap->m_column_and_range_bits) + + (column << ordmap->m_range_bits) + + (range_offset); + i.e. (for line offset 23, column 11, range offset 2): + ordmap->start + + (23 << 12) + + (11 << 5) + + 2; + i.e.: + ordmap->start + 0x17162 + assuming that the line_map uses the default of 7 bits for columns and + 5 bits for packed range (giving 12 bits for m_column_and_range_bits). 
+ + + "Pure" locations + ================ + + These are a special case of the above, where + caret == start == finish + They are stored as packed ranges with offset == 0. + For example, the location of the "f" of "foo" could be stored + as above, but with range offset 0, giving: + ordmap->start + + (23 << 12) + + (11 << 5) + + 0; + i.e.: + ordmap->start + 0x17160 + + + Unoptimized ranges + ================== + + Consider encoding the location of the binary expression + below: + + 11111111112 + 12345678901234567890 + 522 + 523 return foo + bar; + ~~~~^~~~~ + 524 + + The location's caret is at the "+", line 523 column 15, but starts + earlier, at the "f" of "foo" at column 11. The finish is at the "r" + of "bar" at column 19. + + This can't be stored as a packed range since start != caret. + Hence it is stored as an ad-hoc location e.g. 0x80000003. + + Stripping off the top bit gives us an index into the ad-hoc + lookaside table: + + line_table->location_adhoc_data_map.data[0x3] + + from which the caret, start and finish can be looked up, + encoded as "pure" locations: + + start == ordmap->start + (23 << 12) + (11 << 5) + == ordmap->start + 0x17160 (as above; the "f" of "foo") + + caret == ordmap->start + (23 << 12) + (15 << 5) + == ordmap->start + 0x171e0 + + finish == ordmap->start + (23 << 12) + (19 << 5) + == ordmap->start + 0x17260 + + To further see how source_location works in practice, see the + worked example in libcpp/location-example.txt. */ typedef unsigned int source_location; /* A range of source locations. -- 1.8.5.3
Re: [PATCH 01/02] PR/62314: add ability to add fixit-hints
On 11/10/2015 05:35 PM, David Malcolm wrote: + /* Nasty workaround to convince the linker to add + rich_location::add_fixit_insert + rich_location::add_fixit_remove + rich_location::add_fixit_replace + to cc1 for use by diagnostic_plugin_test_show_locus, + before anything in cc1 is using them. + + This conditional should never hold, but hopefully the compiler can't + figure that out. */ Does attribute((used)) help with this problem? Bernd
Re: [gomp4] Random omp-low.c backporting
Hi Nathan! On Tue, 10 Nov 2015 09:19:50 -0500, Nathan Sidwell wrote: > I've committed this to backport a bunch of random bits from trunk to gomp4, > and > thereby reduce divergence. Yeah, I had some of these on my list, too. > --- omp-low.c (revision 230080) > +++ omp-low.c (working copy) > @@ -12515,7 +12485,7 @@ replace_oacc_fn_attrib (tree fn, tree di > function attribute. Push any that are non-constant onto the ARGS > list, along with an appropriate GOMP_LAUNCH_DIM tag. */ > > -void > +static void > set_oacc_fn_attrib (tree fn, tree clauses, vec *args) > { >/* Must match GOMP_DIM ordering. */ [...]/gcc/omp-low.c: In function 'void set_oacc_fn_attrib(tree, tree, vec*)': [...]/gcc/omp-low.c:12578:59: error: 'void set_oacc_fn_attrib(tree, tree, vec*)' was declared 'extern' and later 'static' [-fpermissive] set_oacc_fn_attrib (tree fn, tree clauses, vec *args) ^ In file included from [...]/gcc/omp-low.c:71:0: [...]/gcc/omp-low.h:36:13: error: previous declaration of 'void set_oacc_fn_attrib(tree, tree, vec*)' [-fpermissive] extern void set_oacc_fn_attrib (tree, tree, vec *); ^ Makefile:1083: recipe for target 'omp-low.o' failed make[2]: *** [omp-low.o] Error 1 If it's intended to be static in gcc/omp-low.c, you'll need to change gcc/tree-parloops.c:create_parallel_loop to not use it. > @@ -15530,7 +15499,7 @@ lower_omp_target (gimple_stmt_iterator * > case OMP_CLAUSE_MAP: > case OMP_CLAUSE_TO: > case OMP_CLAUSE_FROM: > - oacc_firstprivate_2: > + oacc_firstprivate_map: > nc = c; > ovar = OMP_CLAUSE_DECL (c); > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP We got another label oacc_firstprivate above this one, which is why I had named this oacc_firstprivate_2 -- no idea if oacc_firstprivate_map is a "better" name. 
> @@ -15581,9 +15550,9 @@ lower_omp_target (gimple_stmt_iterator * > x = build_sender_ref (ovar, ctx); > > if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP > - && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER > - && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c) > - && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE > + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER > + && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c) > + && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE) > { > gcc_assert (offloaded); > tree avar Needs to be fixed on trunk, I think? > @@ -15727,8 +15696,7 @@ lower_omp_target (gimple_stmt_iterator * > > case OMP_CLAUSE_FIRSTPRIVATE: > if (is_oacc_parallel (ctx)) > - goto oacc_firstprivate_2; > - gcc_assert (!is_gimple_omp_oacc (ctx->stmt)); > + goto oacc_firstprivate_map; > ovar = OMP_CLAUSE_DECL (c); > if (is_reference (ovar)) > talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar))); I had put in the "gcc_assert (!is_gimple_omp_oacc (ctx->stmt))" to make sure we don't ever reach this for OpenACC kernels, which will not "goto oacc_firstprivate_2" because that's only being done for "is_oacc_parallel" (but not for "is_oacc_kernels"). Regards, Thomas
[C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)
While both C and C++ FEs are able to reject e.g. int a[__SIZE_MAX__ / sizeof(int)]; they are accepting code such as int (*a)[__SIZE_MAX__ / sizeof(int)]; As Joseph pointed out, any construction of a non-VLA type whose size is half or more of the address space should receive a compile-time error. This is done by moving up the check for the size in bytes so that it checks every non-VLA complete array type constructed in the course of processing the declarator. Since the C++ FE had the same problem, I've fixed it up there as well. And that's why I had to tweak dg-error of two C++ tests; if the size of an array is considered invalid, we give an error message with the word "unnamed". (I've removed the comment about crashing in tree_to_[su]hwi since that seems to no longer be the case.) Bootstrapped/regtested on x86_64-linux, ok for trunk? 2015-11-10 Marek Polacek PR c/68107 PR c++/68266 * c-decl.c (grokdeclarator): Check whether the size of arrays is valid earlier. * decl.c (grokdeclarator): Check whether the size of arrays is valid earlier. * c-c++-common/pr68107.c: New test. * g++.dg/init/new38.C (large_array_char): Adjust dg-error. (large_array_char_template): Likewise. * g++.dg/init/new44.C: Adjust dg-error. diff --git gcc/c/c-decl.c gcc/c/c-decl.c index a3d8ead..2ec4865 100644 --- gcc/c/c-decl.c +++ gcc/c/c-decl.c @@ -6007,6 +6007,19 @@ grokdeclarator (const struct c_declarator *declarator, TYPE_SIZE_UNIT (type) = size_zero_node; SET_TYPE_STRUCTURAL_EQUALITY (type); } + + /* Did array size calculations overflow or does the array + cover more than half of the address-space?
*/ + if (COMPLETE_TYPE_P (type) + && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST + && !valid_constant_size_p (TYPE_SIZE_UNIT (type))) + { + if (name) + error_at (loc, "size of array %qE is too large", name); + else + error_at (loc, "size of unnamed array is too large"); + type = error_mark_node; + } } if (decl_context != PARM @@ -6014,7 +6027,8 @@ grokdeclarator (const struct c_declarator *declarator, || array_ptr_attrs != NULL_TREE || array_parm_static)) { - error_at (loc, "static or type qualifiers in non-parameter array declarator"); + error_at (loc, "static or type qualifiers in non-parameter " + "array declarator"); array_ptr_quals = TYPE_UNQUALIFIED; array_ptr_attrs = NULL_TREE; array_parm_static = 0; @@ -6293,22 +6307,6 @@ grokdeclarator (const struct c_declarator *declarator, } } - /* Did array size calculations overflow or does the array cover more - than half of the address-space? */ - if (TREE_CODE (type) == ARRAY_TYPE - && COMPLETE_TYPE_P (type) - && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST - && ! valid_constant_size_p (TYPE_SIZE_UNIT (type))) -{ - if (name) - error_at (loc, "size of array %qE is too large", name); - else - error_at (loc, "size of unnamed array is too large"); - /* If we proceed with the array type as it is, we'll eventually -crash in tree_to_[su]hwi(). */ - type = error_mark_node; -} - /* If this is declaring a typedef name, return a TYPE_DECL. 
*/ if (storage_class == csc_typedef) diff --git gcc/cp/decl.c gcc/cp/decl.c index bd3f2bc..68ad82e 100644 --- gcc/cp/decl.c +++ gcc/cp/decl.c @@ -9945,6 +9945,18 @@ grokdeclarator (const cp_declarator *declarator, case cdk_array: type = create_array_type_for_decl (dname, type, declarator->u.array.bounds); + if (type != error_mark_node + && COMPLETE_TYPE_P (type) + && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST + && !valid_constant_size_p (TYPE_SIZE_UNIT (type))) + { + if (dname) + error ("size of array %qE is too large", dname); + else + error ("size of unnamed array is too large"); + type = error_mark_node; + } + if (declarator->std_attributes) /* [dcl.array]/1: @@ -10508,19 +10520,6 @@ grokdeclarator (const cp_declarator *declarator, error ("non-parameter %qs cannot be a parameter pack", name); } - /* Did array size calculations overflow or does the array cover more - than half of the address-space? */ - if (TREE_CODE (type) == ARRAY_TYPE - && COMPLETE_TYPE_P (type) - && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST - && ! valid_constant_size_p (TYPE_SIZE_UNIT (type))) -{ -
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 04/11/15 14:26, Ramana Radhakrishnan wrote: True and I've just been reading more of the backend - We could now start using blocks for constant pools as well. So let's do that. How does something like this look ? Tested on aarch64-none-elf - no regressions. 2015-11-04 Ramana Radhakrishnan * config/aarch64/aarch64.c (aarch64_can_use_per_function_literal_pools_p): New. (aarch64_use_blocks_for_constant_p): Adjust declaration and use aarch64_can_use_function_literal_pools_p. (aarch64_select_rtx_section): Update. Since r229878, I've been seeing FAIL: gcc.dg/attr-weakref-1.c (test for excess errors) UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable (both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and aarch64-none-linux-gnu. Here's a log from aarch64_be-none-elf (the others look similar): /work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/xgcc -B/work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/ /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1.c -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1a.c -specs=aem-validation.specs -lm -o ./attr-weakref-1.exe /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12' /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12' /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12' /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12' collect2: error: ld returned 1 exit status compiler exited with status 1 output is: /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12' /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12' /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12' /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12' collect2: error: ld returned 1 exit status FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On Tue, Nov 10, 2015 at 4:39 PM, Alan Lawrence wrote: > On 04/11/15 14:26, Ramana Radhakrishnan wrote: >> >> >> True and I've just been reading more of the backend - We could now start >> using blocks for constant pools as well. So let's do that. >> >> How does something like this look ? >> >> Tested on aarch64-none-elf - no regressions. >> >> 2015-11-04 Ramana Radhakrishnan >> >> * config/aarch64/aarch64.c >> (aarch64_can_use_per_function_literal_pools_p): New. >> (aarch64_use_blocks_for_constant_p): Adjust declaration >> and use aarch64_can_use_function_literal_pools_p. >> (aarch64_select_rtx_section): Update. >> > > Since r229878, I've been seeing > > FAIL: gcc.dg/attr-weakref-1.c (test for excess errors) > UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable > > (both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and > aarch64-none-linux-gnu. Here's a log from aarch64_be-none-elf (the others > look similar): > > /work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/xgcc > -B/work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/ > /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1.c > -fno-diagnostics-show-caret -fdiagnostics-color=never -O2 > /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1a.c > -specs=aem-validation.specs -lm -o ./attr-weakref-1.exe > /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12' > collect2: error: ld returned 1 exit status > compiler exited with status 1 > output is: > /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12' > /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12' > collect2: error: ld 
returned 1 exit status > > FAIL: gcc.dg/attr-weakref-1.c (test for excess errors) > Hmmm, I'm surprised it failed in the first place, as my testing didn't show it - I need to check on that. Nevertheless, this failure has gone away in my testing with https://gcc.gnu.org/ml/gcc-cvs/2015-11/msg00453.html in a bootstrap and regression run on aarch64-none-linux-gnu. I see nothing triplet-specific in the testcase here for it to fail differently. Is this something you really see with tip of trunk? regards Ramana
Re: [PATCH 1/2] Fix invalid left shift of negative value.
On Nov 10, 2015, at 3:13 AM, Dominik Vogt wrote: > On Tue, Nov 10, 2015 at 12:11:23PM +0100, Dominik Vogt wrote: >> The following series of patches fixes all occurences of >> left-shifting negative constants in C code which is undefined by >> the C standard. The patches have been tested on s390x, covering >> only a small subset of the changes. > > Changes in gdb/. So, should these go to the gdb list?
Re: [Patch AArch64] Switch constant pools to separate rodata sections.
On 10/11/15 16:39, Alan Lawrence wrote: Since r229878, I've been seeing FAIL: gcc.dg/attr-weakref-1.c (test for excess errors) UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable (both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and aarch64-none-linux-gnu. Ah, these are fixed by Ramana's partial rollback (r230085). --Alan
Re: [PATCH] Simple optimization for MASK_STORE.
On Nov 10, 2015, at 6:56 AM, Ilya Enkovich wrote: > 2015-11-10 17:46 GMT+03:00 Richard Biener : >> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich >> wrote: >>> 2015-11-10 15:33 GMT+03:00 Richard Biener : On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote: > Richard, > > I tried it but 256-bit precision integer type is not yet supported. What's the symptom? The compare cannot be expanded? Just add a pattern then. After all we have modes up to XImode. >>> >>> I suppose the problem may be in: >>> >>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128) >>> >>> which doesn't allow to create constants of bigger size. Changing it >>> to maximum vector size (512) would mean we increase wide_int structure >>> size significantly. New patterns are probably also needed. >> >> Yes, new patterns are needed but wide-int should be fine (we only need to >> create >> a literal zero AFAICS). The "new pattern" would be equality/inequality >> against zero >> compares only. > > Currently 256bit integer creation fails because wide_int for max and > min values cannot be created. > It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases > WIDE_INT_MAX_ELTS > and thus increases wide_int structure. If we use 512 for > MAX_BITSIZE_MODE_ANY_INT then > wide_int structure would grow by 48 bytes (16 bytes if we use 256 for > MAX_BITSIZE_MODE_ANY_INT). Not answering for Richard, but the design of wide-int was that though the temporary space would grow, trees and rtl would not. Most wide-int values are short-lived.
RE: [PATCH 1/2][ARC] Add support for ARCv2 CPUs
> If you can name a pre-existing testcase to trigger the assert, the patch is > approved for separate check-in. The patch fixes the gcc.dg/pr29921-2.c error, visible on the ARC700 architecture. I will prepare a new patch for this error. Thank you for the review, Claudiu
Re: [PATCH 01/02] PR/62314: add ability to add fixit-hints
On Tue, 2015-11-10 at 17:26 +0100, Bernd Schmidt wrote: > On 11/10/2015 05:35 PM, David Malcolm wrote: > > + /* Nasty workaround to convince the linker to add > > + rich_location::add_fixit_insert > > + rich_location::add_fixit_remove > > + rich_location::add_fixit_replace > > + to cc1 for use by diagnostic_plugin_test_show_locus, > > + before anything in cc1 is using them. > > + > > + This conditional should never hold, but hopefully the compiler can't > > + figure that out. */ > > Does attribute((used)) help with this problem? For some reason, I'm no longer seeing the problem; I tried simply taking out the kludge, and it now works (this is *without* the in-cc1 usage in patch 2); looking at cc1 shows that the above 3 symbols are indeed being added: $ eu-readelf -s ./cc1 |grep add_fixit 2510: 012a5280 94 FUNCGLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_insertEjPKc 2905: 012a5300 76 FUNCGLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_removeE12source_range 9262: 012a5390 94 FUNCGLOBAL DEFAULT 13 _ZN13rich_location17add_fixit_replaceE12source_rangePKc 37430: 012a5300 76 FUNCGLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_removeE12source_range 46935: 012a5390 94 FUNCGLOBAL DEFAULT 13 _ZN13rich_location17add_fixit_replaceE12source_rangePKc 47508: 012a5280 94 FUNCGLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_insertEjPKc I've tried poking at it, but I'm not sure what changed since I first added the kludge (an earlier version of this, sent as: https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00732.html ); sorry. Dave
Re: State of support for the ISO C++ Transactional Memory TS and remanining work
On 09/11/15 00:19, Torvald Riegel wrote: Hi, I'd like to summarize the current state of support for the TM TS, and outline the current plan for the work that remains to complete the support. I'm aware we're at the end of stage 1, but I'm confident we can still finish this work and hope to include it in GCC 6 because: (1) most of the support is already in GCC, and we have a big head start in the space of TM, so it would be unfortunate to waste that by not delivering support for the TM TS, (2) this is a TS and support for it is considered experimental, (3) most of the affected code is in libitm or the compiler's TM passes, which has to be enabled explicitly by the user. Currently, we have complete support for the syntax and all necessary instrumentation except the exception handling bits listed below. libitm has a good set of STM and HTM-based algorithms. What is missing on the compiler side is essentially a change of how we support atomic_noexcept and atomic_cancel, in particular exception handling. Instead of just using a finally block as done currently, the compiler needs to build a catch clause so that it can actively intercept exceptions that escape an atomic_noexcept or atomic_cancel. For atomic_noexcept, the compiler needs to include a call to abort() in the catch clause. For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch clause, and use NULL as exception argument. This can then be used by libitm to look at the exception currently being handled and (a) check whether the type supports transaction cancellation as specified by the TS and (b) pick out the allocations that belong to this exception and roll back everything else before rethrowing this exception. For (a), it's probably best to place this check into libstdc++ (specifically, libsupc++ I suppose) because support for transaction cancellation is a property that library parts of the standard (or the TS) require, and that has to match the implementation in libstdc++.
Attached is a patch by Jason that implements this check. This adds one symbol, which should be okay, we hope. does that mean libitm will depend on libstdc++? i think the tm runtime should not depend on c++, so it is usable from c code. For (b), our plan is to track the additional allocations that happen during construction of the exception types that support cancellation (eg, creating the what() string for logic_error). There are several ways to do that, one of them being that we create custom transactional clones of those constructors that tell libitm that either such a constructor is currently running or explicitly list the allocations that have been made by the constructor; eventually, we would always (copy) construct into memory returned by cxa_allocate_exception, which then makes available the necessary undo information when such an exception is handled in libitm. The other big piece of missing support is making sure that the functions that are specified in the TS as transaction_safe are indeed that. I believe we do not need to actually add such annotations to any libstdc++ functions that are already transaction-safe and completely defined in headers -- those functions are implicitly transaction-safe, and we can thus let the compiler instrument them at the point of use inside of a transaction. If a supposedly transaction-safe function is not defined in a header, we'd need a transaction_safe annotation at the declaration. Jason has implemented the TM TS feature test macro, so we can only add the annotation if the user has enabled support for the TM TS in the respective compilation process. sounds ugly: the function type (transaction-safety) depends on the visibility of the definition.. We also need to ensure that there is a transaction clone of the function. This will add symbols to libstdc++, but these all have their own special prefix in the mangled name.
I'd like to get feedback on how to best trigger the instrumentation and make it a part of a libstdc++ build. (If that proves too problematic, we could still fall back to writing transactional clones manually.) For the clones of the constructors of the types that support cancellation, I suppose manually written clones might be easier than automatic instrumentation. I've not yet created tests for the full list of functions specified as transaction-safe in the TS, but my understanding is that this list was created after someone from the ISO C++ TM study group looked at libstdc++'s implementation and investigated which functions might be feasible to be declared transaction-safe in it. is that list available somewhere? libitm seems to try to allow allocating functions in transactions, but syscalls are not supposed to be transaction safe. are allocating functions prohibited? I'm looking forward to your feedback. Thanks, Torvald i'm not familiar with libitm, but i see several implementation issues: xmalloc the runtime exits on memory allocation failure,
Re: [PATCH][ARM][cleanup] Remove uses of CONST_DOUBLE_HIGH/LOW
Hi Ramana, On 10/11/15 14:33, Ramana Radhakrishnan wrote: On Thu, Nov 5, 2015 at 9:32 AM, Kyrill Tkachov wrote: Hi all, This cleanup patch removes handling of CONST_DOUBLE rtxes that carry large integers. These should never be passed down from the midend and the arm backend doesn't create them. The code has been there since 2007 but the arm backend was moved to TARGET_SUPPORTS_WIDE_INT in 2014, so this path should never be taken. Bootstrapped and tested on arm-none-linux-gnueabihf. Ok for trunk? This is OK - Thanks for reviewing. Sorry I had forgotten the ChangeLog in the initial submission. It is 2015-11-10 Kyrylo Tkachov * config/arm/arm.c (neon_valid_immediate): Remove integer CONST_DOUBLE handling. It should never occur. I have committed the patch with that entry as r230115. Thanks, Kyrill Ramana Thanks, Kyrill
Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c
On Tue, 10 Nov 2015, Richard Biener wrote: > Looks ok but I wonder if this is dead code with > > (for pows (POW) > sqrts (SQRT) > cbrts (CBRT) > (simplify > (pows @0 REAL_CST@1) > (with { > const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1); > REAL_VALUE_TYPE tmp; >} >(switch > ... > /* pow(x,0.5) -> sqrt(x). */ > (if (flag_unsafe_math_optimizations > && canonicalize_math_p () > && real_equal (value, &dconsthalf)) > (sqrts @0)) > > also wondering here about canonicalize_math_p (), I'd expected the > reverse transform as canonicalization. Also wondering about > flag_unsafe_math_optimizations (missing from the vectorizer pattern). pow(x,0.5) -> sqrt(x) is unsafe because: pow (-0, 0.5) is specified in Annex F to be +0 but sqrt (-0) is -0; pow (-Inf, 0.5) is specified in Annex F to be +Inf, but sqrt (-Inf) is NaN with "invalid" exception raised. I think it's safe in other cases (the reverse of course is not safe, sqrt is a fully-specified correctly-rounded IEEE operation and pow isn't). -- Joseph S. Myers jos...@codesourcery.com