Re: [patch] Add a lexical block only when the callsite has source location info
Hi, Richard,

You are right, setting UNKNOWN_LOCATION will not affect the addr2line
result. Here is the updated patch:

Passed bootstrap and gcc regression tests.

Is it ok for trunk?

Thanks,
Dehao

Index: tree-inline.c
===
--- tree-inline.c (revision 188926)
+++ tree-inline.c (working copy)
@@ -3836,8 +3836,7 @@
   /* Set input_location here so we get the right instantiation context
      if we call instantiate_decl from inlinable_function_p.  */
   saved_location = input_location;
-  if (gimple_has_location (stmt))
-    input_location = gimple_location (stmt);
+  input_location = gimple_location (stmt);

   /* From here on, we're only interested in CALL_EXPRs.  */
   if (gimple_code (stmt) != GIMPLE_CALL)

On Mon, Jun 25, 2012 at 10:00 PM, Richard Guenther wrote:
> On Mon, Jun 25, 2012 at 3:48 PM, Dehao Chen wrote:
>> Hi, Richard,
>>
>> Thanks for the prompt response.
>>
>> On Mon, Jun 25, 2012 at 8:02 PM, Richard Guenther wrote:
>>> On Mon, Jun 25, 2012 at 1:43 PM, Dehao Chen wrote:
>>>> During function inlining, a lexical block is added for each cloned
>>>> callee, and source info is attached to this block for addr2line to
>>>> derive the inline stack.
>>>
>>> Well - the bug is then clearly
>>>
>>>   /* Set input_location here so we get the right instantiation context
>>>      if we call instantiate_decl from inlinable_function_p.  */
>>>   saved_location = input_location;
>>>   if (gimple_has_location (stmt))
>>>     input_location = gimple_location (stmt)
>>>
>>> which retains input_location instead of setting it to UNKNOWN_LOCATION.
>>>
>>> Not adding a BLOCK will make debug information incorrect, no?
>>
>> The only case I can think of where gimple_has_location is false for a
>> call stmt is function splitting.
>>
>> If we have a function foo, which is split into:
>>
>>   foo
>>   foo.part1
>>
>> then a callsite foo->foo.part1 is created in foo.
>>
>> If ipa-inline decides to inline this callsite, an instruction in
>> foo.part1 will have an inline stack of size 2. In the original buggy
>> code, the bottom of the inline stack will be random. Using your
>> proposed approach, the bottom of the inline stack would be
>> UNKNOWN_LOCATION, but the stack still has two levels. For function
>> splitting, this inlining should not create any lexical block, but
>> resume the original lexical block from before the split. Thus my
>> change simply does not add a new lexical block. Do you think this
>> makes sense?
>
> I don't think it behaves sensibly for any other call without a location.
> Basically you assume that this only happens for split functions but
> I don't see why that should be true. Why would BLOCKs with
> UNKNOWN_LOCATION have any effect on addr2line anyway?
> That seems to be something to check and fix.
>
> Richard.
>
>> Thanks,
>> Dehao
>>
>>>> However, some callsites do not have source information attached to
>>>> them. Adding a lexical block would be misleading in this case. E.g.
>>>> if a function is split, when the split callsite is inlined back, the
>>>> cloned callee should stay in the same lexical block as its caller.
>>>> This patch ensures that lexical blocks are only added when the
>>>> callsite has source location info in it.
>>>>
>>>> Bootstrapped and passed gcc regression tests.
>>>>
>>>> Is it ok for trunk?
>>>
>>> I'd rather see an unconditional set of input_location from
>>> gimple_location of the statement.
>>>
>>> Richard.
>>>
>>>> Thanks,
>>>> Dehao
>>>>
>>>> gcc/ChangeLog:
>>>> 2012-06-25  Dehao Chen
>>>>
>>>>         * tree-profile.c: (expand_call_inline): Make a new lexical block only
>>>
>>>                ^ tree-inline.c
>>>
>>>>         when the call stmt has source location.
>>>>
>>>> Index: gcc/tree-inline.c
>>>> ===
>>>> --- gcc/tree-inline.c (revision 188926)
>>>> +++ gcc/tree-inline.c (working copy)
>>>> @@ -3950,10 +3950,17 @@
>>>>       actual inline expansion of the body, and a label for the return
>>>>       statements within the function to jump to. The type of the
>>>>       statement expression is the return type of the function call.  */
>>>> -  id->block = make_node (BLOCK);
>>>> -  BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>>>> -  BLOCK_SOURCE_LOCATION (id->block) = input_location;
>>>> -  prepend_lexical_block (gimple_block (stmt), id->block);
>>>> +  if (gimple_has_location (stmt))
>>>> +    {
>>>> +      id->block = make_node (BLOCK);
>>>> +      BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>>>> +      BLOCK_SOURCE_LOCATION (id->block) = input_location;
>>>
>>> Please use gimple_location (stmt) instead of input_location (yes, I
>>> realize its set from that).
>>>
>>>> +      prepend_lexical_block (gimple_block (stmt), id->block);
>>>> +    }
>>>> +  else
>>>> +    {
>>>> +      id->block = gimple_block (stmt);
>>>> +    }
>>>
>>> /* Local declarations will be replaced by their equivalents in this
>>> map.  */
Re: RFC/RFA: Allow targets to override the definition of FLOAT_BIT_ORDER_MISMATCH
Ping ?

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00806.html

libgcc/ChangeLog
2012-06-13  Nick Clifton

        * fp-bit.h (FLOAT_BIT_ORDER_MISMATCH): If
        LIBGCC2_FLOAT_BIT_ORDER_MISMATCH is defined then use this to
        determine if FLOAT_BIT_ORDER_MISMATCH should be defined.
        * config/rx/rx-lib.h (LIBGCC2_FLOAT_BIT_ORDER_MISMATCH):
        Define to false.
Re: [PATCH] Don't try to perform conditional_replacement on vectors (PR tree-optimization/53748)
On Mon, 25 Jun 2012, Jakub Jelinek wrote:

> Hi!
>
> On vectors, even when they satisfy
> integer_zerop/integer_onep/integer_all_onesp, the routine doesn't
> handle vector types and it is questionable if it would be a good
> optimization for them anyway. We don't handle complex there either,
> so this patch limits it to integral/pointer types.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Ok.

Thanks,
Richard.

> 2012-06-25  Jakub Jelinek
>
>       PR tree-optimization/53748
>       * tree-ssa-phiopt.c (conditional_replacement): Only optimize
>       if arg0/arg1 have integral or pointer types.
>
>       * gcc.c-torture/compile/pr53748.c: New test.
>
> --- gcc/tree-ssa-phiopt.c.jj  2012-06-25 08:38:26.0 +0200
> +++ gcc/tree-ssa-phiopt.c     2012-06-25 12:10:23.473357828 +0200
> @@ -615,8 +615,12 @@ conditional_replacement (basic_block con
>    bool neg;
>
>    /* FIXME: Gimplification of complex type is too hard for now.  */
> -  if (TREE_CODE (TREE_TYPE (arg0)) == COMPLEX_TYPE
> -      || TREE_CODE (TREE_TYPE (arg1)) == COMPLEX_TYPE)
> +  /* We aren't prepared to handle vectors either (and it is a question
> +     if it would be worthwhile anyway).  */
> +  if (!(INTEGRAL_TYPE_P (TREE_TYPE (arg0))
> +        || POINTER_TYPE_P (TREE_TYPE (arg0)))
> +      || !(INTEGRAL_TYPE_P (TREE_TYPE (arg1))
> +           || POINTER_TYPE_P (TREE_TYPE (arg1
>      return false;
>
>    /* The PHI arguments have the constants 0 and 1, or 0 and -1, then
> --- gcc/testsuite/gcc.c-torture/compile/pr53748.c.jj  2012-06-25 11:54:15.959820372 +0200
> +++ gcc/testsuite/gcc.c-torture/compile/pr53748.c     2012-06-25 11:43:35.0 +0200
> @@ -0,0 +1,9 @@
> +/* PR tree-optimization/53748 */
> +
> +typedef unsigned int V __attribute__ ((__vector_size__ (sizeof (int) * 4)));
> +
> +void
> +foo (int x, V *y)
> +{
> +  *y = x ? ((V) { ~0U, ~0U, ~0U, ~0U }) : ((V) { 0, 0, 0, 0 });
> +}
>
>       Jakub

-- 
Richard Guenther SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer
Re: New option to turn off stack reuse for temporaries
On Mon, Jun 25, 2012 at 6:25 PM, Xinliang David Li wrote:
> Are there any more concerns about this patch? If not, I'd like to check it in.

No - the fact that the flag is C++ specific but in common.opt is odd
enough, and -ftemp-reuse-stack sounds very, very generic - which in
fact it is not; it's a no-op in C. Is there a more formal phrase for
the kind of temporary that is affected? For me "temp" is synonymous
with "auto", so I'd have expected the switch to turn off stack slot
sharing for

  { int a[5]; } { int a[5]; }

but that is not what it does.

So - a little kludgy, but probably closer to what I'd like it to be,
would be to move the option to c-family/c.opt, enabled only for C++
and Obj-C++, and export it to the middle-end via a new langhook (the
gimplifier code should be in frontend code that lowers to GENERIC
really, and the WITH_CLEANUP_EXPR code should be C++ frontend specific
...).

Thanks,
Richard.

> thanks,
>
> David
>
> On Fri, Jun 22, 2012 at 8:51 AM, Xinliang David Li wrote:
>> On Fri, Jun 22, 2012 at 2:39 AM, Richard Guenther wrote:
>>> On Fri, Jun 22, 2012 at 11:29 AM, Jason Merrill wrote:
>>>> On 06/22/2012 01:30 AM, Richard Guenther wrote:
>>>>>> What other issues? It enables more potential code motion, but on
>>>>>> the other hand causes more conservative stack reuse. As far as I
>>>>>> can tell, the handling of temporaries was added independently,
>>>>>> after the clobbers for scoped variables were introduced. This
>>>>>> option can be used to restore the older behavior (in handling
>>>>>> temps).
>>>>>
>>>>> Well, it does not really restore the old behavior (if you mean
>>>>> before adding CLOBBERs, not before the single patch that might have
>>>>> used those for gimplifying WITH_CLEANUP_EXPR). You say it disables
>>>>> stack-slot sharing for those decls, but it also does other things
>>>>> via side-effects of no longer emitting the CLOBBER. I say it's
>>>>> better to disable the stack-slot sharing.
>>>>
>>>> The patch exactly restores the behavior of temporaries from before
>>>> my change to add CLOBBERs for temporaries. The primary effect of
>>>> that change was to provide stack-slot sharing, but if there are
>>>> other effects they are probably desirable as well, since the broken
>>>> code depended on the old behavior.
>>>
>>> So you see it as a workaround option, like -fno-strict-aliasing,
>>> rather than a debugging aid?
>>
>> It can be used for both purposes -- if the violations are as pervasive
>> as strict-aliasing cases (which it looks like they are).
>>
>> thanks,
>>
>> David
>>
>>>
>>> Richard.
>>>
>>>> Jason
Re: Fix PR tree-optimization/53729 (Re: [PATCH] Fix PR tree-optimization/53636)
On Mon, Jun 25, 2012 at 5:44 PM, Ulrich Weigand wrote:
> Richard Guenther wrote:
>
>> In this testcase the alignment of arr[i] should be irrelevant - it is
>> not part of the stmts that are going to be vectorized. But of course
>> this may be simply an ordering issue in how we analyze
>> data-references / statements in basic-block vectorization (thus we
>> possibly did not yet declare the arr[i] = i statement as not taking
>> part in the vectorization).
>
> The following patch changes
> tree-vect-data-refs.c:vect_verify_datarefs_alignment
> to only take into account statements marked as "relevant".
>
> This should have no impact for loop-based vectorization, since the only
> caller (vect_enhance_data_refs_alignment) already skips irrelevant
> statements. [ As an aside, that routine should probably use
> STMT_VINFO_RELEVANT_P instead of just checking STMT_VINFO_RELEVANT as
> a boolean. ]
>
> However, for SLP this change in itself doesn't work, since
> vect_slp_analyze_bb_1 calls vect_verify_datarefs_alignment *before*
> statements have actually been marked as relevant or irrelevant.
> Therefore, the patch also rearranges the sequence in
> vect_slp_analyze_bb_1.
>
> This in turn caused ICEs in vect_get_store_cost/vect_get_load_cost,
> since those now can get called with statements with unsupported
> alignment. There is already one caller (vect_get_data_access_cost)
> that checks for this case and simply returns "infinite" cost instead
> of aborting. The patch moves this behaviour into
> vect_get_store_cost/vect_get_load_cost themselves. (The particular
> cost shouldn't matter since vectorization will still be rejected in
> the end, it's just that the test now runs a bit later.)
>
> Overall, I've tested the patch with no regressions on powerpc64-linux
> and arm-linux-gnueabi. On PowerPC, it fixed the gcc.dg/vect/bb-slp-16.c
> test.
>
> Mikael, would you mind verifying that it fixes the problem on sparc64?
>
> OK for mainline?

Ok.

Thanks,
Richard.
> Bye,
> Ulrich
>
>
> ChangeLog:
>
>       PR tree-optimization/53729
>       PR tree-optimization/53636
>       * tree-vect-slp.c (vect_slp_analyze_bb_1): Delay call to
>       vect_verify_datarefs_alignment until after statements have
>       been marked as relevant/irrelevant.
>       * tree-vect-data-refs.c (vect_verify_datarefs_alignment):
>       Skip irrelevant statements.
>       (vect_enhance_data_refs_alignment): Use STMT_VINFO_RELEVANT_P
>       instead of STMT_VINFO_RELEVANT.
>       (vect_get_data_access_cost): Do not check for supportable
>       alignment before calling vect_get_load_cost/vect_get_store_cost.
>       * tree-vect-stmts.c (vect_get_store_cost): Do not abort when
>       handling unsupported alignment.
>       (vect_get_load_cost): Likewise.
>
>
> Index: gcc/tree-vect-data-refs.c
> ===
> *** gcc/tree-vect-data-refs.c (revision 188850)
> --- gcc/tree-vect-data-refs.c (working copy)
> *** vect_verify_datarefs_alignment (loop_vec
> *** 1094,1099 ****
> --- 1094,1102 ----
>         gimple stmt = DR_STMT (dr);
>         stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
>
> +       if (!STMT_VINFO_RELEVANT_P (stmt_info))
> +         continue;
> +
>         /* For interleaving, only the alignment of the first access matters.
>            Skip statements marked as not vectorizable.  */
>         if ((STMT_VINFO_GROUPED_ACCESS (stmt_info)
> *** vect_get_data_access_cost (struct data_r
> *** 1213,1229 ****
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>     int ncopies = vf / nunits;
> -   bool supportable_dr_alignment = vect_supportable_dr_alignment (dr, true);
>
> !   if (!supportable_dr_alignment)
> !     *inside_cost = VECT_MAX_COST;
>     else
> !     {
> !       if (DR_IS_READ (dr))
> !         vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
> !       else
> !         vect_get_store_cost (dr, ncopies, inside_cost);
> !     }
>
>     if (vect_print_dump_info (REPORT_COST))
>       fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
> --- 1216,1226 ----
>     loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>     int vf = LOOP_VINFO_VECT_FACTOR (loop_vinfo);
>     int ncopies = vf / nunits;
>
> !   if (DR_IS_READ (dr))
> !     vect_get_load_cost (dr, ncopies, true, inside_cost, outside_cost);
>     else
> !     vect_get_store_cost (dr, ncopies, inside_cost);
>
>     if (vect_print_dump_info (REPORT_COST))
>       fprintf (vect_dump, "vect_get_data_access_cost: inside_cost = %d, "
> *** vect_enhance_data_refs_alignment (loop_v
> *** 1537,1543 ****
>         stmt = DR_STMT (dr);
>         stmt_info = vinfo_for_stmt (stmt);
>
> !       if (!STMT_VINFO_RELEVANT (stmt_info))
>           continue;
>
>         /* For interleaving, only the alignment of the first access
> --- 1534,1540 ----
>         stmt = DR_STMT (dr);
>         st
Re: [patch] Add a lexical block only when the callsite has source location info
On Tue, Jun 26, 2012 at 9:03 AM, Dehao Chen wrote:
> Hi, Richard,
>
> You are right, setting UNKNOWN_LOCATION will not affect the addr2line
> result. Here is the updated patch:
>
> Passed bootstrap and gcc regression tests.
>
> Is it ok for trunk?

Yes.

Thanks,
Richard.

> Thanks,
> Dehao
>
> Index: tree-inline.c
> ===
> --- tree-inline.c (revision 188926)
> +++ tree-inline.c (working copy)
> @@ -3836,8 +3836,7 @@
>    /* Set input_location here so we get the right instantiation context
>       if we call instantiate_decl from inlinable_function_p.  */
>    saved_location = input_location;
> -  if (gimple_has_location (stmt))
> -    input_location = gimple_location (stmt);
> +  input_location = gimple_location (stmt);
>
>    /* From here on, we're only interested in CALL_EXPRs.  */
>    if (gimple_code (stmt) != GIMPLE_CALL)
>
> On Mon, Jun 25, 2012 at 10:00 PM, Richard Guenther wrote:
>> On Mon, Jun 25, 2012 at 3:48 PM, Dehao Chen wrote:
>>> Hi, Richard,
>>>
>>> Thanks for the prompt response.
>>>
>>> On Mon, Jun 25, 2012 at 8:02 PM, Richard Guenther wrote:
>>>> On Mon, Jun 25, 2012 at 1:43 PM, Dehao Chen wrote:
>>>>> During function inlining, a lexical block is added for each cloned
>>>>> callee, and source info is attached to this block for addr2line to
>>>>> derive the inline stack.
>>>>
>>>> Well - the bug is then clearly
>>>>
>>>>   /* Set input_location here so we get the right instantiation context
>>>>      if we call instantiate_decl from inlinable_function_p.  */
>>>>   saved_location = input_location;
>>>>   if (gimple_has_location (stmt))
>>>>     input_location = gimple_location (stmt)
>>>>
>>>> which retains input_location instead of setting it to UNKNOWN_LOCATION.
>>>>
>>>> Not adding a BLOCK will make debug information incorrect, no?
>>>
>>> The only case I can think of where gimple_has_location is false for a
>>> call stmt is function splitting.
>>>
>>> If we have a function foo, which is split into:
>>>
>>>   foo
>>>   foo.part1
>>>
>>> then a callsite foo->foo.part1 is created in foo.
>>>
>>> If ipa-inline decides to inline this callsite, an instruction in
>>> foo.part1 will have an inline stack of size 2. In the original buggy
>>> code, the bottom of the inline stack will be random. Using your
>>> proposed approach, the bottom of the inline stack would be
>>> UNKNOWN_LOCATION, but the stack still has two levels. For function
>>> splitting, this inlining should not create any lexical block, but
>>> resume the original lexical block from before the split. Thus my
>>> change simply does not add a new lexical block. Do you think this
>>> makes sense?
>>
>> I don't think it behaves sensibly for any other call without a location.
>> Basically you assume that this only happens for split functions but
>> I don't see why that should be true. Why would BLOCKs with
>> UNKNOWN_LOCATION have any effect on addr2line anyway?
>> That seems to be something to check and fix.
>>
>> Richard.
>>
>>> Thanks,
>>> Dehao
>>>
>>>>> However, some callsites do not have source information attached to
>>>>> them. Adding a lexical block would be misleading in this case. E.g.
>>>>> if a function is split, when the split callsite is inlined back, the
>>>>> cloned callee should stay in the same lexical block as its caller.
>>>>> This patch ensures that lexical blocks are only added when the
>>>>> callsite has source location info in it.
>>>>>
>>>>> Bootstrapped and passed gcc regression tests.
>>>>>
>>>>> Is it ok for trunk?
>>>>
>>>> I'd rather see an unconditional set of input_location from
>>>> gimple_location of the statement.
>>>>
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>> Dehao
>>>>>
>>>>> gcc/ChangeLog:
>>>>> 2012-06-25  Dehao Chen
>>>>>
>>>>>         * tree-profile.c: (expand_call_inline): Make a new lexical block only
>>>>
>>>>                ^ tree-inline.c
>>>>
>>>>>         when the call stmt has source location.
>>>>>
>>>>> Index: gcc/tree-inline.c
>>>>> ===
>>>>> --- gcc/tree-inline.c (revision 188926)
>>>>> +++ gcc/tree-inline.c (working copy)
>>>>> @@ -3950,10 +3950,17 @@
>>>>>       actual inline expansion of the body, and a label for the return
>>>>>       statements within the function to jump to. The type of the
>>>>>       statement expression is the return type of the function call.  */
>>>>> -  id->block = make_node (BLOCK);
>>>>> -  BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>>>>> -  BLOCK_SOURCE_LOCATION (id->block) = input_location;
>>>>> -  prepend_lexical_block (gimple_block (stmt), id->block);
>>>>> +  if (gimple_has_location (stmt))
>>>>> +    {
>>>>> +      id->block = make_node (BLOCK);
>>>>> +      BLOCK_ABSTRACT_ORIGIN (id->block) = fn;
>>>>> +      BLOCK_SOURCE_LOCATION (id->block) = input_location;
>>>>
>>>> Please use gimple_location (stmt) instead of input_location (yes, I
>>>> realize its set from that).
>>>>
>>>>> +      prepend_lexical_block (gimple_block (stmt
Re: [PATCH][configure] Make sure CFLAGS_FOR_TARGET And CXXFLAGS_FOR_TARGET contain -O2
On 25.06.2012 17:24, Joseph S. Myers wrote:
> On Mon, 25 Jun 2012, Christophe Lyon wrote:
>
>> Ping?
>
> I advise CCing appropriate maintainers (in this case, build system
> maintainers) on pings.

Ping again, CCing build system maintainers as suggested by Joseph.

(BTW I'm proposing to modify code which was last modified by Paolo
Bonzini according to git blame.)

Christophe.
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On Mon, Jun 25, 2012 at 10:46 AM, Eric Botcazou wrote:
> It looks like DONT_USE_BUILTIN_SETJMP was invented for the IA-64,
> probably because of the Register Stack Engine. But SPARC also has
> register windows, albeit less sophisticated, and can do without it.
> Moreover, the Ada compiler uses __builtin_setjmp/__builtin_longjmp
> internally on the host (EH is used for error recovery) and is known to
> work fine on IA-64/Linux, HP-UX and VMS.
>
> In the end, it would appear that DONT_USE_BUILTIN_SETJMP was a quick
> trick to solve specific issues that could very likely have been solved
> otherwise. We should probably keep it for the sake of IA-64 and get
> rid of it for all other architectures, documenting that it isn't to be
> used in normal circumstances.

ia64 is now the only remaining architecture that defines
DONT_USE_BUILTIN_SETJMP and JMP_BUF_SIZE. If __builtin_setjmp actually
does work for ia64, why should we keep DONT_USE_BUILTIN_SETJMP? Could
it break user code somehow?

Ciao!
Steven
Commit: RX: Fix simple_return pattern
Hi Guys,

I am applying the patch below to the 4.7 branch. It fixes the
simple_return pattern in the RX backend so that it uses the
(simple_return) rtl. This fixes 59 gcc testsuite failures and
introduces no regressions.

I plan on applying a similar patch to the mainline sources once I have
finished regression testing them.

Cheers
Nick

gcc/ChangeLog
2012-06-26  Nick Clifton

        * config/rx/rx.md (simple_return): Use the simple_return rtl.

Index: gcc/config/rx/rx.md
===
--- gcc/config/rx/rx.md (revision 188932)
+++ gcc/config/rx/rx.md (working copy)
@@ -348,7 +348,7 @@
 )
 
 (define_insn "simple_return"
-  [(return)]
+  [(simple_return)]
   ""
   "rts"
   [(set_attr "length" "1")
Re: [patch] [gcc/libgcc/ada/libstdc++] Match arm*-*-linux-*eabi* for ARM Linux/GNU EABI
On 25/06/12 22:30, Matthias Klose wrote:
> On 25.06.2012 18:21, Matthias Klose wrote:
>> On 25.06.2012 15:22, Richard Earnshaw wrote:
>>> On 25/06/12 13:08, Matthias Klose wrote:
>>>> gcc/config.gcc now allows matching arm*-*-linux-*eabi* instead of
>>>> arm*-*-linux-*eabi for ARM Linux/GNU EABI. This changes the matching
>>>> in various other places as well. arm-linux-gnueabihf is used as a
>>>> triplet by some distributions. Ok for the trunk?
>>>
>>> now that all arm-linux ports are EABI conforming, why can't this just
>>> become
>>>
>>>   arm*-*-linux*
>>> ?
>>
>> I assume it could. But I didn't check for other places where this
>> would be needed.
>
> $ grep -r 'arm[^_]*eabi' . | egrep -v 'ChangeLog|\.svn/' | wc -l
> 87
>
> this seems to be a larger change, which should have been committed when
> the old abi targets were deprecated. I'd like to get the eabi* change
> in first.
>
> Matthias

Removal of the FPA support is still ongoing, but beware that it doesn't
mean that all supported ARM configurations will be EABI conforming
(some configurations did not use the FPA and are thus not affected by
this change); but all ARM Linux configurations will be.

R.
[PR other/33190] target macro documentation fixes [1/n]
Hello,

I'm going through the comments in http://gcc.gnu.org/PR33190
one-by-one. This patch addresses comment #0. OK for trunk?

Ciao!
Steven

pr33190_1.diff
Description: Binary data
Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc
On Tue, Jun 26, 2012 at 2:43 AM, Iain Buclaw wrote:
> On 19 June 2012 17:08, Steven Bosscher wrote:
>> BTW you also include output.h in those two files, and I am about two
>> patches away from adding output.h to the list of headers that no
>> front end should ever include (a front end should never have to write
>> out assembly). Can you please check what you need output.h for, and
>> fix this?
>>
>> What are you calling targetm.asm_out.output_mi_thunk and
>> targetm.asm_out.generate_internal_label for? Thunks and aliases
>> should go through cgraphunit.
>>
>> (NB: This also means that this front end cannot work with LTO. IMHO
>> we shouldn't let in new front ends that don't work with LTO.)
>
> I tried switching, but unfortunately it broke the code generation of D
> thunks. D requires any class that inherits from an interface to output
> thunks for that interface regardless of how far back it is and whether
> it is external to the current module. However, the GCC backend, as far
> as I can tell, only outputs aliases when the function is output as
> well. So if there is no function to output, no thunk will be
> generated. This leaves a handful of undefined references to thunks
> that were defined in the D frontend but discarded by the backend.
> Whereas forcing the output using output_mi_thunk means the thunk is
> emitted and does not cause issues.
>
> So removing output.h will be a bit of a problem for me with no obvious
> way around it.

The obvious way around it is to work with this community to fix the
problem. You will want to talk with Jan Hubicka about this, see if he
can help.

Ciao!
Steven
Re: [PR other/33190] target macro documentation fixes [1/n]
On Tue, Jun 26, 2012 at 11:26 AM, Steven Bosscher wrote:
> Hello,
>
> I'm going through the comments in http://gcc.gnu.org/PR33190
> one-by-one. This patch addresses comment #0. OK for trunk?

Ok.

Thanks,
Richard.

> Ciao!
> Steven
[graphite] RFC: Add ISL variants of remaining PPL things
Hi,

so, to make progress on the graphite front we want to get rid of the
ppl dependency. We start from Tobi's move-to-isl-and-isl-scheduler
branch at github, merged current trunk into that (see also Richi's
mails about cloog/isl configury), and add ISL implementations of the
essential parts that are still ppl-only. These are the
number-of-iterations analysis for potentially transformed iteration
spaces (this wasn't mentioned in Tobi's mail from February) and the
interchange heuristic (basically stride calculation for transformed
spaces).

So I learned PPL and ISL and rewrote that part :) Like in the patch
below. I'm not asking for any proper review of this, but rather
whether the isl code makes sort of sense. The testsuite works, except
for pr43012.c, which exposes the fact that the current PPL nr-iter
implementation is buggy (it computes the nr-iter of a 77<=i<=88 loop
to be 65), so that testcase runs into the nr-iter-ppl==nr-iter-isl
assert.

In a private branch I have already removed all PPL code whatsoever
(and cleaned up the ISL code I added), so we are making good progress
there.

Ciao,
Michael.

diff --git a/gcc/graphite-interchange.c b/gcc/graphite-interchange.c
index fa38f5c..96a7967 100644
--- a/gcc/graphite-interchange.c
+++ b/gcc/graphite-interchange.c
@@ -24,9 +24,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "config.h"
 
 #ifdef HAVE_cloog
+#include
 #include
 #include
 #include
+#include
 #include
 #include
 #endif
@@ -64,7 +66,7 @@ along with GCC; see the file COPYING3.  If not see
    c_0 * s_0 + c_1 * s_1 + ... c_n * s_n.  */
 
 static ppl_Linear_Expression_t
-build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr)
+ppl_build_linearized_memory_access (ppl_dimension_type offset, poly_dr_p pdr)
 {
   ppl_Linear_Expression_t res;
   ppl_Linear_Expression_t le;
@@ -189,7 +191,7 @@ build_partial_difference (ppl_Pointset_Powerset_C_Polyhedron_t *p,
    the loop at DEPTH.  */
 
 static void
-pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
+ppl_pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
 {
   ppl_dimension_type time_depth;
   ppl_Linear_Expression_t le, lma;
@@ -245,7 +247,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
 
   /* Construct the 0|0|0|0|0|S|0|l1|0 part.  */
   {
-    lma = build_linearized_memory_access (offset + dim_sctr, pdr);
+    lma = ppl_build_linearized_memory_access (offset + dim_sctr, pdr);
     ppl_set_coef (lma, dim_L1, -1);
     ppl_new_Constraint (&new_cstr, lma, PPL_CONSTRAINT_TYPE_EQUAL);
     ppl_Pointset_Powerset_C_Polyhedron_add_constraint (p1, new_cstr);
@@ -310,6 +312,7 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
       ppl_max_for_le_pointset (p1, le, stride);
     }
 
+#if 0
   if (dump_file && (dump_flags & TDF_DETAILS))
     {
       char *str;
@@ -322,12 +325,159 @@ pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
       mp_get_memory_functions (NULL, NULL, &gmp_free);
       (*gmp_free) (str, strlen (str) + 1);
     }
+#endif
 
   ppl_delete_Pointset_Powerset_C_Polyhedron (p1);
   ppl_delete_Pointset_Powerset_C_Polyhedron (p2);
   ppl_delete_Linear_Expression (le);
 }
 
+static isl_constraint *
+build_linearized_memory_access (isl_map *map, poly_dr_p pdr)
+{
+  isl_constraint *res;
+  isl_local_space *ls = isl_local_space_from_space (isl_map_get_space (map));
+  unsigned offset, nsubs;
+  int i;
+  isl_int size, subsize;
+
+  res = isl_equality_alloc (ls);
+  isl_int_init (size);
+  isl_int_set_ui (size, 1);
+  isl_int_init (subsize);
+  isl_int_set_ui (subsize, 1);
+
+  nsubs = isl_set_dim (pdr->extent, isl_dim_set);
+  /* -1 for the already included L dimension.  */
+  offset = isl_map_dim (map, isl_dim_out) - 1 - nsubs;
+  res = isl_constraint_set_coefficient_si (res, isl_dim_out, offset + nsubs, -1);
+  /* Go through all subscripts from last to first.  First dimension
+     is the alias set, ignore it.  */
+  for (i = nsubs - 1; i >= 1; i--)
+    {
+      isl_space *dc;
+      isl_aff *aff;
+      enum isl_lp_result lpres;
+
+      res = isl_constraint_set_coefficient (res, isl_dim_out, offset + i, size);
+
+      dc = isl_set_get_space (pdr->extent);
+      aff = isl_aff_zero_on_domain (isl_local_space_from_space (dc));
+      aff = isl_aff_set_coefficient_si (aff, isl_dim_in, i, 1);
+      lpres = isl_set_max (pdr->extent, aff, &subsize);
+      isl_aff_free (aff);
+      isl_int_mul (size, size, subsize);
+    }
+
+  isl_int_clear (subsize);
+  isl_int_clear (size);
+
+  return res;
+}
+
+static void
+pdr_stride_in_loop (mpz_t stride, graphite_dim_t depth, poly_dr_p pdr)
+{
+  poly_bb_p pbb = PDR_PBB (pdr);
+  isl_map *map;
+  isl_set *set;
+  isl_aff *aff;
+  isl_space *dc;
+  isl_constraint *lma, *c;
+  isl_int islstride;
+  enum isl_lp_result lpres;
+  graphite_dim_t time_depth;
+  unsigned offset, nt;
+  unsigned i;
+  /* pdr->accesses:[P1..nb_param,I1..nb_domain
PR lto/53604 (incorrectly optmized out comdats)
Hi,

this patch solves a problem where comdat symbols are incorrectly
optimized out even if used from an object file. With the V1 plugin API
this will lead to unnecessary symbols kept in the object file. I am not
sure how important it is, since they are not that common. I will
backport this to 4.7 if Mozilla builds well with the V1 API.

Honza

Bootstrapped/regtested x86_64-linux, committed.

        PR lto/53572
        * cgraph.h (varpool_can_remove_if_no_refs): Fix handling of
        used symbols.

--- trunk/gcc/cgraph.h  2012/06/26 10:13:11     188981
+++ trunk/gcc/cgraph.h  2012/06/26 10:15:18     188982
@@ -1126,7 +1126,8 @@
   if (DECL_EXTERNAL (node->symbol.decl))
     return true;
   return (!node->symbol.force_output && !node->symbol.used_from_other_partition
-         && (DECL_COMDAT (node->symbol.decl)
+         && ((DECL_COMDAT (node->symbol.decl)
+              && !symtab_used_from_object_file_p ((symtab_node) node))
              || !node->symbol.externally_visible
              || DECL_HAS_VALUE_EXPR_P (node->symbol.decl)));
 }
Re: ping x3 : latch mem to reg before multi-access in convert_move
Hello Richard,

On Jun 25, 2012, at 20:36, Richard Henderson wrote:

> Can you explain the sequence that results in multiple memory
> references? Because I can't see how it can...

Compared to the point at which we were observing the problem, the code
has changed a bit in this area, though not so much. In the current
mainline, this would be from the two uses of "lowfrom" in:

      /* Get a copy of FROM widened to a word, if necessary.  */
      ...
      lowfrom = convert_to_mode (lowpart_mode, from, unsignedp);

      lowpart = gen_lowpart (lowpart_mode, to);
==>   emit_move_insn (lowpart, lowfrom);

      /* Compute the value to put in each remaining word.  */
      if (unsignedp)
        fill_value = const0_rtx;
      else
        fill_value = emit_store_flag (gen_reg_rtx (word_mode),
==>                                   LT, lowfrom, const0_rtx,
                                      VOIDmode, 0, -1);
[PATCH] Fix PR53676
This fixes PR53676, extending SCEV analysis to analyze truncated
operations by analyzing the operation on truncated operands instead.

Bootstrapped and tested on x86_64-unknown-linux-gnu, I plan to install
this tomorrow.

Richard.

2012-06-26  Richard Guenther

        PR middle-end/53676
        * tree-chrec.c (chrec_convert_1): Represent truncation to
        a type with undefined overflow as truncation to an unsigned
        type converted to the type with undefined overflow.
        * tree-scalar-evolution.c (interpret_rhs_expr): For computing
        the scalar evolution of a truncated widened operation avoid
        looking at the non-existing evolution of the widened
        operation result.

        * gcc.dg/tree-ssa/scev-6.c: New testcase.

Index: gcc/tree-chrec.c
===
*** gcc/tree-chrec.c    (revision 188927)
--- gcc/tree-chrec.c    (working copy)
*** keep_cast:
*** 1365,1370 ****
--- 1365,1387 ----
        res = fold_build2 (TREE_CODE (chrec), type,
                           fold_convert (type, TREE_OPERAND (chrec, 0)),
                           fold_convert (type, TREE_OPERAND (chrec, 1)));
+   /* Similar perform the trick that (signed char)((int)x + 2) can be
+      narrowed to (signed char)((unsigned char)x + 2).  */
+   else if (use_overflow_semantics
+            && TREE_CODE (chrec) == POLYNOMIAL_CHREC
+            && TREE_CODE (ct) == INTEGER_TYPE
+            && TREE_CODE (type) == INTEGER_TYPE
+            && TYPE_OVERFLOW_UNDEFINED (type)
+            && TYPE_PRECISION (type) < TYPE_PRECISION (ct))
+     {
+       tree utype = unsigned_type_for (type);
+       res = build_polynomial_chrec (CHREC_VARIABLE (chrec),
+                                     fold_convert (utype,
+                                                   CHREC_LEFT (chrec)),
+                                     fold_convert (utype,
+                                                   CHREC_RIGHT (chrec)));
+       res = chrec_convert_1 (type, res, at_stmt, use_overflow_semantics);
+     }
    else
      res = fold_convert (type, chrec);
Index: gcc/tree-scalar-evolution.c
===
*** gcc/tree-scalar-evolution.c (revision 188927)
--- gcc/tree-scalar-evolution.c (working copy)
*** interpret_rhs_expr (struct loop *loop, g
*** 1634,1639 ****
--- 1634,1640 ----
                      tree type, tree rhs1, enum tree_code code, tree rhs2)
  {
    tree res, chrec1, chrec2;
+   gimple def;
  
    if (get_gimple_rhs_class (code) == GIMPLE_SINGLE_RHS)
      {
*** interpret_rhs_expr (struct loop *loop, g
*** 1759,1765 ****
        break;
  
      CASE_CONVERT:
!       chrec1 = analyze_scalar_evolution (loop, rhs1);
        res = chrec_convert (type, chrec1, at_stmt);
        break;
  
--- 1760,1788 ----
        break;
  
      CASE_CONVERT:
!       /* In case we have a truncation of a widened operation that in
!          the truncated type has undefined overflow behavior analyze
!          the operation done in an unsigned type of the same precision
!          as the final truncation.  We cannot derive a scalar evolution
!          for the widened operation but for the truncated result.  */
!       if (TREE_CODE (type) == INTEGER_TYPE
!           && TREE_CODE (TREE_TYPE (rhs1)) == INTEGER_TYPE
!           && TYPE_PRECISION (type) < TYPE_PRECISION (TREE_TYPE (rhs1))
!           && TYPE_OVERFLOW_UNDEFINED (type)
!           && TREE_CODE (rhs1) == SSA_NAME
!           && (def = SSA_NAME_DEF_STMT (rhs1))
!           && is_gimple_assign (def)
!           && TREE_CODE_CLASS (gimple_assign_rhs_code (def)) == tcc_binary
!           && TREE_CODE (gimple_assign_rhs2 (def)) == INTEGER_CST)
!         {
!           tree utype = unsigned_type_for (type);
!           chrec1 = interpret_rhs_expr (loop, at_stmt, utype,
!                                        gimple_assign_rhs1 (def),
!                                        gimple_assign_rhs_code (def),
!                                        gimple_assign_rhs2 (def));
!         }
!       else
!         chrec1 = analyze_scalar_evolution (loop, rhs1);
        res = chrec_convert (type, chrec1, at_stmt);
        break;
Index: gcc/testsuite/gcc.dg/tree-ssa/scev-6.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/scev-6.c      (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/scev-6.c      (revision 0)
*** 0 ****
--- 1,23 ----
+ /* { dg-do run } */
+ /* { dg-options "-O2 -fdump-tree-optimized" } */
+ 
+ int main()
+ {
+   int i;
+   signed char result = 0;
+   for (i = 0; i != 8000; ++i)
+     {
+       int tem = result;
+       tem = tem + 2;
+       result = tem;
+     }
+   if (__builtin_abs ((int)(signed char)((unsigned char) result + 128)) != 0)
+     __builtin_abort ();
+   return 0;
+ }
+ 
+ /* SCEV constant propagation should be able to compute the overall effect
+    of the loop.  */
+ 
+ /* { dg-final { scan-tree-dump-not
[PR other/33190] target macro documentation fixes [2/n]
On Tue, Jun 26, 2012 at 11:26 AM, Steven Bosscher wrote: > Hello, > > I'm going through the comments in http://gcc.gnu.org/PR33190 > one-by-one. This patch addresses comment #0. OK for trunk? This is for comment #1: PR other/33190 * doc/tm.texi.in: Document LOGICAL_OP_NON_SHORT_CIRCUIT. * doc/tm.texi: Regenerate. Index: doc/tm.texi.in === --- doc/tm.texi.in (revision 188983) +++ doc/tm.texi.in (working copy) @@ -6342,6 +6342,12 @@ Define this macro if it is as good or better to ca function address than to call an address kept in a register. @end defmac +@defmac LOGICAL_OP_NON_SHORT_CIRCUIT +Define this macro if a non-short-circuit operation produced by +@samp{fold_range_test ()} is optimal. This macro defaults to true if +@code{BRANCH_COST} is greater than or equal to the value 2. +@end defmac + @hook TARGET_RTX_COSTS This target hook describes the relative costs of RTL expressions. Index: doc/tm.texi === --- doc/tm.texi (revision 188983) +++ doc/tm.texi (working copy) @@ -6414,6 +6414,12 @@ Define this macro if it is as good or better to ca function address than to call an address kept in a register. @end defmac +@defmac LOGICAL_OP_NON_SHORT_CIRCUIT +Define this macro if a non-short-circuit operation produced by +@samp{fold_range_test ()} is optimal. This macro defaults to true if +@code{BRANCH_COST} is greater than or equal to the value 2. +@end defmac + @deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, int @var{code}, int @var{outer_code}, int @var{opno}, int *@var{total}, bool @var{speed}) This target hook describes the relative costs of RTL expressions. The patch merely fixes the missed target macro renaming doc change for http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01473.html. Committed as obvious. Ciao! Steven
Re: [PR other/33190] target macro documentation fixes [2/n]
On Tue, Jun 26, 2012 at 1:28 PM, Steven Bosscher wrote: > On Tue, Jun 26, 2012 at 11:26 AM, Steven Bosscher > wrote: >> Hello, >> >> I'm going through the comments in http://gcc.gnu.org/PR33190 >> one-by-one. This patch addresses comment #0. OK for trunk? Ok. Thanks, Richard. > This is for comment #1: > > PR other/33190 > * doc/tm.texi.in: Document LOGICAL_OP_NON_SHORT_CIRCUIT. > * doc/tm.texi: Regenerate. > > Index: doc/tm.texi.in > === > --- doc/tm.texi.in (revision 188983) > +++ doc/tm.texi.in (working copy) > @@ -6342,6 +6342,12 @@ Define this macro if it is as good or better to ca > function address than to call an address kept in a register. > @end defmac > > +@defmac LOGICAL_OP_NON_SHORT_CIRCUIT > +Define this macro if a non-short-circuit operation produced by > +@samp{fold_range_test ()} is optimal. This macro defaults to true if > +@code{BRANCH_COST} is greater than or equal to the value 2. > +@end defmac > + > @hook TARGET_RTX_COSTS > This target hook describes the relative costs of RTL expressions. > > Index: doc/tm.texi > === > --- doc/tm.texi (revision 188983) > +++ doc/tm.texi (working copy) > @@ -6414,6 +6414,12 @@ Define this macro if it is as good or better to ca > function address than to call an address kept in a register. > @end defmac > > +@defmac LOGICAL_OP_NON_SHORT_CIRCUIT > +Define this macro if a non-short-circuit operation produced by > +@samp{fold_range_test ()} is optimal. This macro defaults to true if > +@code{BRANCH_COST} is greater than or equal to the value 2. > +@end defmac > + > @deftypefn {Target Hook} bool TARGET_RTX_COSTS (rtx @var{x}, int > @var{code}, int @var{outer_code}, int @var{opno}, int *@var{total}, > bool @var{speed}) > This target hook describes the relative costs of RTL expressions. > > The patch merely fixes the missed target macro renaming doc change for > http://gcc.gnu.org/ml/gcc-patches/2004-11/msg01473.html. > Committed as obvious. > > Ciao! > Steven
[PATCH][C++] Fix PR53752
This fixes PR53752 - reinstantiating the behavior for an array with domain [0, USIZE_MAX] during mangling (mangle it as zero-size array). Bootstrapped and tested on x86_64-unknown-linux-gnu, ok for trunk and the 4.7 branch? Thanks, Richard. 2012-06-26 Richard Guenther PR c++/53752 * mangle.c (write_array_type): Truncate the number-of-elements result. * g++.dg/torture/pr53752.C: New testcase. Index: gcc/cp/mangle.c === --- gcc/cp/mangle.c (revision 188927) +++ gcc/cp/mangle.c (working copy) @@ -3121,6 +3121,9 @@ write_array_type (const tree type) elements in the array, not the largest allowed index. */ double_int dmax = double_int_add (tree_to_double_int (max), double_int_one); + /* Truncate the result - this will mangle [0, SIZE_INT_MAX] +number of elements as zero. */ + dmax = double_int_zext (dmax, TYPE_PRECISION (TREE_TYPE (max))); gcc_assert (double_int_fits_in_uhwi_p (dmax)); write_unsigned_number (dmax.low); } Index: gcc/testsuite/g++.dg/torture/pr53752.C === --- gcc/testsuite/g++.dg/torture/pr53752.C (revision 0) +++ gcc/testsuite/g++.dg/torture/pr53752.C (revision 0) @@ -0,0 +1,156 @@ +// { dg-do compile } +// { dg-options "-g" } + +typedef unsigned int uint32_t; +typedef unsigned long int uint64_t; +namespace mpl_ { +template< typename T, T N > struct integral_c { + static const T value = N; +}; +} +namespace mpl { +using namespace mpl_; +}; +template struct integral_constant : public mpl::integral_c { +typedef integral_constant type; +}; +template< typename T > struct is_lvalue_reference : public ::integral_constant { +}; +template< typename T > struct is_rvalue_reference : public ::integral_constant { +}; +namespace type_traits { +template struct ice_or; +template <> struct ice_or { + static const bool value = false; +}; +} +template struct is_reference_impl { +static const bool value = (::type_traits::ice_or< ::is_lvalue_reference::value, ::is_rvalue_reference::value >::value) ; +}; +template< typename T > struct is_reference : public 
::integral_constant::value> { +}; +struct na { +}; +namespace mpl { +template< bool C , typename T1 , typename T2 > struct if_c { + typedef T2 type; +}; +template< typename T1 = na , typename T2 = na , typename T3 = na > struct if_ { + typedef if_c< static_cast(T1::value) , T2 , T3 > almost_type_; + typedef typename almost_type_::type type; +}; +} +namespace optional_detail { +template struct types_when_isnt_ref { + typedef T & reference_type ; +} +; +template struct types_when_is_ref { +} +; +struct optional_tag { +} +; +template class optional_base : public optional_tag { + typedef types_when_isnt_ref types_when_not_ref ; + typedef types_when_is_ref types_when_ref ; +protected : + typedef typename is_reference::type is_reference_predicate ; + typedef typename mpl::if_::type types ; + typedef typename types::reference_type reference_type ; +} +; +} +template class optional : public optional_detail::optional_base { +typedef optional_detail::optional_base base ; +public : +typedef typename base::reference_type reference_type ; +reference_type operator *() { +} +}; +namespace noncopyable_ { +class noncopyable { +}; +} +typedef noncopyable_::noncopyable noncopyable; +template class shared_ptr { +public: +T * operator-> () const { +} +}; +typedef uint64_t block_address; +class transaction_manager : noncopyable { +public: +typedef shared_ptr ptr; +}; +template class NoOpRefCounter { +}; +struct uint64_traits { +}; +namespace btree_detail { +class shadow_spine : private noncopyable { +public: + shadow_spine(transaction_manager::ptr tm) : tm_(tm) { + } + transaction_manager::ptr tm_; +}; +} +template class btree { +public: +typedef shared_ptr > ptr; +typedef uint64_t key[Levels]; +typedef typename ValueTraits::value_type value_type; +typedef optional maybe_value; +btree(typename transaction_manager::ptr tm, typename ValueTraits::ref_counter rc); +maybe_value lookup(key const &key) const; +void insert(key const &key, typename ValueTraits::value_type const &value); 
+templatebool insert_location(btree_detail::shadow_spine &spine, block_address block, uint64_t key, int *index); +typename transaction_manager::ptr tm_; +block_address root_; +typename ValueTraits::ref_counter rc_; +}; +template void btree:: insert(key const &key,typename ValueTraits::value_type const &value) { +
Re: [wwwdocs] Update coding conventions for C++
Hi, On Mon, Jun 25, 2012 at 03:26:01PM -0700, Lawrence Crowl wrote: ... > > I have no idea. I don't use emacs. The two-space rule for members > > comes from the wiki. The one-space rule for protection labels is > > common practice. If folks want something else, changes are fine > > with me. I'll also need an emacs C++ indentation style that conforms to this in order to really be able to produce complying code myself. So if anybody else will be working on that, I'm interested (to use it and perhaps help crafting it) and I guess a number of other people on this list are too... Furthermore, even though this mainly reminds me how little I actually know about C++, I have a few general tiny comments: > -Miscellaneous Conventions > - > -Code should use gcc_assert(EXPR) to check invariants. > -Use gcc_unreachable() to mark places that should never be > -reachable (such as an unreachable default case of a > -switch). Do not use gcc_assert(0) for such purposes, as > -gcc_unreachable gives the compiler more information. The > -assertions are enabled unless explicitly configured off with > ---enable-checking=none. Do not use abort. > -User input should never be validated by either gcc_assert > -or gcc_unreachable. If the checks are expensive or the > -compiler can reasonably carry on after the error, they may be > -conditioned on --enable-checking. It seems we should mention gcc_checking_assert here then. ... > + > +Single inheritance is permitted. > +Use public inheritance to describe interface inheritance, > +i.e. 'is-a' relationships. > +Use private and protected inheritance > +to describe implementation inheritance. > +Implementation inheritance can be expediant, s/expediant/expedient/ > +but think twice before using it in code > +intended to last a long time. I think all committed code should be expected to have long-lasting quality. I would not encourage people to think otherwise and would drop the "long time" reference here. 
If anybody ever commits something ugly to bridge a short time period, it should only be done under the "maintainers grant exceptions" rule anyway. ... > +> + > +For long-term code, at least for now, > +we will continue to use printf style I/O > +rather thanstyle I/O. > +For quick debugging code, > + is permitted. > + Similarly here, no quick and dirty debugging output should ever be committed, we should not > +The Standard Library > + > + > +At present, C++ provides no great advantage for i18n. > +GCC does type checking for printf arguments, > +so the type insecurity of printf is moot, > +but the clarity in layout persists. > +For quick debugging output, requires less work. > + The same applies here. Thanks, Martin > + > +Formatting Conventions > + > +Names > + > + > +Naming data members with a trailing underscore > +highlights the extra overhead of access to fields over local variables. > +Think of the trailing underscore > +like you would Pascal's postfix ^ dereference operator. > + > + > + > +When using the above convention, > +the constructor parameter names > +and getter member function names > +can use the more concise non-underscore form. > + > + > + > + > > -- > Lawrence Crowl
Re: [PATCH] Adjust call stmt cost for tailcalls
On Sat, 23 Jun 2012, Jan Hubicka wrote: > > > > Tailcalls have no argument setup cost and no return value cost. > > This patch adjusts estimate_num_insns to reflect that. > > > > Honza, does this look correct? > > > > Bootstrapped and tested on x86_64-unknown-linux-gnu. > > > > Thanks, > > Richard. > > > > 2012-06-20 Richard Guenther > > > > * tree-inline.c (estimate_num_insns): Estimate call cost for > > tailcalls properly. > > Well, as discussed offline, this change should currently be a no-op since > we discover tail calls only very late in the game. > > I am not sure I agree that the argument costs are zeroed out. I.e. the > arguments are generally passed the same way as they would be for a normal call, > just they are homed at a different place in the stack frame. > (note that we mark far more calls with the tail call flag than are > really expanded as tail calls, because target limitations are checked only > in calls.c). > > Finally, ipa-cp uses estimate_move_cost to estimate savings for propagating, > and I think there is a risk of arriving at negative numbers when costs > are not accounted for at all calls. > > So I am not sure we want to keep the patch in mainline in the current form... I have reverted the patch. Richard.
Re: [Patch] PR 51938: extend ifcombine
On Sat, Jun 23, 2012 at 7:18 PM, Marc Glisse wrote: > Hello, > > thanks for looking into the patch. A couple more details now that I am back > from a conference: > > > On Wed, 20 Jun 2012, Marc Glisse wrote: > >> On Wed, 20 Jun 2012, Richard Guenther wrote: >> >>> On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse >>> wrote: Hello, currently, tree-ssa-ifcombine handles pairs of imbricated "if"s that share the same then branch, or the same else branch. There is no particular reason why it couldn't also handle the case where the then branch of one is the else branch of the other, which is what I do here. Any comments? >>> >>> >>> The general idea looks good, but I think the patch is too invasive. As >>> far >>> as I can see the only callers with a non-zero 'inv' argument come from >>> ifcombine_ifnotorif and ifcombine_ifnotandif (and both with inv == 2). > > > The idea of also supporting inv==1 or inv==3 is for uniformity, and because > we might at some point want to get rid of the 'or' function and implement > everything in terms of the 'and' version, with suitably inverted tests. > > >>> I would rather see a more localized patch that makes use of >>> invert_tree_comparison to perform the inversion on the call arguments >>> of maybe_fold_and/or_comparisons. > > > I would rather have done that as well, and as a matter of fact that is what > the first version of the patch did, until I realized that -ftrapping-math > was the default. > > >>> Is there any reason that would not work? >> >> >> invert_tree_comparison is useless for floating point (the case I am most >> interested in) unless we specify -fno-trapping-math (writing this patch >> taught me to add this flag to my default flags, but I can't expect everyone >> to do the same). An issue is that gcc mixes the behaviors of qnan and snan >> (it is not really an issue, it just means that !(comparison) can't be >> represented as comparison2). 
>> >>> At least >>> >>> + if (inv & 1) >>> + lcompcode2 = COMPCODE_TRUE - lcompcode2; >>> >>> looks as if it were not semantically correct - you cannot simply invert >>> floating-point comparisons (see the restrictions invert_tree_comparison >>> has). >> >> >> I don't remember all details, but I specifically thought of that, and the >> trapping behavior is handled a few lines below. > > > More specifically, it has (was already there, I didn't add it): > if (!honor_nans) > ... > else if (flag_trapping_math) > { > /* Check that the original operation and the optimized ones will trap > under the same condition. */ > bool ltrap = (lcompcode & COMPCODE_UNORD) == 0 > && (lcompcode != COMPCODE_EQ) > && (lcompcode != COMPCODE_ORD); > ... same for rtrap and trap > > /* In a short-circuited boolean expression the LHS might be > such that the RHS, if evaluated, will never trap. For > example, in ORD (x, y) && (x < y), we evaluate the RHS only > if neither x nor y is NaN. (This is a mixed blessing: for > example, the expression above will never trap, hence > optimizing it to x < y would be invalid). */ > if ((code == TRUTH_ORIF_EXPR && (lcompcode & COMPCODE_UNORD)) > || (code == TRUTH_ANDIF_EXPR && !(lcompcode & COMPCODE_UNORD))) > rtrap = false; > > /* If the comparison was short-circuited, and only the RHS > trapped, we may now generate a spurious trap. */ > if (rtrap && !ltrap > && (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)) > return NULL_TREE; > > /* If we changed the conditions that cause a trap, we lose. */ > if ((ltrap || rtrap) != trap) > ... > > which presumably ensures that the trapping behavior is preserved (I'll have > to double-check that I didn't break that logic). I was more concerned about the behavior with NaNs in general where x < y is not equal to x >= y. 
Now combine_comparison handles only the very specific case of logical and or or of two comparisons with the same operands, you basically make it handle && ~ and || ~, too (and ~ && ~ and ~ || ~), so it seems that optionally inverting the result of the 2nd operand should be enough making the interface prettier compared to the bitmask. With the following change this should be installed separately - it's a functional change already now /* If we changed the conditions that cause a trap, we lose. */ if ((ltrap || rtrap) != trap) + { + /* Try the inverse condition. */ + compcode = COMPCODE_TRUE - compcode; + exchg = true; + trap = (compcode & COMPCODE_UNORD) == 0 + && (compcode != COMPCODE_EQ) + && (compcode != COMPCODE_ORD); + } + if ((ltrap || rtrap) != trap) return NULL_TREE; } ... + ret = fold_build2_loc (loc, tcode, truth_type, ll_arg, lr_arg); +
Re: [PATCH][C++] Fix PR53752
OK. Jason
Re: [RFA/ARM 1/3] Add VFP support for VFMA and friends
On 25/06/12 15:59, Matthew Gretton-Dann wrote: > All, > > This patch adds support to the ARM backend for generating floating-point > fused multiply-accumulate. > > OK? > > gcc/ChangeLog: > > 2012-06-25 Matthew Gretton-Dann > > * config/arm/iterators.md (SDF): New mode iterator. > (V_if_elem): Add support for SF and DF modes. > (V_reg): Likewise. > (F_w_constraint): New mode iterator attribute. > (F_r_constraint): Likewise. > (F_fma_type): Likewise. > (F_target): Likewise. > config/arm/vfp.md (fma4): New pattern. > (*fmsub4): Likewise. > (*fmnsub4): Likewise. > (*fmnadd4): Likewise. > F_target as an attribute name doesn't tell me anything useful. I suggest F_maybe_not_df. > + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA " This should be written as "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA && " Then the attribute should expand (define_mode_attr F_maybe_not_df [(SF "1") (DF "TARGET_VFP_DOUBLE")]) As a style nit, I would also suggest using the iterator name when it appears in the pattern name, even though it is redundant. This avoids potential ambiguities when there are multiple iterators operating on different expansions. That is, instead of: (define_insn "fma4" use: (define_insn "fma4" OK with those changes. R. > Thanks, > > Matt > > > 0001-Add-VFP-support-for-VFMA-and-friends.txt > > > diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md > index 795a5ee..3063f00 100644 > --- a/gcc/config/arm/iterators.md > +++ b/gcc/config/arm/iterators.md > @@ -42,6 +42,9 @@ > ;; A list of the 32bit and 64bit integer modes > (define_mode_iterator SIDI [SI DI]) > > +;; A list of modes which the VFP unit can handle > +(define_mode_iterator SDF [SF DF]) > + > ;; Integer element sizes implemented by IWMMXT.
> (define_mode_iterator VMMX [V2SI V4HI V8QI]) > > @@ -245,7 +248,8 @@ > (V4HI "P") (V8HI "q") > (V2SI "P") (V4SI "q") > (V2SF "P") (V4SF "q") > - (DI "P") (V2DI "q")]) > + (DI "P") (V2DI "q") > + (SF "") (DF"P")]) > > ;; Wider modes with the same number of elements. > (define_mode_attr V_widen [(V8QI "V8HI") (V4HI "V4SI") (V2SI "V2DI")]) > @@ -303,7 +307,8 @@ > (V4HI "i16") (V8HI "i16") > (V2SI "i32") (V4SI "i32") > (DI "i64") (V2DI "i64") > - (V2SF "f32") (V4SF "f32")]) > + (V2SF "f32") (V4SF "f32") > + (SF "f32") (DF "f64")]) > > ;; Same, but for operations which work on signed values. > (define_mode_attr V_s_elem [(V8QI "s8") (V16QI "s8") > @@ -423,6 +428,12 @@ > ;; Mode attribute for vshll. > (define_mode_attr V_innermode [(V8QI "QI") (V4HI "HI") (V2SI "SI")]) > > +;; Mode attributes used for fused-multiply-accumulate VFP support > +(define_mode_attr F_w_constraint [(SF "=t") (DF "=w")]) > +(define_mode_attr F_r_constraint [(SF "t") (DF "w")]) > +(define_mode_attr F_fma_type [(SF "fmacs") (DF "fmacd")]) > +(define_mode_attr F_target [(SF "") (DF "&& TARGET_VFP_DOUBLE")]) > + > > ;; > ;; Code attributes > > ;; > diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md > index 2061414..2a50353 100644 > --- a/gcc/config/arm/vfp.md > +++ b/gcc/config/arm/vfp.md > @@ -890,6 +890,54 @@ > (set_attr "type" "fmacd")] > ) > > +;; Fused-multiply-accumulate > + > +(define_insn "fma4" > + [(set (match_operand:SDF 0 "register_operand" "") > +(fma:SDF (match_operand:SDF 1 "register_operand" "") > + (match_operand:SDF 2 "register_operand" "") > + (match_operand:SDF 3 "register_operand" "0")))] > + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA " > + "vfma%?.\\t%0, %1, %2" > + [(set_attr "predicable" "yes") > + (set_attr "type" "")] > +) > + > +(define_insn "*fmsub4" > + [(set (match_operand:SDF 0 "register_operand" "") > + (fma:SDF (neg:SDF (match_operand:SDF 1 "register_operand" > + "")) > + (match_operand:SDF 2 "register_operand" "") > + (match_operand:SDF 3 
"register_operand" "0")))] > + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA " > + "vfms%?.\\t%0, %1, %2" > + [(set_attr "predicable" "yes") > + (set_attr "type" "")] > +) > + > +(define_insn "*fnmsub4" > + [(set (match_operand:SDF 0 "register_operand" "") > + (fma:SDF (match_operand:SDF 1 "register_operand" "") > + (match_operand:SDF 2 "register_operand" "") > + (neg:SDF (match_operand:SDF 3 "register_operand" "0"] > + "TARGET_32BIT && TARGET_HARD_FLOAT && TARGET_FMA " > + "vfnms%?.\\t%0, %1, %2" > + [(set_attr "predi
Re: [Patch] PR 51938: extend ifcombine
Thanks for the comments, I'll look into it later in the summer. On Tue, 26 Jun 2012, Richard Guenther wrote: On Sat, Jun 23, 2012 at 7:18 PM, Marc Glisse wrote: Hello, thanks for looking into the patch. A couple more details now that I am back from a conference: On Wed, 20 Jun 2012, Marc Glisse wrote: On Wed, 20 Jun 2012, Richard Guenther wrote: On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse wrote: Hello, currently, tree-ssa-ifcombine handles pairs of imbricated "if"s that share the same then branch, or the same else branch. There is no particular reason why it couldn't also handle the case where the then branch of one is the else branch of the other, which is what I do here. Any comments? The general idea looks good, but I think the patch is too invasive. As far as I can see the only callers with a non-zero 'inv' argument come from ifcombine_ifnotorif and ifcombine_ifnotandif (and both with inv == 2). The idea of also supporting inv==1 or inv==3 is for uniformity, and because we might at some point want to get rid of the 'or' function and implement everything in terms of the 'and' version, with suitably inverted tests. I would rather see a more localized patch that makes use of invert_tree_comparison to perform the inversion on the call arguments of maybe_fold_and/or_comparisons. I would rather have done that as well, and as a matter of fact that is what the first version of the patch did, until I realized that -ftrapping-math was the default. Is there any reason that would not work? invert_tree_comparison is useless for floating point (the case I am most interested in) unless we specify -fno-trapping-math (writing this patch taught me to add this flag to my default flags, but I can't expect everyone to do the same). An issue is that gcc mixes the behaviors of qnan and snan (it is not really an issue, it just means that !(comparison) can't be represented as comparison2). 
At least + if (inv & 1) + lcompcode2 = COMPCODE_TRUE - lcompcode2; looks as if it were not semantically correct - you cannot simply invert floating-point comparisons (see the restrictions invert_tree_comparison has). I don't remember all details, but I specifically thought of that, and the trapping behavior is handled a few lines below. More specifically, it has (was already there, I didn't add it): if (!honor_nans) ... else if (flag_trapping_math) { /* Check that the original operation and the optimized ones will trap under the same condition. */ bool ltrap = (lcompcode & COMPCODE_UNORD) == 0 && (lcompcode != COMPCODE_EQ) && (lcompcode != COMPCODE_ORD); ... same for rtrap and trap /* In a short-circuited boolean expression the LHS might be such that the RHS, if evaluated, will never trap. For example, in ORD (x, y) && (x < y), we evaluate the RHS only if neither x nor y is NaN. (This is a mixed blessing: for example, the expression above will never trap, hence optimizing it to x < y would be invalid). */ if ((code == TRUTH_ORIF_EXPR && (lcompcode & COMPCODE_UNORD)) || (code == TRUTH_ANDIF_EXPR && !(lcompcode & COMPCODE_UNORD))) rtrap = false; /* If the comparison was short-circuited, and only the RHS trapped, we may now generate a spurious trap. */ if (rtrap && !ltrap && (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)) return NULL_TREE; /* If we changed the conditions that cause a trap, we lose. */ if ((ltrap || rtrap) != trap) ... which presumably ensures that the trapping behavior is preserved (I'll have to double-check that I didn't break that logic). I was more concerned about the behavior with NaNs in general where x < y is not equal to x >= y. 
Now combine_comparison handles only the very specific case of logical and or or of two comparisons with the same operands, you basically make it handle && ~ and || ~, too (and ~ && ~ and ~ || ~), so it seems that optionally inverting the result of the 2nd operand should be enough making the interface prettier compared to the bitmask. With the following change this should be installed separately - it's a functional change already now /* If we changed the conditions that cause a trap, we lose. */ if ((ltrap || rtrap) != trap) + { + /* Try the inverse condition. */ + compcode = COMPCODE_TRUE - compcode; + exchg = true; + trap = (compcode & COMPCODE_UNORD) == 0 + && (compcode != COMPCODE_EQ) + && (compcode != COMPCODE_ORD); + } + if ((ltrap || rtrap) != trap) return NULL_TREE; } ... + ret = fold_build2_loc (loc, tcode, truth_type, ll_arg, lr_arg); + if (exchg) + ret = fold_build1_loc (loc, TRUTH_NOT_EXPR, truth_type, ret); Do you have an idea how this can be handled in a l
Re: [RFA/ARM 2/3] Add vectorizer support for VFMA
On 25/06/12 15:59, Matthew Gretton-Dann wrote: > All, > > This patch adds vectoriser support for VFMA to the ARM Neon backend. > > Note that the VFP VFNMA and VFNMS instructions do not have Neon > equivalents. > > OK? Sorry, no. The neon versions of FMA do not handle denormalized values, so this needs to reject vectorization unless flag_unsafe_math_optimizations is true. R. > > gcc/ChangeLog: > > 2012-06-25 Matthew Gretton-Dann > > * config/arm/neon.md (fma4): New pattern. > (*fmsub4): Likewise. > > 2012-06-25 Matthew Gretton-Dann > > * gcc.target/arm/neon-vfma-1.c: New testcase. > * gcc.target/arm/neon-vfms-1.c: Likewise. > * lib/target-supports.exp (add_options_for_arm_neonv2): New > function. > (check_effective_target_arm_neonv2_ok_nocache): Likewise. > (check_effective_target_arm_neonv2_ok): Likewise. > (check_effective_target_arm_neonv2_hw): Likewise. > (check_effective_target_arm_neonv2): Likewise. > > Thanks, > > Matt > > > 0002-Add-vectorizer-support-for-VFMA.txt > > > diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md > index 4568dea..4d12fb3 100644 > --- a/gcc/config/arm/neon.md > +++ b/gcc/config/arm/neon.md > @@ -711,6 +711,33 @@ > (const_string > "neon_mla_qqq_32_qqd_32_scalar")] > ) > > +;; Fused multiply-accumulate > +(define_insn "fma4" > + [(set (match_operand:VCVTF 0 "register_operand" "=w") > +(fma:VCVTF (match_operand:VCVTF 1 "register_operand" "w") > + (match_operand:VCVTF 2 "register_operand" "w") > + (match_operand:VCVTF 3 "register_operand" "0")))] > + "TARGET_NEON && TARGET_FMA" > + "vfma%?.\\t%0, %1, %2" > + [(set (attr "neon_type") > + (if_then_else (match_test "") > + (const_string "neon_fp_vmla_ddd") > + (const_string "neon_fp_vmla_qqq")))] > +) > + > +(define_insn "*fmsub4" > + [(set (match_operand:VCVTF 0 "register_operand" "=w") > +(fma:VCVTF (neg:VCVTF (match_operand:VCVTF 1 "register_operand" "w")) > +(match_operand:VCVTF 2 "register_operand" "w") > +(match_operand:VCVTF 3 "register_operand" "0")))] > + "TARGET_NEON && 
TARGET_FMA" > + "vfms%?.\\t%0, %1, %2" > + [(set (attr "neon_type") > + (if_then_else (match_test "") > + (const_string "neon_fp_vmla_ddd") > + (const_string "neon_fp_vmla_qqq")))] > +) > + > (define_insn "ior3" >[(set (match_operand:VDQ 0 "s_register_operand" "=w,w") > (ior:VDQ (match_operand:VDQ 1 "s_register_operand" "w,0") > diff --git a/gcc/testsuite/gcc.target/arm/neon-vfma-1.c > b/gcc/testsuite/gcc.target/arm/neon-vfma-1.c > new file mode 100644 > index 000..a003a82 > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/neon-vfma-1.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target arm_neonv2_ok } */ > +/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */ > +/* { dg-add-options arm_neonv2 } */ > +/* { dg-final { scan-assembler "vfma\\.f32\[ \]+\[dDqQ]" } } */ > + > +/* Verify that VFMA is used. */ > +void f1(int n, float a, float x[], float y[]) { > + int i; > + for (i = 0; i < n; ++i) > +y[i] = a * x[i] + y[i]; > +} > diff --git a/gcc/testsuite/gcc.target/arm/neon-vfms-1.c > b/gcc/testsuite/gcc.target/arm/neon-vfms-1.c > new file mode 100644 > index 000..8cefd8a > --- /dev/null > +++ b/gcc/testsuite/gcc.target/arm/neon-vfms-1.c > @@ -0,0 +1,12 @@ > +/* { dg-do compile } */ > +/* { dg-require-effective-target arm_neonv2_ok } */ > +/* { dg-options "-O2 -ftree-vectorize -ffast-math" } */ > +/* { dg-add-options arm_neonv2 } */ > +/* { dg-final { scan-assembler "vfms\\.f32\[ \]+\[dDqQ]" } } */ > + > +/* Verify that VFMS is used. */ > +void f1(int n, float a, float x[], float y[]) { > + int i; > + for (i = 0; i < n; ++i) > +y[i] = a * -x[i] + y[i]; > +} > diff --git a/gcc/testsuite/lib/target-supports.exp > b/gcc/testsuite/lib/target-supports.exp > index bc5baa7..9fc8a5c 100644 > --- a/gcc/testsuite/lib/target-supports.exp > +++ b/gcc/testsuite/lib/target-supports.exp > @@ -2082,6 +2082,19 @@ proc add_options_for_arm_neon { flags } { > return "$flags $et_arm_neon_flags" > } > > +# Add the options needed for NEON. 
We need either -mfloat-abi=softfp > +# or -mfloat-abi=hard, but if one is already specified by the > +# multilib, use it. Similarly, if a -mfpu option already enables > +# NEON, do not add -mfpu=neon. > + > +proc add_options_for_arm_neonv2 { flags } { > +if { ! [check_effective_target_arm_neonv2_ok] } { > + return "$flags" > +} > +global et_arm_neonv2_flags > +return "$flags $et_arm_neonv2_flags" > +} > + > # Return 1 if this is an ARM target supporting -mfpu=neon > # -mfloat-abi=softfp or equivalent options. Some multilibs may be > # incompatible with these options. Also set et_arm_neon_flags to the > @@ -2110,6 +2123,38 @@ proc check_effective_tar
Re: Fix PR c++/19351 (operator new[] overflow)
On 06/14/2012 11:55 AM, Florian Weimer wrote: This is another attempt at ensuring that operator new[] always returns a block of sufficient size. This is on top of my previous patch rejecting VLA allocations: http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00616.html I've committed the patch referenced above, and rebased this patch. I also made two additional changes: The Hamming weight reduction (to help encoding the maximum size constant on architectures like ARM) was unnecessarily aggressive, and I had forgotten to disable cookies for non-cookie types, leading to failures when testing Java. Bootstrapped and tested on x86_64-unknown-linux-gnu, with no new regressions (this time including Java). Okay for trunk? -- Florian Weimer / Red Hat Product Security Team 2012-06-26 Florian Weimer PR c++/19351 * call.c (build_operator_new_call): Add size_check argument and evaluate it. * cp-tree.h (build_operator_new_call): Adjust declaration. * init.c (build_new_1): Compute array size check and apply it. 2012-06-26 Florian Weimer * g++.dg/init/new38.C: New test. * g++.dg/init/new39.C: New test. diff --git a/gcc/cp/call.c b/gcc/cp/call.c index 09965b3..806e0ba 100644 --- a/gcc/cp/call.c +++ b/gcc/cp/call.c @@ -3943,15 +3943,19 @@ build_new_function_call (tree fn, VEC(tree,gc) **args, bool koenig_p, total number of bytes required by the allocation, and is updated if that is changed here. *COOKIE_SIZE is non-NULL if a cookie should be used. If this function determines that no cookie should be - used, after all, *COOKIE_SIZE is set to NULL_TREE. If FN is - non-NULL, it will be set, upon return, to the allocation function - called. */ + used, after all, *COOKIE_SIZE is set to NULL_TREE. If SIZE_CHECK + is not NULL_TREE, it is evaluated before calculating the final + array size, and if it fails, the array size is replaced with + (size_t)-1 (usually triggering a std::bad_alloc exception). If FN + is non-NULL, it will be set, upon return, to the allocation + function called. 
*/ tree build_operator_new_call (tree fnname, VEC(tree,gc) **args, - tree *size, tree *cookie_size, + tree *size, tree *cookie_size, tree size_check, tree *fn, tsubst_flags_t complain) { + tree original_size = *size; tree fns; struct z_candidate *candidates; struct z_candidate *cand; @@ -3959,6 +3963,10 @@ build_operator_new_call (tree fnname, VEC(tree,gc) **args, if (fn) *fn = NULL_TREE; + /* Set to (size_t)-1 if the size check fails. */ + if (size_check != NULL_TREE) +*size = fold_build3 (COND_EXPR, sizetype, size_check, + original_size, TYPE_MAX_VALUE (sizetype)); VEC_safe_insert (tree, gc, *args, 0, *size); *args = resolve_args (*args, complain); if (*args == NULL) @@ -4022,7 +4030,11 @@ build_operator_new_call (tree fnname, VEC(tree,gc) **args, if (use_cookie) { /* Update the total size. */ - *size = size_binop (PLUS_EXPR, *size, *cookie_size); + *size = size_binop (PLUS_EXPR, original_size, *cookie_size); + /* Set to (size_t)-1 if the size check fails. */ + gcc_assert (size_check != NULL_TREE); + *size = fold_build3 (COND_EXPR, sizetype, size_check, +*size, TYPE_MAX_VALUE (sizetype)); /* Update the argument list to reflect the adjusted size. 
*/ VEC_replace (tree, *args, 0, *size); } diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index a4b7ae3..9033108 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -4879,7 +4879,7 @@ extern tree build_user_type_conversion (tree, tree, int, extern tree build_new_function_call (tree, VEC(tree,gc) **, bool, tsubst_flags_t); extern tree build_operator_new_call (tree, VEC(tree,gc) **, tree *, - tree *, tree *, + tree *, tree, tree *, tsubst_flags_t); extern tree build_new_method_call (tree, tree, VEC(tree,gc) **, tree, int, tree *, diff --git a/gcc/cp/init.c b/gcc/cp/init.c index 5a81643..2f2ef69 100644 --- a/gcc/cp/init.c +++ b/gcc/cp/init.c @@ -2175,7 +2175,10 @@ build_new_1 (VEC(tree,gc) **placement, tree type, tree nelts, tree pointer_type; tree non_const_pointer_type; tree outer_nelts = NULL_TREE; + /* For arrays, a bounds checks on the NELTS parameter. */ + tree outer_nelts_check = NULL_TREE; bool outer_nelts_from_type = false; + double_int inner_nelts_count = double_int_one; tree alloc_call, alloc_expr; /* The address returned by the call to "operator new". This node is a VAR_DECL and is therefore reusable. */ @@ -2228,7 +2231,22 @@ build_new_1 (VEC(tree,gc) **placement, tree type, tree nelts, { tree inner_nelts = array_type_nelts_top (elt_type); tree inner_nelts_cst = maybe_constant_value (inner_nelts); - if (!TREE_CONSTANT (inner_nelts_cst)) + if (TREE_CONSTANT (inner_nelts_cst) + && TREE_CODE (inner_nelts_cst) == INTEGER_CST) + { + double_int result; + if (mul_double (TREE_INT_CST_LOW (inner_nelts_cst), + TREE_INT_CST_
Re: [PATCH] Strength reduction
On Tue, 26 Jun 2012, William J. Schmidt wrote: > On Tue, 2012-06-26 at 15:06 +0200, Richard Guenther wrote: > > On Mon, 25 Jun 2012, William J. Schmidt wrote: > > > > > Here's a new version of the main strength reduction patch, addressing > > > previous comments. A couple of quick notes: > > > > > > * I opened PR53773 and PR53774 for the cases where commutative > > > operations were encountered with a constant in rhs1. This version of > > > the patch still has the gcc_asserts in place to catch those cases, but > > > I'll plan to remove those once the patch is approved. > > > > > > * You previously asked: > > > > > > >> > > > >> +static slsr_cand_t > > > >> +base_cand_from_table (tree base_in) > > > >> +{ > > > >> + slsr_cand mapping_key; > > > >> + > > > >> + gimple def = SSA_NAME_DEF_STMT (base_in); > > > >> + if (!def) > > > >> +return (slsr_cand_t) NULL; > > > >> + > > > >> + mapping_key.cand_stmt = def; > > > >> + return (slsr_cand_t) htab_find (stmt_cand_map, &mapping_key); > > > >> > > > >> isn't that reachable via the base-name -> chain mapping for base_in? > > > > > > I had to review this a bit, but the answer is no. If you look at one of > > > the algebraic manipulations in create_mul_ssa_cand as an example, > > > base_in corresponds to Y. base_cand_from_table is looking for a > > > candidate that has Y for its LHS. The base-name -> chain mapping is > > > used to find all candidates that have B as the base_name. > > > > > > * I added a detailed explanation of what's going on with legal_cast_p. > > > Hopefully this will be easier to understand now. > > > > > > I've bootstrapped this on powerpc64-unknown-linux-gnu with three new > > > regressions (for which I opened the two bug reports). Ok for trunk > > > after removing the asserts? > > > > Ok. Please keep an eye on possible fallout. > > Yep, will do. > > > > > One more question - you put the pass inbetween VRP and DOM, any > > reason to not put it after DOM/phi_only_cprop which perform CSE? 
> > Thus, does strength-reduction expose CSE opportunities? > > It can introduce copies in some circumstances, which DOM will propagate. > That was the main reason for putting it there, IIRC. I originally > placed it after DOM so it would be in a copy-free environment, but it > ended up leaving some garbage behind that way. > > An alternative would be to explicitly propagate any introduced copies > and move the pass later. I don't recall offhand if there were other > reasons besides copy propagation for moving the pass -- I looked back > through my notes and apparently didn't record anything about it at the > time. Fair enough - the pass pipeline after loop optimizers is not thoroughly thought through anyway. Thanks, Richard.
Re: [PATCH ARM iWMMXt 0/5] Improve iWMMXt support
Hi Matt, There's also a trivial documentation fix: [PATCH 1/2] doc: Correct __builtin_arm_tinsr prototype documentation and a test to exercise the intrinsics: [PATCH 2/2] arm: add iwMMXt mmx-2.c test These have both been checked in. It turns out that both needed minor updates as some of the builtins have changed since these patches were written. I have taken care of this however. Cheers Nick
Re: New option to turn off stack reuse for temporaries
On 06/26/2012 04:28 AM, Richard Guenther wrote: No - the fact that the flag is C++ specific but in common.opt is odd enough and -ftemp-reuse-stack sounds very very generic - which in fact it is not, it's a no-op in C. Is there a more formal phrase for the temporary kind that is affected? Not that I'm aware of. This is a temporary introduced by TARGET_EXPR, which lives until the end of the enclosing CLEANUP_POINT_EXPR. So - a little kludgy but probably more to what I'd like it to be would be to move the option to c-family/c.opt enabled only for C++ and Obj-C++ and export it to the middle-end via a new langhook Hmm, that does seem rather kludgy for something that affects the behavior of middle-end code working on GENERIC. (the gimplifier code should be in Frontend code that lowers to GENERIC really and the WITH_CLEANUP_EXPR code should be C++ frontend specific ...). TARGET_EXPR has been a back-end code since the dawn of GCC version control; if it's still only used by the C++ front end I guess it could move to cp-tree.def, but we can't really lower it until gimplification time because we need to strip away enclosing COMPOUND_EXPRs and such so we can see that it's on the RHS of an initialization and optimize away the temporary in that case. And now that GIMPLE isn't a subset of GENERIC, we can't just use the gimplifier at genericization time. And I'd rather not duplicate the entire gimplifier in the front end. Jason
Re: [wwwdocs] Update coding conventions for C++
On 06/25/2012 06:26 PM, Lawrence Crowl wrote: +or gcc_unreachable. If the checks are expensive or the +compiler can reasonably carry on after the error, they may be +conditioned on --enable-checking. by using gcc_checking_assert [Rationale] +FIXME: Discussion of deleting inappropriate special members. Is this FIXME still needed? The text following it seems to cover the issue well enough. +However, by default, RTTI is not permitted +and the compiler must build cleanly with -fno-rtti. This seems like an unnecessary restriction to me. +Disabling RTTI will save space in the compiler. This is a fine reason to disable RTTI if it isn't used, but doesn't seem like a strong reason to prohibit using it. The information is only emitted for classes with virtual functions, isn't very large, is typically emitted in only one object file, and since it lives in .rodata it can be shared between multiple compiler processes. +Checking the type of a class at runtime usually indicates a design problem. The tree_contains_struct machinery recently added to GCC maps directly onto dynamic_cast; if (CODE_CONTAINS_STRUCT(TREE_CODE(t),TS_DECL_WRTL)) /* do something with t->decl_with_rtl */ translates roughly to if (decl_with_rtl *p = dynamic_cast <decl_with_rtl *> (t)) /* do something with p */ When new interfaces are added partway down an inheritance tree, dynamic_cast is the right way to access them. This isn't checking what the type is, it's checking whether the object supports a particular method. The fact that we've gone to the trouble to implement this functionality in C suggests to me that using it isn't always indicative of a design problem. :) +If you need to know the type of a class for some other reason, +use an enum or a virtual member function +that converts a pointer to the more derived class. +For example, + + + +common_type *p = ; +if (specific_type *q = p->to_specific ()) { + // We have and can use a specific_type pointed to by q. 
+} + This is basically equivalent to dynamic_cast except that you need to clutter up your root base class with one virtual function for each derived class you want to be able to convert to. Jason
Re: ping x3 : latch mem to reg before multi-access in convert_move
On 06/26/2012 03:40 AM, Olivier Hainque wrote: > Hello Richard, > > On Jun 25, 2012, at 20:36 , Richard Henderson wrote: > >> Can you explain the sequence that results in multiple memory >> references? Because I can't see how it can... > > Compared to the point at which we were observing the problem, > the code has changed a bit in this area though not so much. > > In the current mainline, this would be from the two uses of > "lowfrom" in: > > /* Get a copy of FROM widened to a word, if necessary. */ > ... > lowfrom = convert_to_mode (lowpart_mode, from, unsignedp); > > lowpart = gen_lowpart (lowpart_mode, to); > ==> emit_move_insn (lowpart, lowfrom); > > /* Compute the value to put in each remaining word. */ > if (unsignedp) > fill_value = const0_rtx; > else > fill_value = emit_store_flag (gen_reg_rtx (word_mode), > ==> LT, lowfrom, const0_rtx, > VOIDmode, 0, -1); Ah, right, since convert_to_mode can just reshuffle the MEM. The patch is ok. r~
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On 06/26/2012 01:55 AM, Steven Bosscher wrote: > If __builtin_setjmp actually does work for ia64, why should we keep > DONT_USE_BUILTIN_SETJMP? Could it break user code somehow? It's an ABI change for anyone using --enable-sjlj-exceptions. But I don't know who that would be... r~
Re: New option to turn off stack reuse for temporaries
Hi, On Tue, 26 Jun 2012, Jason Merrill wrote: > > (the gimplifier code should be in Frontend code that lowers to GENERIC > > really and the WITH_CLEANUP_EXPR code should be C++ frontend specific > > ...). > > TARGET_EXPR has been a back-end code since the dawn of GCC version > control; if it's still only used by the C++ front end I guess it could > move to cp-tree.def, but we can't really lower it until gimplification > time because we need to strip away enclosing COMPOUND_EXPRs and such so > we can see that it's on the RHS of an initialization and optimize away > the temporary in that case. And now that GIMPLE isn't a subset of > GENERIC, we can't just use the gimplifier at genericization time. And > I'd rather not duplicate the entire gimplifier in the front end. I agree with Jason. TARGET_EXPR and CLEANUP_POINT_EXPR might currently be used only for C++, but I think they are sensible general constructs to be supported by the gimplifier. But I also think that the option to disable stack slot sharing should be moved to cfgexpand to trigger non-sharing of everything, not just these cleanup temporaries. After all, using the (C++) temporary after expression end is a source bug that the option is supposed to work around, just like this is: char *p; { char str[50]; p = str; } use(p); So, IMO the option should also work around this source bug. We had at least one example of that in our own code base. Ciao, Michael.
Re: New option to turn off stack reuse for temporaries
On Tue, Jun 26, 2012 at 06:07:09PM +0200, Michael Matz wrote: > I agree with Jason. TARGET_EXPR and CLEANUP_POINT_EXPR might currently be > used only for C++, but I think they are sensible general constructs to be > supported by the gimplifier. > > But I also think that the option to disable stack slot sharing should be > moved to cfgexpand to trigger non-sharing of everything, not just these > cleanup temporaries. After all using the (c++)temporary after expression > end is a source bug that the option is supposed to work around, just like > this is: If you move it solely to cfgexpand time, broken code will still often not work the way it happened to work with 4.6 and earlier. You'd need to both disable the sharing and disable additions of gimple clobbers. Because otherwise DCE/DSE and other passes happily optimize (broken) code away. So, if we want a -fno-strict-aliasing like option to work around broken code, we should IMHO do both of those. > > char *p; { char str[50]; p = str; } use(p); > > So, IMO the option should also work around this source bug. We had at > least one example of that in our own code base. Yeah, gengtype I think. Jakub
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On 6/26/2012 11:38 AM, Richard Henderson wrote: On 06/26/2012 01:55 AM, Steven Bosscher wrote: If __builtin_setjmp actually does work for ia64, why should we keep DONT_USE_BUILTIN_SETJMP? Could it break user code somehow? It's an ABI change for anyone using --enable-sjlj-exceptions. But I don't know who that would be... SJLJ exceptions are forced for hpux10. It might have been a bug to not define DONT_USE_BUILTIN_SETJMP but nobody complained as far as I know. Dave -- John David Anglin dave.ang...@bell.net
[Patch, Fortran] CLASS handling for assumed-rank arrays
This patch assumes that the basic assumed-rank support is included, http://gcc.gnu.org/ml/fortran/2012-06/msg00144.html The attached patch implements the support of passing non-assumed-rank type/class arrays to assumed-rank class/type dummy arguments (type was working before). And passing assumed-rank class arrays to assumed-rank class arrays. It does not support passing assumed-rank class arrays to type arrays. The problem with the latter is that gfortran uses the TYPE_SIZE_UNIT to access the array elements, which implies a copy in/copy out. For arguments with descriptor, a better choice would be to use the stride multiplier. (Catch: The current descriptor doesn't have one.) As the scalarizer doesn't work for assumed-rank arrays, the copy-in/copy-out fails at run time. (See also http://j3-fortran.org/pipermail/j3/2012-June/005438.html for the fun with pointer association when passing a CLASS with TARGET to a TYPE with TARGET.) Additionally, I think that this patch makes gfortran the second front end (after Ada), which uses a range for the assignment: I do not iterate through for dim, but use a.dim[1:rank] = b.dim[1:rank] in the assignment. The reason that I have to do a component-wise assignment is that the class container directly contains the descriptor as a component - not as a pointer. Thus, the descriptors can have different ranks... Build and regtested on x86-64-linux. OK for the trunk? Tobias 2012-06-26 Tobias Burnus PR fortran/48820 * class.c (gfc_build_class_symbol): Regard assumed-rank arrays as having GFC_MAX_DIMENSIONS. * trans-array.c (gfc_get_descriptor_dimension): New function, which returns the descriptor. (gfc_conv_descriptor_dimension): Use it. * trans-array.h (gfc_get_descriptor_dimension): New prototype. * trans-expr.c (class_array_data_assign): New static function. (gfc_conv_derived_to_class, gfc_conv_class_to_class): Use it. 2012-06-26 Tobias Burnus PR fortran/48820 * gfortran.dg/assumed_rank_7.f90: New. 
diff --git a/gcc/fortran/class.c b/gcc/fortran/class.c index c71aa4a..479014e 100644 --- a/gcc/fortran/class.c +++ b/gcc/fortran/class.c @@ -219,7 +219,7 @@ gfc_add_component_ref (gfc_expr *e, const char *name) void gfc_add_class_array_ref (gfc_expr *e) { - int rank = CLASS_DATA (e)->as->rank; + int rank = CLASS_DATA (e)->as->rank; gfc_array_spec *as = CLASS_DATA (e)->as; gfc_ref *ref = NULL; gfc_add_component_ref (e, "_data"); @@ -497,6 +497,7 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr, gfc_symbol *fclass; gfc_symbol *vtab; gfc_component *c; + int rank; if (as && *as && (*as)->type == AS_ASSUMED_SIZE) { @@ -517,11 +518,12 @@ gfc_build_class_symbol (gfc_typespec *ts, symbol_attribute *attr, return SUCCESS; /* Determine the name of the encapsulating type. */ + rank = !(*as) || (*as)->rank == -1 ? GFC_MAX_DIMENSIONS : (*as)->rank; get_unique_hashed_string (tname, ts->u.derived); if ((*as) && attr->allocatable) -sprintf (name, "__class_%s_%d_%da", tname, (*as)->rank, (*as)->corank); +sprintf (name, "__class_%s_%d_%da", tname, rank, (*as)->corank); else if ((*as)) -sprintf (name, "__class_%s_%d_%d", tname, (*as)->rank, (*as)->corank); +sprintf (name, "__class_%s_%d_%d", tname, rank, (*as)->corank); else if (attr->pointer) sprintf (name, "__class_%s_p", tname); else if (attr->allocatable) diff --git a/gcc/fortran/trans-array.c b/gcc/fortran/trans-array.c index f135af1..36db6ac 100644 --- a/gcc/fortran/trans-array.c +++ b/gcc/fortran/trans-array.c @@ -247,12 +247,11 @@ gfc_conv_descriptor_dtype (tree desc) desc, field, NULL_TREE); } -static tree -gfc_conv_descriptor_dimension (tree desc, tree dim) + +tree +gfc_get_descriptor_dimension (tree desc) { - tree field; - tree type; - tree tmp; + tree type, field; type = TREE_TYPE (desc); gcc_assert (GFC_DESCRIPTOR_TYPE_P (type)); @@ -262,10 +261,19 @@ gfc_conv_descriptor_dimension (tree desc, tree dim) && TREE_CODE (TREE_TYPE (field)) == ARRAY_TYPE && TREE_CODE (TREE_TYPE (TREE_TYPE (field))) == 
RECORD_TYPE); - tmp = fold_build3_loc (input_location, COMPONENT_REF, TREE_TYPE (field), - desc, field, NULL_TREE); - tmp = gfc_build_array_ref (tmp, dim, NULL); - return tmp; + return fold_build3_loc (input_location, COMPONENT_REF, TREE_TYPE (field), + desc, field, NULL_TREE); +} + + +static tree +gfc_conv_descriptor_dimension (tree desc, tree dim) +{ + tree tmp; + + tmp = gfc_get_descriptor_dimension (desc); + + return gfc_build_array_ref (tmp, dim, NULL); } diff --git a/gcc/fortran/trans-array.h b/gcc/fortran/trans-array.h index 9bafb94..b7ab806 100644 --- a/gcc/fortran/trans-array.h +++ b/gcc/fortran/trans-array.h @@ -154,6 +154,7 @@ tree gfc_conv_descriptor_data_get (tree); tree gfc_conv_descriptor_data_addr (tree); tree gfc_conv_descriptor_offset_get (tree); tree gfc_conv_descriptor_dtype (tree); +tree gfc_get_descriptor_dimension (tree); tre
Re: [Patch, Fortran] CLASS handling for assumed-rank arrays
On 06/26/2012 07:12 PM, Tobias Burnus wrote: +i = 1 That should be i = 0, sorry for attaching an old version of the patch. Tobias +call foo(ac) +call foo(at) +call bar(ac) +call bar(at) +if (i /= 12) call abort() +
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On Tue, Jun 26, 2012 at 7:09 PM, John David Anglin wrote: > On 6/26/2012 11:38 AM, Richard Henderson wrote: >> >> On 06/26/2012 01:55 AM, Steven Bosscher wrote: >>> >>> If __builtin_setjmp actually does work for ia64, why should we keep >>> DONT_USE_BUILTIN_SETJMP? Could it break user code somehow? >> >> It's an ABI change for anyone using --enable-sjlj-exceptions. >> But I don't know who that would be... > > SJLJ exceptions are forced for hpux10. It might have been a bug to not > define > DONT_USE_BUILTIN_SETJMP but nobody complained as far as I know. Did hpux10 even ever run on ia64? Ciao! Steven
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On 6/26/2012 1:21 PM, Steven Bosscher wrote: On Tue, Jun 26, 2012 at 7:09 PM, John David Anglin wrote: On 6/26/2012 11:38 AM, Richard Henderson wrote: On 06/26/2012 01:55 AM, Steven Bosscher wrote: If __builtin_setjmp actually does work for ia64, why should we keep DONT_USE_BUILTIN_SETJMP? Could it break user code somehow? It's an ABI change for anyone using --enable-sjlj-exceptions. But I don't know who that would be... SJLJ exceptions are forced for hpux10. It might have been a bug to not define DONT_USE_BUILTIN_SETJMP but nobody complained as far as I know. Did hpux10 even ever run on ia64? Not that I know of. I was just thinking that __builtin_setjmp/__builtin_longjmp might not interwork with the libc versions on hpux10. Dave -- John David Anglin dave.ang...@bell.net
Re: [wwwdocs] Update coding conventions for C++
On 6/25/12, Alexandre Oliva wrote: > On Jun 25, 2012, Lawrence Crowl wrote: >> +These conventions will change over time, >> +but changing them requires that a convincing rationale. > > s/that// > >> +Complex heirarchies are to be avoided. > > s/heir/hier/ Both fixed. -- Lawrence Crowl
Re: [wwwdocs] Update coding conventions for C++
On 6/26/12, Martin Jambor wrote: > On Mon, Jun 25, 2012 at 03:26:01PM -0700, Lawrence Crowl wrote: > > > I have no idea. I don't use emacs. The two-space rule for > > > members comes from the wiki. The one-space rule for protection > > > labels is common practice. If folks want something else, > > > changes are fine with me. > > I'll also need an emacs C++ indentation style that conforms to > this in order to really be able to produce complying code myself. > So if anybody else will be working on that, I'm interested (to > use it and perhaps help crafting it) and I guess a number of > other people on this list are too... Alternatively, one could change the conventions to match an emacs style. Either is fine we me, as long as the style is reasonable. > > -Miscellaneous Conventions > > - > > -Code should use gcc_assert(EXPR) to check invariants. > > -Use gcc_unreachable() to mark places that should never be > > -reachable (such as an unreachable default case of a > > -switch). Do not use gcc_assert(0) for such purposes, as > > -gcc_unreachable gives the compiler more information. The > > -assertions are enabled unless explicitly configured off with > > ---enable-checking=none. Do not use abort. > > -User input should never be validated by either gcc_assert > > -or gcc_unreachable. If the checks are expensive or the > > -compiler can reasonably carry on after the error, they may be > > -conditioned on --enable-checking. > > It seems we should mention gcc_checking_assert here then. Jason had a suggestion, and I used that. > > + > > +Single inheritance is permitted. > > +Use public inheritance to describe interface inheritance, > > +i.e. 'is-a' relationships. > > +Use private and protected inheritance > > +to describe implementation inheritance. > > +Implementation inheritance can be expediant, > s/expediant/expedient/ Fixed. > > +but think twice before using it in code > > +intended to last a long time. 
> > I think all committed code should be expected to have long-lasting > quality. I would not encourage people to think otherwise and would > drop the "long time" reference here. If anybody ever commits > something ugly to bridge a short time period, it should only be done > under the "maintainers grant exceptions" rule anyway. > > > +> + > > +For long-term code, at least for now, > > +we will continue to use printf style I/O > > +rather than <iostream>-style I/O. > > +For quick debugging code, > > +<iostream> is permitted. > > + > > Similarly here, no quick and dirty debugging output should ever be > committed, we should not > > > +The Standard Library > > + > > + > > +At present, C++ provides no great advantage for i18n. > > +GCC does type checking for printf arguments, > > +so the type insecurity of printf is moot, > > +but the clarity in layout persists. > > +For quick debugging output, <iostream> requires less work. > > + > > The same applies here. The value of these changes depends on when the rules are enforced. If they are enforced only on trunk, then the changes seem fine to me. However, if they are enforced on branches, then they could unnecessarily slow down development. Comments?
Re: [wwwdocs] Update coding conventions for C++
> With that concern stated, I will write into the conventions whatever > concensus arises. > > Of course, I have no objection to an occasional inline cleanup. > That is, build with -Werror and adjusting inlines that have, > through the course of time, become larger than is useful. This seems to be what reliably happens in libstdc++-land. -benjamin
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
Committed as posted. On Mon, Jun 25, 2012 at 8:35 PM, Jason Merrill wrote: > OK. > > Jason
Re: [wwwdocs] Update coding conventions for C++
On 6/26/12, Jason Merrill wrote: > On 06/25/2012 06:26 PM, Lawrence Crowl wrote: > > +or gcc_unreachable. If the checks are expensive or the > > +compiler can reasonably carry on after the error, they may be > > +conditioned on --enable-checking. > > by using gcc_checking_assert I inserted that suggestion, but note that I did not create that text, only moved it. > [Rationale] > > +FIXME: Discussion of deleting inappropriate special members. > > Is this FIXME still needed? The text following it seems to cover the > issue well enough. No longer needed. > > +However, by default, RTTI is not permitted > > +and the compiler must build cleanly with -fno-rtti. > > This seems like an unnecessary restriction to me. > > > +Disabling RTTI will save space in the compiler. > > This is a fine reason to disable RTTI if it isn't used, but doesn't > seem like a strong reason to prohibit using it. The information is > only emitted for classes with virtual functions, isn't very large, > is typically emitted in only one object file, and since it lives > in .rodata it can be shared between multiple compiler processes. > > > +Checking the type of a class at runtime usually indicates a design problem. > > The tree_contains_struct machinery recently added to GCC maps > directly onto dynamic_cast; > > if (CODE_CONTAINS_STRUCT(TREE_CODE(t),TS_DECL_WRTL)) > /* do something with t->decl_with_rtl */ > > translates roughly to > > if (decl_with_rtl *p = dynamic_cast <decl_with_rtl *> (t)) > /* do something with p */ > > When new interfaces are added partway down an inheritance tree, > dynamic_cast is the right way to access them. This isn't checking > what the type is, it's checking whether the object supports a > particular method. > > The fact that we've gone to the trouble to implement this > functionality in C suggests to me that using it isn't always > indicative of a design problem. :) Personally, I am fine with using dynamic_cast. I was writing up what I thought was a consensus on the wiki. 
I can live without it, or I can use it profitably. Whatever you all decide, I will write up. > > +If you need to know the type of a class for some other reason, > > +use an enum or a virtual member function > > +that converts a pointer to the more derived class. > > +For example, > > + > > + > > + > > +common_type *p = ; > > +if (specific_type *q = p->to_specific ()) { > > + // We have and can use a specific_type pointed to by q. > > +} > > > > This is basically equivalent to dynamic_cast except that you need > to clutter up your root base class with one virtual function for > each derived class you want to be able to convert to. Not quite equivalent, as you can implement to_specific with non-virtual classes using the TREE_CODE. Thus we could implement a type-safe dynamic pointer converter before converting to virtual classes. OTOH, we could probably work up a template function that looks like dynamic_cast but uses the TREE_CODE instead, achieving the same intermediate step. I agree that it does clutter up the base class. Shall we enable RTTI? -- Lawrence Crowl
Re: [wwwdocs] Update coding conventions for C++
Lawrence Crowl writes: [...] | Shall we enable RTTI? Benjamin made compelling arguments for doing this. And I agree. -- Gaby
Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
On Mon, Jun 25, 2012 at 7:13 PM, Mike Stump wrote: > On Jun 25, 2012, at 3:15 PM, Sterling Augustine wrote: >> On Sat, Jun 23, 2012 at 3:55 AM, Dominique Dhumieres >> wrote: As I don't have access to a Darwin machine to test a fix, would you mind updating the test? >>> >>> The failures are gone with the obvious patch: > >> I will commit this if someone can approve it. > > Ok. Committed as Dominique posted with the included ChangeLog entry. Thanks Dominique. Sterling gcc/testsuite/ChangeLog 2012-06-26 Sterling Augustine Dominique Dhumieres * gcc.dg/pubtypes-2.c: Update expected output. * gcc.dg/pubtypes-3.c: Likewise. * gcc.dg/pubtypes-4.c: Likewise.
[testsuite] scandump.exp: use printable pattern in test summary
Procedure scan-dump-times in scandump.exp uses a printable version of the scanned pattern in the line reported to the test summary, but the other procedures in that file don't. This patch fixes that in the remaining procedures in scandump.exp. The primary advantage of using the printable pattern is with patterns that include newlines: it keeps the pattern all on the same line in the test summary.

Tested on i686-pc-linux-gnu for gcc and g++. OK for trunk?

Janis

2012-06-26  Janis Johnson

	* lib/scandump.exp (scan-dump, scan-dump-not, scan-dump-dem,
	scan-dump-dem-not): Use printable pattern in test name.

Index: lib/scandump.exp
===================================================================
--- lib/scandump.exp	(revision 188974)
+++ lib/scandump.exp	(working copy)
@@ -47,8 +47,9 @@
     set testcase [testname-for-summary]
+    set printable_pattern [make_pattern_printable [lindex $args 1]]
     set suf [dump-suffix [lindex $args 2]]
-    set testname "$testcase scan-[lindex $args 0]-dump $suf \"[lindex $args 1]\""
+    set testname "$testcase scan-[lindex $args 0]-dump $suf \"$printable_pattern\""
     set src [file tail [lindex $testcase 0]]
     set output_file "[glob -nocomplain $src.[lindex $args 2]]"
     if { $output_file == "" } {
@@ -126,8 +127,9 @@
     }
     set testcase [testname-for-summary]
+    set printable_pattern [make_pattern_printable [lindex $args 1]]
     set suf [dump-suffix [lindex $args 2]]
-    set testname "$testcase scan-[lindex $args 0]-dump-not $suf \"[lindex $args 1]\""
+    set testname "$testcase scan-[lindex $args 0]-dump-not $suf \"$printable_pattern\""
     set src [file tail [lindex $testcase 0]]
     set output_file "[glob -nocomplain $src.[lindex $args 2]]"
     if { $output_file == "" } {
@@ -178,8 +180,9 @@
     }
     set testcase [testname-for-summary]
+    set printable_pattern [make_pattern_printable [lindex $args 1]]
     set suf [dump-suffix [lindex $args 2]]
-    set testname "$testcase scan-[lindex $args 0]-dump-dem $suf \"[lindex $args 1]\""
+    set testname "$testcase scan-[lindex $args 0]-dump-dem $suf \"$printable_pattern\""
     set src [file tail [lindex $testcase 0]]
     set output_file "[glob -nocomplain $src.[lindex $args 2]]"
     if { $output_file == "" } {
@@ -229,8 +232,9 @@
     }
     set testcase [testname-for-summary]
+    set printable_pattern [make_pattern_printable [lindex $args 1]]
     set suf [dump-suffix [lindex $args 2]]
-    set testname "$testcase scan-[lindex $args 0]-dump-dem-not $suf \"[lindex $args 1]\""
+    set testname "$testcase scan-[lindex $args 0]-dump-dem-not $suf \"$printable_pattern\""
     set src [file tail [lindex $testcase 0]]
     set output_file "[glob -nocomplain $src.[lindex $args 2]]"
     if { $output_file == "" } {
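For illustration, here is a minimal Python sketch of what a make_pattern_printable-style helper does (the real implementation is Tcl inside the DejaGnu test harness; this Python function and its exact escaping rules are assumptions for illustration only): escape the characters that would otherwise split a test-summary entry across several lines.

```python
# Hypothetical Python analogue of make_pattern_printable: escape
# backslashes, newlines, and tabs in a scan pattern so the test name
# built from it stays on a single line in the test summary.
def make_pattern_printable(pattern: str) -> str:
    out = []
    for ch in pattern:
        if ch == "\\":
            out.append("\\\\")   # escape the escape character itself
        elif ch == "\n":
            out.append("\\n")    # a literal newline would split the line
        elif ch == "\t":
            out.append("\\t")
        else:
            out.append(ch)
    return "".join(out)

# Usage: a multi-line scan pattern such as "foo \\(\n.*bar" becomes a
# one-line string suitable for the summary.
```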
Re: [wwwdocs] Update coding conventions for C++
On Tue, 26 Jun 2012, Lawrence Crowl wrote: > Shall we enable RTTI? I think any proposal for adding a new C++ feature to the standards should go in its own thread on the gcc list with a meaningful subject, not this thread which should simply be about integrating uncontroversial standards in the existing documents. We should start by documenting standards that are a conservative and uncontroversial extension of the C coding standards currently used - if we need to ask the question about whether something should be allowed, it should not be allowed in the initial standards coming out of this general thread. Then, to add individual features to the list we can discuss those features and standards for them separately. -- Joseph S. Myers jos...@codesourcery.com
Re: [Patch, mips] Fix warning when using --with-synci
Steve Ellcey writes: > On Mon, 2012-06-25 at 20:00 +0100, Richard Sandiford wrote: >> Or to put it another way, MIPS_ISA_LEVEL_SPEC and MIPS_ARCH_FLOAT_SPEC >> make the multilib options explicit on the command line. You can then >> use that information to add whatever options you want to be the default >> for a given multilib. > > OK, I didn't really understand this setup before, but I think I get it > now and I see how I would use this to set -msynci. I guess I would > want to create a new triple for a multilib linux mips target so I will > work on that approach and see how it goes. In case the run-around over this patch has put you off: it'd be great if you could submit the configuration once you're happy with it. If MIPS think a target like this is needed then that's good enough for me. Thanks, Richard
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On 06/26/2012 10:36 AM, John David Anglin wrote: > I was just thinking that __builtin_setjmp/__builtin_longjmp might not > interwork with the > libc versions on hpux10. They don't need to. r~
Re: [wwwdocs] Update coding conventions for C++
On 06/26/2012 02:47 PM, Joseph S. Myers wrote: We should start by documenting standards that are a conservative and uncontroversial extension of the C coding standards currently used - if we need to ask the question about whether something should be allowed, it should not be allowed in the initial standards coming out of this general thread. Sure; I don't mind initially disabling RTTI with a note that someone wanting to use it is free to propose adding it to the coding standards. I was mainly objecting to the rationale. Jason
Re: Commit: RX: Fix simple_return pattern
On Jun 26, 2012, at 1:59 AM, Nick Clifton wrote: > I am applying the patch below to the 4.7 branch. It fixes the > simple_return pattern in the RX backend so that it uses the > (simple_return) rtl. This fixes 59 gcc testsuite failures and > introduces no regressions. > > I plan on applying a similar patch to the mainline sources once I have > finished regression testing them. Really, trunk should always go in first... Could you hold 4.7 until trunk goes in?
Re: New option to turn off stack reuse for temporaries
On Jun 26, 2012, at 9:07 AM, Michael Matz wrote: > I agree with Jason. TARGET_EXPR and CLEANUP_POINT_EXPR might currently be > used only for C++, but I think they are sensible general constructs to be > supported by the gimplifier. As do I. The intent was for Ada and every other language with things like temporaries and cleanups to reuse the backend constructs, so that instead of writing optimizers, one for each language, to instead share the optimizer across languages. To me, the middle end and the backend are the best places for these.
Re: [testsuite] scandump.exp: use printable pattern in test summary
On Jun 26, 2012, at 11:29 AM, Janis Johnson wrote: > Procedure scan-dump-times in scandump.exp uses a printable version of > the scanned pattern in the line reported to the test summary but others > in that file don't. > OK for trunk? Ok.
[Ada] Attribute 'Old should only be used in postconditions
Forbid the use of attribute 'Old outside of postconditions, as required in Ada 2012. As with 'Result, it may also be used in the Ensures clauses of test cases and contract cases attached to the subprogram. There is no longer any need to specially detect references to local variables, which would be rejected anyway in the context in which preconditions and postconditions are analyzed.

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-06-26  Yannick Moy

	* sem_attr.adb (Analyze_Attribute): Detect if 'Old is used outside
	a postcondition, and issue an error in such a case.

Index: sem_attr.adb
===================================================================
--- sem_attr.adb	(revision 188984)
+++ sem_attr.adb	(working copy)
@@ -3905,10 +3905,95 @@
       -- Old --
       ---------

-      when Attribute_Old =>
+      when Attribute_Old => Old : declare
+         CS : Entity_Id;
+         --  The enclosing scope, excluding loops for quantified expressions.
+         --  During analysis, it is the postcondition subprogram. During
+         --  pre-analysis, it is the scope of the subprogram declaration.

-         --  The attribute reference is a primary. If expressions follow, the
-         --  attribute reference is an indexable object, so rewrite the node
+         Prag : Node_Id;
+         --  During pre-analysis, Prag is the enclosing pragma node if any
+
+      begin
+         --  Find enclosing scopes, excluding loops
+
+         CS := Current_Scope;
+         while Ekind (CS) = E_Loop loop
+            CS := Scope (CS);
+         end loop;
+
+         --  If we are in Spec_Expression mode, this should be the prescan of
+         --  the postcondition (or contract case, or test case) pragma.
+
+         if In_Spec_Expression then
+
+            --  Check in postcondition or Ensures clause
+
+            Prag := N;
+            while not Nkind_In (Prag, N_Pragma,
+                                      N_Function_Specification,
+                                      N_Procedure_Specification,
+                                      N_Subprogram_Body)
+            loop
+               Prag := Parent (Prag);
+            end loop;
+
+            if Nkind (Prag) /= N_Pragma then
+               Error_Attr ("% attribute can only appear in postcondition", P);
+
+            elsif Get_Pragma_Id (Prag) = Pragma_Contract_Case
+              or else
+                Get_Pragma_Id (Prag) = Pragma_Test_Case
+            then
+               declare
+                  Arg_Ens : constant Node_Id :=
+                              Get_Ensures_From_CTC_Pragma (Prag);
+                  Arg     : Node_Id;
+
+               begin
+                  Arg := N;
+                  while Arg /= Prag and Arg /= Arg_Ens loop
+                     Arg := Parent (Arg);
+                  end loop;
+
+                  if Arg /= Arg_Ens then
+                     if Get_Pragma_Id (Prag) = Pragma_Contract_Case then
+                        Error_Attr
+                          ("% attribute misplaced inside contract case", P);
+                     else
+                        Error_Attr
+                          ("% attribute misplaced inside test case", P);
+                     end if;
+                  end if;
+               end;
+
+            elsif Get_Pragma_Id (Prag) /= Pragma_Postcondition then
+               Error_Attr ("% attribute can only appear in postcondition", P);
+            end if;
+
+         --  Body case, where we must be inside a generated _Postcondition
+         --  procedure, or else the attribute use is definitely misplaced. The
+         --  postcondition itself may have generated transient scopes, and is
+         --  not necessarily the current one.
+
+         else
+            while Present (CS) and then CS /= Standard_Standard loop
+               if Chars (CS) = Name_uPostconditions then
+                  exit;
+               else
+                  CS := Scope (CS);
+               end if;
+            end loop;
+
+            if Chars (CS) /= Name_uPostconditions then
+               Error_Attr ("% attribute can only appear in postcondition", P);
+            end if;
+         end if;
+
+         --  Either the attribute reference is generated for a Requires
+         --  clause, in which case no expressions follow, or it is a
+         --  primary. In that case, if expressions follow, the attribute
+         --  reference is an indexable object, so rewrite the node
          --  accordingly.

          if Present (E1) then
@@ -3926,17 +4011,13 @@
          Check_E0;

-         --  Prefix has not been analyzed yet, and its full analysis will take
-         --  place during expansion (see below).
+         --  Prefix has not been analyzed yet, and its full analysis will
+         --  take place during expansion (see below).

          Preanalyze_And_Resolve (P);
          P_Type := Etype (P);
          Set_Etype (N, P_Type);

-         if No (Current_Sub
[Ada] Fields _CPU, _Priority and _Dispatching_Domain
This patch creates the fields _CPU, _Priority and _Dispatching_Domain in tasks' corresponding records when a rep item (pragma, attribute definition clause or aspect specification) for this aspect is present.

Tested on x86_64-pc-linux-gnu, committed on trunk

2012-06-26  Vincent Pucci

	* exp_ch3.adb (Build_Init_Statements): Don't check the parents in
	the Rep Item Chain of the task for aspects Interrupt_Priority,
	Priority, CPU and Dispatching_Domain.
	* exp_ch9.adb (Expand_N_Task_Type_Declaration): Fields _Priority,
	_CPU and _Domain are present in the corresponding record type only
	if the task entity has a pragma, attribute definition clause or
	aspect specification.
	(Make_Initialize_Protection): Don't check the parents in the Rep
	Item Chain of the task for aspects Interrupt_Priority, Priority,
	CPU and Dispatching_Domain.
	* freeze.adb (Freeze_Entity): Use of Evaluate_Aspects_At_Freeze_Point
	call replaced by Analyze_Aspects_At_Freeze_Point.
	* sem_ch13.adb, sem_ch13.ads (Analyze_Aspects_At_Freeze_Point):
	Renaming of Evaluate_Aspects_At_Freeze_Point.

Index: exp_ch9.adb
===================================================================
--- exp_ch9.adb	(revision 188984)
+++ exp_ch9.adb	(working copy)
@@ -11270,30 +11270,36 @@
   --  in the pragma, and is used to override the task stack size otherwise
   --  associated with the task type.

-  --  The _Priority field is always present. It will be filled at the freeze
-  --  point, when the record init proc is built, to capture the expression of
-  --  a Priority pragma, attribute definition clause or aspect specification
-  --  (see Build_Record_Init_Proc in Exp_Ch3).
+  --  The _Priority field is present only if the task entity has a Priority or
+  --  Interrupt_Priority rep item (pragma, aspect specification or attribute
+  --  definition clause). It will be filled at the freeze point, when the
+  --  record init proc is built, to capture the expression of the rep item
+  --  (see Build_Record_Init_Proc in Exp_Ch3). Note that it cannot be filled
+  --  here since aspect evaluations are delayed till the freeze point.

   --  The _Task_Info field is present only if a Task_Info pragma appears in
   --  the task definition. The expression captures the argument that was
   --  present in the pragma, and is used to provide the Task_Image parameter
   --  to the call to Create_Task.

-  --  The _CPU field is always present. It will be filled at the freeze point,
-  --  when the record init proc is built, to capture the expression of a CPU
-  --  pragma, attribute definition clause or aspect specification (see
-  --  Build_Record_Init_Proc in Exp_Ch3).
+  --  The _CPU field is present only if the task entity has a CPU rep item
+  --  (pragma, aspect specification or attribute definition clause). It will
+  --  be filled at the freeze point, when the record init proc is built, to
+  --  capture the expression of the rep item (see Build_Record_Init_Proc in
+  --  Exp_Ch3). Note that it cannot be filled here since aspect evaluations
+  --  are delayed till the freeze point.

   --  The _Relative_Deadline field is present only if a Relative_Deadline
   --  pragma appears in the task definition. The expression captures the
   --  argument that was present in the pragma, and is used to provide the
   --  Relative_Deadline parameter to the call to Create_Task.

-  --  The _Domain field is always present. It will be filled at the freeze
-  --  point, when the record init proc is built, to capture the expression of
-  --  a Dispatching_Domain pragma, attribute definition clause or aspect
-  --  specification (see Build_Record_Init_Proc in Exp_Ch3).
+  --  The _Domain field is present only if the task entity has a
+  --  Dispatching_Domain rep item (pragma, aspect specification or attribute
+  --  definition clause). It will be filled at the freeze point, when the
+  --  record init proc is built, to capture the expression of the rep item
+  --  (see Build_Record_Init_Proc in Exp_Ch3). Note that it cannot be filled
+  --  here since aspect evaluations are delayed till the freeze point.

   --  When a task is declared, an instance of the task value record is
   --  created. The elaboration of this declaration creates the correct bounds

@@ -11566,17 +11572,20 @@
       Collect_Entry_Families (Loc, Cdecls, Size_Decl, Tasktyp);

-      --  Add the _Priority component with no expression
+      --  Add the _Priority component if a Interrupt_Priority or Priority rep
+      --  item is present.

-      Append_To (Cdecls,
-        Make_Component_Declaration (Loc,
-          Defining_Identifier =>
-            Make_Defining_Identifier (Loc, Name_uPriority),
-          Component_Definition =>
-            Make_Component_Definition (Loc,
-              Aliased_Present    => False,
-              Subtype_Indication =>
-
Re: [PATCH][configure] Make sure CFLAGS_FOR_TARGET And CXXFLAGS_FOR_TARGET contain -O2
On Jun 26, 2012, Christophe Lyon wrote:

> On 25.06.2012 17:24, Joseph S. Myers wrote:
>> On Mon, 25 Jun 2012, Christophe Lyon wrote:
>>
>>> Ping?
>> I advise CCing appropriate maintainers (in this case, build system
>> maintainers) on pings.
>>
> Ping again, CCing build system maintainers as suggested by Joseph.

Is this a ping for the patch you quoted in your Jun 25 email? It's generally good practice to include a link to the message holding the patch in a ping. I looked at the patch in there, and I'm afraid I don't understand how it achieves the ChangeLog-suggested purpose of ensuring -O2 makes it into C*FLAGS_FOR_TARGET, when all it appears to do is to prepend -g. Can you please clarify?

Thanks,

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist      Red Hat Brazil Compiler Engineer
Re: RFA: dead_debug_* ICE
On Jun 25, 2012, Richard Sandiford wrote: > I'll leave the proper fix to Alex. ACK. Thanks for the patch, it should help me get started. -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
Re: [Patch, mips] Fix warning when using --with-synci
On Tue, 2012-06-26 at 19:51 +0100, Richard Sandiford wrote: > > OK, I didn't really understand this setup before, but I think I get it > > now and I see how I would use this to set -msynci. I guess I would > > want to create a new triple for a multilib linux mips target so I will > > work on that approach and see how it goes. > > In case the run-around over this patch has put you off: it'd be > great if you could submit the configuration once you're happy with it. > If MIPS think a target like this is needed then that's good enough for me. > > Thanks, > Richard Yes, I will submit it once I have something. I am new to MIPS and still trying to understand all the different variations and how they all relate to each other. Steve Ellcey sell...@mips.com
Re: ping x3 : latch mem to reg before multi-access in convert_move
On Jun 26, 2012, at 17:28 , Richard Henderson wrote: > Ah, right, since convert_to_mode can just reshuffle the MEM. > > The patch is ok. Great, will commit shortly. Thanks a lot for your prompt feedback, With Kind Regards, Olivier
Re: [wwwdocs] Update coding conventions for C++
On Jun 26, 2012, at 11:06 AM, Lawrence Crowl wrote:
> Alternatively, one could change the conventions to match an emacs
> style.

Either is fine with me, as long as the style is reasonable. It is really nice to be able to use 5-year-old emacsen out of the box on the source tree and have it just work. I'd vote for that...
Re: [PR49888, VTA] don't keep VALUEs bound to modified MEMs
On Jun 14, 2012, "H.J. Lu" wrote: > http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53671 These two patches cure the two remaining problems. Basically, we have a problem of tracking equivalent FP-relative addresses across basic blocks when SP varies. Once we lost track of SP, pushing an argument for a call (without any MEM_EXPRs) was regarded as aliasing an incoming argument, so we removed the argument binding from the var tracking table. This patch is a bit kludgy in that, in a perfect world, we'd be able to recognize a broader set of equivalent incoming expressions at dataflow confluences, taking both static and dynamic sets into account. Currently, we only take the dynamic set into account, and don't recognize as equivalent arithmetically equivalent expressions that aren't isomorphic. Fixing the more general problem is more than I can tackle right now, in part because I don't have an efficient algorithm in mind and I'm concerned a brute-force approach may be far too expensive in terms of CPU and memory. In fact, this is *the* main open problem in VTA, that I'd like to discuss at the Cauldron with interested parties. So I put in a kludge that tries to canonicalize incoming SPs and, if they match, record that the merged SP holds the same canonical representation. It solves one of the regressions, but it will *not* solve the general problem: AFAICT, a function that calls alloca, especially if it calls it in a loop, will not get a proper relationship between SP and FP so as to enable alias analysis to tell that a stack push for an outgoing argument doesn't alias an FP-relative incoming argument. Perhaps defining separate alias sets for the various portions of the stack frame, and annotating pushes that save registers and pushes of outgoing arguments with these alias sets might help, but I haven't tried that. 
The second patch simply adjusts a testcase to account for an optimization possibility I hadn't taken into account that happened to be exposed by the initial PR49888 patch, and the change happens to hide the failure caused by a variant of the alias analysis problem described above: we have an incoming pointer and a register-saving stack push that must not alias because the incoming pointer couldn't possibly point to the register-save area of a callee in a well-defined program, but alias analysis can't figure that out. Both patches were regstrapped on x86_64-linux-gnu and i686-linux-gnu. Ok? for gcc/ChangeLog from Alexandre Oliva PR debug/53671 PR debug/49888 * var-tracking.c (attrs_list_by_loc_eq): New. (track_stack_pointer): New. (dataflow_set_merge): Use it. (vt_initialize): Record the initial stack pointer in the dataflow set. Index: gcc/var-tracking.c === --- gcc/var-tracking.c.orig 2012-06-26 17:33:12.991323578 -0300 +++ gcc/var-tracking.c 2012-06-26 17:51:55.316453808 -0300 @@ -1462,6 +1462,17 @@ attrs_list_member (attrs list, decl_or_v return NULL; } +/* Return the entry whose LOC field equals LOC. */ + +static attrs +attrs_list_by_loc_eq (attrs list, rtx loc) +{ + for (; list; list = list->next) +if (list->loc == loc) + return list; + return NULL; +} + /* Insert the triplet DECL, OFFSET, LOC to the list *LISTP. */ static void @@ -4028,6 +4039,86 @@ variable_merge_over_src (variable s2var, return 1; } +/* Add to DST any needed binding for the canonicalization of the stack + pointer to yield the same expression as in SRC1 and SRC2, if they + both yield the same expression. + + Return TRUE iff we found an equivalence. + + ??? The return value, that was useful during testing, ended up + unused, but this single-use static function will be inlined, and + then the return value computation will be optimized out, so I'm + leaving it in. + + ??? We use this kludge to avoid accidental aliasing between + incoming arguments and register-saving or outgoing-args pushes. 
We + shouldn't have to add explicit stack pointer tracking for that: + intersect_loc_chains ought to be able to take information from the + static cselib table and recognize additional equivalences, but we + don't have a sufficiently efficient algorithm to do that yet. */ + +static bool +track_stack_pointer (dataflow_set *dst, dataflow_set *src1, dataflow_set *src2) +{ + attrs dattr, s1attr, s2attr; + rtx daddr, s1addr, s2addr; + decl_or_value dv; + + for (dattr = dst->regs[STACK_POINTER_REGNUM]; + (dattr = attrs_list_by_loc_eq (dattr, stack_pointer_rtx)) + && (dattr->offset || !dv_is_value_p (dattr->dv)); + dattr = dattr->next) +; + + for (s1attr = src1->regs[STACK_POINTER_REGNUM]; + (s1attr = attrs_list_by_loc_eq (s1attr, stack_pointer_rtx)) + && (s1attr->offset || !dv_is_value_p (s1attr->dv)); + s1attr = s1attr->next) +; + if (!s1attr) +return false; + + for (s2attr = src2->regs[STACK_POINTER_REGNUM]; + (s2attr = attrs_list_by_loc_eq (s2attr, stack_
[testsuite] don't use lto plugin if it doesn't work
I test i686-linux-gnu in a presumably unusual setting: it's an x86_64-linux-gnu system, and I've configured the GCC build to use, as its build tools, wrapper scripts for as, ld, gnatmake and gcc that add flags making them default to 32-bit.

This worked fine for regression testing, but I've recently realized (with the PR49888/53671 mishap) that I'm getting tons of LTO testsuite failures (before and after, so no regression), because the 32-bit LTO plugin built in this setting can't possibly be used by the 64-bit linker installed on the system. Obviously, passing -melf_i386 to the linker through the wrapper is not enough for it to be able to dlopen a 32-bit plugin ;-)

I'm considering installing alternate 32-bit tools on my system for better testing, but I figured I'd better fix the test harness before tackling this: if we find that the LTO plugin doesn't work, we'd better not use -flto without -fno-use-linker-plugin, to avoid such spurious failures as GCC attempts by its own initiative to use the linker plugin.

At first I wished there was a simpler, off-band way to tell it not to use the LTO plugin, one that didn't pollute the command line or the test results, but before I even looked around for some such way, I figured it would be useful to have information that the linker plugin was not used in a particular test run, so I went for this patch instead.

Ok to install?

for gcc/testsuite/ChangeLog
from Alexandre Oliva

	* lib/c-torture.exp (LTO_TORTURE_OPTIONS): Add
	-fno-use-linker-plugin if plugin detection does not succeed.
	* lib/gcc-dg.exp (LTO_TORTURE_OPTIONS): Likewise.
	* lib/lto.exp (LTO_OPTIONS): Likewise.
Index: gcc/testsuite/lib/c-torture.exp === --- gcc/testsuite/lib/c-torture.exp.orig 2012-06-25 19:15:07.490911571 -0300 +++ gcc/testsuite/lib/c-torture.exp 2012-06-25 19:15:32.927836277 -0300 @@ -61,8 +61,8 @@ if [check_effective_target_lto] { ] } else { set LTO_TORTURE_OPTIONS [list \ - { -O2 -flto -flto-partition=none } \ - { -O2 -flto } + { -O2 -flto -fno-use-linker-plugin -flto-partition=none } \ + { -O2 -flto -fno-use-linker-plugin } ] } } Index: gcc/testsuite/lib/gcc-dg.exp === --- gcc/testsuite/lib/gcc-dg.exp.orig 2012-06-25 19:15:09.617738045 -0300 +++ gcc/testsuite/lib/gcc-dg.exp 2012-06-25 19:15:33.161817189 -0300 @@ -92,8 +92,8 @@ if [check_effective_target_lto] { set gcc_force_conventional_output "-ffat-lto-objects" } else { set LTO_TORTURE_OPTIONS [list \ - { -O2 -flto -flto-partition=none } \ - { -O2 -flto } + { -O2 -flto -fno-use-linker-plugin -flto-partition=none } \ + { -O2 -flto -fno-use-linker-plugin } ] } } Index: gcc/testsuite/lib/lto.exp === --- gcc/testsuite/lib/lto.exp.orig 2012-06-25 19:15:14.617330140 -0300 +++ gcc/testsuite/lib/lto.exp 2012-06-25 19:15:34.99945 -0300 @@ -77,12 +77,12 @@ proc lto_init { args } { ] } else { set LTO_OPTIONS [list \ - {-O0 -flto -flto-partition=none } \ - {-O2 -flto -flto-partition=none } \ - {-O0 -flto -flto-partition=1to1 } \ - {-O2 -flto -flto-partition=1to1 } \ - {-O0 -flto } \ - {-O2 -flto} \ + {-O0 -flto -fno-use-linker-plugin -flto-partition=none } \ + {-O2 -flto -fno-use-linker-plugin -flto-partition=none } \ + {-O0 -flto -fno-use-linker-plugin -flto-partition=1to1 } \ + {-O2 -flto -fno-use-linker-plugin -flto-partition=1to1 } \ + {-O0 -flto -fno-use-linker-plugin } \ + {-O2 -flto -fno-use-linker-plugin } \ ] } } -- Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/ You must be the change you wish to see in the world. -- Gandhi Be Free! -- http://FSFLA.org/ FSF Latin America board member Free Software Evangelist Red Hat Brazil Compiler Engineer
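The harness change above boils down to a small selection rule. Here is a minimal Python sketch of that rule (the function name and the exact option ordering are illustrative assumptions; the real logic lives in Tcl in the three .exp files, and the patch inserts -fno-use-linker-plugin before -flto-partition rather than appending it, which makes no difference to GCC):

```python
# Sketch of the patch's selection rule: when linker-plugin detection
# fails, every LTO torture-option set gains -fno-use-linker-plugin so
# GCC never tries to load the plugin on its own initiative.
def lto_torture_options(plugin_works: bool) -> list:
    base = ["-O2 -flto -flto-partition=none", "-O2 -flto"]
    if plugin_works:
        return base
    # Append the opt-out flag to each option set (order of the flags
    # differs from the actual Tcl patch but is irrelevant to GCC).
    return [opts + " -fno-use-linker-plugin" for opts in base]
```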
Re: [testsuite] don't use lto plugin if it doesn't work
On Tue, Jun 26, 2012 at 06:04:54PM -0300, Alexandre Oliva wrote:
> I test i686-linux-gnu in a presumably unusual setting: it's an
> x86_64-linux-gnu system, and I've configured the GCC build to use as
> build tools wrapper scripts for as, ld, gnatmake and gcc that add
> flags that make them default to 32-bit.
>
> This worked fine for regression testing, but I've recently realized
> (with the PR49888/53671 mishap) that I'm getting tons of LTO testsuite
> failures (before and after, so no regression), because the 32-bit LTO
> plugin built in this setting can't possibly be used by the 64-bit linker
> installed on the system. Obviously, passing -melf_i386 to the linker
> through the wrapper is not enough for it to be able to dlopen a 32-bit
> plugin ;-)
>
> I'm considering installing alternate 32-bit tools on my system for
> better testing, but I figured I'd better fix the test harness before
> tackling this: if we find that the LTO plugin doesn't work, we'd better
> not use -flto without -fno-use-linker-plugin, to avoid such spurious
> failures as GCC attempts by its own initiative to use the linker plugin.
>
> At first I wished there was a simpler, off-band way to tell it not to
> use the LTO plugin, that didn't pollute the command line or the test
> results, but before I even looked around for some such way, I figured it
> would be useful to have information that the linker plugin was not used
> in a particular test run, so I went for this patch instead.

I don't think the testsuite harness is the right spot to deal with it; IMNSHO gcc just shouldn't default to using the linker plugin solely based on ld --version, it should take into account whether that linker can load the plugin. Right now I'm using the following hack in my ld wrapper script for i686-linux builds on x86_64:

#!/bin/sh
case "$*" in
  --version)
    cat <<\EOF
GNU ld version 2.20.52.0.1-10.fc17 20100131
Copyright 2012 Free Software Foundation, Inc.
This program is free software; you may redistribute it under the terms of
the GNU General Public License version 3 or (at your option) a later version.
This program has absolutely no warranty.
EOF
    exit 0;;
esac
exec /usr/bin/ld -m elf_i386 -L /usr/lib/ "$@"

(It lies about the ld version, and thus gcc doesn't assume the ld plugin will work.)

	Jakub
Re: [testsuite] don't use lto plugin if it doesn't work
On Tue, Jun 26, 2012 at 2:04 PM, Alexandre Oliva wrote:
> I test i686-linux-gnu in a presumably unusual setting: it's an
> x86_64-linux-gnu system, and I've configured the GCC build to use as
> build tools wrapper scripts for as, ld, gnatmake and gcc that add
> flags that make them default to 32-bit.
>
> This worked fine for regression testing, but I've recently realized
> (with the PR49888/53671 mishap) that I'm getting tons of LTO testsuite
> failures (before and after, so no regression), because the 32-bit LTO
> plugin built in this setting can't possibly be used by the 64-bit linker
> installed on the system. Obviously, passing -melf_i386 to the linker
> through the wrapper is not enough for it to be able to dlopen a 32-bit
> plugin ;-)

I am using this Makefile fragment to bootstrap and test -m32 and -mx32 GCC on Linux/x86-64:

ifneq ($(BUILD-ARCH),$(CPU))
ifeq (i386,$(ARCH))
TARGET-FLAGS=$(TARGET)
CC=gcc -m32
CXX=g++ -m32
FLAGS-TO-PASS+=CC="$(CC)"
FLAGS-TO-PASS+=CXX="$(CXX)"
# Need 32bit linker for LTO. */
PATH:=/usr/local32/bin:$(PATH)
endif
ifeq (x32,$(ARCH))
CC=gcc -mx32
CXX=g++ -mx32
FLAGS-TO-PASS+=CC="$(CC)"
FLAGS-TO-PASS+=CXX="$(CXX)"
# Need x32 linker for LTO. */
PATH:=/usr/localx32/bin:$(PATH)
endif
endif

[hjl@gnu-32 gcc-32bit]$ file /usr/localx32/bin/ld
/usr/localx32/bin/ld: ELF 32-bit LSB executable, x86-64, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.38,
BuildID[sha1]=0x85a2821594e122d4fc60741e2664c2b57888682e, not stripped
[hjl@gnu-32 gcc-32bit]$ file /usr/local32/bin/ld
/usr/local32/bin/ld: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV),
dynamically linked (uses shared libs), for GNU/Linux 2.6.9, not stripped
[hjl@gnu-32 gcc-32bit]$

-- 
H.J.
Re: [patch] Only define JMP_BUF_SIZE in backends that also define DONT_USE_BUILTIN_SETJMP
On Jun 26, 2012, at 1:55 AM, Steven Bosscher wrote:
> On Mon, Jun 25, 2012 at 10:46 AM, Eric Botcazou wrote:
>> It looks like DONT_USE_BUILTIN_SETJMP was invented for the IA-64, probably
>> because of the Register Stack Engine. But SPARC also has register windows,
>> albeit less sophisticated, and can do without it. Moreover, the Ada compiler
>> uses __builtin_setjmp/__builtin_longjmp internally on the host (EH is used
>> for error recovery) and is known to work fine on IA-64/Linux, HP-UX and VMS.
>>
>> In the end, it would appear that DONT_USE_BUILTIN_SETJMP was a quick trick to
>> solve specific issues that could very likely have been solved otherwise. We
>> should probably keep it for the sake of IA-64 and get rid of it for all other
>> architectures, documenting that it isn't to be used in normal circumstances.
>
> ia64 is now the only remaining architecture that defines
> DONT_USE_BUILTIN_SETJMP and JMP_BUF_SIZE.
>
> If __builtin_setjmp actually does work for ia64, why should we keep
> DONT_USE_BUILTIN_SETJMP? Could it break user code somehow?

History is not kind: http://gcc.gnu.org/ml/gcc-patches/2002-04/msg00124.html :-) It seems everyone else has seen the error of their ways and fixed it. :-) It got moved to ia64.h, only because a few people mistakenly used it once. Though history isn't kind to me either: svn log -c13969 :-( I'd say, remove it, and deprecate the bit, and after 2 releases, remove the whole lot. 15 years late... Sorry for the mess.
Re: [testsuite] don't use lto plugin if it doesn't work
On Jun 26, 2012, at 2:04 PM, Alexandre Oliva wrote:
> I test i686-linux-gnu in a presumably unusual setting

I like the setup and testing...

> This worked fine for regression testing, but I've recently realized
> (with the PR49888/53671 mishap) that I'm getting tons of LTO testsuite
> failures (before and after, so no regression), because the 32-bit LTO
> plugin built in this setting can't possibly be used by the 64-bit linker
> installed on the system. Obviously, passing -melf_i386 to the linker
> through the wrapper is not enough for it to be able to dlopen a 32-bit
> plugin ;-)

So, let's kick this back to the gcc list for all the interested parties to chime in on... I'd rather have 5 interested people architect the right, nice solution that is engineered to work, and then use it. I don't like any of the solutions so far. On darwin, we'd just have the plugin and even ld be fat, and then one could load that on any architecture, but that's cheating. H.J.'s solution seems like the most reasonable short-term one, though I kinda hate all the magic bits one needs for the environment. Hum, thinking about this for a second... If we build the plugin after sensing the 64-bitness of ld, using the flags appropriate for the linker... That'd be nice... If people don't want to do that, then failing the build early when mismatched, and simply documenting that one must give gcc a 32-bit linker for the build to work, would at least be more honest.
Re: New option to turn off stack reuse for temporaries
> As do I. The intent was for Ada and every other language with things like > temporaries and cleanups to reuse the backend constructs, so that instead > of writing optimizers, one for each language, to instead share the > optimizer across languages. To me, the middle end and the backend are the > best places for these. Both are very high-level constructs though. By the time the AST is converted to GENERIC in the Ada compiler, it is already too lowered to make use of them. -- Eric Botcazou
Re: [testsuite] don't use lto plugin if it doesn't work
On Tue, Jun 26, 2012 at 3:39 PM, Mike Stump wrote:
> On Jun 26, 2012, at 2:04 PM, Alexandre Oliva wrote:
>> I test i686-linux-gnu in a presumably unusual setting
>
> I like the setup and testing...
>
>> This worked fine for regression testing, but I've recently realized
>> (with the PR49888/53671 mishap) that I'm getting tons of LTO testsuite
>> failures (before and after, so no regression), because the 32-bit LTO
>> plugin built in this setting can't possibly be used by the 64-bit linker
>> installed on the system.  Obviously, passing -melf_i386 to the linker
>> through the wrapper is not enough for it to be able to dlopen a 32-bit
>> plugin ;-)
>
> So, let's kick this back to the gcc list for all the interested parties to
> chime in on...  I'd rather have 5 interested people architect the right, nice
> solution that is engineered to work and then use it.  I don't like any of the
> solutions so far.  On darwin, we'd just have the plugin and even ld be fat,
> and then one could load that on any architecture, but that's cheating.
> H.J.'s solution seems like the most reasonable short-term solution, though I
> kinda hate all the magic bits one needs for the environment.  Hum, thinking
> about this for a second...  If we build the plugin after sensing the
> 64-bitness of ld, using the flags appropriate for the linker...  That'd be
> nice...  If people don't want to do that, then failing the build early when
> mismatched and simply documenting that one must have a 32-bit linker given to
> gcc for the build to work, at least would be more honest.

Bootstrap/test of -m32/-mx32 with LTO on -m64 is similar to a cross-compile, in the sense that the native linker may not work with the plugin.  We just need to make the right linker available to GCC.  My kludge uses PATH.  One can also include the binutils source in the GCC source tree.

-- 
H.J.
Re: [wwwdocs] Update coding conventions for C++
On Mon, 2012-06-25 at 15:17 -0700, Lawrence Crowl wrote:
> On 6/25/12, Joseph S. Myers wrote:
> > On Mon, 25 Jun 2012, Diego Novillo wrote:
> > > [ Added doc maintainers in CC ]
>
> I have added a bit more in the rationale, reached through the link
> at the end of that section.
>
> > > > +
> > > > +Indent protection labels by one space.
> > > > +
> > > > +
> > > > +
> > > > +Indent class members by two spaces.
> >
> > Do all the listed indentation rules correspond to what you get by
> > default when editing C++ code in GNU Emacs?  If not, we have
> > conflicting notions of GNU C++ indentation conventions.
>
> I have no idea.  I don't use emacs.  The two-space rule for members
> comes from the wiki.  The one-space rule for protection labels is
> common practice.  If folks want something else, changes are fine
> with me.

Two spaces for members is common practice with GNU, and it seems to be used for libstdc++.  One space for protection labels is not something I have heard of before, and libstdc++ uses no indentation for them.  A freshly started emacs also doesn't indent access labels.  I do think there is some value in using the same coding style for libstdc++ and the compiler.

/MF
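For concreteness, here is a small sketch of the two conventions under discussion — protection labels indented one space, members two spaces.  The class itself is made up purely for illustration:

```cpp
// Hypothetical example class, laid out per the proposed conventions:
// protection labels get one space, members get two spaces.
class point
{
 public:                        // protection label: one space
  point (int x, int y)          // member: two spaces
    : x_ (x), y_ (y) {}

  int x () const { return x_; }
  int y () const { return y_; }

 private:
  int x_;
  int y_;
};
```

Whether a stock Emacs produces this layout is exactly the open question in the thread.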
[RFC] Tweak reload to const propagate into matching constraint output
The problem I'd like to solve is stuff like

	pxor	%xmm4, %xmm4
	...
	movdqa	%xmm4, %xmm2
	pcmpgtd	%xmm0, %xmm2

in that there's no point performing the copy from xmm4 rather than just emitting a new pxor insn.  The Real Problem, as I see it, is that at the point (g)cse runs we have no visibility into the 2-operand matching constraint on that pcmpgtd, so we make the wrong choice in sharing the zero.  If we're using AVX instead of SSE, we don't use matching constraints, and given the 3-operand insn, hoisting the zero is the right and proper thing to do because we won't need to emit that movdqa.

Of course, this fires for normal integer code as well.  In some cases it's a clear win:

-: 41 be 1f 00 00 00	mov    $0x1f,%r14d
...
-: 4c 89 f1         	mov    %r14,%rcx
+: b9 1f 00 00 00   	mov    $0x1f,%ecx

sometimes not (increased code size):

-: 41 bd 01 00 00 00	mov    $0x1,%r13d
-: 4d 89 ec         	mov    %r13,%r12
+: 41 bc 01 00 00 00	mov    $0x1,%r12d
+: 41 bd 01 00 00 00	mov    $0x1,%r13d

although the total difference is minimal, and ambiguous:

		new text	old text
cc1		13971302	13971342
cc1plus		15882736	15882728

Also, note that in the first case above, r14 is otherwise unused, and we wind up with an unnecessary save/restore of the register in the function.

Thoughts?


r~

diff --git a/gcc/reload.c b/gcc/reload.c
index e42cc5c..fb928be 100644
--- a/gcc/reload.c
+++ b/gcc/reload.c
@@ -2534,6 +2534,38 @@ safe_from_earlyclobber (rtx op, rtx clobber)
   early_data = decompose (clobber);
   return immune_p (op, clobber, early_data);
 }
+
+/* For matching operand constraints, we may need to make a reg-reg copy.
+   In some cases the source reg is equivalent to a constant, and it's
+   more efficient to set the dest reg from the constant rather than the
+   source reg.  Return the source to use for the reload.  */
+
+static rtx
+matching_constraint_src (rtx src)
+{
+  unsigned regno;
+  rtx c;
+
+  if (!REG_P (src))
+    return src;
+
+  regno = REGNO (src);
+  if (HARD_REGISTER_NUM_P (regno))
+    {
+      regno = ORIGINAL_REGNO (src);
+      if (HARD_REGISTER_NUM_P (regno))
+	return src;
+    }
+  c = reg_equiv_constant (regno);
+  if (c == NULL)
+    return src;
+
+  /* Only use the constant when it's just as cheap as a reg move.  */
+  if (set_src_cost (c, optimize_function_for_speed_p (cfun)) == 0)
+    return c;
+  else
+    return src;
+}
 
 /* Main entry point of this file: search the body of INSN
    for values that need reloading and record them with push_reload.
@@ -4049,28 +4081,28 @@ find_reloads (rtx insn, int replace, int ind_levels, int live_known,
      else if (modified[i] == RELOAD_READ
	       && modified[goal_alternative_matched[i]] == RELOAD_WRITE)
	{
+	  int a = goal_alternative_matched[i];
	  operand_reloadnum[i]
-	    = push_reload (recog_data.operand[i],
-			   recog_data.operand[goal_alternative_matched[i]],
+	    = push_reload (matching_constraint_src (recog_data.operand[i]),
+			   recog_data.operand[a],
			   recog_data.operand_loc[i],
-			   recog_data.operand_loc[goal_alternative_matched[i]],
+			   recog_data.operand_loc[a],
			   (enum reg_class) goal_alternative[i],
-			   operand_mode[i],
-			   operand_mode[goal_alternative_matched[i]],
+			   operand_mode[i], operand_mode[a],
			   0, 0, i, RELOAD_OTHER);
-	  operand_reloadnum[goal_alternative_matched[i]] = output_reloadnum;
+	  operand_reloadnum[a] = output_reloadnum;
	}
      else if (modified[i] == RELOAD_WRITE
	       && modified[goal_alternative_matched[i]] == RELOAD_READ)
	{
-	  operand_reloadnum[goal_alternative_matched[i]]
-	    = push_reload (recog_data.operand[goal_alternative_matched[i]],
+	  int a = goal_alternative_matched[i];
+	  operand_reloadnum[a]
+	    = push_reload (matching_constraint_src (recog_data.operand[a]),
			   recog_data.operand[i],
-			   recog_data.operand_loc[goal_alternative_matched[i]],
+			   recog_data.operand_loc[a],
			   recog_data.operand_loc[i],
			   (enum reg_class) goal_alternative[i],
-			   operand_mode[goal_alternative_matched[i]],
-			   operand_mode[i],
+			   operand_mode[a], operand_mode[i],
			   0, 0, i, RELOAD_OTHER);
	  operand_reloadnum[i] = output_reloadnum;
	}
[PATCH 2/3] i386: Implement widen_smul_*_v4si for plain sse2
If we don't implement this pattern, the vectorizer is happy to unpack the v4si and use the full mulv2di3.  This results in more element shuffling than is required.

	* config/i386/i386.c (bdesc_args): Update.  Change
	IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI to OPTION_MASK_ISA_SSE2.
	(IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI): New.
	(ix86_builtin_mul_widen_even): Use it.
	(ix86_builtin_mul_widen_odd): Relax SMUL_ODD from sse4 to sse2.
	(ix86_expand_mul_widen_evenodd): Handle signed for sse2.
	* config/i386/sse.md (vec_widen_mult_hi_<mode>): Allow for all SSE2.
	(vec_widen_mult_lo_<mode>): Likewise.
	(vec_widen_mult_odd_<mode>): Likewise.  Relax from V124_AVX2.
	(vec_widen_smult_even_v4si): New.
---
 gcc/ChangeLog          | 14 +
 gcc/config/i386/i386.c | 77 +--
 gcc/config/i386/sse.md | 29 +++---
 3 files changed, 79 insertions(+), 41 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 5cf230f..b96fc6e 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25758,6 +25758,7 @@ enum ix86_builtins
   IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V8SI,
   IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI,
   IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V8SI,
+  IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI,
   IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI,
   IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V8SI,
@@ -26620,7 +26621,9 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv1siv1di3, "__builtin_ia32_pmuludq", IX86_BUILTIN_PMULUDQ, UNKNOWN, (int) V1DI_FTYPE_V2SI_V2SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv2siv2di3, "__builtin_ia32_pmuludq128", IX86_BUILTIN_PMULUDQ128, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_umulv2siv2di3, "__builtin_vw_umul_even_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI, UNKNOWN, (int) V2UDI_FTYPE_V4USI_V4USI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_even_v4si, "__builtin_ia32_vw_smul_even_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_umult_odd_v4si, "__builtin_ia32_vw_umul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI, UNKNOWN, (int) V2UDI_FTYPE_V4USI_V4USI },
+  { OPTION_MASK_ISA_SSE2, CODE_FOR_vec_widen_smult_odd_v4si, "__builtin_ia32_vw_smul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE2, CODE_FOR_sse2_pmaddwd, "__builtin_ia32_pmaddwd128", IX86_BUILTIN_PMADDWD128, UNKNOWN, (int) V4SI_FTYPE_V8HI_V8HI },
@@ -26747,7 +26750,6 @@ static const struct builtin_description bdesc_args[] =
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_uminv4si3, "__builtin_ia32_pminud128", IX86_BUILTIN_PMINUD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_uminv8hi3, "__builtin_ia32_pminuw128", IX86_BUILTIN_PMINUW128, UNKNOWN, (int) V8HI_FTYPE_V8HI_V8HI },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_sse4_1_mulv2siv2di3, "__builtin_ia32_pmuldq128", IX86_BUILTIN_PMULDQ128, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
-  { OPTION_MASK_ISA_SSE4_1, CODE_FOR_vec_widen_smult_odd_v4si, "__builtin_ia32_vw_smul_odd_v4si", IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI, UNKNOWN, (int) V2DI_FTYPE_V4SI_V4SI },
   { OPTION_MASK_ISA_SSE4_1, CODE_FOR_mulv4si3, "__builtin_ia32_pmulld128", IX86_BUILTIN_PMULLD128, UNKNOWN, (int) V4SI_FTYPE_V4SI_V4SI },

   /* SSE4.1 */
@@ -31067,18 +31069,10 @@ ix86_builtin_mul_widen_even (tree type)
   switch (TYPE_MODE (type))
     {
     case V4SImode:
-      if (uns_p)
-	{
-	  if (!TARGET_SSE2)
-	    return NULL;
-	  code = IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI;
-	}
-      else
-	{
-	  if (!TARGET_SSE4_1)
-	    return NULL;
-	  code = IX86_BUILTIN_PMULDQ128;
-	}
+      if (!TARGET_SSE2)
+	return NULL;
+      code = (uns_p ? IX86_BUILTIN_VEC_WIDEN_UMUL_EVEN_V4SI
+	      : IX86_BUILTIN_VEC_WIDEN_SMUL_EVEN_V4SI);
       break;

     case V8SImode:
@@ -31103,18 +31097,10 @@ ix86_builtin_mul_widen_odd (tree type)
   switch (TYPE_MODE (type))
     {
     case V4SImode:
-      if (uns_p)
-	{
-	  if (!TARGET_SSE2)
-	    return NULL;
-	  code = IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI;
-	}
-      else
-	{
-	  if (!TARGET_SSE4_1)
-	    return NULL;
-	  code = IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI;
-	}
+      if (!TARGET_SSE2)
+	return NULL;
+      code = (uns_p ? IX86_BUILTIN_VEC_WIDEN_UMUL_ODD_V4SI
+	      : IX86_BUILTIN_VEC_WIDEN_SMUL_ODD_V4SI);
       break;

     case V8SImode:
@@ -38774,12 +38760,12 @@ ix86_expand_mul_widen_evenodd (rtx dest, rtx op1, rtx op2,
	  emit_insn (gen_xop_pmacsdqh (dest, op1, op2, x));
	  return;
	}
+
+      x = GEN_INT (GET_MODE_UNIT_BITSIZE (mode));
       op1 = expand_binop (wmode, lshr_optab, gen_lowpart (wmode, o
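For readers unfamiliar with the pattern names: the even/odd widening multiplies above can be described by a scalar model (the function names here are illustrative, not GCC's) — the "even" pattern multiplies elements 0 and 2 of a V4SI pair to full 64-bit results, the "odd" pattern elements 1 and 3:

```cpp
#include <array>
#include <cstdint>

// Scalar model of vec_widen_smult_even_v4si: multiply the even-numbered
// 32-bit lanes (0 and 2) with full 64-bit results.
static std::array<int64_t, 2>
widen_smult_even_v4si (std::array<int32_t, 4> a, std::array<int32_t, 4> b)
{
  return { (int64_t) a[0] * b[0], (int64_t) a[2] * b[2] };
}

// Scalar model of vec_widen_smult_odd_v4si: same, for lanes 1 and 3.
static std::array<int64_t, 2>
widen_smult_odd_v4si (std::array<int32_t, 4> a, std::array<int32_t, 4> b)
{
  return { (int64_t) a[1] * b[1], (int64_t) a[3] * b[3] };
}
```

With both even and odd forms available at plain SSE2, the vectorizer can cover all four lanes without first unpacking the v4si, which is the shuffling the patch avoids.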
[PATCH 1/3] i386: Expand mul earlier
Move the expansion code to i386.c next to mulv4si3.  Eliminate one shift by adding the highparts before shifting.  Correct costs.

	* config/i386/sse.md (mul<mode>3): Change from insn_and_split
	to expander; move guts to ...
	* config/i386/i386.c (ix86_expand_sse2_mulvxdi3): ... here.
	Add highparts before shifting up.
	* config/i386/i386-protos.h: Update.
---
 gcc/ChangeLog                 |  8
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.c        | 90 +
 gcc/config/i386/sse.md        | 84 +
 4 files changed, 102 insertions(+), 81 deletions(-)

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index c860e5a..581b25c 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -227,6 +227,7 @@ extern bool ix86_expand_pinsr (rtx *);
 extern void ix86_expand_mul_widen_evenodd (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_mul_widen_hilo (rtx, rtx, rtx, bool, bool);
 extern void ix86_expand_sse2_mulv4si3 (rtx, rtx, rtx);
+extern void ix86_expand_sse2_mulvxdi3 (rtx, rtx, rtx);

 /* In i386-c.c */
 extern void ix86_target_macros (void);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index c825033..5cf230f 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -32293,6 +32293,14 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_i, int opno, int *total,
	      extra = 6;
	    *total = cost->fmul * 2 + cost->fabs * extra;
	  }
+	/* V*DImode is emulated with 5-8 insns.  */
+	else if (mode == V2DImode || mode == V4DImode)
+	  {
+	    if (TARGET_XOP && mode == V2DImode)
+	      *total = cost->fmul * 2 + cost->fabs * 3;
+	    else
+	      *total = cost->fmul * 3 + cost->fabs * 5;
+	  }
	/* Without sse4.1, we don't have PMULLD; it's emulated with 7
	   insns, including two PMULUDQ.  */
	else if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
@@ -38915,6 +38923,88 @@ ix86_expand_sse2_mulv4si3 (rtx op0, rtx op1, rtx op2)
   set_unique_reg_note (res_1, REG_EQUAL, gen_rtx_MULT (V4SImode, op1, op2));
 }

+void
+ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
+{
+  enum machine_mode mode = GET_MODE (op0);
+  rtx t1, t2, t3, t4, t5, t6;
+
+  if (TARGET_XOP && mode == V2DImode)
+    {
+      /* op1: A,B,C,D, op2: E,F,G,H */
+      op1 = gen_lowpart (V4SImode, op1);
+      op2 = gen_lowpart (V4SImode, op2);
+
+      t1 = gen_reg_rtx (V4SImode);
+      t2 = gen_reg_rtx (V4SImode);
+      t3 = gen_reg_rtx (V2DImode);
+      t4 = gen_reg_rtx (V2DImode);
+
+      /* t1: B,A,D,C */
+      emit_insn (gen_sse2_pshufd_1 (t1, op1,
+				    GEN_INT (1),
+				    GEN_INT (0),
+				    GEN_INT (3),
+				    GEN_INT (2)));
+
+      /* t2: (B*E),(A*F),(D*G),(C*H) */
+      emit_insn (gen_mulv4si3 (t2, t1, op2));
+
+      /* t3: (B*E)+(A*F), (D*G)+(C*H) */
+      emit_insn (gen_xop_phadddq (t3, t2));
+
+      /* t4: ((B*E)+(A*F))<<32, ((D*G)+(C*H))<<32 */
+      emit_insn (gen_ashlv2di3 (t4, t3, GEN_INT (32)));
+
+      /* op0: (((B*E)+(A*F))<<32)+(B*F), (((D*G)+(C*H))<<32)+(D*H) */
+      emit_insn (gen_xop_pmacsdql (op0, op1, op2, t4));
+    }
+  else
+    {
+      enum machine_mode nmode;
+      rtx (*umul) (rtx, rtx, rtx);
+
+      if (mode == V2DImode)
+	{
+	  umul = gen_sse2_umulv2siv2di3;
+	  nmode = V4SImode;
+	}
+      else if (mode == V4DImode)
+	{
+	  umul = gen_avx2_umulv4siv4di3;
+	  nmode = V8SImode;
+	}
+      else
+	gcc_unreachable ();
+
+      /* Multiply low parts.  */
+      t1 = gen_reg_rtx (mode);
+      emit_insn (umul (t1, gen_lowpart (nmode, op1),
+		       gen_lowpart (nmode, op2)));
+
+      /* Shift input vectors right 32 bits so we can multiply high parts.  */
+      t6 = GEN_INT (32);
+      t2 = expand_binop (mode, lshr_optab, op1, t6, NULL, 1, OPTAB_DIRECT);
+      t3 = expand_binop (mode, lshr_optab, op2, t6, NULL, 1, OPTAB_DIRECT);
+
+      /* Multiply high parts by low parts.  */
+      t4 = gen_reg_rtx (mode);
+      t5 = gen_reg_rtx (mode);
+      emit_insn (umul (t4, gen_lowpart (nmode, t2), gen_lowpart (nmode, op2)));
+      emit_insn (umul (t5, gen_lowpart (nmode, t3), gen_lowpart (nmode, op1)));
+
+      /* Combine and shift the highparts back.  */
+      t4 = expand_binop (mode, add_optab, t4, t5, t4, 1, OPTAB_DIRECT);
+      t4 = expand_binop (mode, ashl_optab, t4, t6, t4, 1, OPTAB_DIRECT);
+
+      /* Combine high and low parts.  */
+      force_expand_binop (mode, add_optab, t1, t4, op0, 1, OPTAB_DIRECT);
+    }
+
+  set_unique_reg_note (get_last_insn (), REG_EQUAL,
+		       gen_rtx_MULT (mode, op1, op2));
+}
+
 /* Expand an insert into a vector register through pinsr insn.
    Return true
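The "add the highparts before shifting" saving rests on a per-lane identity that is easy to check with a scalar model (illustrative code, not the patch's):

```cpp
#include <cstdint>

// Per 64-bit lane, with all products 32x32->64 unsigned multiplies
// (the only multiply SSE2's pmuludq provides):
//
//   a * b == a_lo*b_lo + ((a_hi*b_lo + a_lo*b_hi) << 32)   (mod 2^64)
//
// Summing the two cross-products first means only one shift is needed,
// instead of shifting each cross-product separately.
static uint64_t
mul_via_32bit_parts (uint64_t a, uint64_t b)
{
  uint64_t a_lo = a & 0xffffffffu, a_hi = a >> 32;
  uint64_t b_lo = b & 0xffffffffu, b_hi = b >> 32;

  uint64_t low = a_lo * b_lo;                  /* like pmuludq */
  uint64_t cross = a_hi * b_lo + a_lo * b_hi;  /* two more pmuludq, added */
  return low + (cross << 32);                  /* single shift, then add */
}
```

The a_hi*b_hi term is dropped because it only contributes above bit 63, and wraparound in `cross` is harmless since only its low 32 bits survive the shift.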
[PATCH 3/3] i386: Correct costs on CONST_DOUBLE and CONST_VECTOR
We were always falling through to the memory default.  Also use standard_sse_constant_p on CONST_VECTOR.

	* config/i386/i386.c (ix86_rtx_costs): Use standard_sse_constant_p,
	and don't fall thru from standard_80387_constant_p to the memory
	fallback.
---
 gcc/ChangeLog          |  6 ++
 gcc/config/i386/i386.c | 48 +++-
 2 files changed, 37 insertions(+), 17 deletions(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index b96fc6e..edfc649 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -32102,24 +32102,38 @@ ix86_rtx_costs (rtx x, int code_i, int outer_code_i, int opno, int *total,

     case CONST_DOUBLE:
       if (mode == VOIDmode)
-	*total = 0;
-      else
-	switch (standard_80387_constant_p (x))
-	  {
-	  case 1: /* 0.0 */
-	    *total = 1;
-	    break;
-	  default: /* Other constants */
-	    *total = 2;
-	    break;
-	  case 0:
-	  case -1:
-	    break;
-	  }
-      /* FALLTHRU */
-
+	{
+	  *total = 0;
+	  return true;
+	}
+      switch (standard_80387_constant_p (x))
+	{
+	case 1: /* 0.0 */
+	  *total = 1;
+	  return true;
+	default: /* Other constants */
+	  *total = 2;
+	  return true;
+	case 0:
+	case -1:
+	  break;
+	}
+      if (SSE_FLOAT_MODE_P (mode))
+	{
     case CONST_VECTOR:
-      /* Start with (MEM (SYMBOL_REF)), since that's where
+	  switch (standard_sse_constant_p (x))
+	    {
+	    case 0:
+	      break;
+	    case 1:  /* 0: xor eliminates false dependency */
+	      *total = 0;
+	      return true;
+	    default: /* -1: cmp contains false dependency */
+	      *total = 1;
+	      return true;
+	    }
+	}
+      /* Fall back to (MEM (SYMBOL_REF)), since that's where
	 it'll probably end up.  Add a penalty for size.  */
       *total = (COSTS_N_INSNS (1)
		 + (flag_pic != 0 && !TARGET_64BIT)
-- 
1.7.7.6
Re: [wwwdocs] Update coding conventions for C++
Magnus Fromreide writes:
> Two spaces for members is common practice with GNU, and it seems to be
> used for libstdc++.
>
> One space for protection labels is not something I have heard of before
> and libstdc++ uses no indentation for them.
>
> A freshly started emacs also doesn't indent access labels.

Yeah, zero indentation (or rather, "aligned with the surrounding brace level", e.g. for nested structures/classes) is certainly the de facto GNU standard for C++ "access labels" (private:, etc).  There seems no reason for GCC to be different.

-miles

-- 
Quotation, n.  The act of repeating erroneously the words of another.
The words erroneously repeated.