Re: wide-int branch now up for public comment and review
Mike Stump writes: > On Aug 23, 2013, at 8:02 AM, Richard Sandiford > wrote: >>> * When a constant that has an integer type is converted to a >>>wide-int it comes in with precision 0. For these constants the >>>top bit does accurately reflect the sign of that constant; this >>>is an exception to the normal rule that the signedness is not >>>represented. When used in a binary operation, the wide-int >>>implementation properly extends these constants so that they >>>properly match the other operand of the computation. This allows >>>you to write: >>> >>> tree t = ... >>> wide_int x = t + 6; >>> >>>assuming t is an int_cst. >> >> This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" >> actually wants it to be an unsigned value. Some code uses it to avoid >> the undefinedness of signed overflow. So these overloads could lead >> to us accidentally zero-extending what's conceptually a signed value >> without any obvious indication that that's happening. Also, hex constants >> are unsigned int, but it doesn't seem safe to assume that 0x8000 was >> meant to be zero-extended. >> >> I realise the same thing can happen if you mix "unsigned int" with >> HOST_WIDE_INT, but the point is that you shouldn't really do that >> in general, whereas we're defining these overloads precisely so that >> a mixture can be used. > > So, I don't like penalizing all users, because one user might write > incorrect code. We have the simple operators so that users can retain > some of the simplicity and beauty that is the underlying language. > Those semantics are well known and reasonable. We reasonably match > those semantics. At the end of the day, we have to be able to trust > what the user writes. Now, a user that doesn't trust themselves can > elect to not use these functions; they aren't required to use them. But the point of my "unsigned HOST_WIDE_INT" example is that the semantics of the language can also get in the way. If you're adding or multiplying two smallish constants, you should generally use "unsigned HOST_WIDE_INT" in order to avoid the undefinedness of signed overflow. In that kind of case the "unsigned" in no way implies that you want the value to be extended to the wider types that we're adding with this patch. I'm worried about making that assumption just to provide a bit of syntactic sugar, especially when any wrong-code bug would be highly value-dependent. When this kind of thing is done at the rtl level, the original constant (in a CONST_INT) is also HOST_WIDE_INT-sized, so the kind of code that I'm talking about did not do any promotion before the wide-int conversion. That changes if we generalise one of the values to a wide_int. It isn't just a case of being confident when writing new code. It's a case of us changing a large body of existing code in one go, some of which uses unsigned values for the reason above. Like I say, all this objection is for "unsigned int and above". If we provide the syntactic sugar for plain "int" and use templates to force a compiler error for "higher" types then that would still allow things like "x + 2" but force a bit of analysis for "x + y" in cases where "x" and "y" are not the same type and one is wide_int. Thanks, Richard
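[A minimal sketch of the template-based restriction suggested above, on a toy type rather than the real wide_int; the names, and the C++11 deleted overload used for brevity, are assumptions (a C++03 build would need an idiom such as a private undefined overload, and a real implementation would also provide wide/wide overloads):]

struct toy_wide_int
{
  long long val;
  explicit toy_wide_int (long long v) : val (v) {}
};

/* "x + 2" is allowed: plain int has an unambiguous sign.  */
inline toy_wide_int
operator+ (const toy_wide_int &x, int y)
{
  return toy_wide_int (x.val + y);
}

/* Any other arithmetic operand matches this template exactly and is
   rejected, so "x + 0x80000000u" or "x + some_uhwi" fails to compile
   instead of silently zero-extending.  */
template <typename T>
toy_wide_int operator+ (const toy_wide_int &, T) = delete;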
Re: Type inheritance graph analysis & speculative devirtualization, part 4a/6 simple anonymous namespace devirtualization during cgraph construction
> Hi, > > On 08/23/2013 05:12 PM, Jan Hubicka wrote: > >+/* { dg-final { scan-tree-dump-nop "A::foo" 0 "ssa"} } */ > This should be scan-tree-dump-not, right? Yes, thanks! Seems that scan-tree-dump-nop does what the name suggests. I will fix it shortly. Honza > > Paolo.
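[Presumably the corrected directive also drops the count argument, since scan-tree-dump-not takes only a pattern and a dump name: /* { dg-final { scan-tree-dump-not "A::foo" "ssa" } } */]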
Re: [C++ patch] Move FINAL flag to middle-end trees.
> On 08/23/2013 10:51 AM, Jan Hubicka wrote: > >Sadly we ICE here because BINFO of type is not built yet. > >I tried to move the code after xref_binfos and it does seem to lead to errors > >while building libstdc++ PCH. Any idea what to do here? > > If it's causing trouble, let's just put the flag on the type. OK, I mostly need the flag on the type anyway, so it will save some indirection. I will re-test and commit the variant using the default_def flag of the type. In the next step I would like to introduce the DECL_CPP_CONSTRUCTOR/DESTRUCTOR macro. The catch I ran into is that these flags are tested on TEMPLATE_DECL, so the middle-end macro bombs on type checking. I wonder what the best approach is here. I guess I can just disable FUNCTION_DECL checking on this macro in the middle-end, but I do not like it much. Or I can have separate C++ FE macros using the same flag that also allow templates, but then we can get them out of sync. Or I can update 100+ places in the C++ FE to use a different macro on templates, or I can introduce a SET variant that will care about the decl type, or I can just have two flags and make the C++ FE mirror DECL_CONSTRUCTOR_P into the middle-end flag? Any more reasonable options? > > >here I now can devirtualize b->foo into A because A is in anonymous > >namespace. > >We however won't remove b because it is externally visible. Is it valid to > >bring b local based on the fact that A is anonymous and thus no valid C++ program > >can read it? > > Hmm, determine_visibility ought to be doing that already. You are right. I simplified the testcase from code I looked at but probably I overlooked something. Sorry for the noise! Honza
Re: wide-int branch now up for public comment and review
Kenneth Zadeck writes: >>> * Code that does widening conversions. The canonical way that >>>this is performed is to sign or zero extend the input value to >>>the max width based on the sign of the type of the source and >>>then to truncate that value to the target type. This is in >>>preference to using the sign of the target type to extend the >>>value directly (which gets the wrong value for the conversion >>>of large unsigned numbers to larger signed types). > > >> I don't understand this particular reason. Using the sign of the source >> type is obviously right, but why does that mean we need "infinite" precision, >> rather than just doubling the precision of the source? > in a sense, "infinite" does not really mean infinite, it really just > means large enough so that you never lose any information from the > top. For widening, all that "infinite" really needs to be is one > bit more than the destination type. I'm still being clueless, sorry, but why does the intermediate int need to be wider than the destination type? If you're going from an N1-bit value to an N2>N1-bit value, I don't see why you ever need more than N2 bits. Maybe I'm misunderstanding what the comment means by "widening conversions". >> This seems dangerous. Not all code that uses "unsigned HOST_WIDE_INT" >> actually wants it to be an unsigned value. Some code uses it to avoid >> the undefinedness of signed overflow. So these overloads could lead >> to us accidentally zero-extending what's conceptually a signed value >> without any obvious indication that that's happening. Also, hex constants >> are unsigned int, but it doesn't seem safe to assume that 0x8000 was >> meant to be zero-extended. >> >> I realise the same thing can happen if you mix "unsigned int" with >> HOST_WIDE_INT, but the point is that you shouldn't really do that >> in general, whereas we're defining these overloads precisely so that >> a mixture can be used. >> >> I'd prefer some explicit indication of the sign, at least for anything >> other than plain "int" (so that the compiler will complain about uses >> of "unsigned int" and above). > > There is a part of me that finds this scary and a part of me that feels > that the concern is largely theoretical. It does make it much easier > to read and understand the code to be able to write "t + 6" rather than > "wide_int (t) + wide_int::from_uhwi (6)" but of course you lose some > control over how 6 is converted. I replied in more detail to Mike's comment, but the reason I suggested only allowing this for plain "int" (and using templates to forbid "unsigned int" and wider types) was that you could still write "t + 6". You just couldn't write "t + 0x8000" or "t + x", where "x" is a HOST_WIDE_INT or similar. Code can store values in "HOST_WIDE_INT" or "unsigned HOST_WIDE_INT" if we know at GCC compile time that HOST_WIDE_INT is big enough. But code doesn't necessarily store values in a "HOST_WIDE_INT" because we know at GCC compile time that the value is signed, or in an "unsigned HOST_WIDE_INT" because we know at GCC compile time that the value is unsigned. The signedness often depends on the input. The language still forces us to pick one or the other though. I don't feel comfortable with having syntactic sugar that reads too much into the choice. >>>Note that the bits above the precision are not defined and the >>>algorithms used here are careful not to depend on their value. In >>>particular, values that come in from rtx constants may have random >>>bits.
>> I have a feeling I'm rehashing a past debate, sorry, but rtx constants can't >> have random bits. The upper bits must be a sign extension of the value. >> There's exactly one valid rtx for each (value, mode) pair. If you saw >> something different then that sounds like a bug. The rules should already >> be fairly well enforced though, since something like (const_int 128) -- >> or (const_int 256) -- will not match a QImode operand. >> >> This is probably the part of the representation that I disagree most with. >> There seem to be two main ways we could handle the extension to whole HWIs: >> >> (1) leave the stored upper bits undefined and extend them on read >> (2) keep the stored upper bits in extended form > It is not a matter of opening old wounds. I had run some tests on > x86-64 and was never able to assume that the bits above the precision > had always been canonicalized. I will admit that I did not try to run down > the bugs because I thought that, since the rtx constructors did not have > a mode associated with them, nor was one required in the constructors, > this was not an easily solvable problem. So I have no idea if I > hit the one and only bug or was about to start drinking from a fire > hose. Hopefully the first one. :-) Would you mind going back and trying again, so that we at least have some idea what kind of
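[For reference, a standalone sketch of the invariant behind option (2) -- the stored bits above the precision are a sign extension of the value, which is what GCC's sext_hwi computes. This toy version is an assumption-laden illustration: it assumes 0 < prec <= 64 and an arithmetic right shift on the host.]

typedef long long toy_hwi;  /* stand-in for HOST_WIDE_INT */

static inline toy_hwi
toy_sext (toy_hwi src, unsigned int prec)
{
  int shift = (int) (sizeof (toy_hwi) * 8 - prec);
  return shift == 0 ? src : (src << shift) >> shift;
}

/* Under this invariant there is exactly one representation per
   (value, mode) pair: toy_sext (0x80, 8) == -128, so (const_int 128)
   can never be a valid QImode constant -- only (const_int -128).  */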
Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos
On Fri, Aug 23, 2013 at 2:47 AM, John David Anglin wrote: > Ping. > > > On 28-Jul-13, at 12:17 PM, John David Anglin wrote: > >> This patch fixes PR middle-end/56382 on hppa64-hp-hpux11.11. The patch >> prevents moving a complex float by parts if we can't >> create pseudos. On a big endian 64-bit target, we need a pseudo to move a >> complex float and this fails during reload. >> >> OK for trunk? >> I'm trying to understand how the patch would help... The code you're patching is: /* Move floating point as parts. */ if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT +&& can_create_pseudo_p () && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT) try_int = false; /* Is possible if both are registers (or subregs of registers). */ else if (register_operand (x, mode) && register_operand (y, mode)) try_int = true; /* If one of the operands is a memory, and alignment constraints are friendly enough, we may be able to do combined memory operations. We do not attempt this if Y is a constant because that combination is usually better with the by-parts thing below. */ else if ((MEM_P (x) ? !CONSTANT_P (y) : MEM_P (y)) && (!STRICT_ALIGNMENT || get_mode_alignment (mode) == BIGGEST_ALIGNMENT)) try_int = true; else try_int = false; With the new test for can_create_pseudo_p, you're trying to make "try_int" be false. Apparently your failure happens if one of the operands is a MEM? Otherwise the second "else if " test would find x and y be registers and "try_int" still ends up being true. It seems to me that can_create_pseudo_p is not the right test anyway. There may be other targets that can take this path just fine without needing new registers. In the PR audit trail you say: "The problem is SCmode is the same size as DImode on this target, so the subreg can't be extracted by a move." Using can_create_pseudo_p is too big a hammer to solve this problem. The right test would be to see if you end up needing extra registers to perform the move. But emit_move_change_mode already handles that, AFAICT, so can you please try and test if the following patch solves the PR for you? Ciao! Steven Index: expr.c === --- expr.c (revision 201887) +++ expr.c (working copy) @@ -3268,7 +3268,7 @@ emit_move_complex (enum machine_mode mode, rtx x, return get_last_insn (); } - ret = emit_move_via_integer (mode, x, y, true); + ret = emit_move_via_integer (mode, x, y, can_create_pseudo_p ()); if (ret) return ret; }
Re: wide-int branch now up for public comment and review
Richard Sandiford writes: > I wonder how easy it would be to restrict this use of "zero precision" > (i.e. flexible precision) to those where primitive types like "int" are > used as template arguments to operators, and require a precision when > constructing a wide_int. I wouldn't have expected "real" precision 0 > (from zero-width bitfields or whatever) to need any special handling > compared to precision 1 or 2. I tried the last bit -- requiring a precision when constructing a wide_int -- and it seemed surprisingly easy. What do you think of the attached? Most of the forced knock-on changes seem like improvements, but the java part is a bit ugly. I also went with "wide_int (0, prec).cmp" for now, although I'd like to add static cmp, cmps and cmpu alongside leu_p, etc., if that's OK. It would then be possible to write "wide_int::cmp (0, ...)" and avoid the wide_int construction altogether. I wondered whether you might also want to get rid of the build_int_cst* functions, but that still looks a long way off, so I hope using them in these two places doesn't seem too bad. This is just an incremental step. I've also only run it through a subset of the testsuite so far, but full tests are in progress... Thanks, Richard Index: gcc/fold-const.c === --- gcc/fold-const.c2013-08-24 12:11:08.085684013 +0100 +++ gcc/fold-const.c2013-08-24 01:00:00.0 +0100 @@ -8865,15 +8865,16 @@ pointer_may_wrap_p (tree base, tree offs if (bitpos < 0) return true; + int precision = TYPE_PRECISION (TREE_TYPE (base)); if (offset == NULL_TREE) -wi_offset = wide_int::zero (TYPE_PRECISION (TREE_TYPE (base))); +wi_offset = wide_int::zero (precision); else if (TREE_CODE (offset) != INTEGER_CST || TREE_OVERFLOW (offset)) return true; else wi_offset = offset; bool overflow; - wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT); + wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT, precision); total = wi_offset.add (units, UNSIGNED, &overflow); if (overflow) return true; Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c 2013-08-24 12:11:08.085684013 +0100 +++ gcc/gimple-ssa-strength-reduction.c 2013-08-24 01:00:00.0 +0100 @@ -777,7 +777,6 @@ restructure_reference (tree *pbase, tree { tree base = *pbase, offset = *poffset; max_wide_int index = *pindex; - wide_int bpu = BITS_PER_UNIT; tree mult_op0, t1, t2, type; max_wide_int c1, c2, c3, c4; @@ -786,7 +785,7 @@ restructure_reference (tree *pbase, tree || TREE_CODE (base) != MEM_REF || TREE_CODE (offset) != MULT_EXPR || TREE_CODE (TREE_OPERAND (offset, 1)) != INTEGER_CST - || !index.umod_floor (bpu).zero_p ()) + || !index.umod_floor (BITS_PER_UNIT).zero_p ()) return false; t1 = TREE_OPERAND (base, 0); @@ -822,7 +821,7 @@ restructure_reference (tree *pbase, tree c2 = 0; } - c4 = index.udiv_floor (bpu); + c4 = index.udiv_floor (BITS_PER_UNIT); *pbase = t1; *poffset = fold_build2 (MULT_EXPR, sizetype, t2, Index: gcc/java/jcf-parse.c === --- gcc/java/jcf-parse.c2013-08-24 12:11:08.085684013 +0100 +++ gcc/java/jcf-parse.c2013-08-24 01:00:00.0 +0100 @@ -1043,9 +1043,10 @@ get_constant (JCF *jcf, int index) wide_int val; num = JPOOL_UINT (jcf, index); - val = wide_int (num).sforce_to_size (32).lshift_widen (32, 64); + val = wide_int::from_hwi (num, long_type_node) + .sforce_to_size (32).lshift_widen (32, 64); num = JPOOL_UINT (jcf, index + 1); - val |= wide_int (num); + val |= wide_int::from_hwi (num, long_type_node); value = wide_int_to_tree (long_type_node, val); break; Index: gcc/loop-unroll.c === --- gcc/loop-unroll.c
2013-08-24 12:11:08.085684013 +0100 +++ gcc/loop-unroll.c 2013-08-24 01:00:00.0 +0100 @@ -816,8 +816,7 @@ unroll_loop_constant_iterations (struct desc->niter -= exit_mod; loop->nb_iterations_upper_bound -= exit_mod; if (loop->any_estimate - && wide_int (exit_mod).leu_p - (loop->nb_iterations_estimate)) + && wide_int::leu_p (exit_mod, loop->nb_iterations_estimate)) loop->nb_iterations_estimate -= exit_mod; else loop->any_estimate = false; @@ -860,8 +859,7 @@ unroll_loop_constant_iterations (struct desc->niter -= exit_mod + 1; loop->nb_iterations_upper_bound -= exit_mod + 1; if (loop->any_estimate - && wide_int (exit_mod + 1).leu_p - (loop->nb_iterations_estimate)) + && wide_int::leu_p (exit_mod + 1, loop->nb_iterations_estimate
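[A hypothetical shape for the static comparison routines mentioned above, following the leu_p naming convention; the actual branch API may differ. This is just to show how "wide_int::cmp (0, ...)" would avoid constructing a temporary:]

struct toy_wide_int
{
  /* ... representation ... */

  /* Signed three-way compare, callable as toy_wide_int::cmps (0, x)
     without building a toy_wide_int first.  */
  static int cmps (long long a, long long b)
  {
    return a < b ? -1 : (a > b ? 1 : 0);
  }

  /* Unsigned variant.  */
  static int cmpu (unsigned long long a, unsigned long long b)
  {
    return a < b ? -1 : (a > b ? 1 : 0);
  }
};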
Fwd: [RFC, patch] Detect lack of 32-bit devel environment on x86_64-linux targets
ping > Given that I did not receive any feedback on my earlier email on this topic, > I would like to send this patch for RFC. I'm no expert at this > configury stuff, so please try to comment on both the test proposed and my > actual implementation :) > > The idea is to find a patch which both catches probable issues early on for > most x86_64-linux users, yet does not make the build more complex for our power > users. So, I propose to include a specific check in toplevel configure: > > The cumulative conditions I suggest, in order to make it as unobtrusive as > possible for current users, are: > > 1. if we build a native compiler, > 2. on x86_64-linux (and possibly other x86_64 targets whose maintainers want > to opt in), > 3. and neither --enable-multilib nor --disable-multilib were passed > > then: > > a. we check that the native compiler can handle 32-bit, by compiling a test > executable with the "-m32" option > b. if we fail, we error out of the configure process, indicating that this > can be overridden with --{enable,disable}-multilib > > I suspect this might catch (at configure time) the large majority of users > who currently get stuck at stage 2 with the "gnu/stubs-32.h" error, while > being invisible to a large majority of the power users. > > So, what do you think? > > FX > Index: configure.ac === --- configure.ac(revision 201292) +++ configure.ac(working copy) @@ -2861,6 +2861,26 @@ case "${target}" in ;; esac +# Special user-friendly check for native x86_64-linux build, if +# multilib is not explicitly enabled. +case "$target:$have_compiler:$host:$target:$enable_multilib" in + x86_64-*linux*:yes:$build:$build:) +# Make sure we have a development environment that handles 32-bit +dev64=no +echo "int main () { return 0; }" > conftest.c +${CC} -m32 -o conftest ${CFLAGS} ${CPPFLAGS} ${LDFLAGS} conftest.c +if test $? = 0 ; then + if test -s conftest || test -s conftest.exe ; then + dev64=yes + fi +fi +rm -f conftest* +if test x${dev64} != xyes ; then + AC_MSG_ERROR([I suspect your system does not have 32-bit development libraries (libc and headers). If you have them, rerun configure with --enable-multilib. If you do not have them, and want to build a 64-bit-only compiler, rerun configure with --disable-multilib.]) +fi +;; +esac + # Default to --enable-multilib. if test x${enable_multilib} = x ; then target_configargs="--enable-multilib ${target_configargs}"
Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos
On 24-Aug-13, at 6:43 AM, Steven Bosscher wrote: I'm trying to understand how the patch would help... The code you're patching is: /* Move floating point as parts. */ if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT +&& can_create_pseudo_p () && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT) try_int = false; /* Is possible if both are registers (or subregs of registers). */ else if (register_operand (x, mode) && register_operand (y, mode)) try_int = true; /* If one of the operands is a memory, and alignment constraints are friendly enough, we may be able to do combined memory operations. We do not attempt this if Y is a constant because that combination is usually better with the by-parts thing below. */ else if ((MEM_P (x) ? !CONSTANT_P (y) : MEM_P (y)) && (!STRICT_ALIGNMENT || get_mode_alignment (mode) == BIGGEST_ALIGNMENT)) try_int = true; else try_int = false; With the new test for can_create_pseudo_p, you're trying to make "try_int" be false. Apparently your failure happens if one of the operands is a MEM? Otherwise the second "else if " test would find x and y be registers and "try_int" still ends up being true. I was trying to prevent "try_int" from being set to false in the "if" if we can't create a pseudo. If this is done, try_int gets set to true in the second "else if" in the failing testcase. As a result, we don't directly use "emit_move_complex_parts" which currently needs a register on hppa64. It seems to me that can_create_pseudo_p is not the right test anyway. There may be other targets that can take this path just fine without needing new registers. In the PR audit trail you say: "The problem is SCmode is the same size as DImode on this target, so the subreg can't be extracted by a move." Using can_create_pseudo_p is too big a hammer to solve this problem. The right test would be to see if you end up needing extra registers to perform the move. But emit_move_change_mode already handles that, AFAICT, so can you please try and test if the following patch solves the PR for you? I'll give your patch a try. Ciao! Steven Index: expr.c === --- expr.c (revision 201887) +++ expr.c (working copy) @@ -3268,7 +3268,7 @@ emit_move_complex (enum machine_mode mode, rtx x, return get_last_insn (); } - ret = emit_move_via_integer (mode, x, y, true); + ret = emit_move_via_integer (mode, x, y, can_create_pseudo_p ()); if (ret) return ret; } Thanks, Dave -- John David Anglin dave.ang...@bell.net
Re: Type inheritance graph analysis & speculative devirtualization, part 2/6 (type inheritance graph builder)
On Sun, Aug 18, 2013 at 08:19:57PM +0200, Jan Hubicka wrote: > Hi, > this patch implements the type inheritance graph builder. Once the graph is > built it stays in memory and unchanged throughout the compilation (we do not > expect to invent new virtual methods during the optimization) > The graph is dumped into new IPA dump file "type-inheritance". > [...] > * Makefile.in (ipa-devirt.o): New. > (GTFILES): Add ipa-utils.h and ipa-devirt.c > * cgraphunit.c (decide_is_symbol_needed): Do not care about virtuals. > (analyze_functions): Look into possible targets of polymorphic call. > * dumpfile.c (dump_files): Add type-inheritance dump. > * dumpfile.h (TDI_inheritance): New. > * ipa-devirt.c: New file. > * ipa-utils.h (odr_type_d): Forward declare. > (odr_type): New type. > (build_type_inheritance_graph): Declare. > (possible_polymorphic_call_targets): Declare and introduce inline > variant when only edge is passed. > (dump_possible_polymorphic_call_targets): Likewise. > * timevar.def (TV_IPA_INHERITANCE, TV_IPA_VIRTUAL_CALL): New. > * tree.c (type_in_anonymous_namespace_p): Break out from ... > (types_same_for_odr): ... here. > * tree.h (type_in_anonymous_namespace_p): Declare. > > * g++.dg/ipa/type-inheritance-1.C: New testcase. This was a bit tough to review but I do not see any problems. Perhaps we could get rid of the matched_vtables parameters but that is a very minor thing. Thanks, Martin
Re: [PATCH]: Fix PR middle-end/56382 -- Only move MODE_COMPLEX_FLOAT by parts if we can create pseudos
On 24-Aug-13, at 10:37 AM, John David Anglin wrote: On 24-Aug-13, at 6:43 AM, Steven Bosscher wrote: I'm trying to understand how the patch would help... The code you're patching is: /* Move floating point as parts. */ if (GET_MODE_CLASS (mode) == MODE_COMPLEX_FLOAT +&& can_create_pseudo_p () && optab_handler (mov_optab, GET_MODE_INNER (mode)) != CODE_FOR_nothing) try_int = false; /* Not possible if the values are inherently not adjacent. */ else if (GET_CODE (x) == CONCAT || GET_CODE (y) == CONCAT) try_int = false; /* Is possible if both are registers (or subregs of registers). */ else if (register_operand (x, mode) && register_operand (y, mode)) try_int = true; /* If one of the operands is a memory, and alignment constraints are friendly enough, we may be able to do combined memory operations. We do not attempt this if Y is a constant because that combination is usually better with the by-parts thing below. */ else if ((MEM_P (x) ? !CONSTANT_P (y) : MEM_P (y)) && (!STRICT_ALIGNMENT || get_mode_alignment (mode) == BIGGEST_ALIGNMENT)) try_int = true; else try_int = false; With the new test for can_create_pseudo_p, you're trying to make "try_int" be false. Apparently your failure happens if one of the operands is a MEM? Otherwise the second "else if " test would find x and y be registers and "try_int" still ends up being true. I was trying to prevent "try_int" from being set to false in the "if" if we can't create a pseudo. If this is done, try_int gets set to true in the second "else if" in the failing testcase. As a result, we don't directly use "emit_move_complex_parts" which currently needs a register on hppa64. It seems to me that can_create_pseudo_p is not the right test anyway. There may be other targets that can take this path just fine without needing new registers. In the PR audit trail you say: "The problem is SCmode is the same size as DImode on this target, so the subreg can't be extracted by a move." Using can_create_pseudo_p is too big a hammer to solve this problem. The right test would be to see if you end up needing extra registers to perform the move. But emit_move_change_mode already handles that, AFAICT, so can you please try and test if the following patch solves the PR for you? I'll give your patch a try. Ciao! Steven Index: expr.c === --- expr.c (revision 201887) +++ expr.c (working copy) @@ -3268,7 +3268,7 @@ emit_move_complex (enum machine_mode mode, rtx x, return get_last_insn (); } - ret = emit_move_via_integer (mode, x, y, true); + ret = emit_move_via_integer (mode, x, y, can_create_pseudo_p ()); if (ret) return ret; } Actually, I don't think it will work because "try_int" gets set to false and the code isn't used. Dave -- John David Anglin dave.ang...@bell.net
[PATCH] Create pass through and ancestor jump functions even when there is a dynamic type change
Hi, the patch below does not punt when creating pass-through and ancestor jump functions when a possible dynamic type change is detected, but rather clears a flag in those functions. Indirect inlining and IPA-CP then make sure they only propagate when the flag is set. I have also merged one or two pieces of code in IPA-CP that did the same so that I could change behavior at one place only, and made update_jump_functions_after_inlining slightly less ugly by not relying on structure copying so much and constructing all but one jump function type with the ipa_set_jf_* methods. Hopefully that will make the rather complicated function less error-prone as it apparently gets even more complex over time. Bootstrapped and tested on x86_64-linux. OK for trunk? Thanks, Martin 2013-08-23 Martin Jambor * ipa-prop.h (ipa_pass_through_data): New field type_preserved. (ipa_ancestor_jf_data): Likewise. (ipa_get_jf_pass_through_agg_preserved): Fix comment typo. (ipa_get_jf_pass_through_type_preserved): New function. (ipa_get_jf_ancestor_agg_preserved): Fix comment typo. (ipa_get_jf_ancestor_type_preserved): New function. * ipa-cp.c (ipa_get_jf_pass_through_result): Honor type_preserved flag. (ipa_get_jf_ancestor_result): Likewise. (propagate_vals_accross_pass_through): Use ipa_get_jf_pass_through_result to do all the value mappings. * ipa-prop.c (ipa_print_node_jump_functions_for_edge): Dump the type_preserved flag. (ipa_set_jf_cst_copy): New function. (ipa_set_jf_simple_pass_through): Set the type_preserved flag. (ipa_set_jf_arith_pass_through): Likewise. (ipa_set_ancestor_jf): Likewise. (compute_complex_assign_jump_func): Set type_preserved instead of punting. (ipa_compute_jump_functions_for_edge): Likewise. (combine_known_type_and_ancestor_jfs): Honor type_preserved. (update_jump_functions_after_inlining): Update type_preserved. Explicitly create jump functions when combining one with pass_through. (ipa_write_jump_function): Stream the type_preserved flags. (ipa_read_jump_function): Likewise. Index: src/gcc/ipa-cp.c === --- src.orig/gcc/ipa-cp.c 2013-08-22 11:49:25.0 +0200 +++ src/gcc/ipa-cp.c2013-08-23 12:20:04.0 +0200 @@ -745,17 +745,26 @@ initialize_node_lattices (struct cgraph_ /* Return the result of a (possibly arithmetic) pass through jump function JFUNC on the constant value INPUT. Return NULL_TREE if that cannot be - determined or itself is considered an interprocedural invariant. */ + determined or be considered an interprocedural invariant. 
*/ static tree ipa_get_jf_pass_through_result (struct ipa_jump_func *jfunc, tree input) { tree restype, res; + if (TREE_CODE (input) == TREE_BINFO) +{ + if (ipa_get_jf_pass_through_type_preserved (jfunc)) + { + gcc_checking_assert (ipa_get_jf_pass_through_operation (jfunc) + == NOP_EXPR); + return input; + } + return NULL_TREE; +} + if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR) return input; - else if (TREE_CODE (input) == TREE_BINFO) -return NULL_TREE; gcc_checking_assert (is_gimple_ip_invariant (input)); if (TREE_CODE_CLASS (ipa_get_jf_pass_through_operation (jfunc)) @@ -779,9 +788,13 @@ static tree ipa_get_jf_ancestor_result (struct ipa_jump_func *jfunc, tree input) { if (TREE_CODE (input) == TREE_BINFO) -return get_binfo_at_offset (input, - ipa_get_jf_ancestor_offset (jfunc), - ipa_get_jf_ancestor_type (jfunc)); +{ + if (!ipa_get_jf_ancestor_type_preserved (jfunc)) + return NULL; + return get_binfo_at_offset (input, + ipa_get_jf_ancestor_offset (jfunc), + ipa_get_jf_ancestor_type (jfunc)); +} else if (TREE_CODE (input) == ADDR_EXPR) { tree t = TREE_OPERAND (input, 0); @@ -1013,26 +1026,16 @@ propagate_vals_accross_pass_through (str struct ipcp_value *src_val; bool ret = false; - if (ipa_get_jf_pass_through_operation (jfunc) == NOP_EXPR) -for (src_val = src_lat->values; src_val; src_val = src_val->next) - ret |= add_scalar_value_to_lattice (dest_lat, src_val->value, cs, - src_val, src_idx); /* Do not create new values when propagating within an SCC because if there are arithmetic functions with circular dependencies, there is infinite number of them and we would just make lattices bottom. */ - else if (edge_within_scc (cs)) + if ((ipa_get_jf_pass_through_operation (jfunc) != NOP_EXPR) + && edge_within_scc (cs)) ret = set_lattice_contains_variable (dest_lat); else f
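[To illustrate why the flag matters -- an assumed example, not one from the patch: a pass-through jump function says the callee receives the caller's argument unchanged, but BINFO-based devirtualization additionally needs to know that nothing on the path could have changed the object's dynamic type, e.g. a constructor running on the same storage.]

struct A { virtual int f () { return 1; } };
struct B : A { virtual int f () { return 2; } };

static int call_f (A *p) { return p->f (); }

int
wrapper (A *p)
{
  /* The argument of call_f is a simple pass-through of wrapper's
     parameter.  Propagating a known dynamic type through this edge
     is only safe when type_preserved is set, i.e. no statement
     between the function entry and the call (such as a placement
     new) may have replaced the object at *p.  */
  return call_f (p);
}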
Re: wide-int branch now up for public comment and review
On 08/24/2013 08:05 AM, Richard Sandiford wrote: Richard Sandiford writes: I wonder how easy it would be to restrict this use of "zero precision" (i.e. flexible precision) to those where primitive types like "int" are used as template arguments to operators, and require a precision when constructing a wide_int. I wouldn't have expected "real" precision 0 (from zero-width bitfields or whatever) to need any special handling compared to precision 1 or 2. I tried the last bit -- requiring a precision when constructing a wide_int -- and it seemed surprisingly easy. What do you think of the attached? Most of the forced knock-on changes seem like improvements, but the java part is a bit ugly. I also went with "wide_int (0, prec).cmp" for now, although I'd like to add static cmp, cmps and cmpu alongside leu_p, etc., if that's OK. It would then be possible to write "wide_int::cmp (0, ...)" and avoid the wide_int construction altogether. I wondered whether you might also want to get rid of the build_int_cst* functions, but that still looks a long way off, so I hope using them in these two places doesn't seem too bad. This is just an incremental step. I've also only run it through a subset of the testsuite so far, but full tests are in progress... So I am going to make two high-level comments here and then I am going to leave the ultimate decision to the community. (1) I am mildly in favor of leaving prec 0 stuff the way that it is (2) my guess is that richi will also favor this. My justification for (2) is because he had a lot of comments about this before he went on leave and this is substantially the way that it was when he left. Also, remember that one of his biggest dislikes was having to specify precisions. However, this question is really bigger than this branch, which is why I hope others will join in, because this really comes down to how we want the compiler to look when it is fully converted to C++. It has taken me a while to get used to writing and reading code like this where there is a lot of C++ magic going on behind the scenes. And with that magic comes the responsibility of the programmer to get it right. There were/are a lot of people in the gcc community that did not want to go down the C++ pathway for exactly this reason. However, I am being converted. The rest of my comments are small comments about the patch, because some of it should be done no matter how the decision is made. = It is perfectly fine to add the static versions of the cmp functions, and the usage of those functions in this patch looks perfectly reasonable. Thanks, Richard Index: gcc/fold-const.c === --- gcc/fold-const.c2013-08-24 12:11:08.085684013 +0100 +++ gcc/fold-const.c2013-08-24 01:00:00.0 +0100 @@ -8865,15 +8865,16 @@ pointer_may_wrap_p (tree base, tree offs if (bitpos < 0) return true; + int precision = TYPE_PRECISION (TREE_TYPE (base)); if (offset == NULL_TREE) -wi_offset = wide_int::zero (TYPE_PRECISION (TREE_TYPE (base))); +wi_offset = wide_int::zero (precision); else if (TREE_CODE (offset) != INTEGER_CST || TREE_OVERFLOW (offset)) return true; else wi_offset = offset; bool overflow; - wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT); + wide_int units = wide_int::from_shwi (bitpos / BITS_PER_UNIT, precision); total = wi_offset.add (units, UNSIGNED, &overflow); if (overflow) return true; So this is a part of the code that really should have been using addr_wide_int rather than wide_int.
It is doing address arithmetic with bit positions. Because of this, the calculations should have been done with a precision of 3 bits more than what comes out of the type. The addr_wide_int has a fixed precision that is guaranteed to be large enough for any address math on the machine. Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c 2013-08-24 12:11:08.085684013 +0100 +++ gcc/gimple-ssa-strength-reduction.c 2013-08-24 01:00:00.0 +0100 @@ -777,7 +777,6 @@ restructure_reference (tree *pbase, tree { tree base = *pbase, offset = *poffset; max_wide_int index = *pindex; - wide_int bpu = BITS_PER_UNIT; tree mult_op0, t1, t2, type; max_wide_int c1, c2, c3, c4; @@ -786,7 +785,7 @@ restructure_reference (tree *pbase, tree || TREE_CODE (base) != MEM_REF || TREE_CODE (offset) != MULT_EXPR || TREE_CODE (TREE_OPERAND (offset, 1)) != INTEGER_CST - || !index.umod_floor (bpu).zero_p ()) + || !index.umod_floor (BITS_PER_UNIT).zero_p ()) return false; t1 = TREE_OPERAND (base, 0); @@ -822,7 +821,7 @@ restructure_reference (tree *pbase, tree c2 = 0; } - c4 = index.udiv_floor (bpu); + c4 = index.udiv_floor (BITS_PER_UNIT); this
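[In concrete terms, assuming the usual BITS_PER_UNIT of 8: a byte offset that fits in N bits corresponds to a bit position of up to (2^N - 1) * 8 + 7 = 2^(N+3) - 1, so arithmetic on bit positions needs N + 3 bits, which is why addr_wide_int carries extra precision beyond the width of an address.]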
Re: wide-int branch now up for public comment and review
On 08/13/2013 10:57 PM, Kenneth Zadeck wrote: 1) The 4 files that hold the wide-int code itself. You have seen a lot of this code before except for the infinite precision templates. Also the classes are more C++ than C in their flavor. In particular, the integration with trees is very tight in that an int-cst or regular integers can be the operands of any wide-int operation. Are any of these conversions lossy? Maybe some of these constructors should be made explicit? -- Florian Weimer / Red Hat Product Security Team
Re: wide-int branch now up for public comment and review
On 08/24/2013 02:16 PM, Florian Weimer wrote: On 08/13/2013 10:57 PM, Kenneth Zadeck wrote: 1) The 4 files that hold the wide-int code itself. You have seen a lot of this code before except for the infinite precision templates. Also the classes are more C++ than C in their flavor. In particular, the integration with trees is very tight in that an int-cst or regular integers can be the operands of any wide-int operation. Are any of these conversions lossy? Maybe some of these constructors should be made explicit? It depends; there is nothing wrong with lossy conversions as long as you know what you are doing.
Re: [Ping^4] [Patch, AArch64, ILP32] 3/5 Minor change in function.c:assign_parm_find_data_types()
On Thu, Aug 15, 2013 at 11:21 AM, Yufeng Zhang wrote: > Ping^4~ > > I am aware that it is currently holiday season, but it would be really nice > if this tiny patch can get some further comments even if it is not an > approval. > > The original RFA email is here: > http://gcc.gnu.org/ml/gcc-patches/2013-06/msg01485.html From my point of view it is correct, after understanding the ABI better, though I cannot approve it. Thanks, Andrew Pinski > > Regards, > Yufeng > > > On 07/18/13 11:28, Yufeng Zhang wrote: >> >> Ping^3~ >> >> Thanks, >> Yufeng >> >> On 07/08/13 11:11, Yufeng Zhang wrote: >>> >>> Ping^2~ >>> >>> Thanks, >>> Yufeng >>> >>> >>> On 07/02/13 23:44, Yufeng Zhang wrote: Ping~ Can I get an OK please if there is no objection? Regards, Yufeng On 06/26/13 23:39, Yufeng Zhang wrote: > > This patch updates assign_parm_find_data_types to assign passed_mode > and > nominal_mode with the mode of the built pointer type instead of the > hard-coded Pmode in the case of pass-by-reference. This is in line > with > the assignment to passed_mode and nominal_mode in other cases inside > the > function. > > assign_parm_find_data_types generally uses TYPE_MODE to calculate > passed_mode and nominal_mode: > >/* Find mode of arg as it is passed, and mode of arg as it > should be > during execution of this function. */ >passed_mode = TYPE_MODE (passed_type); >nominal_mode = TYPE_MODE (nominal_type); > > this includes the case when the passed argument is a pointer by itself. > > However there is a discrepancy when it deals with argument passed by > invisible reference; it builds the argument's corresponding pointer > type, but sets passed_mode and nominal_mode with Pmode directly. > > This is OK for targets where Pmode == ptr_mode, but on AArch64 with > ILP32 they are different with Pmode as DImode and ptr_mode as SImode. > When such a reference is passed on stack, the reference is prepared by > the caller in the lower 4 bytes of an 8-byte slot but is fetched by the > callee as an 8-byte datum, of which the higher 4 bytes may contain > junk. > It is probably the combination of Pmode != ptr_mode and the > particular > ABI specification that makes the AArch64 ILP32 the first target on which > the issue manifests itself. > > Bootstrapped on x86_64-none-linux-gnu. > > OK for the trunk? > > Thanks, > Yufeng > > > gcc/ > * function.c (assign_parm_find_data_types): Set passed_mode and > nominal_mode to the TYPE_MODE of nominal_type for the built > pointer type in case of the struct-pass-by-reference. >>> >>> >>> >> >> >> > >
Re: wide-int branch now up for public comment and review
fixed with the enclosed patch. On 08/23/2013 11:02 AM, Richard Sandiford wrote: /* Return true if THIS is negative based on the interpretation of SGN. For UNSIGNED, this is always false. This is correct even if precision is 0. */ inline bool wide_int::neg_p (signop sgn) const It seems odd that you have to pass SIGNED here. I assume you were doing it so that the caller is forced to confirm signedness in the cases where a tree type is involved, but: * neg_p kind of implies signedness anyway * you don't require this for minus_one_p, so the interface isn't consistent * at the rtl level signedness isn't a property of the "type" (mode), so it seems strange to add an extra hoop there Index: gcc/ada/gcc-interface/decl.c === --- gcc/ada/gcc-interface/decl.c (revision 201967) +++ gcc/ada/gcc-interface/decl.c (working copy) @@ -7479,7 +7479,7 @@ annotate_value (tree gnu_size) tree op1 = TREE_OPERAND (gnu_size, 1); wide_int signed_op1 = wide_int::from_tree (op1).sforce_to_size (TYPE_PRECISION (sizetype)); - if (signed_op1.neg_p (SIGNED)) + if (signed_op1.neg_p ()) { op1 = wide_int_to_tree (sizetype, -signed_op1); pre_op1 = annotate_value (build1 (NEGATE_EXPR, sizetype, op1)); Index: gcc/c-family/c-ada-spec.c === --- gcc/c-family/c-ada-spec.c (revision 201967) +++ gcc/c-family/c-ada-spec.c (working copy) @@ -2197,7 +2197,7 @@ dump_generic_ada_node (pretty_printer *b { wide_int val = node; int i; - if (val.neg_p (SIGNED)) + if (val.neg_p ()) { pp_minus (buffer); val = -val; Index: gcc/config/sparc/sparc.c === --- gcc/config/sparc/sparc.c (revision 201967) +++ gcc/config/sparc/sparc.c (working copy) @@ -10624,7 +10624,7 @@ sparc_fold_builtin (tree fndecl, int n_a overall_overflow |= overall_overflow; tmp = e0.add (tmp, SIGNED, &overflow); overall_overflow |= overall_overflow; - if (tmp.neg_p (SIGNED)) + if (tmp.neg_p ()) { tmp = tmp.neg (&overflow); overall_overflow |= overall_overflow; Index: gcc/expr.c === --- gcc/expr.c (revision 201967) +++ gcc/expr.c (working copy) @@ -6718,7 +6718,7 @@ get_inner_reference (tree exp, HOST_WIDE if (offset) { /* Avoid returning a negative bitpos as this may wreak havoc later. */ - if (bit_offset.neg_p (SIGNED)) + if (bit_offset.neg_p ()) { addr_wide_int mask = addr_wide_int::mask (BITS_PER_UNIT == 8 Index: gcc/fold-const.c === --- gcc/fold-const.c (revision 201967) +++ gcc/fold-const.c (working copy) @@ -183,13 +183,13 @@ div_if_zero_remainder (const_tree arg1, precision by 1 bit, iff the top bit is set. */ if (sgn == UNSIGNED) { - if (warg1.neg_p (SIGNED)) + if (warg1.neg_p ()) warg1 = warg1.force_to_size (warg1.get_precision () + 1, sgn); sgn = SIGNED; } else { - if (warg2.neg_p (SIGNED)) + if (warg2.neg_p ()) warg2 = warg2.force_to_size (warg2.get_precision () + 1, sgn2); } } @@ -979,7 +979,7 @@ int_const_binop_1 (enum tree_code code, case RSHIFT_EXPR: case LSHIFT_EXPR: - if (arg2.neg_p (SIGNED)) + if (arg2.neg_p ()) { arg2 = -arg2; if (code == RSHIFT_EXPR) @@ -999,7 +999,7 @@ int_const_binop_1 (enum tree_code code, case RROTATE_EXPR: case LROTATE_EXPR: - if (arg2.neg_p (SIGNED)) + if (arg2.neg_p ()) { arg2 = -arg2; if (code == RROTATE_EXPR) @@ -7180,7 +7180,7 @@ fold_plusminus_mult_expr (location_t loc /* As we canonicalize A - 2 to A + -2 get rid of that sign for the purpose of this canonicalization. 
*/ if (TYPE_SIGN (TREE_TYPE (arg1)) == SIGNED - && wide_int (arg1).neg_p (SIGNED) + && wide_int (arg1).neg_p () && negate_expr_p (arg1) && code == PLUS_EXPR) { @@ -12323,7 +12323,7 @@ fold_binary_loc (location_t loc, && TYPE_SIGN (type) == SIGNED && TREE_CODE (arg1) == INTEGER_CST && !TREE_OVERFLOW (arg1) - && wide_int (arg1).neg_p (SIGNED) + && wide_int (arg1).neg_p () && !TYPE_OVERFLOW_TRAPS (type) /* Avoid this transformation if C is INT_MIN, i.e. C == -C. */ && !sign_bit_p (arg1, arg1)) Index: gcc/gimple-ssa-strength-reduction.c === --- gcc/gimple-ssa-strength-reduction.c (revision 201967) +++ gcc/gimple-ssa-strength-reduction.c (working copy) @@ -1824,7 +1824,7 @@ cand_abs_increment (slsr_cand_t c) { max_wide_int increment = cand_increment (c); - if (!address_arithmetic_p && increment.neg_p (SIGNED)) + if (!address_arithmetic_p && increment.neg_p ()) increment = -increment; return increment; @@ -1872,7 +1872,7 @@
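[For reference, a toy version of the semantics being settled on: neg_p () with no argument tests the sign bit at the value's precision, while an explicit UNSIGNED argument always yields false. Standalone sketch, assuming 1 <= prec <= 64; not the branch code.]

static inline bool
toy_neg_p (long long val, unsigned int prec)
{
  /* Bit prec-1 is the sign bit of a prec-bit value.  */
  return (val >> (prec - 1)) & 1;
}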
Re: wide-int branch now up for public comment and review
The patch goes for (1) but (2) seems better to me, for a few reasons: * As above, constants coming from rtl are already in the right form, so if you create a wide_int from an rtx and only query it, no explicit extension is needed. * Things like logical operations and right shifts naturally preserve the sign-extended form, so only a subset of write operations need to take special measures. Right now the internals of wide-int do not keep the bits above the precision clean. As you point out, this could be fixed by changing lshift, add, sub, mul, div (and anything else I have forgotten about) and removing the code that cleans this up on exit. I actually do not really care which way we go here, but if we do go with keeping the bits clean above the precision inside wide-int, we are going to have to clean the bits in the constructors from rtl, or fix some/a lot of bugs. But if you want to go with the stay-clean plan you have to start clean, so at the rtl level this means copying. And it was the not-copying trick that pushed me in the direction we went. At the tree level, this is not an issue. There are no constructors for tree-csts that do not have a type, and before we copy the rep from the wide-int to the tree, we clean the top bits. (BTW, if I had to guess what the bug is with the missing messages on the mips port, it would be because the front ends HAD a bad habit of creating constants that did not fit into a type and then later checking to see if there were any interesting bits above the precision in the int-cst. This now does not work because we clean out those top bits on construction, but it would not surprise me if we missed the fixed-point constant path.) So at the tree level, we could easily go either way here, but there is a cost at the rtl level with doing (2). TBH, I think we should do (2) and fix whatever bugs you saw with invalid rtx constants. Luckily (or perhaps unluckily), if you try the test it fails pretty quickly - building gcclib on x86-64. I have enclosed a patch to check this. You can try it yourself and see if you really think this is the right path. Good luck, I fear you may need it. On the other hand, if it is just a few quick bugs, then I will agree that (2) is the right path. kenny Index: gcc/wide-int.cc === --- gcc/wide-int.cc (revision 201968) +++ gcc/wide-int.cc (working copy) @@ -171,6 +171,10 @@ wide_int_ro::from_rtx (const rtx_mode_t case CONST_INT: result.val[0] = INTVAL (x); result.len = 1; + + if (prec != HOST_BITS_PER_WIDE_INT) + gcc_assert (result.val[0] == sext_hwi (result.val[0], prec)); + #ifdef DEBUG_WIDE_INT debug_wh ("wide_int:: %s = from_rtx ("HOST_WIDE_INT_PRINT_HEX")\n", result, INTVAL (x));
Re: [C++ patch] Move FINAL flag to middle-end trees.
On 08/24/2013 05:18 AM, Jan Hubicka wrote: In the next step I would like to introduce the DECL_CPP_CONSTRUCTOR/DESTRUCTOR macro. The catch I run into is that these flags are tested on TEMPLATE_DECL so the middle-end macro bombs on type checking. I wonder what is best approach here. I think fix the front end to use STRIP_TEMPLATE to make sure we're checking/setting the flag on a FUNCTION_DECL. Jason
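[Concretely, the pattern Jason suggests would look something like the following. This is a hypothetical sketch: DECL_CPP_CONSTRUCTOR_P and DECL_CONSTRUCTOR_FLAG are placeholder names; STRIP_TEMPLATE is the existing C++ FE macro that maps a TEMPLATE_DECL to its underlying decl.]

/* Hypothetical: test the middle-end flag on the underlying
   FUNCTION_DECL, so callers may pass a TEMPLATE_DECL directly.  */
#define DECL_CPP_CONSTRUCTOR_P(NODE) \
  DECL_CONSTRUCTOR_FLAG (STRIP_TEMPLATE (NODE))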
Add overload for register_pass
Hi, I've been working on a SH specific RTL pass and just adapted it to the new pass handling. One thing that bugged me was pass registration. How about adding an overload for 'register_pass' as in the attached patch? Registering a pass is then as simple as: register_pass (make_new_ifcvt_sh (g, false, "ifcvt1_sh"), PASS_POS_INSERT_AFTER, "ce1", 1); Tested with make all-gcc. Cheers, Oleg gcc/ChangeLog: * passes.c (register_pass): Add overload. * tree-pass.h (register_pass): Forward declare it. Add comment. Index: gcc/tree-pass.h === --- gcc/tree-pass.h (revision 201967) +++ gcc/tree-pass.h (working copy) @@ -91,7 +91,8 @@ virtual opt_pass *clone (); /* If has_gate is set, this pass and all sub-passes are executed only if - the function returns true. */ + the function returns true. + The default implementation returns true. */ virtual bool gate (); /* This is the code to run. If has_execute is false, then there should @@ -330,6 +331,14 @@ enum pass_positioning_ops pos_op; /* how to insert the new pass. */ }; +/* Registers a new pass. Either fill out the register_pass_info or specify + the individual parameters. The pass object is expected to have been + allocated using operator new and the pass manager takes the ownership of + the pass object. */ +extern void register_pass (register_pass_info *); +extern void register_pass (opt_pass* pass, pass_positioning_ops pos, + const char* ref_pass_name, int ref_pass_inst_number); + extern gimple_opt_pass *make_pass_mudflap_1 (gcc::context *ctxt); extern gimple_opt_pass *make_pass_mudflap_2 (gcc::context *ctxt); extern gimple_opt_pass *make_pass_asan (gcc::context *ctxt); @@ -594,7 +603,6 @@ extern void ipa_read_optimization_summaries (void); extern void register_one_dump_file (struct opt_pass *); extern bool function_called_by_processed_nodes_p (void); -extern void register_pass (struct register_pass_info *); /* Set to true if the pass is called the first time during compilation of the current function. Note that using this information in the optimization Index: gcc/passes.c === --- gcc/passes.c (revision 201967) +++ gcc/passes.c (working copy) @@ -1365,7 +1365,19 @@ register_pass (struct register_pass_info *pass_info) { g->get_passes ()->register_pass (pass_info); +} +void +register_pass (opt_pass* pass, pass_positioning_ops pos, + const char* ref_pass_name, int ref_pass_inst_number) +{ + register_pass_info i; + i.pass = pass; + i.reference_pass_name = ref_pass_name; + i.ref_pass_instance_number = ref_pass_inst_number; + i.pos_op = pos; + + g->get_passes ()->register_pass (&i); } void
RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
On 23 August 2013 16:05:32 Zoran Jovanovic wrote: Hello, This is the new patch version. The optimization does not use BIT_FIELD_REF any more; instead it generates a new COMPONENT_REF and FIELD_DECL. The existing bit-field representative is associated with the newly created field declaration. During the analysis phase the optimization uses bit-field representatives when deciding which bit-field accesses can be merged. Instead of being a separate pass, the optimization is moved to tree-sra.c and executed with early SRA. A new test case involving unions is added. Also, some other comments received on the first patch are applied in the new implementation. Example: Original code: D.1351; D.1350; D.1349; D.1349_2 = p1_1(D)->f1; p2_3(D)->f1 = D.1349_2; D.1350_4 = p1_1(D)->f2; p2_3(D)->f2 = D.1350_4; D.1351_5 = p1_1(D)->f3; p2_3(D)->f3 = D.1351_5; Optimized code: D.1358; _16 = pr1_2(D)->_field0; pr2_4(D)->_field0 = _16; The algorithm works at the basic-block level and consists of the following 3 major steps: 1. Go through the basic block's statement list. If there are statement pairs that implement a copy of bit-field content from one memory location to another, record statement pointers and other necessary data in the corresponding data structure. 2. Identify records that represent adjacent bit-field accesses and mark them as merged. 3. Modify trees accordingly. A new command-line option "-ftree-bitfield-merge" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2013-08-22 Zoran Jovanovic (zoran.jovano...@imgtec.com) * Makefile.in : Added tree-sra.c to GTFILES. * common.opt (ftree-bitfield-merge): New option. * doc/invoke.texi: Added reference to "-ftree-bitfield-merge". * tree-sra.c (ssa_bitfield_merge): New function. Entry for (-ftree-bitfield-merge). (bitfield_stmt_access_pair_htab_hash): New function. (bitfield_stmt_access_pair_htab_eq): New function. (cmp_access): New function. (create_and_insert_access): New function. (get_bit_offset): New function. (get_merged_bit_field_size): New function. (add_stmt_access_pair): New function. * dwarf2out.c (simple_type_size_in_bits): moved to tree.c. (field_byte_offset): declaration moved to tree.h, static removed. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h. * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c. (simple_type_size_in_bits): moved from dwarf2out.c. * tree.h (expressions_equal_p): declaration added. (field_byte_offset): declaration added. 
Patch - diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6034046..dad9337 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3831,6 +3831,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/vtable-verify.c \ $(srcdir)/asan.c \ $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \ + $(srcdir)/tree-sra.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 9082280..fe0ecd9 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2160,6 +2160,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-bitfield-merge +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Enable bit-field merge on trees + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index dae7605..7abe538 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7646,6 +7646,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -ftree-bitfield-merge +@opindex ftree-bitfield-merge +Combines several adjacent bit-field accesses that copy values +from one memory location to another into single bit-field access. "into one single bit-field access" would be easier to understand, IMHO. Same for the other occurrences in this patch. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation (CCP) on trees. This diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index fc1c3f2..f35
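[For reference, the kind of source the pass targets -- an assumed reconstruction matching the GIMPLE shown earlier in the thread; the field widths are illustrative:]

struct S
{
  unsigned f1 : 3;
  unsigned f2 : 5;
  unsigned f3 : 8;
};

void
copy_fields (struct S *p1, struct S *p2)
{
  /* Three adjacent bit-field copies; with -ftree-bitfield-merge they
     collapse into a single copy of the shared representative.  */
  p2->f1 = p1->f1;
  p2->f2 = p1->f2;
  p2->f3 = p1->f3;
}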
Clean up pretty printers [13/n]
Most of the specialized pretty printing functions from the C-family
languages are really virtual functions.  This patchlet makes the first
explicitly so.

Tested on an x86_64-suse-linux.  Applied to trunk.

-- Gaby

c-family/
2013-08-24  Gabriel Dos Reis

	* c-pretty-print.h (c_pretty_printer::constant): Now a virtual
	member function.
	(pp_constant): Adjust.
	(pp_c_constant): Remove.
	* c-pretty-print.c (c_pretty_printer::constant): Rename from
	pp_c_constant.  Adjust.
	(pp_c_constant) (pp_c_primary_expression): Call pp_constant in
	lieu of pp_c_constant.
	(c_pretty_printer::c_pretty_printer): Remove assignment to
	constant.

cp/
	* cxx-pretty-print.h (cxx_pretty_printer::constant): Now a member
	function, overriding c_pretty_printer::constant.
	* cxx-pretty-print.c (cxx_pretty_printer::constant): Rename from
	pp_cxx_constant.  Adjust.
	(cxx_pretty_printer::cxx_pretty_printer): Do not assign to
	constant.

Index: c-family/c-pretty-print.c
===================================================================
--- c-family/c-pretty-print.c	(revision 201968)
+++ c-family/c-pretty-print.c	(working copy)
@@ -1130,7 +1130,7 @@
      character-constant  */
 
 void
-pp_c_constant (c_pretty_printer *pp, tree e)
+c_pretty_printer::constant (tree e)
 {
   const enum tree_code code = TREE_CODE (e);
 
@@ -1140,38 +1140,38 @@
       {
 	tree type = TREE_TYPE (e);
 	if (type == boolean_type_node)
-	  pp_c_bool_constant (pp, e);
+	  pp_c_bool_constant (this, e);
 	else if (type == char_type_node)
-	  pp_c_character_constant (pp, e);
+	  pp_c_character_constant (this, e);
 	else if (TREE_CODE (type) == ENUMERAL_TYPE
-		 && pp_c_enumeration_constant (pp, e))
+		 && pp_c_enumeration_constant (this, e))
 	  ;
 	else
-	  pp_c_integer_constant (pp, e);
+	  pp_c_integer_constant (this, e);
       }
       break;
 
     case REAL_CST:
-      pp_c_floating_constant (pp, e);
+      pp_c_floating_constant (this, e);
       break;
 
     case FIXED_CST:
-      pp_c_fixed_constant (pp, e);
+      pp_c_fixed_constant (this, e);
      break;
 
    case STRING_CST:
-      pp_c_string_literal (pp, e);
+      pp_c_string_literal (this, e);
      break;
 
    case COMPLEX_CST:
      /* Sometimes, we are confused and we think a complex literal
	 is a constant.  Such thing is a compound literal which
	 grammatically belongs to postfix-expr production.  */
-      pp_c_compound_literal (pp, e);
+      pp_c_compound_literal (this, e);
      break;
 
    default:
-      pp_unsupported_tree (pp, e);
+      pp_unsupported_tree (this, e);
      break;
    }
 }
@@ -1236,7 +1236,7 @@
    case REAL_CST:
    case FIXED_CST:
    case STRING_CST:
-      pp_c_constant (pp, e);
+      pp_constant (pp, e);
      break;
 
    case TARGET_EXPR:
@@ -1357,7 +1357,7 @@
	{
	  pp_c_left_bracket (pp);
	  if (TREE_PURPOSE (init))
-	    pp_c_constant (pp, TREE_PURPOSE (init));
+	    pp_constant (pp, TREE_PURPOSE (init));
	  pp_c_right_bracket (pp);
	}
      pp_c_whitespace (pp);
@@ -2339,7 +2339,6 @@
  statement                 = pp_c_statement;
 
-  constant                  = pp_c_constant;
  id_expression             = pp_c_id_expression;
  primary_expression        = pp_c_primary_expression;
  postfix_expression        = pp_c_postfix_expression;
Index: c-family/c-pretty-print.h
===================================================================
--- c-family/c-pretty-print.h	(revision 201968)
+++ c-family/c-pretty-print.h	(working copy)
@@ -51,6 +51,7 @@
 {
   c_pretty_printer ();
 
+  virtual void constant (tree);
   /* Points to the first element of an array of offset-list.
      Not used yet.  */
   int *offset_list;
@@ -76,7 +77,6 @@
   c_pretty_print_fn statement;
 
-  c_pretty_print_fn constant;
   c_pretty_print_fn id_expression;
   c_pretty_print_fn primary_expression;
   c_pretty_print_fn postfix_expression;
@@ -109,7 +109,7 @@
 #define pp_statement(PP, S)             (PP)->statement (PP, S)
 
-#define pp_constant(PP, E)              (PP)->constant (PP, E)
+#define pp_constant(PP, E)              (PP)->constant (E)
 #define pp_id_expression(PP, E)         (PP)->id_expression (PP, E)
 #define pp_primary_expression(PP, E)    (PP)->primary_expression (PP, E)
 #define pp_postfix_expression(PP, E)    (PP)->postfix_expression (PP, E)
@@ -169,7 +169,6 @@
 void pp_c_postfix_expression (c_pretty_printer *, tree);
 void pp_c_primary_expression (c_pretty_printer *, tree);
 void pp_c_init_declarator (c_pretty_printer *, tree);
-void pp_c_constant (c_pretty_printer *, tree);
 void pp_c_id_expression (c_pretty_printer *, tree);
 void pp_c_ws_string (c_pretty_printer *, const char *);
 void pp_c_identifier (c_pretty_printer *, const c
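The mechanical pattern behind this series may be easier to see in
isolation.  Here is a minimal self-contained sketch -- the class and
function names are invented, not GCC's actual ones -- of turning a
function-pointer member into a virtual member function while a
forwarding macro keeps old call sites compiling:

  #include <iostream>

  struct base_printer
  {
    virtual ~base_printer () {}
    // Formerly a function-pointer member assigned in the constructor;
    // now an ordinary virtual function that derived printers override.
    virtual void constant (int e) { std::cout << "base:" << e << '\n'; }
  };

  struct cxx_printer : base_printer
  {
    // Overrides instead of reassigning a pointer in its constructor.
    virtual void constant (int e) { std::cout << "c++:" << e << '\n'; }
  };

  // The macro keeps its old two-argument shape, so existing calls
  // pp_constant (pp, e) compile unchanged; only the expansion differs.
  #define pp_constant(PP, E) (PP)->constant (E)

  int
  main ()
  {
    cxx_printer pp;
    base_printer *p = &pp;
    pp_constant (p, 42);   // virtual dispatch: prints "c++:42"
    return 0;
  }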
Clean up pretty printers [14/n]
Same as previous topic; for id_expression.

-- Gaby

2013-08-24  Gabriel Dos Reis

c-family/
	* c-pretty-print.h (c_pretty_printer::id_expression): Now a
	virtual function.
	(pp_c_id_expression): Remove.
	(pp_id_expression): Adjust.
	* c-pretty-print.c (c_pretty_printer::id_expression): Rename from
	pp_c_id_expression.  Adjust.
	(pp_c_postfix_expression): Use pp_id_expression.
	(c_pretty_printer::c_pretty_printer): Do not assign to
	id_expression.

cp/
	* cxx-pretty-print.h (cxx_pretty_printer::id_expression): Declare.
	* cxx-pretty-print.c (cxx_pretty_printer::id_expression): Rename
	from pp_cxx_id_expression.  Adjust.
	(pp_cxx_userdef_literal): Use pp_id_expression.
	(pp_cxx_primary_expression): Likewise.
	(pp_cxx_direct_declarator): Likewise.
	(cxx_pretty_printer::cxx_pretty_printer): Do not assign to
	id_expression.

Index: c-family/c-pretty-print.c
===================================================================
--- c-family/c-pretty-print.c	(revision 201969)
+++ c-family/c-pretty-print.c	(working copy)
@@ -1422,7 +1422,7 @@
      identifier  */
 
 void
-pp_c_id_expression (c_pretty_printer *pp, tree t)
+c_pretty_printer::id_expression (tree t)
 {
   switch (TREE_CODE (t))
     {
@@ -1433,15 +1433,15 @@
     case FUNCTION_DECL:
     case FIELD_DECL:
     case LABEL_DECL:
-      pp_c_tree_decl_identifier (pp, t);
+      pp_c_tree_decl_identifier (this, t);
       break;
 
     case IDENTIFIER_NODE:
-      pp_c_tree_identifier (pp, t);
+      pp_c_tree_identifier (this, t);
       break;
 
     default:
-      pp_unsupported_tree (pp, t);
+      pp_unsupported_tree (this, t);
       break;
     }
 }
@@ -1645,7 +1645,7 @@
     case ADDR_EXPR:
       if (TREE_CODE (TREE_OPERAND (e, 0)) == FUNCTION_DECL)
 	{
-	  pp_c_id_expression (pp, TREE_OPERAND (e, 0));
+	  pp_id_expression (pp, TREE_OPERAND (e, 0));
 	  break;
 	}
       /* else fall through.  */
@@ -2339,7 +2339,6 @@
   statement                 = pp_c_statement;
 
-  id_expression             = pp_c_id_expression;
   primary_expression        = pp_c_primary_expression;
   postfix_expression        = pp_c_postfix_expression;
   unary_expression          = pp_c_unary_expression;
Index: c-family/c-pretty-print.h
===================================================================
--- c-family/c-pretty-print.h	(revision 201969)
+++ c-family/c-pretty-print.h	(working copy)
@@ -52,6 +52,7 @@
   c_pretty_printer ();
 
   virtual void constant (tree);
+  virtual void id_expression (tree);
   /* Points to the first element of an array of offset-list.
      Not used yet.  */
   int *offset_list;
@@ -77,7 +78,6 @@
   c_pretty_print_fn statement;
 
-  c_pretty_print_fn id_expression;
   c_pretty_print_fn primary_expression;
   c_pretty_print_fn postfix_expression;
   c_pretty_print_fn unary_expression;
@@ -110,7 +110,7 @@
 #define pp_statement(PP, S)             (PP)->statement (PP, S)
 #define pp_constant(PP, E)              (PP)->constant (E)
-#define pp_id_expression(PP, E)         (PP)->id_expression (PP, E)
+#define pp_id_expression(PP, E)         (PP)->id_expression (E)
 #define pp_primary_expression(PP, E)    (PP)->primary_expression (PP, E)
 #define pp_postfix_expression(PP, E)    (PP)->postfix_expression (PP, E)
 #define pp_unary_expression(PP, E)      (PP)->unary_expression (PP, E)
@@ -169,7 +169,6 @@
 void pp_c_postfix_expression (c_pretty_printer *, tree);
 void pp_c_primary_expression (c_pretty_printer *, tree);
 void pp_c_init_declarator (c_pretty_printer *, tree);
-void pp_c_id_expression (c_pretty_printer *, tree);
 void pp_c_ws_string (c_pretty_printer *, const char *);
 void pp_c_identifier (c_pretty_printer *, const char *);
 void pp_c_string_literal (c_pretty_printer *, tree);
Index: cp/cxx-pretty-print.c
===================================================================
--- cp/cxx-pretty-print.c	(revision 201969)
+++ cp/cxx-pretty-print.c	(working copy)
@@ -355,15 +355,15 @@
      unqualified-id
      qualified-id  */
 
-static inline void
-pp_cxx_id_expression (cxx_pretty_printer *pp, tree t)
+void
+cxx_pretty_printer::id_expression (tree t)
 {
   if (TREE_CODE (t) == OVERLOAD)
     t = OVL_CURRENT (t);
   if (DECL_P (t) && DECL_CONTEXT (t))
-    pp_cxx_qualified_id (pp, t);
+    pp_cxx_qualified_id (this, t);
   else
-    pp_cxx_unqualified_id (pp, t);
+    pp_cxx_unqualified_id (this, t);
 }
 
 /* user-defined literal:
@@ -373,7 +373,7 @@
 pp_cxx_userdef_literal (cxx_pretty_printer *pp, tree t)
 {
   pp_constant (pp, USERDEF_LITERAL_VALUE (t));
-  pp_cxx_id_expression (pp, USERDEF_LITERAL_SUFFIX_ID (t));
+  pp_id_expression (pp, USERDEF_LITERAL_SUFFIX_ID (t));
 }
 
@@ -436,7 +436,7 @@
     case OVERLOAD:
     case CONST_DECL:
     case TEMPLATE_DECL:
-      pp_cxx_id_expression (pp, t);
+      pp_id_expression (pp, t);
       break;
 
     case RESULT_DECL:
@@
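The payoff of virtualizing id_expression, sketched with invented names
(this is not GCC's real class hierarchy): shared C-family printing
routines now reach the C++ override automatically, which is why the
file-static pp_cxx_id_expression could become a plain override.

  #include <iostream>

  struct c_printer
  {
    virtual ~c_printer () {}
    virtual void id_expression (const char *t)
    { std::cout << "C id: " << t << '\n'; }

    // A routine shared by the C-family front ends; it used to be
    // hard-wired to the C helper, but now dispatches virtually.
    void postfix_expression (const char *t) { id_expression (t); }
  };

  struct cxx_printer : c_printer
  {
    // Stands in for the former static pp_cxx_id_expression; as an
    // override it can print qualified-ids C knows nothing about.
    virtual void id_expression (const char *t)
    { std::cout << "C++ qualified-id: ns::" << t << '\n'; }
  };

  int
  main ()
  {
    cxx_printer pp;
    pp.postfix_expression ("foo");  // prints the C++ qualified form
    return 0;
  }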
Clean up pretty printers [15/n]
Instead of defining the same macro several times in different
translation units, we can just make it a function and use it where
needed.

Tested on an x86_64-suse-linux.  Applied to mainline.

-- Gaby

2013-08-25  Gabriel Dos Reis

c-family/
	* c-pretty-print.h (c_pretty_printer::translate_string): Declare.
	* c-pretty-print.c (M_): Remove.
	(c_pretty_printer::translate_string): Define.
	(pp_c_type_specifier): Use it.
	(pp_c_primary_expression): Likewise.
	(pp_c_expression): Likewise.

cp/
	* cxx-pretty-print.c (M_): Remove.
	(pp_cxx_unqualified_id): Use translate_string instead of M_.
	(pp_cxx_canonical_template_parameter): Likewise.

Index: c-family/c-pretty-print.c
===================================================================
--- c-family/c-pretty-print.c	(revision 201973)
+++ c-family/c-pretty-print.c	(working copy)
@@ -29,10 +29,6 @@
 #include "tree-iterator.h"
 #include "diagnostic.h"
 
-/* Translate if being used for diagnostics, but not for dump files or
-   __PRETTY_FUNCTION.  */
-#define M_(msgid) (pp_translate_identifiers (pp) ? _(msgid) : (msgid))
-
 /* The pretty-printer code is primarily designed to closely follow
    (GNU) C and C++ grammars.  That is to be contrasted with spaghetti
    codes we used to have in the past.  Following a structured
@@ -341,7 +337,7 @@
   switch (code)
     {
     case ERROR_MARK:
-      pp_c_ws_string (pp, M_("<type-error>"));
+      pp->translate_string ("<type-error>");
       break;
 
     case IDENTIFIER_NODE:
@@ -379,15 +375,15 @@
       switch (code)
	{
	case INTEGER_TYPE:
-	  pp_string (pp, (TYPE_UNSIGNED (t)
-			  ? M_("<unnamed-unsigned:")
-			  : M_("<unnamed-signed:")));
+	  pp->translate_string (TYPE_UNSIGNED (t)
+				? "<unnamed-unsigned:"
+				: "<unnamed-signed:");
	  break;
	case REAL_TYPE:
-	  pp_string (pp, M_("<unnamed-float:"));
+	  pp->translate_string ("<unnamed-float:");
	  break;
	case FIXED_POINT_TYPE:
-	  pp_string (pp, M_("<unnamed-fixed:"));
+	  pp->translate_string ("<unnamed-fixed:");
	  break;
 
     case UNION_TYPE:
@@ -415,12 +411,12 @@
       else if (code == ENUMERAL_TYPE)
	pp_c_ws_string (pp, "enum");
       else
-	pp_c_ws_string (pp, M_("<tag-error>"));
+	pp->translate_string ("<tag-error>");
 
       if (TYPE_NAME (t))
	pp_id_expression (pp, TYPE_NAME (t));
       else
-	pp_c_ws_string (pp, M_("<anonymous>"));
+	pp->translate_string ("<anonymous>");
       break;
 
     default:
@@ -1187,6 +1183,15 @@
   pp->padding = pp_before;
 }
 
+void
+c_pretty_printer::translate_string (const char *gmsgid)
+{
+  if (pp_translate_identifiers (this))
+    pp_c_ws_string (this, _(gmsgid));
+  else
+    pp_c_ws_string (this, gmsgid);
+}
+
 /* Pretty-print an IDENTIFIER_NODE, which may contain UTF-8 sequences
    that need converting to the locale encoding, preceded by whitespace
    is necessary.  */
@@ -1225,11 +1230,11 @@
       break;
 
     case ERROR_MARK:
-      pp_c_ws_string (pp, M_("<erroneous-expression>"));
+      pp->translate_string ("<erroneous-expression>");
       break;
 
     case RESULT_DECL:
-      pp_c_ws_string (pp, M_("<return-value>"));
+      pp->translate_string ("<return-value>");
       break;
 
     case INTEGER_CST:
@@ -2155,7 +2160,7 @@
	  && !DECL_ARTIFICIAL (SSA_NAME_VAR (e)))
	pp_c_expression (pp, SSA_NAME_VAR (e));
      else
-	pp_c_ws_string (pp, M_("<unknown>"));
+	pp->translate_string ("<unknown>");
      break;
 
    case POSTINCREMENT_EXPR:
Index: c-family/c-pretty-print.h
===================================================================
--- c-family/c-pretty-print.h	(revision 201973)
+++ c-family/c-pretty-print.h	(working copy)
@@ -51,6 +51,9 @@
 {
   c_pretty_printer ();
 
+  // Format string, possibly translated.
+  void translate_string (const char *);
+
   virtual void constant (tree);
   virtual void id_expression (tree);
   /* Points to the first element of an array of offset-list.
Index: cp/cxx-pretty-print.c
===================================================================
--- cp/cxx-pretty-print.c	(revision 201973)
+++ cp/cxx-pretty-print.c	(working copy)
@@ -27,10 +27,6 @@
 #include "cxx-pretty-print.h"
 #include "tree-pretty-print.h"
 
-/* Translate if being used for diagnostics, but not for dump files or
-   __PRETTY_FUNCTION.  */
-#define M_(msgid) (pp_translate_identifiers (pp) ? _(msgid) : (msgid))
-
 static void pp_cxx_unqualified_id (cxx_pretty_printer *, tree);
 static void pp_cxx_nested_name_specifier (cxx_pretty_printer *, tree);
 static void pp_cxx_qualified_id (cxx_pretty_printer *, tree);
@@ -149,7 +145,7 @@
   switch (code)
     {
     case RESULT_DECL:
-      pp_cxx_ws_string (pp, M_("<return-value>"));
+      pp->translate_string ("<return-value>");
       break;
 
     case OVERLOAD:
@@ -168,7 +164,7 @@
     case IDENTIFIER_NODE:
       if (t == NULL)
-	pp_cxx_ws_string (pp, M_("<unnamed>"));
+	pp->translate_string ("<unnamed>");
       else if (IDENTIFIER_TYPENAME_P (t))
	pp_cxx_conversion_function_id (pp, t);
       else
@@ -2154,7 +2150,7 @@
   parm = TEMPLATE_TYPE_PARM_INDEX (parm);
 
   pp_cxx_begin_template_argument_list (pp);
-  pp_cxx_ws_string (pp, M_("template-parameter-"));
+  pp->translate_string ("template-parameter-");
   pp_wi
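The idea behind translate_string, restated as a self-contained sketch
(the catalog, gettext_stub and printer types are invented stand-ins for
gettext's _() and GCC's pretty-printer; only the conditional-translation
shape matches the patch):

  #include <iostream>
  #include <map>
  #include <string>

  static std::map<std::string, std::string> catalog;

  // Stand-in for gettext's _(): look the key up, else pass it through.
  static const char *
  gettext_stub (const char *msgid)
  {
    std::map<std::string, std::string>::const_iterator it
      = catalog.find (msgid);
    return it != catalog.end () ? it->second.c_str () : msgid;
  }

  struct printer
  {
    // False when printing dump files or __PRETTY_FUNCTION__, where
    // output must stay byte-for-byte stable across locales.
    bool translate_identifiers;

    void translate_string (const char *gmsgid)
    {
      std::cout << (translate_identifiers ? gettext_stub (gmsgid)
					  : gmsgid)
		<< '\n';
    }
  };

  int
  main ()
  {
    catalog["<return-value>"] = "<valeur-de-retour>";
    printer diag = { true };   // diagnostics: translated
    printer dump = { false };  // dump files: verbatim
    diag.translate_string ("<return-value>");
    dump.translate_string ("<return-value>");
    return 0;
  }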
Re: wide-int branch now up for public comment and review
Kenneth Zadeck writes:
> On 08/24/2013 08:05 AM, Richard Sandiford wrote:
>> Richard Sandiford writes:
>>> I wonder how easy it would be to restrict this use of "zero precision"
>>> (i.e. flexible precision) to those where primitive types like "int" are
>>> used as template arguments to operators, and require a precision when
>>> constructing a wide_int.  I wouldn't have expected "real" precision 0
>>> (from zero-width bitfields or whatever) to need any special handling
>>> compared to precision 1 or 2.
>>
>> I tried the last bit -- requiring a precision when constructing a
>> wide_int -- and it seemed surprisingly easy.  What do you think of
>> the attached?  Most of the forced knock-on changes seem like
>> improvements, but the java part is a bit ugly.  I also went with
>> "wide_int (0, prec).cmp" for now, although I'd like to add static cmp,
>> cmps and cmpu alongside leu_p, etc., if that's OK.  It would then be
>> possible to write "wide_int::cmp (0, ...)" and avoid the wide_int
>> construction altogether.
>>
>> I wondered whether you might also want to get rid of the build_int_cst*
>> functions, but that still looks a long way off, so I hope using them in
>> these two places doesn't seem too bad.
>>
>> This is just an incremental step.  I've also only run it through a
>> subset of the testsuite so far, but full tests are in progress...
>
> So i am going to make two high level comments here and then i am going
> to leave the ultimate decision to the community.  (1) I am mildly in
> favor of leaving prec 0 stuff the way that it is (2) my guess is that
> richi also will favor this.  My justification for (2) is because he had
> a lot of comments about this before he went on leave and this is
> substantially the way that it was when he left.  Also, remember that one
> of his biggest dislikes was having to specify precisions.

Hmm, but you seem to be talking about zero precision in general.
(I'm going to call it "flexible precision" to avoid confusion with the
zero-width bitfield stuff.)  Whereas this patch is specifically about
constructing flexible-precision _wide_int_ objects.  I think wide_int
objects should always have a known, fixed precision.

Note that fixed_wide_ints can still use primitive types in the same way
as before, since there the precision is inherent to the fixed_wide_int.
The templated operators also work in the same way as before.  Only the
construction of wide_int proper is affected.

As it stands you have various wide_int operators that cannot handle two
flexible-precision inputs.  This means that innocent-looking code like:

  extern wide_int foo (wide_int);
  wide_int bar () { return foo (0); }

ICEs when combined with equally innocent-looking code like:

  wide_int foo (wide_int x) { return x + 1; }

So in practice you have to know, when calling a function, whether any
paths in that function will try applying an operator with a primitive
type.  If so, you need to specify a precision when constructing the
wide_int argument.  If not, you can leave it out.  That seems really
unclean.

The point of this template stuff is to avoid constructing wide_int
objects from primitive integers wherever possible.  And I think the
fairly small size of the patch shows that you've succeeded in doing
that.  But I think we really should specify a precision in the handful
of cases where a wide_int does still need to be constructed directly
from a primitive type.

Thanks,
Richard
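To make the failure mode concrete, here is a toy model of the situation
(this is emphatically not GCC's wide_int; the class, its fields and the
assert are invented for illustration).  A bare integer constructs a
"flexible" precision-0 value, and a binary operator given two flexible
operands has no precision to extend them to:

  #include <cassert>

  struct toy_wide_int
  {
    long val;
    unsigned precision;   // 0 = flexible, i.e. taken from context

    toy_wide_int (long v) : val (v), precision (0) {}
    toy_wide_int (long v, unsigned p) : val (v), precision (p) {}
  };

  toy_wide_int
  operator+ (const toy_wide_int &a, const toy_wide_int &b)
  {
    // Two flexible operands leave no precision to extend to; this
    // assert is the analogue of the ICE described above.
    assert (a.precision != 0 || b.precision != 0);
    unsigned prec = a.precision ? a.precision : b.precision;
    return toy_wide_int (a.val + b.val, prec);
  }

  toy_wide_int foo (toy_wide_int x) { return x + 1; }  // 1 is flexible

  int
  main ()
  {
    foo (toy_wide_int (0, 32));  // OK: x carries a precision for the 1
    foo (0);                     // aborts: both operands are flexible
    return 0;
  }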