Question on BOOT_CFLAGS vs. CFLAGS
All -

When I configure with --disable-bootstrap and build with:

  CFLAGS="-g -O0"

the resultant compiler is built with the specified options. However, if I --enable-bootstrap and build with the same CFLAGS, these options are not used to build the final compiler. I can get past this by using BOOT_CFLAGS="-g -O0", but that seems a bit inconsistent.

My question is: is this behaving as designed, or would it be reasonable to file a bug and/or supply a patch to change the behavior so that CFLAGS is respected whether bootstrapping is enabled or not?

Thanks -

Josh
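For readers hitting the same surprise, the two build flows look roughly like this (the srcdir/objdir layout and option spellings are illustrative, not a complete build recipe):

```shell
# Non-bootstrap build: CFLAGS applies directly to the one compiler built.
mkdir objdir && cd objdir
../gcc/configure --disable-bootstrap
make CFLAGS="-g -O0"

# Bootstrap build: CFLAGS affects only the stage-1 compile done by the
# host compiler; stages 2 and 3 -- and therefore the installed compiler --
# are built with BOOT_CFLAGS instead.
../gcc/configure --enable-bootstrap
make BOOT_CFLAGS="-g -O0"
```

This is the documented split: CFLAGS governs what the host compiler builds, BOOT_CFLAGS what the staged compiler builds.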
Re: Question on BOOT_CFLAGS vs. CFLAGS
Paul Brook wrote:
> On Friday 15 December 2006 01:37, Josh Conner wrote:
>> All -
>>
>> When I configure with --disable-bootstrap and build with:
>>
>>   CFLAGS="-g -O0"
>>
>> The resultant compiler is built with the specified options. However, if
>> I --enable-bootstrap, when I build with the same CFLAGS, these options
>> are not used to build the final compiler. I can get past this by using
>> BOOT_CFLAGS="-g -O0", but that seems a bit inconsistent.
>>
>> My question is: Is this behaving as designed or would it be reasonable
>> to file a bug and/or supply a patch to change the behavior so that
>> CFLAGS are respected whether bootstrapping is enabled or not?
>
> It is working as documented:
>
> http://gcc.gnu.org/onlinedocs/gccint/Makefile.html
> http://gcc.gnu.org/install/build.html

OK - it seemed a bit strange to me, but I'm glad to hear it is behaving as intended.

- Josh
ARM EABI Exception Handling
Can anyone tell me if there are plans to incorporate the ARM EABI exception handling (a la CSL) into the mainline gcc ARM port?

Thanks -

Josh

Josh Conner
Apple Computer
Re: GCC 4.0 RC2 Available
On Apr 18, 2005, at 3:12 PM, Julian Brown wrote:

> Results for arm-none-elf, cross-compiled from i686-pc-linux-gnu (Debian)
> for C and C++ are here:
>
> http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg01301.html
>
> Relative to RC1, there are several new tests which pass, and
> g++.dg/warn/Wdtor1.C (test for excess errors) works whereas it didn't
> before.
>
> Julian

Has there been any discussion of builtin-bitops-1.c failing for arm-none-elf with RC1 + RC2? It's a moderately serious regression from 3.4.3 -- we're no longer correctly calculating the size of load-double instructions, and so the constant pool is generated at an address that is out of range of the instructions that use it. It will show up in moderately large functions that use double-word constants.

The problem arises because the function const_double_needs_minipool, which decides whether to use a constant pool or do an inline calculation of a constant, bases its decision on fairly sophisticated logic (from arm_gen_constant), for example figuring that a constant like 0xcafecafe can be calculated in 4 instructions:

  mov r0, #0
  mov r1, #51712          ; 0xca00
  add r1, r1, #254        ; 0xcafe
  orr r1, r1, r1, asl #16 ; 0xcafecafe

But this code never gets generated, because the instruction pattern arm_movdi actually uses a much simpler algorithm (from the function output_move_double), which takes 5 instructions for this same example:

  mov r0, #0
  mvn r1, #1
  bic r1, r1, #13568
  bic r1, r1, #65536
  bic r1, r1, #889192448

This means that the "length" attribute for arm_movdi with these addressing modes should more accurately be 20 instead of 16.

Richard Earnshaw fixed this behavior in the mainline on 8 April:

  http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00850.html

by using the more efficient constant generator function (arm_gen_constant) when generating double-word constants. However, this was not applied to the 4.0 release branch. Has this regression been discussed already (if so, sorry - I did search the archives)?
If not, is it worth considering applying Richard's patch, or even a simpler one:

*** arm.md	Wed Feb 16 13:57:10 2005
--- arm.md	Tue Apr 19 17:09:47 2005
***************
*** 4167,4173 ****
    "*
    return (output_move_double (operands));
    "
!   [(set_attr "length" "8,12,16,8,8")
     (set_attr "type" "*,*,*,load2,store2")
     (set_attr "pool_range" "*,*,*,1020,*")
     (set_attr "neg_pool_range" "*,*,*,1008,*")]
--- 4167,4173 ----
    "*
    return (output_move_double (operands));
    "
!   [(set_attr "length" "8,12,20,8,8")
     (set_attr "type" "*,*,*,load2,store2")
     (set_attr "pool_range" "*,*,*,1020,*")
     (set_attr "neg_pool_range" "*,*,*,1008,*")]

Thanks -

Josh

Josh Conner
RFC: improving estimate_num_insns
In looking at the function inliner, I did some analysis of the correctness of estimate_num_insns (from tree-inline.c), as this measurement is critical to the inlining heuristics. Here is an overview of what I found for the functions in two sample files from the FAAD2 AAC library (measured at -O3 with inlining disabled):

syntax.c
              avg.    avg.    std.
  target      ratio   error   deviation
  arm-none    2.88    57%     2.37
  darwin-ppc  3.17    55%     2.50
  linux-x86   2.72    55%     2.06

huffman.c
              avg.    avg.    std.
  target      ratio   error   deviation
  arm-none    3.27    56%     2.37
  darwin-ppc  4.19    53%     2.88
  linux-x86   3.42    56%     2.39

where:

  fn. differential = actual bytes / estimated insns
  avg. ratio       = average (fn. differential) of all functions
  fn. error        = max (fn. differential, avg. ratio)
                     / min (fn. differential, avg. ratio)
  avg. error       = average (fn. error) of all functions
  std. deviation   = standard deviation (fn. differential) across all functions

Note that in choosing these metrics, my intent was to isolate the consistency of the differential from its magnitude. In other words, the goal was to have estimates that were a consistent ratio of the actual results, whatever that ratio may be.

Thinking that there may be room for improvement here, I tried the same experiment with a couple of adjustments to estimate_num_insns:

- Instead of ignoring constants, assign them a weight of 2 instructions (one for the constant itself and one to load it into memory)
- Instead of ignoring dereferences, assign a weight of 2 to an array reference and 1 to a pointer dereference
- Instead of ignoring case labels, assign them a weight of 2 instructions (to represent the cost of the control logic to get there)

With these modifications (tree-estimate.patch), I see the following results:

syntax.c
              avg.    avg.    std.
  target      ratio   error   deviation
  arm-none    1.68    29%     0.67
  darwin-ppc  1.84    29%     0.69
  linux-x86   1.62    26%     0.55

huffman.c
              avg.    avg.    std.
  target      ratio   error   deviation
  arm-none    1.75    26%     0.52
  darwin-ppc  2.31    24%     0.69
  linux-x86   1.95    25%     0.67

This appears to be a significant improvement in the accuracy of estimate_num_insns. However, note that since the avg. ratio has decreased significantly (estimates have gone up because we're counting things that were ignored before), any heuristics that are based on instruction counts will have effectively decreased by ~25-50%. To compensate for that, I bumped up the constants that are based on instruction estimates by 50% each (see inl-costs.patch).

With both of these changes in place, I ran some simple SPEC2000 integer benchmarks (not fine-tuned, not reportable) and saw the following impact on performance (positive is better):

            darwin-ppc  linux-x86  linux-arm*
  gzip      -4.7%       +11.7%     +1.3%
  vpr       -0.7%       -1.1%      -
  mcf       -0.3%       +1.1%      -
  crafty    -0.8%       +1.8%      -
  parser    -           -          -
  eon       -4.4%       +0.3%      -
  perlbmk   +0.4%       0.0%       -1.1%
  gap       -           +1.0%      -2.9%
  vortex    +4.1%       0.0%       0.0%
  bzip2     +1.2%       +1.1%      -1.3%
  twolf     -0.5%       -2.9%      -1.2%

  - = unable to obtain a valid result
  * linux-arm results are based on a subset of SPEC data and command-line
    options for use on a 32MB embedded board -- not even close to the full
    SPEC suite.

For what it's worth, code size is equal to or smaller for all benchmarks across all platforms.

So, here are the open issues I see at this point:

1. It appears that this change to estimate_num_insns generates a much better estimate of actual code size. However, the benchmark results are ambiguous. Is this patch worth considering as-is?

2. Increasing instruction weights causes the instruction-based parameters (e.g., --param max-inline-insns-auto) to be effectively lower. However, changing these constants/defaults as in the second patch will cause a semantic change for anyone who is setting these values at the command line. Is that change acceptable?

I do realize that this area is one that will likely eventually be best served by target-dependent routines (and constants); however, I also see a significant benefit to all targets in fixing the default implementation first.

Thoughts? Advice?

Thanks -

Josh

Josh Conner

tree-estimate.patch
Description: Binary data

inl-costs.patch
Description: Binary data
Re: RFC: improving estimate_num_insns
On Jul 12, 2005, at 1:07 PM, Steven Bosscher wrote:

> You don't say what compiler you used for these measurements. I suppose
> you used mainline?

Yes, I am working with the mainline.

> I think you should look at a lot more code than this.

OK - I stopped because I was seeing fairly consistent results. However, since both of these files were from the same source base, I can see how this may not be representative. I'll try some more examples from different source code, including C++.

>> Thinking that there may be room for improvement on this, I tried this
>> same experiment with a couple of adjustments to estimate_num_insns:
>>
>> - Instead of ignoring constants, assign them a weight of 2 instructions
>>   (one for the constant itself and one to load into memory)
>
> Why the load into memory, you mean constants that must be loaded from
> the constant pool? I guess the cost for the constant itself depends on
> whether it is a legitimate constant or not, and how it is loaded, and
> this is all very target specific. But a cost greater than 0 probably
> makes sense.

Good point - I was thinking of constant pools when I wrote that, even though that isn't always the case.

> It would be nice if you retain the comment about constants in the
> existing code somehow. The cost of constants is not trivial, and you
> should explain your choice better in a comment.

I'm not sure which comment you mean - the one from my email, or the one that was originally in the source code?

>> - Instead of ignoring case labels, assign them a weight of 2
>>   instructions (to represent the cost of control logic to get there)
>
> This depends on how the switch statement is expanded, e.g. to a binary
> decision tree, or to a table jump, or to something smaller than that.
> So again (like everything else, sadly) this is highly target-specific
> and even context-specific. I'd think a cost of 2 is too pessimistic in
> most cases.

Do you mean that 2 is too high? I actually got (slightly) better statistical results with a cost of 3, but I had the same reaction (that it was too pessimistic), which is why I settled on 2.

> You could look at the code in stmt.c to see how switch statements are
> expanded. Maybe there is a cheap test you can do to make CASE_LABELs
> free for very small switch statements (i.e. the ones which should never
> hold you back from inlining the function containing them ;-).

I'll look into this.

>> For what it's worth, code size is equal to or smaller for all
>> benchmarks across all platforms.
>
> What about the compile time?

Oh no! I didn't measure this. I will have a look.

>> So, here are the open issues I see at this point:
>>
>> 1. It appears that this change to estimate_num_insns generates a much
>>    better estimate of actual code size. However, the benchmark results
>>    are ambiguous. Is this patch worth considering as-is?
>
> I would say you'd at least have to look into the ppc gzip and eon
> regressions before this is acceptable. But it is not my decision to
> make, of course.

Makes sense. I'll see what kind of time I can put into this.

>> 2. Increasing instruction weights causes the instruction-based values
>>    (e.g., --param max-inline-insns-auto) to be effectively lower.
>>    However, changing these constants/defaults as in the second patch
>>    will cause a semantic change to anyone who is setting these values
>>    at the command line. Is that change acceptable?
>
> This has constantly changed from one release to the next since GCC 3.3,
> so I don't think this should be a problem.

Whew...

>> Thoughts? Advice?
>
> ...on to advice then. First of all, I think you should handle
> ARRAY_RANGE_REF and ARRAY_REF the same. And you probably should make
> BIT_FIELD_REF more expensive, and maybe make its cost depend on which
> bit you're addressing (you can see that in operands 1 and 2 of the
> BIT_FIELD_REF). Its current cost of just 1 is probably too optimistic,
> just like the other cases you are trying to address with this patch.

Great - thanks for the suggestions, I'll try these changes as well and see if I get even better correlation.

> Second, you may want to add a target hook to return the cost of target
> builtins. Even builtins that expand to just one instruction are now
> counted as 16 insns plus the cost of the arguments list. This badly
> hurts when you use e.g. SSE intrinsics. It's probably not an issue for
> the benchmark you looked at now, but maybe you want to look into it
> anyway, while you're at it.

OK, I'll look into this.

> Third,
>> (measured at -O3 with inlining disabled):
> Then why not just -O2 with inlining disabled? Now you have enabled loop
> unswitching, which is known to sometimes significantly increase code
> size.

Well, I thought that inlining estimates were most likely to be relevant at -O3, and so I should include as many other optimizations as were likely to be performed alongside inlining, even though this may make it more difficult to get accurate estimates.

> Fourth, look at _much_ more code than this. I w
RFA: Combine issue
I'm seeing invalid code produced for a simple test case targeting arm-none-elf (attached), which I believe is caused by an invalid transformation in simplify_comparison. It's transforming code of the form:

  (compare (subreg:QI (plus (reg:SI) (-1))) (-3))

into:

  (compare (plus (reg:SI) (-1)) (-3))

But I don't believe this transformation (lifting the subreg) is valid for this particular instance. Here are the more explicit rtl dumps - before:

  (insn 14 12 15 0 (set (reg:SI 104)
          (plus:SI (reg:SI 103 [ .uc8 ])
              (const_int -1 [0x]))) 4 {*arm_addsi3}
      (insn_list:REG_DEP_TRUE 12 (nil))
      (expr_list:REG_DEAD (reg:SI 103 [ .uc8 ])
          (nil)))
  (insn 15 14 16 0 (set (reg:SI 105)
          (and:SI (reg:SI 104)
              (const_int 255 [0xff]))) 53 {*arm_andsi3_insn}
      (insn_list:REG_DEP_TRUE 14 (nil))
      (expr_list:REG_DEAD (reg:SI 104)
          (nil)))
  (insn 16 15 17 0 (set (reg:CC 24 cc)
          (compare:CC (reg:SI 105)
              (const_int 253 [0xfd]))) 194 {*arm_cmpsi_insn}
      (insn_list:REG_DEP_TRUE 15 (nil))
      (expr_list:REG_DEAD (reg:SI 105)
          (nil)))

After:

  (note 14 12 15 0 NOTE_INSN_DELETED)
  (insn 15 14 16 0 (set (reg:SI 105)
          (plus:SI (reg:SI 103 [ .uc8 ])
              (const_int -1 [0x]))) 4 {*arm_addsi3}
      (insn_list:REG_DEP_TRUE 12 (nil))
      (expr_list:REG_DEAD (reg:SI 103 [ .uc8 ])
          (nil)))
  (insn 16 15 17 0 (set (reg:CC 24 cc)
          (compare:CC (reg:SI 105)
              (const_int -3 [0xfffd]))) 194 {*arm_cmpsi_insn}
      (insn_list:REG_DEP_TRUE 15 (nil))
      (expr_list:REG_DEAD (reg:SI 105)
          (nil)))

The conversion is being done at the "case SUBREG" code starting around line 10191 of combine.c. The conversion is allowed because the following tests are met:

  ((unsigned HOST_WIDE_INT) c1
       < (unsigned HOST_WIDE_INT) 1 << (mode_width - 2)
   /* (A - C1) always sign-extends, like C2.  */
   && num_sign_bit_copies (a, inner_mode)
      > (unsigned int) (GET_MODE_BITSIZE (inner_mode)
                        - mode_width - 1))

In my case:

- c1 is 3, which is less than 1 << 6, and
- num_sign_bit_copies (a, SImode) is 24 (because the SI register "a" contains a zero-extended QI value), and
- "GET_MODE_BITSIZE (inner_mode) - mode_width - 1" is 23.

Now, as I understand things, we're trying to see if "compare (subreg (a - c1), c2)" is the same as "compare (a - c1, c2)". And this is considered acceptable if the result of "a - c1" is identical to a sign-extended "subreg (a - c1)". So far, so good. However, the code, as applied to the SI/QI subreg case, reads:

  if INTVAL (c1) < 64, and
  there are at least 24 identical bits at the top of a

This logic, however, doesn't seem to match. First, it would seem there should be at least 25 identical bits to verify that we do indeed have a sign-extended QI value. But even with that addressed, consider another case (again with QI/SI for concreteness). Suppose we have an expression like:

  lt (subreg (a - 1), -3)

and suppose for this example that "a" is an SI value sign-extended from a signed QI, so num_sign_bit_copies would be 25. The transformation would be allowed (because c1 is less than 64 and num_sign_bit_copies is greater than 24). However, if "a" contained a value of -128 at run time:

  lt (subreg (a - 1), -3)
    becomes lt (subreg (-129), -3)
    becomes lt (127, -3) == FALSE

whereas after the transformation:

  lt (a - 1, -3)
    becomes lt (-129, -3) == TRUE

Am I misinterpreting the logic? Am I missing something fundamental? I appreciate any feedback / pointers / clues / etc...

- Josh

test.c
Description: Binary data
Re: RFA: Combine issue
>> Am I misinterpreting the logic? Am I missing something fundamental? I
>> appreciate any feedback / pointers / clues / etc...

Nothing like hitting the send button to make the lightbulb go on. I think I understand what was being attempted now. IIUC, the logic should have been (again, for the QI/SI instance): if at least _26_ of the top bits of "a" are identical, and c1 is < 64, it is OK to make the transformation. In that case, I don't see an instance where "compare (subreg (a - c1), c2)" is not equivalent to "compare (a - c1, c2)".

It appears to be a simple typo in the last line of the conditional:

  (GET_MODE_BITSIZE (inner_mode) - mode_width - 1)

should be:

  (GET_MODE_BITSIZE (inner_mode) - (mode_width - 1))

or:

  (GET_MODE_BITSIZE (inner_mode) - mode_width + 1)

I'll run this through a number of tests and submit an actual patch, unless I'm still missing something...

- Josh
TREE_READONLY vs TREE_CONSTANT
In looking at PR23237, I ran into a bit of confusion over the difference between TREE_READONLY and TREE_CONSTANT. Andrew Pinski indicated in the PR logs that I am misunderstanding their uses, so rather than bogging down the PR logs trying to clear up my confusion (which isn't really fair to Andrew), I thought I would bring up the question in a more general forum to see if there is someone willing to offer some insight.

In the example from this PR, the TREE_READONLY attribute of a static variable declaration is being set in the ipa-reference pass if it is determined that the variable is never modified. The problem is that when the assembly code is generated for this variable, the section flags are also set based on TREE_READONLY, so it ends up in a section inconsistent with its declaration.

When I started digging into the tree flags, the descriptions in tree.h seemed to imply that TREE_CONSTANT is the appropriate attribute for a variable whose value doesn't change:

  /* Value of expression is constant.  Always on in all ..._CST nodes.  May
     also appear in an expression or decl where the value is constant.  */
  #define TREE_CONSTANT(NODE) (NON_TYPE_CHECK (NODE)->common.constant_flag)

And, based on other bits of source, TREE_READONLY seems to represent whether the variable was actually declared constant.

Can someone provide a bit of insight here - what is the difference between TREE_READONLY and TREE_CONSTANT (and why is TREE_READONLY the one that ipa-reference should be setting)?

Thanks for any help you can offer.

- Josh
Re: TREE_READONLY vs TREE_CONSTANT
On Aug 24, 2005, at 12:33 PM, Richard Henderson wrote:
> On Wed, Aug 24, 2005 at 11:10:38AM -0700, Josh Conner wrote:
>> Can someone provide a bit of insight here - what is the difference
>> between TREE_READONLY and TREE_CONSTANT (and why is TREE_READONLY the
>> one that ipa-reference should be setting)?
>
> It depends on what property IPA is computing. And they aren't terribly
> differentiated. ...

Thanks very much for the thorough description. I think I understand the distinction now.

> I'm not sure how much of this will be possible before 4.1 is released.
> At minimum we have to change IPA to stop setting TREE_READONLY when
> DECL_SECTION_NAME is set; if we have to live with reduced optimization
> so be it.

I'd be glad to create and test a patch for this, if it would be welcomed.

- Josh
[RFH] C++ FE question
I've been looking at PR23708, where a non-inline explicitly specialized template function is inheriting the inline semantics of the inline member function it specializes, when using pre-compiled headers (whew). This can be seen in the following example:

  template <class T>
  class simple_class
  {
  public:
    inline void testfn (T);
  };

  template <>
  void simple_class<int>::testfn (int) {}
  // incorrectly inheriting 'inline' semantics when using PCH

where the specialization function ends up in a COMDAT section. I have some understanding of why this is happening, but could use a little help in fixing the problem.

If I compile the example above into a precompiled header, the sequence of events is as follows:

1. simple_class<T>::testfn is generated
2. simple_class<T>::testfn is assigned to the COMDAT section by note_decl_for_pch()
3. simple_class<int>::testfn (implicit instantiation) is generated, which inherits the COMDAT-ness from simple_class<T>::testfn
4. simple_class<int>::testfn (explicit specialization) is generated
5. simple_class<int>::testfn (explicit specialization) is assigned to the COMDAT section by duplicate_decls(), when it merges the declarations with the implicit instantiation

When not using precompiled headers, the sequence is this:

1. simple_class<T>::testfn is generated
2. simple_class<int>::testfn (implicit) is generated
3. simple_class<int>::testfn (explicit) is generated
4. simple_class<T>::testfn is assigned to the COMDAT section
5. simple_class<int>::testfn (implicit) is assigned to the COMDAT section

My first approach was to modify duplicate_decls() so it wouldn't copy the COMDAT-ness into the explicit specialization, the same way that we avoid copying DECL_DECLARED_INLINE_P. This worked for my particular instance; however, because the COMDAT implementation is host-dependent, it seems very dangerous.

My second approach is to disable the code in note_decl_for_pch() that assigns COMDAT-ness early. This works, too, but it would be nice not to lose the advantage that this code was put in for.

So, my third approach was to modify note_decl_for_pch() to filter out simple_class<T>::testfn, similarly to how the implicit declaration is filtered out. However, in order to do so, I need to detect that simple_class<T>::testfn is a member function of a template class. But how do I do this? I can detect which class the function is a member of by looking at DECL_CONTEXT, but I don't see how to determine whether this class is a template class. Is there a way to find the class type tree from the FUNCTION_DECL? I can see the type information I want at DECL_CONTEXT(decl)->common.chain in the debugger, but I have no reason to believe that this is the correct way to get it, or that it will work all of the time.

Does anyone have any suggestions on: a) whether I'm headed in the right direction, and b) if so, how to determine that simple_class<T>::testfn is a member of a template class, and not an instantiated class?

Thanks -

Josh
[RFC] c++ template instantiation generates zero-sized array (pr 19989)
I've been investigating PR 19989, where we are rejecting code when a template instantiation generates a zero-sized array, such as:

  template <int N>
  struct A
  {
    static const int i = 0;
  };

  template <int N>
  struct B
  {
    int x[A<N>::i];
  };

  B<0> b;

This is rejected on the grounds that not failing could generate an incorrect match in other instances, e.g.:

  template <int M>
  void foobar (int (*) [M] = 0);

  template <int M>
  void foobar ();

  void fn (void)
  {
    foobar<0> ();
  }

where we are required to match the second form of foobar.

Before I propose a patch for this behavior, however, I'd like to understand the most desirable behavior (understanding that implementation complexity may limit what is possible in the short term). For my first example (the class template), I believe gcc should, in an ideal world:

  * by default: accept it with no warnings/errors
  * with -pedantic: generate an error
  * with -pedantic -fpermissive: generate a warning

For the second example (the function template), I believe gcc should:

  * by default: bind to the second foobar
  * with -fpermissive: bind to the first foobar (which will generate a duplicate matching template error)
  * with -fpermissive -pedantic: bind to the first foobar, with a warning (also generating a duplicate matching template error)

Is this an accurate summary of what we'd like to see?

Thanks -

Josh
Re: [RFC] c++ template instantiation generates zero-sized array (pr 19989)
Mark Mitchell wrote:
> I understand what you're after: tolerate uses of the extension where
> it's sufficiently harmless.
>
> I don't think your proposed solution is correct, though, because we want
> to maintain the invariant that all conforming programs compile and
> behave as required by the standard in all compilation modes. In other
> words, supplying -fpermissive and friends can make non-conforming
> programs compile, when they otherwise wouldn't, but conforming programs
> should behave identically in all modes.

I think this is consistent with my proposal -- the first example was non-conforming, but accepted without -pedantic (as we do with other zero-sized arrays). The second example was conforming, and the only way to alter its behavior was with the -fpermissive option.

> I would imagine that if we encounter a zero-sized array when the
> "complain" flag is tf_error, then we can just issue a conditional
> pedwarn, with "if (pedantic) pedwarn (...)". But, if tf_error is not
> set, we must reject the instantiation.
>
> I know that sounds backwards. The point is that when tf_error is set,
> we're committed to the instantiation. When tf_error is not set, SFINAE
> applies. And, in a conforming program, we must reject the instantiation
> in that case.

This is interesting - that's exactly the approach I came up with (well, the part about using the complain flag to determine whether to reject the instantiation), but I was concerned about whether it would be accepted, since it seemed a bit awkward.

FWIW, attached is my last iteration of the patch -- it passes all regression tests on i686-linux and arm-elf. If we can come to an agreement on the desired behavior and it matches, I'll submit it for official approval. Thanks!
- Josh

Index: gcc/cp/pt.c
===================================================================
--- gcc/cp/pt.c (revision 106267)
+++ gcc/cp/pt.c (working copy)
@@ -7065,25 +7065,27 @@ tsubst (tree t, tree args, tsubst_flags_
       max = tsubst_template_arg (omax, args, complain, in_decl);
       max = fold_decl_constant_value (max);
 
-      if (integer_zerop (omax))
-        {
-          /* Still allow an explicit array of size zero.  */
-          if (pedantic)
-            pedwarn ("creating array with size zero");
-        }
-      else if (integer_zerop (max)
-               || (TREE_CODE (max) == INTEGER_CST
-                   && INT_CST_LT (max, integer_zero_node)))
-        {
-          /* [temp.deduct]
+      /* [temp.deduct]
 
-             Type deduction may fail for any of the following
-             reasons:
+         Type deduction may fail for any of the following
+         reasons:
 
-             Attempting to create an array with a size that is
-             zero or negative.  */
+         Attempting to create an array with a size that is
+         zero or negative.  */
+      if (integer_zerop (max) && flag_pedantic_errors
+          && ! (complain & tf_error))
+        {
+          /* We must fail if performing argument deduction (as
+             indicated by the state of complain), so that
+             another substitution can be found (unless -fpermissive
+             was given).  */
+          return error_mark_node;
+        }
+      else if (TREE_CODE (max) == INTEGER_CST
+               && INT_CST_LT (max, integer_zero_node))
+        {
           if (complain & tf_error)
-            error ("creating array with size zero (%qE)", max);
+            error ("creating array with negative size (%qE)", max);
 
           return error_mark_node;
         }
Re: [RFC] c++ template instantiation generates zero-sized array (pr 19989)
Mark Mitchell wrote:
> Josh Conner wrote:
>> I think this is consistent with my proposal -- the first example was
>> non-conforming, but accepted without -pedantic (as we do with other
>> zero-sized arrays). The second example was conforming and the only way
>> to alter its behavior was with the -fpermissive option.
>
> My point was that conforming programs should compile and behave
> identically in all modes; therefore -fpermissive must not alter the
> behavior.

OK, thanks for the clarification. I'll prepare a revised patch and submit it for approval.

- Josh
Restoring svn write access
My apologies in advance for the spam if this isn't the right forum. I had svn access in a former life (jcon...@apple.com), and I'm trying to restore my access for my new employer. I sent an email to overseers last week, but haven't heard back.

So... can anyone here tell me if there is some way to restore my access, or if I should re-apply for provisioning?

Thanks!

- Josh
Re: Restoring svn write access
Sorry, it was indeed in my spam box. Thanks to you both.

On Tue, Nov 1, 2016 at 3:05 PM, Ian Lance Taylor wrote:
> On Tue, Nov 1, 2016 at 2:01 PM, Josh Conner wrote:
>> My apologies in advance for the spam if this isn't the right forum. I
>> had svn access in a former life (jcon...@apple.com), and I'm trying to
>> restore my access for my new employer. I sent an email to overseers
>> last week, but haven't heard back.
>>
>> So... can anyone here tell me if there is some way to restore my
>> access, or if should I re-apply for provisioning?
>
> Did you not see the reply from Gerald Pfeifer? You might want to
> check your spam filter. You asked what your user ID was on
> gcc.gnu.org, and he said the answer was "jconner".
>
> Ian
Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?
All -

I've been investigating the cause of pr25505, where the amount of stack space used for a particular C++ function went from <1K to ~20K between gcc 3.3 and 4.0. One of the issues I ran into was that the stack slot allocated to hold the result of a function returning a structure was never being reused (see comment #5 in the PR for a simple example). The code in calls.c that allocates the location for storing the result marks its address as taken, which prevents it from being reused for other function calls in the same scope:

	  /* For variable-sized objects, we must be called with a target
	     specified.  If we were to allocate space on the stack here,
	     we would have no way of knowing when to free it.  */

	  rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
	  mark_temp_addr_taken (d);
	  structure_value_addr = XEXP (d, 0);
	  target = 0;

So, my question is: is it really necessary to mark this location as having its address taken? Yes, the address of the slot is passed to a function, but I can't imagine any instance where it would be valid for the function to retain that address after the call. Stepping through the code with a debugger showed that the location is preserved until after its value is copied out, so I don't believe that the value will ever be overwritten too early.

In an effort to help understand whether this call was necessary, I ran the test suite against 4.1 and mainline for both i686-pc-linux-gnu and powerpc-apple-darwin8 (with all default languages enabled), but there were no regressions. I also tried to track down the origins of this code, but the original email discussions were somewhat nebulous.

Here is a link to the original patch:

  http://gcc.gnu.org/ml/gcc-patches/1999-10n/msg00856.html

I also investigated where the other calls to mark_temp_addr_taken were removed, which happened here:

  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00813.html

These appeared to me to be related to a patch introduced shortly before, which performed alias analysis on MEM locations (IIUC):

  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00643.html

Before submitting a patch to remove this call and the resultant dead code, I thought I would see if anyone had some background knowledge as to why it is present and why I shouldn't remove it.

Thanks in advance for any advice.

- Josh
Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?
Richard Kenner wrote: >> So, my question is: is it really necessary to mark this location as >> having its address taken? Yes, the address of the slot is passed to a >> function, however I can't imagine any instances where the function >> retaining that address after the call would be valid. > > Your tracing below confirms my recollection that I was the one who added that > code. Unfortunately, it came in as a merge from another tree, so it's going > to be hard to do the archeology to figure out what it was added for, but the > history certainly suggests to me that there *was* some test case when it did > have a longer life and that's why it was added. Remember that all this stuff > is very sensitive to calling sequences so you may not be able to find it on > one or two machines (e.g., try Sparc). I suspect the test case involved > something like two calls to the same function (that returns a structure) > within the parameter list to a third function. > > My strong suggestion is to leave this alone. The benefits in terms of > increased optimization opportunities is small and the costs if it > *does* cause wrong code to be generated is large because we know it'll > be an obscure and hard-to-find case. Richard - Thanks for looking at this and the advice. I can understand the concern about introducing a subtle bug for negligible benefit, although in my case the benefit is somewhat more meaningful (~6k stack regained on a single function). Might you be able to provide any insight as to why you were able to remove the other calls to mark_temp_addr_taken? Was it indeed because of improved alias analysis of MEM locations? I wasn't, unfortunately, able to derive this from the email traffic. Here's the link again, in case it's helpful: http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00813.html I did investigate the case you described, where two function parameters are calls to the same function returning a structure. 
The front-end generates temporaries to handle this, and so the middle-end-generated temporaries are still restricted to a lifetime of a single statement. Here's the tree dump:

baz ()
{
  struct S D.1105;
  struct S D.1104;

:
  D.1104 = foo ();
  D.1105 = foo ();
  bar (D.1104, D.1105) [tail call];
  return;
}

Note that we will get 4 instances of struct S generated on the stack: two for D.110[45], and two more for holding the return values from the two calls to foo. Thanks again, and I appreciate any additional context you might be able to offer.

- Josh
Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?
Richard Kenner wrote: >> I did investigate the case you described, where two function parameters >> are calls to the same function returning a structure. The front-end >> generates temporaries to handle this, and so the middle-end-generated >> temporaries are still restricted to a lifetime of a single statement. > Nevertheless, it may be that gimplification renders this OBE. If so, > then I suspect that *a lot* of the temporary lifetime tracking code is > also no longer needed. But I wouldn't want to jump to any of these > conclusions too quickly: this stuff is very complex. OK, thanks. If you have any suggestions on other approaches to verifying this, I'd certainly appreciate it. - Josh
Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?
Richard Guenther wrote: >> OK, thanks. If you have any suggestions on other approaches to >> verifying this, I'd certainly appreciate it. > Other than testing on more targets, no. Does it fix PR25505? In > this case it would be a fix for a regression and maybe rth can have a > look at the patch as he should know this area equally well? PR25505 reports that the stack usage of a particular function increased from 512 bytes to 20k from 3.3 -> 4.x. This patch reduces the stack usage for that function to ~13k, so it solves one aspect of the regression. I'll attach the patch I've been testing.

- Josh

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 116236)
+++ gcc/calls.c	(working copy)
@@ -1987,7 +1987,6 @@ expand_call (tree exp, rtx target, int i
	 we would have no way of knowing when to free it.  */
       rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
 
-      mark_temp_addr_taken (d);
       structure_value_addr = XEXP (d, 0);
       target = 0;
     }
Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?
Richard Kenner wrote: > I found the test case. As I suspected, it was on Sparc (the test case failed > for a different reason on Mips) and was indeed a case where there were two > function calls in the same parameter list of an outer call (though not the > same function). It was an Ada test case, which we had no place to put > back in 1999. Here it is. Thanks for tracking this down. I would offer to add this test case to the test suite as my penance, but I don't have the ability to run ada tests on any of my machines. - Josh
Re: Potential fix for rdar://4658012
Richard Kenner wrote: > I disagree. Testing is not, and should never be, a substitute for analysis. > > A patch is proposed because we have a reason to believe it's correct. Then > we test to increase our confidence that it is, indeed, correct. But both > parts are essential for any patch. > > Here we have a situation where we'd like to see some optimization done (the > merging of temporaries). There's some code that suppresses that optimization > in one case because that optimization was unsafe in that case. It is not > acceptable to find out whether that code is still needed by merely running > a test suite: there has to be some analysis that says what changed to > make that code no longer needed. The burden of proof on that analysis has > to be on the person who's proposing that the code is no longer needed. Richard - I agree with you -- my intent in originally posting was to find exactly that understanding of why the code had originally been added. Perhaps I should have better explained my own understanding, so that you could help identify if this was a conceptual mistake on my part. I think I had too much confidence that this would be an easy question to answer, but it appears that the code is quite dated and not well understood. So, without further ado, let me provide my analysis, and hopefully those who understand this code best can help point out if I've gone astray. Here is my interpretation of this code: In the code for managing middle-end-generated temps, it appears that push/pop_temp_slots is used to create nested contexts into which unique temps are generated. When I need a temp, I call push_temp_slots, allocate a temp, perform my calculation, and call pop_temp_slots when I'm done. The one exception to this is if the address of the temp is taken before I call pop_temp_slots.
In that instance, even though I may be "done" with the temp, it needs to live until the end of the high-level-language scope, and so it is marked with addr_taken and is pushed up in the temp scopes using preserve_temp_slots. It is this logic that is used for function calls that return a value on the stack. However, in the case where we're passing the address of a temp slot to a function, it doesn't make sense to me that this is the same as other "address-of" operations on a stack slot. The function call (at least in C and C++) cannot preserve this address, and it is reasonable to say that attempts to access this address after the caller is done with the location are invalid. So, for these reasons, marking the temp for a lifetime longer than what the middle-end needs to copy it into the destination appeared excessive. Hopefully this is enough of an explanation to reveal whether my understanding is consistent or inconsistent with the experts, and can move the discussion forward. Thanks again for all of the time looking at this - it's very much appreciated. - Josh
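The temp-slot discipline described above can be sketched roughly as follows. This is illustrative pseudocode against the GCC-internal API, not a verbatim excerpt; the flag names follow the discussion in this thread.

```
/* Typical middle-end pattern for a statement-lifetime temp.  */
push_temp_slots ();
tmp = assign_temp (type, /*keep=*/0, /*memory_required=*/1, ...);
... emit code that computes into tmp and consumes it ...
pop_temp_slots ();   /* tmp's slot may now be reused -- unless
                        mark_temp_addr_taken/preserve_temp_slots
                        promoted it to an enclosing temp level.  */
```

The question in this thread is whether the hidden-return-slot case really needs that promotion, given that no conforming caller-visible code can hold the slot's address past the call.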
Help running a SPARC/Ada test case?
Is there anyone out there who might have a SPARC environment with Ada support who could run the attached Ada testcase on a version of gcc patched with the attached patch? I'd like to verify whether the test behaves correctly when compiled at -O0, -O1, and -O2. The expected (correct) behavior is the following output:

a_value= 10 b_value= 20

Some context of what I'm trying to accomplish can be seen by looking at the thread starting here:

http://gcc.gnu.org/ml/gcc/2006-08/msg00389.html

My utmost gratitude to anyone who can help!

- Josh

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 116236)
+++ gcc/calls.c	(working copy)
@@ -1985,9 +1985,8 @@
       /* For variable-sized objects, we must be called with a target
	 specified.  If we were to allocate space on the stack here,
	 we would have no way of knowing when to free it.  */
-      rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
+      rtx d = assign_temp (TREE_TYPE (exp), 0, 1, 1);
 
-      mark_temp_addr_taken (d);
       structure_value_addr = XEXP (d, 0);
       target = 0;
     }

with Ada.Text_Io;

procedure Test_Func is

   type A_Int is new Integer range 1..10;
   type B_Int is new Integer range 11..20;

   type A_Array is array(1..5) of A_Int;
   type B_Array is array(1..5) of B_Int;

   procedure Print_Values (A_Value : in A_Int;
                           B_Value : in B_Int) is
   begin
      Ada.Text_Io.Put_Line("a_value=" & Integer'Image(Integer(A_Value)) &
                           " b_value=" & Integer'Image(Integer(B_Value)));
   end Print_Values;

   function Get_A return A_Array is
      A : A_Array := (others => 10);
   begin
      return A;
   end Get_A;

   function Get_B return B_Array is
      B : B_Array := (others => 20);
   begin
      return B;
   end Get_B;

   J : Natural := 3;

begin
   Print_Values (A_Value => Get_A(J),
                 B_Value => Get_B(J));
end Test_Func;
Re: Help running a SPARC/Ada test case?
Eric Botcazou wrote: >> Is there anyone out there who might have a SPARC environment with Ada >> support who could run the attached Ada testcase on a version of gcc >> patched with the attached patch? I'd like to verify whether the test >> behaves correctly when compiled at -O0, -O1, and -O2. The expected >> (correct) behavior is the following output: >> >> a_value= 10 b_value= 20 > > This was a little vague so I used "gcc version 4.2.0 20060805" which is > the most recent compiler with Ada support I have on SPARC. The > testcase runs correctly at -O0, -O1 and -O2 when compiled with the > patched compiler. The size of the frame went down from 216 to 200 bytes > at -O2. Great - thanks. I really appreciate your help. - Josh
Re: Potential fix for rdar://4658012
Sorry for my own slow response -- I've been doing more digging through code, and didn't want to respond without a reasonable understanding... Richard Kenner wrote: >> However, in the case where we're passing the address of a temp slot to a >> function, it doesn't make sense to me that this is the same as other >> "address-of" operations on a stack slot. The function call (at least in >> C and C++) cannot preserve this address, and it is reasonable to say >> that attempts to access this address after the caller is done with the >> location, are invalid. > > Here's where I disagree. The issue isn't what the front-ends (and especially > not a *subset* of the front-ends) do, but what they *are allowed* to do. > > Going back to 3.4 for a moment, the question is whether you could > create a valid tree where the address would survive. And I think you can. > All it would take is a machine like Sparc that passes all aggregates by > reference is to have a CALL_EXPR with two operands that are each CALL_EXPRs > of aggregate types. Yes, this makes perfect sense. I can envision a tree where removing this code would create an invalid reuse of the stack slot. However, we are doing two things to preserve this temp -- marking it with "addr_taken" and creating it with the "keep" bit set. The former says "promote this temp one level up", and the latter says "keep this temp around for the lifetime of the current block". This still seems redundant. I think marking the temp as addr_taken is sufficient to handle the case where the address is used as the input to a function call (see attached patch). Note that, given Jason's recent change to the frontend (http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00202.html), this change would be sufficient to address the remaining issues in pr25505. I'd like to hear your opinion of this, though, before submitting it for approval.
> We never had a precise specification of what trees were valid then (which > is precisely the set of valid GENERIC, and hence is similarly not very > precisely defined), but I think almost everybody working on the 3.4 compiler > would say that the above is valid whether or not C or C++ could generate it. > Therefore, the middle-end had to properly support it. > > However, in GCC 4, with tree-ssa, the RTL expanders receive a much smaller > subset of trees, namely those defined in GIMPLE. I *believe*, but aren't > sure, that the above is not valid GIMPLE. > > That would make the change safe. But: > > (1) The code we generate is pretty inefficient in the current case, since we > copy the entire aggregate. If we try to fix that, we *will* be preserving > the address for later, though perhaps in a temporary more properly allocated. > > (2) I suspect that we can rip out much more of this code than just this line > and I'd prefer to do it that way. > >> Hopefully this is enough of an explanation to reveal whether my >> understanding is consistent or inconsistent with the experts, and can >> move the discussion forward. > > As I said once before, this code didn't change between 3.4 and 4.x, so > something else is the cause of the regression. I'd like to understand > what that was. I think you posted the changes you propose for the C > and C++ front ends, but I don't remember what they were. To help address your concerns about fixing the regression closest to its point of change, I've also been looking into how this was handled in 3.4. In that version of the compiler, a TARGET_EXPR was surviving all the way until the expander, where the function expand_expr_real was recognizing TARGET_EXPRs and generating a reusable temp. Unfortunately, I don't see an obvious correlation in 4.x, as the TARGET_EXPR never reaches the expander, since gimplification has already converted it into a CALL_EXPR. Thanks again for taking the time to look into this. 
- Josh

Index: gcc/calls.c
===================================================================
--- gcc/calls.c	(revision 116642)
+++ gcc/calls.c	(working copy)
@@ -1985,7 +1985,7 @@ expand_call (tree exp, rtx target, int i
       /* For variable-sized objects, we must be called with a target
	 specified.  If we were to allocate space on the stack here,
	 we would have no way of knowing when to free it.  */
-      rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
+      rtx d = assign_temp (TREE_TYPE (exp), 0, 1, 1);
 
       mark_temp_addr_taken (d);
       structure_value_addr = XEXP (d, 0);