Question on BOOT_CFLAGS vs. CFLAGS

2006-12-14 Thread Josh Conner
All -

When I configure with --disable-bootstrap and build with:

  CFLAGS="-g -O0"

The resultant compiler is built with the specified options.  However, if
I --enable-bootstrap, when I build with the same CFLAGS, these options
are not used to build the final compiler.  I can get past this by using
BOOT_CFLAGS="-g -O0", but that seems a bit inconsistent.

My question is:  Is this behaving as designed or would it be reasonable
to file a bug and/or supply a patch to change the behavior so that
CFLAGS are respected whether bootstrapping is enabled or not?

Thanks -

Josh


Re: Question on BOOT_CFLAGS vs. CFLAGS

2006-12-14 Thread Josh Conner
Paul Brook wrote:
> On Friday 15 December 2006 01:37, Josh Conner wrote:
>> All -
>>
>> When I configure with --disable-bootstrap and build with:
>>
>>   CFLAGS="-g -O0"
>>
>> The resultant compiler is built with the specified options.  However, if
>> I --enable-bootstrap, when I build with the same CFLAGS, these options
>> are not used to build the final compiler.  I can get past this by using
>> BOOT_CFLAGS="-g -O0", but that seems a bit inconsistent.
>>
>> My question is:  Is this behaving as designed or would it be reasonable
>> to file a bug and/or supply a patch to change the behavior so that
>> CFLAGS are respected whether bootstrapping is enabled or not?
> 
> It is working as documented:
> 
> http://gcc.gnu.org/onlinedocs/gccint/Makefile.html
> http://gcc.gnu.org/install/build.html

OK - it seemed a bit strange to me, but I'm glad to hear it is behaving
as intended.

- Josh


ARM EABI Exception Handling

2005-04-07 Thread Josh Conner
Can anyone tell me if there are plans to incorporate the ARM EABI 
exception handling (a la CSL) into the mainline gcc ARM port?

Thanks -
Josh

Josh Conner
Apple Computer


Re: GCC 4.0 RC2 Available

2005-04-19 Thread Josh Conner
On Apr 18, 2005, at 3:12 PM, Julian Brown wrote:

> Results for arm-none-elf, cross-compiled from i686-pc-linux-gnu
> (Debian) for C and C++ are here:
>
> http://gcc.gnu.org/ml/gcc-testresults/2005-04/msg01301.html
>
> Relative to RC1, there are several new tests which pass, and:
> g++.dg/warn/Wdtor1.C (test for excess errors)
> works whereas it didn't before.
>
> Julian
Has there been any discussion of builtin-bitops-1.c failing for
arm-none-elf with RC1 and RC2?  It's a moderately serious regression from
3.4.3 -- we're no longer correctly calculating the size of double-word
load instructions, and so the constant pool is generated at an address
that is out of range of the instructions that use it.  It will show up in
moderately large functions that use double-word constants.

The problem arises because the function const_double_needs_minipool, 
which decides whether to use a constant pool or do an inline 
calculation of a constant, bases its decision on fairly sophisticated 
logic (from arm_gen_constant), for example figuring that a constant 
like 0xcafecafe can be calculated in 4 instructions:
	mov r0, #0
	mov r1, #51712		; 0xca00
	add	r1, r1, #254		; 0xcafe
	orr	r1, r1, r1, asl #16	; 0xcafecafe

But this code never gets generated, because the instruction pattern 
arm_movdi actually uses a much simpler algorithm (from the function 
output_move_double), which takes 5 instructions for this same example:
	mov	r0, #0
	mvn	r1, #1
	bic	r1, r1, #13568
	bic	r1, r1, #65536
	bic	r1, r1, #889192448

This means that the "length" attribute for arm_movdi with these
addressing modes should more accurately be 20 instead of 16.

Richard Earnshaw fixed this behavior in the mainline on 8 April:
	http://gcc.gnu.org/ml/gcc-patches/2005-04/msg00850.html
by using the more efficient constant generator function 
(arm_gen_constant) when generating double-word constants.   However, 
this was not applied to the 4.0 release branch.

Has this regression been discussed already (if so, sorry - I did search 
the archives)?  If not, is it worth considering applying Richard's 
patch, or even a simpler one:

*** arm.md  Wed Feb 16 13:57:10 2005
--- arm.md  Tue Apr 19 17:09:47 2005
***
*** 4167,4173 
"*
return (output_move_double (operands));
"
!   [(set_attr "length" "8,12,16,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
 (set_attr "neg_pool_range" "*,*,*,1008,*")]
--- 4167,4173 
"*
return (output_move_double (operands));
"
!   [(set_attr "length" "8,12,20,8,8")
 (set_attr "type" "*,*,*,load2,store2")
 (set_attr "pool_range" "*,*,*,1020,*")
 (set_attr "neg_pool_range" "*,*,*,1008,*")]
Thanks -
Josh

Josh Conner


RFC: improving estimate_num_insns

2005-07-12 Thread Josh Conner
In looking at the function inliner, I did some analysis of the  
correctness of estimate_num_insns (from tree-inline.c), as this  
measurement is critical to the inlining heuristics.  Here is the  
overview of what I found on the functions in two sample files from  
the FAAD2 AAC library (measured at -O3 with inlining disabled):


syntax.c
            avg.    avg.   std.
target      ratio   error  deviation
arm-none    2.88    57%    2.37
darwin-ppc  3.17    55%    2.50
linux-x86   2.72    55%    2.06

huffman.c
            avg.    avg.   std.
target      ratio   error  deviation
arm-none    3.27    56%    2.37
darwin-ppc  4.19    53%    2.88
linux-x86   3.42    56%    2.39

  fn. differential = actual bytes / estimated insns
  avg. ratio = average (fn. differential) over all functions
  fn. error = max (fn. differential, avg. ratio) /
              min (fn. differential, avg. ratio)
  avg. error = average (fn. error) over all functions
  std. deviation = standard deviation (fn. differential) across all
  functions


Note that by choosing these metrics, my intent was to isolate the  
consistency of the differential from the magnitude of the  
differential.  In other words, the goal was to have estimates that  
were a consistent ratio to the actual results, whatever that ratio  
may be.


Thinking that there may be room for improvement on this, I tried this  
same experiment with a couple of adjustments to estimate_num_insns:
- Instead of ignoring constants, assign them a weight of 2  
instructions (one for the constant itself and one to load into memory)
- Instead of ignoring dereferences, assign a weight of 2 to an array  
reference and 1 for a pointer dereference
- Instead of ignoring case labels, assign them a weight of 2  
instructions (to represent the cost of control logic to get there)


With these modifications (tree-estimate.patch), I see the following  
results:


syntax.c
            avg.    avg.   std.
target      ratio   error  deviation
arm-none    1.68    29%    0.67
darwin-ppc  1.84    29%    0.69
linux-x86   1.62    26%    0.55

huffman.c
            avg.    avg.   std.
target      ratio   error  deviation
arm-none    1.75    26%    0.52
darwin-ppc  2.31    24%    0.69
linux-x86   1.95    25%    0.67

This appears to be a significant improvement in the accuracy of
estimate_num_insns.  However, note that since the avg. ratio has
decreased significantly (estimates have gone up because we're
counting things that were ignored before), any heuristics that are
based on instruction counts will have effectively decreased by
~25-50%.  To compensate for that, I bumped up any constants that are
based on instruction estimates by 50% each (see inl-costs.patch).


With both of these changes in place, I ran some simple SPEC2000  
integer benchmarks (not fine-tuned, not reportable) and saw the  
following impact on performance (positive is better):


          darwin-ppc   linux-x86   linux-arm*
gzip      -4.7%        +11.7%      +1.3%
vpr       -0.7%        -1.1%       -
mcf       -0.3%        +1.1%       -
crafty    -0.8%        +1.8%       -
parser    -            -           -
eon       -4.4%        +0.3%       -
perlbmk   +0.4%        0.0%        -1.1%
gap       -            +1.0%       -2.9%
vortex    +4.1%        0.0%        0.0%
bzip2     +1.2%        +1.1%       -1.3%
twolf     -0.5%        -2.9%       -1.2%

  - = unable to obtain a valid result
  * linux-arm results are based on a subset of SPEC data and
    command-line options for use on a 32MB embedded board -- not even
    close to the full SPEC suite.


For what it's worth, code size is equal to or smaller for all  
benchmarks across all platforms.


So, here are the open issues I see at this point:
1. It appears that this change to estimate_num_insns generates
a much better estimate of actual code size.  However, the benchmark
results are ambiguous.  Is this patch worth considering as-is?
2. Increasing instruction weights causes the instruction-based values  
(e.g., --param max-inline-insns-auto) to be effectively lower.   
However, changing these constants/defaults as in the second patch  
will cause a semantic change to anyone who is setting these values at  
the command line.  Is that change acceptable?


I do realize that this area is one that will likely eventually be
best served by target-dependent routines (and constants); however, I
also see a significant benefit to all targets in fixing the default
implementation first.


Thoughts?  Advice?

Thanks -

Josh

Josh Conner




tree-estimate.patch
Description: Binary data


inl-costs.patch
Description: Binary data




Re: RFC: improving estimate_num_insns

2005-07-12 Thread Josh Conner


On Jul 12, 2005, at 1:07 PM, Steven Bosscher wrote:

> You don't say what compiler you used for these measurements.  I suppose
> you used mainline?

Yes, I am working with the mainline.

> I think you should look at a lot more code than this.

OK - I stopped because I was seeing fairly consistent results.
However, since both of these files were from the same source base, I
can see how this may not be representative.  I'll try some more
examples from different source code, including C++.



>> Thinking that there may be room for improvement on this, I tried this
>> same experiment with a couple of adjustments to estimate_num_insns:
>> - Instead of ignoring constants, assign them a weight of 2
>> instructions (one for the constant itself and one to load into memory)
>
> Why the load into memory, you mean constants that must be loaded from
> the constant pool?  I guess the cost for the constant itself depends on
> whether it is a legitimate constant or not, and how it is loaded, and
> this is all very target specific.  But a cost greater than 0 probably
> makes sense.

Good point - I was thinking of constant pools when I wrote that, even
though that isn't always the case.

> It would be nice if you retain the comment about constants in the
> existing code somehow.  The cost of constants is not trivial, and you
> should explain your choice better in a comment.

I'm not sure which comment you mean - the one from my email, or the
one that was originally in the source code?



>> - Instead of ignoring case labels, assign them a weight of 2
>> instructions (to represent the cost of control logic to get there)
>
> This depends on how the switch statement is expanded, e.g. to a binary
> decision tree, or to a table jump, or to something smaller than that.
> So again (like everything else, sadly) this is highly target-specific
> and even context-specific.  I'd think a cost of 2 is too pessimistic in
> most cases.

Do you mean that 2 is too high?  I actually got (slightly) better
statistical results with a cost of 3, but I had the same reaction
(that it was too pessimistic), which is why I settled on 2.

> You could look at the code in stmt.c to see how switch statements are
> expanded.  Maybe there is a cheap test you can do to make CASE_LABELs
> free for very small switch statements (i.e. the ones which should never
> hold you back from inlining the function containing them ;-).

I'll look into this.


>> For what it's worth, code size is equal to or smaller for all
>> benchmarks across all platforms.
>
> What about the compile time?

Oh no!  I didn't measure this.  I will have a look.

>> So, here are the open issues I see at this point:
>> 1. It appears that this change to estimate_num_instructions generates
>> a much better estimate of actual code size.  However, the benchmark
>> results are ambiguous.  Is this patch worth considering as-is?
>
> I would say you'd at least have to look into the ppc gzip and eon
> regressions before this is acceptable.  But it is not my decision to
> make, of course.

Makes sense.  I'll see what kind of time I can put into this.

>> 2. Increasing instruction weights causes the instruction-based values
>> (e.g., --param max-inline-insns-auto) to be effectively lower.
>> However, changing these constants/defaults as in the second patch
>> will cause a semantic change to anyone who is setting these values at
>> the command line.  Is that change acceptable?
>
> This has constantly changed from one release to the next since GCC 3.3,
> so I don't think this should be a problem.

Whew...


>> Thoughts?  Advice?
>
> ...on to advice then.
>
> First of all, I think you should handle ARRAY_RANGE_REF and ARRAY_REF
> the same.  And you probably should make BIT_FIELD_REF more expensive,
> and maybe make its cost depend on which bit you're addressing (you can
> see that in operands 1 and 2 of the BIT_FIELD_REF).  Its current cost
> of just 1 is probably too optimistic, just like the other cases you are
> trying to address with this patch.

Great - thanks for the suggestions, I'll try these changes as well
and see if I get even better correlation.

> Second, you may want to add a target hook to return the cost of target
> builtins.  Even builtins that expand to just one instruction are now
> counted as 16 insns plus the cost of the arguments list.  This badly
> hurts when you use e.g. SSE intrinsics.  It's probably not an issue for
> the benchmark you looked at now, but maybe you want to look into it
> anyway, while you're at it.

OK, I'll look into this.

> Third,
>
>> (measured at -O3 with inlining disabled):
>
> Then why not just -O2 with inlining disabled?  Now you have enabled
> loop unswitching, which is known to sometimes significantly increase
> code size.

Well, I thought that inlining estimates were most likely to be
relevant at O3, and so I should include as many other optimizations
as were likely to be performed alongside inlining, even though this
may make it more difficult to get accurate estimates.

> Fourth, look at _much_ more code than this.  I w

RFA: Combine issue

2005-08-04 Thread Josh Conner
I'm seeing invalid code produced for a simple test case targeting
arm-none-elf (attached), which I believe is caused by an invalid
transformation in simplify_comparison.  It's transforming code of the
form:


  (compare (subreg:QI (plus (reg:SI) (-1)))
   (-3))

into:

  (compare (plus (reg:SI) (-1))
   (-3))

But I don't believe this transformation (lifting the subreg) is valid  
for this particular instance.  Here are the more explicit rtl dumps -  
before:


(insn 14 12 15 0 (set (reg:SI 104)
(plus:SI (reg:SI 103 [ .uc8 ])
(const_int -1 [0x]))) 4 {*arm_addsi3}  
(insn_list:REG_DEP_TRUE 12 (nil))

(expr_list:REG_DEAD (reg:SI 103 [ .uc8 ])
(nil)))

(insn 15 14 16 0 (set (reg:SI 105)
(and:SI (reg:SI 104)
(const_int 255 [0xff]))) 53 {*arm_andsi3_insn}  
(insn_list:REG_DEP_TRUE 14 (nil))

(expr_list:REG_DEAD (reg:SI 104)
(nil)))

(insn 16 15 17 0 (set (reg:CC 24 cc)
(compare:CC (reg:SI 105)
(const_int 253 [0xfd]))) 194 {*arm_cmpsi_insn}  
(insn_list:REG_DEP_TRUE 15 (nil))

(expr_list:REG_DEAD (reg:SI 105)
(nil)))

After:

(note 14 12 15 0 NOTE_INSN_DELETED)

(insn 15 14 16 0 (set (reg:SI 105)
(plus:SI (reg:SI 103 [ .uc8 ])
(const_int -1 [0x]))) 4 {*arm_addsi3}  
(insn_list:REG_DEP_TRUE 12 (nil))

(expr_list:REG_DEAD (reg:SI 103 [ .uc8 ])
(nil)))

(insn 16 15 17 0 (set (reg:CC 24 cc)
(compare:CC (reg:SI 105)
(const_int -3 [0xfffd]))) 194 {*arm_cmpsi_insn}  
(insn_list:REG_DEP_TRUE 15 (nil))

(expr_list:REG_DEAD (reg:SI 105)
(nil)))

The conversion is being done at the "case SUBREG" code starting  
around line 10191 of combine.c.  The conversion is allowed because  
the following tests are met:


  ((unsigned HOST_WIDE_INT) c1
      < (unsigned HOST_WIDE_INT) 1 << (mode_width - 2)
   /* (A - C1) always sign-extends, like C2.  */
   && num_sign_bit_copies (a, inner_mode)
      > (unsigned int) (GET_MODE_BITSIZE (inner_mode)
                        - mode_width - 1))

In my case:
  c1 is 3, which is less than 1 << 6, and
  num_sign_bit_copies(a,SI) is 24 (because the SI register "a"  
contains a zero-extended QI value), and

  "GET_MODE_BITSIZE (inner_mode) - mode_width - 1" is 23

Now, as I understand things, we're trying to see if "compare (subreg
(a - c1), c2)" is the same as "compare (a - c1, c2)".  And, this is
considered acceptable if the result of "a - c1" is identical to a
sign-extended "subreg (a - c1)".  So far, so good.  However, the code,
as applied to the SI/QI subreg case, reads:

  if (INTVAL (c1) < 64) and there are at least 24 identical bits
  at the top of a


This logic, however, doesn't seem to match.  First, it would seem  
there should be at least 25 identical bits to verify that we do  
indeed have a sign-extended QI value.  But even with that addressed,  
consider another case (again with the QI/SI for concreteness).   
Suppose we have an expression like:


  lt (subreg (a - 1), -3)

And, suppose for this example that "a" is a SI sign-extended from a  
signed QI.  So, num_sign_bit_copies would be 25.  The transformation  
would be allowed (because c1 is less than 64 and num_sign_bit_copies  
is greater than 24).  However, if "a" contained a value of -128 at  
run-time:


  lt (subreg (a - 1), -3) becomes
    lt (subreg (-129), -3) becomes
      lt (127, -3) == FALSE

whereas after the transformation:

  lt (a - 1, -3) becomes
lt (-129, -3) == TRUE

Am I misinterpreting the logic?  Am I missing something fundamental?   
I appreciate any feedback / pointers / clues / etc...


- Josh




test.c
Description: Binary data


Re: RFA: Combine issue

2005-08-04 Thread Josh Conner


> Am I misinterpreting the logic?  Am I missing something
> fundamental?  I appreciate any feedback / pointers / clues / etc...

Nothing like hitting the send button to make the lightbulb go on.

I think I understand what was being attempted now.  IIUC, the logic
should have been (again, for the QI/SI instance):

  If at least _26_ bits are set, and if c1 is < 64, ok to make the
  transformation.


In this case, I don't see an instance where "compare (subreg (a -  
c1), c2)" is not equivalent to "compare (a - c1, c2)".  It appears to  
be a simple typo in the last line of the conditional:


  (GET_MODE_BITSIZE (inner_mode) - mode_width - 1)

Should be:

  (GET_MODE_BITSIZE (inner_mode) - (mode_width - 1))

or:

  (GET_MODE_BITSIZE (inner_mode) - mode_width + 1)

I'll run this through a number of tests and submit an actual patch,  
unless I'm still missing something...


- Josh



TREE_READONLY vs TREE_CONSTANT

2005-08-24 Thread Josh Conner
In looking at PR23237, I ran into a bit of confusion over the  
difference between TREE_READONLY and TREE_CONSTANT.  Andrew Pinski  
indicated in PR logs that I am misunderstanding their uses, so rather  
than bogging down the PR logs trying to clear up my confusion (which  
isn't really fair to Andrew), I thought I would bring up the question  
in a more general forum to see if there is someone willing to offer  
some insight.


In the example from this PR, the TREE_READONLY attribute of a static
variable declaration is being set in the ipa-reference pass if it is
determined that the variable is never modified.  The problem is that
when the assembly code is generated for this variable, the section
flags are also set based on TREE_READONLY, so it ends up in a section
inconsistent with its declaration.


When I started digging into the tree flags, the descriptions in  
tree.h seemed to imply that TREE_CONSTANT is the appropriate  
attribute for a variable whose value doesn't change:


/* Value of expression is constant.  Always on in all ..._CST nodes.  May
   also appear in an expression or decl where the value is constant.  */
#define TREE_CONSTANT(NODE) (NON_TYPE_CHECK (NODE)->common.constant_flag)


And, based on other bits of source, TREE_READONLY seems to represent  
whether the variable was actually declared constant.


Can someone provide a bit of insight here - what is the difference  
between TREE_READONLY and TREE_CONSTANT (and why is TREE_READONLY the  
one that ipa-reference should be setting)?


Thanks for any help you can offer.

- Josh



Re: TREE_READONLY vs TREE_CONSTANT

2005-08-24 Thread Josh Conner


On Aug 24, 2005, at 12:33 PM, Richard Henderson wrote:

> On Wed, Aug 24, 2005 at 11:10:38AM -0700, Josh Conner wrote:
>
>> Can someone provide a bit of insight here - what is the difference
>> between TREE_READONLY and TREE_CONSTANT (and why is TREE_READONLY the
>> one that ipa-reference should be setting)?
>
> It depends on what property IPA is computing.  And they aren't
> terribly differentiated.
>
> ...

Thanks very much for the thorough description.  I think I understand
the distinction now.

> I'm not sure how much of this will be possible before 4.1 is
> released.  At minimum we have to change IPA to stop setting
> TREE_READONLY when DECL_SECTION_NAME is set; if we have to
> live with reduced optimization so be it.

I'd be glad to create and test a patch for this, if it would be
welcomed.


- Josh



[RFH] C++ FE question

2005-09-02 Thread Josh Conner
I've been looking at PR23708, where a non-inline explicitly  
specialized template function is inheriting the inline semantics of  
the inline member function it specializes, when using pre-compiled  
headers (whew).  This can be seen in the following example:


template <class T> class simple_class {
  public:
    inline void testfn (T);
};

// incorrectly inheriting 'inline' semantics when using PCH:
template <> void simple_class<int>::testfn (int) {}


Where the specialization function ends up in a COMDAT section.

I have some understanding on why this is happening, but could use a  
little help in fixing the problem.  If I compile the example above  
into a precompiled header, the sequence of events is as follows:


  simple_class<T>::testfn is generated
  simple_class<T>::testfn is assigned to the COMDAT section by
    note_decl_for_pch()
  simple_class<int>::testfn (implicit instantiation) is generated,
    which inherits the COMDAT-ness from simple_class<T>::testfn

  simple_class<int>::testfn (explicit specialization) is generated
  simple_class<int>::testfn (explicit specialization) is assigned to
    the COMDAT section by duplicate_decls(), when it merges the
    declarations with the implicit instantiation


When not using precompiled headers, the sequence is this:

  simple_class<T>::testfn is generated
  simple_class<int>::testfn (implicit) is generated
  simple_class<int>::testfn (explicit) is generated
  simple_class<T>::testfn is assigned to the COMDAT section
  simple_class<int>::testfn (implicit) is assigned to the COMDAT
    section


My first approach was to modify duplicate_decls() so it wouldn't copy
the COMDAT-ness into the explicit specialization, the same way that
we avoid copying the DECL_DECLARED_INLINE_P.  This worked for my
particular instance; however, because COMDAT implementation is
host-dependent, it seems very dangerous.


My second approach is to disable the code in note_decl_for_pch() that  
assigns COMDAT-ness early.  This works, too, but it would be nice to  
not lose the advantage that this code was put in for.


So, my third approach was to modify note_decl_for_pch() to filter out
simple_class<T>::testfn, similarly to how the implicit declaration is
filtered out.  However, in order to do so, I need to detect that
simple_class<T>::testfn is a member function of a template class.  But
how do I do this?  I can detect which class the function is a member
of by looking at DECL_CONTEXT, but I don't see how to determine if
this class is a template class.  Is there a way to find the class type
tree from the FUNCTION_DECL?  I can see the type information I want at
DECL_CONTEXT(decl)->common.chain in the debugger, but I have no reason
to believe that this is the correct way to get it, or that it will
work all of the time.


Does anyone have any suggestions on:
a) whether I'm headed in the right direction, and
b) if so, how to determine that simple_class<T>::testfn is a member
of a template class, and not an instantiated class?


Thanks -

Josh



[RFC] c++ template instantiation generates zero-sized array (pr 19989)

2005-11-01 Thread Josh Conner
I've been investigating PR 19989, where we are rejecting code when a
template instantiation generates a zero-sized array, such as:

  template <int N> struct A
  {
    static const int i = 0;
  };

  template <int N> struct B
  {
    int x[A<N>::i];
  };

  B<0> b;

This is rejected on the grounds that not failing could generate an
incorrect match in other instances, e.g.:

  template <int M> void foobar (int (*) [M] = 0);
  template <int M> void foobar ();

  void fn (void)
  {
foobar<0>();
  }

Where we are required to match the second form of foobar.

Before I propose a patch for this behavior, however, I'd like to
understand the most desirable behavior (understanding that
implementation complexity may limit what is possible in the short term).
 For my first example (the class template), I believe gcc should in an
ideal world:

  * by default: accept it with no warnings/errors
  * with -pedantic: generate an error
  * with -pedantic -fpermissive: generate a warning

For the second example (the function template), I believe gcc should:

  * by default: bind to the second foobar
  * with -fpermissive: bind to the first foobar (which will generate a
duplicate matching template error)
  * with -fpermissive -pedantic: bind to the first foobar, with a
warning (also generating a duplicate matching template error)

Is this an accurate summary of what we'd like to see?

Thanks -

Josh


Re: [RFC] c++ template instantiation generates zero-sized array (pr 19989)

2005-11-02 Thread Josh Conner
Mark Mitchell wrote:

> I understand what you're after: tolerate uses of the extension where
> it's sufficiently harmless.
> 
> I don't think your proposed solution is correct, though, because we want
> to maintain the invariant that all conforming programs compile and
> behave as required by the standard in all compilation modes.  In other
> words, supplying -fpermissive and friends can make non-conforming
> programs compile, when they otherwise wouldn't, but conforming programs
> should behave identically in all modes.

I think this is consistent with my proposal -- the first example was
non-conforming, but accepted without -pedantic (as we do with other
zero-sized arrays).  The second example was conforming and the only way
to alter its behavior was with the -fpermissive option.

> I would imagine that if we encounter a zero-sized array when the
> "complain" flag is tf_error, then we can just issue a conditional
> pedwarn, with "if (pedantic) pedwarn (...)".  But, if tf_error is not
> set, we must reject the instantiation.
> 
> I know that sounds backwards.  The point is that when tf_error is set,
> we're committed to the instantiation.  When tf_error is not set, SFINAE
> applies.  And, in a conforming program, we must reject the instantiation
> in that case.

This is interesting - that's exactly the approach I came up with
(well, the part about using the complain flag to determine whether to
reject the instantiation), but I was concerned about whether it would
be accepted, since it seemed a bit awkward.

FWIW, attached is my last iteration of the patch -- it does pass all
regression tests on i686-linux and arm-elf.  If we can come to an
agreement on the desired behavior and it matches, I'll submit it for
official approval.

Thanks!

- Josh


Index: gcc/cp/pt.c
===================================================================
--- gcc/cp/pt.c (revision 106267)
+++ gcc/cp/pt.c (working copy)
@@ -7065,25 +7065,27 @@ tsubst (tree t, tree args, tsubst_flags_
max = tsubst_template_arg (omax, args, complain, in_decl);
max = fold_decl_constant_value (max);
 
-   if (integer_zerop (omax))
- {
-   /* Still allow an explicit array of size zero.  */
-   if (pedantic)
- pedwarn ("creating array with size zero");
- }
-   else if (integer_zerop (max)
-|| (TREE_CODE (max) == INTEGER_CST
-&& INT_CST_LT (max, integer_zero_node)))
- {
-   /* [temp.deduct]
+   /* [temp.deduct]
 
-  Type deduction may fail for any of the following
-  reasons:
+  Type deduction may fail for any of the following
+  reasons:
 
-Attempting to create an array with a size that is
-zero or negative.  */
+Attempting to create an array with a size that is
+zero or negative.  */
+   if (integer_zerop (max) && flag_pedantic_errors
+   && ! (complain & tf_error))
+ {
+   /* We must fail if performing argument deduction (as
+  indicated by the state of complain), so that
+  another substitution can be found (unless -fpermissive
+  was given).  */
+   return error_mark_node;
+ }
+   else if (TREE_CODE (max) == INTEGER_CST
+&& INT_CST_LT (max, integer_zero_node))
+ {
if (complain & tf_error)
- error ("creating array with size zero (%qE)", max);
+ error ("creating array with negative size (%qE)", max);
 
return error_mark_node;
  }


Re: [RFC] c++ template instantiation generates zero-sized array (pr 19989)

2005-11-02 Thread Josh Conner
Mark Mitchell wrote:
> Josh Conner wrote:
> 
> 
>>I think this is consistent with my proposal -- the first example was
>>non-conforming, but accepted without -pedantic (as we do with other
>>zero-sized arrays).  The second example was conforming and the only way
>>to alter its behavior was with the -fpermissive option.
> 
> 
> My point was that conforming programs should compile and behave
> identically in all modes; therefore -fpermissive must not alter the
> behavior. 

OK, thanks for the clarification.  I'll prepare a revised patch and
submit it for approval.

- Josh



Restoring svn write access

2016-11-01 Thread Josh Conner
My apologies in advance for the spam if this isn't the right forum. I
had svn access in a former life (jcon...@apple.com), and I'm trying to
restore my access for my new employer. I sent an email to overseers
last week, but haven't heard back.

So... can anyone here tell me if there is some way to restore my
access, or if I should re-apply for provisioning?

Thanks!

- Josh


Re: Restoring svn write access

2016-11-01 Thread Josh Conner
Sorry, it was indeed in my spam box. Thanks to you both.


On Tue, Nov 1, 2016 at 3:05 PM, Ian Lance Taylor  wrote:
> On Tue, Nov 1, 2016 at 2:01 PM, Josh Conner  wrote:
>> My apologies in advance for the spam if this isn't the right forum. I
>> had svn access in a former life (jcon...@apple.com), and I'm trying to
>> restore my access for my new employer. I sent an email to overseers
>> last week, but haven't heard back.
>>
>> So... can anyone here tell me if there is some way to restore my
>> access, or if should I re-apply for provisioning?
>
> Did you not see the reply from Gerald Pfeifer?  You might want to
> check your spam filter.  You asked what your user ID was on
> gcc.gnu.org, and he said the answer was "jconner".
>
> Ian


Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?

2006-08-21 Thread Josh Conner
All -

I've been investigating the cause of pr25505, where the amount of stack
space being used for a particular C++ function went from <1K to ~20K
between gcc 3.3 and 4.0.  One of the issues I ran into was that the
stack slot allocated to hold the result of a function returning a
structure was never being reused (see comment #5 in the PR for a simple
example).

The code in calls.c that allocates the location for storing the result
marks its address as being taken, which prevents it from being reused
for other function calls in the same scope:

  /* For variable-sized objects, we must be called with a target
 specified.  If we were to allocate space on the stack here,
 we would have no way of knowing when to free it.  */
  rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);

  mark_temp_addr_taken (d);
  structure_value_addr = XEXP (d, 0);
  target = 0;

So, my question is: is it really necessary to mark this location as
having its address taken?  Yes, the address of the slot is passed to a
function, however I can't imagine any instances where the function
retaining that address after the call would be valid.  Stepping through
the code with a debugger showed that the location is preserved until
after its value is copied out, so I don't believe that the value will
ever be overwritten too early.

In an effort to help understand whether this call was necessary, I ran
the test suite against 4.1 and mainline for both i686-pc-linux-gnu and
powerpc-apple-darwin8 (with all default languages enabled), but there
were no regressions.  I also tried to track down the origins of this
code, but the original email discussions were somewhat nebulous.  Here
is a link to the original patch:

  http://gcc.gnu.org/ml/gcc-patches/1999-10n/msg00856.html

I also investigated where the other calls to mark_temp_addr_taken were
removed, which happened here:

  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00813.html

These appeared to me to be related to a patch that was introduced
shortly before which performed alias analysis on MEM locations (IIUC):

  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00643.html

Before submitting a patch to remove this call, and the resultant dead
code, I thought I would see if anyone had some background knowledge as
to why this was present, and why I shouldn't remove it.

Thanks in advance for any advice.

- Josh



Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?

2006-08-21 Thread Josh Conner
Richard Kenner wrote:
>> So, my question is: is it really necessary to mark this location as
>> having its address taken?  Yes, the address of the slot is passed to a
>> function; however, I can't imagine any instances where the function
>> retaining that address after the call would be valid.
> 
> Your tracing below confirms my recollection that I was the one who added that
> code.  Unfortunately, it came in as a merge from another tree, so it's going
> to be hard to do the archeology to figure out what it was added for, but the
> history certainly suggests to me that there *was* some test case when it did
> have a longer life and that's why it was added.  Remember that all this stuff
> is very sensitive to calling sequences so you may not be able to find it on
> one or two machines (e.g., try Sparc).  I suspect the test case involved
> something like two calls to the same function (that returns a structure)
> within the parameter list to a third function.
> 
> My strong suggestion is to leave this alone.  The benefits in terms of
> increased optimization opportunities are small, and the cost if it
> *does* cause wrong code to be generated is large, because we know it'll
> be an obscure and hard-to-find case.

Richard -

Thanks for looking at this and the advice.  I can understand the concern
about introducing a subtle bug for negligible benefit, although in my
case the benefit is somewhat more meaningful (~6k stack regained on a
single function).

Might you be able to provide any insight as to why you were able to
remove the other calls to mark_temp_addr_taken?  Was it indeed because
of improved alias analysis of MEM locations?  I wasn't, unfortunately,
able to derive this from the email traffic.  Here's the link again, in
case it's helpful:

  http://gcc.gnu.org/ml/gcc-patches/2001-11/msg00813.html

I did investigate the case you described, where two function parameters
are calls to the same function returning a structure.  The front-end
generates temporaries to handle this, and so the middle-end-generated
temporaries are still restricted to a lifetime of a single statement.
Here's the tree dump:

baz ()
{
  struct S D.1105;
  struct S D.1104;

:
  D.1104 = foo ();
  D.1105 = foo ();
  bar (D.1104, D.1105) [tail call];
  return;

}

Note that we will get 4 instances of struct S generated on the stack:
two for D.110[45], and two more for holding the return values from the
two calls to foo.

Thanks again, and I appreciate any additional context you might be able
to offer.

- Josh


Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?

2006-08-21 Thread Josh Conner
Richard Kenner wrote:

>> I did investigate the case you described, where two function parameters
>> are calls to the same function returning a structure.  The front-end
>> generates temporaries to handle this, and so the middle-end-generated
>> temporaries are still restricted to a lifetime of a single statement.

> Neverthless, it may be that gimplification renders this OBE.  If so,
> then I suspect a that *a lot* of the temporary lifetime tracking code is
> also no longer needed.  But I wouldn't want to jump to any of these
> conclusions too quickly: this stuff is very complex.

OK, thanks.  If you have any suggestions on other approaches to
verifying this, I'd certainly appreciate it.

- Josh



Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?

2006-08-22 Thread Josh Conner
Richard Guenther wrote:

>> OK, thanks.  If you have any suggestions on other approaches to 
>> verifying this, I'd certainly appreciate it.

> Other than testing on more targets, no.  Does it fix PR25505?  In
> this case it would be a fix for a regression and maybe rth can have a
> look at the patch as he should know this area equally well?

PR25505 reports that the stack usage of a particular function increased
from 512 bytes to 20k from 3.3 -> 4.x.  This patch reduces the stack
usage for that function back to ~13k.  So, it solves one aspect of the
regression.  I'll attach the patch I've been testing.

- Josh

Index: gcc/calls.c
===================================================================
--- gcc/calls.c (revision 116236)
+++ gcc/calls.c (working copy)
@@ -1987,7 +1987,6 @@ expand_call (tree exp, rtx target, int i
   we would have no way of knowing when to free it.  */
rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
 
-   mark_temp_addr_taken (d);
structure_value_addr = XEXP (d, 0);
target = 0;
  }


Re: Unnecessary call to mark_temp_addr_taken in calls.c (related to pr25505)?

2006-08-22 Thread Josh Conner
Richard Kenner wrote:

> I found the test case.  As I suspected, it was on Sparc (the test case failed
> for a different reason on Mips) and was indeed a case where there were two
> function calls in the same parameter list of an outer call (though not the
> same function).  It was an Ada test case, which we had no place to put
> back in 1999.  Here it is.

Thanks for tracking this down.  I would offer to add this test case to
the test suite as my penance, but I don't have the ability to run ada
tests on any of my machines.

- Josh


Re: Potential fix for rdar://4658012

2006-08-28 Thread Josh Conner
Richard Kenner wrote:

> I disagree.  Testing is not, and should never be, a substitute for analysis.
> 
> A patch is proposed because we have a reason to believe it's correct.  Then
> we test to increase our confidence that it is, indeed, correct.  But both
> parts are essential for any patch.
> 
> Here we have a situation where we'd like to see some optimization done (the
> merging of temporaries).  There's some code that suppresses that optimization
> in one case because that optimization was unsafe in that case.  It is not
> acceptable to find out whether that code is still needed by merely running
> a test suite: there has to be some analysis that says what changed to
> make that code no longer needed.  The burden of proof on that analysis has
> to be on the person who's proposing that the code is no longer needed.

Richard -

I agree with you -- my intent for originally posting was to hopefully
find that understanding of why the code had originally been added.
Perhaps I should have better explained my own understanding, so that you
could help identify if this was a conceptual mistake on my part.  I
think I had too much confidence that this would be an easy question to
answer, but it appears that the code is quite dated and not well
understood.  So, without further ado, let me provide my analysis, and
hopefully those who understand this code best can help point out if I've
gone astray.  Here is my interpretation of this code:

In the code for managing middle-end-generated temps, it appears that
push/pop_temp_slots is used to create nested contexts into which unique
temps are generated.  When I need a temp, I call push_temp_slots,
allocate a temp, perform my calculation, and call pop_temp_slots when
I'm done.

The one exception to this is if the address of the temp is taken before
I call pop_temp_slots.  In that instance, even though I may be "done"
with the temp, it needs to live until the end of the high-level-language
scope, and so it is marked with addr_taken and is pushed up in the temp
scopes using preserve_temp_slots.  It is this logic that is used for
function calls that return a value on the stack.

However, in the case where we're passing the address of a temp slot to a
function, it doesn't make sense to me that this is the same as other
"address-of" operations on a stack slot.  The function call (at least in
C and C++) cannot preserve this address, and it is reasonable to say
that attempts to access this address after the caller is done with the
location are invalid.  So, for these reasons, marking the temp for a
lifetime longer than what the middle-end needs in order to copy it
into the destination appeared to be excessive.

Hopefully this is enough of an explanation to reveal whether my
understanding is consistent or inconsistent with the experts, and can
move the discussion forward.

Thanks again for all of the time looking at this - it's very much
appreciated.

- Josh



Help running a SPARC/Ada test case?

2006-08-31 Thread Josh Conner
Is there anyone out there who might have a SPARC environment with Ada
support who could run the attached Ada testcase on a version of gcc
patched with the attached patch?  I'd like to verify whether the test
behaves correctly when compiled at -O0, -O1, and -O2.  The expected
(correct) behavior is the following output:

a_value= 10 b_value= 20

Some context of what I'm trying to accomplish can be seen by looking at
the thread starting here:

  http://gcc.gnu.org/ml/gcc/2006-08/msg00389.html


My utmost gratitude to anyone who can help!

- Josh
Index: gcc/calls.c
===================================================================
--- gcc/calls.c (revision 116236)
+++ gcc/calls.c (working copy)
@@ -1985,9 +1985,8 @@
/* For variable-sized objects, we must be called with a target
   specified.  If we were to allocate space on the stack here,
   we would have no way of knowing when to free it.  */
-   rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
+   rtx d = assign_temp (TREE_TYPE (exp), 0, 1, 1);
 
-   mark_temp_addr_taken (d);
structure_value_addr = XEXP (d, 0);
target = 0;
  }
with Ada.Text_Io;

procedure Test_Func is
   type A_Int is new Integer range 1..10;
   type B_Int is new Integer range 11..20;
   type A_Array is array(1..5) of A_Int;
   type B_Array is array(1..5) of B_Int;

   procedure Print_Values (A_Value : in A_Int;
                           B_Value : in B_Int) is
   begin
  Ada.Text_Io.Put_Line("a_value="  & Integer'Image(Integer(A_Value)) &
   " b_value=" & Integer'Image(Integer(B_Value)));
   end Print_Values;

   function Get_A return A_Array is
  A : A_Array := (others => 10);
   begin
  return A;
   end Get_A;

   function Get_B return B_Array is
  B : B_Array := (others => 20);
   begin
  return B;
   end Get_B;

   J : Natural := 3;
begin

   Print_Values
     (A_Value => Get_A(J), B_Value => Get_B(J));
end Test_Func;



Re: Help running a SPARC/Ada test case?

2006-09-01 Thread Josh Conner
Eric Botcazou wrote:
>> Is there anyone out there who might have a SPARC environment with Ada
>> support who could run the attached Ada testcase on a version of gcc
>> patched with the attached patch?  I'd like to verify whether the test
>> behaves correctly when compiled at -O0, -O1, and -O2.  The expected
>> (correct) behavior is the following output:
>>
>> a_value= 10 b_value= 20
> 
> This was a little vague so I used "gcc version 4.2.0 20060805" which is
> the most recent compiler with Ada support  I have on SPARC.  The
> testcase runs correctly at -O0, -O1 and -O2 when compiled with the
> patched compiler.  The size of the frame went down from 216 to 200 bytes
> at -O2.

Great - thanks.  I really appreciate your help.

- Josh


Re: Potential fix for rdar://4658012

2006-09-08 Thread Josh Conner
Sorry for my own slow response -- I've been doing more digging through
code, and didn't want to respond without a reasonable understanding...

Richard Kenner wrote:

>> However, in the case where we're passing the address of a temp slot to a
>> function, it doesn't make sense to me that this is the same as other
>> "address-of" operations on a stack slot.  The function call (at least in
>> C and C++) cannot preserve this address, and it is reasonable to say
>> that attempts to access this address after the caller is done with the
>> location are invalid.
> 
> Here's where I disagree.  The issue isn't what the front-ends (and especially
> not a *subset* of the front-ends) do, but what they *are allowed* to do.
> 
> Going back to 3.4 for a moment, the question is whether you could
> create a valid tree where the address would survive.  And I think you can.
> All it would take is a machine like Sparc that passes all aggregates by
> reference is to have a CALL_EXPR with two operands that are each CALL_EXPRs
> of aggregate types.

Yes, this makes perfect sense.  I can envision a tree where removing
this code would create an invalid reuse of the stack slot.  However, we
are doing two things to preserve this temp -- marking it with
"addr_taken" and creating it with the "keep" bit set.  The former says
"promote this temp one level up", and the latter "keep this temp around
for the lifetime of the current block".  This still seems redundant.  I
think marking the temp as addr_taken is sufficient to handle the case
where the address is used as the input to a function call (see attached
patch).

Note that since Jason's recent change to the frontend here:

  http://gcc.gnu.org/ml/gcc-patches/2006-09/msg00202.html

This change would be sufficient to address the remaining issues in
pr25505.  I'd like to hear your opinion of this, though, before
submitting it for approval.

> We never had a precise specification of what trees were valid then (which
> is precisely the set of valid GENERIC, and hence is similarly not very
> precisely defined), but I think almost everybody working on the 3.4 compiler
> would say that the above is valid whether or not C or C++ could generate it.
> Therefore, the middle-end had to properly support it.
> 
> However, in GCC 4, with tree-ssa, the RTL expanders receive a much smaller
> subset of trees, namely those defined in GIMPLE.  I *believe*, but aren't
> sure, that the above is not valid GIMPLE.
> 
> That would make the change safe.  But:
> 
> (1) The code we generate is pretty inefficient in the current case, since we
> copy the entire aggregate.  If we try to fix that, we *will* be preserving
> the address for later, though perhaps in a temporary more properly allocated.
> 
> (2) I suspect that we can rip out much more of this code than just this line
> and I'd prefer to do it that way.
> 
>> Hopefully this is enough of an explanation to reveal whether my
>> understanding is consistent or inconsistent with the experts, and can
>> move the discussion forward.
> 
> As I said once before, this code didn't change between 3.4 and 4.x, so
> something else is the cause of the regression.  I'd like to understand
> what that was.  I think you posted the changes you propose for the C
> and C++ front ends, but I don't remember what they were.

To help address your concerns about fixing the regression closest to its
point of change, I've also been looking into how this was handled in
3.4.  In that version of the compiler, a TARGET_EXPR was surviving all
the way until the expander, where the function expand_expr_real was
recognizing TARGET_EXPRs and generating a reusable temp.  Unfortunately,
I don't see an obvious correlation in 4.x, as the TARGET_EXPR never
reaches the expander, since gimplification has already converted it into
a CALL_EXPR.

Thanks again for taking the time to look into this.

- Josh


Index: gcc/calls.c
===================================================================
--- gcc/calls.c (revision 116642)
+++ gcc/calls.c (working copy)
@@ -1985,7 +1985,7 @@ expand_call (tree exp, rtx target, int i
/* For variable-sized objects, we must be called with a target
   specified.  If we were to allocate space on the stack here,
   we would have no way of knowing when to free it.  */
-   rtx d = assign_temp (TREE_TYPE (exp), 1, 1, 1);
+   rtx d = assign_temp (TREE_TYPE (exp), 0, 1, 1);
 
mark_temp_addr_taken (d);
structure_value_addr = XEXP (d, 0);