Re: Char shifts promoted to int. Why?

2006-12-17 Thread Paul Schlie
Chris Lattner wrote:
> On Dec 17, 2006, at 12:40 PM, Rask Engelmann Lamberts wrote:
>> I seem unable to get a QImode shift instruction from this code:
>>
>> unsigned char x;
>>
>> void qishifttest2 (unsigned int c)
>> {
>>x <<= c;
>> }
>> 
>> should have been generated. Also, notice the redundant zero extension.
>> Why are we not generating a QImode shift instruction?
>
> Consider when c = 16. With the (required) integer promotion, the result
> is defined (the result is zero). If converted to QImode, the shift would
> be undefined, because the (dynamic) shift amount would be larger than the
> data type.

??? A left shift >= the precision of its shifted unsigned operand can only
logically result in a value of 0 regardless of its potential promotion.

Although integer promotion as specified by C may technically be performed
lazily as a function of the implicit target precision required for a given
operation, GCC tends to initially promote everything and then attempt to
determine if an operation's precision may be subsequently lowered after
having already lost critical knowledge of its originally specified operand's
precision.

Thereby although many operands tend to remain unnecessarily promoted, this
is often benign on larger machines with int-sized or wider registers (the
focus of GCC development efforts), where char -> int promotion is effectively
free; it can however be very expensive on smaller machines (which tend to
receive less development attention).
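
As a concrete illustration of what the required promotion means for the
example above (a minimal sketch; the cast chain simply spells out the C
semantics of the original assignment, assuming int is wider than char):

  unsigned char x;

  void qishifttest2 (unsigned int c)
  {
      /* per the integer promotions, "x <<= c" is equivalent to: */
      x = (unsigned char)((int)x << c);   /* shift performed at int width */
  }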




Re: Char shifts promoted to int. Why?

2006-12-18 Thread Paul Schlie
> From: Paul Brook <[EMAIL PROTECTED]>
> On Monday 18 December 2006 01:15, Paul Schlie wrote:
>> Chris Lattner wrote:
>>> On Dec 17, 2006, at 12:40 PM, Rask Engelmann Lamberts wrote:
>>>> I seem unable to get a QImode shift instruction from this code:
>>>> 
>>>> unsigned char x;
>>>> 
>>>> void qishifttest2 (unsigned int c)
>>>> {
>>>>x <<= c;
>>>> }
>>>> 
>>>> should have been generated. Also, notice the redundant zero extension.
>>>> Why are we not generating a QImode shift instruction?
>>> 
>>> Consider when c = 16. With the (required) integer promotion, the result
>>> is defined (the result is zero). If converted to QImode, the shift would
>>> be undefined, because the (dynamic) shift amount would be larger than the
>>> data type.
>> 
>> ??? A left shift >= the precision of its shifted unsigned operand can only
>> logically result in a value of 0 regardless of its potential promotion.
> 
> Shifting >= the size of the value being shifted can and do give nonzero
> results on common hardware. Typically hardware will truncate the shift count.
> eg. x << 8 implemented with a QImode shift will give x, not 0.
> 
>> Although integer promotion as specified by C may technically be performed
>> lazily as a function of the implicit target precision required for a given
>> operation, GCC tends to initially promote everything and then attempt to
>> determine if an operation's precision may be subsequently lowered after
>> having already lost critical knowledge of its originally specified
>> operand's precision.
> 
> No. You're confusing some language you just invented with C.
> 
> The operand of the shift operator is of type unsigned int.
> "x <<= c" is exactly the same as "((int)x) << c"
> It doesn't matter whether the promotion is explicit or implicit, the semantics
> are the same.

((char)x) = ((char)( ((int)((char)x)) << ((int)c) ) ) ::
((char)x) = ((char)(   ((char)x)  << ((int)c) ) )

if the shift count ((int)c) is semantically preserved.

thereby conditionally shifting left ((char)x) by ((int)c) if c is less than
the smaller of its shifted operand's or target's precision (both being char
in this instance), or otherwise returning 0, is semantically equivalent and
typically more efficient on smaller lightly pipelined machines, without
needing to literally promote the shifted operand to int width.

(I believe)
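
A minimal C sketch of that conditional narrowing (illustrative only; 8
stands in for the precision of char, and a 32-bit int is assumed, so the
rewrite matches the promoted form for every shift count for which the
promoted form is defined):

  unsigned char x;

  void qishift_narrowed (unsigned int c)
  {
      /* when c < 8 a QImode shift suffices; for any larger count the
         promoted result truncated back to char is 0 anyway */
      x = (c < 8) ? (unsigned char)(x << c) : 0;
  }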




Re: Char shifts promoted to int. Why?

2006-12-19 Thread Paul Schlie
> Dorit Nuzman wrote:
>> Paul Schlie wrote:
>> ((char)x) = ((char)( ((int)((char)x)) << ((int)c) ) ) ::
>> ((char)x) = ((char)(   ((char)x)  << ((int)c) ) )
>>
>> if the shift count ((int)c) is semantically preserved.
>>
>> thereby conditionally shifting left ((char)x) by ((int)c) if c
>> is less than the smaller of it's shifted operand's or target's
>> precision (both being char in this instance) or otherwise
>> returning 0; is semantically equivalent and typically more
>> efficient on smaller lightly pipelined machines without
>> needing to literally promote the shifted operand to int width.
>
> Something along these lines may be useful to do in the vectorizer
> when we get code like this:
>  > ((char)x) = ((char)( ((int)((char)x)) << ((int)c) ) )
> and don't feel like doing all the unpacking of chars to ints and
> then packing the ints to chars after the shift. An alternative could
> be to transform the above pattern to:
>  char_x1 = 0
>  char_x2 = char_x << c
>  char_x = ((int)c < size_of_char) ? char_x2 : char_x1
> and vectorize that (since we already know how to vectorize selects).

Seems reasonable to me; and it might analogously be generalized further,
allowing vectorizable operands specified as being wider than the
expression's target precision requirement to themselves be reduced in
precision. i.e. (presuming unsigned char):

(char)x = (unsigned)y + (unsigned)( ((int)z)<<((unsigned)n) );

=>

(char)x = (char)y + ((n < 8) ? ((char)z << n) : 0);
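
For concreteness, a loop of the shape the select-based rewrite targets
might look like the following (a sketch only; the function and the fixed
width of 8 are illustrative assumptions, not code from the thread):

  void shift_bytes (unsigned char *x, int n, unsigned int c)
  {
      for (int i = 0; i < n; i++)
          /* select form of the promoted shift: both the byte shift and the
             compare are vectorizable without unpacking chars to ints */
          x[i] = (c < 8) ? (unsigned char)(x[i] << c) : 0;
  }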

Re: GCC optimizes integer overflow: bug or feature?

2006-12-19 Thread Paul Schlie
Various folks wrote:
>> Compiler can optimize it any way it wants,
>> as long as result is the same as unoptimized one.
>
> We have an option for that. It's called -O0.
>
> Pretty much all optimization will change the behavior of your program.

 Now that's a bit TOO strong a statement, critical optimizations like
 register allocation and instruction scheduling will generally not change
 the behavior of the program (though the basic decision to put something
 in a register will, and *surely* no one suggests avoiding this critical
 optimization).
 
>>> Actually they will with multi threaded program, since you can have a case
>>> where it works and now it is broken because one thread has speed up so much
>>> it writes to a variable which had a copy on another thread's stack.
>>>
>> Why isn't that just a buggy program with wilful disregard for the use of
>> correct synchronisation techniques?
>
> It is that, as well as a program that features a result that is different
> from unoptimized code.

As a compromise, I'd vote that no optimization may alter program behavior
in any way that is not explicitly diagnosed in each instance of its
application.

(That way there's less likelihood of "surprises", and more likelihood either
of productive correction of the code, or of feedback encouraging the
refinement or removal of optimizations that prove more counterproductive
than useful, before they introduce bugs into programs previously verified
as behaving as intended, even if those programs rely on behavior not
warranted by the language's spec.)
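
A minimal example of the kind of behavior change being debated (a sketch;
whether the comparison is actually folded depends on the GCC version and on
options such as -fwrapv):

  #include <limits.h>
  #include <stdio.h>

  /* signed overflow is undefined, so an optimizer may assume i + 1 > i
     always holds and fold this to return 1; unoptimized code on wrapping
     hardware typically returns 0 when i == INT_MAX */
  int still_greater (int i)
  {
      return i + 1 > i;
  }

  int main (void)
  {
      printf ("%d\n", still_greater (INT_MAX));
      return 0;
  }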





Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-29 Thread Paul Schlie
> Richard Guenther wrote:
>> Robert Dewar wrote:
>>> Daniel Berlin wrote:
>>> I'm sure no matter what argument i come up with, you'll just explain it
>>> away.  The reality is the majority of our users seem to care more about
>>> whether they have to write "typename" in front of certain declarations
>>> than they do about signed integer overflow.
>>
>> I have no idea how you know this, to me ten reports seems a lot for
>> something like this.
>
> Not compared to the number of type-based aliasing "bugs" reported.

- as aliasing optimizations are typically more subtle, it's understandable
  that these continue to be reported.

- overflow optimizations, however, are more readily recognized as stemming
  from GCC's arguably somewhat notoriously overzealous leveraging of this
  form of "undefined behavior" at higher levels of optimization, regardless
  of the factual behavior of the target machine; so it's understandable that
  after a while folks simply stop reporting these as "bugs" (especially as
  this "optimization" has historically been so vocally defended by a few as
  being proper, regardless of the arguably reasonable expectation that it
  not be included by default in any generically specified optimization
  level, to minimize "surprise").
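
For comparison, the classic shape of a type-based aliasing report looks
roughly like this (a sketch; whether it misbehaves depends on
-fstrict-aliasing, the optimization level, the GCC version, and the
assumption of 32-bit unsigned int and IEEE float):

  #include <stdio.h>

  /* punning a float through an int pointer violates the effective-type
     rules, so under strict aliasing the store through i may be reordered
     or dropped relative to the later load of f */
  float negate_by_punning (float f)
  {
      unsigned int *i = (unsigned int *)&f;
      *i ^= 0x80000000u;        /* flip the IEEE sign bit in place */
      return f;
  }

  int main (void)
  {
      printf ("%f\n", negate_by_punning (1.0f));
      return 0;
  }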




Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Paul Schlie
Upon attempted careful reading of the standard's excerpts quoted by
Gabriel Dos Reis per <http://gcc.gnu.org/ml/gcc/2006-12/msg00763.html>,
it's not clear that GCC's current presumption of LIA-1 overflow semantics
in the absence of their true support is actually advocated by the standard.

As by my read, it seems fairly clear that "If an implementation adds
support for the LIA-1 exception values ... then those types are LIA-1
conformant types"; implies to me an intent that LIA-1 semantics may be
legitimately presumed "if" the semantics are "supported" by a target
implementation (just as null pointer optimizations should not be
considered legitimate if not correspondingly literally supported by
a given target).

Which makes sense: if a target factually supports LIA-1 overflow trapping,
then a compiler may "safely" presume that behavior, and thereby leverage it
knowing that the target's runtime semantics are preserved; just as a
compiler may "safely" presume wrapping or other overflow semantics, for the
purpose of optimization, on targets which factually "support" those
semantics. All of which is legitimate, as the standard leaves signed
integer overflow semantics undefined, and thereby gives implementations the
liberty to augment the language by defining what the standard otherwise
leaves undefined.

However GCC's current predisposition to presume, in the name of
optimization, semantics which are known to differ from a target's factual
behavior is likely beyond what the standard intended to productively enable
(although arguably perversely legitimate); and it should be reconsidered,
as an optimization which risks altering a program's expressed behavior is
rarely if ever desirable (whereas diagnosing behaviors which can't be
strictly portably relied upon almost always is).




Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2006-12-31 Thread Paul Schlie
> Robert wrote:
>> Paul Schlie wrote:
>> Upon attempted careful reading of the standard's excerpts quoted by
>> Gabriel Dos Reis per <http://gcc.gnu.org/ml/gcc/2006-12/msg00763.html>,
>> it's not clear that GCC's current presumption of LIA-1 overflow semantics
>> in the absence of their true support is actually advocated by the standard.
>> 
>> As by my read, it seems fairly clear that "If an implementation adds
>> support for the LIA-1 exception values ... then those types are LIA-1
>> conformant types";
> 
> You are reaching here, based on your peculiar notion of the relevance
> of behavior of some target instructions to language semantics. But
> there is no such relation. The C standard creates a semantic model
> that is entirely independent of the target architecture with respect
> to the overflow issue. The behavior of instructions on the machine
> has nothing to do with what a compiler must implement.

- Yes, it's a stretch; and agreed, there's no intrinsic relationship
  between a language's semantics and a particular target machine's
  instruction set. But the compiler in effect defines one, which arguably
  should be consistently preserved, as leveraging behaviors factually not
  present in the compiler's originally chosen mappings may, in the worst
  case, silently alter the originally expressed behavior of the resulting
  program, to likely no one's benefit, presuming the original behavior as
  defined by the compiler was in fact the one desired (although both are
  considered legitimate).

>> implies to me an intent that LIA-1 semantics may be
>> legitimately presumed "if" the semantics are "supported" by a target
>> implementation
> 
> It may (and apparently does) imply this to you, but there is
> absolutely no implication of this in the standard. if the standard
> wanted to say this, it would (although it would be VERY difficult
> to state this in meaningful normative form).

- Correspondingly, if the standard didn't intend to clarify LIA-1
  conforming support within the scope of undefined signed overflow
  semantics, it need not have "said" anything; it thereby seemingly
  considered an implementation's optional "support" of LIA-1 worth noting,
  and predicated on that "support" (which most targets simply inherently
  lack).

>> (just as null pointer optimizations should not be
>> considered legitimate if not correspondingly literally supported by
>> a given target).
> 
> There is no such implication in the standard. either

- Only by analogy to above.

> You persist in this strange notion of "factual support" of the
> "target", but there is nothing to support this notion in either
> the standard
> 
> There is reasonable grounds for arguing for limiting the
> effects of this particular undefined optimization, but you
> can not find any support for this in the standard itself
> at all.

- Agreed.




Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-01 Thread Paul Schlie
> Ian Lance Taylor wrote:
> ...
> I don't personally see that as the question.  This code is
> undefined, and, therefore, is in some sense not C.  If we take
> any other attitude, then we will be defining and supporting
> a different language.  I think that relatively few people want
> the language "C plus signed integers wrap", which is the language
> we support with the -fwrapv option.
> ...

No, all such code is perfectly legal C, specified to have undefined
semantics in the instance of signed overflow; which, as seems clear from
the excerpts noted by Gabriel, may be specified by an implementation to
assume whatever behavior is desired.

Thereby full liberty is given to the implementers of the compiler to
apply whatever semantics are deemed most desirable, regardless of their
practical utility or historical compatibility.  Ultimately the issue is
to what degree optimizations should preserve the semantics otherwise
expressed and/or historically expected in their absence; and how, when
deemed desirable, they should be invoked (i.e. by named exception, or by
default at -Ox).




Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Paul Schlie
> Robert Dewar wrote:
> ...
> I think it is a bad idea for the optimization levels to deal with
> anything other than optimization. -fwrapv is not about optimization,
> it is about changing the language semantics.
> 
> So this proposal would be tantamount to implementing a different
> language at -O1 and -O2, and having -O2 change the formal
> semantic interpretation of the program. That seems a very
> bad idea to me.
> ...

Yes, it would be laudable for GCC to adopt the principle that whatever
language semantics are chosen in the absence of optimization should be
preserved through -O2 by default, although they may be explicitly
overridden by the user as desired.

Further, as this may be target specific: for target machine implementations
which inherently support trapping on overflow (or on null pointer
dereference), GCC may correspondingly enable at -O2 optimizations presuming
the same; but, on principle, not for targets on which GCC does not factually
support those semantics.




Re: changing "configure" to default to "gcc -g -O2 -fwrapv ..."

2007-01-02 Thread Paul Schlie
> Richard Kenner wrote:
> ...
> In other words, -fwrapv says that we are modifying the language semantics
> to define signed overflows as wrapping and this will have effects on the
> optimizer (so the language effect is primary), while -fno-strict-aliasing
> says what the optimizer will do and hence how we're modifying the language
> (meaning the optimizer effect is primary).
> ...

> Robert Dewar wrote:
> ...
> Note Paul that I think you may be confusing what I say, when I talk
> about language semantics, I am talking about the formal semantics of
> the language, which may well have non-deterministic, undefined, or
> implementation defined elements. I am NOT talking about preserving
> observed behavior. You have rather consistently confused the two.
> ...
> No, that's completely wrong, overflow is undefined in the semantics, so
> preserving undefined semantics has nothing whatever with preserving
> observed behavior for these programs with undefined semantics. Here
> is exactly where you are confusing behavior with semantics.
> ...

Any choice/presumption of a signed overflow semantics alters the language's
specification (including the presumption of its absence), as its semantics
are undefined.

Given that the controversy surrounding the issue seems rooted in the fact
that some optimizations enabled by default at -O1/-O2 modify the expressed
semantics in ways that have been shown to be counterproductive (regardless
of their strictly formal validity), the issue is whether or not such
optimizations should be enabled by default at these generically specified
optimization levels (for example, first in effect supporting wrapping
semantics in the absence of optimization, then presuming their absence
during optimization).

In other words, although the standard allows latitude in these choices,
once an implementation has chosen a set of semantics by way of defining
a mapping to a target's instruction set, should the expressed semantics
be preserved?  Personally I believe the answer is clearly yes, within
reason, and that they should be modified only by explicit request through
-O2. (As otherwise the optimizations are in effect modifying the semantics
of the chosen implementation, which, although beneficial in some
circumstances, may be clearly counterproductive in others; although all
being strictly legitimate.)

As strict-aliasing presumptions may in fact alter the expressed semantics,
this too should arguably be enabled only by explicit request through -O2,
to be consistent with this philosophy, if adopted.

I appreciate that my views/presumptions may be so inconsistent with the
status quo that they are untenable; and as I feel I've expressed my thoughts
as best I can, I'll leave the issue to rest and hopeful resolution to those
who know better.




Re: Signed int overflow behavior in the security context

2007-01-26 Thread Paul Schlie
> Robert Dewar wrote:
>
> People always say this, but they don't really realize what they are
> saying. This would mean you could not put variables in registers, and
> would essentially totally disable optimization.

- can you provide an example of a single threaded program where the
assignment of a variable to a machine register validly changes its
observable logical results?

> The -O2 flag is exactly a request to do optimizations that may cause
> wrong programs to generate different results.

- well, this is certainly an interesting definition of -O2, and an implicit
definition of any program which invokes an undefined behavior as being
"wrong", as opposed to, arguably more accurately, non-portable; as the
standard enables compilers to give a well defined behavior to that which it
otherwise leaves undefined. (Nor does it seem particularly clever, as
intentionally invoking a behavior not previously expressed seems like a
great way to silently inject bugs into a program debugged utilizing lesser
degrees of optimization, as is typically done.)




Re: Signed int overflow behavior in the security context

2007-01-26 Thread Paul Schlie
>> On Fri, Jan 26, 2007 at 06:57:43PM -0500, Paul Schlie wrote:
>> > Robert Dewar wrote:
>> >
>> > People always say this, but they don't really realize what they are
>> > saying. This would mean you could not put variables in registers, and
>> > would essentially totally disable optimization.
>> 
>> - can you provide an example of a single threaded program where the
>> assignment of a variable to a machine register validly changes its
>> observable logical results?
>
> If the program has a hash table that stores pointers to objects, and
> the hash function depends on pointer addresses, then the choice to
> allocate some objects in registers rather than in stack frames will change
> the addresses. If the algorithm depends on the order of hash traversal,
> then -O2 will change its behavior.

- if the compiler chooses to alias an object's logical storage location
utilizing a register, and that object's logical address is well specified
by a pointer whose value is itself subsequently utilized, this shouldn't
have any logical effect on that pointer's value, as it's the responsibility
of the compiler to preserve the semantics specified by the program.

(however as you appear to be describing an algorithm attempting to rely on
the implicit addresses of object storage locations resulting from an assumed
calling or allocation convention, and as such assumptions are well beyond
the scope of most typical language specifications; it's not clear that such
an algorithm should ever be presumed to reliably work regardless of any
applied optimizations?)

> Likewise, if the program has an uninitialized variable, the behavior
> will differ depending on details of optimization and how variables are
> assigned to memory.  Heap allocated for the first time might always be
> zero (as the OS delivers it that way), turning on optimization might then
> result in a nonzero initial value because of reuse of a register.

- I would argue that in this circumstance, although the resulting value may
differ, the results are actually equivalent; as in both circumstances the
value returned is the value associated with its storage location; and as
the values of all storage locations are arbitrary unless otherwise well
specified, the result may change from run to run regardless of any applied
optimizations.
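
A sketch of the uninitialized-variable scenario described above
(illustrative only; what is printed, if anything, depends on the platform,
the optimization level, and whatever happens to occupy the chosen stack
slot or register):

  #include <stdio.h>

  int main (void)
  {
      int x;            /* never initialized: reading it is undefined */
      /* at -O0, x typically lands in a stack slot that may happen to be
         zero; at -O2 it may live in a register holding leftover data, so
         the observed value can change with the optimization level (or
         from run to run) */
      printf ("%d\n", x);
      return 0;
  }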




Re: Signed int overflow behavior in the security context

2007-01-26 Thread Paul Schlie
With hindsight, it would appear that many of these difficulties are rooted
in the historical implicit declaration/conversion of variables and
parameters to signed int, which has correspondingly tended to be implemented
with wrapping semantics regardless of overflow being undefined; and the
greatest interest in optimization typically focuses on loops, which have
historically utilized default int indexes while most typically never
intending to rely on negative or modular wrapping semantics. So rather than
attempting to argue further about whether ints should wrap or not, I wonder
whether both signed and unsigned should simply be assumed to wrap by
default; with a flag added to enable the specific optimization of loop index
variables, such that either or both signed and unsigned indexes are assumed
to never overflow or modulo wrap; and with the explicit assertion of value
ranges within code enabled by a flag to interpret (or ignore) assert
expressions as value range assertions, as previously proposed by others on
a few past occasions.

Thereby most code will simply work based on historical assumptions; loops
may be explicitly enabled to be further optimized in bulk by assuming
non-wrapping index semantics; and/or explicit assertions may be added to
more finely control the assumptions otherwise desired, with knowledge of
the program.
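
A sketch of how the proposed assert-as-value-range flag might be used
(hypothetical: no such GCC flag exists; under the proposal the assert below
would also tell the optimizer that n, and hence i, never overflows or
wraps):

  #include <assert.h>

  void scale (int *a, int n)
  {
      /* a runtime check today; under the proposed flag, also a value range
         assertion the optimizer may rely on when transforming the loop */
      assert (n >= 0 && n < 1024);

      for (int i = 0; i < n; i++)
          a[i] *= 2;
  }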





Re: Signed int overflow behavior in the security context

2007-01-26 Thread Paul Schlie
> David Daney wrote:
>> Paul Schlie wrote:
>> (however as you appear to be describing an algorithm attempting to rely on
>> the implicit addresses of object storage locations resulting from an assumed
>> calling or allocation convention; and as such assumptions are well beyond
>> the scope of most typical language specifications; it's not clear that such
>> an algorithm should ever be presumed to reliably work regardless of any
>> applied optimizations?)
>
> Isn't that the gist of the entire overflow wraps issue? Signed overflow is
> undefined in C and always has been. It' not clear that any program that relies
> on it should ever be presumed to reliably work regardless of any applied
> optimization.

Almost: if implementations had historically initialized variable memory
to some value, and programmers had correspondingly relied on that behavior
although the language specifies it as undefined, then I would
correspondingly argue that although the compiler may assign a variable
to a register, it should initialize that register with the value
historically expected, as if that value were stored in memory.

However, as compilers and machines have not historically initialized memory
in this way, there is no historical semantic to preserve, as there is with
signed integer overflow semantics, which have been well understood to wrap
on most all implementations regardless of the language's non-guarantee of
this behavior, for good or bad.

Personally, although I prefer wrapping semantics, I think consistency in the
presence of optimization is more important, and would thereby prefer an
implementation which enforces trap-on-overflow (hopefully recoverably so, as
forcibly terminating a program in the presence of an error, without any
means to transparently recover, is about the worst behavior imaginable), in
lieu of a schizophrenic implementation which changes the language's
observable semantics during optimization.




Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Richard Guenther wrote:
> On 1/27/07, Paul Schlie <[EMAIL PROTECTED]> wrote:
>>>> On Fri, Jan 26, 2007 at 06:57:43PM -0500, Paul Schlie wrote:
>>> Likewise, if the program has an uninitialized variable, the behavior
>>> will differ depending on details of optimization and how variables are
>>> assigned to memory.  Heap allocated for the first time might always be
>>> zero (as the OS delivers it that way), turning on optimization might then
>>> result in a nonzero initial value because of reuse of a register.
>> 
>> - I would argue that in this circumstance although the resulting value may
>> differ, the results are actually equivalent; as in both circumstances the
>> value returned is the value associated with its storage location; and as
>> the value of all storage locations are arbitrary unless otherwise well
>> specified; the result may change from run to run regardless of any applied
>> optimizations.
> 
> If you read from an uninitialized variable twice you might as well get
> a different result each time.  This is exactly the same issue than with
> signed overflow and observable behavior - though as somebody notes
> later - the uninitialized variable case doesn't stir up too many peoples
> mind.

-  However x ^= x :: 0, for example, is well defined, because absent any
intervening assignments, all references to x must semantically yield the
same value, regardless of what that value may be. (But this does not require
that its associated storage location be literally referenced multiple times
unless it is correspondingly declared volatile; as in effect variables
declared volatile have no guarantee of retaining the value most recently
logically assigned within a single thread of execution, and thereby are in
effect bound to a storage location known to perpetually yield indeterminate
values upon each distinct logically specified reference; but even here the
semantics of the language require that each reference be literally
performed, and not be assumed to yield any particular value, although the
value is undefined.)
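
A sketch of the distinction being drawn (illustrative only; the claim that
the non-volatile case must yield 0 is exactly what is disputed in the rest
of this thread):

  volatile int status_reg;  /* e.g. a memory-mapped register: every read
                               must actually be performed and may yield a
                               new value */

  void example (void)
  {
      int a = status_reg;
      int b = status_reg;   /* a and b may legitimately differ */

      int x;                /* automatic, never initialized: indeterminate */
      x ^= x;               /* claimed above to yield 0, since both reads
                               name the same non-volatile object with no
                               intervening assignment */
      (void) a; (void) b; (void) x;
  }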




Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Robert Dewar wrote
>> Paul Schlie wrote:
>> -  However x ^= x :: 0 for example is well defined because absent any
>> intervening assignments, all references to x must semantically yield the
>> same value, regardless of what that value may be.
> 
> Nope, there is no such requirement in the standard. Undefined means
> undefined. Again you are confusing the language C defined in the C
> standard with some ill-defined language in your mind with different
> semantics. Furthermore, it is quite easy to see how in practice you
> might get different results on successive accesses.

I'm game; how might multiple specified references to the same non-volatile
variable, with no specified intervening assignments, in a single threaded
language, ever justifiably be interpreted to validly yield differing values?

(any logically consistent concrete example, absent reliance on undefined
hand-waving, would be greatly appreciated; as any such interpretation or
implementation would seem clearly logically inconsistent and thereby
useless; for although the value of a variable may be undefined, variable
reference semantics are well defined and independent of its value)




Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Brooks Moses wrote:
>  ...
> Does that logic work for you?

no, as although a variable's value may not have been previously defined
within the context of a particular program, a variable's access semantics
are orthogonal to whatever value may result from that variable's access;
and thereby although its resulting value may be indeterminate, successive
logical references must sensibly be presumed to yield equivalent values,
barring the variable being declared as being volatile, or having an
intervening assignment.

(there seems to be too much desire to arbitrarily justify anything at the
drop of a hat given an opportunity to associate it directly or indirectly
with an undefined behavior, regardless of its sensibility; as opposed to
recognizing an undefined behavior as an opportunity to define useful,
logically consistent semantics in its absence, although potentially
not strictly portable between implementations)




Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Robert Dewar wrote:
>> Paul Schlie wrote:
>>> Brooks Moses wrote:
>>> <http://gcc.gnu.org/ml/gcc/2007-01/msg01119.html> ...
>>> Does that logic work for you?
>> 
>> no, as although a variable's value may not have been previously defined
>> within the context of a particular program, a variable's access semantics
>> are orthogonal to what ever value may result from that variable's access;
>> and thereby although it's resulting value may be indeterminate, successive
>> logical references must sensibly be presumed to yield equivalent values
>> baring the variable being declared as being volatile, or having an
>> intervening assignment.
> 
> Paul, you really really need to read the C standard. It's beginning
> to sound as though you haven't, since you keep making things up that
> are just not justified by the standard.
>> 
>> (there seems to be too much desire to arbitrarily justify anything at the
>> drop of a hat given an opportunity to associate it directly or indirectly
>> with an undefined behavior, regardless of its sensibly
> 
> There is no question about what the C standard makes undefined here.
> That of course does not preclude giving it defined semantics in a
> particular language, thus effectively extending the language, but the
> C standard committee in making something undefined is very deliberately
> deciding that it is more appropriate than making it implementation
> defined.
> 
> In the case of access to uninitialized variables in C, no one (or
> almost no one, I realize you are an exception), would favor trying
> to make this implementation defined. The only way you will get this
> is by creating a variant of gcc for your own use yourself :-)
> 
> Now a special mode for debugging which does initialize all
> uninitialized variables to a specified value, a la GNAT's
> pragma Initialize_Scalars, could be useful. But anyone relying
> on Paul Schlie semantics for uninitialized variables is writing
> rubbish instead of C!

[ISO/IEC 14882-2003] Section 8.5, paragraph 9 says: "... if no initializer
is specified for a nonstatic object, the object and its subobjects, if any,
have an indeterminate initial value"

Thereby it seems fairly clear to me that although an uninitialized variable
has an indeterminate INITIAL value, all subsequent accesses to that variable
will yield that INITIAL value until otherwise programmatically modified.

Thus x ^= x is correspondingly well defined if non-volatile, as originally
asserted.





Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Paul Schlie wrote:
>>> Brooks Moses wrote:
>>> <http://gcc.gnu.org/ml/gcc/2007-01/msg01119.html> ...
>>> Does that logic work for you?
>> 
>> no, as although a variable's value may not have been previously defined
>> within the context of a particular program, a variable's access semantics
>> are orthogonal to what ever value may result from that variable's access;
>
> No; it's not just the value that is undefined; it's the
> behavior of code attempting to use that value that is
> undefined.  Aborting the program is quite conforming if
> your program uses "the value" of an uninitialized int,
> for example.  I write "the value" in quotes because the
> variable does not *have* a value until one is assigned
> to it.  The fact that storage allocated for the variable
> holds some bit pattern shouldn't be confused with that
> variable having a value; so long as the variable has not
> been given a value, the compiler might read it from
> anywhere or nowhere, and has no obligation to be
> consistent.  I've seen no justification for any claim
> that there is an obligation on the compiler to produce
> consistent values in this situation; the C standard, on
> the other hand, states quite clearly that code *cannot*
> rely on any such thing.

please see: <http://gcc.gnu.org/ml/gcc/2007-01/msg01124.html>

clearly clarifying that uninitialized variables have an INITIAL value;
and thereby, regardless of that value, x ^= x is correspondingly
well defined, as the language clearly defines the semantics of the
expression independently of its variable arguments' values, whether they
may be a priori determinate or otherwise.

Your corresponding supporting standard citation to the contrary?




Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie <[EMAIL PROTECTED]> wrote:
>> your corresponding supporting standard citation to the contrary?
> 
> C99 3.17.2 defines "indeterminate value" as "either an unspecified
> value or a trap representation".  6.2.6.1p5 says of trap
> representations: "If the stored value of an object has such a
> representation and is read by an lvalue expression that does not have
> character type, the behavior is undefined."  Possibly other parts of
> the standard also also make it undefined behavior to access an
> uninitialized character-type variable; I haven't looked too closely.

- an lvalue expression is the target expression of an assignment, and
thereby if it has an indeterminate value it's sensible that its
semantics are undefined, as in effect it specifies a value being
assigned to an indeterminate storage location (which obviously isn't
a good or determinate thing to do; but this has no bearing on an rvalue
access to a well defined storage location returning an indeterminate
value). i.e. lvalue = rvalue :: known_location = indeterminate_value;
or in the case of x ^= x :: known_location = x_value ^ x_value :: 0.





Re: Signed int overflow behavior in the security context

2007-01-27 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie <[EMAIL PROTECTED]> wrote:
>> your corresponding supporting standard citation to the contrary?
> 
> C99 3.17.2 defines "indeterminate value" as "either an unspecified
> value or a trap representation".  6.2.6.1p5 says of trap
> representations: "If the stored value of an object has such a
> representation and is read by an lvalue expression that does not have
> character type, the behavior is undefined."  Possibly other parts of
> the standard also also make it undefined behavior to access an
> uninitialized character-type variable; I haven't looked too closely.

- an lvalue expression is the target expression of an assignment, and
thereby if it has an indeterminate value it's sensible that its
semantics are undefined, as in effect it specifies a value being
assigned to an indeterminate storage location (which obviously isn't
a good or determinate thing to do; but this has no bearing on an rvalue
access to a well defined storage location returning an indeterminate
value). i.e. lvalue = rvalue :: known_location = indeterminate_value;
or in the case of x ^= x :: known_location = x_value ^ x_value :: 0.

(sorry; or more generally, an lvalue expression represents a storage
location designation, either source or target, which if indeterminate
doesn't arguably have a reasonably well specified behavior; which isn't
the case with x ^= x, where all storage location designations are well
defined)




Re: Signed int overflow behavior in the security context

2007-01-28 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie wrote:
>> if it has an indeterminate value [...] has no bearing on an rvalue
>> access to a well defined storage location
> 
> You might think so, but that's actually not true in the C standard's
> terminology.  It sounds like you interpret "indeterminate value" to
> mean what the standard defines as "unspecified value" (3.17.3): "valid
> value of the relevant type where this International Standard imposes
> no requirements on which value is chosen in any instance".  But
> "indeterminate value" is defined differently (3.17.2), and any
> reasoning based on your common-sense understanding of the term,
> instead of the standard's definition of it, has no relevance to the
> standard.  The standard is not intuitive; it can only be understood by
> careful reading.
> 
> The key concept that you seem to be missing is trap representations.
> See 6.2.6.1p5, also keeping in mind that "lvalue", as used in the
> standard, probably means something slightly different from what you
> might expect; see 6.3.2.1p1.

Thanks, however I interpret the standard to clearly mean:

 int x; int* y;

 x = x ; perfectly fine; as lvalue x clearly designates an object (no trap)

 x = *y ; undefined, as lvalue *y is indeterminate (may trap if referenced)

 *y = x ; undefined, as above.

As otherwise further given:

 volatile int* z = (int*)ADDRESS ;

 x = *z ; would also be undefined, since as with lvalue x above, lvalue *z
         ; references an object whose rvalue has not been initialized;
         ; hopefully this is more clearly a wrong interpretation, as
         ; these semantics are critical to writing low-level drivers.




Re: Signed int overflow behavior in the security context

2007-01-28 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie <[EMAIL PROTECTED]> wrote:
>>  x = x ; perfectly fine; as lvaue x clearly designates an object (no trap)
> 
> Can you cite the part of the standard that says that?  The fact that
> an expression designates an object does not exclude that object from
> holding a trap representation.  A trap representation, as defined by
> the standard (6.2.6.1p5), is unrelated to dereferencing an invalid
> pointer.  The word "trap" is also sometimes used to refer to
> dereferencing an invalid pointer, but that's not relevant here, since
> the standard uses a different definition.

In context:

- Certain object representations need not represent a value of the object
type.

- If the stored value of an object has such a representation and is read by
an lvalue expression that does not have character type, the behavior is
undefined.

- If such a representation is produced by a side effect that modifies all
or any part of the object by an lvalue expression that does not have
character type, the behavior is undefined.41)

- Such a representation is called a trap representation.

Means: a trap representation is a value representation which does not
constitute a valid member of an object's type (except for character types);
and if such an object value representation is read or produced by an lvalue
expression, the behavior is undefined (simply because any operation which
produces, or attempts to utilize, a value whose representation does not
validly represent a member of that object's type sensibly has undefined
behavior; e.g. access or store a value having the logical representation
of the value 7 in association with an object having a specified enum range
of 0..5, and all bets are off with respect to how that object's value may
be subsequently interpreted, as such an access may be recognized as being
invalid and be trapped).

Just as the dereference of an indeterminate lvalue may be trapped if its
value is an illegitimate member of that type.

However, as the only thing C says about an uninitialized variable is that
it will have an indeterminate initial value, which could be a trap
representation; and as most object type implementations, inclusive of ints,
have no invalid value representations; and as the standard very clearly
requires that a trap representation be read or stored by the lvalue
expression to invoke undefined behavior; the only circumstance in which this
may typically validly occur in conventional implementations is upon storing
or reading a pointer, struct or enum object with an illegitimate (trap)
value representation.





Re: Signed int overflow behavior in the security context

2007-01-29 Thread Paul Schlie
> Joseph Myers wrote:
> DR#260 seems clear enough that indeterminate values may be treated
> distinctly from determinate values including randomly changing at any
> time. 

One can only hope that the recommendations won't see the light of day in
their present form, as it's fairly clear that once an unspecified value is
read (presuming the absence of a trap representation), it becomes logically
visible, and thereby clearly no longer logically indeterminate.
 
Further, regardless of the recommendations, and presuming the absence of
any possibility of a trap representation for a given implementation, x ^= x
remains well defined, although not necessarily equivalent to 0; as although
lvalue x remains well defined, its rvalue is proposed to remain repetitively
unspecified until it is assigned the result of the dynamically evaluated xor
operation, which has two potentially differing unspecified operand values.

(as the undefined behavior referenced in the DR relates to a pointer
becoming indeterminate because it was assigned an indeterminate value, or
because the object it referenced has been freed and is thereby no longer a
valid object of the pointer's type, so that although the pointer's value has
not changed, its value is now considered a trap representation; not because
the value it references has been assigned an indeterminate value)




Re: Signed int overflow behavior in the security context

2007-01-29 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie wrote:
>> One can only hope that the recommendations won't see the light of day in
>> their present form, as it's fairly clear that once an unspecified value is
>> read (presuming absents of a trap representation), it becomes logically
>> visible, and thereby clearly no longer logically indeterminate.
> 
> An unspecified value was never indeterminate in the first place.  The
> terms are not synonymous.  And an object holding an indeterminate
> value does not stop being indeterminate when its value is read;
> reading it invokes undefined behavior.  This is true Because The
> Standard Says So, no matter how illogical it may seem.

- undefined behavior is only invoked if the value is a trap representation.

>> presuming absents of any possibility of a trap representation for a
>> given implementation
> 
> That's an unreliable presumption.  As noted in the defect report, a
> trap representation can have the same bit pattern as a valid value.
> Trapness depends not just on the bit pattern, but also how the bit
> pattern was produced.  So even if there are no hardware-level traps,
> if you read an indeterminate object, a compiler is allowed to produce
> the same behavior as if there were hardware-level traps.

- that's not what it says: a pointer value may be/become a trap
representation when the object pointed to by that value is not, or ceases
to be, a legitimate member of the pointer's type; thereby the pointer's
value is/becomes an illegitimate member of that pointer's type, ergo a trap
representation.

>> (as the undefined behavior referenced in the DR is related to a pointer
>> becoming indeterminate
> 
> *Some* of the DR relates to pointers.  But reading the value of any
> object (pointer or otherwise) holding a trap representation invokes
> undefined behavior.

- agreed, and thereby objects having no legitimate trap representation,
such as most if not all implementations of integer and floating point
objects on most if not all current target machines, do not invoke undefined
behavior when accessed.

Just as:

 volatile int* port = (int*)PORT_ADDRESS;

 int input = *port; supposedly invoking an undefined behavior.

is required to be well specified, effectively reading through a pointer
an un-initialized object's value, and then assigning that unspecified value
to the variable input; as otherwise the language couldn't be utilized to
write even the most basic hardware drivers required of all computer systems.

(however regardless, I acquiesce to those who continue to wish otherwise,
to the apparent continued destruction of the language for no apparent
particularly useful purpose, by increasingly striving to render it
undefined)




Re: Signed int overflow behavior in the security context

2007-01-30 Thread Paul Schlie
> Robert wrote:
>> Paul Schlie wrote:
>> - agreed, and thereby objects having no legitimate trap representation,
>> such as most if not all implementations of integers and floating point
>> objects on most if not all current target machines, and thereby their
>> access does not invoke an undefined behavior.
> 
> First of all, trap representations of COURSE exist for floating-point
> objects, I guess you don't know fpt formats (most people don't).

- as a trap representation within the context of C is a value
representation which is not defined to be a member of a type, and which if
accessed or produced evokes undefined behavior; and as, to the best of
my knowledge, all potentially encodable values for IEEE floats and doubles
are defined, it would seem trap representations don't exist in typical fp
implementations, as such an implementation would require more bits of
encoding than the type itself provides.

> But in any case, your reasoning here is once again based on the language
> you wish you had, rather than the formal semantic language defined by
> the standard, which has no notion of "no legitimate trap representation".
> 
> The standard says an uninitialized variable can have a trap
> representation. Therefor it can.

- yes that is consistent, one wouldn't want to have to think about reality.

> There is no license to reason about  how you think code is generated, any
> compiled is allowed to generate code AS IF a trap representation were present.

- yes, and thereby inconsistent with reality, and thereby wrong.
 (as may and may not are equivalent possibilities)

> I think you often miss this distinction between
> 
> generated code at the implementation level
> 
> as if behavior from the rules in the standard
>> 
>> Just as:
>> 
>>  volatile int* port = (int*)PORT_ADDRESS;
>> 
>>  int input = *port; supposedly invoking an undefined behavior.
>> 
>> is required to be well specified, effectively reading through a pointer
>> an un-initialized object's value, and then assigning that unspecified value
>> to the variable input; as otherwise the language couldn't be utilized to
>> write even the most hardware drivers required of all computer systems.
> 
> Looks unspecified to me, but in any case reasoning which says
> 
> The standard must say X, since otherwise I could not write "hardware
> drivers required of all computer systems", is bogus. There is nothing
> that says valid C can be used to write such drivers.

- I buy that the value is unspecified; the semantics are not.




Re: Signed int overflow behavior in the security context

2007-01-30 Thread Paul Schlie
> Paul wrote
>> Robert wrote:
>>> Paul Schlie wrote:
>>> - agreed, and thereby objects having no legitimate trap representation,
>>> such as most if not all implementations of integers and floating point
>>> objects on most if not all current target machines, and thereby their
>>> access does not invoke an undefined behavior.
>> 
>> First of all, trap representations of COURSE exist for floating-point
>> objects, I guess you don't know fpt formats (most people don't).
> 
> - as a trap representation within the context of C is a value
> representation which is not defined to be a member of a type, and which if
> accessed or produced evokes undefined behavior; and as, to the best of my
> knowledge, all potentially encodable values for IEEE floats and doubles are
> defined, it would seem trap representations don't exist in typical fp
> implementations, as such an implementation would require more bits of encoding
> than the type itself provides.

- admittedly, SNaN values may be considered as such; however ints would
appear to have no counterpart.




Re: Signed int overflow behavior in the security context

2007-01-30 Thread Paul Schlie
> Paul Jarc wrote:
>> Paul Schlie wrote:
>> is required to be well specified [...] as otherwise the language
>> couldn't be utilized to write even the most hardware drivers
>> required of all computer systems.
>
> In a sense, the language *can't* be used to write most hardware
> drivers.  Drivers do invoke undefined behavior - that is, the standard
> makes no guarantees about their behavior - but the particular platform
> they are targeted for makes its own guarantees, so the code is still
> useful, e

The root of this discussion was whether or not GCC's relatively aggressive
treatment of undefined behavior gives it the reasonable and useful right to
presume that any expression which may be interpreted as having undefined
semantics may be presumed to either mystically never or always occur,
depending on its whim, regardless of practical reality.

Overall, it would seem there should be a more practical and consistent basis
applied.




Re: RFC: Plan for cleaning up the "Addressing Modes" macros

2005-02-28 Thread Paul Schlie
> Zack Weinberg 
> The target macros described in the "Addressing Modes" section of the
> internals manual are rather badly in need of cleaning up.  I see three
> primary reasons why this is so:

- Very nice; and I wonder, although somewhat orthogonal, if it would be
  similarly desirable to clean up GCC's type mode definitions a little,
  thereby enabling their declared use by the various targets to be more
  consistently and conditionally utilized by GCC's built-in data and operator
  definitions than they are presently? (for example by unwindxx and libgcc)

  A few other things which would seem possibly nice to be refined include:

  - being able to properly denote/estimate the cost of naked (set dst src)

  - generalizing (set ...) to include a size field, as opposed to
utilizing an odd implied definition for block moves to make it more
consistent with the rest of the operators i.e. (set dst src [size])?
(as an explicit alignment field would seem unnecessary as it could be
 implied by the mode of the src and/or destination operands, which
 need not be BLK)

  - folding machmode.def and mode-classes.def etc. into machmode.h
(and simply conditionally defining things as may be necessary)?

  - very minor nit, but MODE_RANDOM seems like an odd name for a mode class,
as opposed to MODE_ANY example?





Re: RFC: Plan for cleaning up the "Addressing Modes" macros

2005-02-28 Thread Paul Schlie
An explicit addressing mode, or a mechanism to enable the identification
of "read-only" rtl static data references, would be very helpful for uC
and/or DSP targets with Harvard memory architectures, which typically
require the use of specialized load instructions to access their ROM based
program memory space, as it would enable such data to be accessed
efficiently without needing to copy it in bulk to the machine's typically
sparse data memory.

(as it's not clear how this may be done presently?)


> From: Paul Schlie <[EMAIL PROTECTED]>
>> Zack Weinberg 
>> The target macros described in the "Addressing Modes" section of the
>> internals manual are rather badly in need of cleaning up.  I see three
>> primary reasons why this is so:
> 
> - Very Nice; and wonder, although somewhat orthogonal, if it would be
>   similarly desirable to clean up GCC's type mode definitions a little,
>   thereby enabling their declared use by the various targets to be more
>   consistently conditionally utilized by GCC's built-in data and operator
>   definitions than they are presently? (for example by unwindxx and libgcc)
> 
>   A few other things which would seem possibly nice to be refined include:
> 
>   - being able to properly denote/estimate the cost of naked (set dst src)
> 
>   - generalizing (set ...) to include an size field, as opposed to
> utilizing an odd implied definition for block moves to make it more
> consistent with the rest of the operators i.e. (set dst src [size])?
> (as an explicit alignment field would seem unnecessary as it could be
>  implied by the mode of the src and/or destination operands, which
>  need not be BLK)
> 
>   - folding machmode.def and mode-classes.def etc. into machmode.h
> (and simply conditionally defining things as may be necessary)?
> 
>   - very minor nit, but MODE_RANDOM seems like an odd name for a mode class,
> as opposed to MODE_ANY example?
> 




Re: Different sized data and code pointers

2005-03-01 Thread Paul Schlie
> Thomas Gill wrote:
> I'm working on a GCC backend for a small embedded processor. We've got a
> Harvard architecture with 16 bit data addresses and 24 bit code
> addresses. How well does GCC support having different sized pointers for
> this sort of thing? The macros POINTER_SIZE and Pmode seem to suggest that
> there's one pointer size for everything.
>
> The backend that I've inherited gets most of the way with some really
> horrible hacks, but it would be nice if those hacks weren't necessary. In
> any case, the hacks don't cope with casting function pointers to integers.

With the arguable exception of function pointers (which need not be literal
addresses), all pointers are presumed to point to data, not code; therefore
it may be simplest to define pointers as being 16 bits, and to call
functions indirectly through a lookup table constructed at link time from
program memory, assuming it's readable via some mechanism; as the call
penalty incurred would likely be insignificant relative to the potential
complexity of attempting to support 24-bit code pointers, in the rare
circumstances they're typically used, on an otherwise native 16-bit machine.

(and just as a heads up, there seems to be no existing mechanism to enable
 the convenient access of static constant data stored in an address space
 orthogonal to read-write data memory; although I suspect one could
 implement a scheme in which every address is discriminated at run time
 based on some address-range split, it's likely not worth the run-time
 overhead to do so, but should work if desperate to conserve RAM)
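
A sketch of the lookup-table approach in plain C (illustrative only; in a
real port the linker would emit the table of 24-bit code addresses into
program memory and the indirect call would go through a target-specific
helper):

  typedef void (*handler_t) (void);

  static void reset_handler (void) { /* ... */ }
  static void tick_handler (void)  { /* ... */ }

  /* stand-in for a link-time table held in program memory; a 16-bit
     "function pointer" is then simply an index into this table */
  static handler_t const call_table[] = { reset_handler, tick_handler };

  static void call_by_index (unsigned short idx)
  {
      call_table[idx] ();
  }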





Re: Extension compatibility policy

2005-03-01 Thread Paul Schlie
> Joseph S. Myers writes:
> How about calling decl_attributes from fname_decl so a target
> insert_attributes hook can add attributes to __func__?  Would that suffice
> to solve your problem?

Might it be possible to alternatively add an attribute symbol hook, so that
a target may easily define an arbitrary target-specific named attribute
which may be utilized without having to patch the parser, etc., to do so?

Thereby one could easily define a ROM and/or PMEM attribute, hypothetically,
for not only __FUNCTION__ but any arbitrary declared type or parameter
declaration, preserved through to the back end to aid in target specific
code generation and/or memory allocation?

PMEM __FUNCTION__ 

ROM static const x[] = "some string"

char y[] = ROM "some string"

struct {int a; int b;} z = PMEM {5312, 3421};

For example? (with a little luck this could kill two birds with one stone)
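
For comparison, the closest existing approximation uses GCC's section
attribute (a sketch; the ".progmem" section name, and a linker script that
maps that section into the Harvard program-memory space, are assumptions;
note that this handles placement only, not the specialized load
instructions such a target also needs):

  /* place the data in a named section which the target's linker script is
     assumed to map into program memory */
  __attribute__ ((section (".progmem")))
  static const char greeting[] = "some string";

  __attribute__ ((section (".progmem")))
  static const struct { int a; int b; } z = { 5312, 3421 };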





Re: Extension compatibility policy

2005-03-01 Thread Paul Schlie
> From: "Joseph S. Myers" <[EMAIL PROTECTED]>
>> On Tue, 1 Mar 2005, Paul Schlie wrote:
>> Might it be possible to alternatively add an attribute symbol hook so that a
>> target may easily define an arbitrary target specific named attribute which
>> may be utilized without having to patch the parser, etc. to do so?
>> 
>> Thereby one could easily define a ROM and/or PMEM attribute hypothetically
>> for not only __FUNCTION__, but any arbitrary declared type or parameter
>> declaration, preserved through to the back end to aid in target specific
>> code generation and/or memory allocation?
> 
> The insert_attributes hook *already exists*.  It can insert arbitrary
> attributes on arbitrary declarations completely under target control.
> You can have target command-line options to control what it does.  You can
> have target pragmas that control what it does.  And of course source code
> can explicitly use attributes in any place documented in "Attribute
> Syntax".  I was simply suggesting filling a lacuna by applying this hook
> to implicit __func__ declarations as well as to explicit declarations.

- Got it, I think. Sorry for being dense. So in summary:

  - an attribute may be defined, such as:

    #define ROM __attribute__((ROM))

  - and used following the above referenced "Attribute Syntax", as
either a variable, or function parameter declaration/implementation:

  int ROM x = 3;

  int foo (int ROM y)

    where the parameter's attribute is visible whenever that parameter
    is used as an operand within the function tree, and correspondingly
    during rtl/template matching; and further, if there's an attribute
    mismatch between the function argument and its parameter, the
    compiler will warn? (is there any way to force an error instead?)

And just to double check with respect to your other comments:

>> char y[] = ROM "some string"
>
> We know that being able to control sections of string constants is
> desirable ... it may be best not to allow attributes on individual strings,
> only a strings_section attribute to control the section of strings within
> an object or function definition.

- understood.

>> struct {int a; int b;} z = PMEM {5312, 3421};
> 
> This syntax makes even less sense.  A brace-enclosed initializer is not an
> object!  If z has static storage duration, put the attribute on z.  If it
> doesn't, how it is initialized is a matter of compiler optimization and
> specifying attributes of where a copy of the initializer might go doesn't
> seem to make sense.

- except that when GCC treats it as a static constant value, accessed at
  run-time to initialize the declared variable, just as strings and arrays
  are, it must also carry the same attribute the back end relies on to
  identify such references as needing to be accessed differently than
  references to other variables. (so I suspect it would be appropriate to
  be able to programmatically define an attribute which may be attached to
  all such references to initializing data, not just strings, if not
  optimized away; although I agree it's not necessary to specify each
  individually)


Thanks again, and I apologize for my confusion.

-paul-




Re: Extension compatibility policy

2005-03-02 Thread Paul Schlie
> From: "Joseph S. Myers" <[EMAIL PROTECTED]>
>>   - and used following the above referenced ""Attribute Syntax", as
>> either a variable, or function parameter declaration/implementation:
>> 
>>   int ROM x = 3;
>> 
>>   int foo (int ROM y)
>> 
>> where the parameter's attribute is visible when ever that parameter
>> is used as an operand within the function tree, and correspondingly
>> during rtl/template matching; and further if there's an attribute
>> mismatch between the function argument and it's parameter, the
>> compiler will warn? (is there any way to force an error instead?)
> ...
> It doesn't seem to me that you've ever implemented target attributes.
> When I last rewrote the attribute handling interfaces and so much of the
> target attribute handling I tried to make sure the interfaces were general
> enough to allow targets to define attributes how they like.  You need to
> make serious experiments with implementing the semantics you want within
> your chosen back ends, looking at all existing back ends for examples of
> what can be done, so questions can be based on real experience rather than
> general lack of understanding.

- This is true, and I will invest in experimenting with their implementation.

>> - except that when GCC treats it as a static constant value, accessed during
>>   run-time to initialized the declared variable, just as strings and arrays
>>   are;  it must also have the same attribute the back end is relying on
>>   to identify such references as needing to be accessed differently than
>>   references to other variables are. (so suspect it would be appropriate to
>>   be able to define programmatically an attribute which may be attached to
>>   all such references to initializing data not just strings if not optimized
>>   away, although agree it's not necessary to specify each individually)
> 
> Please write in proper complete coherent English sentences so your
> messages can be understood.

- I'll try again.

> It is a matter for the compiler to determine where it stores data which is
> copied into an object of automatic storage duration as an initializer.

- Yes I understand, and simply assert that when this does occur, they should
  be given the same optional attribute that the analogous compiler generated
  static constant string objects are given; for the same reasons and
  purpose.

> It may be best to emit explicit initialization code; it may be best to
> copy an initialization image; it may be best to call memset and then emit
> initialization code for a few nonzero values.  The compiler can choose
> where that initialization image goes.

- Yes, I understand, and further assert that any compiler generated static
  constant objects accessed by reference (as opposed to being embedded in
  the program code itself as either immediate or otherwise generated values)
  for the purpose of run-time initialization of program specified objects
  should be given the same optional attributes that analogous compiler
  generated static constant string objects are given; for the same reasons
  and purpose.

>...  This may depend on command-line
> options controlling whether code size, or amount of one sort of data, or
> amount of another sort of data, or speed, should be optimized.  But
> control for individual initializers is not something that makes sense at
> source file level any more than controlling which machine instruction is
> used for a given piece of C code makes sense; if you want to control
> code-generation strategies at that fine a level, write in assembly
> language directly.  (Or provide a static const initialization image with
> the right attributes, then write your own C code to copy it rather than
> using an initializer for the automatic storage duration variable.)

- I fully agree that no finer level of attribute control need be provided
  than deemed appropriate for compiler generated static constant string
  objects; but I assert the same attributes should be applied to all
  analogous compiler generated data, regardless of its type; for the same
  reasons and purpose. (As it's these attributes which have been chosen to
  identify such compiler generated stored data objects.)

More specifically, there seem to be two predominant motivating reasons
why it may be desired to attach target specified attributes to compiler
generated static constant objects:

1 - To enable their identification so that their compile/link time storage
location/type may be influenced by the target.

2 - To enable their identification so that their run-time access method may
be influenced by the target. (Likely due to their storage location/type)

As such, it is imperative that all compiler generated objects for which
allocated storage is required (beyond alternatively being embedded in the
code as immediate, or otherwise generated, values) be able to be assigned
the same optional programmatically specified attributes.

Re: Extension compatibility policy

2005-03-02 Thread Paul Schlie
> From: "E. Weddington" <[EMAIL PROTECTED]>
>> Paul Schlie wrote:
>> More specifically, there seem to be two predominant motivating reasons
>> why it may be desired to attach a target specified attributes to compiler
>> generated static constant objects:
>> 
>> 1 - To enable their identification so that their compile/link time storage
>>location/type may be influenced by the target.
>> 
>> 2 - To enable their identification so that their run-time access method may
>>be influenced by the target. (Likely due to their storage location/type)
>> 
> I know that you are doing this for the AVR target.

- Not necessarily, but I do have an interest in improving it for personal use.

> You are under the assumption that it is acceptable to attach attributes
> to static constant objects to put data in Program Space for the AVR target.

- Yes. (and in general yes.)

> As Joseph pointed out to you in bug #20258,
> <http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20258>
> implementing DTR 18037 is the best way to go about this.

- Yes, and by utilizing the existing mechanism to define and utilize
  target specific attributes, this prescribed method may be prototyped;
  allowing me to better understand the issues, methods, and potential
  complications than I do presently.
  
> You consistently do not check with the official AVR maintainers (Denis
> Chertykov and Marek Michalkiewicz), or on the avr-gcc-list
> <http://savannah.nongnu.org/mail/?group=avr> to see if any of this will
> fly with the rest of the AVR community.

- ??? For what purpose? My short term goal is simply to try to determine
  whether these facilities may be utilized to achieve a close approximation
  of a general desire: to be able to identify, and affect, the target
  specific method by which static constant objects are accessed (such
  objects tend to be allocated in ROM within an orthogonal address space on
  embedded targets with Harvard architectures like the AVR uC, and it would
  be nice to avoid having to redundantly copy them into data space just to
  enable their use).

  If successful, and with a better understanding, discussions may then be
  worthwhile if there's interest in adopting the approach, or some variant
  of it, for incorporation into the maintained avr port.

  (it seems to me)





Re: Different sized data and code pointers

2005-03-03 Thread Paul Schlie
>>>> Paul Schlie wrote:
>>>> If the program's address space pointer is more accurately implemented
>>>> as a 16-bit pointer combined with an 8-bit segment address; wonder if
>>>> it may be worth your while to take a look at how the old segmented x86
>>>> GCC targets treat segmented addresses?
>>>
>>> Thomas Gill wrote:
>>> Hmmm.. it's possible. Where can I find that?
>>
>> the target ports are in gcc/config/...
> 
> Sure, I mean which target should I be looking at?

I was thinking of i386, but it seems to use the same scheme for data and
program addresses; candidly, I believe building a jump table for
function-pointer calls, just like interrupt vectors, may be your best bet,
as it will only add a few cycles per call through a function pointer, and
may even be eliminated if the code never calls through a function pointer
(which most code never does).




Re: Extension compatibility policy

2005-03-03 Thread Paul Schlie
> Bernardo Innocenti wrote:
>> Joseph S. Myers wrote:
>>> On Wed, 2 Mar 2005, Bernardo Innocenti wrote:
>>> To move strings into program memory, there's a macro like this:
>>>
>>> #define PSTR(s) ({ static const char __c[] PROGMEM = (s); &__c[0]; })
>>>
>>> But this wouldn't work because __func__ does not work like a string literal:
>>>
>>> #define TRACEMSG(msg,...) __tracemsg(PSTR(__func__), msg, ## __VA_ARGS__)
>>>
>>> C99's __func__ is supposed to work as if a "const char __func__[]".
>>> The __FUNCTION__ extension could instead be made to work like a
>>> string literal.   We could live without string pasting capabilities
>>> if it helps keeping the interface between cpp and the C frontend
>>> cleaner.
>>
>> How about calling decl_attributes from fname_decl so a target
>> insert_attributes hook can add attributes to __func__?
>> Would that suffice to solve your problem?
>>
> It would be an excellent solution for myself, but portable code
> might not expect __func__ to be stored in program memory.  This isn't
> as transparent as a .rodata section.

After having had the chance to experiment a little, it would seem most
ideal in the short term to enable GCC to add an explicit target specific
attribute to the implied __FUNCTION__ declaration; in AVR's case, for
example:

  #define ROM __attribute__((__progmem__)) /* an avr attribute */

  something (ROM __FUNCTION__);

Thereby effectively implying the local declaration/use of __FUNCTION__ as:

  ROM static const char __FUNCTION__[4] = "foo";
  something (__FUNCTION__);

Which would enable the backward compatible addition of target specific
attributes to the implied declaration of __FUNCTION__, enabling avr's
back end to place it in progmem, and allowing it to be explicitly
accessed using PSTR() macros.

In the longer term, if target specific attributes are properly retained
through the compilation process, the back end could automatically reference
objects stored in progmem correctly without need of an explicit macro to
do so.
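
To make that concrete, a rough sketch only (hypothetical target, not the
actual avr implementation) of what such a hook might look like, assuming
the usual back-end includes (config.h, system.h, coretypes.h, tm.h,
tree.h); the DECL_ARTIFICIAL/TREE_READONLY test is just an illustrative
stand-in for "this is the compiler-built __func__/__FUNCTION__ decl":

  /* Tag compiler-generated read-only declarations with a "progmem"-style
     attribute via the target's insert_attributes hook.                    */
  static void
  example_insert_attributes (tree node, tree *attributes)
  {
    if (TREE_CODE (node) == VAR_DECL
        && DECL_ARTIFICIAL (node)
        && TREE_READONLY (node))
      *attributes = tree_cons (get_identifier ("progmem"),
                               NULL_TREE, *attributes);
  }

  #undef  TARGET_INSERT_ATTRIBUTES
  #define TARGET_INSERT_ATTRIBUTES example_insert_attributes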

(Although I know there's concern about enabling fine-grained specification
 of attributes for implied static const initializer objects, I now believe
 that both fine-grained control and the ability to define a target specific
 attribute in bulk for all implied static const initializer objects may
 each be most appropriate in different circumstances; a more detailed
 analysis/summary will follow.)


 





Re: Target specific memory attributes from RTL

2005-03-03 Thread Paul Schlie
(I and likely others would also be grateful for further insight)

To enable the efficient use of the non-von-Neumann memory architectures
typical of many vector/signal processors and smaller microcontrollers,
it seems imperative that target specific attributes assigned to an object
remain correct throughout the compilation process, so that the back end
can properly identify which memory space a referenced object is allocated
within and access it appropriately.

But as you have observed, and Jim Wilson has noted, it seems that object
attributes may either be lost, or not fully accessible beyond one level of
indirection, by the time rtl/template matching begins. This would seem to
seriously prohibit GCC from being able to reliably track an object's
allocated location through the use of target specific attributes?

Does anyone have any further insight with respect to this potential problem?

Referencing your and Jim's earlier messages:

 http://gcc.gnu.org/ml/gcc/2005-02/msg00456.html

> Balaji S writes:
> As suggested in references, i'm able to get the target specific attributes
> from RTL in the following cases:
> 1. Through SYMBOL_REF_FLAGS and if it is a SYMBOL_REF RTL (symbol_ref: ...),
>direct variable
> 2. Through MEM_EXPR of MEM if its operand is not a SYMBOL_REF and a REG
>(mem:HI (reg:...)), indirect variable access
> 3. Through REG_EXPR of REG of a MEM and memory access is a first level pointer
>reference (mem:HI (reg:...)), indirect memory access
>
> But, i'm not able to access target specific attribute from second level of
> indirection onwards.
>
> For example,
> int **gpi ;
>
> variable_decl of 'gpi' is available during access of 'gpi' and '*gpi' and not
> during '**gpi'.
>
> I have looked at existing ports for similar implementation of memory
> attributes but failed to find one.
>
> Please point me a good resource for handling memory attributes.




Re: Extension compatibility policy

2005-03-04 Thread Paul Schlie
> From: "E. Weddington" <[EMAIL PROTECTED]>
>> Paul Schlie wrote:
>> After having the chance to experiment a little, it would seem most ideal in
>> the short term to enable GCC to add an explicit target specific attribute to
>> the effective implied __FUNCTION__ declaration; in AVR's case for example:
>> 
>>  #define ROM __attribute__((__progmem__)) /* an avr attribute */
>>  
>> 
> This is unnecessary as you should already know; avr-libc already
> #defines PROGMEM as the attribute specified above.

- GCC's avr port is not dependent on avr-libc; do you perceive that it is?

>>  something (ROM __FUNCTION__);
>> 
>> Thereby effectively implying the local declaration/use of __FUNCTION__ as:
>> 
>>  ROM static const char __FUNCTION__[4] = "foo";
>>  something (__FUNCTION__);
>> 
>> Which would enable the backward compatible addition of target specific
>> attributes to the implied declaration of __FUNCTION__, enabling avr's
>> backend to place it in progmem, and be explicitly accessed using PSTR()
>> macros.
> 
> You're lost. You really don't know what the PSTR() macro does in
> avr-libc, do you?

- I stand corrected; the proposed functionality would obsolete avr-libc's
  PSTR macro in this instance.

>> (Although I know there's concern about enabling fine grain specification
>> of attributes for implied static const initializer objects; but now believe
>> that both both fine grain as well as having the ability to define a target
>> specific attribute in bulk for all implied static const initializer objects
>> as may be most appropriate in different circumstances
>> 
> No, Joseph Myers already told you that overloading "static const" as
> defining the attribute for objects in different address spaces is NOT
> the way to do it. There should not be an implied way of doing this that
> overloads C keywords. It should be explicit.

- Overloading what? With hopefully a little more thought on your part,
  it may have become apparent to you that this is not what is proposed.
  What is being proposed is to enable the ability to explicitly add a
  target defined attribute to the implied declaration of __FUNCTION__,
  just as one may be added to any other explicit declaration; this seems
  to be the most flexible way to enable target specific allocation of the
  static const data, thereby offering the same level of flexibility that
  PSTR() previously had in being able to influence the allocation of the
  __FUNCTION__ string.

> This issue has already been talked to death. But, hey, everybody is
> waiting for a patch from you, Paul.

- wow; the adage about not throwing stones when living in a glass house
  comes to mind.




Re: [OT] __builtin_cpow((0,0),(0,0))

2005-03-08 Thread Paul Schlie
> Ronny Peine 
>
> Maybe i found something:
>
> http://www.cs.berkeley.edu/~wkahan/ieee754status/ieee754.ps
> page 9 says:
>
> "A number of real expressions are sometimes implemented as INVALID
> by mistake, or declared Undefined by illconsidered
> language standards; a few examples are ...
> 0.0**0.0 = inf**0.0 = NaN**0.0 = 1.0, not Nan;"
> 
> I'm not really sure if he means that it should be 1.0 or it should be NaN
> but i think he means 1.0.

It seems like an acknowledgement that run-time generated NaNs are much less
useful than the values which would otherwise result from an interpretation
in which +/-0 were more conveniently and consistently considered equivalent
to the reciprocals of their +/-inf counterparts, and vice versa; implying
among other things:

0/0 == (x/inf)/(x/inf) == inf/inf == x/x == x^0 == 1

And further observing that NaN real valued results are also less useful
than simply returning the real part of an otherwise complex valued result,
i.e. sqrt(-1) == 0 vs NaN; just as an assignment of a complex value to a
real valued variable would simply return the real valued component, not
NaN (seemingly enabling most if not all uses of NaNs to be largely
eliminated).

But I acknowledge that for such an interpretation to be cleanly consistent,
its value representation should be symmetric: the reciprocal of every
representable value x should have a corresponding representable value y
such that x ~ 1/y, inclusive of +/-0 and +/-inf at the representation's
limits; this would also enable correcting the inconsistent interpretation
that -0 == +0, as clearly +1/0 == +inf and -1/0 == -inf, and therefore not
equivalent, although +0 == |+/-0| would be.

Unfortunately, however, it's not clear that the industry's vested interest
in preserving the legitimacy of present product implementations will allow
the arguably misguided introduction of NaN to be largely corrected.




Re: __builtin_cpow((0,0),(0,0))

2005-03-11 Thread Paul Schlie
> Gabriel Dos Reis wrote:
> You probably noticed that in the polynomial expansion, you are using
> an integer power -- which everybody agrees on yield 1 at the limit.
>
> I'm talking about 0^0, when you look at the limit of function x^y

Out of curiosity, on what basis can one conclude:

 lim{|x|==|y|->0} x^y :: lim{|x|==|y|->0} (exp (* x (log y))) != 1 ?

As although its logarithmic decomposition may yield intermediate complex
values, and may diverge before converging as the arguments approach their
limit, it seems fairly obvious that the expression converges to the value
of 1 about the limit of 0; although it may be argued that (log 0) is
undefined (more accurately, it -> -inf), it does so at an exponentially
slower rate than its multiplier, i.e. lim{|x|==|y|->0} (* x (log y)) = 0,
thereby lim{|x|==|y|->0} (exp (* x (log y))) = (exp 0) = 1; it would seem?





Re: __builtin_cpow((0,0),(0,0))

2005-03-12 Thread Paul Schlie
> From: Gabriel Dos Reis <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> writes:
> | > Gabriel Dos Reis wrote:
> | > You probably noticed that in the polynomial expansion, you are using
> | > an integer power -- which everybody agrees on yield 1 at the limit.
> | >
> | > I'm talking about 0^0, when you look at the limit of function x^y
> | 
> | Out of curiosity, on what basis can one conclude:
> | 
> |  lim{|x|==|y|->0} x^y :: lim{|x|==|y|->0} (exp (* x (log y))) != 1 ?
> 
> The issue is not whether the limit of x^x, as x approaches 0, is 1 or not.
> We all, mathematically, agree on that.
> 
> The issue is whether the *bivariate* function x^y has a defined limit
> at (0,0).  And the answer is unambiguously no.
> Checking just one path does NOT suffice to assert that the limit
> exists. (However, that might suffice to assert that a limit does not
> exist). 
> 
> I'm deeply burried somewhere in the middle-west deserts and I have no
> much reliable connection, so I'll point you to the message
> 
> http://gcc.gnu.org/ml/gcc/2005-03/msg00469.html
> 
> where I've tried to taint this discussion with some realities from what
> standard bodies think on the 0^0 arithmetic, and conterexample you can
> check by yourself.
> 
> | As although it's logarithmic decomposition may yield intermediate complex
> | values, and may diverge prior to converging as they approach their limit,
> | it seems fairly obvious that the expression converges to the value of 1
> 
> You've transmuted the function x^y to the function x^x which is a
> different beast.  Existing of limit of the latter does not imply
> existance of limit of the former.  Again check the counterexamples in
> the message I referred to above.

Thank you. In essence, I've intentionally defined the question of x^y's
value about x=y->0 as a constrained "bivariate" function, where only the
direction, not the relative rate, of the arguments' paths is ambiguous, as
I believe that when the numerical representation system has no provision
to express their relative rates of convergence, they should be assumed to
be equivalent. The question of a function's value about any static point
such as (0,0) or (2,4) etc. is invalid unless that point is well defined
within its arguments' paths; where it is, the constrained representation
is equally valid, but not otherwise (as nor is the question).

In other words, the question of an arbitrary function's value about an
arbitrary static point is just that; it's not a question about a function's
value about an arbitrary point which may or may not be intersected by
another function further constraining its arguments.

Therefore the counter-argument observing that x^y is ambiguous if further
constrained by y = k/ln(x) is essentially irrelevant, as the question is
what the value of x^y is, with no provision to express further constraints
on its arguments. Just as the value of (x + y) further constrained by
y = x, about the point (1,2), would be both ambiguous and irrelevant to
the defined value of (x + y) about (1,2).
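
(For the record, the mechanics of that counterexample are simple: along
the path y = k/ln(x), for any x > 0 with x != 1,

  x^y = exp(y*ln(x)) = exp((k/ln(x))*ln(x)) = exp(k)

so the value along that path is constantly e^k rather than 1, even though
both arguments approach 0; the disagreement is only over whether such
externally constrained paths should govern the defined value of x^y at
(0,0).)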

I believe things are being confused by a misinterpretation of what a limit
about an infinite boundary truly means. Most understand that
lim{x->1; y->2} implies convergence about the static valid point (1,2),
where it would be obvious that if x and y were further constrained such
that (1,2) were invalid, then so too would be the question; lim{x->0; y->0}
should be treated equally. But it's being abused by those who don't
understand that just because all of a function's arguments may approach a
given set of values eventually, if they do not do so simultaneously, then
that set of values does not lie in the function's path, and is therefore
irrelevant; just as (0,0) does not lie in y = k/ln(x)'s path, and is
therefore an invalidating simultaneous constraint. Otherwise it would be
valid to argue that (0,0) lies on y = x, y = x + 1, y = x + 2, ...
simultaneously, which is more obviously false; which makes it more apparent
that it's important to differentiate the static points these limits imply
from the infinite boundaries they simultaneously abstractly represent, in
the form of 0 ~ lim{->0} :: lim{->1/inf}, and inf ~ lim{->inf} :: lim{->1/->0}.

Very long story short, it seems clear that:

  f(a,b) :: lim{v->1/inf} f(a+/-v, b+/-v)

about any static point, when defined independently of any other arbitrary
constraints.




Re: __builtin_cpow((0,0),(0,0))

2005-03-12 Thread Paul Schlie
> From: Gabriel Dos Reis <[EMAIL PROTECTED]>
> |Paul Schlie <[EMAIL PROTECTED]> writes:
> | Thank you. In essence, I've intentionally defined the question of x^y's
> | value about x=y->0 as a constrained "bivariate" function, to where only
> | the direction, not the relative rate of the argument's paths are ambiguous,
> | as I believe that when the numerical representation system has no provision
> | to express their relative rates of convergence, they should be assumed to be
> | equivalent;
> 
> You're seriously mistaken.  In lack of any further knowledge, one should not
> assume anything particular.  Which is reflected in LIA-2's rationale.
> You just don't know anything about the rate of the arguments.

I guess I'd simply contend that the value of a function about any point,
in the absence of further formal constraints, should be assumed to
represent its static value about that point, i.e.
lim{|v|->1/inf} f(x+v, y+v, ...)

And I'd reserve the obligation of calculating formally parameterized
multi-variate functions at boundary limits to the applications which
require it, rather than burdening all users of such functions with
arguably less useful NaN results.

But I understand that, regardless of my own opinion, it's likely more
important that a function produce predictable results, regardless of
their usefulness on occasion. (which is the obligation of the committees
to hopefully decide well)




Re: __builtin_cpow((0,0),(0,0))

2005-03-13 Thread Paul Schlie
> From: Gabriel Dos Reis <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> writes:
> | > From: Gabriel Dos Reis <[EMAIL PROTECTED]>
> | > |Paul Schlie <[EMAIL PROTECTED]> writes:
> | > | Thank you. In essence, I've intentionally defined the question of x^y's
> | > | value about x=y->0 as a constrained "bivariate" function, to where only
> | > | the direction, not the relative rate of the argument's paths are
> | > | ambiguous, as I believe that when the numerical representation system
> | > | has no provision to express their relative rates of convergence, they
> | > | should be assumed to be equivalent;
> | > 
> | > You're seriously mistaken.  In lack of any further knowledge, one should
> | > not assume anything particular.  Which is reflected in LIA-2's rationale.
> | > You just don't know anything about the rate of the arguments.
> | 
> | I guess I'd simply contend that the value of a function about any point
> | in the absents of further formal constraints should be assumed to represent
> | it's static value about that point i.e. lim{|v|->1/inf) f(x+v, y+v, ...)
> 
> That is meaningless.
> 
> A floating point system is a projection on a discrete base set, as a
> consequence when you compute a value, you almost always don't get an
> element in that set: You need to make projection.  Consistency predictable

- What is meaningless? lim{|v|->1/inf} f(x+v, y+v, ...) isn't meant to
  necessarily be literally computed, but only to abstractly express a limit
  about a uniformly converging point, as a proposed generally useful and
  predictable basis for a function's value definition; as opposed to
  assuming that if a function's arguments about some point are subject to
  some very specific but non-specifiable set of constraints which yield an
  ambiguity, its value is deemed generally ambiguous, and it is appropriate
  to return a NaN result for the remaining infinite-1 set of conditions
  where it's otherwise reasonably well defined at that limit (which seems
  counterproductive).

> | And reserve the obligation for applications requiring the calculation
> | of formally parameterized multi-variate functions at boundary limits to
> | themselves; rather than burdening either uses of such functions with
> | arguably less useful Nan results.
> 
> But that is nnot 

- We simply disagree, as I perceive NaN run-time results to be about as
  useful as an "I don't know" response to a question which demands an
  answer, even if only the most typically useful one. (Where more accurate,
  situation specific results are required, I perceive it as the
  application's responsibility to provide them, thereby not burdening
  other users with run-time "I don't know" responses.)

> | But understand, that regardless of my own opinion; it's likely more
> | important that a function produces predicable results, regardless of
> | their usefulness on occasion. (which is the obligation of the committees
> | to hopefully decide well)




Re: short int and conversions

2005-03-17 Thread Paul Schlie
> I'm trying to port gcc 4.1 for an architecture that has the following
> memory layout BITS_PER_UNIT=32 and UNITS_PER_WORD=1.
> It has support (16bit registers and operators) for 16bit signed
> atithmetic used mainly for addressing. There are also operators for 32
> bit integer and floating point support.
> I set SHORT_TYPE_SIZE=POINTER_SIZE=(BITS_PER_WORD/2).
>  
> I reserved QImode and QFmode for 32 bit integer/floating point operations.
> And I defined a new fractional int mode FRACTIONAL_INT_MODE (PQ, 16, 1) for
> pointers and short int operations.
> When I try to compile a very simple program with short int xgcc
> segments for stack overflow because it calls recursively
> #32 0x0806dd6d in convert (type=0xb7c7b288, expr=0xb7c88090) at
> ../../gcc/c-convert.c:95
> #33 0x08160626 in convert_to_integer (type=0xb7c7b288,
> expr=0xb7c88090) at ../../gcc/convert.c:442
> 
> I presume it tries to convert a small precision mode in something
> bigger but I cannot understand why.
> This is the first time I try to port gcc, so I don't know if my
> assumptions are reasonable or not.

With the caveat that I've never boot-strapped a port myself:

- "unit" tends to be an acronym for char, as is QI mode for < 16-bit chars.
- "word" tends to be an acronym for int/void*, typically represented as HI
  (16-bit) or SI (32-bit) mode operands, and typically the natural size of
  a memory access, although not necessarily.
- correspondingly, 32-bit float operands tend to be represented as SF mode
  operands.

(Q = quarter, H = half, S = single, D = Double) (I = integer, F = float)

So in rough summary, the following may be reasonable choices (given your
machine's apparent support of 16-bit and possibly lesser sized operations):

  bits  mode  ~type
  ----  ----  -----
     8  QI    char/short (which can be emulated if necessary)
    16  HI    char/short/int/void*
    16  HF    (target-specific float)
    32  SI    int/void*/long
    32  SF    float
    64  DI    long/void*/long-long
    64  DF    double

Also, as a generalization, it's likely wise not to try modeling a port
after the c4x, as its implementation seems at best very odd (alternatively,
a better model may be one of the supported 16/32-bit targets, depending on
your machine's architecture).

best of luck.




Re: AVR: CC0 to CCmode conversion

2005-03-18 Thread Paul Schlie
> Denis wrote:
> I have converted the AVR port from CC0 to CCmode.
> But may be I have converted the port in wrong way.
> (It's because I was interested in *this* way.)
> 
> I have used CCmode register and havn't added the
> '(clobber (reg:QI CC_REGNUM))' to any insn that really clobber the
> CC_REGNUM just because AVR is'n needed in scheduling.
> I think that sequence of compare + cond-jump will exists in any
> compiler pass.
> The port was successfully tested without new regressions.
> What do you (MAINTAINERS) think about this ?

Interesting:

- might you be able to post the resulting port files for review?

- are you proposing that all conditional branches are then required to be
  explicitly paired with a corresponding immediately preceding compare
  instruction?

  (if so, how is this a good thing, observing that it's fairly typical for
  conditional branches to be naturally based on comparisons against 0
  resulting from the immediately preceding operation, which would otherwise
  not have required an explicit compare?)

thanks




Re: Suggestion for a fix to Bug middle-end/20177

2005-03-18 Thread Paul Schlie
> Steven Bosscher wrote:
>> Mostafa Hagog <[EMAIL PROTECTED]> wrote:
>> This is interesting, so there could be cases were want to copy CC
>> register when doing SMS.  what happens if we want to move the set
>> of a CC to another iteration of the loop ? or the use of the CC ?  but
>> usually this is couldn't happen in a simple loop, right? the use of CC
>> is eventually used in a branch, or there is something that I am missing ?
>
> IIRC these notes are for CCO, and you have to move the CC setter
> and user together.

- unless it can be guaranteed that the particular setter's cc will be
  preserved (i.e. not corrupted by successive operations) prior to its
  ultimate use; or, alternatively, the setter's cc is regenerated, typically
  by comparing its previously computed data result with 0 (to regenerate
  the cc side effect, not necessarily the operation), if the user is moved
  past potentially corrupting successive operations; it would seem?

> Actually I think SMS for CC0 targets is Just Silly to do at all ;-)

- with the exception that it should be useful to eliminate otherwise
  unnecessary explicit comparison operations if an otherwise required
  operation (with the desired setter cc side effects) can be more
  optimally scheduled immediately prior to a user of its cc side
  effect (as is typically the case when a conditional branch is
  dependent on a previously computed result being compared against 0).

  (but I agree that, as cc0 targets tend to be lightly pipelined in-order
   issue and completion machines, they tend not to be highly sensitive to
   instruction ordering; the exception is that conditional branching code
   does typically benefit from ideally scheduled sequences which do not
   require re-synthesizing a cc through an explicit comparison (or, likely
   worse, a saved cc register) operation; which a good schedule would tend
   to avoid if possible, I'd guess.)
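
  As a concrete illustration of the kind of sequence in question (plain C,
  nothing target specific assumed): the subtraction that updates n
  immediately precedes the test against 0, so on a machine whose subtract
  already sets a zero flag, a scheduler that can see that side effect may
  place the branch right after it and drop the explicit compare entirely.

  /* illustrative only; assumes n > 0 on entry */
  unsigned int count_down (unsigned int n)
  {
    unsigned int steps = 0;
    do
      {
        n -= 1;
        steps += 1;
      }
    while (n != 0);   /* candidate for reusing the flags set by n -= 1 */
    return steps;
  }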




Re: AVR: CC0 to CCmode conversion

2005-03-18 Thread Paul Schlie
> From: Denis Chertykov <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>>> Denis wrote:
>>> I have converted the AVR port from CC0 to CCmode.
>>> But may be I have converted the port in wrong way.
>>> (It's because I was interested in *this* way.)
>>> 
>>> I have used CCmode register and havn't added the
>>> '(clobber (reg:QI CC_REGNUM))' to any insn that really clobber the
>>> CC_REGNUM just because AVR is'n needed in scheduling.
>>> I think that sequence of compare + cond-jump will exists in any
>>> compiler pass.
>>> The port was successfully tested without new regressions.
>>> What do you (MAINTAINERS) think about this ?
>> 
>> Interesting:
>> 
>> - might you be able to post the resulting port files for review?
> 
> patch against cvs:
> http://home.overta.ru/users/denisc/cc0-ccmode/cc0-ccmode.patch.gz
> new port:
> http://home.overta.ru/users/denisc/cc0-ccmode/avr.tgz

- Thank you, I've had the chance to review it to better understand.

>> - are you proposing that all conditional branches then required to be
>>   explicitly paired with a corresponding immediately previous compare
>>   instruction?
> 
> I founded that GCC isn't break cmp+jump sequences.
> (My port havn't scheduling.)

- Maybe presently, but there's nothing in the machine description which
  would seem to prohibit it either. (continued in following section)

>>   (if so, how is this a good thing observing that it's fairly typical
>>   for most conditional branches to be naturally based on comparisons
>>   against 0 resulting from the immediately preceding operation, which
>>   would have otherwise not required an explicit compare?)
> 
> I think that it's not good.

- although I agree that there's likely a cleaner, more consistent way to
  accurately describe and track the condition-code side-effects of AVR's
  ISA, it seems that this approach actually inhibits GCC from helping to
  optimize the code, as too much information is being hidden from it? For
  example:

  It seems that relying on peephole optimization to try to eliminate
  otherwise redundant, expensive explicit comparison operations on wider
  than byte sized operands (generated because multi-byte wide operations
  don't expose their cc-mode side-effects) may not be a good strategy?

  As it would seem that the initial hiding of this critical information
  only inhibits GCC from being able to optimally (not to mention safely)
  schedule basic block instruction sequences in an effort to eliminate
  the necessity of otherwise relatively expensive multi-byte comparisons
  to begin with. (Which would seem to be more optimal than relying on
  no scheduling, and likely only catching some of the potential
  opportunities to eliminate expensive compares after the fact?)

(however, I acknowledge that I may misunderstand GCC's instruction
 scheduling capabilities, and/or be missing something more significant)

Thanks, -paul-




Re: AVR: CC0 to CCmode conversion

2005-03-19 Thread Paul Schlie
> From: Denis Chertykov <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>>> From: Denis Chertykov <[EMAIL PROTECTED]>
>> - although I agree that there's likely a cleaner more consistent way
>>   to accurately describe and track condition-code side-effects of AVR's
>>   ISA, it seems that this approach actually inhibits GCC helping to
>>   optimize the code, as too much information is being hidden from it?
> 
> I don't want to hide useful information. I want to ignore unusable.
> 
> For example: even i386 port (probably better port) isn't
> "accurately describe and track condition-code side-effects"
> 
> FLAGS_REG just clobbered. It's not a true.
> As I understand '(clobber (reg:CC FLAGS_REG))' needed only for
> scheduling.
> AVR havn't scheduling and I want to omit such clobbers, but I have
> added special insns for optimization.
> For example: ...

- yes, but only for QImode operations, not otherwise; which precludes
  safe instruction re-ordering/scheduling (which I understand you rely on
  not occurring), with peephole optimization then used in an effort to
  identify opportunities to eliminate otherwise redundant compare
  operations.

- alternatively, why not accurately expose each instruction's FLAGS_REG
  side-effects, and enable GCC to re-order/schedule (maintaining the fully
  exposed data-flow-control-graph's sequential dependencies) to attempt
  to find more optimal sequences which may reduce the cost of an arbitrary
  instruction sequence within a basic block (including the potential
  elimination of explicit comparison operations, when an instruction which
  generates the necessary FLAGS_REG side-effect may be safely re-ordered
  prior to its requirement, with no instructions with interfering
  side-effects in between)?

>> ...
>>   As it would seem that the initial hiding of this critical information
>>   only inhibits GCC from being able to optimally (not to mention safely)
>>   schedule basic block instruction sequences in an effort to eliminate
>>   the necessity of otherwise relatively expensive multi-byte comparisons
>>   to begin with. (Which would seem to be more optimal than relying on
>>   no scheduling, and likely only catching some of the potential
>>   opportunities to eliminate expensive compares after the fact?)
> 
> Ohhh. I'm probably understand you. Are you mean that better way for
> splitting comparisions is cmpHI,
> cbranch -> cmpQI1_and_set_CC, cbranch1, cmpQI2_and_use_CC, cbranch2 ?
> In this case cmpQI1,cbranch1 may be in one basic block
> and cmpQI2, cbranch2 in another. You right it's impossible without
> "accurately describe and track condition-code side-effects".
> If you (or any other) want to support such splitting then clobbers
> must be added to insns.

- basically yes, but it would seem necessary to accurately describe
  instructions' side effects to do so optimally; as clobbering FLAGS_REG
  only prevents unsafe re-ordering, it doesn't enable optimal reordering
  (which would seem to be one of the few optimizations that GCC could do
  for avr, or other lightly pipelined in-order issue/completion targets,
  and would be unfortunate to prohibit)?

> I think that better to support
> cmpHI, cbranch  ->  cmpQI1_set_CC, cmpQI2_use_CC, cbranch. because
> AVR is a microcontroller and code size more important than code speed.

- I fully agree that code-size tends to be most important, which is why I
  believe it's important to enable instruction scheduling/re-ordering
  optimizations that are capable of eliminating potentially unnecessary
  explicit comparison operations for wider than byte-sized operand results
  against 0, if the instructions within a basic block can be safely
  rescheduled to eliminate them.

  Which would seem to require both that instruction FLAGS_REG side-effects
  be fully exposed, and correspondingly that conditional branches expose
  their dependency on them (and that all are visible during DFCG scheduling).

- possibly something like: ?

  (define_insn "*addhi3"
[(set (match_operand:HI 0 ...)
   (plus:HI (match_operand:HI 1 ...)
(match_operand:HI 2 ...)))
 (set (reg ZCMP_FLAGS)
   (compare:HI (plus:HI (match_dup 1) (match_dup 2)) (const_int 0)))
 (set (reg CARRY_FLAGS)
   (compare:HI (plus:HI (match_dup 1) (match_dup 2)) (const_int 0)))]
""
"@ add %A0,%A2\;adc %B0,%B2
   ..."
[(set_attr "length" "2, ...")])

  (define_insn "*andhi3"
[(set (match_operand:HI 0 ...)
   (and:HI (match_operand:HI 1 ...)
   (match_operand:HI 2 ...)))
 (set (reg ZCMP_FLAGS)
   (compare:HI (and:HI (match_dup 1) (mat

Re: AVR: CC0 to CCmode conversion

2005-03-19 Thread Paul Schlie
Sorry, I meant to denote the setting of the FLAGS_REGS based solely on the
result, which should be implied by operand 0, the target of 3-operand
instructions; with the exception of compare, as its effective targets are
the FLAGS_REGS.

corrected below:

> From: Paul Schlie <[EMAIL PROTECTED]>
> - possibly something like: ?

  (define_insn "*minushi3"
[(set (match_operand:HI 0 ...)
   (minus:HI (match_operand:HI 1 ...)
 (match_operand:HI 2 ...)))
 (set (reg ZCMP_FLAGS)
   (compare:HI (match_dup 0) (const_int 0)))
 (set (reg CARRY_FLAGS)
   (compare:HI (match_dup 0) (const_int 0)))]
""
"@ add %A0,%A2\;adc %B0,%B2
   ..."
[(set_attr "length" "2, ...")])

  (define_insn "*andhi3"
[(set (match_operand:HI 0 ...)
   (and:HI (match_operand:HI 1 ...)
   (match_operand:HI 2 ...)))
 (set (reg ZCMP_FLAGS)
   (compare:HI (match_dup 0) (const_int 0)))]
""
"@ and %A0,%A2\;and %B0,%B2
   ..."
[(set_attr "length" "2, ...")])

  (define_insn "*comparehi"
[(set (reg ZCMP_FLAGS)
   (compare:HI (minus:HI (match_operand:HI 1 ...)
 (match_operand:HI 2 ...))
   (const_int 0)))
 (set (reg CARRY_FLAGS)
   (compare:HI (minus:HI (match_dup 1) (match_dup 2))
   (const_int 0)))]
""
"@ cp %A0,%A0\;cpc %B0,%B1
   ..."
[(set_attr "length" "2, ...")])

  (define_insn "branch"
[(set (pc)
   (if_then_else (condition (reg ZCMP_FLAGS) (match_operand 1 ...))
 (label_ref (match_operand 0 ...))
 (pc)))]
"* return ret_cond_branch (operands);
   ..."
[(set_attr "type" "branch")])




Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> Marek Michalkiewicz wrote:
> I'm looking into adding support for ATmega256x and larger devices to
> the AVR port.  This means that program memory addresses no longer fit
> in 16 bits - and I'm looking how to avoid making pointers larger.

- I fully agree, and just can't keep from wondering if this may be most
  efficiently accomplished by simply requiring all function entry points
  to be aligned to two instruction words. Although there will be somewhat
  more padding than otherwise required for code mapped below 64K, apps
  with even as many as 1K or so functions (which is likely a lot for even
  large avr apps) would only see a worst-case overhead of 1K instruction
  words, likely averaging closer to 500 words, which seems like a small
  price to pay for an additional 64K words of program space (and would
  likely be less than the corresponding overhead of having to thread
  function entries to their bodies otherwise).

  All function pointers would then simply first set the extended code
  address register bit to the logical high order address bit of the
  16-bit code pointer, which is likely best physically mapped into the
  low-order bit of the function pointer so that it does not need to be
  cleared prior to use (effectively implying that function entries below
  64K are aligned to even word addresses, and those above 64K to odd word
  addresses).
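
  A small sketch of the bit packing being described (purely illustrative;
  encode() would in practice be the linker's job, and the alignment rule
  above is what makes the low bit safe to leave set):

  typedef unsigned short fptr16;    /* 16-bit function "pointer"        */
  typedef unsigned long word_addr;  /* full 17-bit word address         */

  static fptr16 encode (word_addr a)          /* link time              */
  {
    return (fptr16) a;          /* bit 0 == bit 16 by the alignment rule */
  }

  static word_addr decode (fptr16 p)          /* call time              */
  {
    return ((word_addr) (p & 1) << 16) | p;   /* extended bit from bit 0 */
  }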

  (no jump tables, etc, required)

  ???




Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
>> Marek Michalkiewicz wrote:
>> I'm looking into adding support for ATmega256x and larger devices to
>> the AVR port.  This means that program memory addresses no longer fit
>> in 16 bits - and I'm looking how to avoid making pointers larger.
> 
> - Fully agree, just can't keep from wondering if this may be most
>   efficiency accomplish by simply requiring the alignment of all
>   function entry points to be two instruction word aligned. Thereby
>   although there will be some more inefficiency than otherwise required
>   for code mapped @ <64K; apps with even as many as 1K or so functions
>   (which is likely a lot for even large avr apps), it would only imply
>   a worst-case inefficiency of 1K instruction words, but likely averaging
>   closer to 500 words, which seems like a small price to pay for an
>   additional 64K words of program space. (and would likely less than the
>   corresponding overhead of having to thread function entries to their
>   bodies otherwise).
> 
>   Thereby all function address pointers simply first set the extended
>   code address register bit to the logical high order address bit of the
>   16-bit code pointer. which technically is likely best physically mapped
>   into the low-order bit of the function pointer so that it does not need
>   to cleared prior to use (effectively implying that function entries <64K
>   are aligned to even word addresses, and those >64K are aligned to odd word
>   address.
> 
>   (no jump tables, etc, required)

- and likely continue to assume that the static-const progmem mapped data
  simply remains limited to a maximum of 64K-bytes mapped into the lower 32K
  words of program memory space to keep its addressing and access simple,
  uniform and efficient within the constraints of AVR's ISA. (which seems
  likely reasonable?)





Re: AVR: CC0 to CCmode conversion

2005-03-19 Thread Paul Schlie
> From: Björn Haase <[EMAIL PROTECTED]>
> I have the impression that you are trying to open open doors :-) : If IIUC
> what Denis aims to do is to segment the re-organization of the back-end into
> several independent small steps. One step will be the cc0 -> CC_mode issue he
> is addressing now. The splitting issue would be one of the following steps.
> One will have to verify this point, but it seems that only the splitting
> issue requires accurate tracking of all the clobbers/settings of the
> condition code.
> 
> In my opinion segmenting the rework of the back-end would indeed be the best
> approach, also because I expect that the instruction patterns *with*
> splitting will be fairly different. E.g. I do not think that the "addsi3"
> will be present any more. So it would be probably a lot of useless work to
> add all of the clobbers for instruction patterns that are likely to vanish in
> the near future.

Thank you; however, I still don't understand the advantage of adopting an
intermediate step which seems only to prohibit all forms of scheduling, and
which will likely produce inferior code when multi-byte comparisons against
0 can't be peephole optimized away because the operation which may have
produced equivalent side-effects doesn't happen to immediately precede
them.

Maybe I misunderstand the generality of the specified compare/if-then-else
peephole optimizations?  Are they guaranteed to match any opportunity to
eliminate an otherwise redundant multi-byte comparison against 0, by
forcing an existing equivalent side-effect producing operation to be safely
placed immediately preceding its requirement (thereby effectively forcing
an instruction ordering)?

If this is the case, then my concerns are mostly satisfied; but I still
don't think it's a good idea to hide side-effects, as although they may not
be fully leveraged at the moment, hiding them would seem only to hurt, and
never help, in any circumstance. (But maybe I misunderstand; are there any
benefits to hiding them?)




Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Marek Michalkiewicz <[EMAIL PROTECTED]>
>> - Fully agree, just can't keep from wondering if this may be most
>>   efficiency accomplish by simply requiring the alignment of all
>>   function entry points to be two instruction word aligned. Thereby
> 
> This only doubles the available address space, and I'd rather not do
> it all again (this time with 4-word alignment) if 512K chips appear.

- understood, though that seems unlikely; it will probably take Atmel at
  least 2-3 years to stabilize production of the 256K devices, and the
  volume potential of larger devices vs. competitive offerings likely
  couldn't justify their development (just an opinion).

> But function entry points are not the only problem - indirect jumps are
> another (as you can see in the subject of my message), they can point
> somewhere within a function (so function alignment may not help here).
> On the other hand, indirect_jump is rarely seen, so it must be correct
> but doesn't have to be very efficient (OK if it costs even a few more
> instructions to stay in the low 64K words).

- Sorry, I'm confused; can you give me an example of a legal C expression
  specifying an indirect jump to an arbitrary location within a function?

  (as unless I misunderstand, there's no such thing?)

  I suspect they only exist as a result of a possible switch statement
  optimization strategy, which should have nothing to do with the size of
  the target's data pointers; the tables are therefore likely stored and
  accessed however convenient by the compiler in program memory (not data
  memory), independently of the representation chosen for function
  pointers.

  (I'd guess)

> So, I'm trying to figure out when the indirect_jump pattern can actually
> be generated on the AVR (haven't yet seen it in any real application),
> and where does the pointer comes from (to see if the jump target can be
> moved to the low 64K words somehow).
> 
> Marek




Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Marek Michalkiewicz <[EMAIL PROTECTED]>
> Good question - I can't.  On the other hand, the manual says:
> 
> `indirect_jump'
> An instruction to jump to an address which is operand zero.  *This pattern
> name is mandatory on all machines.*
> 
> Why would it be mandatory if it was not truly needed?  If the manual is
> correct, it seems this pattern is truly needed (not just an optional
> optimization like some other patterns).
> 
> If it is impossible on the AVR, it could be implemented with invalid
> assembler output (so we get an error if "impossible" ever happens).
> But I'd like to be sure if this is really the case.  GCC is not only
> a C compiler, perhaps indirect_jump is needed for some other language?

- I believe it's simply a vehicle to allow the target to describe how to
  jump indirectly to an address, which may be required if the compiler
  chooses to generate a static jump table mapped, presumably, into the
  program's "text" section; therefore I'd guess the right thing to do
  would be (assuming operand 0 is a progmem reference) to load 2 words
  from progmem for 256K devices (or 1 word otherwise) into the
  appropriate registers, then execute an extended jump instruction
  (or a regular jump otherwise).

  (as I'd hope the compiler would never map static jump tables that it
   chooses to generate into the "data" section, but I can't find any
   description of under what circumstances it may generate them, or
   where it puts them)

(again, just my guess)




Re: AVR: CC0 to CCmode conversion

2005-03-19 Thread Paul Schlie
> From: Björn Haase <[EMAIL PROTECTED]>
> In my opinion segmenting the rework of the back-end would indeed be the best
> approach, also because I expect that the instruction patterns *with*
> splitting will be fairly different. E.g. I do not think that the "addsi3"
> will be present any more. So it would be probably a lot of useless work to
> add all of the clobbers for instruction patterns that are likely to vanish in
> the near future.

Related more specifically to the above comment:

- I understand the desire not to add stuff which is only likely to be
  removed/replaced in the near future.

- at that "near future" point in time, do you anticipate all side-effect
  STATUS-FLAG dependencies between split byte operations to be fully
  exposed such that it will guarantee that potential scheduling
  optimizations will never unsafely reorder them?

- in lieu of replacing the existing target files with a short term
  interim solution, might it make sense to check it in as a parallel
  avr target description, which could alternatively be built with, say,
  --target=new-avr; thereby providing a convenient mechanism by which
  it could be both updated and experimented with without potentially
  disrupting the base-line existing target files; then, when it's
  stabilized and clearly superior, it could replace the older ones and
  be deleted (or kept as an experimental vehicle, to test further
  refinements without disrupting the base-line files)?





Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
>> From: Marek Michalkiewicz <[EMAIL PROTECTED]>
>> Good question - I can't.  On the other hand, the manual says:
>> 
>> `indirect_jump'
>> An instruction to jump to an address which is operand zero.  *This pattern
>> name is mandatory on all machines.*
>> 
>> Why would it be mandatory if it was not truly needed?  If the manual is
>> correct, it seems this pattern is truly needed (not just an optional
>> optimization like some other patterns).
>> 
>> If it is impossible on the AVR, it could be implemented with invalid
>> assembler output (so we get an error if "impossible" ever happens).
>> But I'd like to be sure if this is really the case.  GCC is not only
>> a C compiler, perhaps indirect_jump is needed for some other language?
> 
> - I believe it's simply a vehicle to allow the target to describe how to
>   jump indirectly to an address, which may be required if the compiler
>   chooses to generate a static jump table, presumably mapped into the
>   program's "text" section; so I'd guess the right thing to do would be
>   (assuming operand-0 is a progmem reference) to load 2 words from
>   progmem for 256K devices (or 1 word otherwise) into the appropriate
>   registers, then execute an extended jump instruction (or a regular
>   jump otherwise).
> 
>   (as I'd hope the compiler would never map static jump tables that it
>    chooses to generate into the "data" section, but I can't find any
>    description of under what circumstances it generates them, or where
>    it puts them?)
> 
> (again, just my guess)

- or possibly GCC may try to be clever by turning calls to functions
  marked with the noreturn attribute into jumps?





Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Giovanni Bajo <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> wrote:
>> - Sorry, I'm confused; can you give me an example of legal C
>>   expression specifying an indirect jump to an arbitrary location
>>   within a function?
> It is possible in GNU C at least:

- thanks, obviously wasn't aware of that.

(which unfortunately creates an interesting problem; though it may not
 be as worth trying to solve as investing in other improvements which
 would more generally benefit all devices)
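
For reference, a minimal sketch of the GNU C construct in question
(labels-as-values plus a computed goto); the function and label names
here are only illustrative:

int f(int which)
{
    void *target = which ? &&two : &&one;  /* GNU C label addresses */
    goto *target;                          /* GNU C computed goto   */
one:
    return 1;
two:
    return 2;
}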





Re: AVR indirect_jump addresses limited to 16 bits

2005-03-19 Thread Paul Schlie
> From: Paul Brook <[EMAIL PROTECTED]>
> Don't we know which labels are targets of indirect jumps?
> So the proposed restriction now becomes: functions *and targets of indirect
> jumps* must be aligned to an N word boundary. I'd guess that the latter are
> sufficiently rare that this is still an acceptable restriction.

- seems plausible, if it were reasonably easy to identify such labels to
  the linker with a corresponding alignment requirement, to the exclusion
  of all other labels? (and the restricted-alignment approach were deemed
  acceptable?)





Re: AVR: CC0 to CCmode conversion

2005-03-20 Thread Paul Schlie
> From: Richard Henderson <[EMAIL PROTECTED]>
> On Sun, Mar 20, 2005 at 01:59:44PM +0300, Denis Chertykov wrote:
>> The reload will generate addhi3 and reload will have a problem with
>> two modified regs (ZCMP_FLAGS, CARRY_FLAGS) which will be a bad
>> surprise for reload. :( As I remember.
> 
> In order to expose the flags register before reload, you *must*
> have load, store, reg-reg move, and add operations that do not
> modify the flags.
> 
> Note, for instance, that i386 "add" instruction always modifies
> the flags, but the "lea" instruction does not.  So we emit the
> later when reload emits an add.
> 
> If you cannot meet these requirements, then you must represent
> "setcc" and "compare-and-branch" patterns as a single insn until
> after reload.  You can then split them apart, followed by peep2
> patterns to remove compare patterns that are redundant with
> immediately preceeding arithmetic.

- OK, so GCC assumes that it may arbitrarily spill/reload at any
  point without destructively modifying the machine's state; as opposed
  to attempting to select optimal points to do so through analysis
  of the code's data-flow/control dependencies, and then, where
  necessary, re-synthesizing machine state after those actions to
  satisfy any dependencies which may remain.

- so in AVR's case, simply pretending that add operations don't modify
  CC state may only be asking for trouble; however, may it be sufficient
  to somehow force spill/reload to only use indexed/auto-inc/dec
  load/store operations, without inadvertently picking up a general
  add/sub/etc. operation in the process which would modify CC state?





Re: AVR: CC0 to CCmode conversion

2005-03-20 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
>> From: Richard Henderson <[EMAIL PROTECTED]>
>> On Sun, Mar 20, 2005 at 01:59:44PM +0300, Denis Chertykov wrote:
>>> The reload will generate addhi3 and reload will have a problem with
>>> two modified regs (ZCMP_FLAGS, CARRY_FLAGS) which will be a bad
>>> surprise for reload. :( As I remember.
>> 
>> In order to expose the flags register before reload, you *must*
>> have load, store, reg-reg move, and add operations that do not
>> modify the flags.
>> 
>> Note, for instance, that i386 "add" instruction always modifies
>> the flags, but the "lea" instruction does not.  So we emit the
>> later when reload emits an add.
> 
> - OK, so GCC assumes that it may arbitrarily spill/reload at any
>   point without destructively modifying the machine's state; as opposed
>   to attempting to select optimal points to do so through analysis
>   of the code's data-flow/control dependencies, and then, where
>   necessary, re-synthesizing machine state after those actions to
>   satisfy any dependencies which may remain.
> 
> - so in AVR's case, simply pretending that add operations don't modify
>   CC state may only be asking for trouble; however, may it be sufficient
>   to somehow force spill/reload to only use indexed/auto-inc/dec
>   load/store operations, without inadvertently picking up a general
>   add/sub/etc. operation in the process which would modify CC state?

- what about blk moves? (as they would seem most likely to destructively
  modify the machine's cc state in most implementations, since their
  implementation implies a conditional loop; or are they an exception?
  if so, why?)

>> If you cannot meet these requirements, then you must represent
>> "setcc" and "compare-and-branch" patterns as a single insn until
>> after reload.  You can then split them apart, followed by peep2
>> patterns to remove compare patterns that are redundant with
>> immediately preceeding arithmetic.

- what would be the requirements to enable the SMS pass (assuming it to
  be the most likely appropriate one) to try to reorder operations such
  that a naturally occurring operation with the required side effects
  for a conditional branch may be moved closer to it, such that an
  otherwise explicit compare may be optimally eliminated? (as opposed
  to a more coincidental peephole opportunity)



  




Re: AVR: CC0 to CCmode conversion

2005-03-21 Thread Paul Schlie
> From: Denis Chertykov <[EMAIL PROTECTED]>
>> - possibly something like: ?
>> 
>>   (define_insn "*addhi3"
>> [(set (match_operand:HI 0 ...)
>>(plus:HI (match_operand:HI 1 ...)
>> (match_operand:HI 2 ...)))
>>  (set (reg ZCMP_FLAGS)
>>(compare:HI (plus:HI (match_dup 1) (match_dup 2)) (const_int 0)))
>>  (set (reg CARRY_FLAGS)
>>(compare:HI (plus:HI (match_dup 1) (match_dup 2)) (const_int 0)))]
>> ""
>> "@ add %A0,%A2\;adc %B0,%B2
>>..."
>> [(set_attr "length" "2, ...")])
> 
> You have presented a very good example. Are you know any port which
> already used this technique ?
> As I remember - addhi3 is a special insn which used by reload.
> The reload will generate addhi3 and reload will have a problem with
> two modified regs (ZCMP_FLAGS, CARRY_FLAGS) which will be a bad
> surprise for reload. :( As I remember.

Thanks for your patience, and now that I understand GCC's spill/reload
requirements/limitations a little better; I understand your desire to merge
compare-and-branch.

However, as an alternative to merging compare-and-branch to overcome the
fact that using a conventional add operation to calculate the effective
spill/reload address for FP offsets >63 bytes would corrupt the machine's
cc-state that a following conditional skip/branch may be dependent on,
I wonder if it may be worth considering simply saving the status register
to a temp register and restoring it after computing the spill/reload
address when a large FP offset is required. (which seems infrequent
relative to offsets within the 63-byte range, so it would typically not
be needed?)

If this were done, then not only could compares be split from branches and
all side effects fully disclosed; but all compares against 0 resulting from
any arbitrary expression calculation could be directly optimized up front,
without relying on a subsequent peephole optimization to accomplish it.

Further, if there were a convenient way to determine whether the now fully
exposed cc-status register was "dead" (i.e. having no dependents), then it
should be possible to eliminate its preservation when calculating large FP
offset spill/reload effective addresses, as it would be known that no
subsequent conditional skip/branch operations were dependent on it.

With this same strategy, it may even be desirable to conditionally
preserve the cc-status register around all corrupting effective address
calculations when the cc-status register is not "dead", as that would
seem potentially more efficient than otherwise needing to re-compute an
explicit comparison afterward?

(Observing that I'm basically suggesting treating the cc-status register
 like any other hard register, whose value would need to be saved/restored
 around any corrupting operation if its value has live dependents; what's
 preventing GCC's register and value dependency tracking logic from being
 able to manage its value properly just like it can for other register
 allocated values?)





Re: AVR indirect_jump addresses limited to 16 bits

2005-03-21 Thread Paul Schlie



> From: Marek Michalkiewicz <[EMAIL PROTECTED]>
>> On Sun, Mar 20, 2005 at 04:29:01PM -0800, Richard Henderson wrote:
>> The easiest way is to do this in the linker instead of the compiler.
>> See the xstormy16 port and how it handles R_XSTORMY16_FPTR16.  This
>> has the distinct advantage that you do not commit to the creation of
>> an indirect jump until you discover that the target label is outside
>> the low 64k.
> 
> Looks perfect to me.  So we are not the first architecture needing
> such tricks...  AVR would need 3 new relocs, used like this:
> 
> .word pm16(label)
> 
> ldi r30,pm16_lo8(label)
> ldi r31,pm16_hi8(label)
> 
> and the linker can do the rest of the magic (add jumps in a section
> below 64K words if the label is above).
> 
> Cc: to Denis, as I may need help actually implementing these changes
> (you know binutils internals much better than I do).

- yup, and nicer than trying to play games with alignment, etc.,

  And just to double check, using the earlier example:

> int foo(int dest)
> {
>__label__ l1, l2, l3;
>void *lb[] = { &&l1, &&l2, &&l3 };
>int x = 0;
> 
>goto *lb[dest];
> 
> l1:
>x += 1;
> l2:
>x += 1;
> l3:
>x += 1;
>return x;
> }

  It would seem that the only time the pm16(label) address would ever be
  used would be as an initializing constant pointer value assigned to a
  _label_/function pointer variable; as a CALL/JUMP LABEL instruction
  would otherwise be used to call/jump to the true entry point directly.
  (is that correct?)





Re: AVR: CC0 to CCmode conversion

2005-03-21 Thread Paul Schlie
> From: Denis Chertykov <[EMAIL PROTECTED]>
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>> ...
>> (Observing that I'm basically suggesting treating the cc-status register
>>  like any other hard register, whose value would need to be saved/restored
>>  around any corrupting operation if its value has live dependents; what's
>>  preventing GCC's register and value dependency tracking logic from being
>>  able to manage its value properly just like it can for other register
>>  allocated values?)
> 
> Why not CCmode register ?

- For what? As it would seem that as long as all rtl instruction data-flow
  dependencies are satisfied, the code will execute the program correctly?

 (as all conditionals are effectively based upon a comparison of a result
  against 0, and GCC always converts both operands to the same type, all
  that's necessary to know is whether that type is signed or unsigned, as
  even floats compare just like signed integers. Therefore it would seem
  that the only difference between a compare operation and a subtract is
  that it doesn't produce a value result which clobbers one of its
  operands; otherwise they're identical, so a compare is arguably just an
  optimization to be used instead of a subtract when the result value
  isn't needed, or isn't valid as when comparing floats. It would seem?)


 




Re: A plan for eliminating cc0

2005-03-25 Thread Paul Schlie
> Ian Lance Taylor  writes:
> We would like to eliminate cc0 and the associated machinery from the
> compiler, because it is complicated and not supported on popular or
> modern processors.  Here is a plan which I think could accomplish that
> without unreasonable effort.

I pre-apologize if this is a dumb question, but:

Does GCC truly need to identify/treat condition state registers uniquely
from any other value produced as a result of a calculation?

As it would seem that condition state dependencies may be tracked just as
any other defined (virtual or physical) register value dependency is
between instructions; the only thing unique about the machine's condition
state value is that it tends to be an implied rather than explicit operand
of an instruction, just as accumulators tend to be on some machines?

Then it would seem that as long as a target accurately describes its
instruction dependencies on any implied register operands it chooses to
partition machine state into, they may be treated just as any register
value dependency would be, irrespective of its value or purpose?

Thereby GCC need only supply the statically determined "branch condition",
as a function of a single previously computed register value compared
against 0*, to a logical branch instruction in addition to a basic block
label; which the target may implement as being composed of any combination
of instruction and/or register value dependencies specified as necessary,
without GCC needing to be aware of their purpose, only their dependencies?

(* as 0 tends to be the basis of all arithmetic and logical operation result
comparisons, including comparison operations themselves, which tend to be
simply a subtraction whose result is compared against 0 but not saved;
therefore explicit comparison operations are truly optimizations, not a
fundamental operation, so should never be explicitly required. i.e.:
conditional branching should be based on the canonical comparison of an
arbitrary value against 0:
 
 (set rx (op ...)) ;; rx is the result of an arbitrary operation
 (branch bc rx label)  ;; (branch bc rx label) :: if (rx bc 0) goto label;

 Where some targets may need to generate an explicit subtract (or compare),
 or others may specify an implied register in which the result of rx ?? 0
 is stored and on which the branch is dependent, or the branch may compute
 the comparison between rx and 0 itself, etc.)





Re: A plan for eliminating cc0

2005-03-25 Thread Paul Schlie
> From: Ian Lance Taylor 
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>> Does GCC truly need to identify/treat condition state registers uniquely
>> from any other value produced as a result of a calculation?
> 
> No, it doesn't.  The change I am proposing removes the unique handling
> of condition state registers, and treats them like other registers.
> The unique handling of condition state registers is historical, and
> arose because of the characteristics of the initial gcc targets (e.g.,
> vax, m68k).
> 
> The idea to do this is not mine; for more background see the
> discussion of cc0 here:
> http://gcc.gnu.org/wiki/general%20backend%20cleanup

Thank you. After reviewing that reference, a question comes to mind:

Is there any convenient way to reference the newly set register by an
instruction, as opposed to otherwise needing to redundantly re-specify
the operation producing its value again?

Thereby enabling something like:

(insn xxx [(set (reg: A) (xxx: (reg: B) (reg: C)))
   (set (reg: CC) (newly-set-reg: A))
  )

(insn branch-equal (set (pc) (if_then_else
   (ge: CC 0)
   (label_ref 23)
   (pc)))
...)

Thereby enabling an xxx instruction to specify that the CC register value is
virtually assigned the result of the instruction's operation (i.e. no code
will actually be generated for assignments to the CC register), upon which
an independently specified conditional branch may be defined to be dependent
(on the virtual CC register). Which would seem to be a simple way to closely
approximate the semantics of a global cc-state register?




Re: A plan for eliminating cc0

2005-03-25 Thread Paul Schlie
> From: Ian Lance Taylor 
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>> Thereby enabling something like:
>> 
>> (insn xxx [(set (reg: A) (xxx: (reg: B) (reg: C)))
>>(set (reg: CC) (newly-set-reg: A))
>>   )
>> 
>> (insn branch-equal (set (pc) (if_then_else
>>(ge: CC 0)
>>(label_ref 23)
>>(pc)))
>> ...)
>> 
> Yes, a backend could be implemented this way.  There are two problems.
> 
> 1) Many of the optimizers analyze instructions by first calling
>single_set and working with the results of that.  For example,
>combine won't work with any insn for which single_set returns NULL.
>And single_set will normally return NULL for your insn xxx above.

- As leading-edge processor architectures seem to be slowly increasing
  intra-instruction-level parallelism (i.e. backing slowly away from a
  pure one-operation/one-instruction RISC ISA, toward enabling a single
  instruction to operate on a small set of operands which may yield
  multiple, non-interdependent results simultaneously, i.e. results
  dependent only on the operands), are there any plans to eliminate this
  "single-set" restricting presumption, thereby enabling the potential
  optimization of multi-operand/operation/result instruction sequences?

- However regardless, although the above represents a primitive example
  of intra-instruction-level multi-operation/result parallelism, I wonder
  if it's a restricted enough case that optimization may simply be enabled
  for all instructions which have a "single live set"?

  In other words, although I understand parallel instruction optimization
  may be beyond the capabilities of many of the present optimizers, it
  seems "safe" to enable optimization of the "live" path, which would be
  the case when only a single "set" has live dependencies and the remaining
  "set(s)" are "dead" (i.e. have no dependents), therefore irrelevant?

  Which would seem to be most likely the case when a second parallel
  "set" is used to specify an update to a global condition-state, as most
  updates won't likely have conditional branch dependents? (Therefore it's
  safe to optimize the data path; or, in cases when the data path is dead,
  the condition-state path may be optimized irrespective of the data path.
  Which would be analogous to turning a subtract into a compare
  instruction when the result of the subtract isn't used other than as
  required to set the global condition-state.)
  
> 2) Reload requires the ability to insert arbitrary instructions
>between any pair of instructions.  The instructions inserted by
>reload will load registers from memory, store registers to memory,
>and possibly, depending on the port, move values between different
>classes of registers and add constants to base registers.  If
>reload can't do that without affecting the dependencies between
>instructions, then it will break.  And I don't think reload will be
>able to do that between your two instructions above, on a typical
>cc0 machine in which every move affects the condition codes.

- Understood; however, along a similar line of thought, it would seem "safe"
  to simply "save/restore" the global condition-state around all potentially
  destructive memory operations.

  Which at first glance may seem excessive; however, observe that the most
  frequently required simple load/store operations on most targets won't
  modify global condition-state, and in the circumstances when more complex
  operations which may modify it are required, it need only be saved/restored
  if it has dependents. Which, as observed above, won't typically be
  the case.

  So overall it appears that there's likely very little effective overhead in
  making all memory transactions effectively transparent to the global
  condition-state when it matters, as the global condition-state will only
  need to be saved/restored in the likely few circumstances when a complex
  memory transaction may modify it and it has dependents?

  (does that seem rational? or may I be missing something fundamental?)

thanks again, -paul-





Re: A plan for eliminating cc0

2005-03-26 Thread Paul Schlie
> From: Ian Lance Taylor 
> I'm also not aware of processors changing as you describe, except for
> the particular special case of SIMD vector instructions.  gcc can and
> does represent vector instructions as a single set.

- Understood; unfortunately, hiding the multiple-set nature of instructions
  which simultaneously set data and condition-state register values in an
  abstract new data mode type, as simd instructions do, won't likely be
  helpful, as unlike multiple discrete simd values embedded in a vector
  data type, data and condition values tend to have different subsequent
  dependent evaluation paths.

> Yes, but the point of representing changes to the condition flags is
> precisely to permit optimizations when the condition flags are used,
> and that is precisely when the single_set assumption will fail.  You
> are correct that in general adding descriptions of the condition code
> changes to the RTL won't inhibit optimizations that don't meaningfully
> set the condition code flags.  But it will inhibit optimizations which
> do set the condition code flags, and that more or less obviates the
> whole point of representing the condition code setting in the first
> place.

- OK, I think I understand the difference in our perspective on the issue.

  In general it seems that much of the concern about the optimization
  complexity of multi-set instructions relates to attempting to
  iteratively optimize instruction mappings?

  I presume that "code" can/should be optimally generated once by initially
  optimally covering the rtl representing a basic block (with minimal cost
  in either storage, cycles, or some hybrid of both); where there's then no
  need to ever subsequently screw with it again (although the various basic
  block partitionings resulting from various loop transformation strategies,
  etc. may require multiple mappings to determine their relative costs).

  This presumption basically requires that the target ideally be described
  as accurately as possible entirely in rtl, with no reliance on procedural
  or peephole optimization, relying entirely on GCC to optimally cover the
  program's basic-block rtl with rtl instruction description equivalents;
  thereby, by fully exposing all dependencies, an optimal instruction
  schedule will naturally result from an optimal rtl graph covering,
  without needing to perform further explicit optimization, for example.

  (is this not feasible if the target is accurately described in rtl?)





Re: A plan for eliminating cc0

2005-03-27 Thread Paul Schlie
> From: Ian Lance Taylor 
>> Paul Schlie <[EMAIL PROTECTED]>
>>   (is this not feasible if the target is accurately described in rtl?)
> 
> I don't know how to respond to this.  I'm discussing a way to achieve
> an incremental improvement in gcc.  You seem to be discussing a
> different compiler.  I don't think my suggestions for incremental
> improvement are relevant to creating your compiler: they don't help,
> and they don't hurt.
> 
> Perhaps somebody else has something to say about this, but I don't.
> I'm a practical guy: I compile code with the compiler I have, not the
> compiler I might want or wish to have.

- sorry, there's no need to respond; I need to concentrate on reality too.




Re: A plan for eliminating cc0

2005-03-27 Thread Paul Schlie
Hi Ian, (getting back to reality) upon reviewing things further, it appears
that if GCC could relax its single-set restriction to enable a restricted
form of multi-set instructions to be included in optimizations, then ISAs
whose instructions either implicitly set or depend on global machine state
which directly corresponds to that instruction's single result value could
be both accurately modeled and fully safely optimized.

More specifically, if GCC enabled set to optionally specify multiple targets
for a single rtl source expression, i.e.:

  (set ((reg:xx %0) (reg CC) ...) (some-expression:xx ...))

Then it would seem that this single common rtl source expression may be
optimized as if it were a single-set expression, with the only restriction
that instructions with multiple active sets may only be merged with their
source operand instructions when all of their live sets can correspondingly
be represented in the newly merged instruction, i.e.:

(add %0 %1); (set ((reg %0) (reg CC)) (add (reg %0) (reg %1)))
   ; %0 live, CC dead
(sub %0 1) ; (set ((reg %0) (reg CC)) (sub (reg %0) (int 1)))
   ; %0 live, CC live
=>
(add-1 a b); (set ((reg %0) (reg CC)) (sub (add (reg %0) (reg %1)) (int 1)))
   ; %0 live, CC live

Thereby enabling clean fully exposed global-cc-status target descriptions:

(insn <op> (set ((reg %0) (reg CC)) (<op> (reg: %0) (reg: %1)))
   ...)

(insn brne (set (PC) (if_then_else
   (ne: 0 (reg CC))
   (label_ref %0)
   (PC)))
   ...)

Might this likely work?  And if so, possibly be worthy of consideration to
enable the more efficient description and optimization of traditional cc0
target machines (and possibly be beneficial for other ISA's as well)?




Re: A plan for eliminating cc0

2005-03-28 Thread Paul Schlie
> From: Ian Lance Taylor 
>> Paul Schlie <[EMAIL PROTECTED]> writes:
>> Might this likely work?  And if so, possibly be worthy of consideration to
>> enable the more efficient description and optimization of traditional cc0
>> target machines (and possibly be beneficial for other ISA's as well)?
> 
> It seems like it might work to me.  It might be a reasonable direction
> to move in down the road, if somebody does the work.

I'll try.

Do you suspect it's best to define an extended syntax for set, or possibly
just verify that multiple sets have equivalent source expressions, or
something in between or otherwise?







Re: A plan for eliminating cc0

2005-03-29 Thread Paul Schlie
> From: Paul Koning <[EMAIL PROTECTED]>
>> "tm" == tm gccmail <[EMAIL PROTECTED]> writes:
>>> On 25 Mar 2005, Ian Lance Taylor wrote:
>>> Safe but very costly.  It assumes that every processor has a cheap
>>> way to save and restore the condition codes in user mode, which is
>>> not necessarily the case.  And it assumes that the save and restore
>>> can be removed when not required, which is not obvious to me.
>>
>>  tm> Not necessarily.
>>
>>  tm> You need the ability to regenerate the condition code. There's at
>>  tm> least two ways of doing this:
>> 
>>  tm> 1. Saving/restoring the condition code
>> 
>>  tm> 2. Rematerializing the condition code. This would usually be a
>>  tm> simple load which is faster than the save/load combo of #1.
> 
> "is faster" -- on some processors, yes.  Not on all.  For example, an
> MFPS/MTPS pair to a register on a PDP-11 (when available) is likely to
> be faster than a load that references memory.

Agreed; from the best I can tell in general, if a target's status code
register has live dependents across a destructive spill/restore, it seems
almost always preferable to save/restore it temporarily, either to a local
register or, secondarily, to the stack; as although it may be simpler
in some instances to regenerate it, this is only reasonably possible if the
source datum which the status code itself represents is intact, and it is
further complicated by the fact that some codes such as carry or overflow
are much harder to reliably recreate without literally having to track
and re-compute the sequence which produced them, which in general is both
difficult and likely expensive. (Therefore save/restoring the status around
destructive spill/restores seems like the simplest general solution, and
hopefully it is required infrequently enough to have insignificant impact.)

(but acknowledged, that depends on the need being relatively infrequent,
 by exposing and tracking the status register's dependents and thereby
 enabling conditional save/restore, and correspondingly ideally attempting
 to position spill/reloads at points where the status register naturally
 has no dependents, further helping eliminate the need for it)




Re: A plan for eliminating cc0

2005-03-30 Thread Paul Schlie
> Richard Henderson 
>> Ian Lance Taylor wrote:
>> OK, here is a different approach toward eliminating cc0, based on a
>> combination of my earlier proposal and what Alex described.  I'm
>> looking for comments from anybody.
>
> One potential problem: once the NOTICE_UPDATE_CC pass is done, we
> can no longer run *any* pass that reorders code, because we're
> left with
> 
>[(set (reg 1) (plus (reg 1) (reg 2)))
> (clobber (reg cc_reg))]
>
>[(set (pc) (if_then_else (lt (reg cc_reg) 0)
>(label 1)
>(label 2)))]
>
> Note that there's no longer any insn that *sets* cc_reg.  I think
> that's a bit dangerous.
> 
> One solution to this is to convert the new setter to
>
> [(set (reg 1) (plus (reg 1) (reg 2)))
>  (set (reg cc_reg) (unspec [(const_int 0)] cc_set))]
> 
> I don't think it matters what we set cc_reg to, just that *some*
> set is visible in the instruction stream.

This feels like a reasonable philosophical approach; but it would seem more
ideal if dependent cc-states* could be defined as being more explicitly
dependent on either this instruction's set register value, or its source
expression; thereby allowing potential future multi-set-aware optimizations
to optimize such instructions more intelligently without having to re-author
their descriptions again. Possibly via a macro enabling target descriptions
of the form:

  [(set ((reg %0) (cc-cmp) (cc-carry) ...) (<op> ...))]

Which could expand into whatever is deemed most ideal at the time, with the
goal of enabling intelligent multi-set instruction optimization, at least in
the case where all multi-set targets are logically dependent on the same
value state.

(cc-states*: given that it would be ideal to be able to specify that one
 type of instruction may alter or depend on a particular cc-state, but not
 others; thereby enabling the specification of ISAs which may have multiple
 logical state regs which may be set in parallel, and which subsequent
 instructions may independently depend on; thereby nicely generalizing
 the specification of global-state instruction result dependencies.)




Re: RFC: #pragma optimization level

2005-04-01 Thread Paul Schlie
> Joe Buck wrote:
>>Georg Bauhaus <[EMAIL PROTECTED]> writes:
>> | A busy-loop function is used to effect a delay, not too precise,
>> | but portably. Like
>> | 
>> | #define COUNT 1000
>> | 
>> | void f() {
>> |/*volatile*/ /*register*/ int i;
>> | 
>> |for (i = 0; i < COUNT; ++i)
>> |   ;
>
> >On Sat, Apr 02, 2005 at 01:48:56AM +0200, Gabriel Dos Reis wrote:
>> This must be an FAQ.  The above is no way of (no matter how popular
>> the urban legend makes it) implementing delay.  Adding a #pragma just
>> makes the situation worse.
>
> Unfortunately, where there is a good argument for not using empty loops
> as busy-waits, at one time it was documented GCC behavior that it would
> work, so we can't really blame the users for trusting the doc.
> 
> That's not to say that it was ever a good idea, because of the lack of
> control.  If you need a precisely timed busy-wait, an inline assembly
> construct is the best bet.

Fully agree; a C-based delay loop, which easily has a 2x margin of error,
is basically useless (even if the clock frequency were precisely known).
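
For concreteness, the kind of inline-asm busy-wait meant here: an
AVR-flavored sketch in the spirit of avr-libc's _delay_loop_1 (the function
name and exact cycle counts below are my own assumptions, not from this
thread):

#include <stdint.h>

/* The loop lives entirely inside the asm, so the compiler can neither
   delete nor reorder it; on AVR, dec is 1 cycle and a taken brne is 2,
   so this burns roughly 3*count cycles. */
static inline void delay_loop(uint8_t count)
{
    __asm__ volatile (
        "1: dec %0"  "\n\t"
        "   brne 1b"
        : "=r" (count)
        : "0" (count)
    );
}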





Re: Use Bohem's GC for compiler proper in 4.1?

2005-04-03 Thread Paul Schlie
> | But I doubt that projects to buy small linear gains in memory usage
> | are mainstream very worthwhile in the long run (non-linear gains are
> | *always* worth going after by contrast).
> 
> I wasn't aware that people were exclusively concentrating on small
> linear gains. 

although I don't know whether broader use of GC is appropriate, note
the obvious: small gains cumulatively produce larger gains, just as small
inefficiencies tend to cumulatively yield larger ones. so I would guess that
if memory utilization efficiency, which affects cache efficiency and thereby
performance, is considered important, every little bit ultimately counts
either for or against you. (so it would seem that if even only half a dozen
5-10% gains were all that remained to be had, they could enable a memory
reduction from 1GB -> 600MB; but acknowledged, as remaining marginal gains
drop below a few percent, their cumulative benefit is likely questionable,
although they may still be appropriate to enable a more localized
performance benefit).





Re: Obsoleting c4x last minute for 4.0

2005-04-05 Thread Paul Schlie
> Kazu Hirata  wrote:
> I would like to propose that the c4x port be obsoleted for 4.0.
> 
> c4x-*
> tic4x-*
>
>  The primary reason is that the port doesn't build.
>
>  Richard Sandiford's recent patch allows us to go further during the
>  build process, but the port still does not build.

Although I personally believe the port's use of a 32-bit QI mode is odd
(it should be possible in GCC to define it as SI mode without QI/HI modes
 being required, where correspondingly BITS_PER_UNIT should be presumed
 to represent just the minimum alignment required by the port, not
 necessarily the width of QI mode, which should likely be defined
 separately as the width of the target's byte), it would still be nice to
see GCC cleaned up to enable the port to build, as it stresses the few
remaining assumptions scattered throughout the source which should be
eliminated.

Such as in the unwind code, which need only declare the data modes required
by the target's supported data type sizes, and not presume that all the
sizes that dwarf can support are required to be represented on the target
even if they'll never be utilized; nor does it appear that the unwind word
size needs to be larger than the target's pointer size, as opposed to being
unnecessarily fixed, as presumed by ia64, as a 64-bit wide DI mode datum.
(Which may be the last few tweaks required to enable it to build?)




Re: Q: C++ FE emitting assignments to global read-only symbols?

2005-04-08 Thread Paul Schlie
>> On Fri, 2005-04-08 at 16:51 -0700, Dale Johannesen wrote:
>>> On Apr 8, 2005, at 4:40 PM, Mark Mitchell wrote:
 Daniel Berlin wrote:
 Your transform is correct.
 The FE is not. The variable is not read only.
 It is write once, then read-only.
>>>
>>> Diego, your analysis is exactly correct about what is happenning.
>>>
>>> I agree, in principle.  The C++ FE should not set TREE_READONLY on
>>> variables that require dynanmic initialization.  Until now, that's not
>>> been a problem, and it does result in better code.  But, it's now
>>> becoming a problem, and we have others way to get good code coming
>>> down the pipe.
>>>
>>> I do think the C++ FE needs fixing before Diego's change gets merged,
>>> though.  I can make the change, but not instantly.  If someone files a
>>> PR, and assigns to me, I'll get to it at some not-too-distant point.
>> 
>> It would be good to have a way to mark things as "write once, then
>> readonly" IMO. It's very common, and you can do some of the same
>> optimizations on such things that you can do on true Readonly objects.
>
> Some of these global properties probably belong in cgraph_var node's
> instead of shoving them into the tree structure.
> Especially if they are only going to be checked a few times (I can't
> imagine an optimization would ask the same variable if it is write-once
> more than once or twice)

Would it then still be possible for a target to identify all accesses to
"read-only" constant or initializing string/array/structure values which
may be designated to be literally stored in read-only memory, vs. being
in-lined as immediate values in the code, so that specialized instruction
sequences may be generated to access these values if required by the
target?

(i.e. where "read-only" constant values are truly literally stored in
 a dedicated ROM memory space, and accessed directly as required; vs.
 initializing RAM-based constants upon program startup.)
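
As a concrete (AVR-flavored) illustration of that case, using avr-libc's
progmem facilities; the identifiers here are mine, not from this thread:

#include <avr/pgmspace.h>   /* avr-libc; assumes an AVR target */

/* Kept in flash (program memory) rather than copied into RAM at startup. */
static const char msg[] PROGMEM = "ready";

/* Reading it back requires the LPM-based access sequence, i.e. exactly
   the kind of specialized instruction sequence asked about above. */
char first_byte(void)
{
    return (char)pgm_read_byte(&msg[0]);
}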




Re: Q: C++ FE emitting assignments to global read-only symbols?

2005-04-09 Thread Paul Schlie
> Richard Kenner wrote:
>>It would be good to have a way to mark things as "write once, then
>>readonly" IMO.  It's very common, and you can do some of the same
>>optimizations on such things that you can do on true Readonly objects.
>
> We used to do this in RTL and it caused all sorts of problems.
> 
> One is that supposed you have such a variable in a function and then
> you inline the function into a loop. Now all of a sudden, it's written
> more than once.

Wouldn't the ability to differentiate between the following be a good thing:

- constant variables (which need to be dynamically allocated & initialized)
vs.
- static constants (which don't need to be dynamically allocated or
  initialized themselves, but may be accessed to initialize non-static
  constants or variables, representing compile-time defined static data)

(As it would certainly be necessary if a target needed to treat their
accesses differently based on how/where their values were stored.  And it
would correspondingly be necessary to preserve the ability to differentiate
any indirect references to them as may be specified, passed, or returned as
arguments from function calls.)
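
A minimal C sketch of the two cases being distinguished (identifiers are
illustrative only):

/* "static constant": pure compile-time data, never needs runtime
   initialization, so it could live directly in ROM. */
static const int table[3] = {1, 2, 3};

/* "constant variable": const-qualified, but its value is only known at
   run time, so it must be dynamically allocated and initialized. */
int sum_limit(int n)
{
    const int limit = n * 2;   /* initialized from a runtime value */
    return table[0] + limit;
}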




Re: Q: C++ FE emitting assignments to global read-only symbols?

2005-04-10 Thread Paul Schlie
> Giovanni Bajo writes:
>> Dale Johannesen <[EMAIL PROTECTED]> wrote:
>>> I do think the C++ FE needs fixing before Diego's change gets merged,
>>> though.  I can make the change, but not instantly.  If someone files
>>> a PR, and assigns to me, I'll get to it at some not-too-distant
>>> point.
>>
>> It would be good to have a way to mark things as "write once, then
>> readonly" IMO. It's very common, and you can do some of the same
>> optimizations on such things that you can do on true Readonly objects.
>
> We had this once, it was called RTX_UNCHANGING_P, and it was a big mess.
> Progressively, we have been removing TREE_READONLY from C++ const variables
> (which are "write once" from the GCC IL point of view), and this is another
> instance of the same problem.
>
> We probably need a better way to describe C++ constructors, maybe something
> like WRITEONCE_EXPR which is a MODIFY_EXPR with a READONLY on its lhs, and
> which is supposed by the frontends when initialing such variables.

GCC's present consistent use of the readonly attribute to tag references to
declared or compiler-generated read-only "static const data" (as may be used
to initialize runtime-allocated constants or variables if not in-lined as
immediate data by the compiler) should be preserved, as such references seem
to be the only true read-only memory references to static data, which may
then be more optimally allocated and accessed as desired.
(therefore non-static data references should not be marked read-only)

Correspondingly, the use of MEM_READONLY_P seems to be the only convenient
method available to identify such references so as to enable the
generation of target-specific instructions as may be required if static
data is directly mapped to and accessed from the target's ROM, in lieu of
requiring all "static data" to be potentially redundantly copied into the
target's RAM prior to use, which may be impractical or even simply
impossible for targets which have very limited available RAM.

This approach basically works today, less a few bugs such as a static data
pointer's attributes not being consistently copied to the BLK mode pointer
for block memory moves; and the present inability to define a function
parameter which may point to a static-storage-qualified type while having
a normal or register storage type itself (which seems to be allowed by the
standard), as opposed to specifying the parameter as having a storage type
other than register itself (which is prohibited by the standard), thereby
enabling the consistent specification of differentiable "(static const) *"
and "(const) *" qualified pointer parameters.
 both per: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=20937





Re: Heads-up: volatile and C++

2005-04-15 Thread Paul Schlie
> Michael N. Moran wrote:
> I'm very much in favor of fine grained synchronization primitives
> in the compiler, but not of changes to volatile semantics.

I wonder if it would be sufficient (if not preferable) to only extend
(modify) the semantics for heap/stack and const volatile variables, as
enforcing existing volatile semantics for such variables is at best of
questionable value. i.e.:

 volatile vv;// volatile sync primitive, w/normal access semantics.
 volatile const vc;  // universal sync primitive, w/normal access semantics.
 volatile *vp = ;  // a true volatile reference.

 vv; // volatile sync, all pending volatile transactions logically emitted
 // into the code, but not hard physically synchronized via a HW sync.

 vc; // universal sync, all pending transactions hard physically sync'ed.

 *vp = ; // volatile access semantics enforced, with ordering
 // warranted between equivalent references, but not
 // otherwise (unless synchronized by referencing a
 // declared or (cast) volatile sync variable).

(with the exception of the new sync semantics, non-reference volatile
 variables do not need to have volatile access semantics, as that would
 seem to serve no useful purpose for stack/heap allocated variables, and
 they should be allowed to be optimized always, just as any other allocated
 variable; although their sync semantic actions must be preserved, so they
 may still be used as synchronized value semaphores, etc., or simply as
 syncs when no access would otherwise be required; and/or allow regular
 variables to be cast as (volatile), thereby enabling an arbitrary
 expression to insert a sync, i.e.: (volatile)x = y, or
 x = (const volatile)y; forcing a specified sync prior to, or after, the
 assignment?)

Where the above is just a loose possible alternative to strictly enforcing
ordering between all volatile (or other) transactions, without having to
necessarily introduce another keyword, for good or bad.




Re: Heads-up: volatile and C++

2005-04-15 Thread Paul Schlie
> From: Paul Koning <[EMAIL PROTECTED]>
>>>>>> "Paul" == Paul Schlie <[EMAIL PROTECTED]> writes:
> 
>>> Michael N. Moran wrote: I'm very much in favor of fine grained
>>> synchronization primitives in the compiler, but not of changes to
>>> volatile semantics.
> 
>  Paul> I wonder if it would be sufficient (if not preferable) to only
>  Paul> extend (modify) the semantics for heap/stack and const volatile
>  Paul> variables, as enforcing existing volatile semantics for such
>  Paul> variables is at best of questionable value
> 
> I'm not sure I completely understand, but volatile heap variables are
> perfectly meaningful today.  For example, if I need to define a
> communication data area for the program to talk to some DMA I/O
> device, a volatile struct, or a struct some of whose members are
> volatile, allocated on the heap, makes perfect sense.

- yes, but only if through a reference it would seem, as otherwise a
  volatile object couldn't be modified by anything outside of the
  program itself. But I acknowledge that this would require an allocated
  volatile's semantics to be dependent on potential references to it being
  declared, which is likely too messy to be worthy of any consideration.
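
For reference, the reference-based case Paul Koning describes looks like
this in ordinary C (the register layout and address below are made up
purely for illustration):

struct dma_regs {
    unsigned status;
    unsigned control;
};

#define DMA ((volatile struct dma_regs *)0x40001000u)

static void start_and_wait(void)
{
    DMA->control = 1u;               /* store is neither elided nor reordered */
    while ((DMA->status & 1u) == 0)  /* re-read the device flag each iteration */
        ;
}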





[RFC] warning: initialization discards qualifiers from pointer target type

2005-04-17 Thread Paul Schlie
> warning: initialization discards qualifiers from pointer target type
>
> This warning can not be disabled using -Wno-cast-qual
> (or any other warning flags). Is it intentional ?
> Otherwise I'll prepare patch.
>
> const char *a( void )
> {
>return "abc";
> }
>
> int main( void )
> {
>   char *s = a();
>return 0;
> }

Actually, I'd like to see this enforced as an "illegal" assignment.

(as it seems wrong to "discard pointer qualifiers" unless the assignment
 actually "copies" the literal string, which I don't believe it does;
 as any attempt to write to a const string literal should not be valid?)
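
A sketch of why silently discarding the qualifier is dangerous (standard C
semantics; the cast is added here only to make the dropped qualifier
explicit):

const char *a(void)
{
    return "abc";
}

int main(void)
{
    char *s = (char *)a();  /* const discarded */
    s[0] = 'x';             /* undefined behavior: "abc" may sit in read-only storage */
    return 0;
}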




Re: Reload Issue -- I can't believe we haven't hit this before

2005-04-19 Thread Paul Schlie
> Eric Botcazou writes:
>> Jeffrey A Law writes:
>> ...
>> Which faults because the memory location is actually  read-only memory.
>
> PR rtl-optimization/15248.
>
>> What's not clear to me is how best to fix this.
>>
>> We could try to delete all assignments to pseudos which are equivalent
>> to MEMs.
>>
>> We could avoid recording equivalences when the pseudo is set more than
>> once.
>>
>> Other possibilities?
>
> For 3.3 and 3.4, this was "fixed" by not recording memory equivalences that
> have the infamous RTX_UNCHANGING_P flag set.

As my understanding is that UNCHANGING is/should be uniquely associated
with "literal static const data", which may have been declared either via
an explicit "static const" variable declaration, or indirectly as a literal
const value which may be used to initialize non-"static const" variable
values; and whose reference may not survive if all uses of its declared
value are in-lined as immediate data.

As such, all other uses of "UNCHANGING" potentially denoting const variables
are incorrect and should be fixed; "const" need not exist in tree data
to denote constant variables, as the trees should already be "correct by
construction", since the language's front-end should have prevented any
assignments to declared "const" variables other than initialization from
being constructed. Thereby, as none of the optimizations should modify the
semantics of the tree, a const variable will never be assigned a logical
value other than its designated initializer value, which may either be
spilled and reloaded as any other variable's value may be, or regenerated
by reallocating and reinitializing it if preferred. (Correspondingly, all
tree/rtx optimizations which convert a (mem (symb)) => (mem (ptr))
reference must copy/preserve the original mem reference attributes to the
new one, as otherwise they will be improperly lost.)

Thereby MEM_READONLY_P uniquely applies to "literal static const data"
references, enabling their allocation and accesses to be reliably
identified as may be required for target-specific "literal static
const data" allocation and code generation, e.g. if such data is stored
in and accessed from a target-specific ROM memory region.

i.e.:

static const char s[] = "abc"; // s[4] = {'a','b','c',0} array
   // of "literal static const data"
   // MEM_READONLY_P (mem(symb(s))) == true;

static   char s[] = "abc"; // "C.x[4] = {'a','b','c',0} array
   // of "literal static const data"
   // MEM_READONLY_P (mem(symb(C.x))) == true;
   // s[4] array of char, init with C.x[]
   // MEM_READONLY_P (mem(symb(s))) == false;

   const char s[] = "abc"; // "C.x[4] = {'a','b','c',0} array
   // of "literal static const data"
   // MEM_READONLY_P (mem(symb(C.x))) == true;
   // s[4] array of char, init with C.x[]
   // MEM_READONLY_P (mem(symb(s))) == false;

some-const-char*-funct("abc"); // "C.x[4] = {'a','b','c',0} array
   // of "literal static const data"
   // some-const-char*-funct(C.x);

(Does that seem correct?)





Re: Reload Issue -- I can't believe we haven't hit this before

2005-04-19 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
> 
> some-const-char*-funct("abc"); // "C.x[4] = {'a','b','c',0} array
>// of "literal static const data"
>// some-const-char*-funct(C.x);

Or rather, I suspect it implies the allocation of a temporary to store
C.x[] into, then passing a reference to the temporary (as there seems to
be no present way to define a function parameter which points to a "literal
static const data" object, vs. a generic allocated const object; because
although "static const char s[]" may denote an array of "literal static
const data", funct(static const char *) is interpreted as attempting to
declare the storage class of the pointer parameter, as opposed to qualifying
the storage class of the object it's pointing to)?




Re: Reload Issue -- I can't believe we haven't hit this before

2005-04-20 Thread Paul Schlie
> Jeffrey A Law wrote:
> ...
> But what worries me even more is spilling.  Say a pseudo has a hard reg
> assigned and is also equivalent to a readonly memory location.  Reload
> then decides to spill the pseudo out of the hard reg because the hard
> reg was needed for something else.  When that occurs we'd have to scan
> affected live range of the pseudo for assignments into the pseudo and
> remove them like I've suggested above.  But that brings another set of
> bookkeeping problems -- we're twiddling reg_equiv_XXX significantly
> later in the reload process, and I'm not sure how safe that's going to
> be.
> 
> Given how rare this situation occurs, I'm leaning towards simply
> ignoring the problematical equivalences.

Wouldn't the following be true for all pseudos equated to readonly values?:

- if given a hard reg assignment, it may be spilled and reloaded just as any
  value may (thereby the pseudo is now equated with the spilled value), or
  alternatively, it's not literally spilled, it just loses its hard reg
  assignment and needs to be regenerated as required (as below):

- if not given a hard reg assignment, but referenced, its reference is
  equivalent to (mem (symbol ...)) with MEM_READONLY_P == true; where, if its
  value ends up being stored directly into an allocated hard register, it
  now also has a hard-reg assignment, which is treated as above if the hard
  reg is reallocated.




Re: Reload Issue -- I can't believe we haven't hit this before

2005-04-20 Thread Paul Schlie
Sorry, to be clearer, what I meant by:

> From: Paul Schlie <[EMAIL PROTECTED]>
> ..  (thereby the pseudo is now equated with the spilled value), or ...

was:  (thereby the pseudo is now equated with an allocated temporary
   memory location, now storing the spilled value), or ...




Re: different address spaces (was Re: internal compiler error atdwarf2out.c:8362)

2005-04-21 Thread Paul Schlie
> James E Wilson wrote:
> ...
> It relies on MEM_EXPR always being set, which may not be true. But if
> there are places creating MEMs from decls without setting MEM_EXPR, then
> they probably should be fixed. MEMs created for things like spills to stack
> slots may not have MEM_EXPR set, but then they can't possibly have the
> eeprom attribute either, so that is OK.
>
> This could be a maintenance problem if other developers make changes and
> forget to keep the MEM_EXPR fields accurate. The more we use the MEM_EXPR
> fields, the less of a problem this will be.

- Might it be possible to introduce, and use by convention, a new macro
  which would always wrap a new pointer in a mem expression with the
  attributes of the previous mem/symbol reference copied over?
  
  (Thereby hopefully making it more convenient and less error prone? Ideally
  possibly supporting at least a few generic target definable attributes
  which are correspondingly preserved for all variable, literal-data, and
  function result and parameter object declaration references by default.)
 
> ...
> Presumably this is only a problem because some MEMs don't have the
> MEM_EXPR field set, in which case a better solution is to find the places
> that forget to set it and fix them.

- One of the things that's been eluding me, is that I can't seem to find
  where literal string constant mem references aren't being properly
  declared and/or preserved as READONLY/unchanging references, resulting
  in MEM_READONLY_P failing to identify them; although literal char array
  references, which seem logically equivalent, do seem to be properly
  declared/preserved?

  Any insight would be appreciated, -paul-




Re: different address spaces (was Re: internal compiler error atdwarf2out.c:8362)

2005-04-22 Thread Paul Schlie
> From: James E Wilson <[EMAIL PROTECTED]>
>> - One of the things that's been eluding me, is that I can't seem to find
>>   where literal string constant mem references aren't being properly
>>   declared and/or preserved as READONLY/unchanging references, resulting
>>   in MEM_READONLY_P failing to identify them; although literal char array
>>   references, which seem logically equivalent, do seem to be properly
>>   declared/preserved?
> 
> The tree to RTL conversion happens in expand_expr.  Just search for
> STRING_CST in that function and then follow the call chain in the
> debugger til you find the place that is trying to set RTX_UNCHANGING_P.
>Old code set it unconditionally, but current code is a bit more
> complciated.  Maybe there is something wrong with the new code.

Thanks. After going through the code, it's even less clear why STRING_CST
string literal data references are treated differently than static const
char array literal data references to begin with?

Why is this necessary?

(As if they weren't, it would seem that a lot of code could simply be
deleted, and any odd issues resulting from the distinction would disappear?)





Re: different address spaces (was Re: internal compiler error atdwarf2out.c:8362)

2005-04-23 Thread Paul Schlie
> From: James E Wilson <[EMAIL PROTECTED]>
>> On Fri, 2005-04-22 at 04:58, Paul Schlie wrote:
>> Thanks. After going through the code, it's even less clear why STRING_CST
>> string literal data references are treated differently than static const
>> char array literal data references to begin with?
>> Why is this necessary?
> 
> Why is what necessary?  You haven't actually said anything concrete that
> I can answer.

Sorry. More specifically:

- Why are string literal character arrays not constructed and expanded as
  character array literals are? (as although similar, there are distinct
  sets of code expanding references to each of them; which seems both
  unnecessary, and error prone (as evidenced by string literal memory
  references not being properly identified as READONLY, although their
  equivalent array representations are treated properly for example?)

- If the only difference which exists between them is how their values
  are "pretty-printed" as strings, vs. array values; then it would seem
  that although they may be labeled differently, but utilize be constructed
  and expanded equivalently? If this is not true, why must they be distinct?

- I.e.

char x[3] = "abc";
  
  seems as if it should be literally equivalent in all respects to:

char y[3] = {'a','b','c'};

  but they are not constructed/expanded equivalently?
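
(A small standard-C check of that source-level equivalence; it says nothing
 about how the two are represented internally, which is the question here:)

#include <assert.h>
#include <string.h>

int main(void)
{
    /* With an explicit size of 3 there is no room for the terminating NUL,
       so the string form initializes exactly {'a','b','c'} as well. */
    char x[3] = "abc";
    char y[3] = {'a', 'b', 'c'};
    assert(memcmp(x, y, sizeof x) == 0);
    return 0;
}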






Re: different address spaces

2005-04-23 Thread Paul Schlie
> From: Zack Weinberg <[EMAIL PROTECTED]>
>> James E Wilson <[EMAIL PROTECTED]> writes:
>>
>>>   unnecessary, and error prone (as evidenced by string literal memory
>>>   references not being properly identified as READONLY, although their
>>>   equivalent array representations are treated properly for example?)
>> 
>> If true, that sounds like a bug.  This is the only interesting issue
>> here from my point of view.  You might consider filing a bug report into
>> bugzilla for this.  Or contributing a patch.
> 
> This might just be the special case for string constants in C, that
> their type is "char*" despite their being allocated in read-only
> memory.  Paul, before filing a bug, find out whether -Wwrite-strings
> makes this alleged misbehavior go away; if it does, it's not a bug.

I just double-checked: neither -Wwrite-strings nor -fconst-strings seems
to affect the READONLY attribute which should be associated with memory
references to static const strings; an earlier PR was already filed:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21018

(with respect to -Wwrite-strings, I would have thought that the option,
 although presently deprecated and disabled by default, would only have
 allowed writes to string literal references to be accepted at the language
 front-end level, but not affect the READONLY attribute associated with
 them at the tree level; as regardless of the option enabling such writes
 to be accepted, the objects are still static literal constants, and may
 simply not be physically writeable regardless of -Wwrite-strings?)




Re: different address spaces

2005-04-23 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
> ...
> (with respect to -Wwrite-strings, I would have thought that the option,
>  although presently deprecated and disabled by default, would only have
>  allowed writes to string literal references to be accepted at the language
>  front-end level, but not affect the READONLY attribute associated with
>  them at the tree level; as regardless of the option enabling such writes
>  to be accepted, the objects are still static literal constants, and may
>  simply not be physically writeable regardless of -Wwrite-strings?)

Sorry, apparently -fwritable-strings was deprecated, not -Wwrite-strings;
regardless, since there's no such thing as a string type in C/C++, strings
are arrays of char, so it would seem most consistent to treat them as such
by default.  Which means that by default you can't write to a string
literal/static-const, although you may specify their static const literal value:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=21018





Proposal: GCC core changes for different address spaces

2005-04-23 Thread Paul Schlie
> Martin Koegler wrote:
> ...
> Before I start experimenting with this, I want other people opinions,
> how acceptable this proposal will before GCC mainline or if it can be
> improved.

- sounds good, and a natural generalization of current mem ref attributes.

(However ideally, function parameter and result value references would need
 to be similarly qualify-able in order to enable the proper attributes to
 be associated and enforced when references to such attributed objects are
 passed-to/returned-from function calls, as otherwise the object's storage
 reference attribute will be lost; which could in theory be enabled by
 allowing the qualification of an arbitrary variable, parameter, or result
 storage type reference as a natural extension; thereby allowing a pointer
 parameter to a static const value to be specified as "(static const)*",
 as opposed to being parsed as "static (const *)" by default, which would
 specify a static pointer parameter and is prohibited, and therefore
 wouldn't introduce an ambiguity if optionally enabled.)
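
(To illustrate the last point with my own example: a storage class is already
 rejected in a parameter declaration, so reusing the spelling to qualify the
 pointed-to storage would not change the meaning of any currently valid
 program:

   const char *g (const char *p);           /* valid: const qualifies *p      */
   /* const char *h (static const char *p); -- rejected today: only "register"
      may appear as a storage class on a parameter, so the proposed spelling
      would collide with nothing                                              */
)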




Re: Proposal: GCC core changes for different address spaces

2005-04-23 Thread Paul Schlie
> From: Paul Schlie <[EMAIL PROTECTED]>
>> Martin Koegler wrote:
>> ...
>> Before I start experimenting with this, I want other people opinions,
>> how acceptable this proposal will before GCC mainline or if it can be
>> improved.
> 
> - sounds good, and a natural generalization of current mem ref attributes.
> 
> (However ideally, function parameter and result value references would need
>  to be similarly qualify-able in order to enable the proper attributes to
>  be associated and enforced when references to such attributed objects are
>  passed-to/returned-from function calls, as otherwise the object's storage
>  reference attribute will be lost; which could in theory be enabled by
>  allowing the qualification of an arbitrary variable, parameter, or result
>  storage type reference as a natural extension; thereby allowing a pointer
>  parameter to a static const value to be specified as "(static const)*",
>  as opposed to being parsed as "static (const *)" by default, which would
>  specify a static pointer parameter and is prohibited, and therefore
>  wouldn't introduce an ambiguity if optionally enabled.)

To be somewhat clearer: a storage class could potentially qualify the type
of a referenced object, just as target specific type qualifiers may
(i.e. rom, eeprom, progmem, etc.) when used within the context of a function
parameter or result type specification, i.e. (rom const)* or (static const)*;
this would enable a more generic, and somewhat less target specific,
qualification of static-const/literal and label/function() mem references,
in addition to the use of more target specific named ones?
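
(E.g., hypothetically, with such qualification a routine could be declared
 along these lines; the spellings are only illustrative, not existing GCC
 syntax:

   char get_rom (rom const char *p);        /* p points into rom              */
   char get_lit ((static const) char *p);   /* p points to literal /
                                               static-const data              */

 where within either body a dereference of p would be wrapped in a (mem ...)
 whose attributes, i.e. MEM_READONLY_P or a target address space, are then
 visible to the back end.)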




Re: Proposal: GCC core changes for different address spaces

2005-04-24 Thread Paul Schlie
> From: Martin Koegler <[EMAIL PROTECTED]>
>> On Sat, Apr 23, 2005 at 02:26:54PM -0400, Paul Schlie wrote:
>> - sounds good, and a natural generalization of current mem ref attributes.
>> 
>> (However ideally, function parameter and result value references would need
>>  to be similarly qualify-able in order to enable the proper attributes to
>>  be associated and enforced when references to such attributed objects are
>>  passed-to/returned-from function calls, as otherwise the object's storage
>>  reference attribute will be lost; which could in theory be enabled by
>>  allowing the qualification of an arbitrary variable, parameter, or result
>>  storage type reference as a natural extension; thereby allowing a pointer
>>  parameter to a static const value to be specified as "(static const)*",
>>  as opposed to being parsed as "static (const *)" by default, which would
>>  specify a static pointer parameter and is prohibited, and therefore
>>  wouldn't introduce an ambiguity if optionally enabled.)
> 
> Function calls and their return values are also represented in the tree
> representation.
> There the attributes are available as part of the type representation. Type
> compatibility between actual and formal parameters is also handled by GCC
> front/middle end using the tree representation.

- Yes, I simply observe (below) that only literal/static-const-data
  pointer qualification need be expressible as an extension to function
  parameter and result type specifications, without needing to introduce
  new keyword qualifiers, or alter the semantics of any existing valid
  program.

> The only point, where some type information about function parameters may
> be missing, is the call of library functions, which are created by an RTL
> optimizer. The formal parameters of a library call are known. Whether an RTL
> expression can be used as an actual parameter is already checked by GCC
> in some way.

- If I understand you correctly, I agree that in-lined function calls may
  potentially be able to resolve an argument's true nature and treat
  dereferences of it appropriately, but this is likely too restrictive;
  alternatively there may be some mechanism to attempt to identify the type
  of reference passed at run-time, but likely at a severe loss of efficiency,
  therefore not likely acceptable.

  Therefore it seems that the only generally correct and efficient solution
  is to enable the (static const, i.e. literal) qualification of a pointer,
  as it represents the only data class for which there is no standardized
  way to qualify a pointer parameter as pointing to it; by default a storage
  class is presumed to apply to the parameter itself, which is illegal,
  rather than to qualify what it points to. (This could be considered a
  useful and backward compatible extension, thereby enabling function
  parameter and result type pointers to be qualified as pointing to
  literal/static-const data, which may then be accessed as desired in a
  target specific way; which I believe is likely sufficient, see below.)

> Can you tell me how you would change the RTL and give a concrete example
> of how to use your changes and what benefit they have.

I believe the following need to be supported:

- all existing references to static const string literals need to be
  correctly wrapped in (mem string-cst-ptr) such that MEM_READONLY_P is
  true as is already the case with all other static-const-data references.
  (although admittedly I haven't been able to figure out where the root
   problem is yet.)

- optionally enable the C/C++ parser to accept and enforce (static const)*
  as a function parameter and result pointer type qualification, so that all
  code within the body of the function will wrap dereferences of such a
  pointer in a (mem ptr) such that MEM_READONLY_P is true.

- thereby a target may, if desired, simply test rtl memory operands, i.e.

 (define_insn "loadqi"
   [(set (match_operand:QI 0 "register_operand" "")
         (match_operand:QI 1 "memory_operand"   ""))]
   ""
   "* return output_loadqi (insn, operands, NULL);"
   )

  const char *
  output_loadqi (rtx insn, rtx operands[], int *l)
  {
    ...
    /* select the access sequence based on the attributes of the memory
       source operand */
    if (MEM_READONLY_P (operands[1]))
      return output_loadqi_rom (insn, operands[0], operands[1]);
    else
      return output_loadqi_ram (insn, operands[0], operands[1]);
    ...
  }

- all code references should already be identifiable as label or function
  symbol static constant values.

- all remaining references represent non-code/static-data-literal data
  references (which should likely correspondingly always be wrapped in a
  correctly attributed (mem ...) expression for consistency even when an
  effective address is computed and stored in a logical register i.e.:

Re: Proposal: GCC core changes for different address spaces

2005-04-25 Thread Paul Schlie
> Etienne Lorrain <[EMAIL PROTECTED]> wrote:
> I see a lot more address space than that in generic processors, not only
> embedded systems with EEPROM.

- Almost, but they aren't "memory spaces" in the sense that programming
  languages expect them to be, as they impose implied restrictive semantics
  that are not understood by the languages/compilers, and are therefore truly
  distinct. For example, what do these declarations mean? At what addresses
  would you expect them to be allocated? The problem is that it matters,
  as they aren't truly memory spaces corresponding to distinct classes
  of objects which the language already understands.
  
EEPROM static int * x;
IOPORT complex y;

  Such uses may already be satisfied by defining target specific functions
  which map logical program values to specific semantics, and satisfy all the
  examples below even if they require target specific assembly I/O routines
  to act as an interface; so it's not clear if anything further is required,
  i.e. (write_i_o dest value), or, somewhat more naturally in C++, overloaded
  operators that understand what to do when an IO type object is written,
  for example.
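
  For example, something as simple as (names and addresses hypothetical):

    #include <stdint.h>

    #define UART_BUFFER ((volatile uint8_t *) 0x2C)   /* hypothetical address */

    static inline void write_i_o (volatile uint8_t *dest, uint8_t value)
    {
      *dest = value;              /* or an out/port or eeprom write sequence  */
    }

    static inline void uart_send (uint8_t c) { write_i_o (UART_BUFFER, c); }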
  
  Which is very different from saying, for example, that a back end should
  be able to map (independently of the language) code into one address
  space, literal data into another, and runtime allocated variables into
  a third; that only requires that such declarations and
  corresponding references be visible during code generation (which can't
  be satisfied without internal cooperation from the compiler with the
  target specific code generator).

> For ia32, there is the IO port memory space, and it seem logical for me
> to do:
> struct UART_str {
>  unsigned char buffer;
>  unsigned char flags;
>  unsigned short speed;
>  } __attribute__ ((section(IOPORT)));
> You have also the "Machine State Registers" (MSR) and the performance
> counters (PeMo) stuff. I would add segment relative address (cs, ds, fs,
> gs based address space) are separate - but just for my own set of problems.
>
> For PowerPC, you have the "Special Function Registers", accessed
> with mtspr/mfspr assembly instruction, and the "Device Control Register",
> accessed with mtdcr/mfdcr assembly instruction. Those registers form
> complex data structure which are worth describing by C structures and
> bitfields if you do not want hundred of pages of #defines for masks
> and addresses. Note that the real reason they are in an orthogonal data
> space is that they shall not be cached like standard memory.
>
> I see also PCI space as a different memory space, you can have 32 or
> 64 bits address space behind a pair of address/data "registers", and
> also PCI malloc-able area when you are looking for a free address range
> to map a device.
>
>  That is not exactly what is being discussed here, I do not know how
> it fits the problem - but these are different address space, and having
> an address in one of those "memory space" is a valid concept, having
> some debug information for the debugger to display bits and enums
> in a human understandable form would be nice.




Re: A plan for eliminating cc0

2005-04-26 Thread Paul Schlie
> From: Alexandre Oliva <[EMAIL PROTECTED]>
>> On Mar 28, 2005, Paul Schlie <[EMAIL PROTECTED]> wrote:
> 
>> More specifically, if GCC enabled set to optionally specify multiple targets
>> for a single rtl source expression, i.e.:
> 
>>   (set ((reg:xx %0) (reg CC) ...) (some-expression:xx ...))
> 
> There's always (set (parallel (...)) (some-expression)).  We use
> parallels with similar semantics in cumulative function arguments, so
> this wouldn't be entirely new, but I suppose most rtl handlers would
> need a bit of work to fully understand the implications of this.
> 
> Also, the fact that reg CC has a different mode might require some
> further tweaking.
> 
> Given this, we could figure out some way to create lisp-like macros to
> translate input shorthands such as (set_cc (match_operand) (value))
> into the uglier (set (parallel (match_operand) (cc0)) (value)),
> including the possibility of a port introducing multiple
> variants/modes for set_cc, corresponding to different sets of flags
> that various instructions may set.

Understood; any thoughts as to how it may similarly be specified which
cc-mode regs may be clobbered, in addition to updated, or left alone?

Which leads me to wonder if it may be worthwhile accepting and
classifying something like the following as a single-set:

  [(set (match_operand %0) (some-expression ...))
   (update (ccx) (ccy)) (clobber (ccz))]

Specifying that:
- %0 is not bound to (some-expression)
- ccx, ccy are now equivalent to %0
- ccz now has no equivalency (i.e. undefined)
- any other potentially defined cc-mode reg equivalencies are unchanged.

(as possibly a simpler and somewhat more familiar and flexible approach?)




Re: A plan for eliminating cc0

2005-04-26 Thread Paul Schlie
> From: Alexandre Oliva <[EMAIL PROTECTED]>
>> On Mar 28, 2005, Paul Schlie <[EMAIL PROTECTED]> wrote:
> 
>> More specifically, if GCC enabled set to optionally specify multiple targets
>> for a single rtl source expression, i.e.:
> 
>>   (set ((reg:xx %0) (reg CC) ...) (some-expression:xx ...))
> 
> There's always (set (parallel (...)) (some-expression)).  We use
> parallels with similar semantics in cumulative function arguments, so
> this wouldn't be entirely new, but I suppose most rtl handlers would
> need a bit of work to fully understand the implications of this.
> 
> Also, the fact that reg CC has a different mode might require some
> further tweaking.
> 
> Given this, we could figure out some way to create lisp-like macros to
> translate input shorthands such as (set_cc (match_operand) (value))
> into the uglier (set (parallel (match_operand) (cc0)) (value)),
> including the possibility of a port introducing multiple
> variants/modes for set_cc, corresponding to different sets of flags
> that various instructions may set.

(sorry had to fix a typo, should be somewhat more sensible now):

Understood; any thoughts as to how it may similarly be specified which
cc-mode regs may be clobbered, in addition to updated, or left alone?

Which leads me to wonder if it may be worthwhile accepting and
classifying something like the following as a single-set:

  [(set (match_operand %0) (some-expression ...))
   (update (ccx) (ccy)) (clobber (ccz))]

Specifying that:
- %0 is bound to (some-expression)
- ccx, ccy are now equivalent to %0
- ccz now has no equivalency (i.e. undefined)
- any other potentially defined cc-mode reg equivalencies are unchanged.

(as possibly a simpler and somewhat more familiar and flexible approach?)




Re: different address spaces

2005-04-28 Thread Paul Schlie
> Martin Koegler wrote:
> I have redone the implementation of the eeprom attribute in my prototype.
> It is now a cleaner solution, but requires larger changes in the core,
> but the changes in the core should not affect any backend/frontend, if
> it does not uses them (except a missing case in tree_copy_mem_area, which
> will cause an assertion to fail).
> ...
> +void
> +tree_copy_mem_area (tree to, tree from)
> 

Alternatively might it make sense to utilize the analogy defined in rtl.h?

  /* Copy the attributes that apply to memory locations from RHS to LHS.  */
  #define MEM_COPY_ATTRIBUTES(LHS, RHS)                           \
    (MEM_VOLATILE_P (LHS) = MEM_VOLATILE_P (RHS),                 \
     MEM_IN_STRUCT_P (LHS) = MEM_IN_STRUCT_P (RHS),               \
     MEM_SCALAR_P (LHS) = MEM_SCALAR_P (RHS),                     \
     MEM_NOTRAP_P (LHS) = MEM_NOTRAP_P (RHS),                     \
     MEM_READONLY_P (LHS) = MEM_READONLY_P (RHS),                 \
     MEM_KEEP_ALIAS_SET_P (LHS) = MEM_KEEP_ALIAS_SET_P (RHS),     \
     MEM_ATTRS (LHS) = MEM_ATTRS (RHS))

As unfortunately GCC already maintains and copies memory reference
attributes inconsistently, it seems that introducing yet another function to
do so will likely only introduce more inconsistency.

Therefore I wonder if it may be best to simply define MEM_ATTRS as you have
done, and then consistently utilize MEM_COPY_ATTRIBUTES to properly copy
the attributes associated with memory references whenever new ones need to
be constructed (as all effective address optimizations should be doing, as
otherwise the attributes associated with the original reference will be
lost). I.e.:

Instead of: (as occasionally incorrectly done)
 rtx addr1 = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));     // some EA
 emit_move_insn (tmp_reg_rtx, gen_rtx_MEM (QImode, addr1));       // loses attributes
 emit_move_insn (addr1, gen_rtx_PLUS (Pmode, addr1, const1_rtx)); // new EA

Something like this is necessary:

 rtx addr1 = copy_to_mode_reg (Pmode, XEXP (operands[1], 0));     // some EA
 rtx mem_1 = gen_rtx_MEM (QImode, addr1);                         // gen mem
 MEM_COPY_ATTRIBUTES (mem_1, operands[1]);                        // copy attributes
 emit_move_insn (tmp_reg_rtx, mem_1);                             // read value
 emit_move_insn (addr1, gen_rtx_PLUS (Pmode, addr1, const1_rtx)); // new EA
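
(Or alternatively, assuming the existing adjust_address helper, the new MEM
 may be derived directly from the original so that its attributes are carried
 over and adjusted, rather than copied by hand:

  rtx mem_0 = adjust_address (operands[1], QImode, 0);  // same address
  rtx mem_1 = adjust_address (operands[1], QImode, 1);  // address + 1
  emit_move_insn (tmp_reg_rtx, mem_0);                  // attributes preserved
)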






Re: different address spaces

2005-04-28 Thread Paul Schlie
> From: Martin Koegler <[EMAIL PROTECTED]>
>> On Thu, Apr 28, 2005 at 12:37:48PM -0400, Paul Schlie wrote:
>>> Martin Koegler wrote:
>>> I have redone the implementation of the eeprom attribute in my prototype.
>>> It is now a cleaner solution, but requires larger changes in the core,
>>> but the changes in the core should not affect any backend/frontend, if
>>> it does not uses them (except a missing case in tree_copy_mem_area, which
>>> will cause an assertion to fail).
>>> ...
>>> +void
>>> +tree_copy_mem_area (tree to, tree from)
>>> 
>> 
>> Alternatively might it make sense to utilize the analogy defined in rtl.h?
>> 
>>   /* Copy the attributes that apply to memory locations from RHS to LHS.  */
>>   #define MEM_COPY_ATTRIBUTES(LHS, RHS)\
>> (MEM_VOLATILE_P (LHS) = MEM_VOLATILE_P (RHS),\
>>  MEM_IN_STRUCT_P (LHS) = MEM_IN_STRUCT_P (RHS),\
>>  MEM_SCALAR_P (LHS) = MEM_SCALAR_P (RHS),\
>>  MEM_NOTRAP_P (LHS) = MEM_NOTRAP_P (RHS),\
>>  MEM_READONLY_P (LHS) = MEM_READONLY_P (RHS),\
>>  MEM_KEEP_ALIAS_SET_P (LHS) = MEM_KEEP_ALIAS_SET_P (RHS),\
>>  MEM_ATTRS (LHS) = MEM_ATTRS (RHS))
>> 
>> As unfortunately GCC already maintains and copies memory reference
>> attributes inconsistently, it seems that introducing yet another function to
>> do so will likely only introduce more inconsistency.
>> 
>> Therefore I wonder if it may be best to simply define MEM_ATTRS as you have
>> done, and then consistently utilize MEM_COPY_ATTRIBUTES to properly copy
>> the attributes associated with memory references whenever new ones need to
>> be constructed (as all effective address optimizations should be doing, as
>> otherwise the attributes associated with the original reference will be
>> lost). I.e.:
>> ...
> If you want to use the memory attributes after all reload and optimization
> passes, GCC will need to be extended with the missing set of the memory
> attributes. This is not my goal (I try to provide the correct
> MEM_REF_FLAGS for the RTL expand pass with all necessary earlier steps
> to get the address space information).
> 
> Correcting all this, will be a lot of work. We may not forget the machine
> description, which can also create MEMs.

- I presume that machine descriptions which don't already maintain mem
  attributes simply don't need them, as they most likely presume every
  pointer references the same memory space (hence don't care).

> I have updated my patch.
> 
> For the MEM_AREA for the tree, I have eliminated many explicit set operations
> of this attribute (build3_COMPONENT_REF and build4_ARRAY_REF completely).
> 
> For certain tree codes, the build{1,2,3,4} automatically generate the correct
> value of MEM_AREA out of their parameters. Only for INDIRECT_REF, this is
> not possible.

- if the original mem ref attributes derived from their originally enclosed
  symbol were maintained, any arbitrary type of memory reference would work.

> If we want to get the correct attributes every time, we need to add a source
> for the memory attributes to gen_rtx_MEM. For automatic generation of the
> memory attributes, too little information is available in the RTL.

- that would seem like a convenient way to do it, as opposed to folks having
  to remember to do it properly.

> I have added compatibilty checking for memory areas as well as correct
> handling of them for ?:.
> 
> The new version is at http://www.auto.tuwien.ac.at/~mkoegler/gcc/gcc1.patch

- thanks.




Re: different address spaces

2005-04-28 Thread Paul Schlie
> From: Martin Koegler <[EMAIL PROTECTED]>
>> On Thu, Apr 28, 2005 at 03:43:22PM -0400, Paul Schlie wrote:
>>> For the MEM_AREA for the tree, I have eliminated many explicit set operations
>>> of this attribute (build3_COMPONENT_REF and build4_ARRAY_REF completely).
>>> 
>>> For certain tree codes, the build{1,2,3,4} automatically generate the
>>> correct
>>> value of MEM_AREA out of their parameters. Only for INDIRECT_REF, this is
>>> not possible.
>> 
>> - if the original mem ref attributes derived from their originally enclosed
>>   symbol were maintained, any arbitrary type of memory reference would work.
> 
> Can an optimizer theoretically not change fundamentally the structure of a
> memory reference, so that the attributes will not be valid any more?

- I don't see how that could be, as I would expect, for example:

  static const char s[] = "abc"; // declares a READONLY array of char (s.x)
  volatile char x;

  x = *(volatile const char *)(&s[0] + 1); //
  x = *(volatile const char *)(&s[0] + 1); // to generate something like:

   (set (mem (symb x)) (mem (plus (plus (symb s.x) (const 0)) (const 1))))
   (set (mem (symb x)) (mem (plus (plus (symb s.x) (const 0)) (const 1))))

  Which may be "optimized" by folding the constants representing the
  memory reference's effective address, thereby potentially being able
  to pre-calculate the reference's effective address:

   (set temp-ptr (plus (symbol s.x) 1))
   (set (mem (symb x)) (mem temp-ptr)) ; with either original mem ref used,
   (set (mem (symb x)) (mem temp-ptr)) ; or new one with attributes copied.

  Thereby, although the effective address calculation itself may be
  "optimized", the fundamental attributes associated with the declared
  object must be maintained; in effect all effective address
  optimizations logically occur within the attributed scope of the original
  memory reference's context, i.e.:

  (mem (some-arbitrary-effective-address-expression))
  ->
  (mem (some-optimized-effective-address-expression))

> For address spaces, the biggest problem can be if access operations to
> different address spaces are joined:

- there's no such thing:

  ptr-diff = (mem ptr) +/- (mem ptr)
  (mem ptr) = (mem ptr) +/- ptr-diff

  That's all that's possible (all other interpretations are erroneous)
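
  (In C terms, my own minimal sketch of what remains well defined; pointer
   arithmetic never crosses objects, hence never crosses address spaces:

    #include <stddef.h>

    extern char a[10];                /* e.g. an object in RAM                */
    extern const char b[10];          /* e.g. an object placed in ROM         */

    ptrdiff_t d (void) { return &a[5] - &a[2]; }     /* ptr - ptr, one object */
    char *at (ptrdiff_t off) { return &a[2] + off; } /* ptr +/- ptr-diff      */

    /* &a[5] - &b[2] would be undefined: pointers into different objects,
       and here into different address spaces                                 */
  )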

> eg:
> if(???)
> {
> REG_1=...;
> REG_1+=4;
> do((MEM REG_1));
> }
> else
> {
> REG_2=...;
> REG_2+=4;
> do((MEM REG_2));
> }
> where REG_1 and REG_2 are pointer to different address spaces.
> 
> to:
> if()
> REG_1=...;
> else
> REG_1=...;
> REG_1+=4;
> do((MEM REG_1));
> 
> to eg. save space.

- which is erroneous unless both reg_1 and reg_2 are pointers to objects
  with identical mem ref attributes (which is also why mem attributes should
  be maintained correctly and consistently at the tree level throughout all
  phases of optimization, as GCC presently has no ability to differentiate
  between types of objects referenced through the use of typed pointers).

> Even at tree level, such an change could be done by an optimizer,
> if he does not take care of the address spaces.
> 
> For RTL level, a problem could be, if some information about
> eg how the data is packed, would be stored in the memory attributes.
> 
> If an optimizer decides, that not the original pointer value is important,
> but pointer to an address inside the data would be more useful, then simply
> copying the attributes may give a wrong view about the new MEM.
> 
> Because of my experiments with GCC, I conclude, that if we want any kind of
> attributes (either in tree or RTL), everything, which deal with it, need
> to know about all needed side effects, which can be a problem for
> backend specific attributes.
> 
> Introducing support for named address spaces in GCC would not be a big
> problem; it should require no change for non-aware backends or frontends. It
> could be written in such a way that a bug in this code causes no regression
> for targets not using address spaces.
> 
> The big amount of work will be, to verify that no optimizer will introduce
> wrong optimizations.
> 
> For my patch, I am still not sure if I even handle all the MEM_AREA correctly
> in all situations or if I need to add the MEM_AREA to other expressions too.

- unfortunately I don't believe there's any alternative, as GCC is already
  suffering from corners being cut by not properly differentiating between
  addresses and offsets, which only further confuses things when attempting
  to identify the type of object an arbitrary resulting pointer actually
  references; however it seems that restricting effective address
  optimizations to occur only within the context of their original mem ref
  representation is a great start, and will likely quickly shake out any
  mis-optimizations that may remain (as otherwise GCC simply won't be
  able to reliably identify the memory attributes associated with an arbitrary
  effective address, which may also need to be visible to the back-end).



