Re: COND_EXPRs in GIMPLE code and vectorizer

2006-09-28 Thread Paolo Bonzini



> Yes.  This is also true for a few other expressions.  IIRC, the
> gimplifier expands MAX_EXPR into control flow, even though it is legal
> gimple.


In fact it did this, but later it was removed from the gimplifier 
because it generated worse code.



> The usual reason for this type of thing is that the ability to have
> them as the RHS of a MODIFY_EXPR was added much later than the
> gimplifier, and it was decided that in order to avoid possible
> performance regressions, the existing behavior of lowering wouldn't be
> changed.
>
> If, of course, there is some good reason to always use the data
> dependent form over the control dependent form, we're always willing
> to explore changing the gimplifier to not do the lowering.


MIN_EXPR/MAX_EXPR in fact are a very good example, because we discovered 
that it would be best not to do the lowering.  In fact, there was no 
reason why the optimizers would mishandle MIN_EXPR and MAX_EXPR (they 
are just like any other tcc_binary node).  If we had a tree combiner, it 
would probably be better to allow COND_EXPR throughout the whole 
compilation, and possibly to generate them from simple phi's like we do 
in the tree-ssa-phiopt pass.  This would leverage some code in fold.
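To make the two shapes under discussion concrete at the source level, here is a minimal C sketch (the function names are hypothetical). `min_select` corresponds to the data-dependent form that could stay a single MIN_EXPR or COND_EXPR on the RHS of an assignment; `min_branch` is the control-dependent form produced by lowering, where the two paths meet at a merge point that becomes a PHI node in SSA, the pattern tree-ssa-phiopt looks for.

```c
/* Data-dependent form: a single expression with no branch in the
   source.  In GIMPLE terms this could remain one MIN_EXPR (or a
   COND_EXPR) on the RHS of an assignment. */
static int min_select(int a, int b)
{
    return a < b ? a : b;
}

/* Control-dependent form: two paths assign r and then join again.
   In SSA form the join becomes a PHI node, something like
   r_3 = PHI <a_1(3), b_2(4)>, which is the kind of simple PHI that
   tree-ssa-phiopt can turn back into a MIN_EXPR. */
static int min_branch(int a, int b)
{
    int r;
    if (a < b)
        r = a;
    else
        r = b;
    return r;
}
```

Both functions compute the same value; the question in the thread is only which of the two shapes the optimizers should see.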


Paolo


Re: Documentation for loop infrastructure

2006-09-28 Thread Ira Rosen


Sebastian Pop <[EMAIL PROTECTED]> wrote on 26/09/2006 21:24:18:

> It is probably better to include the loop indexes in the example, and
> modify the syntax of the scev for making it more explicit, like:
>
> @smallexample
>   for1 i
>  for2 j
>   *((int *)p + i + j) = a[i][j];
> @end smallexample
>
> and the access function becomes: @[EMAIL PROTECTED], + [EMAIL PROTECTED]
>

Done.

I guess I'll commit my part as soon as loop.texi (and the Dependency
analysis part) is committed.

Ira


> The data references are discovered in a particular order during the
> scanning of the loop body: the loop body is analyzed in execution
> order, and the data references of each statement are pushed at the end
> of the data reference array.  Two data references syntactically occur
> in the program in the same order as in the array of data references.
> This syntactic order is important in some classical data dependence
> tests, and mapping this order to the elements of this array avoids
> costly queries to the loop body representation.

Three types of data references are currently handled: ARRAY_REF,
INDIRECT_REF and COMPONENT_REF. The data structure for the data reference
is @code{data_reference}, where @code{data_reference_p} is the name of a
pointer to the data reference structure. The structure contains the
following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object
of the data reference and its access functions. These access functions
represent the evolution of the data reference in the loop relative to
its base, in keeping with the classical meaning of the data reference
access function for the support of arrays. For example, for a reference
@code{a.b[i][j]}, the base object is @code{a.b} and the access functions,
one for each array subscript, are:
@[EMAIL PROTECTED], + [EMAIL PROTECTED], @{j_init, +, [EMAIL PROTECTED]

@item @code{first_location_in_loop}: Provides information about the first
location accessed by the data reference in the loop and about the access
function used to represent evolution relative to this location. This data
is used to support pointers, and is not used for arrays (for which we
have base objects). Pointer accesses are represented as a one-dimensional
access that starts from the first location accessed in the loop. For
example:

@smallexample
  for1 i
 for2 j
  *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @[EMAIL PROTECTED], + [EMAIL PROTECTED]
relative to @code{p + i}. The access functions of the array are
@[EMAIL PROTECTED], + [EMAIL PROTECTED] and @[EMAIL PROTECTED], +, [EMAIL PROTECTED]
relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we can’t
prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two
representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows:
If both @code{a} and @code{b} are represented as arrays, compare
@code{a.base_object} and @code{b.base_object};
if they are equal, apply dependence tests (use access functions based on
base_objects).
Else if both @code{a} and @code{b} are represented as pointers, compare
@code{a.first_location} and @code{b.first_location};
if they are equal, apply dependence tests (use access functions based on
first location).
However, if @code{a} and @code{b} are represented differently, only try
to prove that the bases are definitely different.

@item Aliasing information.
@item Alignment information.
@end itemize
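The comparison strategy described at the end of the second item can be sketched in C as follows. This is a minimal illustration under stated assumptions, not GCC code: `struct dr_sketch`, `enum dep_strategy` and `choose_strategy` are hypothetical, a NULL field stands for "this representation is not filled", and pointer equality stands in for the real comparison of base objects or first locations. The text does not say what happens when two base objects (or first locations) differ, so `DEP_BASES_UNEQUAL` is only a placeholder for that case.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for the two representations;
   GCC's real data_reference structure is different. */
struct dr_sketch {
    const void *base_object;    /* non-NULL if the array representation is filled */
    const void *first_location; /* non-NULL if the pointer representation is filled */
};

enum dep_strategy {
    DEP_TEST_BASE_OBJECT,      /* equal base objects: test base-object access functions */
    DEP_TEST_FIRST_LOCATION,   /* equal first locations: test first-location access functions */
    DEP_BASES_UNEQUAL,         /* same representation, different bases (placeholder) */
    DEP_PROVE_BASES_DIFFERENT  /* mixed representations: only try to prove the bases differ */
};

static enum dep_strategy choose_strategy(const struct dr_sketch *a,
                                         const struct dr_sketch *b)
{
    bool both_arrays = a->base_object != NULL && b->base_object != NULL;
    bool both_pointers = a->first_location != NULL && b->first_location != NULL;

    /* Arrays are tried first, as in the text. */
    if (both_arrays)
        return a->base_object == b->base_object ? DEP_TEST_BASE_OBJECT
                                                : DEP_BASES_UNEQUAL;
    if (both_pointers)
        return a->first_location == b->first_location ? DEP_TEST_FIRST_LOCATION
                                                      : DEP_BASES_UNEQUAL;
    return DEP_PROVE_BASES_DIFFERENT;
}
```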

> The structure describing the relation between two data references is
> @code{data_dependence_relation} and the shorter name for a pointer to
> such a structure is @code{ddr_p}.  This structure contains:

Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Jan Beulich
In building Xen we observed a build problem when using binutils 2.15
that wasn't visible for those of us using newer binutils versions.
However, I believe that we should have seen this in all cases.

Xen gets compiled with -fPIC, and we recently added a global visibility
pragma to avoid the cost of going through the GOT for all accesses to
global data objects. (PIC isn't really needed here; all we need is
sufficient compiler support to get the final image located outside the
+/-2GB ranges. Large code model support, however, is not available in
older compilers, and we don't really need all of it anyway.)

In a kallsyms-like approach, symbol information gets embedded in the
final executable. Because the respective table symbols are not yet
available at the first linking stage, they are declared weak.
After adding the global visibility hidden pragma, even these weak
symbols get accessed (or their address calculated) via RIP-relative
addressing. While accessing them this way is probably acceptable
from the compiler's perspective (given the hidden attribute it may
safely assume the symbol is in the same executable image as the
accessing code), calculating its address certainly isn't, as the symbol
may not be present at all (and after all, comparing the address of
the weak object against NULL is the only method I know to check
presence of the symbol at runtime).
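For illustration, a minimal C sketch of the runtime presence check described above (the symbol and function names are hypothetical, not Xen's). The check only works if the address of an undefined weak symbol evaluates to 0; a RIP-relative address computation in an image linked at a non-zero address cannot yield 0, which is the problem being raised.

```c
#include <stddef.h>

/* Hypothetical table symbol in the spirit of the kallsyms-like scheme:
   it is only defined by a later link stage, so it is declared weak. */
extern const unsigned long symbol_table[] __attribute__((weak));

/* Returns 1 if the linker resolved symbol_table to a real definition.
   Because the symbol is weak, the compiler must not assume its address
   is non-NULL, so this comparison is a genuine runtime test. */
static int symbol_table_present(void)
{
    return symbol_table != NULL;
}
```

Built as an ordinary executable with no definition of `symbol_table`, `symbol_table_present()` is expected to return 0. With -fPIC plus a hidden-visibility pragma (for example `#pragma GCC visibility push(hidden)`, assumed here to be the kind of pragma Xen added), the address may instead be computed RIP-relatively, which is the behaviour questioned below.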

So the questions are:

1) Why does gcc not use a GOT reference when calculating the 
address of a weak symbol here?

2) Why does the linker silently resolve the (32-bit PC-relative)
relocation targeting an undefined weak symbol, yielding at
run-time a non-zero address? While I can see the point of
assisting the compiler here under the assumption that it has
checked the address elsewhere and hence the actual access
is supposed to never happen at runtime, detecting the
(incorrect) use of the same relocation (access method) in
either assembly code or address calculations should be
mandatory; to distinguish the two, two distinct relocation
types would then be needed (one that keeps the linker
silent, and another one that doesn't).

Thanks, Jan

>>> Keir Fraser <[EMAIL PROTECTED]> 28.09.06 10:56 >>>
>On 28/9/06 09:23, "Keir Fraser" <[EMAIL PROTECTED]> wrote:
>> On 28/9/06 07:46, "Jan Beulich" <[EMAIL PROTECTED]> wrote:
>>> Keir Fraser <[EMAIL PROTECTED]> 27.09.06 20:14 >>>
>>>> So it seems that older versions of gcc (before 4.1.1) don't do anything more
>>>> with the pragma than -fvisibility=hidden. So currently the pragma at best
>>>> does nothing (extern references still go through GOT) and at worst breaks
>>>> the build. :-)
>>> 
>>> That'd be contrary to my observations; I'll check into this.
>> 
>> Thanks. I am using a personal build of vanilla gcc-4.1.1 by the way.
>
>...and that's the problem. I'm using it with a too-old version of binutils
>(version 2.15). Pretty much any newer version seems to relocate the weak
>reference to 8300 (i.e., I guess rounds down to a 2GB boundary).




Re: How do I build C++ for xscale-elf?

2006-09-28 Thread Jack Twilley

Jack Twilley wrote:

Kai Ruottu wrote:

Jack Twilley wrote:
I am trying to build gcc on a FreeBSD 6.1-STABLE system.  If there's 
more information I can give you, please ask.


What was the GCC version tried?   The new gcc-4.1.1 seems to require
'--disable-shared', for instance with ARM; otherwise it tries to link
against the "created" 'libgcc_s.so.1' despite the use of '--with-newlib'.
A stupid bug and a stupid workaround (neither 'newlib' nor the target,
'xscale-elf', supports shared libraries).  With gcc-4.1.1,
'--disable-shared' is also obligatory...



I tried gcc-4.1.1 from SVN (gcc_4_1_1_release) with the following 
configure line:


Doh.  I forgot to insert the configure line.

./configure --with-newlib --disable-shared --target=xscale-elf --enable-language=c,c++


Am I missing something here?  Thank you in advance!

Jack.



It fails when compiling regex.c in xscale-elf/libiberty, with a whole bunch
of errors about not being able to find sys/types.h, strings.h and the
like.


I have installed binutils-2.17 from SVN (binutils-2_17) for xscale-elf.
Its version of libiberty was installed into /usr/local/lib/, which makes me
wonder how many things I accidentally overwrote while building it, but
I'll rebuild FreeBSD later.


Should I have not built binutils?  Was there something else I missed?

Jack.




Re: COND_EXPRs in GIMPLE code and vectorizer

2006-09-28 Thread Roberto COSTA

Paolo Bonzini wrote:



>> Yes.  This is also true for a few other expressions.  IIRC, the
>> gimplifier expands MAX_EXPR into control flow, even though it is legal
>> gimple.
>
> In fact it did this, but later it was removed from the gimplifier
> because it generated worse code.
>
>> The usual reason for this type of thing is that the ability to have
>> them as the RHS of a MODIFY_EXPR was added much later than the
>> gimplifier, and it was decided that in order to avoid possible
>> performance regressions, the existing behavior of lowering wouldn't be
>> changed.
>>
>> If, of course, there is some good reason to always use the data
>> dependent form over the control dependent form, we're always willing
>> to explore changing the gimplifier to not do the lowering.
>
> MIN_EXPR/MAX_EXPR in fact are a very good example, because we discovered
> that it would be best not to do the lowering.  In fact, there was no
> reason why the optimizers would mishandle MIN_EXPR and MAX_EXPR (they
> are just like any other tcc_binary node).  If we had a tree combiner, it
> would probably be better to allow COND_EXPR throughout the whole
> compilation, and possibly to generate them from simple phi's like we do
> in the tree-ssa-phiopt pass.  This would leverage some code in fold.


If time allows, I'd like to try to see what happens if COND_EXPRs are
kept throughout the GIMPLE passes (I confess I'm curious). Logically, I
see them as richer constructs (they carry more information than the
equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs.
I understand there may be additional (currently unexploited) opportunities
to generate COND_EXPRs, but why precisely do you expect unlowered
COND_EXPRs to be potentially harmful?

What do you mean by "tree combiner"?

Cheers,
Roberto


Re: Notes from tinkering with the autovectorizer (4.1.1)

2006-09-28 Thread Dorit Nuzman
"Erich Plondke" <[EMAIL PROTECTED]> wrote on 27/09/2006 18:17:55:

thanks for the detailed explanation,

> Indeed, in your paper (grin) "Multi-platform Auto-vectorization" you
> define the functionality of realign load in terms of mis, the
> misalignment of the address (i.e., address & (VS-1)), as follows: "The
> last VS-mis bytes of vector vec1 are concatenated to the first mis
> bytes of the vector vec2."
>
> This is what the walign instruction does, but it's not quite what we
> ended up with in GCC.
> In the case that mis is 0, the GCC hook wants to end up with vec2, not
> vec1.
>

yes, what we describe in the paper is a bit more general than what we ended
up implementing...
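The paper's definition quoted above can be modelled in plain C on byte arrays. This is a sketch only: `VS`, `realign_load` and `misaligned_load` are hypothetical names, a 16-byte vector size is assumed, and offsets are taken relative to `buf`, which is assumed to be `VS`-aligned. `misaligned_load` emulates an unaligned load at `buf + offset` using only the two surrounding aligned vectors.

```c
#include <stddef.h>
#include <string.h>

#define VS 16 /* assumed vector size in bytes */

/* Paper semantics: the last VS-mis bytes of vec1 followed by the first
   mis bytes of vec2.  With mis == 0 the result is vec1 unchanged. */
static void realign_load(const unsigned char *vec1, const unsigned char *vec2,
                         size_t mis, unsigned char *out)
{
    memcpy(out, vec1 + mis, VS - mis);
    memcpy(out + (VS - mis), vec2, mis);
}

/* Emulate a VS-byte load from buf + offset with two aligned loads.
   The caller must ensure buf holds at least offset - mis + 2 * VS bytes. */
static void misaligned_load(const unsigned char *buf, size_t offset,
                            unsigned char *out)
{
    size_t mis = offset & (VS - 1);                   /* misalignment */
    const unsigned char *vec1 = buf + (offset - mis); /* aligned vector at or below the address */
    const unsigned char *vec2 = vec1 + VS;            /* next aligned vector */
    realign_load(vec1, vec2, mis, out);
}
```

With `mis == 0` this definition returns `vec1`, whereas, as Erich points out, GCC's hook expects `vec2` in that case.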

> So for architectures that can align both ways, the current method is
> fine, but if the architecture is designed for one endian only we are
> going to have trouble exploiting the alignment feature.
>

I agree we need to add the alternatives you pointed out to make it
applicable to more targets.
This is now PR26268.

dorit

> Thanks,
>
> Erich
>
> --
> Why are ``tolerant'' people so intolerant of intolerant people?



Re: COND_EXPRs in GIMPLE code and vectorizer

2006-09-28 Thread Steven Bosscher

On 9/28/06, Roberto COSTA <[EMAIL PROTECTED]> wrote:

> I understand there may be additional occasions (currently unexploited)
> to generated COND_EXPRs, but why precisely do you expect unlowered
> COND_EXPRs be potentially harmful?


They inhibit some control flow optimizations such as jump threading.


> What do you mean by "tree combiner"?


A mythical, never completed pass that combines different GIMPLE
statements into a single new one, much like the combine pass on RTL.

Gr.
Steven


frame unwind issue with discontiguous code

2006-09-28 Thread Jan Beulich
While I'm not certain whether gcc is able to split one function's code
between different sections (if nothing else, this might help reduce
TLB pressure by moving code that is unlikely to be executed not just out
of the main function body but into a separate section), the Linux kernel
certainly does so in many places by way of inline assembly. Pure assembly
code, of course, makes even heavier use of this.

However, when frame unwind information is generated, one quickly
becomes aware of a problem with this - the unwind information at a
continuation point in other than the base section would need to
replicate all unwind directives (note that DW_CFA_remember_state
and DW_CFA_restore_state are not suitable here, as there need
to be separate FDEs attached to the secondary code fragments).
While this is generally possible (albeit tedious) in pure assembly code,
doing so in inline assembly doesn't seem to be possible in any way
(the compiler may not even use .cfi_* directives to emit frame
unwind info).

To cover all cases, it would basically appear to be necessary to
add a referral op to the set of DW_CFA_* ops, which would
indicate that the frame state at the given point is to be derived
by assuming the location counter were in fact at the origin
of the control transfer.

As I don't know how to approach requesting an addition like this
to the DWARF standard, I'm trying my luck here.

Any pointers or suggestions are greatly appreciated.

Thanks, Jan


Re: Documentation for loop infrastructure

2006-09-28 Thread Zdenek Dvorak
Hello,

> Sebastian Pop <[EMAIL PROTECTED]> wrote on 26/09/2006 21:24:18:
> 
> > It is probably better to include the loop indexes in the example, and
> > modify the syntax of the scev for making it more explicit, like:
> >
> > @smallexample
> >   for1 i
> >  for2 j
> >   *((int *)p + i + j) = a[i][j];
> > @end smallexample
> >
> > and the access function becomes: @[EMAIL PROTECTED], + [EMAIL PROTECTED]
> >
> 
> Done.
> 
> I guess, I'll commit my part as soon as loop.texi (and Dependency analysis
> part)
> is committed.

I have now committed the documentation, including the parts from Daniel and
Sebastian (but not yours).

Zdenek


re:Re: thesis on mix c++ and objective-c

2006-09-28 Thread Come Lonfils
>On Sep 27, 2006, at 11:58 AM, Come Lonfils wrote:
>> I'm beginning an end-of-study thesis on "mixing" C++ and Objective-C in gcc.
>> I know there is already Objective-C++, but I need all the information I
>> can get on the subject. What has already been done and what has not (and
>> why)?
>
>Objective-C++ is already done.  Parts not done might include
>Objective-C style exceptions interoperating with C++ style
>exceptions.  Beyond that, just random bug fixes.
>
>> I also need documentation for people who want to "enter" gcc, to
>> know how gcc works and how to modify it. I want to know how
>> Objective-C is compiled
>
>It is compiled just like C is compiled.  Objective C++ is compiled  
>just like C++ is compiled.
>
>

OK, but is there documentation about Objective-C++ that explains how it
works (I mean, how ObjC++ is implemented in gcc)? And more generally, is
there documentation about GCC: not on how to use gcc, but for people who
want to know more about how gcc works internally?


Re: COND_EXPRs in GIMPLE code and vectorizer

2006-09-28 Thread Diego Novillo
Roberto COSTA wrote on 09/28/06 05:51:

> If time allows me, I'd like to try to see what happens if COND_EXPRs are 
> kept throughout the GIMPLE passes (I confess I'm curious). Logically, I 
> see them as richer constructs (they carry more information than the 
> equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs.
>
Be wary of VRP and jump threading if you do this.  Both rely on the
current COND_EXPR format quite heavily.

> What do you mean by "tree combiner"?
> 
A pass that combines two or more GIMPLE statements into a single GIMPLE
statement.  At least one effort started down this path but was never
completed.


Re: COND_EXPRs in GIMPLE code and vectorizer

2006-09-28 Thread Roberto COSTA

Diego Novillo wrote:

> Roberto COSTA wrote on 09/28/06 05:51:
>
>> If time allows me, I'd like to try to see what happens if COND_EXPRs are
>> kept throughout the GIMPLE passes (I confess I'm curious). Logically, I
>> see them as richer constructs (they carry more information than the
>> equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs.
>
> Be wary of VRP and thread jumping if you do this.  Both rely on the
> current COND_EXPR format quite heavily.


Before committing to anything, I will study the implications for these
passes in order to evaluate the effort needed.


Roberto


Re: frame unwind issue with discontiguous code

2006-09-28 Thread Daniel Jacobowitz
On Thu, Sep 28, 2006 at 01:26:00PM +0200, Jan Beulich wrote:
> To cover all cases, it would basically appear to be necessary to
> add a referral op to the set of DW_CFA_* ops, which would
> indicate that the frame state at the given point is to be derived
> by assuming the location counter would in fact be at the origin
> of the control transfer).

Why?  I don't think so; just make a new FDE and accept the duplication.
That doesn't work if you use inline assembly to do it, but that's
already in the realm of seriously nasty hack.

> As I don't know how to approach requesting an addition like this
> to the Dwarf standard, I'm trying my luck here.

You could ask, um, on the Dwarf list... See dwarf.freestandards.org.

-- 
Daniel Jacobowitz
CodeSourcery


Re: Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread H. J. Lu
On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote:
> 
> 2) Why does the linker silently resolve the (32-bit PC-relative)
> relocation targeting an undefined weak symbol, yielding at
> run-time a non-zero address? While I can see the point of

Do you have a testcase? I can't reproduce it. If it is true, I consider
it a linker bug.


H.J.


Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread Christoph Hellwig
On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote:
> I created a Google group to discuss generic ABI:
> 
> http://groups.google.com/group/generic-abi
> 
> It is by membership only. Let me know if you are interested.

What's this supposed to be?  Reinventing the doomed iBCS2?


[RFC] Changing labels of TV_* timers?

2006-09-28 Thread Diego Novillo

Some of the labels in TV_* timers are fairly long and mess up the column
display.  Would it be a problem for anyone if I changed them during the
next stage1?

It's only a cosmetic change, but I can see it affecting people's scripts.


Thanks.


Re: [RFC] Changing labels of TV_* timers?

2006-09-28 Thread Andrew Pinski
On Thu, 2006-09-28 at 10:09 -0400, Diego Novillo wrote:
> Some of the labels in TV_* timers are fairly long and mess up the column
> display.  Would it be a problem for anyone if I changed them during the
> next stage1?

Why not instead increase the column size?

-- Pinski



Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread H. J. Lu
On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote:
> On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote:
> > I created a Google group to discuss generic ABI:
> > 
> > http://groups.google.com/group/generic-abi
> > 
> > It is by membership only. Let me know if you are interested.
> 
> What's this supposed to be?  Reinventing the doomed iBCS2?

Not at all. It is for the generic ABI, which is processor independent.
However, the current i386 psABI doesn't really reflect/cover what
has been added to i386, like MMX and SSE. Also, gcc uses 16-byte
stack alignment, instead of 4-byte, for SSE. Should we create a
Google group for the ia32 psABI?


H.J.


Re: [RFC] Changing labels of TV_* timers?

2006-09-28 Thread Diego Novillo
Andrew Pinski wrote on 09/28/06 10:11:
> On Thu, 2006-09-28 at 10:09 -0400, Diego Novillo wrote:
>> Some of the labels in TV_* timers are fairly long and mess up the column
>> display.  Would it be a problem for anyone if I changed them during the
>> next stage1?
> 
> Why not instead increase the column size?
> 
Well, I don't want to get into >80 column issues.  I know they are still
pervasive (OK, so I prefer tiling 3 80-col windows to a single 240-col
window).


Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Keir Fraser
On 28/9/06 14:24, "H. J. Lu" <[EMAIL PROTECTED]> wrote:

> On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote:
>> 
>> 2) Why does the linker silently resolve the (32-bit PC-relative)
>> relocation targeting an undefined weak symbol, yielding at
>> run-time a non-zero address? While I can see the point of
> 
> Do you have a testcase? I can't reproduce it. If it is true, I consider
> it a linker bug.

Compile and link the attached C program as follows. I used gcc-4.1.1 and
binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice.

 # gcc -fpic -o test.o -c test.c
 # ld -Ttext 1 -o test test.o

Disassembly of the result trivially shows that the address of weak symbol
'x' is 0x1.

 -- Keir



[Attachment: test.c]


Re: Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Jan Beulich
>>> "H. J. Lu" <[EMAIL PROTECTED]> 28.09.06 15:24 >>>
>On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote:
>> 
>> 2) Why does the linker silently resolve the (32-bit PC-relative)
>> relocation targeting an undefined weak symbol, yielding at
>> run-time a non-zero address? While I can see the point of
>
>Do you have a testcase? I can't reproduce it. If it is true, I consider
>it a linker bug.

Attached. The linker script likely is not minimal, but I think the important
point is that it sets the origin to a non-zero value.

Compiling this with gcc 4.1.1 (-c -fPIC) and linking with ld 2.17 (no other
options than those necessary to specify input and output) succeeds,
while linking with ld 2.15 fails (due to relocation overflow).

But again, if this is plainly a linker bug, then the compiler also must not
access weak objects through RIP-relative addressing (i.e. then we also
have a compiler bug here), while I continue to think that the fact that
there is a 'hidden' attribute should allow the compiler to do better than
going through GOT (at the expense of a new relocation type).

Jan


[Attachment: got.lds]
[Attachment: got.c]


Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Keir Fraser

> Compile and link the attached C program as follows. I used gcc-4.1.1 and
> binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice.
> 
>  # gcc -fpic -o test.o -c test.c
>  # ld -Ttext 1 -o test test.o
> 
> Disassembly of the result trivially shows that the address of weak symbol 'x'
> is 0x1.

By the way, experimentation with the address of the text section shows that
the weak symbol's address is resolved to the nearest 4GB-aligned address
(nearest to what I'm not sure -- RIP? Section start?). It may get rounded up
or down, whichever is nearest.

 -- Keir




Re: Splay Tree

2006-09-28 Thread Brian Makin

I looked at the splay tree code in revision 106584.

It doesn't appear to actually be doing a top-down splay. It performs a
top-down partition of the tree, but without the splay step. This should
cause some cases to perform quite badly.

I'm pretty sure my original patch does the top-down splay correctly.
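For comparison, here is a minimal C sketch of the classic simple top-down splay from Sleator and Tarjan on `int` keys. The `struct node` type and the `splay`, `insert` and `inorder` helpers are hypothetical and are not libiberty's `splay_tree` API. The rotation inside each zig-zig case is the splay step; without it, the loop only partitions the tree into left and right trees along the search path, which is the behaviour described above.

```c
#include <stddef.h>

/* Hypothetical node type; libiberty's splay_tree uses its own types. */
struct node {
    int key;
    struct node *left, *right;
};

/* Simple top-down splay: bring the node with KEY (or the last node on
   the search path) to the root of T and return the new root. */
static struct node *splay(int key, struct node *t)
{
    struct node header, *l, *r, *y;

    if (t == NULL)
        return NULL;
    header.left = header.right = NULL;
    l = r = &header; /* header.right collects the left tree, header.left the right tree */

    for (;;) {
        if (key < t->key) {
            if (t->left == NULL)
                break;
            if (key < t->left->key) {  /* zig-zig: rotate right (the splay step) */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL)
                    break;
            }
            r->left = t;               /* link t into the right tree */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL)
                break;
            if (key > t->right->key) { /* zig-zig: rotate left (the splay step) */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL)
                    break;
            }
            l->right = t;              /* link t into the left tree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }
    l->right = t->left;                /* reassemble the three trees */
    r->left = t->right;
    t->left = header.right;
    t->right = header.left;
    return t;
}

/* Insert node N with KEY; returns the new root (N, unless KEY exists). */
static struct node *insert(int key, struct node *t, struct node *n)
{
    n->key = key;
    n->left = n->right = NULL;
    if (t == NULL)
        return n;
    t = splay(key, t);
    if (key < t->key) {
        n->left = t->left;
        n->right = t;
        t->left = NULL;
    } else if (key > t->key) {
        n->right = t->right;
        n->left = t;
        t->right = NULL;
    } else {
        return t;                      /* duplicate key: keep the old node */
    }
    return n;
}

/* In-order traversal into OUT; *N counts the keys written. */
static void inorder(const struct node *t, int *out, int *n)
{
    if (t == NULL)
        return;
    inorder(t->left, out, n);
    out[(*n)++] = t->key;
    inorder(t->right, out, n);
}
```

After `root = splay(k, root)` for a key that is present, that node is the root, and an in-order walk still yields the keys in sorted order.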




Re: [RFC] Program Bounds Checking

2006-09-28 Thread Etienne Lorrain
  You write that you need 6 assembly instructions to check a pointer on x86.
 I have been using the ia32 "bound" instruction (1-byte opcode 0x62, invalid
 in 64-bit mode) to check the stack pointer for a few years now in Gujin
 (http://gujin.org) without any problem.

 I use this kind of thing to guard against stack overflow (my bootloader
 does not have a very large stack):

struct {
  signed low_limit;
  signed high_limit;
} __attribute__ ((packed)) stack_limit;

extern inline void bound_stack (void)
  {
  /*
   * Limits are inclusive, but add 2 to the high limit for reg16 and 4 for reg32.
   * If the value is not within bounds, exception #BR is generated (INT 5).
   * An iret from INT 5 will retry the bound instruction.
   */
  asm volatile (" bound %%esp,%0 " : : "m" (stack_limit) );
  }

void fct (int arg)
  {
  bound_stack ();
    {
    int cpt = 0;
    // do some stuff...
    }
  }

 I bet there is a huge penalty when the value is not within the limits...
 I have a simple "dump registers" handler for this interrupt/exception.

 In my case, I would have liked to have a function attribute like:
void fct (int arg) __attribute__((bound_stack(stack_limit)))
 {
 // do some stuff...
 }
 because some assembly instructions get moved across the bound asm (such as
 the initialisation of local variables), and if the stack has already
 overflowed, some data is destroyed before INT 5 is taken... Also, if the
 function is inlined, the stack check is not very meaningful and should be
 discarded.

 A problem with this bound test is that the limits are signed, but for some
 tests it does not matter. I do not know how long it takes when the value is
 within the limits.
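A portable C model of the check BOUND performs may help when comparing it with an explicit instruction sequence. This is a sketch: `struct bound_pair` mirrors the `stack_limit` layout above with fixed-width types, and `bound_ok` is a hypothetical helper. Both limits are inclusive and the comparison is signed, which is why a range crossing the 2 GiB boundary cannot be expressed.

```c
#include <stdint.h>

/* Same layout as stack_limit above: two signed 32-bit limits. */
struct bound_pair {
    int32_t low_limit;
    int32_t high_limit;
};

/* Model of the 32-bit BOUND check: returns 1 when VALUE lies within
   [low_limit, high_limit], i.e. when BOUND would not raise #BR (INT 5).
   VALUE is reinterpreted as signed, as BOUND does; converting values
   above INT32_MAX is implementation-defined in ISO C, and GCC defines
   it as reduction modulo 2^32. */
static int bound_ok(uint32_t value, const struct bound_pair *lim)
{
    int32_t v = (int32_t)value;
    return v >= lim->low_limit && v <= lim->high_limit;
}
```

For example, a range from 0x7FFFF000 to 0x80000FFF looks valid as unsigned addresses, but its upper limit is negative when read as signed, so no value passes the check.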

 Just a comment,
 Etienne.









Re: [RFC] Program Bounds Checking

2006-09-28 Thread Tzi-cker Chiueh

We have considered the bound instruction in the CASH project. But
we found that the bound instruction is slower than the six normal
instructions it is meant to replace for range checking. For example, the
bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles
while the 6 equivalent instructions require 6-7 clock cycles. We have not
tested it on newer processors, though.


Tzi-cker

On Thu, 28 Sep 2006, Etienne Lorrain wrote:

>   You write you needs 6 assembly instructions to check a pointer on x86,
>  I am using the "bound" ia32 instruction (1 byte opcode 0x62, invalid in ia64)
>  to check the stack pointer for few years now in Gujin (http://gujin.org) 
> without
>  problem.
>
>  I am doing this kind of thing to guard against stack overflow (I do not have
>  a too big stack in my bootloader):
>
> struct {
> signed low_limit;
> signed high_limit;
> } __attribute__ ((packed)) stack_limit;
>
> extern inline void bound_stack (void)
>   {
> /*
>  * limit included - but add 2 to high limit for reg16, and 4 for reg32
>  * if not in bound, exception #BR generated (INT5).
>  * iret from INT5 will retry the bound instruction.
>  */
>   asm volatile (" bound %%esp,%0 " : : "m" (stack_limit) );
>   }
>
> void fct (int arg)
>  { bound_stack () {
>  int cpt = 0;
>  // do some stuff...
>  }}
>
>  I bet there is a huge penalty if the value is not inside the limit...
>  I have a simple "dump registers" handler for this interrupt/exception.
>
>  In my case, I would have liked to have a function attribute like:
> void fct (int arg) __attribute__((bound_stack(stack_limit)))
>  {
>  // do some stuff...
>  }
>  because some assembly instructions cross the bound asm (like initialisation
>  of local variable) and if the stack has already overflowed some data is 
> destroyed
>  before the INT 05 is taken... Also if the function is inlined the stack 
> checking
>  does not mean a lot and should be discarded.
>
>  A problem of this bound test is that the limits are signed, but for some 
> tests it
>  does not matter. I do not know how long it takes if the value is in the 
> limit.
>
>  Just a comment,
>  Etienne.


Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread Joe Buck
On Thu, Sep 28, 2006 at 07:11:25AM -0700, H. J. Lu wrote:
> On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote:
> > On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote:
> > > I created a Google group to discuss generic ABI:
> > > 
> > > http://groups.google.com/group/generic-abi
> > > 
> > > It is by membership only. Let me know if you are interested.
> > 
> > What's this supposed to be?  Reinventing the doomed iBCS2?
> 
> Not at all. It is for generic ABI which is processor independent.
> However, the current i386 psABI doesn't really reflect/cover what
> have been added to i386 like MMX and SSE. Also gcc uses 16byte
> stack alignment, instead of 4byte, for SSE. Should we create a
> Google group for ia32 psABI?

Is this supposed to be for gcc/binutils, or is it supposed to be
processor-independent?  And why a closed list?  Please don't go
down the path of re-creating what we rebelled against when we started
egcs.  Also, if there's a need to crosspost a message between your
new list and a gcc or binutils list, the message will bounce.

If it is for the free toolchain, and you really feel that a new
list is needed because it crosses gcc/binutils boundaries, a new
list hosted off of gcc.gnu.org would be better than a list that
is polluted with Yahoo's ads.


Re: [RFC] Program Bounds Checking

2006-09-28 Thread Nicholas Nethercote

On Thu, 28 Sep 2006, Tzi-cker Chiueh wrote:


> We have considered the bound instruction in the CASH project. But
> we found that bound instruction is slower than the six normal
> instructions it is meant to replace for range checking. For example, the
> bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles
> while the 6 equivalent instructions require 6-7 clock cycles. We have not
> tested it on newer processors, though.


I would guess it would be as slow or worse.  'bound' is an extremely rarely 
used instruction, and so will not be optimised-for at all.


Nick


Re: [RFC] Program Bounds Checking

2006-09-28 Thread Joe Buck
On Thu, Sep 28, 2006 at 12:52:18PM -0400, Tzi-cker Chiueh wrote:
> We have considered the bound instruction in the CASH project. But
> we found that bound instruction is slower than the six normal
> instructions it is meant to replace for range checking. For example, the
> bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles
> while the 6 equivalent instructions require 6-7 clock cycles. We have not
> tested it on newer processors, though.

There's also the cache effect, in that larger code will cause more cache
misses.  If the penalty for using the shorter code sequence is only one
cycle out of 7, you might still want to support it, or at least if -Os
is specified; large codes might even run faster despite the penalty if
there are fewer cache misses.



Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread H. J. Lu
On Thu, Sep 28, 2006 at 09:54:10AM -0700, Joe Buck wrote:
> On Thu, Sep 28, 2006 at 07:11:25AM -0700, H. J. Lu wrote:
> > On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote:
> > > On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote:
> > > > I created a Google group to discuss generic ABI:
> > > > 
> > > > http://groups.google.com/group/generic-abi
> > > > 
> > > > It is by membership only. Let me know if you are interested.
> > > 
> > > What's this supposed to be?  Reinventing the doomed iBCS2?
> > 
> > Not at all. It is for generic ABI which is processor independent.
> > However, the current i386 psABI doesn't really reflect/cover what
> > has been added to i386, like MMX and SSE. Also gcc uses 16-byte
> > stack alignment, instead of 4-byte, for SSE. Should we create a
> > Google group for ia32 psABI?
> 
> Is this supposed to be for gcc/binutils, or is it supposed to be
> processor-independent?  And why a closed list?  Please don't go
> down the path of re-creating what we rebelled against when we started
> egcs.  Also, if there's a need to crosspost a message between your
> new list and a gcc or binutils list, the message will bounce.
> 

The ia32 psABI list will be processor-independent, not just for
gcc/binutils. I thought people might be more willing to discuss
things among people who are interested.


H.J.


Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread Joseph S. Myers
On Thu, 28 Sep 2006, Joe Buck wrote:

> Is this supposed to be for gcc/binutils, or is it supposed to be
> processor-independent?  And why a closed list?  Please don't go
> down the path of re-creating what we rebelled against when we started
> egcs.  Also, if there's a need to crosspost a message between your
> new list and a gcc or binutils list, the message will bounce.
> 
> If it is for the free toolchain, and you really feel that a new
> list is needed because it crosses gcc/binutils boundaries, a new
> list hosted off of gcc.gnu.org would be better than a list that
> is polluted with Yahoo's ads.

I agree that it should be an open list (or maybe separate ones for the 
psABIs alongside that for the gABI).  I would have suggested having it on
lists.freestandards.org alongside DWARF.

-- 
Joseph S. Myers
[EMAIL PROTECTED]


Re: representation of struct field offsets

2006-09-28 Thread Dale Johannesen


On Sep 27, 2006, at 7:04 PM, Sandra Loosemore wrote:

I've been having a heck of a time figuring out how to translate the  
offsets for struct fields from the DWARF encoding back to GCC's  
internal encoding for the LTO project.  I've got a handle on the  
DWARF encoding and how to do the necessary big/little endian  
conversions, but for the GCC side, there doesn't seem to be any  
documentation about the relevant macros in the manual, and the  
comments in tree.h don't seem to reflect what is actually going on  
in the representation.


For example, DECL_FIELD_OFFSET is supposed to be "the field  
position, counting in bytes, of the byte containing the bit closest  
to the beginning of the structure", while DECL_FIELD_BIT_OFFSET is  
supposed to be "the offset, in bits, of the first bit of the field  
from DECL_FIELD_OFFSET".  So I'm quite puzzled why, for fields that  
are not bit fields and that are aligned on byte boundaries, the C  
front end is generating a DECL_FIELD_OFFSET that points to some  
byte that doesn't contain any part of the field, and a non-zero  
DECL_FIELD_BIT_OFFSET instead.  If I make the LTO front end do what  
the comments in tree.h describe, then dwarf2out.c produces  
incorrect offsets that don't match those from the original C file.


I see in stor-layout.c that there are routines to "perform  
computations that convert between the offset/bitpos forms and byte  
and bit offsets", but what exactly are these forms and which values  
are the ones that I should actually be storing inside the  
FIELD_DECL object?  Is it possible to compute the DECL_OFFSET_ALIGN  
value somehow, given that it's not encoded in the DWARF  
representation?  Trying to reverse-engineer dwarf2out.c isn't  
turning out to be very productive  :-P


I had to look at this recently and I wound up looking at it this
way.  The total bit offset is represented as

(8 * byte offset) + (bit offset);

there are multiple ways to do that that produce the same result, and
gcc's choice of which one it uses is, as you say, somewhat arbitrary.
However, all the different ways seem to work equivalently for codegen
purposes.  I'm not familiar with the DWARF representation, but the same
information must be there, so perhaps the DWARF code could impose a
canonical form if necessary.
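Here is a minimal sketch of that arithmetic. It assumes 8-bit bytes and uses plain integers in place of GCC's tree nodes. The struct and function names are hypothetical, not GCC's. Different (byte, bit) splits of the same field yield the same total bit offset, and one possible canonical form keeps the bit part below 8.

```c
#include <stdint.h>

#define BITS_PER_UNIT 8  /* assumption: 8-bit bytes */

/* A field position in (byte offset, bit offset) form. */
struct field_pos {
    uint64_t byte_offset;  /* plays the role of DECL_FIELD_OFFSET */
    uint64_t bit_offset;   /* plays the role of DECL_FIELD_BIT_OFFSET */
};

/* Total offset of the field, in bits, from the start of the structure. */
static uint64_t total_bit_offset(struct field_pos p)
{
    return p.byte_offset * BITS_PER_UNIT + p.bit_offset;
}

/* One canonical form: move every whole byte into byte_offset so that
   bit_offset < BITS_PER_UNIT. */
static struct field_pos canonicalize(struct field_pos p)
{
    uint64_t bits = total_bit_offset(p);
    struct field_pos c = { bits / BITS_PER_UNIT, bits % BITS_PER_UNIT };
    return c;
}
```

For example, {4, 0} and {0, 32} both describe bit 32, and both canonicalize to {4, 0}.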





RE: Google group for generic System V Application Binary Interface

2006-09-28 Thread Menezes, Evandro
HJ, 

I think that it's great that all the de facto changes adopted for i386 would be 
put in an extension or appendix to its psABI.

However, I lean towards an open discussion list.  If necessary, I'd be glad to 
investigate hosting this list at http://www.x86-64.org, even though this 
discussion is about the i386 psABI.

___
Evandro Menezes  GNU Tools Team
512-602-9940 Advanced Micro Devices
[EMAIL PROTECTED]  Austin, TX





Re: [RFC] Program Bounds Checking

2006-09-28 Thread Robert Dewar

Tzi-cker Chiueh wrote:

We have considered the bound instruction in the CASH project. But
we found that the bound instruction is slower than the six normal
instructions it is meant to replace for range checking. For example, the
bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles
while the 6 equivalent instructions require 6-7 clock cycles. We have not
tested it on newer processors, though.


Might still be appropriate to use it in -Os mode I would think ...


Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread H. J. Lu
On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote:
> HJ, 
> 
> I think that it's great that all the de facto changes adopted for i386 would 
> be put in an extension or appendix to its psABI.
> 
> However, I lean towards an open discussion list.  If necessary, I'd be glad 
> to investigate hosting this list at http://www.x86-64.org, even though this 
> discussion is about the i386 psABI.
> 

One thing I miss the most is the search capability in the mailing
list archives. It isn't easy to find the things I am looking for
in an archive.

As for open group, Google group can be made open also.


H.J.


RE: Google group for generic System V Application Binary Interface

2006-09-28 Thread Dave Korn
On 28 September 2006 20:01, H. J. Lu wrote:

> On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote:
>> HJ,
>> 
>> I think that it's great that all the de facto changes adopted for i386
>> would be put in an extension or appendix to its psABI. 
>> 
>> However, I lean towards an open discussion list.  If necessary, I'd be
>> glad to investigate hosting this list at http://www.x86-64.org, even
>> though this discussion is about the i386 psABI.  
>> 
> 
> One thing I miss the most is the search capability in the mailing
> list archives. It isn't easy to find the things I am looking for
> in an archive.

  Really?  I use google to do it, even for non-google-groups list archives:

site:sourceware.org inurl:binutils "search terms go here ... "

  You know the kind of thing.  Haven't had to suffer through an htDig session
in a looong time now.


cheers,
  DaveK
-- 
Can't think of a witty .sigline today



Re: Google group for generic System V Application Binary Interface

2006-09-28 Thread Joe Buck
On Thu, Sep 28, 2006 at 12:01:01PM -0700, H. J. Lu wrote:
> On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote:
> > HJ, 
> > 
> > I think that it's great that all the de facto changes adopted for i386 
> > would be put in an extension or appendix to its psABI.
> > 
> > However, I lean towards an open discussion list.  If necessary, I'd be glad 
> > to investigate hosting this list at http://www.x86-64.org, even though this 
> > discussion is about the i386 psABI.
> > 
> 
> One thing I miss the most is the search capability in the mailing
> list archives. It isn't easy to find the things I am looking for
> in an archive.

For gcc lists, use google with

search string site:gcc.gnu.org

> As for open group, Google group can be made open also.

Why do you want to use a facility that attaches an ad to every message?



Re: frame unwind issue with discontiguous code

2006-09-28 Thread Mike Stump

On Sep 28, 2006, at 4:26 AM, Jan Beulich wrote:
While I'm not certain whether gcc is able to split one function's  
code between different sections


Kinda, sorta...  Hot-cold partitioning (-freorder-blocks-and-partition)
does this.  If one exposed a FE language construct to so
tag code, then the hot-cold partitioner could make use of that  
information.  Naturally, when complete, it would have to manage the  
debug information and the unwind information.  As I recall, it might  
not be that complete yet.


I think exposing a language feature so that you can tell the compiler  
to so split code would be better than trying to do it behind the  
compiler's back.


If you profiled and tweaked and tuned, you might be able to get the  
compiler to partition the code for you, though, having that sticky in
the source base so that everyone can just compile it is, well, a dream.


Re: representation of struct field offsets

2006-09-28 Thread Mark Mitchell

Sandra Loosemore wrote:

I've been having a heck of a time figuring out how to translate the 
offsets for struct fields from the DWARF encoding back to GCC's internal 
encoding for the LTO project.


Yes, that's a nasty bit.

I think the DECL_FIELD_OFFSET/DECL_FIELD_BIT_OFFSET stuff is, quite 
simply, mis-designed.  The way I think it should work is for 
DECL_FIELD_OFFSET to be the byte offset, and DECL_FIELD_BIT_OFFSET to be 
the bit offset, always less than BITS_PER_UNIT.  But, that's not how it 
actually works.  Instead, the BIT_OFFSET is kept below the alignment of 
the field, rather than BITS_PER_UNIT.
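Here is a simplified model of the split described above. Again it uses plain integers rather than trees, and the names are hypothetical. The bit part is kept below the offset alignment (DECL_OFFSET_ALIGN, in bits, assumed here to be a multiple of 8) rather than below BITS_PER_UNIT. This reproduces the behavior reported earlier in the thread: for a byte-aligned field at byte 12 with a 64-bit offset alignment, the byte part points at byte 8 and the bit part is 32. Recovering the byte position then amounts to offset + bitpos / BITS_PER_UNIT, which is presumably what tree.c:byte_position computes for byte-aligned fields.

```c
#include <stdint.h>

#define BITS_PER_UNIT 8  /* assumption: 8-bit bytes */

/* Hypothetical integer stand-ins for the two FIELD_DECL values. */
struct gcc_pos {
    uint64_t offset;  /* bytes; a multiple of off_align / BITS_PER_UNIT */
    uint64_t bitpos;  /* bits; kept below off_align */
};

/* Split a total bit position so that the bit part stays below off_align
   (in bits), not below BITS_PER_UNIT. */
static struct gcc_pos split_by_align(uint64_t total_bits, uint64_t off_align)
{
    struct gcc_pos p;
    p.offset = (total_bits / off_align) * (off_align / BITS_PER_UNIT);
    p.bitpos = total_bits % off_align;
    return p;
}

/* Byte position of a byte-aligned field, whichever split was used. */
static uint64_t byte_pos(struct gcc_pos p)
{
    return p.offset + p.bitpos / BITS_PER_UNIT;
}
```

With off_align = 8, the same field splits as {12, 0}. Both splits give byte 12, which is why codegen does not care which one is stored.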


The bit of dwarf2out.c that emits the offset for the field is 
add_data_member_location_attribute.  It uses 
dwarf2out.c:field_byte_offset, which is the function that normalizes the 
weird GCC representation into the obvious one.  I don't know why it's 
using a custom function; I would think it should just use 
tree.c:byte_position.  The current DWARF code looks oddly heuristic.


But that doesn't explain why you're not getting idempotent results.  Are 
you going through the stor-layout.c:place_field routines when creating
structure types?  If so, I wouldn't; here, you know where stuff is 
supposed to go, so I would just put it there, and set DECL_FIELD_OFFSET, 
etc., accordingly.


My bet is that you are not setting DECL_ALIGN, or that we have failed to 
set TYPE_ALIGN somewhere, and that, therefore, the heuristics in 
dwarf2out.c:field_byte_offset are getting confused.  For example, 
simple_type_align_in_bits might not be working.  I would probably step 
through field_byte_offset both when compiling C and in LTO mode, and try
to see where the two diverge.


It shouldn't be necessary as part of this work, but I can't see why we
shouldn't just replace field_byte_offset with a use of byte_position.  Does
anyone else know?


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: representation of struct field offsets

2006-09-28 Thread Chris Lattner

On Sep 28, 2006, at 1:43 PM, Mark Mitchell wrote:

Sandra Loosemore wrote:

I've been having a heck of a time figuring out how to translate  
the offsets for struct fields from the DWARF encoding back to  
GCC's internal encoding for the LTO project.


Yes, that's a nasty bit.

I think the DECL_FIELD_OFFSET/DECL_FIELD_BIT_OFFSET stuff is, quite  
simply, mis-designed.  The way I think it should work is for  
DECL_FIELD_OFFSET to be the byte offset, and DECL_FIELD_BIT_OFFSET  
to be the bit offset, always less than BITS_PER_UNIT.


An alternative design, which would save a field, is just to keep the  
offset of a field, in bits, from the start of the structure.


The only trouble you'll probably run into is with fields whose offset  
from the start of a structure is variable.


-Chris






Re: frame unwind issue with discontiguous code

2006-09-28 Thread Jim Wilson
On Thu, 2006-09-28 at 13:26 +0200, Jan Beulich wrote:
> While I'm not certain whether gcc is able to split one function's code
> between different sections 

Yes.  See the -freorder-blocks-and-partition option, which can move code
to hot/cold sections.

> However, when frame unwind information is generated, one quickly
> becomes aware of a problem with this 

Yes, this has been a known problem for a long time.  Unfortunately, I
don't know if anyone has ever tried to solve it.  Here is gcc's current
solution:
> aretha$ ./xgcc -B./ -g -freorder-blocks-and-partition -S tmp.c -funwind-tables
> cc1: note: -freorder-blocks-and-partition does not support unwind info
-- 
Jim Wilson, GNU Tools Support, http://www.specifix.com




Re: representation of struct field offsets

2006-09-28 Thread Mark Mitchell

Chris Lattner wrote:

An alternative design, which would save a field, is just to keep the 
offset of a field, in bits, from the start of the structure.


Yes, that would also work.  But, in many cases, you need the byte 
offset, so there's a time/space tradeoff.  Also, because of GCC's 
internal representation of integers, you have to be careful that you 
have enough bits; for example, you need 72 bits to represent things in a 
64-bit address space.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: representation of struct field offsets

2006-09-28 Thread Chris Lattner


On Sep 28, 2006, at 1:58 PM, Mark Mitchell wrote:


Chris Lattner wrote:

An alternative design, which would save a field, is just to keep  
the offset of a field, in bits, from the start of the structure.


Yes, that would also work.  But, in many cases, you need the byte  
offset, so there's a time/space tradeoff.


Yup, that's true.

Also, because of GCC's internal representation of integers, you  
have to be careful that you have enough bits; for example, you need  
72 bits to represent things in a 64-bit address space.


Actually, just 67, right?  Does GCC support structures whose size is  
greater than 2^61 ?
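Here is the arithmetic behind the two figures, assuming 8-bit bytes and byte offsets that may span the full 64-bit address space:

```latex
\max(\text{bit offset}) = (2^{64}-1)\cdot 8 + 7 = 2^{67}-1
\;\Rightarrow\; 67 \text{ bits (unsigned)}
\qquad
\text{size} < 2^{61}\ \text{bytes} \;\Rightarrow\; \text{bit offset} < 2^{61}\cdot 8 = 2^{64}
```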


-Chris


Re: representation of struct field offsets

2006-09-28 Thread Sandra Loosemore

Mark Mitchell wrote:

Are 
you going through the stor-layout.c:place_field routines when creating
structure types?  If so, I wouldn't; here, you know where stuff is 
supposed to go, so I would just put it there, and set DECL_FIELD_OFFSET, 
etc., accordingly.


No, I'm not using the fancy stor-layout.c stuff.  As you say, I already have the
bit offsets.  Also, the DWARF representation doesn't necessarily include all the 
bits of information that the layout algorithm uses, so it can't reliably 
reproduce the same layout just given the types and sizes of (possibly only some 
of) the fields.


Anyway, I've made a little more progress; Google pointed me at some past 
discussion of this issue and from there a mention that the Ada front end allows 
explicit placement of fields in a record, and I was able to swipe that bit of 
code.  I've got non-bitfields to work properly, but I'm still working on 
debugging the bitfield case.


-Sandra


Re: representation of struct field offsets

2006-09-28 Thread Mark Mitchell

Chris Lattner wrote:

Also, because of GCC's internal representation of integers, you have 
to be careful that you have enough bits; for example, you need 72 bits 
to represent things in a 64-bit address space.


Actually, just 67, right?  Does GCC support structures whose size is 
greater than 2^61 ?


I'm not sure -- but if it doesn't, it should.  There are folks who like 
to make structures corresponding to the entire address space, and then 
poke at particular bytes by using fields.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


gcc-4.0-20060928 is now available

2006-09-28 Thread gccadmin
Snapshot gcc-4.0-20060928 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.0-20060928/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.0 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_0-branch 
revision 117290

You'll find:

gcc-4.0-20060928.tar.bz2           Complete GCC (includes all of below)

gcc-core-4.0-20060928.tar.bz2      C front end and core compiler

gcc-ada-4.0-20060928.tar.bz2       Ada front end and runtime

gcc-fortran-4.0-20060928.tar.bz2   Fortran front end and runtime

gcc-g++-4.0-20060928.tar.bz2       C++ front end and runtime

gcc-java-4.0-20060928.tar.bz2      Java front end and runtime

gcc-objc-4.0-20060928.tar.bz2      Objective-C front end and runtime

gcc-testsuite-4.0-20060928.tar.bz2 The GCC testsuite

Diffs from 4.0-20060921 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.0
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: IPA branch

2006-09-28 Thread Mark Mitchell

Razya Ladelsky wrote:

Except for new optimizations, IPCP (currently on mainline) should also be
transformed to SSA.  IPCP working on SSA code exists on the IPA branch, and
will be submitted to GCC 4.3 after the IPA branch is committed and some
testsuite regressions failing with IPCP+versioning+inlining are fixed.


Is there a project page for this work?

Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


GCC 4.3 project to merge representation changes

2006-09-28 Thread Mark Mitchell

Kazu, Sandra --

I don't believe there is a GCC 4.3 project page to merge the work that 
you folks did on CALL_EXPRs and TYPE_ARG_TYPEs.  Would one of you please 
create a Wiki page for that?


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC 4.3 project to merge representation changes

2006-09-28 Thread Sandra Loosemore

Mark Mitchell wrote:

I don't believe there is a GCC 4.3 project page to merge the work that 
you folks did on CALL_EXPRs and TYPE_ARG_TYPEs.  Would one of you please 
create a Wiki page for that?


There are already a bunch of notes about this on the LTO page:

http://gcc.gnu.org/wiki/LinkTimeOptimization

-Sandra


Re: GCC 4.3 project to merge representation changes

2006-09-28 Thread Mark Mitchell

Sandra Loosemore wrote:

Mark Mitchell wrote:

I don't believe there is a GCC 4.3 project page to merge the work that 
you folks did on CALL_EXPRs and TYPE_ARG_TYPEs.  Would one of you 
please create a Wiki page for that?


There are already a bunch of notes about this on the LTO page:

http://gcc.gnu.org/wiki/LinkTimeOptimization


Yes -- but I was hoping to get a page here:

http://gcc.gnu.org/wiki/GCC_4.3_Release_Planning

(at the bottom, under Uncategorized Projects) for the project of merging 
the changes into GCC 4.3.


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: [discuss] Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread H. J. Lu
On Thu, Sep 28, 2006 at 03:33:05PM +0100, Keir Fraser wrote:
> 
> > Compile and link the attached C program as follows. I used gcc-4.1.1 and
> > binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice.
> > 
> >  # gcc -fpic -o test.o -c test.c
> >  # ld -Ttext 1 -o test test.o
> > 
> > Disassembly of the result trivially shows that the address of weak
> > symbol 'x' is 0x1.
> 
> By the way, experimentation with the address of the text section shows that
> the weak symbol's address is resolved to the nearest 4GB-aligned address
> (nearest to what I'm not sure -- RIP? Section start?). It may get rounded up
> or down, whichever is nearest.

You are asking for the impossible:

[EMAIL PROTECTED] weak-4]$ objdump -dr foo.o

foo.o: file format elf64-x86-64

Disassembly of section .text:

 <_start>:
   0:   55                      push   %rbp
   1:   48 89 e5                mov    %rsp,%rbp
   4:   48 8d 05 00 00 00 00    lea    0(%rip),%rax        # b <_start+0xb>
                        7: R_X86_64_PC32        x+0xfffc
   b:   c9                      leaveq
   c:   c3                      retq

R_X86_64_PC32 only supports a signed 32-bit offset. 0x1 is more
than 32 bits. The linker should issue an error, or at least a warning. You
can take your pick and I will fix the linker. If no one objects, I
will make it an error.
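The proposed linker check is sketched below in C. This is not binutils code. The x86-64 psABI defines R_X86_64_PC32 as S + A - P (symbol value plus addend minus the address of the patched field) stored in 32 bits. A linker can therefore diagnose the case where that result falls outside the signed 32-bit range. The function name and the example addresses are hypothetical. In the example, 0x100000007 stands in for a patched field above 4 GB, and 0 stands in for an undefined weak symbol, which a static link typically resolves to address 0.

```c
#include <stdbool.h>
#include <stdint.h>

/* Computes the R_X86_64_PC32 value S + A - P.  Returns false when the
   result does not fit in a signed 32-bit field, i.e. when a linker
   should report an overflow instead of silently truncating. */
static bool pc32_value(uint64_t s, int64_t a, uint64_t p, int32_t *out)
{
    /* Do the arithmetic modulo 2^64, then reinterpret it as signed. */
    int64_t v = (int64_t)(s + (uint64_t)a - p);
    if (v < INT32_MIN || v > INT32_MAX)
        return false;
    *out = (int32_t)v;
    return true;
}
```

The addend of -4 matches a 4-byte displacement field that the CPU resolves relative to the end of the instruction.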


H.J.


Re: Missing elements in VECTOR_CST

2006-09-28 Thread Mark Mitchell

Hans-Peter Nilsson wrote:

On Mon, 18 Sep 2006, Mark Mitchell wrote:


Andrew Pinski wrote:

The documentation on VECTOR_CST is not clear if we can have missing
elements in that the remaining elements are zero.  Right now we produce such
VECTOR_CST for things like:
#define vector __attribute__((vector_size(16) ))
vector int a = {1, 2};

But is that valid?  We currently produce a VECTOR_CST with just two
elements instead of 4.  Should we always have the same number of
elements in a VECTOR_CST as there are elements in the vector type?

I think it is reasonable for front-ends to elide initializers and to
follow the usual C semantics that elided initializers are (a) zero, if
the constant is appearing as an initializer for static storage, or (b)
unspecified, "random" values elsewhere.


Maybe you didn't mean what I read, but it's not just "for static
storage".  By my reading (of the May 6, 2005 ISO/IEC 9899:TC2
for reference), all items in arrays and named structure members
not mentioned in the initializer should be 0-initialized (the
"all subobjects that are not initialized explicitly shall be
initialized implicitly the same as objects that have static
storage duration" part in 6.7.8:19).


No, I meant what I said, and you read it correctly.

I think that front ends should be allowed to omit zeros for
initializers for variables with static storage duration, but not other 
initializers, independent of what C99 says.  The reason is that this 
matches traditional linker semantics; uninitialized variables with 
static storage duration are zeroed.  Also, if we did allow front ends to 
implicitly zero local variables, we would have to provide a way to allow 
them to override that and just say "uninitialized", to avoid pessimizing 
the code.  This consideration doesn't apply to variables with static 
storage duration.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: IPA branch

2006-09-28 Thread Mark Mitchell

Jan Hubicka wrote:


I intended to write the overview in a way to express that some work will
be needed.


Thank you for the detailed explanation.  I think your plans all sound 
reasonable.  I would definitely encourage you to start preparing patches 
and submitting them for review -- and hounding reviewers! -- as soon as 
possible.


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: Missing elements in VECTOR_CST

2006-09-28 Thread Hans-Peter Nilsson
On Thu, 28 Sep 2006, Mark Mitchell wrote:

> Hans-Peter Nilsson wrote:
> > On Mon, 18 Sep 2006, Mark Mitchell wrote:
> >
> >> Andrew Pinski wrote:
> >>> The documentation on VECTOR_CST is not clear if we can have missing
> >>> elements in that the remaining elements are zero.  Right now we produce such
> >>> VECTOR_CST for things like:
> >>> #define vector __attribute__((vector_size(16) ))
> >>> vector int a = {1, 2};
> >>>
> >>> But is that valid?  We currently produce a VECTOR_CST with just two
> >>> elements instead of 4.  Should we always have the same number of
> >>> elements in a VECTOR_CST as there are elements in the vector type?
> >> I think it is reasonable for front-ends to elide initializers and to
> >> follow the usual C semantics that elided initializers are (a) zero, if
> >> the constant is appearing as an initializer for static storage, or (b)
> >> unspecified, "random" values elsewhere.
> >
> > Maybe you didn't mean what I read, but it's not just "for static
> > storage".  By my reading (of the May 6, 2005 ISO/IEC 9899:TC2
> > for reference), all items in arrays and named structure members
> > not mentioned in the initializer should be 0-initialized (the
> > "all subobjects that are not initialized explicitly shall be
> > initialized implicitly the same as objects that have static
> > storage duration" part in 6.7.8:19).
>
> No, I meant what I said, and you read it correctly.
>
> I think that front ends should be allowed to omit zeros for
> initializers for variables with static storage duration, but not other
> initializers, independent of what C99 says.

I think we "read past each other".  I was just countering what
(I read as) your statement that for C semantics, only omitted
subobjects in initializers of objects with *static storage* are
zero-initialized; i.e. that others are ok to be left undefined
by an implementation.

(Not to be mixed up with the interface between GCC front-ends,
back-end and linker.)
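At the C level, the rule H-P cites can be checked with GCC's vector extension, using a 16-byte vector of int like Andrew's example. This is a small sketch that assumes a 4-byte int, and the typedef and function names are hypothetical. It shows only the C semantics for an automatic variable. It says nothing about how many elements the front end puts in the VECTOR_CST, which is the separate IR question under discussion.

```c
#include <string.h>

/* GCC vector extension: 16 bytes of int, as in Andrew's example. */
typedef int v4si __attribute__((vector_size(16)));

_Static_assert(sizeof(v4si) == 4 * sizeof(int), "assumes 4-byte int");

/* Initializes an automatic (non-static) vector with only two
   initializers and copies its elements out.  Following C's rule for
   omitted initializers (C99 6.7.8p19), the last two elements are
   expected to be zero even without static storage duration. */
static void partial_vector_elements(int out[4])
{
    v4si a = {1, 2};
    memcpy(out, &a, sizeof a);
}
```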

brgds, H-P


Re: representation of struct field offsets

2006-09-28 Thread Richard Kenner
> The only trouble you'll probably run into is with fields whose offset  
> from the start of a structure is variable.

Exactly.  That's the reason it's defined the way it is.  There is no way
to synthesize that field from any other in the FIELD_DECL in the most
general case: it is unique information.  I'm traveling now, but should be
able to give some examples of this early next week.


Re: Missing elements in VECTOR_CST

2006-09-28 Thread Mark Mitchell

Hans-Peter Nilsson wrote:


I think that front ends should be allowed to omit zeros for
initializers for variables with static storage duration, but not other
initializers, independent of what C99 says.


I think we "read past each other".  I was just countering what
(I read as) your statement that for C semantics, only omitted
subobjects in initializers of objects with *static storage* are
zero-initialized; i.e. that others are ok to be left undefined
by an implementation.


Right, I think we were at cross-purposes; to be clear, I wasn't 
commenting at all about C semantics, but rather about what I think the 
internal GCC IR semantics should be.


Thanks,

--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


Re: GCC 4.3 project to merge representation changes

2006-09-28 Thread Kazu Hirata

Hi Mark,

I don't believe there is a GCC 4.3 project page to merge the work that 
you folks did on CALL_EXPRs and TYPE_ARG_TYPEs.  Would one of you please 
create a Wiki page for that?


What would you suggest I do about the missing bits?  Specifically, most backends
with the exception of x86 are broken because I haven't converted uses of
TYPE_ARG_TYPEs in those backends.  The ARM port was broken at the time of branch
creation.  The Java frontend uses a flag within the TREE_LIST object that makes
up TYPE_ARG_TYPEs, so it is blocking the proposed merge.  (Java maintainers are
planning to fix this in future.)


So, Sandra's CALL_EXPRs stuff may be ready for merge, but my TYPE_ARG_TYPE stuff
has a somewhat long way to go.


Thanks,

Kazu Hirata


Re: GCC 4.3 project to merge representation changes

2006-09-28 Thread Mark Mitchell

Kazu Hirata wrote:

Hi Mark,

I don't believe there is a GCC 4.3 project page to merge the work that 
you folks did on CALL_EXPRs and TYPE_ARG_TYPEs.  Would one of you 
please create a Wiki page for that?


What would you suggest I do about the missing bits?  Specifically, most
backends with the exception of x86 are broken because I haven't 
converted uses of TYPE_ARG_TYPEs in those backends.  The ARM port was 
broken at the time of branch creation.  The Java frontend uses a flag 
within the TREE_LIST object that makes up TYPE_ARG_TYPEs, so it is 
blocking the proposed merge.  (Java maintainers are planning to fix this
in future.)


So, Sandra's CALL_EXPRs stuff may be ready for merge, but my
TYPE_ARG_TYPE stuff has a somewhat long way to go.


Yes, I agree that Sandra's stuff is closer.  I would hope that with the 
ECJ conversion (planned for Stage 1), the Java issue goes away.  So, 
that would leave you with fixing the various other back-ends, which 
should be mechanical, but perhaps somewhat time-consuming.  I'd also 
hope that other folks would volunteer to help with some of that work, 
since getting it done will make the compiler more efficient, which is 
better for everyone.


--
Mark Mitchell
CodeSourcery
[EMAIL PROTECTED]
(650) 331-3385 x713


paired register loads and stores

2006-09-28 Thread Erich Plondke

rs6000 and Sparc ports seem to use a peephole2 to get the lfq or ldd
instructions (respectively), but it looks like there's no reason for
the register allocator to allocate registers together.  The peephole2
just picks up loads to adjacent memory locations if the allocator
happens to choose adjacent registers (is that correct?) or the
variables are specified as living in hard registers with the help
of an asm.

Several other architectures have paired loads: some ARM targets have ldrd
which can be cheaper than an ldm, and ia64 has a pair load.

It seems like GCC does a good job of knowing how to modify register-
sized subregs of two- or four-register larger modes.  So if I could
tell GCC to turn:

   [(set (reg:SI X) (mem:SI (addr)))
    (set (reg:SI Y) (mem:SI (addr+4)))]

(where addr is aligned to DI) into something like:

   [(set (reg:DI T) (mem:DI (addr)))
    (set (reg:SI X) (subreg:SI (reg:DI T) 0))
    (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]

and I could do so early enough, GCC would know to access the subregs
directly in instruction(s) using the loaded values, and I would end up loading
the register pair and using the individual elements.  But it has to
be done early on; after register allocation even if I could get a
DI temporary I'd probably have the two SI moves and that's probably
not a win.
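Here is a C-level model of the rewrite sketched in the RTL, offered only to illustrate the intended semantics and not as GCC code. It does one 8-byte load into a temporary, then takes the two 4-byte halves at byte offsets 0 and 4, mirroring the two subregs. As I understand GCC's subreg rules, subreg byte offsets follow the memory layout of the inner value. Copying bytes with memcpy therefore gives the same values as the two separate SImode loads on either endianness. The function name is hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* One 8-byte load into the temporary T, then the halves at byte
   offsets 0 and 4, mirroring (subreg:SI (reg:DI T) 0) and
   (subreg:SI (reg:DI T) 4).  memcpy preserves memory byte order, so x
   and y equal what SImode loads from addr and addr+4 would produce. */
static void paired_load(const void *addr, uint32_t *x, uint32_t *y)
{
    uint64_t t;                                           /* DImode temporary T */
    memcpy(&t, addr, sizeof t);                           /* (mem:DI addr) */
    memcpy(x, (const unsigned char *)&t, sizeof *x);      /* subreg byte 0 */
    memcpy(y, (const unsigned char *)&t + 4, sizeof *y);  /* subreg byte 4 */
}
```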

I've tinkered with splits but can't seem to get it to work.  And I'm
aware that trying to do it too early might be bad, because pseudos
might not be alive in the future, or might be in memory.  But you
can't do it at peephole2, because the registers won't be paired.

Any ideas?  Am I going at it from the right angle?

--
Why are ``tolerant'' people so intolerant of intolerant people?


Re: [discuss] Re: Visibility=hidden for x86_64 Xen builds -- problems?

2006-09-28 Thread Keir Fraser
On 29/9/06 1:40 am, "H. J. Lu" <[EMAIL PROTECTED]> wrote:

> You are asking for the impossible:

If the compiler emitted accesses via the GOT for weak symbols then there
wouldn't be a problem. The compiler doesn't know the final link address
though, so it'd have to be conservative. Perhaps it's not worth it for a
case that no one much cares about.

> R_X86_64_PC32 only supports a signed 32-bit offset. 0x1 is more
> than 32 bits. The linker should issue an error, or at least a warning. You
> can take your pick and I will fix the linker. If no one objects, I
> will make it an error.

That's fine with us. I fixed Xen to no longer use weak symbols.

 -- Keir