Re: COND_EXPRs in GIMPLE code and vectorizer
Yes. This is also true for a few other expressions. IIRC, the gimplifier expands MAX_EXPR into control flow, even though it is legal gimple. In fact it did this, but later it was removed from the gimplifier because it generated worse code. The usual reason for this type of thing is that the ability to have them as the RHS of a MODIFY_EXPR was added much later than the gimplifier, and it was decided that in order to avoid possible performance regressions, the existing behavior of lowering wouldn't be changed. If, of course, there is some good reason to always use the data dependent form over the control dependent form, we're always willing to explore changing the gimplifier to not do the lowering. MIN_EXPR/MAX_EXPR in fact are a very good example, because we discovered that it would be best not to do the lowering. In fact, there was no reason why the optimizers would mishandle MIN_EXPR and MAX_EXPR (they are just like any other tcc_binary node). If we had a tree combiner, it would probably be better to allow COND_EXPR throughout the whole compilation, and possibly to generate them from simple phi's like we do in the tree-ssa-phiopt pass. This would leverage some code in fold. Paolo
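To make the "data dependent form" versus "control dependent form" distinction concrete, here is a minimal source-level sketch. The function names are made up for illustration, and how GCC actually represents either function internally depends on the version and on the passes that run; this only shows the two shapes being discussed.

```c
/* Control-dependent form: the result is chosen by branching, which is
   roughly what lowering a conditional into control flow produces. */
int max_control_flow(int a, int b)
{
    int result;

    if (a > b)
        result = a;
    else
        result = b;
    return result;
}

/* Data-dependent form: the result is a single expression on the right-hand
   side of an assignment, the shape a COND_EXPR or MAX_EXPR would keep. */
int max_data_dependent(int a, int b)
{
    int result = (a > b) ? a : b;

    return result;
}
```

Both functions compute the same value; the question in the thread is only which shape the optimizers should see.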
Re: Documentation for loop infrastructure
Sebastian Pop <[EMAIL PROTECTED]> wrote on 26/09/2006 21:24:18:

> It is probably better to include the loop indexes in the example, and
> modify the syntax of the scev for making it more explicit, like:
>
> @smallexample
> for1 i
>   for2 j
>     *((int *)p + i + j) = a[i][j];
> @end smallexample
>
> and the access function becomes: @code{@{0, + 4B@}_for2}

Done.

I guess, I'll commit my part as soon as loop.texi (and Dependency analysis part) is committed.

Ira

> The data references are discovered in a particular order during the
> scanning of the loop body: the loop body is analyzed in execution
> order, and the data references of each statement are pushed at the end
> of the data reference array. Two data references syntactically occur
> in the program in the same order as in the array of data references.
> This syntactic order is important in some classical data dependence
> tests, and mapping this order to the elements of this array avoids
> costly queries to the loop body representation.

Three types of data references are currently handled: ARRAY_REF, INDIRECT_REF and COMPONENT_REF. The data structure for the data reference is @code{data_reference}, where @code{data_reference_p} is a name of a pointer to the data reference structure. The structure contains the following elements:

@itemize
@item @code{base_object_info}: Provides information about the base object of the data reference and its access functions. These access functions represent the evolution of the data reference in the loop relative to its base, in keeping with the classical meaning of the data reference access function for the support of arrays. For example, for a reference @code{a.b[i][j]}, the base object is @code{a.b} and the access functions, one for each array subscript, are: @code{@{i_init, + i_step@}_1, @{j_init, +, j_step@}_2}.

@item @code{first_location_in_loop}: Provides information about the first location accessed by the data reference in the loop and about the access function used to represent evolution relative to this location. This data is used to support pointers, and is not used for arrays (for which we have base objects). Pointer accesses are represented as a one-dimensional access that starts from the first location accessed in the loop. For example:

@smallexample
for1 i
  for2 j
    *((int *)p + i + j) = a[i][j];
@end smallexample

The access function of the pointer access is @code{@{0, + 4B@}_for2} relative to @code{p + i}. The access functions of the array are @code{@{i_init, + i_step@}_for1} and @code{@{j_init, +, j_step@}_for2} relative to @code{a}.

Usually, the object the pointer refers to is either unknown, or we can't prove that the access is confined to the boundaries of a certain object.

Two data references can be compared only if at least one of these two representations has all its fields filled for both data references.

The current strategy for data dependence tests is as follows: If both @code{a} and @code{b} are represented as arrays, compare @code{a.base_object} and @code{b.base_object}; if they are equal, apply dependence tests (use access functions based on base_objects). Else if both @code{a} and @code{b} are represented as pointers, compare @code{a.first_location} and @code{b.first_location}; if they are equal, apply dependence tests (use access functions based on first location). However, if @code{a} and @code{b} are represented differently, only try to prove that the bases are definitely different.

@item Aliasing information.

@item Alignment information.
@end itemize > The structure describing the relation between two data references is > @code{data_dependence_relation} and the shorter name for a pointer to > such a structure is @code{ddr_p}. This structure contains:
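
Going back to the access functions in the itemized list above, the following is a rough sketch of how an affine access function of the form {init, +, step} can be read: it gives the value of the subscript (or byte offset) at iteration n of its loop. The struct and function names are hypothetical and are not the ones used in GCC's scalar evolution code.

```c
#include <stdint.h>

/* Hypothetical stand-in for an affine access function {init, +, step}
   attached to one loop. */
struct affine_access {
    int64_t init;   /* value at iteration 0 */
    int64_t step;   /* increment per iteration */
};

/* Value of the access function at iteration n (n counts from 0). */
int64_t access_at_iteration(struct affine_access f, int64_t n)
{
    return f.init + n * f.step;
}

/* Two references with the same base touch the same location at
   iteration n exactly when their access functions agree there. */
int same_location_at(struct affine_access a, struct affine_access b,
                     int64_t n)
{
    return access_at_iteration(a, n) == access_at_iteration(b, n);
}
```

For a pointer access that advances by one 4-byte int per inner iteration, an access function with init 0 and step 4 yields byte offsets 0, 4, 8, ... from the first location accessed in the loop.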
Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?
In building Xen we observed a build problem when using binutils 2.15 that wasn't visible for those of us using newer binutils versions. However, I believe that we should have seen this in all cases. Xen gets compiled with -fPIC, and we recently added a global visibility pragma to avoid the cost of going through the GOT for all access to global data objects (PIC isn't really needed here, all we need is sufficient compiler support to get the final image located outside the +/-2Gb ranges, but large model support is neither there in older compilers nor do we really need all of it either). In a kallsyms-like approach, symbol information gets embedded in the final executable, with the first linking stage not having available the respective table symbols. For that reason, they are being attributed weak. After adding the global visibility hidden pragma, even these weak symbols get accessed (or their address calculated) via RIP-relative addressing. While accessing them this way is probably acceptable from the compiler's perspective (given the hidden attribute it may safely assume the symbol is in the same executable image as the accessing code), calculating its address certainly isn't, as the symbol may not be present at all (and after all, comparing the address of the weak object against NULL is the only method I know to check presence of the symbol at runtime). So the questions are: 1) Why does gcc not use a GOT reference when calculating the address of a weak symbol here? 2) Why does the linker silently resolve the (32-bit PC-relative) relocation targeting an undefined weak symbol, yielding at run-time a non-zero address? 
While I can see the point of assisting the compiler here under the assumption that it has checked the address elsewhere and hence the actual access is supposed to never happen at runtime, detecting the (incorrect) use of the same relocation (access method) in either assembly code or address calculations should be mandatory; to distinguish the two, two distinct relocation types would then be needed (one that keeps the linker silent, and another one that doesn't). Thanks, Jan >>> Keir Fraser <[EMAIL PROTECTED]> 28.09.06 10:56 >>> >On 28/9/06 09:23, "Keir Fraser" <[EMAIL PROTECTED]> wrote: >> On 28/9/06 07:46, "Jan Beulich" <[EMAIL PROTECTED]> wrote: >> Keir Fraser <[EMAIL PROTECTED]> 27.09.06 20:14 >>> So it seems that older versions of gcc (before 4.1.1) don't do anything more with the pragma than -fvisibility=hidden. So currently the pragma at best does nothing (extern references still go through GOT) and at worst breaks the build. :-) >>> >>> That'd be contrary to my observations; I'll check into this. >> >> Thanks. I am using a personal build of vanilla gcc-4.1.1 by the way. > >...and that's the problem. I'm using it with a too-old version of binutils >(version 2.15). Pretty much any newer version seems to relocate the weak >reference to 8300 (i.e., I guess rounds down to a 2GB boundary).
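
The NULL comparison mentioned above as the way to check for a weak symbol at run time can be sketched as follows. This assumes an ELF target and GCC's `weak` attribute; `optional_hook` is a made-up name, and the sketch uses default visibility, so it does not reproduce the hidden-visibility problem under discussion.

```c
#include <stddef.h>

/* Hypothetical optional symbol. Because it is declared weak, linking
   succeeds even when no object file defines it; its address is then
   expected to resolve to NULL. */
extern void optional_hook(void) __attribute__((weak));

/* Returns 1 if some object file provided optional_hook, 0 otherwise. */
int hook_is_present(void)
{
    return optional_hook != NULL;
}

/* Calls the hook only when it exists; returns 1 if it was called. */
int call_hook_if_present(void)
{
    if (optional_hook != NULL) {
        optional_hook();
        return 1;
    }
    return 0;
}
```

The check is only meaningful if the address calculation itself yields zero for an undefined weak symbol, which is exactly what the RIP-relative sequence described above fails to guarantee.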
Re: How do I build C++ for xscale-elf?
Jack Twilley wrote: Kai Ruottu wrote: Jack Twilley wrote: I am trying to build gcc on a FreeBSD 6.1-STABLE system. If there's more information I can give you, please ask. What was the GCC version tried? The new gcc-4.1.1 seems to require '--disable-shared', for instance with ARM; otherwise it tries to link against the "created" 'libgcc_s.so.1' despite using '--with-newlib'. A stupid bug and a stupid workaround (neither 'newlib' nor the target, 'xscale-elf', supports shared libraries). With gcc-4.1.1 the '--disable-shared' is also obligatory... I tried gcc-4.1.1 from SVN (gcc_4_1_1_release) with the following configure line: Doh. I forgot to insert the configure line. ./configure --with-newlib --disable-shared --target=xscale-elf --enable-language=c,c++ Am I missing something here? Thank you in advance! Jack. It fails on compiling regex.c in xscale-elf/libiberty with a whole bunch of errors about not being able to find sys/types.h and strings.h and the like. I have installed binutils-2.17 from SVN (binutils-2_17) for xscale-elf. Its version of libiberty installed into /usr/local/lib/ which makes me wonder how many things I accidentally overwrote while building that, but I'll rebuild FreeBSD later. Should I have not built binutils? Was there something else I missed? Jack.
Re: COND_EXPRs in GIMPLE code and vectorizer
Paolo Bonzini wrote: Yes. This is also true for a few other expressions. IIRC, the gimplifier expands MAX_EXPR into control flow, even though it is legal gimple. In fact it did this, but later it was removed from the gimplifier because it generated worse code. The usual reason for this type of thing is that the ability to have them as the RHS of a MODIFY_EXPR was added much later than the gimplifier, and it was decided that in order to avoid possible performance regressions, the existing behavior of lowering wouldn't be changed. If, of course, there is some good reason to always use the data dependent form over the control dependent form, we're always willing to explore changing the gimplifier to not do the lowering. MIN_EXPR/MAX_EXPR in fact are a very good example, because we discovered that it would be best not to do the lowering. In fact, there was no reason why the optimizers would mishandle MIN_EXPR and MAX_EXPR (they are just like any other tcc_binary node). If we had a tree combiner, it would probably be better to allow COND_EXPR throughout the whole compilation, and possibly to generate them from simple phi's like we do in the tree-ssa-phiopt pass. This would leverage some code in fold. If time allows me, I'd like to try to see what happens if COND_EXPRs are kept throughout the GIMPLE passes (I confess I'm curious). Logically, I see them as richer constructs (they carry more information than the equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs. I understand there may be additional occasions (currently unexploited) to generate COND_EXPRs, but why precisely do you expect unlowered COND_EXPRs to be potentially harmful? What do you mean by "tree combiner"? Cheers, Roberto
Re: Notes from tinkering with the autovectorizer (4.1.1)
"Erich Plondke" <[EMAIL PROTECTED]> wrote on 27/09/2006 18:17:55:

thanks for the detailed explanation,

> Indeed, in your paper (grin) "Multi-platform Auto-vectorization" you
> define the functionality of realign load in terms of mis - the misalignment
> of the address (i.e., address&(VS)), as follows: The last VS-mis bytes of
> vector vec1 are concatenated to the first mis bytes of the vector vec2.
>
> This is what the walign instruction does, but it's not quite what we
> ended up with in GCC.
> In the case that mis is 0, the GCC hook wants to end up with vec2, not vec1.

yes, what we describe in the paper is a bit more general than what we ended up implementing...

> So for architectures that can align both ways, the current method is
> fine, but if the architecture is designed for one endian only we are
> going to have trouble exploiting the alignment feature.

I agree we need to add the alternatives you pointed out to make it applicable to more targets. This is now PR26268.

dorit

> Thanks,
>
> Erich
>
> --
> Why are ``tolerant'' people so intolerant of intolerant people?
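
The realign load definition quoted above (the last VS-mis bytes of vec1 followed by the first mis bytes of vec2) can be written out on plain byte arrays. This is only a sketch of the definition as stated in the message: VS is assumed to be 16 bytes, the function name is invented, and no target's actual instruction or GCC hook is being modelled.

```c
#include <stddef.h>
#include <stdint.h>

enum { VS = 16 };  /* assumed vector size in bytes */

/* out = last (VS - mis) bytes of vec1, then first mis bytes of vec2.
   mis must be in the range 0 .. VS-1. */
void realign_load(uint8_t out[VS], const uint8_t vec1[VS],
                  const uint8_t vec2[VS], size_t mis)
{
    size_t i;

    for (i = 0; i < VS - mis; i++)
        out[i] = vec1[mis + i];
    for (i = 0; i < mis; i++)
        out[VS - mis + i] = vec2[i];
}
```

With mis equal to 0 this definition returns vec1 unchanged, which is the case where, according to the message, the GCC hook wants vec2 instead.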
Re: COND_EXPRs in GIMPLE code and vectorizer
On 9/28/06, Roberto COSTA <[EMAIL PROTECTED]> wrote: I understand there may be additional occasions (currently unexploited) to generate COND_EXPRs, but why precisely do you expect unlowered COND_EXPRs to be potentially harmful? They inhibit some control flow optimizations such as jump threading. What do you mean by "tree combiner"? A mythical, never completed pass that combines different GIMPLE statements into a single new one, much like the combine pass on RTL. Gr. Steven
frame unwind issue with discontiguous code
While I'm not certain whether gcc is able to split one function's code between different sections (if for nothing else, this might help reduce TLB pressure by moving code unlikely to be executed not just out of the main function body), by way of inline assembly the Linux kernel certainly does in many places. Obviously, pure assembly makes use of this even more heavily. However, when frame unwind information is generated, one quickly becomes aware of a problem with this: the unwind information at a continuation point in a section other than the base section would need to replicate all unwind directives (note that DW_CFA_remember_state and DW_CFA_restore_state are not suitable here, as there need to be separate FDEs attached to the secondary code fragments). While this is generally possible (albeit tedious) in pure assembly code, doing so in inline assembly doesn't seem to be possible in any way (the compiler may not even use .cfi_* directives to emit frame unwind info). To cover all cases, it would basically appear to be necessary to add a referral op to the set of DW_CFA_* ops, which would indicate that the frame state at the given point is to be derived by assuming the location counter would in fact be at the origin of the control transfer. As I don't know how to approach requesting an addition like this to the Dwarf standard, I'm trying my luck here. Any pointers or suggestions are greatly appreciated. Thanks, Jan
Re: Documentation for loop infrastructure
Hello, > Sebastian Pop <[EMAIL PROTECTED]> wrote on 26/09/2006 21:24:18: > > > It is probably better to include the loop indexes in the example, and > > modify the syntax of the scev for making it more explicit, like: > > > > @smallexample > > for1 i > > for2 j > > *((int *)p + i + j) = a[i][j]; > > @end smallexample > > > > and the access function becomes: @code{@{0, + 4B@}_for2} > > > > Done. > > I guess, I'll commit my part as soon as loop.texi (and Dependency analysis > part) > is committed. I have committed the documentation, including the parts from Daniel and Sebastian (but not yours) now. Zdenek
re:Re: thesis on mix c++ and objective-c
>On Sep 27, 2006, at 11:58 AM, Come Lonfils wrote: >> I'm beginning a end study thesis on "mix" c++ end objective-c in gcc. >> I know there is already objective-c++ but I need all information I >> can have on the subject. What is already done and what is not (and >> why)? > >Objective-C++ is already done. Parts not done might include, >Objective-C style exceptions interoperating with C++ style >exceptions. Beyond that, just random bug fixes. > >> I also need documentation for people who want to "enter" in gcc and >> to know how gcc work and how to modify it. I want to know how >> objective-c is compiled > >It is compiled just like C is compiled. Objective C++ is compiled >just like C++ is compiled. > > OK, but is there documentation about how Objective-C++ works (I mean how ObjC++ is implemented in gcc)? And more generally, is there documentation about GCC itself: not documentation on how to use gcc, but documentation for people who want to know more about how gcc works internally?
Re: COND_EXPRs in GIMPLE code and vectorizer
Roberto COSTA wrote on 09/28/06 05:51: > If time allows me, I'd like to try to see what happens if COND_EXPRs are > kept throughout the GIMPLE passes (I confess I'm curious). Logically, I > see them as richer constructs (they carry more information than the > equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs. > Be wary of VRP and thread jumping if you do this. Both rely on the current COND_EXPR format quite heavily. > What do you mean by "tree combiner"? > A pass that combines two or more GIMPLE statements into a single GIMPLE statement. At least one effort started down this path but was never completed.
Re: COND_EXPRs in GIMPLE code and vectorizer
Diego Novillo wrote: Roberto COSTA wrote on 09/28/06 05:51: If time allows me, I'd like to try to see what happens if COND_EXPRs are kept throughout the GIMPLE passes (I confess I'm curious). Logically, I see them as richer constructs (they carry more information than the equivalent control-flow code), like MIN_EXPRs and MAX_EXPRs. Be wary of VRP and thread jumping if you do this. Both rely on the current COND_EXPR format quite heavily. Before committing to anything, I will study the implications for these passes in order to evaluate the effort needed. Roberto
Re: frame unwind issue with discontiguous code
On Thu, Sep 28, 2006 at 01:26:00PM +0200, Jan Beulich wrote: > To cover all cases, it would basically appear to be necessary to > add a referral op to the set of DW_CFA_* ops, which would > indicate that the frame state at the given point is to be derived > by assuming the location counter would in fact be at the origin > of the control transfer). Why? I don't think so; just make a new FDE and accept the duplication. That doesn't work if you use inline assembly to do it, but that's already in the realm of seriously nasty hack. > As I don't know how to approach requesting an addition like this > to the Dwarf standard, I'm trying my luck here. You could ask, um, on the Dwarf list... See dwarf.freestandards.org. -- Daniel Jacobowitz CodeSourcery
Re: Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?
On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote: > > 2) Why does the linker silently resolve the (32-bit PC-relative) > relocation targeting an undefined weak symbol, yielding at > run-time a non-zero address? While I can see the point of Do you have a testcase? I can't reproduce it. If it is true, I consider it a linker bug. H.J.
Re: Google group for generic System V Application Binary Interface
On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote: > I created a Google group to discuss generic ABI: > > http://groups.google.com/group/generic-abi > > It is by membership only. Let me know if you are interested. What's this supposed to be? Reinventing the doomed iBCS2?
[RFC] Changing labels of TV_* timers?
Some of the labels in TV_* timers are fairly long and mess up the column display. Would it be a problem for anyone if I changed them during the next stage1? It's only a cosmetic change, but I can see it affecting people's scripts. Thanks.
Re: [RFC] Changing labels of TV_* timers?
On Thu, 2006-09-28 at 10:09 -0400, Diego Novillo wrote: > Some of the labels in TV_* timers are fairly long and mess up the column > display. Would it be a problem for anyone if I changed them during the > next stage1? Why not instead increase the column size? -- Pinski
Re: Google group for generic System V Application Binary Interface
On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote: > On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote: > > I created a Google group to discuss generic ABI: > > > > http://groups.google.com/group/generic-abi > > > > It is by membership only. Let me know if you are interested. > > What's this supposed to be? Reinventing the doomed iBCS2? Not at all. It is for generic ABI which is processor independent. However, the current i386 psABI doesn't really reflect/cover what has been added to i386, like MMX and SSE. Also gcc uses 16-byte stack alignment, instead of 4-byte, for SSE. Should we create a Google group for ia32 psABI? H.J.
Re: [RFC] Changing labels of TV_* timers?
Andrew Pinski wrote on 09/28/06 10:11: > On Thu, 2006-09-28 at 10:09 -0400, Diego Novillo wrote: >> Some of the labels in TV_* timers are fairly long and mess up the column >> display. Would it be a problem for anyone if I changed them during the >> next stage1? > > Why not instead increase the column size? > Well, I don't want to get into >80 column issues. I know they are still pervasive (OK, so I prefer tiling 3 80-col windows to a single 240-col window).
Re: Visibility=hidden for x86_64 Xen builds -- problems?
On 28/9/06 14:24, "H. J. Lu" <[EMAIL PROTECTED]> wrote: > On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote: >> >> 2) Why does the linker silently resolve the (32-bit PC-relative) >> relocation targeting an undefined weak symbol, yielding at >> run-time a non-zero address? While I can see the point of > > Do you have a testcase? I can't reproduce it. If it is true, I consider > it a linker bug. Compile and link the attached C program as follows. I used gcc-4.1.1 and binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice. # gcc -fpic -o test.o -c test.c # ld -Ttext 1 -o test test.o Disassembly of the result trivially shows that the address of weak symbol 'x' is 0x1. -- Keir test.c Description: Binary data
Re: Fwd: Re: Visibility=hidden for x86_64 Xen builds -- problems?
>>> "H. J. Lu" <[EMAIL PROTECTED]> 28.09.06 15:24 >>> >On Thu, Sep 28, 2006 at 10:45:38AM +0100, Jan Beulich wrote: >> >> 2) Why does the linker silently resolve the (32-bit PC-relative) >> relocation targeting an undefined weak symbol, yielding at >> run-time a non-zero address? While I can see the point of > >Do you have a testcase? I can't reproduce it. If it is true, I consider >it a linker bug. Attached. The linker script likely is not minimal, but I think the important point is that it sets the origin to a non-zero value. Compiling this with gcc 4.1.1 (-c -fPIC) and linking with ld 2.17 (no other options than those necessary to specify input and output) succeeds, while linking with ld 2.15 fails (due to relocation overflow). But again, if this is plainly a linker bug, then the compiler also must not access weak objects through RIP-relative addressing (i.e. then we also have a compiler bug here), while I continue to think that the fact that there is a 'hidden' attribute should allow the compiler to do better than going through GOT (at the expense of a new relocation type). Jan got.lds Description: Binary data got.c Description: Binary data
Re: Visibility=hidden for x86_64 Xen builds -- problems?
> Compile and link the attached C program as follows. I used gcc-4.1.1 and > binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice. > > # gcc -fpic -o test.o -c test.c > # ld -Ttext 1 -o test test.o > > Disassembly of the result trivially shows that the address of weak symbol 'x' > is 0x1. By the way, experimentation with the address of the text section shows that the weak symbol's address is resolved to the nearest 4GB-aligned address (nearest to what I'm not sure -- RIP? Section start?). It may get rounded up or down, whichever is nearest. -- Keir
Re: Splay Tree
I looked at the splay tree code in revision 106584. It doesn't appear to actually be doing a top down splay. It is performing a top down partition of the tree but without the splay step. This should cause some cases to perform quite badly. I'm pretty sure my original patch does the top down splay correctly.
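
For readers who want to see what the missing "splay step" refers to, below is a sketch of the classic top-down splay in the style of Sleator's well-known reference routine. It is not the code from revision 106584 nor the patch mentioned above; the node layout and names are assumptions. The two rotations are the splay step; deleting them leaves only the top-down partition into left and right trees.

```c
#include <stddef.h>

typedef struct node {
    int key;
    struct node *left, *right;
} node;

/* Top-down splay: moves the node with the given key (or the last node
   on the search path) to the root and returns the new root. */
node *splay(int key, node *t)
{
    node N, *l, *r, *y;

    if (t == NULL)
        return NULL;
    N.left = N.right = NULL;
    l = r = &N;             /* l: rightmost of left tree, r: leftmost of right tree */

    for (;;) {
        if (key < t->key) {
            if (t->left == NULL)
                break;
            if (key < t->left->key) {       /* splay step: rotate right */
                y = t->left;
                t->left = y->right;
                y->right = t;
                t = y;
                if (t->left == NULL)
                    break;
            }
            r->left = t;                    /* link t into the right tree */
            r = t;
            t = t->left;
        } else if (key > t->key) {
            if (t->right == NULL)
                break;
            if (key > t->right->key) {      /* splay step: rotate left */
                y = t->right;
                t->right = y->left;
                y->left = t;
                t = y;
                if (t->right == NULL)
                    break;
            }
            l->right = t;                   /* link t into the left tree */
            l = t;
            t = t->right;
        } else {
            break;
        }
    }

    /* Reassemble: hang the left and right trees under the new root. */
    l->right = t->left;
    r->left = t->right;
    t->left = N.right;
    t->right = N.left;
    return t;
}
```

On a degenerate right-leaning chain, the rotations roughly halve the depth of the nodes on the access path, which is where the amortized bound comes from; a pure partition leaves such a chain as deep as before.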
Re: [RFC] Program Bounds Checking
You write that you need 6 assembly instructions to check a pointer on x86. I have been using the "bound" ia32 instruction (1 byte opcode 0x62, invalid in 64-bit mode on x86-64) to check the stack pointer for a few years now in Gujin (http://gujin.org) without problem.

I am doing this kind of thing to guard against stack overflow (I do not have a very big stack in my bootloader):

    struct {
        signed low_limit;
        signed high_limit;
    } __attribute__ ((packed)) stack_limit;

    extern inline void bound_stack (void)
    {
        /*
         * limit included - but add 2 to high limit for reg16, and 4 for reg32
         * if not in bound, exception #BR generated (INT5).
         * iret from INT5 will retry the bound instruction.
         */
        asm volatile (" bound %%esp,%0 " : : "m" (stack_limit) );
    }

    void fct (int arg)
    {
        bound_stack ();
        {
            int cpt = 0;
            // do some stuff...
        }
    }

I bet there is a huge penalty if the value is not inside the limit... I have a simple "dump registers" handler for this interrupt/exception.

In my case, I would have liked to have a function attribute like:

    void fct (int arg) __attribute__((bound_stack(stack_limit)))
    {
        // do some stuff...
    }

because some assembly instructions get moved across the bound asm (like the initialisation of local variables), and if the stack has already overflowed some data is destroyed before the INT 05 is taken... Also, if the function is inlined the stack checking does not mean a lot and should be discarded.

A problem with this bound test is that the limits are signed, but for some tests it does not matter. I do not know how long it takes if the value is within the limits.

Just a comment,
Etienne.
Re: [RFC] Program Bounds Checking
We have considered the bound instruction in the CASH project. But we found that the bound instruction is slower than the six normal instructions it is meant to replace for range checking. For example, the bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles while the 6 equivalent instructions require 6-7 clock cycles. We have not tested it on newer processors, though.

Tzi-cker

On Thu, 28 Sep 2006, Etienne Lorrain wrote:

> You write you needs 6 assembly instructions to check a pointer on x86,
> I am using the "bound" ia32 instruction (1 byte opcode 0x62, invalid in ia64)
> to check the stack pointer for few years now in Gujin (http://gujin.org)
> without problem.
>
> I am doing this kind of thing to guard against stack overflow (I do not have
> a too big stack in my bootloader):
>
> struct {
>     signed low_limit;
>     signed high_limit;
> } __attribute__ ((packed)) stack_limit;
>
> extern inline void bound_stack (void)
> {
> /*
>  * limit included - but add 2 to high limit for reg16, and 4 for reg32
>  * if not in bound, exception #BR generated (INT5).
>  * iret from INT5 will retry the bound instruction.
>  */
>     asm volatile (" bound %%esp,%0 " : : "m" (stack_limit) );
> }
>
> void fct (int arg)
> { bound_stack () {
>     int cpt = 0;
>     // do some stuff...
> }}
>
> I bet there is a huge penalty if the value is not inside the limit...
> I have a simple "dump registers" handler for this interrupt/exception.
>
> In my case, I would have liked to have a function attribute like:
> void fct (int arg) __attribute__((bound_stack(stack_limit)))
> {
>     // do some stuff...
> }
> because some assembly instructions cross the bound asm (like initialisation
> of local variable) and if the stack has already overflowed some data is
> destroyed before the INT 05 is taken... Also if the function is inlined the
> stack checking does not mean a lot and should be discarded.
>
> A problem of this bound test is that the limits are signed, but for some
> tests it does not matter. I do not know how long it takes if the value is
> in the limit.
>
> Just a comment,
> Etienne.
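
For comparison with the single bound instruction, an explicit range check written in portable C might look like the sketch below. This is not the six-instruction sequence used in CASH, which is not shown in this thread; the struct and function names are made up, and the comparison is signed and inclusive on both ends, as described for bound in the quoted message.

```c
#include <stdint.h>

/* Hypothetical pair of limits, both inclusive. */
struct bounds {
    intptr_t low_limit;
    intptr_t high_limit;
};

/* Returns 1 when value lies within [low_limit, high_limit], else 0.
   A compiler typically turns this into two compares and two
   conditional branches (or an equivalent branch-free sequence). */
int in_bounds(intptr_t value, const struct bounds *b)
{
    return value >= b->low_limit && value <= b->high_limit;
}
```

A caller would test the result and jump to its own error handler instead of relying on the #BR exception.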
Re: Google group for generic System V Application Binary Interface
On Thu, Sep 28, 2006 at 07:11:25AM -0700, H. J. Lu wrote: > On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote: > > On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote: > > > I created a Google group to discuss generic ABI: > > > > > > http://groups.google.com/group/generic-abi > > > > > > It is by membership only. Let me know if you are interested. > > > > What's this supposed to be? Reinventing the doomed iBCS2? > > Not at all. It is for generic ABI which is processor independent. > However, the current i386 psABI doesn't really reflect/cover what > have been added to i386 like MMX and SSE. Also gcc uses 16byte > stack alignment, instead of 4byte, for SSE. Should we create a > Google group for ia32 psABI? Is this supposed to be for gcc/binutils, or is it supposed to be processor-independent? And why a closed list? Please don't go down the path of re-creating what we rebelled against when we started egcs. Also, if there's a need to crosspost a message between your new list and a gcc or binutils list, the message will bounce. If it is for the free toolchain, and you really feel that a new list is needed because it crosses gcc/binutils boundaries, a new list hosted off of gcc.gnu.org would be better than a list that is polluted with Yahoo's ads.
Re: [RFC] Program Bounds Checking
On Thu, 28 Sep 2006, Tzi-cker Chiueh wrote: We have considered the bound instruction in the CASH project. But we found that bound instruction is slower than the six normal instructions it is meant to replace for range checking. For example, the bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles while the 6 equivalent instructions require 6-7 clock cycles. We have not tested it on newer processors, though. I would guess it would be as slow or worse. 'bound' is an extremely rarely used instruction, and so will not be optimised-for at all. Nick
Re: [RFC] Program Bounds Checking
On Thu, Sep 28, 2006 at 12:52:18PM -0400, Tzi-cker Chiueh wrote: > We have considered the bound instruction in the CASH project. But > we found that bound instruction is slower than the six normal > instructions it is meant to replace for range checking. For example, the > bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles > while the 6 equivalent instructions require 6-7 clock cycles. We have not > tested it on newer processors, though. There's also the cache effect, in that larger code will cause more cache misses. If the penalty for using the shorter code sequence is only one cycle out of 7, you might still want to support it, or at least if -Os is specified; large codes might even run faster despite the penalty if there are fewer cache misses.
Re: Google group for generic System V Application Binary Interface
On Thu, Sep 28, 2006 at 09:54:10AM -0700, Joe Buck wrote: > On Thu, Sep 28, 2006 at 07:11:25AM -0700, H. J. Lu wrote: > > On Thu, Sep 28, 2006 at 02:53:30PM +0100, Christoph Hellwig wrote: > > > On Wed, Sep 27, 2006 at 03:32:45PM -0700, H. J. Lu wrote: > > > > I created a Google group to discuss generic ABI: > > > > > > > > http://groups.google.com/group/generic-abi > > > > > > > > It is by membership only. Let me know if you are interested. > > > > > > What's this supposed to be? Reinventing the doomed iBCS2? > > > > Not at all. It is for generic ABI which is processor independent. > > However, the current i386 psABI doesn't really reflect/cover what > > have been added to i386 like MMX and SSE. Also gcc uses 16byte > > stack alignment, instead of 4byte, for SSE. Should we create a > > Google group for ia32 psABI? > > Is this supposed to be for gcc/binutils, or is it supposed to be > processor-independent? And why a closed list? Please don't go > down the path of re-creating what we rebelled against when we started > egcs. Also, if there's a need to crosspost a message betwen your > new list and a gcc or binutils list, the message will bounce. > The ia32 psABI list will be processor-independent, not just for gcc/binutils. I thought people might be more willing to discuss things among people who are interested. H.J.
Re: Google group for generic System V Application Binary Interface
On Thu, 28 Sep 2006, Joe Buck wrote: > Is this supposed to be for gcc/binutils, or is it supposed to be > processor-independent? And why a closed list? Please don't go > down the path of re-creating what we rebelled against when we started > egcs. Also, if there's a need to crosspost a message between your > new list and a gcc or binutils list, the message will bounce. > > If it is for the free toolchain, and you really feel that a new > list is needed because it crosses gcc/binutils boundaries, a new > list hosted off of gcc.gnu.org would be better than a list that > is polluted with Yahoo's ads. I agree that it should be an open list (or maybe separate ones for the psABIs alongside that for the gABI). I would have suggested having it on lists.freestandards.org alongside DWARF. -- Joseph S. Myers [EMAIL PROTECTED]
Re: representation of struct field offsets
On Sep 27, 2006, at 7:04 PM, Sandra Loosemore wrote: I've been having a heck of a time figuring out how to translate the offsets for struct fields from the DWARF encoding back to GCC's internal encoding for the LTO project. I've got a handle on the DWARF encoding and how to do the necessary big/little endian conversions, but for the GCC side, there doesn't seem to be any documentation about the relevant macros in the manual, and the comments in tree.h don't seem to reflect what is actually going on in the representation. For example, DECL_FIELD_OFFSET is supposed to be "the field position, counting in bytes, of the byte containing the bit closest to the beginning of the structure", while DECL_FIELD_BIT_OFFSET is supposed to be "the offset, in bits, of the first bit of the field from DECL_FIELD_OFFSET". So I'm quite puzzled why, for fields that are not bit fields and that are aligned on byte boundaries, the C front end is generating a DECL_FIELD_OFFSET that points to some byte that doesn't contain any part of the field, and a non-zero DECL_FIELD_BIT_OFFSET instead. If I make the LTO front end do what the comments in tree.h describe, then dwarf2out.c produces incorrect offsets that don't match those from the original C file. I see in stor-layout.c that there are routines to "perform computations that convert between the offset/bitpos forms and byte and bit offsets", but what exactly are these forms and which values are the ones that I should actually be storing inside the FIELD_DECL object? Is it possible to compute the DECL_OFFSET_ALIGN value somehow, given that it's not encoded in the DWARF representation? Trying to reverse-engineer dwarf2out.c isn't turning out to be very productive :-P I had to look at this recently and I wound up looking at it this way. The total bit offset is represented as (8 * byte offset) + (bit offset); there are multiple ways to split a given total that produce the same result, and gcc's choice of which one it uses is, as you say, somewhat arbitrary. 
However all the different ways seem to work equivalently for codegen purposes. I'm not familiar with the dwarf representation, but the same information must be there, so perhaps the dwarf code could impose a canonical form if necessary.
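To make the "multiple ways" point concrete, here is a small C sketch. The struct and function names are invented for illustration and are not GCC's; the two members only loosely stand in for DECL_FIELD_OFFSET (bytes) and DECL_FIELD_BIT_OFFSET (bits), and 8-bit bytes are assumed.

```c
#include <stdint.h>

/* Hypothetical pair, loosely modelled on DECL_FIELD_OFFSET (in bytes)
   and DECL_FIELD_BIT_OFFSET (in bits). */
typedef struct {
    uint64_t byte_offset;
    uint64_t bit_offset;
} field_pos;

/* Total position of the field in bits from the start of the structure,
   assuming 8-bit bytes. */
static uint64_t total_bits(field_pos p)
{
    return p.byte_offset * 8 + p.bit_offset;
}
```

Under these assumptions the pairs (4 bytes, 0 bits) and (0 bytes, 32 bits) both describe a field that starts 32 bits into the structure, which is why the different encodings behave the same for code generation.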
RE: Google group for generic System V Application Binary Interface
HJ, I think that it's great that all the de facto changes adopted for i386 would be put in an extension or appendix to its psABI. However, I lean towards an open discussion list. If necessary, I'd be glad to investigate hosting this list at http://www.x86-64.org, even though this discussion is about the i386 psABI. ___ Evandro Menezes GNU Tools Team 512-602-9940 Advanced Micro Devices [EMAIL PROTECTED] Austin, TX
Re: [RFC] Program Bounds Checking
Tzi-cker Chiueh wrote: We have considered the bound instruction in the CASH project. But we found that bound instruction is slower than the six normal instructions it is meant to replace for range checking. For example, the bound instruction on a 1.1 GHz PIII machine requires 7-8 clock cycles while the 6 equivalent instructions require 6-7 clock cycles. We have not tested it on newer processors, though. Might still be appropriate to use it in -Os mode I would think ...
Re: Google group for generic System V Application Binary Interface
On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote: > HJ, > > I think that it's great that all the de facto changes adopted for i386 would > be put in an extension or appendix to its psABI. > > However, I lean towards an open discussion list. If necessary, I'd be glad > to investigate hosting this list at http://www.x86-64.org, even though this > discussion is about the i386 psABI. > One thing I miss the most is the search capability in the mailing list archives. It isn't easy to find the things I am looking for in an archive. As for open group, Google group can be made open also. H.J.
RE: Google group for generic System V Application Binary Interface
On 28 September 2006 20:01, H. J. Lu wrote: > On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote: >> HJ, >> >> I think that it's great that all the de facto changes adopted for i386 >> would be put in an extension or appendix to its psABI. >> >> However, I lean towards an open discussion list. If necessary, I'd be >> glad to investigate hosting this list at http://www.x86-64.org, even >> though this discussion is about the i386 psABI. >> > > One thing I miss the most is the search capability in the mailing > list archives. It isn't easy to find the things I am looking for > in an archive. Really? I use google to do it, even for non-google-groups list archives: site:sourceware.org inurl:binutils "search terms go here ... " You know the kind of thing. Haven't had to suffer through an htDig session in a looong time now. cheers, DaveK -- Can't think of a witty .sigline today
Re: Google group for generic System V Application Binary Interface
On Thu, Sep 28, 2006 at 12:01:01PM -0700, H. J. Lu wrote: > On Thu, Sep 28, 2006 at 01:34:31PM -0500, Menezes, Evandro wrote: > > HJ, > > > > I think that it's great that all the de facto changes adopted for i386 > > would be put in an extension or appendix to its psABI. > > > > However, I lean towards an open discussion list. If necessary, I'd be glad > > to investigate hosting this list at http://www.x86-64.org, even though this > > discussion is about the i386 psABI. > > > > One thing I miss the most is the search capability in the mailing > list archives. It isn't easy to find the things I am looking for > in an archive. For gcc lists, use google with search string site:gcc.gnu.org > As for open group, Google group can be made open also. Why do you want to use a facility that attaches an ad to every message?
Re: frame unwind issue with discontiguous code
On Sep 28, 2006, at 4:26 AM, Jan Beulich wrote: While I'm not certain whether gcc is able to split one function's code between different sections Kinda, sorta... Hot-cold partitioning (-freorder-blocks-and-partition) does this. If one exposed a FE language construct to so tag code, then the hot-cold partitioner could make use of that information. Naturally, when complete, it would have to manage the debug information and the unwind information. As I recall, it might not be that complete yet. I think exposing a language feature so that you can tell the compiler to so split code would be better than trying to do it behind the compiler's back. If you profiled and tweaked and tuned, you might be able to get the compiler to partition the code for you, though having that sticky in the source base, so that everyone can just compile it, is, well, a dream.
Re: representation of struct field offsets
Sandra Loosemore wrote: I've been having a heck of a time figuring out how to translate the offsets for struct fields from the DWARF encoding back to GCC's internal encoding for the LTO project. Yes, that's a nasty bit. I think the DECL_FIELD_OFFSET/DECL_FIELD_BIT_OFFSET stuff is, quite simply, mis-designed. The way I think it should work is for DECL_FIELD_OFFSET to be the byte offset, and DECL_FIELD_BIT_OFFSET to be the bit offset, always less than BITS_PER_UNIT. But, that's not how it actually works. Instead, the BIT_OFFSET is kept below the alignment of the field, rather than BITS_PER_UNIT. The bit of dwarf2out.c that emits the offset for the field is add_data_member_location_attribute. It uses dwarf2out.c:field_byte_offset, which is the function that normalizes the weird GCC representation into the obvious one. I don't know why it's using a custom function; I would think it should just use tree.c:byte_position. The current DWARF code looks oddly heuristic. But that doesn't explain why you're not getting idempotent results. Are you going through the stor-layout.c:place_field routines when creating structure types? If so, I wouldn't; here, you know where stuff is supposed to go, so I would just put it there, and set DECL_FIELD_OFFSET, etc., accordingly. My bet is that you are not setting DECL_ALIGN, or that we have failed to set TYPE_ALIGN somewhere, and that, therefore, the heuristics in dwarf2out.c:field_byte_offset are getting confused. For example, simple_type_align_in_bits might not be working. I would probably step through field_byte_offset both when compiling C and in LTO mode, and try to see where it goes different. It shouldn't be necessary as part of this work, but I can't see why we shouldn't just replace field_byte_offset with a use of byte_position. Does anyone else know? -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
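The representation described above, where the bit offset is kept below the field's alignment rather than below BITS_PER_UNIT, can be sketched in plain C. This is an illustration under stated assumptions, not GCC's code: `split_pos`, `split_by_alignment` and `byte_position_of` are hypothetical names, bytes are assumed to be 8 bits, and the alignment is assumed to be a non-zero multiple of 8 bits.

```c
#include <stdint.h>

#define BITS_PER_UNIT 8  /* assumption: 8-bit bytes */

/* Hypothetical (byte offset, bit offset) pair. */
typedef struct {
    uint64_t byte_offset;
    uint64_t bit_offset;
} split_pos;

/* Split a total bit position so that bit_offset stays below align_bits.
   align_bits is assumed to be a non-zero multiple of BITS_PER_UNIT. */
static split_pos split_by_alignment(uint64_t bitpos, uint64_t align_bits)
{
    split_pos p;
    p.byte_offset = (bitpos / align_bits) * (align_bits / BITS_PER_UNIT);
    p.bit_offset  = bitpos % align_bits;
    return p;
}

/* Normalize back to the "obvious" byte position of the field. */
static uint64_t byte_position_of(split_pos p)
{
    return p.byte_offset + p.bit_offset / BITS_PER_UNIT;
}
```

With a 32-bit alignment, a byte-aligned field at bit position 40 comes out as byte offset 4 and bit offset 8, so the byte offset points at a byte that holds no part of the field even though the normalized byte position is still 5. That matches the symptom Sandra describes for non-bit-fields.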
Re: representation of struct field offsets
On Sep 28, 2006, at 1:43 PM, Mark Mitchell wrote: Sandra Loosemore wrote: I've been having a heck of a time figuring out how to translate the offsets for struct fields from the DWARF encoding back to GCC's internal encoding for the LTO project. Yes, that's a nasty bit. I think the DECL_FIELD_OFFSET/DECL_FIELD_BIT_OFFSET stuff is, quite simply, mis-designed. The way I think it should work is for DECL_FIELD_OFFSET to be the byte offset, and DECL_FIELD_BIT_OFFSET to be the bit offset, always less than BITS_PER_UNIT. An alternative design, which would save a field, is just to keep the offset of a field, in bits, from the start of the structure. The only trouble you'll probably run into is with fields whose offset from the start of a structure is variable. -Chris
Re: frame unwind issue with discontiguous code
On Thu, 2006-09-28 at 13:26 +0200, Jan Beulich wrote: > While I'm not certain whether gcc is able to split one function's code > between different sections Yes. See the -freorder-blocks-and-partition option, which can move code to hot/cold sections. > However, when frame unwind information is generated, one quickly > becomes aware of a problem with this Yes, this has been a known problem for a long time. Unfortunately, I don't know if anyone has ever tried to solve it. Here is gcc's current solution: > aretha$ ./xgcc -B./ -g -freorder-blocks-and-partition -S tmp.c -funwind-tables > cc1: note: -freorder-blocks-and-partition does not support unwind info -- Jim Wilson, GNU Tools Support, http://www.specifix.com
Re: representation of struct field offsets
Chris Lattner wrote: An alternative design, which would save a field, is just to keep the offset of a field, in bits, from the start of the structure. Yes, that would also work. But, in many cases, you need the byte offset, so there's a time/space tradeoff. Also, because of GCC's internal representation of integers, you have to be careful that you have enough bits; for example, you need 72 bits to represent things in a 64-bit address space. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: representation of struct field offsets
On Sep 28, 2006, at 1:58 PM, Mark Mitchell wrote: Chris Lattner wrote: An alternative design, which would save a field, is just to keep the offset of a field, in bits, from the start of the structure. Yes, that would also work. But, in many cases, you need the byte offset, so there's a time/space tradeoff. Yup, that's true. Also, because of GCC's internal representation of integers, you have to be careful that you have enough bits; for example, you need 72 bits to represent things in a 64-bit address space. Actually, just 67, right? Does GCC support structures whose size is greater than 2^61 ? -Chris
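The arithmetic behind "just 67" is that a bit offset needs three more bits than a byte offset (8 = 2^3), so addressing every bit of a 2^64-byte space takes 64 + 3 = 67 bits. Read the other way, a plain 64-bit bit offset can only reach byte offsets below 2^61, which is where the 2^61 in the question comes from. A minimal C check, using a helper name made up for this note:

```c
#include <stdint.h>

/* Largest byte offset whose position in bits still fits in an unsigned
   64-bit integer, assuming 8-bit bytes. */
static uint64_t max_byte_offset_for_64bit_bitpos(void)
{
    return UINT64_MAX / 8;
}
```

The result is 2^61 - 1, so a structure larger than 2^61 bytes would contain fields whose bit position does not fit in 64 bits.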
Re: representation of struct field offsets
Mark Mitchell wrote: Are you going through the stor-layout.c:place_field routines when creating structure types? If so, I wouldn't; here, you know where stuff is supposed to go, so I would just put it there, and set DECL_FIELD_OFFSET, etc., accordingly. No, I'm not using the fancy stor-layout.c stuff. As you say, I already have the bit offsets. Also, the DWARF representation doesn't necessarily include all the bits of information that the layout algorithm uses, so it can't reliably reproduce the same layout just given the types and sizes of (possibly only some of) the fields. Anyway, I've made a little more progress; Google pointed me at some past discussion of this issue and from there a mention that the Ada front end allows explicit placement of fields in a record, and I was able to swipe that bit of code. I've got non-bitfields to work properly, but I'm still working on debugging the bitfield case. -Sandra
Re: representation of struct field offsets
Chris Lattner wrote: Also, because of GCC's internal representation of integers, you have to be careful that you have enough bits; for example, you need 72 bits to represent things in a 64-bit address space. Actually, just 67, right? Does GCC support structures whose size is greater than 2^61 ? I'm not sure -- but if it doesn't, it should. There are folks who like to make structures corresponding to the entire address space, and then poke at particular bytes by using fields. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
gcc-4.0-20060928 is now available
Snapshot gcc-4.0-20060928 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.0-20060928/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.0 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_0-branch revision 117290

You'll find:

gcc-4.0-20060928.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.0-20060928.tar.bz2       C front end and core compiler
gcc-ada-4.0-20060928.tar.bz2        Ada front end and runtime
gcc-fortran-4.0-20060928.tar.bz2    Fortran front end and runtime
gcc-g++-4.0-20060928.tar.bz2        C++ front end and runtime
gcc-java-4.0-20060928.tar.bz2       Java front end and runtime
gcc-objc-4.0-20060928.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.0-20060928.tar.bz2  The GCC testsuite

Diffs from 4.0-20060921 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.0 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: IPA branch
Razya Ladelsky wrote: Except for new optimizations, IPCP (currently on mainline) should also be transformed to SSA. IPCP in SSA code exists on IPA branch, and will be submitted to GCC4.3 after IPA branch is committed and some testsuite regressions failing with IPCP+versioning+inlining are fixed. Is there a project page for this work? Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
GCC 4.3 project to merge representation changes
Kazu, Sandra -- I don't believe there is a GCC 4.3 project page to merge the work that you folks did on CALL_EXPRs and TYPE_ARG_TYPEs. Would one of you please create a Wiki page for that? Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC 4.3 project to merge representation changes
Mark Mitchell wrote: I don't believe there is a GCC 4.3 project page to merge the work that you folks did on CALL_EXPRs and TYPE_ARG_TYPEs. Would one of you please create a Wiki page for that? There are already a bunch of notes about this on the LTO page: http://gcc.gnu.org/wiki/LinkTimeOptimization -Sandra
Re: GCC 4.3 project to merge representation changes
Sandra Loosemore wrote: Mark Mitchell wrote: I don't believe there is a GCC 4.3 project page to merge the work that you folks did on CALL_EXPRs and TYPE_ARG_TYPEs. Would one of you please create a Wiki page for that? There are already a bunch of notes about this on the LTO page: http://gcc.gnu.org/wiki/LinkTimeOptimization Yes -- but I was hoping to get a page here: http://gcc.gnu.org/wiki/GCC_4.3_Release_Planning (at the bottom, under Uncategorized Projects) for the project of merging the changes into GCC 4.3. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: [discuss] Re: Visibility=hidden for x86_64 Xen builds -- problems?
On Thu, Sep 28, 2006 at 03:33:05PM +0100, Keir Fraser wrote: > > > Compile and link the attached C program as follows. I used gcc-4.1.1 and > > binutils-2.17, but gcc >= 4.0.0 and binutils >= 2.16 probably suffice. > > > > # gcc -fpic -o test.o -c test.c > > # ld -Ttext 1 -o test test.o > > > > Disassembly of the result trivially shows that the address of weak symbol > > 'x' > > is 0x1. > > By the way, experimentation with the address of the text section shows that > the weak symbol's address is resolved to the nearest 4GB-aligned address > (nearest to what I'm not sure -- RIP? Section start?). It may get rounded up > or down, whichever is nearest. You are asking for the impossible:

[EMAIL PROTECTED] weak-4]$ objdump -dr foo.o

foo.o: file format elf64-x86-64

Disassembly of section .text:

<_start>:
   0: 55                      push   %rbp
   1: 48 89 e5                mov    %rsp,%rbp
   4: 48 8d 05 00 00 00 00    lea    0(%rip),%rax    # b <_start+0xb>
            7: R_X86_64_PC32  x+0xfffc
   b: c9                      leaveq
   c: c3                      retq

R_X86_64_PC32 only supports signed 32bit offset. 0x1 is more than 32bit. The linker should issue an error, at least a warning. You can take your pick and I will fix the linker. If no one objects, I will make it an error. H.J.
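The overflow condition H.J. proposes to diagnose can be sketched as below. A PC-relative relocation of this kind stores symbol + addend - place in a signed 32-bit field, so the link is only valid when that value fits. The function name `pc32_fits` and the addresses in the usage note are made up for illustration; they are not taken from the linker sources or from the (truncated) addresses in this thread.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the PC-relative value sym + addend - place fits in the
   signed 32-bit field that R_X86_64_PC32 patches.
   sym:    resolved symbol address (0 for an undefined weak symbol)
   addend: relocation addend
   place:  address of the field being patched */
static bool pc32_fits(uint64_t sym, int64_t addend, uint64_t place)
{
    /* Unsigned arithmetic wraps; reinterpret the result as signed. */
    int64_t value = (int64_t)(sym + (uint64_t)addend - place);
    return value >= INT32_MIN && value <= INT32_MAX;
}
```

For example, an undefined weak symbol (address 0) referenced from a hypothetical place just above 4GB, such as 0x100000007 with addend -4, yields a displacement outside the signed 32-bit range, so `pc32_fits` returns false; this is the situation the linker would report.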
Re: Missing elements in VECTOR_CST
Hans-Peter Nilsson wrote: On Mon, 18 Sep 2006, Mark Mitchell wrote: Andrew Pinski wrote: The documentation on VECTOR_CST is not clear if we can have missing elements in that the remaining elements are zero. Right now we produce such VECTOR_CST for things like: #define vector __attribute__((vector_size(16) )) vector int a = {1, 2}; But is that valid? We currently produce a VECTOR_CST with just two elements instead of 4. Should we always have the same number of elements in a VECTOR_CST as there are elements in the vector type? I think it is reasonable for front-ends to elide initializers and to follow the usual C semantics that elided initializers are (a) zero, if the constant is appearing as an initializer for static storage, or (b) unspecified, "random" values elsewhere. Maybe you didn't mean what I read, but it's not just "for static storage". By my reading (of the May 6, 2005 ISO/IEC 9899:TC2 for reference), all items in arrays and named structure members not mentioned in the initializer should be 0-initialized (the "all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration" part in 6.7.8:19). No, I meant what I said, and you read it correctly. I think that front ends should be allowed to omit zeros for initializers for variables with static storage duration, but not other initializers, independent of what C99 says. The reason is that this matches traditional linker semantics; uninitialized variables with static storage duration are zeroed. Also, if we did allow front ends to implicitly zero local variables, we would have to provide a way to allow them to override that and just say "uninitialized", to avoid pessimizing the code. This consideration doesn't apply to variables with static storage duration. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: IPA branch
Jan Hubicka wrote: I intended to write the overview in a way to express that some work will be needed. Thank you for the detailed explanation. I think your plans all sound reasonable. I would definitely encourage you to start preparing patches and submitting them for review -- and hounding reviewers! -- as soon as possible. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: Missing elements in VECTOR_CST
On Thu, 28 Sep 2006, Mark Mitchell wrote: > Hans-Peter Nilsson wrote: > > On Mon, 18 Sep 2006, Mark Mitchell wrote: > > > >> Andrew Pinski wrote: > >>> The documentation on VECTOR_CST is not clear if we can have missing > >>> elements in that the remaining elements are zero. Right now we produce such > >>> VECTOR_CST for things like: > >>> #define vector __attribute__((vector_size(16) )) > >>> vector int a = {1, 2}; > >>> > >>> But is that valid? We currently produce a VECTOR_CST with just two > >>> elements instead of 4. Should we always have the same number of > >>> elements in a VECTOR_CST as there are elements in the vector type? > >> I think it is reasonable for front-ends to elide initializers and to > >> follow the usual C semantics that elided initializers are (a) zero, if > >> the constant is appearing as an initializer for static storage, or (b) > >> unspecified, "random" values elsewhere. > > > > Maybe you didn't mean what I read, but it's not just "for static > > storage". By my reading (of the May 6, 2005 ISO/IEC 9899:TC2 > > for reference), all items in arrays and named structure members > > not mentioned in the initializer should be 0-initialized (the > > "all subobjects that are not initialized explicitly shall be > > initialized implicitly the same as objects that have static > > storage duration" part in 6.7.8:19). > > No, I meant what I said, and you read it correctly. > > I think that front ends should be allowed to omit zeros for > initializers for variables with static storage duration, but not other > initializers, independent of what C99 says. I think we "read past each other". I was just countering what (I read as) your statement that for C semantics, only omitted subobjects in initializers of objects with *static storage* are zero-initialized; i.e. that others are ok to be left undefined by an implementation. (Not to be mixed up with the interface between GCC front-ends, back-end and linker.) brgds, H-P
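The C-level rule H-P cites can be illustrated with ordinary arrays, which avoids depending on the vector extension. This is a sketch of the source-language semantics only; it says nothing about how a VECTOR_CST should be represented inside GCC, and the names are made up for the example.

```c
#include <stdbool.h>

/* Static storage duration: elements 2 and 3 are not mentioned. */
static int static_arr[4] = {1, 2};

/* True when the two elements omitted from the initializer read as zero. */
static bool tail_is_zero(const int *a)
{
    return a[2] == 0 && a[3] == 0;
}

/* Automatic storage duration with the same partial initializer. */
static bool automatic_tail_is_zero(void)
{
    int local_arr[4] = {1, 2};
    return tail_is_zero(local_arr);
}
```

Under the C99 wording quoted above, both the static and the automatic array have their unmentioned elements set to zero once a brace-enclosed initializer is present; only an object with no initializer at all is left indeterminate in the automatic case.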
Re: representation of struct field offsets
> The only trouble you'll probably run into is with fields whose offset > from the start of a structure is variable. Exactly. That's the reason it's defined the way it is. There is no way to synthesize that field from any other in the FIELD_DECL in the most general case: it is unique information. I'm traveling now, but should be able to give some examples of this early next week.
Re: Missing elements in VECTOR_CST
Hans-Peter Nilsson wrote: I think that front ends should be allowed to omit zeros for initializers for variables with static storage duration, but not other initializers, independent of what C99 says. I think we "read past each other". I was just countering what (I read as) your statement that for C semantics, only omitted subobjects in initializers of objects with *static storage* are zero-initialized; i.e. that others are ok to be left undefined by an implementation. Right, I think we were at cross-purposes; to be clear, I wasn't commenting at all about C semantics, but rather about what I think the internal GCC IR semantics should be. Thanks, -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
Re: GCC 4.3 project to merge representation changes
Hi Mark, I don't believe there is a GCC 4.3 project page to merge the work that you folks did on CALL_EXPRs and TYPE_ARG_TYPEs. Would one of you please create a Wiki page for that? What would you suggest I do about the missing bits? Specifically, most backends with the exception of x86 are broken because I haven't converted uses of TYPE_ARG_TYPEs in those backends. The ARM port was broken at the time of branch creation. The Java frontend uses a flag within the TREE_LIST object that makes up TYPE_ARG_TYPEs, so it is blocking the proposed merge. (Java maintainers are planning to fix this in future.) So, Sandra's CALL_EXPRs stuff may be ready for merge, but my TYPE_ARG_TYPE stuff has a somewhat long way to go. Thanks, Kazu Hirata
Re: GCC 4.3 project to merge representation changes
Kazu Hirata wrote: Hi Mark, I don't believe there is a GCC 4.3 project page to merge the work that you folks did on CALL_EXPRs and TYPE_ARG_TYPEs. Would one of you please create a Wiki page for that? What would you suggest I do about the missing bits? Specifically, most backends with the exception of x86 are broken because I haven't converted uses of TYPE_ARG_TYPEs in those backends. The ARM port was broken at the time of branch creation. The Java frontend uses a flag within the TREE_LIST object that makes up TYPE_ARG_TYPEs, so it is blocking the proposed merge. (Java maintainers are planning to fix this in future.) So, Sandra's CALL_EXPRs stuff may be ready for merge, but my TYPE_ARG_TYPE stuff has a somewhat long way to go. Yes, I agree that Sandra's stuff is closer. I would hope that with the ECJ conversion (planned for Stage 1), the Java issue goes away. So, that would leave you with fixing the various other back-ends, which should be mechanical, but perhaps somewhat time-consuming. I'd also hope that other folks would volunteer to help with some of that work, since getting it done will make the compiler more efficient, which is better for everyone. -- Mark Mitchell CodeSourcery [EMAIL PROTECTED] (650) 331-3385 x713
paired register loads and stores
rs6000 and Sparc ports seem to use a peephole2 to get the ldd or lfq instructions (respectively), but it looks like there's no reason for the register allocator to allocate registers together. The peephole2 just picks up loads to adjacent memory locations if the allocator happens to choose adjacent registers (is that correct?) or the variables are specified as living in hard registers with the help of an asm. Several other architectures have paired loads: some ARM targets have ldrd which can be cheaper than a ldm, and ia64 has a pair load. It seems like GCC does a good job of knowing how to modify register-sized subregs of two- or four-register larger modes. So if I could tell GCC to turn:

[(set (reg:SI X) (mem:SI (addr)))
 (set (reg:SI Y) (mem:SI (addr+4)))]

(where addr is aligned to DI) into something like:

[(set (reg:DI T) (mem:DI (addr)))
 (set (reg:SI X) (subreg:SI (reg:DI T) 0))
 (set (reg:SI Y) (subreg:SI (reg:DI T) 4))]

and I could do so early enough, GCC would know to access the subregs directly in instruction(s) using the loaded values, and I would end up loading the register pair and using the individual elements. But it has to be done early on; after register allocation even if I could get a DI temporary I'd probably have the two SI moves and that's probably not a win. I've tinkered with splits but can't seem to get it to work. And I'm aware that trying to do it too early might be bad, because pseudos might not be alive in the future, or might be in memory. But you can't do it at peephole2, because the registers won't be paired. Any ideas? Am I going at it from the right angle? -- Why are ``tolerant'' people so intolerant of intolerant people?
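At the source level, the transformation being asked for corresponds roughly to the two C functions below. This is only an analogy to the RTL above, with invented names; it does not show how to express the rewrite in a machine description, and it assumes `addr` points at two adjacent 32-bit words.

```c
#include <stdint.h>
#include <string.h>

/* Two independent word loads, like the original pair of SImode sets. */
static void load_separately(const uint32_t *addr, uint32_t *x, uint32_t *y)
{
    *x = addr[0];
    *y = addr[1];
}

/* One doubleword load into a temporary, then the two halves taken at byte
   offsets 0 and 4, mirroring the (subreg:SI (reg:DI T) 0) and
   (subreg:SI (reg:DI T) 4) accesses. */
static void load_paired(const uint32_t *addr, uint32_t *x, uint32_t *y)
{
    uint64_t t;
    memcpy(&t, addr, sizeof t);
    memcpy(x, (const unsigned char *)&t, sizeof *x);
    memcpy(y, (const unsigned char *)&t + 4, sizeof *y);
}
```

Both functions produce the same values regardless of endianness, because the halves are selected by byte offset within the temporary; whether a compiler turns the second form into a single paired load depends on the target and is not guaranteed.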
Re: [discuss] Re: Visibility=hidden for x86_64 Xen builds -- problems?
On 29/9/06 1:40 am, "H. J. Lu" <[EMAIL PROTECTED]> wrote: > You are asking for the impossible: If the compiler emitted accesses via the GOT for weak symbols then there wouldn't be a problem. The compiler doesn't know the final link address though, so it'd have to be conservative. Perhaps it's not worth it for a case that no one much cares about. > R_X86_64_PC32 only supports signed 32bit offset. 0x1 is more > than 32bit. The linker should issue an error, at least a warning. You > can take your pick and I will fix the linker. If no one objects, I > will make it an error. That's fine with us. I fixed Xen to no longer use weak symbols. -- Keir