Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm wrote: > On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: >> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek wrote: >> > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: >> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: >> >> > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: >> >> > > To be constructive here - the above case is from within a >> >> > > GIMPLE_ASSIGN case label >> >> > > and thus I'd have expected >> >> > > >> >> > > case GIMPLE_ASSIGN: >> >> > > { >> >> > > gassign *a1 = as_a (s1); >> >> > > gassign *a2 = as_a (s2); >> >> > > lhs1 = gimple_assign_lhs (a1); >> >> > > lhs2 = gimple_assign_lhs (a2); >> >> > > if (TREE_CODE (lhs1) != SSA_NAME >> >> > > && TREE_CODE (lhs2) != SSA_NAME) >> >> > > return (operand_equal_p (lhs1, lhs2, 0) >> >> > > && gimple_operand_equal_value_p (gimple_assign_rhs1 >> >> > > (a1), >> >> > > gimple_assign_rhs1 >> >> > > (a2))); >> >> > > else if (TREE_CODE (lhs1) == SSA_NAME >> >> > >&& TREE_CODE (lhs2) == SSA_NAME) >> >> > > return vn_valueize (lhs1) == vn_valueize (lhs2); >> >> > > return false; >> >> > > } >> >> > > >> >> > > instead. That's the kind of changes I have expected and have >> >> > > approved of. >> >> > >> >> > But even that looks like just adding extra work for all developers, >> >> > with no >> >> > gain. You only have to add extra code and extra temporaries, in >> >> > switches >> >> > typically also have to add {} because of the temporaries and thus extra >> >> > indentation level, and it doesn't simplify anything in the code. >> >> >> >> The branch attempts to use the C++ typesystem to capture information >> >> about the kinds of gimple statement we expect, both: >> >> (A) so that the compiler can detect type errors, and >> >> (B) as a comprehension aid to the human reader of the code >> >> >> >> The ideal here is when function params and struct field can be >> >> strengthened from "gimple" to a subclass ptr. This captures the >> >> knowledge that every use of a function or within a struct has a given >> >> gimple code. >> > >> > I just don't like all the as_a/is_a stuff enforced everywhere, >> > it means more typing, more temporaries, more indentation. >> > So, as I view it, instead of the checks being done cheaply (yes, I think >> > the gimple checking as we have right now is very cheap) under the >> > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes >> > put the burden on the developers, who has to check that manually through >> > the as_a/is_a stuff everywhere, more typing and uglier syntax. >> > I just don't see that as a step forward, instead a huge step backwards. >> > But perhaps I'm alone with this. >> > Can you e.g. compare the size of - lines in your patchset combined, and >> > size of + lines in your patchset? As in, if your changes lead to less >> > typing or more. >> >> I see two ways out here. One is to add overloads to all the functions >> taking the special types like >> >> tree >> gimple_assign_rhs1 (gimple *); >> >> or simply add >> >> gassign *operator ()(gimple *g) { return as_a (g); } >> >> into a gimple-compat.h header which you include in places that >> are not converted "nicely". > > Thanks for the suggestions. > > Am I missing something, or is the gimple-compat.h idea above not valid C > ++? 
> > Note that "gimple" is still a typedef to > gimple_statement_base * > (as noted before, the gimple -> gimple * change would break everyone > else's patches, so we talked about that as a followup patch for early > stage3). > > Given that, if I try to create an "operator ()" outside of a class, I > get this error: > > ‘gassign* operator()(gimple)’ must be a nonstatic member function > > which is emitted from cp/decl.c's grok_op_properties: > /* An operator function must either be a non-static member function > or have at least one parameter of a class, a reference to a class, > an enumeration, or a reference to an enumeration. 13.4.0.6 */ > > I tried making it a member function of gimple_statement_base, but that > doesn't work either: we want a conversion > from a gimple_statement_base * to a gassign *, not > from a gimple_statement_base to a gassign *. > > Is there some syntactic trick here that I'm missing? Sorry if I'm being > dumb (I can imagine there's a way of doing it by making "gimple" become > some kind of wrapped ptr class, but that way lies madness, surely). Hmm. struct assign; struct base { operator assign *() const { return (assign *)this; } }; struct assign : base { }; void foo (assign *); void bar (base *b) { foo (b); } doesn't work, but void bar (base &b) { foo (b); } does. Indeed C++ doesn't seem to provide what is necessary for the compat trick :( So the gimple-compat.
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On 13 November 2014 10:45, Richard Biener wrote: > > Hmm. > > struct assign; > struct base { > operator assign *() const { return (assign *)this; } > }; > struct assign : base { > }; > > void foo (assign *); > void bar (base *b) > { > foo (b); > } > > doesn't work, but > > void bar (base &b) > { > foo (b); > } > > does. Indeed C++ doesn't seem to provide what is necessary > for the compat trick :( Right, base* is a built-in type, you can't call a member function on it. There is no implicit conversion between unrelated pointer types, and no implicit conversion from base* to base& that would be necessary to call the conversion operator of base.
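To make the two options concrete, here is a self-contained sketch using stand-in types (gimple_like, gassign_like and the toy get_rhs1 are invented names, not the real GCC classes). It only illustrates why a conversion operator on the base class cannot help when the argument is a pointer, while a plain overload taking the base pointer does the job:

#include <cassert>

struct gimple_like { int code; };            // stand-in for gimple_statement_base
struct gassign_like : gimple_like { };       // stand-in for gassign

// Toy version of the checked downcast (as_a <gassign *> on the branch).
static gassign_like *as_assign (gimple_like *g)
{
  assert (g->code == 1);                     // pretend 1 == GIMPLE_ASSIGN
  return static_cast<gassign_like *> (g);
}

// The "nice" accessor takes the subclass pointer only...
static int get_rhs1 (gassign_like *) { return 42; }

// ...and a compat overload on the base pointer keeps old call sites
// compiling by doing the conversion internally.
static int get_rhs1 (gimple_like *g) { return get_rhs1 (as_assign (g)); }

int main ()
{
  gassign_like a;
  a.code = 1;
  gimple_like *g = &a;
  // A member "operator gassign_like *()" on gimple_like could never fire
  // here, because g is a pointer, not a class object; overload resolution
  // on the function name itself works fine.
  return get_rhs1 (g) == 42 ? 0 : 1;
}

The trade-off, as discussed further down the thread, is that the base-pointer overload weakens the static checking unless it keeps some runtime assert.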
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
Sorry for the (very) delayed response. I'm still looking for feedback here so I can fix the docs.

To refresh: The topic of conversation was the (extremely) wrong explanation that has been in the docs since forever about how to use memory constraints with inline asm to avoid the performance hit of a full memory clobber. Trying to understand how this really works has led to some surprising results.

Me:
>> While I really like the idea of using memory constraints to avoid all out
>> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86
>> only supports a max of 8. Unless there's some way to use larger sizes (say
>> SSIZE_MAX), this feature hardly seems worth documenting.

Richard:
> I wonder how you figured out that a 12 byte clobber performs a full
> memory clobber?

Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2, using -m64 -O2 -fdump-final-insns):

#include <stdio.h>

#define MYSIZE 3

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
      *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
      : "+D" (Dest), "+c" (Count), "=m" (*p)
      : [Data] "a" (Data)
      //: "memory"
   );
}

int main()
{
   unsigned char buff[100];
   buff[5] = 'A';

   __stosb(buff, 'B', sizeof(buff));
   printf("%c\n", buff[5]);
}

In summary:

1) Create a 100 byte buffer, and set buff[5] to 'A'.
2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'.
3) Use a memory constraint in __stosb to flush buff. The size of the memory constraint is controlled by a #define.

With this, I have a simple way to test various sizes of memory constraints to see if the buffer gets flushed. If it *is* flushing the buffer, printing buff[5] after __stosb will print 'B'. If it is *not* flushing, it will print 'A'.

Results:

- Since buff[5] is the 6th byte in the buffer, using memory constraint sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing that no flush was done.
- Sizes of 8 and 16 print 'B', showing that the flush was done. This is also the expected result, since I am now flushing enough of buff to include buff[5].
- The surprise comes from using a size of 3 or 5. These also print 'B'. WTF? If 4 doesn't flush, why does 3?

I believe the answer comes from reading the RTL. The difference between sizes of 3 and 16 comes here:

(set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
        (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16 A128])
     (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [

(set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
        (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3 A128])
     (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [

While I don't actually speak RTL, TI clearly refers to TImode. Apparently when using a size that exactly matches a machine mode, asm memory references (on i386) can flush the exact number of bytes. But for other sizes, gcc seems to fall back to BLK mode, which doesn't.

I don't know the exact meaning of BLK on a "set" or "asm_operands." Does it cause a full clobber? Or just a complete clobber of buff?
Attempting to answer that question leads us to the second bit of code:

#include <stdio.h>

#define MYSIZE 8

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
      *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
      : "+D" (Dest), "+c" (Count), "=m" (*p)
      : [Data] "a" (Data)
      //: "memory"
   );
}

int main()
{
   unsigned char buff[100], buff2[100];
   buff[5] = 'A';
   buff2[5] = 'M';
   asm("#" : : "r" (buff2));

   __stosb(buff, 'B', sizeof(buff));
   printf("%c %c\n", buff[5], buff2[5]);
}

Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which I also print. I still perform the memory constraint against buff, then I check to see if it is affecting buff2.

I start by compiling this with a size of 8 and look at the -S output. If this is NOT doing a full clobber, gcc should be able to just print buff2[5] by moving 77 into the appropriate register before calling printf. And indeed, that's what we see.

/APP
 # 17 "mem2.cpp" 1
        rep stosb
 # 0 "" 2
/NO_APP
        movzbl  37(%rsp), %edx
        movl    $77, %r8d
        leaq    .LC0(%rip), %rcx
        call    printf

If using a size of 3 *is* causing a full memory clobber, we would expect to see the value getting read from memory before calling printf. And indeed, that's also what we see.

/APP
 # 17 "mem2.cpp" 1
        rep stosb
 # 0 "" 2
/NO_APP
        movzbl  37(%rsp), %edx
        leaq    .LC0(%rip), %rcx
        movzbl  149(%rsp), %r8d

I don't know the internals of gcc well enough to understand exactly why this is happening. But from a user's poin
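For completeness, the same operation can be written so the memory operand is an array lvalue covering the whole buffer rather than a struct of a hand-picked size. This is a hedged sketch, not code from the thread (the 100-byte size is an assumption matching the test above), and per the measurements above a size that does not map to a machine mode may still degrade to a BLKmode clobber:

#include <stddef.h>

static inline void
stosb_all (unsigned char *dest, unsigned char data, size_t count)
{
  /* The "=m" operand is the whole 100-byte object starting at dest, so the
     compiler knows exactly which bytes the asm may write -- no "memory"
     clobber needed.  */
  __asm__ __volatile__ ("rep stosb"
                        : "+D" (dest), "+c" (count),
                          "=m" (*(unsigned char (*)[100]) dest)
                        : "a" (data));
}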
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
On Thu, Nov 13, 2014 at 1:03 PM, David Wohlferd wrote: > Sorry for the (very) delayed response. I'm still looking for feedback here > so I can fix the docs. > > To refresh: The topic of conversation was the (extremely) wrong explanation > that has been in the docs since forever about how to use memory constraints > with inline asm to avoid the performance hit of a full memory clobber. > Trying to understand how this really works has led to some surprising > results. > > Me: >>> While I really like the idea of using memory constraints to avoid all out >>> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86 >>> only supports a max of 8. Unless there's some way to use larger sizes >>> (say >>> SSIZE_MAX), this feature hardly seems worth documenting. > > Richard: >> I wonder how you figured out that a 12 byte clobber performs a full >> memory clobber? > > Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2, > using -m64 -O2 -fdump-final-insns): > > > #include > > #define MYSIZE 3 > > inline void > __stosb(unsigned char *Dest, unsigned char Data, size_t Count) > { >struct _reallybigstruct { char x[MYSIZE]; } > *p = (struct _reallybigstruct *)Dest; > >__asm__ __volatile__ ("rep stos{b|b}" > : "+D" (Dest), "+c" (Count), "=m" (*p) > : [Data] "a" (Data) > //: "memory" >); > } > > int main() > { >unsigned char buff[100]; >buff[5] = 'A'; > >__stosb(buff, 'B', sizeof(buff)); >printf("%c\n", buff[5]); > } > > > In summary: > >1) Create a 100 byte buffer, and set buff[5] to 'A'. >2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'. >3) Use a memory constraint in __stosb to flush buff. The size of the > memory constraint is controlled by a #define. > > With this, I have a simple way to test various sizes of memory constraints > to see if the buffer gets flushed. If it *is* flushing the buffer, printing > buff[5] after __stosb will print 'B'. If it is *not* flushing, it will > print 'A'. > > Results: >- Since buff[5] is the 6th byte in the buffer, using memory constraint > sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing that no flush > was done. >- Sizes of 8 and 16 print 'B', showing that the flush was done. This is > also the expected result, since I am now flushing enough of buff to include > buff[5]. >- The surprise comes from using a size of 3 or 5. These also print 'B'. > WTF? If 4 doesn't flush, why does 3? > > I believe the answer comes from reading the RTL. The difference between > sizes of 3 and 16 comes here: > > (set (mem/c:TI (plus:DI (reg/f:DI 7 sp) > (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16 > A128]) > (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [ > >(set (mem/c:BLK (plus:DI (reg/f:DI 7 sp) > (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3 > A128]) > (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [ > > While I don't actually speak RTL, TI clearly refers to TIMode. Apparently > when using a size that exactly matches a machine mode, asm memory references > (on i386) can flush the exact number of bytes. But for other sizes, gcc > seems to falls back to BLK mode, which doesn't. > > I don't know the exact meaning of BLK on a "set" or "asm_operands." Does it > cause a full clobber? Or just a complete clobber of buff? 
Attempting to > answer that question leads us to the second bit of code: > > > #include > > #define MYSIZE 8 > > inline void > __stosb(unsigned char *Dest, unsigned char Data, size_t Count) > { >struct _reallybigstruct { char x[MYSIZE]; } > *p = (struct _reallybigstruct *)Dest; > >__asm__ __volatile__ ("rep stos{b|b}" > : "+D" (Dest), "+c" (Count), "=m" (*p) > : [Data] "a" (Data) > //: "memory" >); > } > int main() > { >unsigned char buff[100], buff2[100]; >buff[5] = 'A'; >buff2[5] = 'M'; >asm("#" : : "r" (buff2)); > >__stosb(buff, 'B', sizeof(buff)); >printf("%c %c\n", buff[5], buff2[5]); > } > > > Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which I > also print. I still perform the memory constraint against buff, then I > check to see if it is affecting buff2. > > I start by compiling this with a size of 8 and look at the -S output. If > this is NOT doing a full clobber, gcc should be able to just print buff2[5] > by moving 77 into the appropriate register before calling printf. And > indeed, that's what we see. > > /APP > # 17 "mem2.cpp" 1 > rep stosb > # 0 "" 2 > /NO_APP > movzbl 37(%rsp), %edx > movl$77, %r8d > leaq.LC0(%rip), %rcx > callprintf > > If using a size of 3 *is* causing a full memory clobber, we would expect to > see the value getting read from memory before calling printf. And indeed, > that's also what we see. > > /APP
Re: testing policy for C/C++ front end changes
2014-11-11 10:05 GMT+01:00 Richard Biener: [...]
> I think you need to retain the fact that one needs to bootstrap, not just
> build GCC. Thus "If your change is to code that is not in a front end, or is
> to the C or C++ front ends or libgcc or libstdc++ libraries, you must perform
> a bootstrap of GCC with all languages enabled by default, on at least one
> primary target, and run all testsuites."
>
> Ok with that change.

Perhaps it would make sense to mention the existence of the compile farm, and add a link to it. Otherwise, such requirements (obvious as they are) could clearly discourage contributors who do not have access to a powerful machine.

-- Fabien
Re: [RFC] UBSan unsafely uses VRP
On 11/12/2014 04:26 PM, Jakub Jelinek wrote:
On Wed, Nov 12, 2014 at 12:58:37PM +0300, Yury Gribov wrote:
On 11/12/2014 11:45 AM, Marek Polacek wrote:
On Wed, Nov 12, 2014 at 11:42:39AM +0300, Yury Gribov wrote:
On 11/11/2014 05:15 PM, Jakub Jelinek wrote:

There is also some unsafe code in the functions ubsan_expand_si_overflow_addsub_check and ubsan_expand_si_overflow_mul_check, which use get_range_info to reduce the number of checks. As seen before, VRP usage for sanitizers may decrease the quality of error detection.

Using VRP is completely intentional there, we don't want to generate too slow code if you decide you want to optimize your code (for -O0 VRP isn't performed of course).

On the other hand, detection quality is probably more important, regardless of optimization level. When I use a checker, I don't want it to miss bugs due to overly aggressive optimization.

Yes, but as said above, VRP is only run with -O2 and -Os.

Hm, I must be missing something. 99% of users will only run their code under -O2 because it'll be too slow otherwise. Why should we penalize them for this by lowering analysis quality? Isn't error detection the main goal of sanitizers (performance being secondary at best)?

But, if -O0 isn't too slow for them, having unnecessary bloat even at -O2 is bad the same. But not using VRP at all, you are giving up all the cases where you know something won't overflow because you e.g. sign extend or zero extend from some smaller type, sum up such values, add something with a constant, or you can use cheaper code to multiply etc. Turning off -faggressive-loop-optimizations is certainly the right thing for -fsanitize=undefined (any undefined I'd say), so are perhaps selected other optimizations.

Jakub
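As a concrete instance of the case Jakub describes (a sketch; whether the check is actually elided depends on the GCC version and pass ordering): both operands below are zero-extended from 16 bits, so VRP can prove the signed addition cannot overflow, and the -fsanitize=signed-integer-overflow check for it is provably dead.

int
add_narrow (unsigned short a, unsigned short b)
{
  /* Value ranges: a, b in [0, 65535], so a + b is in [0, 131070] -- far
     below INT_MAX.  Dropping the overflow check here loses no diagnostics.  */
  return a + b;
}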
Re: testing policy for C/C++ front end changes
On 13.11.2014 at 14:08, Fabien Chêne wrote:
> Perhaps it would make sense to mention the existence of the compile
> farm, and add a link to it.

Good idea. Bonus points for adding a script which executes all the required steps.

Markus

--
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
On Thu, 13 Nov 2014, David Wohlferd wrote: > Sorry for the (very) delayed response. I'm still looking for feedback here so > I can fix the docs. Thank you for your diligence. > As I said before, triggering a full memory clobber for anything over 16 bytes > (and most sizes under 16 bytes) makes this feature all but useless. So if > that's really what's happening, we need to decide what to do next: > > 1) Can this be "fixed?" > 2) Do we want to doc the current behavior? > 3) Or do we just remove this section? > > I think it could be a nice performance win for inline asm if it could be made > to work right, but I have no idea what might be involved in that. Failing > that, I guess if it doesn't work and isn't going to work, I'd recommend > removing the text for this feature. > > Since all 3 suggestions require a doc change, I'll just say that I'm prepared > to start work on the doc patch as soon as someone lets me know what the plan > is. > > Richard? Hans-Peter? Your thoughts? I've forgot if someone mentioned whether we have a test-case in our test-suite for this feature. If we don't, then 3; removal. If we do, I guess it's flawed or at least not agreeing with the documentation? Then it *might* be worth the effort fixing that and additional test-coverage (depending on the person stepping up...) but 3 is IMHO still an arguably sane option. brgds, H-P
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson wrote: > On Thu, 13 Nov 2014, David Wohlferd wrote: >> Sorry for the (very) delayed response. I'm still looking for feedback here >> so >> I can fix the docs. > > Thank you for your diligence. > >> As I said before, triggering a full memory clobber for anything over 16 bytes >> (and most sizes under 16 bytes) makes this feature all but useless. So if >> that's really what's happening, we need to decide what to do next: >> >> 1) Can this be "fixed?" >> 2) Do we want to doc the current behavior? >> 3) Or do we just remove this section? >> >> I think it could be a nice performance win for inline asm if it could be made >> to work right, but I have no idea what might be involved in that. Failing >> that, I guess if it doesn't work and isn't going to work, I'd recommend >> removing the text for this feature. >> >> Since all 3 suggestions require a doc change, I'll just say that I'm prepared >> to start work on the doc patch as soon as someone lets me know what the plan >> is. >> >> Richard? Hans-Peter? Your thoughts? > > I've forgot if someone mentioned whether we have a test-case in > our test-suite for this feature. If we don't, then 3; removal. > If we do, I guess it's flawed or at least not agreeing with the > documentation? Then it *might* be worth the effort fixing that > and additional test-coverage (depending on the person stepping > up...) but 3 is IMHO still an arguably sane option. Well, as what is missing is just an optimization I'd say we should try to fix it. And surely the docs should not promise that optimization will happen - it should just mention that doing this might allow optimization to happen. Richard. > brgds, H-P
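If someone does pick this up, a runtime test distilled from David's probe could at least pin down the intended semantics. The following is a hypothetical, untested sketch (the target selector, file placement and all names are assumptions, not an existing testsuite file):

/* { dg-do run { target { i?86-*-* x86_64-*-* } } } */
/* { dg-options "-O2" } */

struct block { char x[16]; };

static void
fill (unsigned char *dest, unsigned char data, unsigned long count)
{
  struct block *p = (struct block *) dest;
  /* The "=m" operand covers all 16 bytes, so the compiler must not assume
     buff[5] still holds 'A' after the asm.  */
  __asm__ __volatile__ ("rep stosb"
                        : "+D" (dest), "+c" (count), "=m" (*p)
                        : "a" (data));
}

int
main (void)
{
  unsigned char buff[16];
  buff[5] = 'A';
  fill (buff, 'B', sizeof (buff));
  if (buff[5] != 'B')
    __builtin_abort ();
  return 0;
}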
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
Hello, Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to main trunk. Thanks everybody who helped w/ development and review. -- Thanks, K
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On 11/13/2014 05:45 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm wrote: On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek wrote: On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: To be constructive here - the above case is from within a GIMPLE_ASSIGN case label and thus I'd have expected case GIMPLE_ASSIGN: { gassign *a1 = as_a (s1); gassign *a2 = as_a (s2); lhs1 = gimple_assign_lhs (a1); lhs2 = gimple_assign_lhs (a2); if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) return (operand_equal_p (lhs1, lhs2, 0) && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1), gimple_assign_rhs1 (a2))); else if (TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME) return vn_valueize (lhs1) == vn_valueize (lhs2); return false; } instead. That's the kind of changes I have expected and have approved of. But even that looks like just adding extra work for all developers, with no gain. You only have to add extra code and extra temporaries, in switches typically also have to add {} because of the temporaries and thus extra indentation level, and it doesn't simplify anything in the code. The branch attempts to use the C++ typesystem to capture information about the kinds of gimple statement we expect, both: (A) so that the compiler can detect type errors, and (B) as a comprehension aid to the human reader of the code The ideal here is when function params and struct field can be strengthened from "gimple" to a subclass ptr. This captures the knowledge that every use of a function or within a struct has a given gimple code. I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who has to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this. Can you e.g. compare the size of - lines in your patchset combined, and size of + lines in your patchset? As in, if your changes lead to less typing or more. I see two ways out here. One is to add overloads to all the functions taking the special types like tree gimple_assign_rhs1 (gimple *); or simply add gassign *operator ()(gimple *g) { return as_a (g); } into a gimple-compat.h header which you include in places that are not converted "nicely". Thanks for the suggestions. Am I missing something, or is the gimple-compat.h idea above not valid C ++? Note that "gimple" is still a typedef to gimple_statement_base * (as noted before, the gimple -> gimple * change would break everyone else's patches, so we talked about that as a followup patch for early stage3). Given that, if I try to create an "operator ()" outside of a class, I get this error: ‘gassign* operator()(gimple)’ must be a nonstatic member function which is emitted from cp/decl.c's grok_op_properties: /* An operator function must either be a non-static member function or have at least one parameter of a class, a reference to a class, an enumeration, or a reference to an enumeration. 
13.4.0.6 */ I tried making it a member function of gimple_statement_base, but that doesn't work either: we want a conversion from a gimple_statement_base * to a gassign *, not from a gimple_statement_base to a gassign *. Is there some syntactic trick here that I'm missing? Sorry if I'm being dumb (I can imagine there's a way of doing it by making "gimple" become some kind of wrapped ptr class, but that way lies madness, surely). Hmm. struct assign; struct base { operator assign *() const { return (assign *)this; } }; struct assign : base { }; void foo (assign *); void bar (base *b) { foo (b); } doesn't work, but void bar (base &b) { foo (b); } does. Indeed C++ doesn't seem to provide what is necessary for the compat trick :( So the gimple-compat.h header would need to provide additional overloads for the affected functions like inline tree gimple_assign_rhs1 (gimple *g) { return gimple_assign_rhs1 (as_a (g)); } that would work for me as well. Won't that defeat the desire for checking tho? If you dont do a dyn_cast<> in gimple_assign_rhs1 (gimple *g) anyone can call it and upcast any kind of gimple into a gassign without checking that it is really a gassign... Actually, a gcc_asser
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On Thu, Nov 13, 2014 at 3:24 PM, Andrew MacLeod wrote: > On 11/13/2014 05:45 AM, Richard Biener wrote: >> >> On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm >> wrote: >>> >>> On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek wrote: > > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: >> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: >>> >>> On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: To be constructive here - the above case is from within a GIMPLE_ASSIGN case label and thus I'd have expected case GIMPLE_ASSIGN: { gassign *a1 = as_a (s1); gassign *a2 = as_a (s2); lhs1 = gimple_assign_lhs (a1); lhs2 = gimple_assign_lhs (a2); if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) return (operand_equal_p (lhs1, lhs2, 0) && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1), gimple_assign_rhs1 (a2))); else if (TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME) return vn_valueize (lhs1) == vn_valueize (lhs2); return false; } instead. That's the kind of changes I have expected and have approved of. >>> >>> But even that looks like just adding extra work for all developers, >>> with no >>> gain. You only have to add extra code and extra temporaries, in >>> switches >>> typically also have to add {} because of the temporaries and thus >>> extra >>> indentation level, and it doesn't simplify anything in the code. >> >> The branch attempts to use the C++ typesystem to capture information >> about the kinds of gimple statement we expect, both: >>(A) so that the compiler can detect type errors, and >>(B) as a comprehension aid to the human reader of the code >> >> The ideal here is when function params and struct field can be >> strengthened from "gimple" to a subclass ptr. This captures the >> knowledge that every use of a function or within a struct has a given >> gimple code. > > I just don't like all the as_a/is_a stuff enforced everywhere, > it means more typing, more temporaries, more indentation. > So, as I view it, instead of the checks being done cheaply (yes, I > think > the gimple checking as we have right now is very cheap) under the > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes > put the burden on the developers, who has to check that manually > through > the as_a/is_a stuff everywhere, more typing and uglier syntax. > I just don't see that as a step forward, instead a huge step backwards. > But perhaps I'm alone with this. > Can you e.g. compare the size of - lines in your patchset combined, and > size of + lines in your patchset? As in, if your changes lead to less > typing or more. I see two ways out here. One is to add overloads to all the functions taking the special types like tree gimple_assign_rhs1 (gimple *); or simply add gassign *operator ()(gimple *g) { return as_a (g); } into a gimple-compat.h header which you include in places that are not converted "nicely". >>> >>> Thanks for the suggestions. >>> >>> Am I missing something, or is the gimple-compat.h idea above not valid C >>> ++? >>> >>> Note that "gimple" is still a typedef to >>>gimple_statement_base * >>> (as noted before, the gimple -> gimple * change would break everyone >>> else's patches, so we talked about that as a followup patch for early >>> stage3). 
>>> >>> Given that, if I try to create an "operator ()" outside of a class, I >>> get this error: >>> >>> ‘gassign* operator()(gimple)’ must be a nonstatic member function >>> >>> which is emitted from cp/decl.c's grok_op_properties: >>>/* An operator function must either be a non-static member >>> function >>> or have at least one parameter of a class, a reference to a >>> class, >>> an enumeration, or a reference to an enumeration. 13.4.0.6 */ >>> >>> I tried making it a member function of gimple_statement_base, but that >>> doesn't work either: we want a conversion >>>from a gimple_statement_base * to a gassign *, not >>>from a gimple_statement_base to a gassign *. >>> >>> Is there some syntactic trick here that I'm missing? Sorry if I'm being >>> dumb (I can imagine there's a way of doing it by making "gimple" become >>> some kind of wrapped ptr class, but that way lies madness, surely). >> >> Hmm. >> >> struct assign; >> struct base {
What is R_X86_64_GOTPLT64 used for?
x86-64 psABI has

name@GOT: specifies the offset to the GOT entry for the symbol name from the base of the GOT.

name@GOTPLT: specifies the offset to the GOT entry for the symbol name from the base of the GOT, implying that there is a corresponding PLT entry.

But GCC never generates name@GOTPLT and assembler fails to assemble it:

[hjl@gnu-6 pr17598]$ cat x.S
movabs $foo@GOTPLT,%rax
[hjl@gnu-6 pr17598]$ gcc -c x.S
x.S: Assembler messages:
x.S:1: Error: relocated field and relocation type differ in signedness
[hjl@gnu-6 pr17598]$

It certainly isn't needed on data symbols. I couldn't find any possible usage for this relocation on function symbols. Does anyone remember what it was supposed to be used for?

-- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
Kirill Yukhin wrote: > Support of OpenMP 4.0 offloading to future Xeon Phi was > fully checked in to main trunk. Thanks. If I understood it correctly: * GCC 5 supports code generation for Xeon Phi (Knights Landing, KNL) * KNL (the hardware) is not yet available [mid 2015?] * liboffloadmic supports offloading in an emulation mode (executed on the host) but does not (yet) support offloading to KNL; i.e. one would need an updated version of it, once one gets hold of the actual hardware. * The current hardware (Xeon Phi Knights Corner, KNC) is and will not be supported by GCC. * Details for building GCC for offloading and running code on an accelerator is at https://gcc.gnu.org/wiki/Offloading Question: Is the latter up to date - and the item above correct? BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html Otherwise: * OpenACC support is about to be merged (as alternative to OpenMP 4) * Support for offloading to NVidia GPUs via PTX is also about to be merged. Tobias
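For readers wanting to try the feature, here is a minimal OpenMP 4.0 offloading example (a sketch, not taken from the libgomp testsuite). Built with -fopenmp on an offload-enabled toolchain, the target region runs on the MIC device or emulator; otherwise it simply falls back to host execution.

#include <stdio.h>

int
main (void)
{
  enum { N = 1024 };
  static double a[N], b[N], c[N];

  for (int i = 0; i < N; i++)
    {
      a[i] = i;
      b[i] = 2 * i;
    }

  /* Map the inputs to the device, bring the result back.  */
  #pragma omp target map(to: a, b) map(from: c)
  #pragma omp parallel for
  for (int i = 0; i < N; i++)
    c[i] = a[i] + b[i];

  printf ("%g\n", c[N - 1]);   /* expect 3069 */
  return 0;
}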
Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign
On 11/13/2014 09:34 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 3:24 PM, Andrew MacLeod wrote: On 11/13/2014 05:45 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm wrote: On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote: On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek wrote: On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote: On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote: On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote: To be constructive here - the above case is from within a GIMPLE_ASSIGN case label and thus I'd have expected case GIMPLE_ASSIGN: { gassign *a1 = as_a (s1); gassign *a2 = as_a (s2); lhs1 = gimple_assign_lhs (a1); lhs2 = gimple_assign_lhs (a2); if (TREE_CODE (lhs1) != SSA_NAME && TREE_CODE (lhs2) != SSA_NAME) return (operand_equal_p (lhs1, lhs2, 0) && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1), gimple_assign_rhs1 (a2))); else if (TREE_CODE (lhs1) == SSA_NAME && TREE_CODE (lhs2) == SSA_NAME) return vn_valueize (lhs1) == vn_valueize (lhs2); return false; } instead. That's the kind of changes I have expected and have approved of. But even that looks like just adding extra work for all developers, with no gain. You only have to add extra code and extra temporaries, in switches typically also have to add {} because of the temporaries and thus extra indentation level, and it doesn't simplify anything in the code. The branch attempts to use the C++ typesystem to capture information about the kinds of gimple statement we expect, both: (A) so that the compiler can detect type errors, and (B) as a comprehension aid to the human reader of the code The ideal here is when function params and struct field can be strengthened from "gimple" to a subclass ptr. This captures the knowledge that every use of a function or within a struct has a given gimple code. I just don't like all the as_a/is_a stuff enforced everywhere, it means more typing, more temporaries, more indentation. So, as I view it, instead of the checks being done cheaply (yes, I think the gimple checking as we have right now is very cheap) under the hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes put the burden on the developers, who has to check that manually through the as_a/is_a stuff everywhere, more typing and uglier syntax. I just don't see that as a step forward, instead a huge step backwards. But perhaps I'm alone with this. Can you e.g. compare the size of - lines in your patchset combined, and size of + lines in your patchset? As in, if your changes lead to less typing or more. I see two ways out here. One is to add overloads to all the functions taking the special types like tree gimple_assign_rhs1 (gimple *); or simply add gassign *operator ()(gimple *g) { return as_a (g); } into a gimple-compat.h header which you include in places that are not converted "nicely". Thanks for the suggestions. Am I missing something, or is the gimple-compat.h idea above not valid C ++? Note that "gimple" is still a typedef to gimple_statement_base * (as noted before, the gimple -> gimple * change would break everyone else's patches, so we talked about that as a followup patch for early stage3). 
Given that, if I try to create an "operator ()" outside of a class, I get this error: ‘gassign* operator()(gimple)’ must be a nonstatic member function which is emitted from cp/decl.c's grok_op_properties: /* An operator function must either be a non-static member function or have at least one parameter of a class, a reference to a class, an enumeration, or a reference to an enumeration. 13.4.0.6 */ I tried making it a member function of gimple_statement_base, but that doesn't work either: we want a conversion from a gimple_statement_base * to a gassign *, not from a gimple_statement_base to a gassign *. Is there some syntactic trick here that I'm missing? Sorry if I'm being dumb (I can imagine there's a way of doing it by making "gimple" become some kind of wrapped ptr class, but that way lies madness, surely). Hmm. struct assign; struct base { operator assign *() const { return (assign *)this; } }; struct assign : base { }; void foo (assign *); void bar (base *b) { foo (b); } doesn't work, but void bar (base &b) { foo (b); } does. Indeed C++ doesn't seem to provide what is necessary for the compat trick :( So the gimple-compat.h header would need to provide additional overloads for the affected functions like inline tree gimple_assign_rhs1 (gimple *g) { return gimple_assign_rhs1 (as_a (g)); } that would work for me as well. Won't that defeat the desire for checking tho? If you dont do a dyn_cast<> in gimple_assign_rhs1 (gimple *g) anyone
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 04:15:48PM +0100, Tobias Burnus wrote: > Question: Is the latter up to date - and the item above correct? Will leave that to Kirill. > BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html Indeed, that should be updated. > Otherwise: > * OpenACC support is about to be merged (as alternative to OpenMP 4) I hope so. > * Support for offloading to NVidia GPUs via PTX is also about to be merged. Ditto. Then the question is how hard will it be to get OpenACC offloading to XeonPhi (real hw or emulation) - I guess it is a matter of whether the plugin needs to implement some extra hooks for OpenACC, and also whether we can get OpenMP offloading to PTX (dunno if Thomas or his collegues have actually tried it on simple testcases, I bet the hardest part will be porting libgomp away from pthread_* to optionally be supported by the limited nvptx target and use its threading model; whether __thread is already supported by nvptx etc.). I'm willing to help with this once I have some hw, but some help from people familiar with PTX would be certainly appreciated. Because without libgomp ported to nvptx-*-* target (or some way to inline all the GOMP_*/omp_* calls in offloading regions for nvptx, but the latter might be too hard), I guess one could offload very simple target regions, but not anything using #pragma omp inside of them. Jakub
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
Hi Tobias, On 13 Nov 16:15, Tobias Burnus wrote: > Kirill Yukhin wrote: > > Support of OpenMP 4.0 offloading to future Xeon Phi was > > fully checked in to main trunk. > > Thanks. If I understood it correctly: > > * GCC 5 supports code generation for Xeon Phi (Knights Landing, KNL) Right. > * KNL (the hardware) is not yet available [mid 2015?] Yes, but I don't know the date. > * liboffloadmic supports offloading in an emulation mode (executed on > the host) but does not (yet) support offloading to KNL; i.e. one > would need an updated version of it, once one gets hold of the > actual hardware. Yes, it supports emulation mode. Also, current scheme is the same as for KNC (however we have no code generator in GCC main trunk for KNC). We're going to keep liboffloadmic up-to-date. > * The current hardware (Xeon Phi Knights Corner, KNC) is and will not > be supported by GCC. Currently GCC main trunk doesn't support KNC code gen. > * Details for building GCC for offloading and running code on an > accelerator is at https://gcc.gnu.org/wiki/Offloading > > Question: Is the latter up to date - and the item above correct? Correct. > BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html Thanks, I'll post a patch. -- Thanks, K
Re: What is R_X86_64_GOTPLT64 used for?
On 11/13/2014 03:55 PM, H.J. Lu wrote: > x86-64 psABI has > > name@GOT: specifies the offset to the GOT entry for the symbol name > from the base of the GOT. > > name@GOTPLT: specifies the offset to the GOT entry for the symbol name > from the base of the GOT, implying that there is a corresponding PLT entry. > > But GCC never generates name@GOTPLT and assembler fails to assemble > it: > > [hjl@gnu-6 pr17598]$ cat x.S > movabs $foo@GOTPLT,%rax > [hjl@gnu-6 pr17598]$ gcc -c x.S > x.S: Assembler messages: > x.S:1: Error: relocated field and relocation type differ in signedness Presumably that's a bug, since it does work with .quad. > Does anyone remember what it was supposed to be used for? Presumably some sort of non-C language where you need a non-local function pointer, but it need not be canonical, and thus could be lazily bound. But I don't know exactly when that would be. r~
Re: What is R_X86_64_GOTPLT64 used for?
Hi, On Thu, 13 Nov 2014, H.J. Lu wrote: > x86-64 psABI has > > name@GOT: specifies the offset to the GOT entry for the symbol name > from the base of the GOT. > > name@GOTPLT: specifies the offset to the GOT entry for the symbol name > from the base of the GOT, implying that there is a corresponding PLT entry. > > But GCC never generates name@GOTPLT and assembler fails to assemble > it: I've added the implementation for the large model, but only dimly remember how it got added to the ABI in the first place. The additional effect of using that reloc was supposed to be that the GOT slot was to be placed into .got.plt, and this might hint at the reasoning for this reloc: If you take the address of a function and call it, you need both a GOT slot and a PLT entry (where the existence of GOT slot is implied by the PLT of course). Now, if you use the normal @GOT64 reloc for the address-taking operation that would create a slot in .got. For the call instruction you'd use @PLT (or variants thereof, like PLTOFF), which creates the PLT slot _and_ a slot in .got.plt. So, now we've ended up with two GOT slots for the same symbol, where one should be enough (the address taking operation can just as well use the slot in .got.plt). So if the compiler would emit @GOTPLT64 instead of @GOT64 for all address references to symbols where it knows that it's a function it could save one GOT slot. So, I think it was supposed to be a small optimization hint. But it never was used in the compiler ... > [hjl@gnu-6 pr17598]$ cat x.S > movabs $foo@GOTPLT,%rax > [hjl@gnu-6 pr17598]$ gcc -c x.S > x.S: Assembler messages: > x.S:1: Error: relocated field and relocation type differ in signedness ... and now seems to have bit-rotted. > [hjl@gnu-6 pr17598]$ > > It certainly isn't needed on data symbols. I couldn't find any possible > usage for this relocation on function symbols. The longer I think about it the more I'm sure it's the above optional optimization mean. Ciao, Michael.
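A tiny illustration of the scenario Michael describes (a sketch; the relocation names come from the psABI text above, and the exact sequences depend on the code model, e.g. -mcmodel=large -fpic):

typedef void (*fptr) (void);

extern void foo (void);

fptr
take_and_call (void)
{
  foo ();        /* call through the PLT: foo@PLTOFF in the large model   */
  return &foo;   /* address taken: foo@GOT64, i.e. a slot in .got -- the
                    slot that foo@GOTPLT64 was meant to let the compiler
                    share with the .got.plt slot the PLT entry already
                    implies                                               */
}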
Re: What is R_X86_64_GOTPLT64 used for?
On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz wrote: > Hi, > > On Thu, 13 Nov 2014, H.J. Lu wrote: > >> x86-64 psABI has >> >> name@GOT: specifies the offset to the GOT entry for the symbol name >> from the base of the GOT. >> >> name@GOTPLT: specifies the offset to the GOT entry for the symbol name >> from the base of the GOT, implying that there is a corresponding PLT entry. >> >> But GCC never generates name@GOTPLT and assembler fails to assemble >> it: > > I've added the implementation for the large model, but only dimly remember > how it got added to the ABI in the first place. The additional effect of > using that reloc was supposed to be that the GOT slot was to be placed > into .got.plt, and this might hint at the reasoning for this reloc: > > If you take the address of a function and call it, you need both a GOT > slot and a PLT entry (where the existence of GOT slot is implied by the That is correct. > PLT of course). Now, if you use the normal @GOT64 reloc for the > address-taking operation that would create a slot in .got. For the call > instruction you'd use @PLT (or variants thereof, like PLTOFF), which > creates the PLT slot _and_ a slot in .got.plt. So, now we've ended up > with two GOT slots for the same symbol, where one should be enough (the > address taking operation can just as well use the slot in .got.plt). So > if the compiler would emit @GOTPLT64 instead of @GOT64 for all address > references to symbols where it knows that it's a function it could save > one GOT slot. @GOTPLT will create a PLT entry, but it doesn't mean PLT entry will be used. Only @PLTOFF will use PLT entry. Linker should be smart enough to use only one GOT slot, regardless if @GOTPLT or @GOT is used to take function address and call via PLT. However, if @GOTPLT is used without @PLT, a PLT entry will be created and unused. I'd like to propose 1. Update psABI to remove R_X86_64_GOTPLT64. 2. Fix assembler to take @GOTPLT for backward compatibility, 3. Make sure that linker uses one GOT slot for @GOT and @PLTOFF. > So, I think it was supposed to be a small optimization hint. But it never > was used in the compiler ... > >> [hjl@gnu-6 pr17598]$ cat x.S >> movabs $foo@GOTPLT,%rax >> [hjl@gnu-6 pr17598]$ gcc -c x.S >> x.S: Assembler messages: >> x.S:1: Error: relocated field and relocation type differ in signedness > > ... and now seems to have bit-rotted. > >> [hjl@gnu-6 pr17598]$ >> >> It certainly isn't needed on data symbols. I couldn't find any possible >> usage for this relocation on function symbols. > > The longer I think about it the more I'm sure it's the above optional > optimization mean. > The reason I am asking about it is I'd like to finish the large model support in binutils and GCC. I have filed a couple bugs: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63833 https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63842 https://sourceware.org/bugzilla/show_bug.cgi?id=17592 https://sourceware.org/bugzilla/show_bug.cgi?id=17593 I will fix all of them and verify that large model works correctly. -- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 6:14 AM, Kirill Yukhin wrote: > Hello, > > Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to > main > trunk. > > Thanks everybody who helped w/ development and review. > I noticed many libgomp test failures: https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html Have you seen them? -- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
Kirill,

The patches have broken bootstrap on AIX and probably on other non-GNU platforms. strchrnul() is a GNU extension.

/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c: In function 'unsigned int parse_env_var(const char*, char***, const char*)':
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c:427:35: error: 'strchrnul' was not declared in this scope
   nextval = strchrnul (curval, ':');
                        ^
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c: In function 'void append_offload_options(obstack*, const char*, cl_decoded_option*, unsigned int)':
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c:584:34: error: 'strchrnul' was not declared in this scope
   next = strchrnul (cur, ',');
/nasfarm/edelsohn/src/src/gcc/gcc.c: In function 'void handle_foffload_option(const char*)':
/nasfarm/edelsohn/src/src/gcc/gcc.c:3378:28: error: 'strchrnul' was not declared in this scope
   end = strchrnul (arg, '=');
         ^

Thanks, David

On Thu, Nov 13, 2014 at 9:14 AM, Kirill Yukhin wrote:
> Hello,
>
> Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to main trunk.
>
> Thanks everybody who helped w/ development and review.
>
> --
> Thanks, K
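Until those call sites are made portable, a local fallback is small. A hedged sketch (the name local_strchrnul is made up, and this is not the patch that eventually went in):

/* strchrnul is a GNU extension: like strchr, but when C is not found it
   returns a pointer to the terminating NUL instead of NULL.  */
static char *
local_strchrnul (const char *s, int c)
{
  while (*s != '\0' && *s != (char) c)
    s++;
  return (char *) s;
}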
Re: What is R_X86_64_GOTPLT64 used for?
On Thu, Nov 13, 2014 at 9:03 AM, H.J. Lu wrote: > On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz wrote: >> Hi, >> >> On Thu, 13 Nov 2014, H.J. Lu wrote: >> >>> x86-64 psABI has >>> >>> name@GOT: specifies the offset to the GOT entry for the symbol name >>> from the base of the GOT. >>> >>> name@GOTPLT: specifies the offset to the GOT entry for the symbol name >>> from the base of the GOT, implying that there is a corresponding PLT entry. >>> >>> But GCC never generates name@GOTPLT and assembler fails to assemble >>> it: >> >> I've added the implementation for the large model, but only dimly remember >> how it got added to the ABI in the first place. The additional effect of >> using that reloc was supposed to be that the GOT slot was to be placed >> into .got.plt, and this might hint at the reasoning for this reloc: >> >> If you take the address of a function and call it, you need both a GOT >> slot and a PLT entry (where the existence of GOT slot is implied by the > > That is correct. > >> PLT of course). Now, if you use the normal @GOT64 reloc for the >> address-taking operation that would create a slot in .got. For the call >> instruction you'd use @PLT (or variants thereof, like PLTOFF), which >> creates the PLT slot _and_ a slot in .got.plt. So, now we've ended up >> with two GOT slots for the same symbol, where one should be enough (the >> address taking operation can just as well use the slot in .got.plt). So >> if the compiler would emit @GOTPLT64 instead of @GOT64 for all address >> references to symbols where it knows that it's a function it could save >> one GOT slot. > > @GOTPLT will create a PLT entry, but it doesn't mean PLT entry will > be used. Only @PLTOFF will use PLT entry. Linker should be smart > enough to use only one GOT slot, regardless if @GOTPLT or @GOT > is used to take function address and call via PLT. However, if > @GOTPLT is used without @PLT, a PLT entry will be created and unused. > > I'd like to propose > > 1. Update psABI to remove R_X86_64_GOTPLT64. > 2. Fix assembler to take @GOTPLT for backward compatibility, > 3. Make sure that linker uses one GOT slot for @GOT and @PLTOFF. > Linker does: case R_X86_64_GOT64: case R_X86_64_GOTPLT64: base_got = htab->elf.sgot; if (htab->elf.sgot == NULL) abort (); if (h != NULL) { bfd_boolean dyn; off = h->got.offset; if (h->needs_plt && h->plt.offset != (bfd_vma)-1 && off == (bfd_vma)-1) { /* We can't use h->got.offset here to save state, or even just remember the offset, as finish_dynamic_symbol would use that as offset into .got. */ bfd_vma plt_index = h->plt.offset / plt_entry_size - 1; off = (plt_index + 3) * GOT_ENTRY_SIZE; base_got = htab->elf.sgotplt; } So if a symbol is accessed by both @GOT and @PLTOFF, its needs_plt will be true and its got.plt entry will be used for both @GOT and @GOTPLT. @GOTPLT has no advantage over @GOT, but potentially wastes a PLT entry. Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64) as reserved. I pushed updated x86-64 psABI changes to https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master I will update linker to keep accepting relocation 30 and treat it the same as R_X86_64_GOT64. -- H.J. --- diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index 7f636fc..981390b 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -1242,9 +1242,6 @@ examples and discussion. They are: \begin{itemize} \item \code{name@GOT}: specifies the offset to the GOT entry for the symbol \code{name} from the base of the GOT. 
-\item \code{name@GOTPLT}: specifies the offset to the GOT entry for - the symbol \code{name} from the base of the GOT, implying that - there is a corresponding PLT entry. \item \code{name@GOTOFF}: specifies the offset to the location of the symbol \code{name} from the base of the GOT. \item \code{name@GOTPCREL}: specifies the offset to the GOT entry diff --git a/object-files.tex b/object-files.tex index 4705e96..c0698dc 100644 --- a/object-files.tex +++ b/object-files.tex @@ -611,7 +611,7 @@ Name& Value & Field & Calculati on\\ \hline \code{R_X86_64_GOTPC64} & 29& word64 & \code{GOT - P + A} \\ \hline gnu-6:pts/18[114]> cat /tmp/x diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex index 7f636fc..981390b 100644 --- a/low-level-sys-info.tex +++ b/low-level-sys-info.tex @@ -1242,9 +1242,6 @@ examples and discussion. They are: \begin{itemize} \item \code{name@GOT}: specifies the offset to the GOT entry for the symbol \code{name} from the base of the GOT. -\item \code{name@GOTPLT}: specifies the offset to the GOT entry for - th
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On 13 Nov 09:17, H.J. Lu wrote: > I noticed many libgomp test failures: > > https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html > > Have you seen them? Hi H.J., I do not see these regressions on i686-linux and x86_64-linux. Could you please provide more details? (configure options, error log) Thanks, -- Ilya
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 10:37 AM, Ilya Verbin wrote:
> On 13 Nov 09:17, H.J. Lu wrote:
>> I noticed many libgomp test failures:
>>
>> https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html
>>
>> Have you seen them?
>
> Hi H.J.,
>
> I do not see these regressions on i686-linux and x86_64-linux.
> Could you please provide more details? (configure options, error log)
>
> Thanks,
> -- Ilya

GCC is configured with --prefix=/usr/5.0.0 --enable-clocale=gnu --with-system-zlib --enable-shared --with-demangler-in-ld i686-linux --with-fpmath=sse --enable-languages=c,c++,fortran,java,lto,objc

I got

spawn -ignore SIGHUP /export/project/git/gcc-regression/master/217501/bld/gcc/xgcc -B/export/project/git/gcc-regression/master/217501/bld/gcc/ /export/project/git/gcc-regression/gcc/libgomp/testsuite/libgomp.c/examples-4/e.50.1.c -B/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/ -B/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/.libs -I/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp -I/export/project/git/gcc-regression/gcc/libgomp/testsuite/.. -fmessage-length=0 -fno-diagnostics-show-caret -fdiagnostics-color=never -fopenmp -O2 -L/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/.libs -lm -o ./e.50.1.exe
/usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object

output is:
/usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object

-- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On 13 Nov 10:48, H.J. Lu wrote: > /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M Looks like we should set flag_fat_lto_objects while compilation with offloading. I'll investigate this issue tomorrow. Could you please also show a version and configure options for ld? Thanks, -- Ilya
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 11:20 AM, Ilya Verbin wrote: > On 13 Nov 10:48, H.J. Lu wrote: >> /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M > > Looks like we should set flag_fat_lto_objects while compilation with > offloading. > I'll investigate this issue tomorrow. > > Could you please also show a version and configure options for ld? > > Thanks, > -- Ilya I am using binutils 20141107 trunk, which was configured with --with-sysroot=/ \ --enable-gold --enable-plugins --enable-threads --enable-targets=x86_64-linux \ --prefix=/usr/local \ --with-local-prefix=/usr/local -- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thursday 2014-11-13 12:41, David Edelsohn wrote: > The patches have broken bootstrap on AIX and probably on other non-GNU > platforms. strchrnul() is a GNU extension. Yep, FreeBSD 8 is broken as well. The failure rate of my nightly testers over the last two weeks must be around 50%. Gerald
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 11:20 AM, Ilya Verbin wrote: > On 13 Nov 10:48, H.J. Lu wrote: >> /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M > > Looks like we should set flag_fat_lto_objects while compilation with > offloading. > I'll investigate this issue tomorrow. > > Could you please also show a version and configure options for ld? > Section Headers: [Nr] Name TypeAddress OffSize ES Flg Lk Inf Al [ 0] NULL 00 00 00 0 0 0 [ 1] .text PROGBITS 40 000204 00 AX 0 0 16 [ 2] .rela.textRELA 001a60 d8 18 I 29 1 8 [ 3] .data PROGBITS 000260 40 00 WA 0 0 32 [ 4] .bss NOBITS 0002a0 00 00 WA 0 0 1 [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS 0002a0 13 00 E 0 0 1 [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS 0002b3 1e 00 E 0 0 1 [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS 0002d1 19 00 E 0 0 1 [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS 0002ea 6c 00 E 0 0 1 [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS 000356 13 00 E 0 0 1 [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS 000369 0004ab 00 E 0 0 1 [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS 000814 00035d 00 E 0 0 1 [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS 000b71 55 00 E 0 0 1 [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS 000bc6 14 00 E 0 0 1 [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS 000bda 11 00 E 0 0 1 [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS 000beb 00043d 00 E 0 0 1 [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS 001028 00 00 E 0 0 1 [17] .gnu.offload_lto_.opts PROGBITS 001028 a9 00 E 0 0 1 Don't you need another plugin to claim those offload IR sections? -- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On 13 Nov 2014, at 23:11, H.J. Lu wrote: > > Section Headers: > [Nr] Name TypeAddress OffSize > ES Flg Lk Inf Al > [ 0] NULL 00 > 00 00 0 0 0 > [ 1] .text PROGBITS 40 > 000204 00 AX 0 0 16 > [ 2] .rela.textRELA 001a60 > d8 18 I 29 1 8 > [ 3] .data PROGBITS 000260 > 40 00 WA 0 0 32 > [ 4] .bss NOBITS 0002a0 > 00 00 WA 0 0 1 > [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS > 0002a0 13 00 E 0 0 1 > [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS > 0002b3 1e 00 E 0 0 1 > [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS > 0002d1 19 00 E 0 0 1 > [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS > 0002ea 6c 00 E 0 0 1 > [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS > 000356 13 00 E 0 0 1 > [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS > 000369 0004ab 00 E 0 0 1 > [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS > 000814 00035d 00 E 0 0 1 > [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS > 000b71 55 00 E 0 0 1 > [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS > 000bc6 14 00 E 0 0 1 > [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS > 000bda 11 00 E 0 0 1 > [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS > 000beb 00043d 00 E 0 0 1 > [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS > 001028 00 00 E 0 0 1 > [17] .gnu.offload_lto_.opts PROGBITS 001028 > a9 00 E 0 0 1 > > Don't you need another plugin to claim those offload IR sections? No, the plan was that a regular plugin will just ignore offload IR sections by default. In your configuration ld detects a __gnu_lto_slim symbol and decided that the object file contains only LTO IR without asm. I am going to investigate where is the difference with my configuration and fix the bug. -- Ilya
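The marker Ilya refers to is easy to test for. The sketch below shows the heuristic he describes, not the actual ld/bfd code; has_slim_lto_marker is an invented name, while asymbol and bfd_asymbol_name are the documented BFD symbol-table interface.

#include <string.h>
#include <bfd.h>

/* A slim LTO object (built with -flto but without -ffat-lto-objects)
   carries a marker symbol named "__gnu_lto_slim".  A linker that finds
   the marker on a file no plugin has claimed assumes the file is
   IR-only and warns "plugin needed to handle lto object".  An object
   that still contains real host code, for example one whose only IR is
   in .gnu.offload_lto_* sections, should not carry the marker.  */
static int
has_slim_lto_marker (asymbol **symbols, long symbol_count)
{
  for (long i = 0; i < symbol_count; i++)
    if (strcmp (bfd_asymbol_name (symbols[i]), "__gnu_lto_slim") == 0)
      return 1;
  return 0;
}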
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 12:53 PM, Ilya Verbin wrote: > On 13 Nov 2014, at 23:11, H.J. Lu wrote: >> >> Section Headers: >> [Nr] Name TypeAddress OffSize >> ES Flg Lk Inf Al >> [ 0] NULL 00 >> 00 00 0 0 0 >> [ 1] .text PROGBITS 40 >> 000204 00 AX 0 0 16 >> [ 2] .rela.textRELA 001a60 >> d8 18 I 29 1 8 >> [ 3] .data PROGBITS 000260 >> 40 00 WA 0 0 32 >> [ 4] .bss NOBITS 0002a0 >> 00 00 WA 0 0 1 >> [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS >> 0002a0 13 00 E 0 0 1 >> [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS >> 0002b3 1e 00 E 0 0 1 >> [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS >> 0002d1 19 00 E 0 0 1 >> [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS >> 0002ea 6c 00 E 0 0 1 >> [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS >> 000356 13 00 E 0 0 1 >> [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS >> 000369 0004ab 00 E 0 0 1 >> [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS >> 000814 00035d 00 E 0 0 1 >> [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS >> 000b71 55 00 E 0 0 1 >> [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS >> 000bc6 14 00 E 0 0 1 >> [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS >> 000bda 11 00 E 0 0 1 >> [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS >> 000beb 00043d 00 E 0 0 1 >> [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS >> 001028 00 00 E 0 0 1 >> [17] .gnu.offload_lto_.opts PROGBITS 001028 >> a9 00 E 0 0 1 >> >> Don't you need another plugin to claim those offload IR sections? > > No, the plan was that a regular plugin will just ignore offload IR sections > by default. In your configuration ld detects a __gnu_lto_slim symbol and > decided that the object file contains only LTO IR without asm. I am going to > investigate where is the difference with my configuration and fix the bug. > You may need to install the current binutils to see it. -- H.J.
Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC
On Thu, Nov 13, 2014 at 11:53:53PM +0300, Ilya Verbin wrote: > > Don't you need another plugin to claim those offload IR sections? > > No, the plan was that a regular plugin will just ignore offload IR > sections by default. In your configuration ld detects a __gnu_lto_slim > symbol and decided that the object file contains only LTO IR without asm. > I am going to investigate where is the difference with my configuration > and fix the bug.
FYI, I'm getting
+WARNING: program timed out.
+FAIL: libgomp.c/examples-4/e.54.2.c execution test
on both x86_64-linux and i686-linux (normal --enable-checking=yes,rtl bootstrap, no offloading configure options). binutils-2.24, ld.bfd. Jakub
Ann: MELT plugin 1.1.3 for GCC 4.8 & 4.9
Dear All, It is my pleasure to announce the MELT plugin 1.1.3 for GCC 4.8 or 4.9. MELT is a high-level domain-specific language and plugin to customize GCC; see http://gcc-melt.org/ for details. It is free software, GPLv3+ licensed, FSF copyrighted. You can download the source tarball from http://gcc-melt.org/melt-plugin-1.1.3-for-gcc-4.8-or-4.9.tar.bz2 (a bzip2-ed tarball of 4124848 bytes, i.e. 4.0 Mbytes), extracted from the GCC MELT branch at svn rev. 217521 on November 13th, 2014. It brings several new features and significant bug fixes w.r.t. the previous MELT 1.1.2 (of August 31st, 2014).
NEWS for the 1.1.3 MELT plugin for GCC 4.8 & 4.9 [[November 13th, 2014]]
A bug-fix and feature release with significant improvements over the MELT plugin 1.1.2.
Bug fixes: better-working macros and improved documentation.
Language improvements: improved macro constructs. All common (i.e. non-language-specific) tree codes are handled, at least through an automatically generated cmatcher. All non-OMP gimple codes are handled. Better handling of variadic tree & gimple constructs. Added the gimple_call_args & gimple_call_more_args quasi-cmatchers, gimple_switch handling, and functions to build them from a MELT sequence (tuple or list) of constituents. Variadic and polytypic add2list.
End-user improvements: better eval mode. More informative error messages. Documentation is now generated in several HTML files.
Please report bugs and comments to gcc-m...@googlegroups.com
Regards. -- Basile STARYNKEVITCH http://starynkevitch.net/Basile/ email: basilestarynkevitchnet mobile: +33 6 8501 2359 8, rue de la Faiencerie, 92340 Bourg La Reine, France *** opinions {are only mine, sont seulement les miennes} ***
gcc-4.8-20141113 is now available
Snapshot gcc-4.8-20141113 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20141113/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 4.8 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch revision 217524 You'll find: gcc-4.8-20141113.tar.bz2 Complete GCC MD5=fe4685763a78ec1fabad73e76d1ce6dd SHA1=f6a13da7ba744503c3b6bf78ffccc0c998c4d328 Diffs from 4.8-20141106 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-4.8 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
On 11/13/2014 6:02 AM, Richard Biener wrote: On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson wrote: On Thu, 13 Nov 2014, David Wohlferd wrote: Sorry for the (very) delayed response. I'm still looking for feedback here so I can fix the docs. Thank you for your diligence. As I said before, triggering a full memory clobber for anything over 16 bytes (and most sizes under 16 bytes) makes this feature all but useless. So if that's really what's happening, we need to decide what to do next: 1) Can this be "fixed?" 2) Do we want to doc the current behavior? 3) Or do we just remove this section? I think it could be a nice performance win for inline asm if it could be made to work right, but I have no idea what might be involved in that. Failing that, I guess if it doesn't work and isn't going to work, I'd recommend removing the text for this feature. Since all 3 suggestions require a doc change, I'll just say that I'm prepared to start work on the doc patch as soon as someone lets me know what the plan is. Richard? Hans-Peter? Your thoughts? I've forgot if someone mentioned whether we have a test-case in our test-suite for this feature. I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's not easy since there is a lot of asm and the people who write these are apparently allergic to using comments to describe what they are testing. If we don't, then 3; removal. If we do, I guess it's flawed or at least not agreeing with the documentation? Then it *might* be worth the effort fixing that and additional test-coverage (depending on the person stepping up...) but 3 is IMHO still an arguably sane option. Well, as what is missing is just an optimization I'd say we should try to fix it. While I'd love to be the one to fix this, the fact of the matter is that most of gcc is a black box to me. Even if you told me roughly where to start, I'd have no idea of the downstream impacts of anything I changed. So while I understand that it looks like I'm just finding work for other people, fixing something like this is simply beyond me. That said, I'm certainly prepared to outline what I see as the interesting test cases and to do some testing if someone else is willing to step up and do this optimization. And surely the docs should not promise that optimization will happen - it should just mention that doing this might allow optimization to happen. I can agree with this. I am quite confident there will be occasions where gcc has no option but to fall back to doing a full clobber to ensure correct function (a possibility which the current docs also fails to mention). So yes, the docs should be limited in what it promises here. Which brings us to the question: what do we do now? The 15th is fast approaching. Can something like this get done before then? Can it be checked in for 5.0 after the 15th? Or does it need to wait for 6.0? If it does need to wait for 6.0, what do we want to do with the docs in the meantime? Given how wrong they are currently, I'd hate to ship yet another release with that ugly text. But trying to describe the best way to take advantage of optimizations that haven't been written yet is... hard. Since (as I understand it) 5.0 docs *can* be checked in after the 15th, my recommendations: - If someone is prepared to step up and do this work for v5.0, then I'll wait and write the docs when they are done and can describe how it works. 
- If this is going to wait for 6.0, then if someone does (at least) enough investigative work to be able to describe how this will eventually work, I'll update the 5.0 docs in a general way talking about ways gcc *may* be able to optimize. It should be possible to phrase this so code people write today will work even better tomorrow. - Worst case is if no one has the time to look at this for the foreseeable future. In that case, I'm with Hans-Peter. Let's take the existing text out. Following the existing text makes things *worse*, and the current implementation is so limited that I'd be surprised if anyone's code actually uses it successfully. New text can get added when the new code is. Hmm. I just had a thought: Is it possible the problem I'm seeing here is platform-specific? Maybe this works perfectly for non-i386 code? That would certainly change my recommendations here. dw
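For readers who have not seen it, the "memory constraint" trick under discussion looks roughly like the sketch below. The asm body and the 16-byte size are placeholders, and the function names are made up; whether GCC actually narrows the clobber this way, rather than falling back to a full memory clobber, is exactly the behaviour dw is questioning.

/* Instead of a blanket "memory" clobber, the asm names the exact bytes
   it writes through an "=m" output operand, so in principle only *buf
   has to be treated as clobbered.  */
static void
clear_block (char (*buf)[16])
{
  __asm__ volatile ("" /* real asm that writes the 16 bytes goes here */
                    : "=m" (*buf));
}

int
main (void)
{
  char buf[16];
  clear_block (&buf);
  return 0;
}

The alternative the docs try to steer people away from is the same asm with no memory outputs and a "memory" clobber, which forces every live memory value to be flushed before the statement and reloaded after it.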
Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work
>> I've forgot if someone mentioned whether we have a test-case in our test-suite for this feature.
> I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's not easy since there is a lot of asm and the people who write these are apparently allergic to using comments to describe what they are testing.
So, I found a few tests that were *using* this feature. But they seem to be checking for an ICE or page fault, rather than checking to see if the generated code was avoiding the memory clobber. dw
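A code-quality test of the kind dw is describing might look like the sketch below. None of it is an existing testcase; the function name is invented, and a real test would need a target-specific scan-assembler directive (deliberately omitted here) to count how often `unrelated' is loaded.

/* { dg-do compile } */
/* { dg-options "-O2" } */

int unrelated;

/* The asm names only *buf as its output.  If the optimization discussed
   in this thread worked, the value of `unrelated' read before the asm
   could be reused afterwards; with a full "memory" clobber it would
   have to be reloaded.  */
int
use_block (char (*buf)[16])
{
  int before = unrelated;
  __asm__ volatile ("" : "=m" (*buf));
  return before + unrelated;
}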