Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-13 Thread Richard Biener
On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm  wrote:
> On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:
>> On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  wrote:
>> > On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
>> >> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
>> >> > On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:
>> >> > > To be constructive here - the above case is from within a
>> >> > > GIMPLE_ASSIGN case label
>> >> > > and thus I'd have expected
>> >> > >
>> >> > > case GIMPLE_ASSIGN:
>> >> > >   {
>> >> > >     gassign *a1 = as_a <gassign *> (s1);
>> >> > >     gassign *a2 = as_a <gassign *> (s2);
>> >> > >     lhs1 = gimple_assign_lhs (a1);
>> >> > >     lhs2 = gimple_assign_lhs (a2);
>> >> > >     if (TREE_CODE (lhs1) != SSA_NAME
>> >> > >         && TREE_CODE (lhs2) != SSA_NAME)
>> >> > >       return (operand_equal_p (lhs1, lhs2, 0)
>> >> > >               && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
>> >> > >                                                gimple_assign_rhs1 (a2)));
>> >> > >     else if (TREE_CODE (lhs1) == SSA_NAME
>> >> > >              && TREE_CODE (lhs2) == SSA_NAME)
>> >> > >       return vn_valueize (lhs1) == vn_valueize (lhs2);
>> >> > >     return false;
>> >> > >   }
>> >> > >
>> >> > > instead.  That's the kind of changes I have expected and have 
>> >> > > approved of.
>> >> >
>> >> > But even that looks like just adding extra work for all developers, 
>> >> > with no
>> >> > gain.  You only have to add extra code and extra temporaries, in 
>> >> > switches
>> >> > typically also have to add {} because of the temporaries and thus extra
>> >> > indentation level, and it doesn't simplify anything in the code.
>> >>
>> >> The branch attempts to use the C++ typesystem to capture information
>> >> about the kinds of gimple statement we expect, both:
>> >>   (A) so that the compiler can detect type errors, and
>> >>   (B) as a comprehension aid to the human reader of the code
>> >>
>> >> The ideal here is when function params and struct field can be
>> >> strengthened from "gimple" to a subclass ptr.  This captures the
>> >> knowledge that every use of a function or within a struct has a given
>> >> gimple code.
>> >
>> > I just don't like all the as_a/is_a stuff enforced everywhere,
>> > it means more typing, more temporaries, more indentation.
>> > So, as I view it, instead of the checks being done cheaply (yes, I think
>> > the gimple checking as we have right now is very cheap) under the
>> > hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
>> > put the burden on the developers, who have to check that manually through
>> > the as_a/is_a stuff everywhere, more typing and uglier syntax.
>> > I just don't see that as a step forward, instead a huge step backwards.
>> > But perhaps I'm alone with this.
>> > Can you e.g. compare the size of - lines in your patchset combined, and
>> > size of + lines in your patchset?  As in, if your changes lead to less
>> > typing or more.
>>
>> I see two ways out here.  One is to add overloads to all the functions
>> taking the special types like
>>
>> tree
>> gimple_assign_rhs1 (gimple *);
>>
>> or simply add
>>
>> gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }
>>
>> into a gimple-compat.h header which you include in places that
>> are not converted "nicely".
>
> Thanks for the suggestions.
>
> Am I missing something, or is the gimple-compat.h idea above not valid
> C++?
>
> Note that "gimple" is still a typedef to
>   gimple_statement_base *
> (as noted before, the gimple -> gimple * change would break everyone
> else's patches, so we talked about that as a followup patch for early
> stage3).
>
> Given that, if I try to create an "operator ()" outside of a class, I
> get this error:
>
> ‘gassign* operator()(gimple)’ must be a nonstatic member function
>
> which is emitted from cp/decl.c's grok_op_properties:
>   /* An operator function must either be a non-static member function
>  or have at least one parameter of a class, a reference to a class,
>  an enumeration, or a reference to an enumeration.  13.4.0.6 */
>
> I tried making it a member function of gimple_statement_base, but that
> doesn't work either: we want a conversion
>   from a gimple_statement_base * to a gassign *, not
>   from a gimple_statement_base   to a gassign *.
>
> Is there some syntactic trick here that I'm missing?  Sorry if I'm being
> dumb (I can imagine there's a way of doing it by making "gimple" become
> some kind of wrapped ptr class, but that way lies madness, surely).

Hmm.

struct assign;
struct base {
  operator assign *() const { return (assign *)this; }
};
struct assign : base {
};

void foo (assign *);
void bar (base *b)
{
  foo (b);
}

doesn't work, but

void bar (base &b)
{
  foo (b);
}

does.  Indeed C++ doesn't seem to provide what is necessary
for the compat trick :(

So the gimple-compat.

Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-13 Thread Jonathan Wakely
On 13 November 2014 10:45, Richard Biener wrote:
>
> Hmm.
>
> struct assign;
> struct base {
>   operator assign *() const { return (assign *)this; }
> };
> struct assign : base {
> };
>
> void foo (assign *);
> void bar (base *b)
> {
>   foo (b);
> }
>
> doesn't work, but
>
> void bar (base &b)
> {
>   foo (b);
> }
>
> does.  Indeed C++ doesn't seem to provide what is necessary
> for the compat trick :(

Right, base* is a built-in type, you can't call a member function on it.

There is no implicit conversion between unrelated pointer types, and
no implicit conversion from base* to base& that would be necessary to
call the conversion operator of base.
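
To make the distinction concrete, here is a minimal sketch of the
reference-versus-pointer behaviour described above.  The names `get_rhs1`,
`via_reference` and `via_pointer`, and the `code`/`rhs1` fields, are
hypothetical stand-ins and not GCC code; the cast is only safe when the
object really is an `assign`.

```cpp
#include <cassert>

struct assign;

struct base {
    int code;  // hypothetical tag, standing in for a statement code
    // A conversion operator is a member function: it applies to base
    // objects (and references to them), never to a plain base*.
    operator assign *();
};

struct assign : base {
    int rhs1;  // hypothetical payload
};

// Defined once assign is complete, so a static_cast downcast is allowed.
// The caller must guarantee that *this really is an assign.
inline base::operator assign *()
{
    return static_cast<assign *>(this);
}

// Takes the derived pointer type, like an accessor strengthened to a subclass.
inline int get_rhs1(assign *a) { return a->rhs1; }

// Compiles: b is a base&, so the user-defined conversion is considered.
inline int via_reference(base &b) { return get_rhs1(b); }

// With a pointer the caller has to dereference first; get_rhs1(p) with a
// base *p is rejected because base* is a built-in type with no members.
inline int via_pointer(base *p) { return get_rhs1(*p); }
```

So the conversion operator only helps code that already passes statements
by reference, which is not the case for a `gimple` typedef'ed to a pointer.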


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread David Wohlferd
Sorry for the (very) delayed response.  I'm still looking for feedback 
here so I can fix the docs.


To refresh: The topic of conversation was the (extremely) wrong 
explanation that has been in the docs since forever about how to use 
memory constraints with inline asm to avoid the performance hit of a 
full memory clobber.  Trying to understand how this really works has led 
to some surprising results.


Me:
>> While I really like the idea of using memory constraints to avoid all out
>> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86
>> only supports a max of 8.  Unless there's some way to use larger sizes (say
>> SSIZE_MAX), this feature hardly seems worth documenting.

Richard:
> I wonder how you figured out that a 12 byte clobber performs a full
> memory clobber?

Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2, 
using -m64 -O2 -fdump-final-insns):



#include <stdio.h>

#define MYSIZE 3

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
  *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
  : "+D" (Dest), "+c" (Count), "=m" (*p)
  : [Data] "a" (Data)
  //: "memory"
   );
}

int main()
{
   unsigned char buff[100];
   buff[5] = 'A';

   __stosb(buff, 'B', sizeof(buff));
   printf("%c\n", buff[5]);
}


In summary:

   1) Create a 100 byte buffer, and set buff[5] to 'A'.
   2) Call __stosb, which uses inline asm to overwrite all of buff with 
'B'.
   3) Use a memory constraint in __stosb to flush buff.  The size of 
the memory constraint is controlled by a #define.


With this, I have a simple way to test various sizes of memory 
constraints to see if the buffer gets flushed.  If it *is* flushing the 
buffer, printing buff[5] after __stosb will print 'B'.  If it is *not* 
flushing, it will print 'A'.


Results:
   - Since buff[5] is the 6th byte in the buffer, using memory 
constraint sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing 
that no flush was done.
   - Sizes of 8 and 16 print 'B', showing that the flush was done. This 
is also the expected result, since I am now flushing enough of buff to 
include buff[5].
   - The surprise comes from using a size of 3 or 5.  These also print 
'B'.  WTF?  If 4 doesn't flush, why does 3?
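
For comparison, the conservative baseline that the memory-constraint trick
is meant to avoid is the same asm with the "memory" clobber enabled (the
line commented out in the test program).  A minimal sketch follows; the
names `stosb_with_clobber` and `probe_after_fill` are made up for this
illustration, and the `memset` fallback for non-x86 or non-GNU compilers is
an addition that is not in the original test.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Fill count bytes at dest with data.  The "memory" clobber tells the
// compiler that the asm may write any memory, so no value cached from
// before the asm can be reused afterwards.
inline void stosb_with_clobber(unsigned char *dest, unsigned char data,
                               std::size_t count)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __asm__ __volatile__("rep stosb"
                         : "+D"(dest), "+c"(count)  // updated by the insn
                         : "a"(data)                // byte to store (al)
                         : "memory");               // full memory clobber
#else
    std::memset(dest, data, count);  // portable stand-in (assumption)
#endif
}

// Same shape as the test above: store 'A', overwrite the buffer with 'B',
// then read the byte back.
inline unsigned char probe_after_fill()
{
    unsigned char buff[100];
    buff[5] = 'A';
    stosb_with_clobber(buff, 'B', sizeof(buff));
    return buff[5];
}
```

With the full clobber the read of buff[5] has to come from memory after the
asm, so the result does not depend on the size of any memory operand.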


I believe the answer comes from reading the RTL.  The difference between 
sizes of 3 and 16 comes here:


  (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
 (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16 A128])
 (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [

  (set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
 (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3 A128])
 (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [

While I don't actually speak RTL, TI clearly refers to TIMode. 
Apparently when using a size that exactly matches a machine mode, asm 
memory references (on i386) can flush the exact number of bytes.  But 
for other sizes, gcc seems to fall back to BLK mode, which doesn't.
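
The hypothesis above can be written down as a tiny predicate.  This is only
a sketch of the guess made in this message, not of GCC's actual mode
selection: `matches_integer_mode` is a hypothetical helper, and the list of
sizes is the set of integer mode widths (QI, HI, SI, DI, TI) that fit the
observations reported here for x86-64.

```cpp
#include <cassert>
#include <cstddef>

// True when a struct of this many bytes has a same-sized integer machine
// mode under the hypothesis above: QI=1, HI=2, SI=4, DI=8, TI=16.
// Any other size is assumed to end up in BLKmode.
constexpr bool matches_integer_mode(std::size_t bytes)
{
    return bytes == 1 || bytes == 2 || bytes == 4
        || bytes == 8 || bytes == 16;
}
```

Under that assumption sizes 3, 5 and 12 are the BLK cases, which matches
the sizes reported as behaving unexpectedly.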


I don't know the exact meaning of BLK on a "set" or "asm_operands." Does 
it cause a full clobber?  Or just a complete clobber of buff? Attempting 
to answer that question leads us to the second bit of code:



#include <stdio.h>

#define MYSIZE 8

inline void
__stosb(unsigned char *Dest, unsigned char Data, size_t Count)
{
   struct _reallybigstruct { char x[MYSIZE]; }
  *p = (struct _reallybigstruct *)Dest;

   __asm__ __volatile__ ("rep stos{b|b}"
  : "+D" (Dest), "+c" (Count), "=m" (*p)
  : [Data] "a" (Data)
  //: "memory"
   );
}
int main()
{
   unsigned char buff[100], buff2[100];
   buff[5] = 'A';
   buff2[5] = 'M';
   asm("#" : : "r" (buff2));

   __stosb(buff, 'B', sizeof(buff));
   printf("%c %c\n", buff[5], buff2[5]);
}


Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which 
I also print.  I still perform the memory constraint against buff, then 
I check to see if it is affecting buff2.


I start by compiling this with a size of 8 and look at the -S output.  
If this is NOT doing a full clobber, gcc should be able to just print 
buff2[5] by moving 77 into the appropriate register before calling 
printf.  And indeed, that's what we see.


/APP
 # 17 "mem2.cpp" 1
rep stosb
 # 0 "" 2
/NO_APP
movzbl  37(%rsp), %edx
movl    $77, %r8d
leaq    .LC0(%rip), %rcx
call    printf

If using a size of 3 *is* causing a full memory clobber, we would expect 
to see the value getting read from memory before calling printf.  And 
indeed, that's also what we see.


/APP
 # 17 "mem2.cpp" 1
rep stosb
 # 0 "" 2
/NO_APP
movzbl  37(%rsp), %edx
leaq    .LC0(%rip), %rcx
movzbl  149(%rsp), %r8d

I don't know the internals of gcc well enough to understand exactly why 
this is happening.  But from a user's poin

Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread Richard Biener
On Thu, Nov 13, 2014 at 1:03 PM, David Wohlferd  wrote:
> Sorry for the (very) delayed response.  I'm still looking for feedback here
> so I can fix the docs.
>
> To refresh: The topic of conversation was the (extremely) wrong explanation
> that has been in the docs since forever about how to use memory constraints
> with inline asm to avoid the performance hit of a full memory clobber.
> Trying to understand how this really works has led to some surprising
> results.
>
> Me:
>>> While I really like the idea of using memory constraints to avoid all out
>>> memory clobbers, 16 bytes is a pretty small maximum memory block, and x86
>>> only supports a max of 8.  Unless there's some way to use larger sizes
>>> (say
>>> SSIZE_MAX), this feature hardly seems worth documenting.
>
> Richard:
>> I wonder how you figured out that a 12 byte clobber performs a full
>> memory clobber?
>
> Here's the code (compiled with gcc version 4.9.0 x86_64-win32-seh-rev2,
> using -m64 -O2 -fdump-final-insns):
>
> 
> #include <stdio.h>
>
> #define MYSIZE 3
>
> inline void
> __stosb(unsigned char *Dest, unsigned char Data, size_t Count)
> {
>struct _reallybigstruct { char x[MYSIZE]; }
>   *p = (struct _reallybigstruct *)Dest;
>
>__asm__ __volatile__ ("rep stos{b|b}"
>   : "+D" (Dest), "+c" (Count), "=m" (*p)
>   : [Data] "a" (Data)
>   //: "memory"
>);
> }
>
> int main()
> {
>unsigned char buff[100];
>buff[5] = 'A';
>
>__stosb(buff, 'B', sizeof(buff));
>printf("%c\n", buff[5]);
> }
> 
>
> In summary:
>
>1) Create a 100 byte buffer, and set buff[5] to 'A'.
>2) Call __stosb, which uses inline asm to overwrite all of buff with 'B'.
>3) Use a memory constraint in __stosb to flush buff.  The size of the
> memory constraint is controlled by a #define.
>
> With this, I have a simple way to test various sizes of memory constraints
> to see if the buffer gets flushed.  If it *is* flushing the buffer, printing
> buff[5] after __stosb will print 'B'.  If it is *not* flushing, it will
> print 'A'.
>
> Results:
>- Since buff[5] is the 6th byte in the buffer, using memory constraint
> sizes of 1, 2 & 4 (not surprisingly) all print 'A', showing that no flush
> was done.
>- Sizes of 8 and 16 print 'B', showing that the flush was done. This is
> also the expected result, since I am now flushing enough of buff to include
> buff[5].
>- The surprise comes from using a size of 3 or 5.  These also print 'B'.
> WTF?  If 4 doesn't flush, why does 3?
>
> I believe the answer comes from reading the RTL.  The difference between
> sizes of 3 and 16 comes here:
>
>   (set (mem/c:TI (plus:DI (reg/f:DI 7 sp)
>  (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S16
> A128])
>  (asm_operands/v:TI ("rep stos{b|b}") ("=m") 2 [
>
>(set (mem/c:BLK (plus:DI (reg/f:DI 7 sp)
>  (const_int 32 [0x20])) [ MEM[(struct _reallybigstruct *)&buff]+0 S3
> A128])
>  (asm_operands/v:BLK ("rep stos{b|b}") ("=m") 2 [
>
> While I don't actually speak RTL, TI clearly refers to TIMode. Apparently
> when using a size that exactly matches a machine mode, asm memory references
> (on i386) can flush the exact number of bytes.  But for other sizes, gcc
> seems to fall back to BLK mode, which doesn't.
>
> I don't know the exact meaning of BLK on a "set" or "asm_operands." Does it
> cause a full clobber?  Or just a complete clobber of buff? Attempting to
> answer that question leads us to the second bit of code:
>
> 
> #include <stdio.h>
>
> #define MYSIZE 8
>
> inline void
> __stosb(unsigned char *Dest, unsigned char Data, size_t Count)
> {
>struct _reallybigstruct { char x[MYSIZE]; }
>   *p = (struct _reallybigstruct *)Dest;
>
>__asm__ __volatile__ ("rep stos{b|b}"
>   : "+D" (Dest), "+c" (Count), "=m" (*p)
>   : [Data] "a" (Data)
>   //: "memory"
>);
> }
> int main()
> {
>unsigned char buff[100], buff2[100];
>buff[5] = 'A';
>buff2[5] = 'M';
>asm("#" : : "r" (buff2));
>
>__stosb(buff, 'B', sizeof(buff));
>printf("%c %c\n", buff[5], buff2[5]);
> }
> 
>
> Here I've added a buff2, and I set buff2[5] to 'M' (aka ascii 77), which I
> also print.  I still perform the memory constraint against buff, then I
> check to see if it is affecting buff2.
>
> I start by compiling this with a size of 8 and look at the -S output.  If
> this is NOT doing a full clobber, gcc should be able to just print buff2[5]
> by moving 77 into the appropriate register before calling printf.  And
> indeed, that's what we see.
>
> /APP
>  # 17 "mem2.cpp" 1
> rep stosb
>  # 0 "" 2
> /NO_APP
> movzbl  37(%rsp), %edx
> movl    $77, %r8d
> leaq    .LC0(%rip), %rcx
> call    printf
>
> If using a size of 3 *is* causing a full memory clobber, we would expect to
> see the value getting read from memory before calling printf.  And indeed,
> that's also what we see.
>
> /APP

Re: testing policy for C/C++ front end changes

2014-11-13 Thread Fabien Chêne
2014-11-11 10:05 GMT+01:00 Richard Biener :
[...]
> I think you need to retain the fact that one needs to bootstrap, not just
> build GCC.  Thus "If your change is to code that is not in a front
> end, or is to the C or C++ front ends or libgcc or libstdc++
> libraries, you must perform a bootstrap of GCC with all languages enabled
> by default, on at least one primary target, and run all testsuites."
>
> Ok with that change.

Perhaps it would make sense to mention the existence of the compile
farm, and add a link to it.
Otherwise, such requirements (which are obvious) could clearly
discourage contributors who do not have access to a powerful machine.

-- 
Fabien


Re: [RFC] UBSan unsafely uses VRP

2014-11-13 Thread Yury Gribov

On 11/12/2014 04:26 PM, Jakub Jelinek wrote:

On Wed, Nov 12, 2014 at 12:58:37PM +0300, Yury Gribov wrote:

On 11/12/2014 11:45 AM, Marek Polacek wrote:

On Wed, Nov 12, 2014 at 11:42:39AM +0300, Yury Gribov wrote:

On 11/11/2014 05:15 PM, Jakub Jelinek wrote:

There is also some unsafe code in the functions
ubsan_expand_si_overflow_addsub_check and ubsan_expand_si_overflow_mul_check,
which use get_range_info to reduce the number of checks. As seen before, using
VRP for sanitizers may decrease the quality of error detection.


Using VRP is completely intentional there, we don't want to generate too
slow code if you decide you want to optimize your code (for -O0 VRP isn't
performed of course).


On the other hand, detection quality is probably more important than
performance, regardless of optimization level. When I use a checker, I don't
want it to miss bugs due to overly aggressive optimization.


Yes, but as said above, VRP is only run with >= -O2 and -Os.


Hm, I must be missing something.  99% of users will only run their code
under -O2 because it'll be too slow otherwise.  Why should we penalize them
for this by lowering analysis quality?  Isn't error detection the main goal
of sanitizers (performance being the secondary at best)?


But if -O0 isn't too slow for them, having unnecessary bloat even at -O2
is just as bad.  By not using VRP at all, you are giving up all the cases
where you know something won't overflow because you e.g. sign extend
or zero extend from some smaller type, sum up such values, AND something
with a constant, or where you can use cheaper code to multiply etc.
Turning off -faggressive-loop-optimizations is certainly the right thing for
-fsanitize=undefined (any undefined I'd say), so are perhaps selected other
optimizations.

Jakub
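
As an illustration of the sign-extension case mentioned above, here is a
minimal sketch of an addition whose operands have a provably narrow range.
The function name `add_extended` is made up for this example; whether a
given GCC version actually drops the overflow check here is not something
this sketch demonstrates.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

static_assert(sizeof(int) * CHAR_BIT >= 32, "sketch assumes a 32-bit int");

// Both operands are sign-extended from 16 bits, so each lies in
// [-32768, 32767] and the sum lies in [-65536, 65534].  That range fits
// in a 32-bit int, so the signed addition cannot overflow, and range
// information is what lets a compiler prove it.
inline int add_extended(std::int16_t a, std::int16_t b)
{
    int wide_a = a;  // sign extension
    int wide_b = b;  // sign extension
    return wide_a + wide_b;
}
```

An instrumented build that ignores range information would still have to
emit an overflow check for this addition, even though it can never fire.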





Re: testing policy for C/C++ front end changes

2014-11-13 Thread Markus Hitter
Am 13.11.2014 um 14:08 schrieb Fabien Chêne:
> Perhaps it would make sense to mention the existence of the compile
> farm, and add a link to it.

Good idea. Bonus points for adding a script which executes all the required 
steps.


Markus

-- 
- - - - - - - - - - - - - - - - - - -
Dipl. Ing. (FH) Markus Hitter
http://www.jump-ing.de/


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread Hans-Peter Nilsson
On Thu, 13 Nov 2014, David Wohlferd wrote:
> Sorry for the (very) delayed response.  I'm still looking for feedback here so
> I can fix the docs.

Thank you for your diligence.

> As I said before, triggering a full memory clobber for anything over 16 bytes
> (and most sizes under 16 bytes) makes this feature all but useless.  So if
> that's really what's happening, we need to decide what to do next:
>
> 1) Can this be "fixed?"
> 2) Do we want to doc the current behavior?
> 3) Or do we just remove this section?
>
> I think it could be a nice performance win for inline asm if it could be made
> to work right, but I have no idea what might be involved in that.  Failing
> that, I guess if it doesn't work and isn't going to work, I'd recommend
> removing the text for this feature.
>
> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
> to start work on the doc patch as soon as someone lets me know what the plan
> is.
>
> Richard?  Hans-Peter?  Your thoughts?

I've forgotten if someone mentioned whether we have a test-case in
our test-suite for this feature.  If we don't, then 3; removal.
If we do, I guess it's flawed or at least not agreeing with the
documentation?  Then it *might* be worth the effort fixing that
and additional test-coverage (depending on the person stepping
up...) but 3 is IMHO still an arguably sane option.

brgds, H-P


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread Richard Biener
On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson  wrote:
> On Thu, 13 Nov 2014, David Wohlferd wrote:
>> Sorry for the (very) delayed response.  I'm still looking for feedback here 
>> so
>> I can fix the docs.
>
> Thank you for your diligence.
>
>> As I said before, triggering a full memory clobber for anything over 16 bytes
>> (and most sizes under 16 bytes) makes this feature all but useless.  So if
>> that's really what's happening, we need to decide what to do next:
>>
>> 1) Can this be "fixed?"
>> 2) Do we want to doc the current behavior?
>> 3) Or do we just remove this section?
>>
>> I think it could be a nice performance win for inline asm if it could be made
>> to work right, but I have no idea what might be involved in that.  Failing
>> that, I guess if it doesn't work and isn't going to work, I'd recommend
>> removing the text for this feature.
>>
>> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
>> to start work on the doc patch as soon as someone lets me know what the plan
>> is.
>>
>> Richard?  Hans-Peter?  Your thoughts?
>
> I've forgotten if someone mentioned whether we have a test-case in
> our test-suite for this feature.  If we don't, then 3; removal.
> If we do, I guess it's flawed or at least not agreeing with the
> documentation?  Then it *might* be worth the effort fixing that
> and additional test-coverage (depending on the person stepping
> up...) but 3 is IMHO still an arguably sane option.

Well, as what is missing is just an optimization I'd say we should
try to fix it.  And surely the docs should not promise that optimization
will happen - it should just mention that doing this might allow
optimization to happen.

Richard.

> brgds, H-P


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Kirill Yukhin
Hello,

Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to main
trunk.

Thanks everybody who helped w/ development and review.

--
Thanks, K


Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-13 Thread Andrew MacLeod

On 11/13/2014 05:45 AM, Richard Biener wrote:

On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm  wrote:

On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:

On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  wrote:

On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:

On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:

On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:

To be constructive here - the above case is from within a
GIMPLE_ASSIGN case label
and thus I'd have expected

 case GIMPLE_ASSIGN:
   {
     gassign *a1 = as_a <gassign *> (s1);
     gassign *a2 = as_a <gassign *> (s2);
     lhs1 = gimple_assign_lhs (a1);
     lhs2 = gimple_assign_lhs (a2);
     if (TREE_CODE (lhs1) != SSA_NAME
         && TREE_CODE (lhs2) != SSA_NAME)
       return (operand_equal_p (lhs1, lhs2, 0)
               && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
                                                gimple_assign_rhs1 (a2)));
     else if (TREE_CODE (lhs1) == SSA_NAME
              && TREE_CODE (lhs2) == SSA_NAME)
       return vn_valueize (lhs1) == vn_valueize (lhs2);
     return false;
   }

instead.  That's the kind of changes I have expected and have approved of.

But even that looks like just adding extra work for all developers, with no
gain.  You only have to add extra code and extra temporaries, in switches
typically also have to add {} because of the temporaries and thus extra
indentation level, and it doesn't simplify anything in the code.

The branch attempts to use the C++ typesystem to capture information
about the kinds of gimple statement we expect, both:
   (A) so that the compiler can detect type errors, and
   (B) as a comprehension aid to the human reader of the code

The ideal here is when function params and struct field can be
strengthened from "gimple" to a subclass ptr.  This captures the
knowledge that every use of a function or within a struct has a given
gimple code.

I just don't like all the as_a/is_a stuff enforced everywhere,
it means more typing, more temporaries, more indentation.
So, as I view it, instead of the checks being done cheaply (yes, I think
the gimple checking as we have right now is very cheap) under the
hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
put the burden on the developers, who have to check that manually through
the as_a/is_a stuff everywhere, more typing and uglier syntax.
I just don't see that as a step forward, instead a huge step backwards.
But perhaps I'm alone with this.
Can you e.g. compare the size of - lines in your patchset combined, and
size of + lines in your patchset?  As in, if your changes lead to less
typing or more.

I see two ways out here.  One is to add overloads to all the functions
taking the special types like

tree
gimple_assign_rhs1 (gimple *);

or simply add

gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }

into a gimple-compat.h header which you include in places that
are not converted "nicely".

Thanks for the suggestions.

Am I missing something, or is the gimple-compat.h idea above not valid
C++?

Note that "gimple" is still a typedef to
   gimple_statement_base *
(as noted before, the gimple -> gimple * change would break everyone
else's patches, so we talked about that as a followup patch for early
stage3).

Given that, if I try to create an "operator ()" outside of a class, I
get this error:

‘gassign* operator()(gimple)’ must be a nonstatic member function

which is emitted from cp/decl.c's grok_op_properties:
   /* An operator function must either be a non-static member function
  or have at least one parameter of a class, a reference to a class,
  an enumeration, or a reference to an enumeration.  13.4.0.6 */

I tried making it a member function of gimple_statement_base, but that
doesn't work either: we want a conversion
   from a gimple_statement_base * to a gassign *, not
   from a gimple_statement_base   to a gassign *.

Is there some syntactic trick here that I'm missing?  Sorry if I'm being
dumb (I can imagine there's a way of doing it by making "gimple" become
some kind of wrapped ptr class, but that way lies madness, surely).

Hmm.

struct assign;
struct base {
   operator assign *() const { return (assign *)this; }
};
struct assign : base {
};

void foo (assign *);
void bar (base *b)
{
   foo (b);
}

doesn't work, but

void bar (base &b)
{
   foo (b);
}

does.  Indeed C++ doesn't seem to provide what is necessary
for the compat trick :(

So the gimple-compat.h header would need to provide
additional overloads for the affected functions like

inline tree
gimple_assign_rhs1 (gimple *g)
{
   return gimple_assign_rhs1 (as_a <gassign *> (g));
}

that would work for me as well.
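
A minimal sketch of such a compat overload, reduced to stand-in types so
that it is self-contained.  Everything here (`stmt_base`, `stmt_assign`,
`as_assign`, `assign_rhs1`, the `stmt_code` enum) is hypothetical and is
not the GCC API; `as_assign` only mimics the idea of a checked downcast by
asserting on the statement code before casting.

```cpp
#include <cassert>

enum stmt_code { CODE_ASSIGN, CODE_CALL };  // hypothetical codes

struct stmt_base {
    stmt_code code;
};

struct stmt_assign : stmt_base {
    int rhs1;  // hypothetical operand
};

// Checked downcast: verify the code, then cast.
inline stmt_assign *as_assign(stmt_base *s)
{
    assert(s->code == CODE_ASSIGN);
    return static_cast<stmt_assign *>(s);
}

// Strongly typed accessor: only accepts the subclass pointer.
inline int assign_rhs1(stmt_assign *a) { return a->rhs1; }

// Compat overload for unconverted callers that still hold a base pointer.
// The inner call passes a stmt_assign*, so overload resolution picks the
// typed accessor above and there is no recursion.
inline int assign_rhs1(stmt_base *s) { return assign_rhs1(as_assign(s)); }
```

Converted code pays nothing extra, while unconverted code keeps compiling
and still gets a run-time check at the point of the cast.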


Won't that defeat the desire for checking, though?  If you don't do a
dyn_cast<> in gimple_assign_rhs1 (gimple *g), anyone can call it and
upcast any kind of gimple into a gassign without checking that it is
really a gassign...   Actually, a gcc_asser

Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-13 Thread Richard Biener
On Thu, Nov 13, 2014 at 3:24 PM, Andrew MacLeod  wrote:
> On 11/13/2014 05:45 AM, Richard Biener wrote:
>>
>> On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm 
>> wrote:
>>>
>>> On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:

 On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  wrote:
>
> On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:
>>
>> On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:
>>>
>>> On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:

 To be constructive here - the above case is from within a
 GIMPLE_ASSIGN case label
 and thus I'd have expected

  case GIMPLE_ASSIGN:
    {
      gassign *a1 = as_a <gassign *> (s1);
      gassign *a2 = as_a <gassign *> (s2);
      lhs1 = gimple_assign_lhs (a1);
      lhs2 = gimple_assign_lhs (a2);
      if (TREE_CODE (lhs1) != SSA_NAME
          && TREE_CODE (lhs2) != SSA_NAME)
        return (operand_equal_p (lhs1, lhs2, 0)
                && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
                                                 gimple_assign_rhs1 (a2)));
      else if (TREE_CODE (lhs1) == SSA_NAME
               && TREE_CODE (lhs2) == SSA_NAME)
        return vn_valueize (lhs1) == vn_valueize (lhs2);
      return false;
    }

 instead.  That's the kind of changes I have expected and have
 approved of.
>>>
>>> But even that looks like just adding extra work for all developers,
>>> with no
>>> gain.  You only have to add extra code and extra temporaries, in
>>> switches
>>> typically also have to add {} because of the temporaries and thus
>>> extra
>>> indentation level, and it doesn't simplify anything in the code.
>>
>> The branch attempts to use the C++ typesystem to capture information
>> about the kinds of gimple statement we expect, both:
>>(A) so that the compiler can detect type errors, and
>>(B) as a comprehension aid to the human reader of the code
>>
>> The ideal here is when function params and struct field can be
>> strengthened from "gimple" to a subclass ptr.  This captures the
>> knowledge that every use of a function or within a struct has a given
>> gimple code.
>
> I just don't like all the as_a/is_a stuff enforced everywhere,
> it means more typing, more temporaries, more indentation.
> So, as I view it, instead of the checks being done cheaply (yes, I
> think
> the gimple checking as we have right now is very cheap) under the
> hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
> put the burden on the developers, who have to check that manually
> through
> the as_a/is_a stuff everywhere, more typing and uglier syntax.
> I just don't see that as a step forward, instead a huge step backwards.
> But perhaps I'm alone with this.
> Can you e.g. compare the size of - lines in your patchset combined, and
> size of + lines in your patchset?  As in, if your changes lead to less
> typing or more.

 I see two ways out here.  One is to add overloads to all the functions
 taking the special types like

 tree
 gimple_assign_rhs1 (gimple *);

 or simply add

 gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }

 into a gimple-compat.h header which you include in places that
 are not converted "nicely".
>>>
>>> Thanks for the suggestions.
>>>
>>> Am I missing something, or is the gimple-compat.h idea above not valid
>>> C++?
>>>
>>> Note that "gimple" is still a typedef to
>>>gimple_statement_base *
>>> (as noted before, the gimple -> gimple * change would break everyone
>>> else's patches, so we talked about that as a followup patch for early
>>> stage3).
>>>
>>> Given that, if I try to create an "operator ()" outside of a class, I
>>> get this error:
>>>
>>> ‘gassign* operator()(gimple)’ must be a nonstatic member function
>>>
>>> which is emitted from cp/decl.c's grok_op_properties:
>>>/* An operator function must either be a non-static member
>>> function
>>>   or have at least one parameter of a class, a reference to a
>>> class,
>>>   an enumeration, or a reference to an enumeration.  13.4.0.6 */
>>>
>>> I tried making it a member function of gimple_statement_base, but that
>>> doesn't work either: we want a conversion
>>>from a gimple_statement_base * to a gassign *, not
>>>from a gimple_statement_base   to a gassign *.
>>>
>>> Is there some syntactic trick here that I'm missing?  Sorry if I'm being
>>> dumb (I can imagine there's a way of doing it by making "gimple" become
>>> some kind of wrapped ptr class, but that way lies madness, surely).
>>
>> Hmm.
>>
>> struct assign;
>> struct base {

What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread H.J. Lu
x86-64 psABI has

name@GOT: specifies the offset to the GOT entry for the symbol name
from the base of the GOT.

name@GOTPLT: specifies the offset to the GOT entry for the symbol name
from the base of the GOT, implying that there is a corresponding PLT entry.

But GCC never generates name@GOTPLT and assembler fails to assemble
it:

[hjl@gnu-6 pr17598]$ cat x.S
movabs $foo@GOTPLT,%rax
[hjl@gnu-6 pr17598]$ gcc -c x.S
x.S: Assembler messages:
x.S:1: Error: relocated field and relocation type differ in signedness
[hjl@gnu-6 pr17598]$

It certainly isn't needed on data symbols.  I couldn't find any possible
usage for this relocation on function symbols.

Does anyone remember what it was supposed to be used for?

-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Tobias Burnus
Kirill Yukhin wrote:
> Support of OpenMP 4.0 offloading to future Xeon Phi was
> fully checked in to main trunk.

Thanks. If I understood it correctly:

* GCC 5 supports code generation for Xeon Phi (Knights Landing, KNL)
* KNL (the hardware) is not yet available [mid 2015?]
* liboffloadmic supports offloading in an emulation mode (executed on
  the host) but does not (yet) support offloading to KNL; i.e. one
  would need an updated version of it, once one gets hold of the
  actual hardware.
* The current hardware (Xeon Phi Knights Corner, KNC) is and will not
  be supported by GCC.

* Details for building GCC for offloading and running code on an
accelerator is at https://gcc.gnu.org/wiki/Offloading

Question: Is the latter up to date - and the item above correct?
BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html

Otherwise:
* OpenACC support is about to be merged (as alternative to OpenMP 4)
* Support for offloading to NVidia GPUs via PTX is also about to be merged.

Tobias


Re: [gimple-classes, committed 4/6] tree-ssa-tail-merge.c: Use gassign

2014-11-13 Thread Andrew MacLeod

On 11/13/2014 09:34 AM, Richard Biener wrote:

On Thu, Nov 13, 2014 at 3:24 PM, Andrew MacLeod  wrote:

On 11/13/2014 05:45 AM, Richard Biener wrote:

On Thu, Nov 13, 2014 at 2:41 AM, David Malcolm 
wrote:

On Tue, 2014-11-11 at 11:43 +0100, Richard Biener wrote:

On Tue, Nov 11, 2014 at 8:26 AM, Jakub Jelinek  wrote:

On Mon, Nov 10, 2014 at 05:27:50PM -0500, David Malcolm wrote:

On Sat, 2014-11-08 at 14:56 +0100, Jakub Jelinek wrote:

On Sat, Nov 08, 2014 at 01:07:28PM +0100, Richard Biener wrote:

To be constructive here - the above case is from within a
GIMPLE_ASSIGN case label
and thus I'd have expected

  case GIMPLE_ASSIGN:
    {
      gassign *a1 = as_a <gassign *> (s1);
      gassign *a2 = as_a <gassign *> (s2);
      lhs1 = gimple_assign_lhs (a1);
      lhs2 = gimple_assign_lhs (a2);
      if (TREE_CODE (lhs1) != SSA_NAME
          && TREE_CODE (lhs2) != SSA_NAME)
        return (operand_equal_p (lhs1, lhs2, 0)
                && gimple_operand_equal_value_p (gimple_assign_rhs1 (a1),
                                                 gimple_assign_rhs1 (a2)));
      else if (TREE_CODE (lhs1) == SSA_NAME
               && TREE_CODE (lhs2) == SSA_NAME)
        return vn_valueize (lhs1) == vn_valueize (lhs2);
      return false;
    }

instead.  That's the kind of changes I have expected and have
approved of.

But even that looks like just adding extra work for all developers, with no
gain.  You only have to add extra code and extra temporaries, in switches
typically also have to add {} because of the temporaries and thus extra
indentation level, and it doesn't simplify anything in the code.

The branch attempts to use the C++ typesystem to capture information
about the kinds of gimple statement we expect, both:
(A) so that the compiler can detect type errors, and
(B) as a comprehension aid to the human reader of the code

The ideal here is when function params and struct field can be
strengthened from "gimple" to a subclass ptr.  This captures the
knowledge that every use of a function or within a struct has a given
gimple code.
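
For readers following the argument, here is a minimal sketch of what
strengthening a parameter from a base pointer to a subclass pointer buys.
The names stmt_base, assign_stmt, as_assign, dyn_assign and assign_lhs are
hypothetical stand-ins, not GCC's real declarations, and the sketch assumes
C++11.  It only mimics the general shape of as_a/dyn_cast.

```cpp
#include <cassert>

// Hypothetical statement codes; a stand-in for the real gimple codes.
enum stmt_code { CODE_ASSIGN, CODE_CALL };

// Hypothetical base class carrying only the discriminating code.
struct stmt_base {
  stmt_code code;
  explicit stmt_base(stmt_code c) : code(c) {}
};

// Hypothetical subclass for assignments.
struct assign_stmt : stmt_base {
  int lhs;
  explicit assign_stmt(int l) : stmt_base(CODE_ASSIGN), lhs(l) {}
};

// Checked downcast in the spirit of as_a: assert on the code, then cast.
inline assign_stmt *as_assign(stmt_base *s) {
  assert(s->code == CODE_ASSIGN);
  return static_cast<assign_stmt *>(s);
}

// Query-style downcast in the spirit of dyn_cast: null when the code differs.
inline assign_stmt *dyn_assign(stmt_base *s) {
  return s->code == CODE_ASSIGN ? static_cast<assign_stmt *>(s) : nullptr;
}

// Strengthened accessor: it accepts only an assign_stmt pointer, so handing
// it a plain stmt_base pointer is rejected at compile time.
inline int assign_lhs(const assign_stmt *a) { return a->lhs; }
```

With these definitions, assign_lhs (as_assign (s)) compiles for a stmt_base
pointer s, while assign_lhs (s) does not.  That is the compile-time checking
point (A) refers to, and the extra cast is the typing cost raised in the
replies.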

I just don't like all the as_a/is_a stuff enforced everywhere,
it means more typing, more temporaries, more indentation.
So, as I view it, instead of the checks being done cheaply (yes, I think
the gimple checking as we have right now is very cheap) under the
hood by the accessors (gimple_assign_{lhs,rhs1} etc.), those changes
put the burden on the developers, who have to check that manually through
the as_a/is_a stuff everywhere, more typing and uglier syntax.
I just don't see that as a step forward, instead a huge step backwards.
But perhaps I'm alone with this.
Can you e.g. compare the size of - lines in your patchset combined, and
size of + lines in your patchset?  As in, if your changes lead to less
typing or more.

I see two ways out here.  One is to add overloads to all the functions
taking the special types like

tree
gimple_assign_rhs1 (gimple *);

or simply add

gassign *operator ()(gimple *g) { return as_a <gassign *> (g); }

into a gimple-compat.h header which you include in places that
are not converted "nicely".

Thanks for the suggestions.

Am I missing something, or is the gimple-compat.h idea above not valid
C++?

Note that "gimple" is still a typedef to
gimple_statement_base *
(as noted before, the gimple -> gimple * change would break everyone
else's patches, so we talked about that as a followup patch for early
stage3).

Given that, if I try to create an "operator ()" outside of a class, I
get this error:

‘gassign* operator()(gimple)’ must be a nonstatic member function

which is emitted from cp/decl.c's grok_op_properties:
/* An operator function must either be a non-static member function
   or have at least one parameter of a class, a reference to a class,
   an enumeration, or a reference to an enumeration.  13.4.0.6 */

I tried making it a member function of gimple_statement_base, but that
doesn't work either: we want a conversion
from a gimple_statement_base * to a gassign *, not
from a gimple_statement_base   to a gassign *.

Is there some syntactic trick here that I'm missing?  Sorry if I'm being
dumb (I can imagine there's a way of doing it by making "gimple" become
some kind of wrapped ptr class, but that way lies madness, surely).

Hmm.

struct assign;
struct base {
  operator assign *() const { return (assign *)this; }
};
struct assign : base {
};

void foo (assign *);
void bar (base *b)
{
  foo (b);
}

doesn't work, but

void bar (base &b)
{
  foo (b);
}

does.  Indeed C++ doesn't seem to provide what is necessary
for the compat trick :(

So the gimple-compat.h header would need to provide
additional overloads for the affected functions like

inline tree
gimple_assign_rhs1 (gimple *g)
{
  return gimple_assign_rhs1 (as_a <gassign *> (g));
}

that would work for me as well.
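
A self-contained sketch of both observations above, assuming a toy hierarchy:
the conversion operator is a class member, so it is considered for a base
object or reference but never for a base pointer, while a compat overload
taking the base pointer does work.  The names base, assign and get_rhs1, and
the convention that code 1 means "assignment", are hypothetical and only
illustrate the idea; this is not the contents of any actual gimple-compat.h.

```cpp
#include <cassert>

struct assign;

// Hypothetical base class; code == 1 marks an assignment in this sketch.
struct base {
  int code;
  explicit base(int c) : code(c) {}
  // Member conversion: applies to `base` objects and references only.
  operator assign *();
};

struct assign : base {
  int rhs1;
  explicit assign(int r) : base(1), rhs1(r) {}
};

inline base::operator assign *() {
  assert(code == 1);
  return static_cast<assign *>(this);
}

// Strongly typed accessor, as on the converted code paths.
inline int get_rhs1(const assign *a) { return a->rhs1; }

// Compat overload for unconverted callers: check at run time, cast, forward.
// The inner call binds to the `const assign *` overload because a
// qualification conversion ranks above a derived-to-base pointer conversion.
inline int get_rhs1(base *b) {
  assert(b->code == 1);
  return get_rhs1(static_cast<assign *>(b));
}
```

In this sketch the compat overload keeps a run-time assertion, but a call
site that uses it no longer gets any compile-time checking of the statement
kind.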



Won't that defeat the desire for checking tho?  If you dont do a dyn_cast<>
in gimple_assign_rhs1 (gimple *g)  anyone

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Jakub Jelinek
On Thu, Nov 13, 2014 at 04:15:48PM +0100, Tobias Burnus wrote:
> Question: Is the latter up to date - and the item above correct?

Will leave that to Kirill.

> BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html

Indeed, that should be updated.

> Otherwise:
> * OpenACC support is about to be merged (as alternative to OpenMP 4)

I hope so.

> * Support for offloading to NVidia GPUs via PTX is also about to be merged.

Ditto.

Then the question is how hard will it be to get OpenACC offloading to
XeonPhi (real hw or emulation) - I guess it is a matter of whether the
plugin needs to implement some extra hooks for OpenACC, and also
whether we can get OpenMP offloading to PTX (dunno if Thomas or his
colleagues have actually tried it on simple testcases, I bet the hardest part
will be porting libgomp away from pthread_* to optionally be supported
by the limited nvptx target and use its threading model; whether __thread
is already supported by nvptx etc.).  I'm willing to help with this once I
have some hw, but some help from people familiar with PTX would be certainly
appreciated.  Because without libgomp ported to nvptx-*-* target (or some
way to inline all the GOMP_*/omp_* calls in offloading regions for nvptx,
but the latter might be too hard), I guess one could offload very simple
target regions, but not anything using #pragma omp inside of them.

Jakub


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Kirill Yukhin
Hi Tobias,
On 13 Nov 16:15, Tobias Burnus wrote:
> Kirill Yukhin wrote:
> > Support of OpenMP 4.0 offloading to future Xeon Phi was
> > fully checked in to main trunk.
> 
> Thanks. If I understood it correctly:
> 
> * GCC 5 supports code generation for Xeon Phi (Knights Landing, KNL)
Right.

> * KNL (the hardware) is not yet available [mid 2015?]
Yes, but I don't know the date.

> * liboffloadmic supports offloading in an emulation mode (executed on
>   the host) but does not (yet) support offloading to KNL; i.e. one
>   would need an updated version of it, once one gets hold of the
>   actual hardware.
Yes, it supports emulation mode. Also, current scheme is the same as
for KNC (however we have no code generator in GCC main trunk for KNC).
We're going to keep liboffloadmic up-to-date.

> * The current hardware (Xeon Phi Knights Corner, KNC) is and will not
>   be supported by GCC.
Currently GCC main trunk doesn't support KNC code gen.

> * Details for building GCC for offloading and running code on an
> accelerator is at https://gcc.gnu.org/wiki/Offloading
> 
> Question: Is the latter up to date - and the item above correct?
Correct.

> BTW: you could update gcc.gnu.org ->news and gcc.gnu.org/gcc-5/changes.html
Thanks, I'll post a patch.

--
Thanks, K


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread Richard Henderson
On 11/13/2014 03:55 PM, H.J. Lu wrote:
> x86-64 psABI has
> 
> name@GOT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT.
> 
> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT, implying that there is a corresponding PLT entry.
> 
> But GCC never generates name@GOTPLT and assembler fails to assemble
> it:
> 
> [hjl@gnu-6 pr17598]$ cat x.S
> movabs $foo@GOTPLT,%rax
> [hjl@gnu-6 pr17598]$ gcc -c x.S
> x.S: Assembler messages:
> x.S:1: Error: relocated field and relocation type differ in signedness

Presumably that's a bug, since it does work with .quad.

> Does anyone remember what it was supposed to be used for?

Presumably some sort of non-C language where you need a non-local function
pointer, but it need not be canonical, and thus could be lazily bound.

But I don't know exactly when that would be.


r~


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread Michael Matz
Hi,

On Thu, 13 Nov 2014, H.J. Lu wrote:

> x86-64 psABI has
> 
> name@GOT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT.
> 
> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
> from the base of the GOT, implying that there is a corresponding PLT entry.
> 
> But GCC never generates name@GOTPLT and assembler fails to assemble
> it:

I've added the implementation for the large model, but only dimly remember 
how it got added to the ABI in the first place.  The additional effect of 
using that reloc was supposed to be that the GOT slot was to be placed 
into .got.plt, and this might hint at the reasoning for this reloc:

If you take the address of a function and call it, you need both a GOT 
slot and a PLT entry (where the existence of GOT slot is implied by the 
PLT of course).  Now, if you use the normal @GOT64 reloc for the 
address-taking operation that would create a slot in .got.  For the call 
instruction you'd use @PLT (or variants thereof, like PLTOFF), which 
creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up 
with two GOT slots for the same symbol, where one should be enough (the 
address taking operation can just as well use the slot in .got.plt).  So 
if the compiler would emit @GOTPLT64 instead of @GOT64 for all address 
references to symbols where it knows that it's a function it could save 
one GOT slot.

So, I think it was supposed to be a small optimization hint.  But it never 
was used in the compiler ...

> [hjl@gnu-6 pr17598]$ cat x.S
> movabs $foo@GOTPLT,%rax
> [hjl@gnu-6 pr17598]$ gcc -c x.S
> x.S: Assembler messages:
> x.S:1: Error: relocated field and relocation type differ in signedness

... and now seems to have bit-rotted.

> [hjl@gnu-6 pr17598]$
> 
> It certainly isn't needed on data symbols.  I couldn't find any possible
> usage for this relocation on function symbols.

The longer I think about it the more I'm sure it's the above optional 
optimization mean.


Ciao,
Michael.


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz  wrote:
> Hi,
>
> On Thu, 13 Nov 2014, H.J. Lu wrote:
>
>> x86-64 psABI has
>>
>> name@GOT: specifies the offset to the GOT entry for the symbol name
>> from the base of the GOT.
>>
>> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
>> from the base of the GOT, implying that there is a corresponding PLT entry.
>>
>> But GCC never generates name@GOTPLT and assembler fails to assemble
>> it:
>
> I've added the implementation for the large model, but only dimly remember
> how it got added to the ABI in the first place.  The additional effect of
> using that reloc was supposed to be that the GOT slot was to be placed
> into .got.plt, and this might hint at the reasoning for this reloc:
>
> If you take the address of a function and call it, you need both a GOT
> slot and a PLT entry (where the existence of GOT slot is implied by the

That is correct.

> PLT of course).  Now, if you use the normal @GOT64 reloc for the
> address-taking operation that would create a slot in .got.  For the call
> instruction you'd use @PLT (or variants thereof, like PLTOFF), which
> creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up
> with two GOT slots for the same symbol, where one should be enough (the
> address taking operation can just as well use the slot in .got.plt).  So
> if the compiler would emit @GOTPLT64 instead of @GOT64 for all address
> references to symbols where it knows that it's a function it could save
> one GOT slot.

@GOTPLT will create a PLT entry, but it doesn't mean PLT entry will
be used.  Only @PLTOFF will use PLT entry.  Linker should be smart
enough to use only one GOT slot, regardless if @GOTPLT or @GOT
is used to take function address and call via PLT.  However, if
@GOTPLT is used without @PLT, a PLT entry will be created and unused.

I'd like to propose

1. Update psABI to remove R_X86_64_GOTPLT64.
2. Fix assembler to take @GOTPLT for backward compatibility,
3. Make sure that linker uses one GOT slot for @GOT and @PLTOFF.

> So, I think it was supposed to be a small optimization hint.  But it never
> was used in the compiler ...
>
>> [hjl@gnu-6 pr17598]$ cat x.S
>> movabs $foo@GOTPLT,%rax
>> [hjl@gnu-6 pr17598]$ gcc -c x.S
>> x.S: Assembler messages:
>> x.S:1: Error: relocated field and relocation type differ in signedness
>
> ... and now seems to have bit-rotted.
>
>> [hjl@gnu-6 pr17598]$
>>
>> It certainly isn't needed on data symbols.  I couldn't find any possible
>> usage for this relocation on function symbols.
>
> The longer I think about it the more I'm sure it's the above optional
> optimization mean.
>

The reason I am asking about it is I'd like to finish
the large model support in binutils and GCC. I have
filed a couple bugs:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63833
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63842
https://sourceware.org/bugzilla/show_bug.cgi?id=17592
https://sourceware.org/bugzilla/show_bug.cgi?id=17593

I will fix all of them and verify that large model works correctly.


-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 6:14 AM, Kirill Yukhin  wrote:
> Hello,
>
> Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to 
> main
> trunk.
>
> Thanks everybody who helped w/ development and review.
>

I noticed many libgomp test failures:

https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html

Have you seen them?

-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread David Edelsohn
Kirill,

The patches have broken bootstrap on AIX and probably on other non-GNU
platforms.  strchrnul() is a GNU extension.

/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c: In function 'unsigned int
parse_env_var(const char*, char***, const char*)':
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c:427:35: error: 'strchrnul'
was not declared in this scope
   nextval = strchrnul (curval, ':');
   ^
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c: In function 'void
append_offload_options(obstack*, const char*, cl_decoded_option*,
unsigned int)':
/nasfarm/edelsohn/src/src/gcc/lto-wrapper.c:584:34: error: 'strchrnul'
was not declared in this scope
next = strchrnul (cur, ',');

/nasfarm/edelsohn/src/src/gcc/gcc.c: In function 'void
handle_foffload_option(const char*)':
/nasfarm/edelsohn/src/src/gcc/gcc.c:3378:28: error: 'strchrnul' was
not declared in this scope
   end = strchrnul (arg, '=');
^
Thanks, David
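
For context, strchrnul behaves like strchr except that it returns a pointer
to the terminating NUL when the character is not found, instead of a null
pointer.  A portable stand-in is short; the sketch below uses the
hypothetical name portable_strchrnul and assumes C++11.  It is only an
illustration of the semantics, not the fix that was applied to GCC.

```cpp
#include <cassert>

// Hypothetical portable replacement for the GNU extension strchrnul.
// Returns a pointer to the first occurrence of c in s, or to the
// terminating NUL of s when c does not occur.
inline const char *portable_strchrnul(const char *s, int c) {
  assert(s != nullptr);
  const char ch = static_cast<char>(c);
  while (*s != '\0' && *s != ch)
    ++s;
  return s;
}
```

A caller that splits a colon-separated list can then loop on the returned
pointer without a separate null check, which is how the failing call sites
above use the function.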


On Thu, Nov 13, 2014 at 9:14 AM, Kirill Yukhin  wrote:
> Hello,
>
> Support of OpenMP 4.0 offloading to future Xeon Phi was fully checked in to 
> main
> trunk.
>
> Thanks everybody who helped w/ development and review.
>
> --
> Thanks, K


Re: What is R_X86_64_GOTPLT64 used for?

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 9:03 AM, H.J. Lu  wrote:
> On Thu, Nov 13, 2014 at 8:33 AM, Michael Matz  wrote:
>> Hi,
>>
>> On Thu, 13 Nov 2014, H.J. Lu wrote:
>>
>>> x86-64 psABI has
>>>
>>> name@GOT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT.
>>>
>>> name@GOTPLT: specifies the offset to the GOT entry for the symbol name
>>> from the base of the GOT, implying that there is a corresponding PLT entry.
>>>
>>> But GCC never generates name@GOTPLT and assembler fails to assemble
>>> it:
>>
>> I've added the implementation for the large model, but only dimly remember
>> how it got added to the ABI in the first place.  The additional effect of
>> using that reloc was supposed to be that the GOT slot was to be placed
>> into .got.plt, and this might hint at the reasoning for this reloc:
>>
>> If you take the address of a function and call it, you need both a GOT
>> slot and a PLT entry (where the existence of GOT slot is implied by the
>
> That is correct.
>
>> PLT of course).  Now, if you use the normal @GOT64 reloc for the
>> address-taking operation that would create a slot in .got.  For the call
>> instruction you'd use @PLT (or variants thereof, like PLTOFF), which
>> creates the PLT slot _and_ a slot in .got.plt.  So, now we've ended up
>> with two GOT slots for the same symbol, where one should be enough (the
>> address taking operation can just as well use the slot in .got.plt).  So
>> if the compiler would emit @GOTPLT64 instead of @GOT64 for all address
>> references to symbols where it knows that it's a function it could save
>> one GOT slot.
>
> @GOTPLT will create a PLT entry, but it doesn't mean PLT entry will
> be used.  Only @PLTOFF will use PLT entry.  Linker should be smart
> enough to use only one GOT slot, regardless if @GOTPLT or @GOT
> is used to take function address and call via PLT.  However, if
> @GOTPLT is used without @PLT, a PLT entry will be created and unused.
>
> I'd like to propose
>
> 1. Update psABI to remove R_X86_64_GOTPLT64.
> 2. Fix assembler to take @GOTPLT for backward compatibility,
> 3. Make sure that linker uses one GOT slot for @GOT and @PLTOFF.
>

Linker does:

    case R_X86_64_GOT64:
    case R_X86_64_GOTPLT64:
      base_got = htab->elf.sgot;

      if (htab->elf.sgot == NULL)
        abort ();

      if (h != NULL)
        {
          bfd_boolean dyn;

          off = h->got.offset;
          if (h->needs_plt
              && h->plt.offset != (bfd_vma)-1
              && off == (bfd_vma)-1)
            {
              /* We can't use h->got.offset here to save
                 state, or even just remember the offset, as
                 finish_dynamic_symbol would use that as offset into
                 .got.  */
              bfd_vma plt_index = h->plt.offset / plt_entry_size - 1;
              off = (plt_index + 3) * GOT_ENTRY_SIZE;
              base_got = htab->elf.sgotplt;
            }

So if a symbol is accessed by both @GOT and @PLTOFF, its
needs_plt will be true and its .got.plt entry will be used for
both @GOT and @GOTPLT.  @GOTPLT has no advantage
over @GOT, but potentially wastes a PLT entry.
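
The offset arithmetic in the excerpt can be worked through in isolation.
The sketch below assumes the usual x86-64 layout: 16-byte PLT entries,
8-byte GOT entries, one leading PLT0 entry (hence the "- 1") and three
reserved slots at the start of .got.plt (hence the "+ 3").  The constant
names and the function gotplt_offset_for are hypothetical and are not part
of the linker source.

```cpp
#include <cassert>
#include <cstdint>

// Assumed x86-64 values for this sketch.
const std::uint64_t kPltEntrySize = 16;        // bytes per PLT entry
const std::uint64_t kGotEntrySize = 8;         // bytes per GOT slot
const std::uint64_t kReservedGotPltSlots = 3;  // leading .got.plt slots

// Map a symbol's PLT offset to the offset of its slot in .got.plt,
// mirroring the two assignments in the linker excerpt.
inline std::uint64_t gotplt_offset_for(std::uint64_t plt_offset) {
  // Offset 0 is PLT0, which has no symbol; real entries start after it.
  assert(plt_offset >= kPltEntrySize && plt_offset % kPltEntrySize == 0);
  const std::uint64_t plt_index = plt_offset / kPltEntrySize - 1;
  return (plt_index + kReservedGotPltSlots) * kGotEntrySize;
}
```

Under these assumptions the first real PLT entry (offset 16) maps to
.got.plt offset 24, the second (offset 32) to 32, and so on, one slot per
PLT entry.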

Here is a patch to mark relocation 30 (R_X86_64_GOTPLT64)
as reserved.  I pushed updated x86-64 psABI changes to

https://github.com/hjl-tools/x86-64-psABI/tree/hjl/master

I will update linker to keep accepting relocation 30 and
treat it the same as R_X86_64_GOT64.

-- 
H.J.
---
diff --git a/low-level-sys-info.tex b/low-level-sys-info.tex
index 7f636fc..981390b 100644
--- a/low-level-sys-info.tex
+++ b/low-level-sys-info.tex
@@ -1242,9 +1242,6 @@ examples and discussion.  They are:
 \begin{itemize}
 \item \code{name@GOT}: specifies the offset to the GOT entry for
   the symbol \code{name} from the base of the GOT.
-\item \code{name@GOTPLT}: specifies the offset to the GOT entry for
-  the symbol \code{name} from the base of the GOT, implying that
-  there is a corresponding PLT entry.
 \item \code{name@GOTOFF}: specifies the offset to the location of
   the symbol \code{name} from the base of the GOT.
 \item \code{name@GOTPCREL}: specifies the offset to the GOT entry
diff --git a/object-files.tex b/object-files.tex
index 4705e96..c0698dc 100644
--- a/object-files.tex
+++ b/object-files.tex
@@ -611,7 +611,7 @@ Name&  Value &   Field   & Calculation\\
 \hline
 \code{R_X86_64_GOTPC64} &  29&   word64  & \code{GOT - P + A} \\
 \hline

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Ilya Verbin
On 13 Nov 09:17, H.J. Lu wrote:
> I noticed many libgomp test failures:
> 
> https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html
> 
> Have you seen them?

Hi H.J.,

I do not see these regressions on i686-linux and x86_64-linux.
Could you please provide more details? (configure options, error log)

Thanks,
  -- Ilya


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 10:37 AM, Ilya Verbin  wrote:
> On 13 Nov 09:17, H.J. Lu wrote:
>> I noticed many libgomp test failures:
>>
>> https://gcc.gnu.org/ml/gcc-regression/2014-11/msg00309.html
>>
>> Have you seen them?
>
> Hi H.J.,
>
> I do not see these regressions on i686-linux and x86_64-linux.
> Could you please provide more details? (configure options, error log)
>
> Thanks,
>   -- Ilya

GCC is configured with

--prefix=/usr/5.0.0 --enable-clocale=gnu --with-system-zlib
--enable-shared --with-demangler-in-ld i686-linux --with-fpmath=sse
--enable-languages=c,c++,fortran,java,lto,objc

I got

spawn -ignore SIGHUP
/export/project/git/gcc-regression/master/217501/bld/gcc/xgcc
-B/export/project/git/gcc-regression/master/217501/bld/gcc/
/export/project/git/gcc-regression/gcc/libgomp/testsuite/libgomp.c/examples-4/e.50.1.c
-B/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/
-B/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/.libs
-I/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp
-I/export/project/git/gcc-regression/gcc/libgomp/testsuite/..
-fmessage-length=0 -fno-diagnostics-show-caret
-fdiagnostics-color=never -fopenmp -O2
-L/export/project/git/gcc-regression/master/217501/bld/x86_64-unknown-linux-gnu/./libgomp/.libs
-lm -o ./e.50.1.exe^M
/usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M
output is:
/usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M


-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Ilya Verbin
On 13 Nov 10:48, H.J. Lu wrote:
> /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M

Looks like we should set flag_fat_lto_objects when compiling with offloading.
I'll investigate this issue tomorrow.

Could you please also show a version and configure options for ld?

Thanks,
  -- Ilya


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 11:20 AM, Ilya Verbin  wrote:
> On 13 Nov 10:48, H.J. Lu wrote:
>> /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M
>
> Looks like we should set flag_fat_lto_objects while compilation with 
> offloading.
> I'll investigate this issue tomorrow.
>
> Could you please also show a version and configure options for ld?
>
> Thanks,
>   -- Ilya

I am using binutils 20141107 trunk, which was configured with

--with-sysroot=/ \
--enable-gold --enable-plugins --enable-threads --enable-targets=x86_64-linux \
--prefix=/usr/local \
--with-local-prefix=/usr/local


-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Gerald Pfeifer
On Thursday 2014-11-13 12:41, David Edelsohn wrote:
> The patches have broken bootstrap on AIX and probably on other non-GNU
> platforms.  strchrnul() is a GNU extension.

Yep, FreeBSD 8 is broken as well.

The failure rate of my nightly testers over the last two weeks
must be around 50%.

Gerald


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 11:20 AM, Ilya Verbin  wrote:
> On 13 Nov 10:48, H.J. Lu wrote:
>> /usr/local/bin/ld: /tmp/ccA8cExp.o: plugin needed to handle lto object^M
>
> Looks like we should set flag_fat_lto_objects while compilation with 
> offloading.
> I'll investigate this issue tomorrow.
>
> Could you please also show a version and configure options for ld?
>


Section Headers:
  [Nr] Name  TypeAddress  OffSize
 ES Flg Lk Inf Al
  [ 0]   NULL 00
00 00  0   0  0
  [ 1] .text PROGBITS 40
000204 00  AX  0   0 16
  [ 2] .rela.textRELA 001a60
d8 18   I 29   1  8
  [ 3] .data PROGBITS 000260
40 00  WA  0   0 32
  [ 4] .bss  NOBITS   0002a0
00 00  WA  0   0  1
  [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS
 0002a0 13 00   E  0   0  1
  [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS
 0002b3 1e 00   E  0   0  1
  [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS
 0002d1 19 00   E  0   0  1
  [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS
 0002ea 6c 00   E  0   0  1
  [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS
 000356 13 00   E  0   0  1
  [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS
  000369 0004ab 00   E  0   0  1
  [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS
  000814 00035d 00   E  0   0  1
  [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS
 000b71 55 00   E  0   0  1
  [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS
 000bc6 14 00   E  0   0  1
  [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS
  000bda 11 00   E  0   0  1
  [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS
 000beb 00043d 00   E  0   0  1
  [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS
 001028 00 00   E  0   0  1
  [17] .gnu.offload_lto_.opts PROGBITS 001028
a9 00   E  0   0  1

Don't you need another plugin to claim those offload IR sections?

-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Ilya Verbin
On 13 Nov 2014, at 23:11, H.J. Lu  wrote:
> 
> Section Headers:
>  [Nr] Name  TypeAddress  OffSize
> ES Flg Lk Inf Al
>  [ 0]   NULL 00
> 00 00  0   0  0
>  [ 1] .text PROGBITS 40
> 000204 00  AX  0   0 16
>  [ 2] .rela.textRELA 001a60
> d8 18   I 29   1  8
>  [ 3] .data PROGBITS 000260
> 40 00  WA  0   0 32
>  [ 4] .bss  NOBITS   0002a0
> 00 00  WA  0   0  1
>  [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS
>  0002a0 13 00   E  0   0  1
>  [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS
>  0002b3 1e 00   E  0   0  1
>  [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS
>  0002d1 19 00   E  0   0  1
>  [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS
>  0002ea 6c 00   E  0   0  1
>  [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS
>  000356 13 00   E  0   0  1
>  [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS
>  000369 0004ab 00   E  0   0  1
>  [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS
>  000814 00035d 00   E  0   0  1
>  [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS
>  000b71 55 00   E  0   0  1
>  [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS
>  000bc6 14 00   E  0   0  1
>  [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS
>  000bda 11 00   E  0   0  1
>  [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS
>  000beb 00043d 00   E  0   0  1
>  [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS
>  001028 00 00   E  0   0  1
>  [17] .gnu.offload_lto_.opts PROGBITS 001028
> a9 00   E  0   0  1
> 
> Don't you need another plugin to claim those offload IR sections?

No, the plan was that a regular plugin will just ignore offload IR sections by 
default.  In your configuration ld detects a __gnu_lto_slim symbol and decides 
that the object file contains only LTO IR without asm.  I am going to 
investigate where the difference from my configuration is and fix the bug.

  -- Ilya

Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread H.J. Lu
On Thu, Nov 13, 2014 at 12:53 PM, Ilya Verbin  wrote:
> On 13 Nov 2014, at 23:11, H.J. Lu  wrote:
>>
>> Section Headers:
>>  [Nr] Name  TypeAddress  OffSize
>> ES Flg Lk Inf Al
>>  [ 0]   NULL 00
>> 00 00  0   0  0
>>  [ 1] .text PROGBITS 40
>> 000204 00  AX  0   0 16
>>  [ 2] .rela.textRELA 001a60
>> d8 18   I 29   1  8
>>  [ 3] .data PROGBITS 000260
>> 40 00  WA  0   0 32
>>  [ 4] .bss  NOBITS   0002a0
>> 00 00  WA  0   0  1
>>  [ 5] .gnu.offload_lto_.profile.50035f9931394ed4 PROGBITS
>>  0002a0 13 00   E  0   0  1
>>  [ 6] .gnu.offload_lto_.icf.50035f9931394ed4 PROGBITS
>>  0002b3 1e 00   E  0   0  1
>>  [ 7] .gnu.offload_lto_.jmpfuncs.50035f9931394ed4 PROGBITS
>>  0002d1 19 00   E  0   0  1
>>  [ 8] .gnu.offload_lto_.inline.50035f9931394ed4 PROGBITS
>>  0002ea 6c 00   E  0   0  1
>>  [ 9] .gnu.offload_lto_.pureconst.50035f9931394ed4 PROGBITS
>>  000356 13 00   E  0   0  1
>>  [10] .gnu.offload_lto_vec_mult._omp_fn.1.50035f9931394ed4 PROGBITS
>>  000369 0004ab 00   E  0   0  1
>>  [11] .gnu.offload_lto_vec_mult._omp_fn.0.50035f9931394ed4 PROGBITS
>>  000814 00035d 00   E  0   0  1
>>  [12] .gnu.offload_lto_.symbol_nodes.50035f9931394ed4 PROGBITS
>>  000b71 55 00   E  0   0  1
>>  [13] .gnu.offload_lto_.refs.50035f9931394ed4 PROGBITS
>>  000bc6 14 00   E  0   0  1
>>  [14] .gnu.offload_lto_.offload_table.50035f9931394ed4 PROGBITS
>>  000bda 11 00   E  0   0  1
>>  [15] .gnu.offload_lto_.decls.50035f9931394ed4 PROGBITS
>>  000beb 00043d 00   E  0   0  1
>>  [16] .gnu.offload_lto_.symtab.50035f9931394ed4 PROGBITS
>>  001028 00 00   E  0   0  1
>>  [17] .gnu.offload_lto_.opts PROGBITS 001028
>> a9 00   E  0   0  1
>>
>> Don't you need another plugin to claim those offload IR sections?
>
> No, the plan was that a regular plugin will just ignore offload IR sections 
> by default.  In your configuration ld detects a __gnu_lto_slim symbol and 
> decided that the object file contains only LTO IR without asm.  I am going to 
> investigate where is the difference with my configuration and fix the bug.
>

You may need to install the current binutils to see it.

-- 
H.J.


Re: [PATCH 0/4] OpenMP 4.0 offloading to Intel MIC

2014-11-13 Thread Jakub Jelinek
On Thu, Nov 13, 2014 at 11:53:53PM +0300, Ilya Verbin wrote:
> > Don't you need another plugin to claim those offload IR sections?
> 
> No, the plan was that a regular plugin will just ignore offload IR
> sections by default.  In your configuration ld detects a __gnu_lto_slim
> symbol and decided that the object file contains only LTO IR without asm. 
> I am going to investigate where is the difference with my configuration
> and fix the bug.

FYI, I'm getting
+WARNING: program timed out.
+FAIL: libgomp.c/examples-4/e.54.2.c execution test
on both x86_64-linux and i686-linux (normal --enable-checking=yes,rtl
bootstrap, no offloading configure options).
binutils-2.24, ld.bfd.

Jakub


Ann: MELT plugin 1.1.3 for GCC 4.8 & 4.9

2014-11-13 Thread Basile Starynkevitch
Dear All,

It is my pleasure to announce the MELT plugin 1.1.3 for GCC 4.8 or 4.9.

MELT is a high-level domain specific language and plugin to customize
GCC, see http://gcc-melt.org/ for details.

It is free software, GPLv3+ licensed, FSF copyrighted.

You can download the source tarball from

 http://gcc-melt.org/melt-plugin-1.1.3-for-gcc-4.8-or-4.9.tar.bz2 

This is a bzip2-ed tarball of 4124848 bytes (4.0 Mbytes) extracted from
the GCC MELT branch svn rev. 217521 on November 13th, 2014.

It brings several new features and significant bug fixes w.r.t. the previous
MELT 1.1.2 (of August 31st, 2014).


NEWS for 1.1.3 MELT plugin for GCC 4.8 & 4.9
[[November 13th, 2014]]

Bug-fix and feature release with significant improvements
w.r.t. MELT plugin 1.1.2.

   Bug fixes
   =========

   Better-working macros and improved documentation.


   Language improvement
   ====================

   Improved macro constructs.
  
   All common (i.e. non-language specific) tree codes are handled, at
   least thru an automatically generated cmatcher.

   All non-OMP gimple codes are handled.

   Better handling of variadic tree & gimple constructs.
   
   Added gimple_call_args & gimple_call_more_args quasi-cmatcher,
   gimple_switch handling, and functions to build them with a MELT
   sequence (tuple or list) of constituents.

   Variadic and polytypic add2list.
  
   End-user improvements
   =====================

   Better eval mode. More informative error messages.

   Documentation generated in several HTML files.

###


Please report bugs and comments to gcc-m...@googlegroups.com

Regards.

-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***




gcc-4.8-20141113 is now available

2014-11-13 Thread gccadmin
Snapshot gcc-4.8-20141113 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.8-20141113/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.8 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-4_8-branch 
revision 217524

You'll find:

 gcc-4.8-20141113.tar.bz2 Complete GCC

  MD5=fe4685763a78ec1fabad73e76d1ce6dd
  SHA1=f6a13da7ba744503c3b6bf78ffccc0c998c4d328

Diffs from 4.8-20141106 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.8
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread David Wohlferd


On 11/13/2014 6:02 AM, Richard Biener wrote:
> On Thu, Nov 13, 2014 at 2:53 PM, Hans-Peter Nilsson  wrote:
>> On Thu, 13 Nov 2014, David Wohlferd wrote:
>>> Sorry for the (very) delayed response.  I'm still looking for feedback here so
>>> I can fix the docs.
>>
>> Thank you for your diligence.
>>
>>> As I said before, triggering a full memory clobber for anything over 16 bytes
>>> (and most sizes under 16 bytes) makes this feature all but useless.  So if
>>> that's really what's happening, we need to decide what to do next:
>>>
>>> 1) Can this be "fixed?"
>>> 2) Do we want to doc the current behavior?
>>> 3) Or do we just remove this section?
>>>
>>> I think it could be a nice performance win for inline asm if it could be made
>>> to work right, but I have no idea what might be involved in that.  Failing
>>> that, I guess if it doesn't work and isn't going to work, I'd recommend
>>> removing the text for this feature.
>>>
>>> Since all 3 suggestions require a doc change, I'll just say that I'm prepared
>>> to start work on the doc patch as soon as someone lets me know what the plan
>>> is.
>>>
>>> Richard?  Hans-Peter?  Your thoughts?
>>
>> I've forgot if someone mentioned whether we have a test-case in
>> our test-suite for this feature.


I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's 
not easy since there is a lot of asm and the people who write these are 
apparently allergic to using comments to describe what they are testing.



>> If we don't, then 3; removal.
>> If we do, I guess it's flawed or at least not agreeing with the
>> documentation?  Then it *might* be worth the effort fixing that
>> and additional test-coverage (depending on the person stepping
>> up...) but 3 is IMHO still an arguably sane option.
>
> Well, as what is missing is just an optimization I'd say we should
> try to fix it.


While I'd love to be the one to fix this, the fact of the matter is that 
most of gcc is a black box to me.  Even if you told me roughly where to 
start, I'd have no idea of the downstream impacts of anything I changed.


So while I understand that it looks like I'm just finding work for other 
people, fixing something like this is simply beyond me.  That said, I'm 
certainly prepared to outline what I see as the interesting test cases 
and to do some testing if someone else is willing to step up and do this 
optimization.



> And surely the docs should not promise that optimization
> will happen - it should just mention that doing this might allow
> optimization to happen.


I can agree with this.  I am quite confident there will be occasions 
where gcc has no option but to fall back to doing a full clobber to 
ensure correct function (a possibility which the current docs also fail 
to mention).  So yes, the docs should be limited in what they promise here.
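
For readers who have not seen the idiom under discussion, here is a
minimal sketch of the two ways of describing memory to an inline asm.
The empty asm template, the struct block16 wrapper and the function
names are assumptions chosen so that the example builds on any target;
they are not taken from the gcc docs or testsuite.  Whether gcc really
avoids the full clobber for the second variant is exactly what this
thread is questioning.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical wrapper type: names exactly 16 bytes as one operand. */
struct block16 { char bytes[16]; };
static_assert (sizeof (struct block16) == 16, "operand must cover the buffer");

/* Variant 1: full clobber.  The compiler must assume that any memory
   may have been read or written by the asm. */
static inline void
touch_full_clobber (void *p)
{
  __asm__ volatile ("" : : "r" (p) : "memory");
}

/* Variant 2: the "memory constraint" trick.  Only the 16 bytes at P are
   listed as read and written; there is no "memory" clobber. */
static inline void
touch_block16 (void *p)
{
  __asm__ volatile ("" : "+m" (*(struct block16 *) p) : "r" (p));
}

/* Both variants must leave the observable values unchanged; the only
   intended difference is how much the optimizer has to forget. */
int
demo (int use_trick)
{
  char buf[16];
  int unrelated = 40;

  memset (buf, 1, sizeof buf);
  if (use_trick)
    touch_block16 (buf);
  else
    touch_full_clobber (buf);
  return unrelated + buf[0] + buf[15];
}
```

The result is the same either way, so code written with the second
variant stays correct even when the compiler falls back to treating it
as a full clobber; it just may not get any faster.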


Which brings us to the question: what do we do now?  The 15th is fast 
approaching.  Can something like this get done before then? Can it be 
checked in for 5.0 after the 15th?  Or does it need to wait for 6.0?


If it does need to wait for 6.0, what do we want to do with the docs in 
the meantime?  Given how wrong they are currently, I'd hate to ship yet 
another release with that ugly text.  But trying to describe the best 
way to take advantage of optimizations that haven't been written yet 
is... hard.


Since (as I understand it) 5.0 docs *can* be checked in after the 15th, 
my recommendations:


 - If someone is prepared to step up and do this work for v5.0, then 
I'll wait and write the docs when they are done and can describe how it 
works.
 - If this is going to wait for 6.0, then if someone does (at least) 
enough investigative work to be able to describe how this will 
eventually work, I'll update the 5.0 docs in a general way talking about 
ways gcc *may* be able to optimize.  It should be possible to phrase 
this so code people write today will work even better tomorrow.
 - Worst case is if no one has the time to look at this for the 
foreseeable future.  In that case, I'm with Hans-Peter.  Let's take the 
existing text out.  Following the existing text makes things *worse*, 
and the current implementation is so limited that I'd be surprised if 
anyone's code actually uses it successfully.  New text can get added 
when the new code is.


Hmm.  I just had a thought: Is it possible the problem I'm seeing here 
is platform-specific?  Maybe this works perfectly for non-i386 code?  
That would certainly change my recommendations here.


dw


Re: [RFD] Using the 'memory constraint' trick to avoid memory clobber doesn't work

2014-11-13 Thread dw



>> I've forgot if someone mentioned whether we have a test-case in
>> our test-suite for this feature.
>
> I'm looking thru gcc/testsuite/*.c to see if I can spot anything. It's 
> not easy since there is a lot of asm and the people who write these 
> are apparently allergic to using comments to describe what they are 
> testing.


So, I found a few tests that were *using* this feature.  But they seem 
to be checking for an ICE or page fault, rather than checking to see if 
the generated code was avoiding the memory clobber.
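
A test that checks the optimization itself might be shaped like the
sketch below.  This is only an outline under stated assumptions: the
struct block16 type, the global g and the function name are made up for
the example, and the final scan directive is deliberately left as a
comment because the pattern would be target-specific.

```c
/* { dg-do compile } */
/* { dg-options "-O2" } */

/* Hypothetical wrapper type naming the 16 bytes the asm may touch. */
struct block16 { char bytes[16]; };

int g;  /* unrelated global; the asm never names it */

int
read_g_around_asm (void)
{
  struct block16 b = { { 0 } };
  int before = g;

  /* Only b is listed as read/written; there is no "memory" clobber. */
  __asm__ volatile ("" : "+m" (b));

  /* If the asm is not treated as a full clobber, this second read of g
     can reuse the value loaded above. */
  return before + g + b.bytes[0];
}

/* A real test would end with a dg-final scan directive (for example
   scan-assembler-times) counting the loads of g.  The pattern depends
   on the target and is not filled in here. */
```

The function returns the same value whether or not the reload happens,
so a test along these lines has to inspect the generated code rather
than the run-time result.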


dw