RE: negative latencies

2014-05-19 Thread Ajit Kumar Agarwal

Is it the case of code speculation where the negative latencies are used?

Thanks & Regards
Ajit
-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of shmeel 
gutl
Sent: Monday, May 19, 2014 12:23 PM
To: Andrew Pinski
Cc: gcc@gcc.gnu.org; Vladimir Makarov
Subject: Re: negative latencies

On 19-May-14 09:39 AM, Andrew Pinski wrote:
> On Sun, May 18, 2014 at 11:13 PM, shmeel gutl 
>  wrote:
>> Are there hooks in gcc to deal with negative latencies? In other 
>> words, an architecture that permits an instruction to use a result 
>> from an instruction that will be issued later.
> Do you mean bypasses?  If so there is a bypass feature which you can use:
> https://gcc.gnu.org/onlinedocs/gccint/Processor-pipeline-description.h
> tml#index-data-bypass-3773
>
> Thanks,
> Andrew Pinski
Unfortunately, bypasses in the pipeline description are not enough.
They only allow you to calculate the latency of true dependencies, and they
are also forced to be zero or greater. The real question is how the scheduler
and register allocator can deal with negative latencies.

Thanks
Shmeel
>> At first glance it seems that it will break a few things.
>> 1) The definition of dependencies cannot come from the simple 
>> ordering of rtl.
>> 2) The scheduling problem starts to look like "get off the train 3 
>> stops before me".
>> 3) The definition of live ranges needs to use actual instruction 
>> timing information, not just instruction sequencing.
>>
>> The hooks in the scheduler seem to be enough to stop damage but not 
>> enough to take advantage of this "feature".
>>
>> Thanks
>>
>




Re: Using particular register class (like floating point registers) as spill register class

2014-05-19 Thread Andrew Haley
On 05/16/2014 05:20 PM, Ian Bolton wrote:
>> On 05/16/2014 12:05 PM, Kugan wrote:
>>>
>>>
>>> On 16/05/14 20:40, pins...@gmail.com wrote:


> On May 16, 2014, at 3:23 AM, Kugan
>>  wrote:
>
> I would like to know if there is anyway we can use registers from
> particular register class just as spill registers (in places where
> register allocator would normally spill to stack and nothing more),
>> when
> it can be useful.
>
> In AArch64, in some cases, compiling with -mgeneral-regs-only
>> produces
> better performance compared not using it. The difference here is
>> that
> when -mgeneral-regs-only is not used, floating point register are
>> also
> used in register allocation. Then IRA/LRA has to move them to core
> registers before performing operations as shown below.

 Can you show the code with fp register disabled?  Does it use the
>> stack to spill?  Normally this is due to register to register class
>> costs compared to register to memory move cost.  Also I think it
>> depends on the processor rather the target.  For thunder, using the fp
>> registers might actually be better than using the stack depending if
>> the stack was in L1.
>>> Not all the LDR/STR combination match to fmov. In the testcase I
>> have,
>>>
>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
>>> grep -c "ldr" sha_dgst.s
>>> 50
>>> grep -c "str" sha_dgst.s
>>> 42
>>> grep -c "fmov" sha_dgst.s
>>> 0
>>>
>>> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
>>> grep -c "ldr" sha_dgst.s
>>> 42
>>> grep -c "str" sha_dgst.s
>>> 31
>>> grep -c "fmov" sha_dgst.s
>>> 105
>>>
>>> I  am not saying that we shouldn't use floating point register here.
>> But
>>> from the above, it seems like register allocator is using it as more
>>> like core register (even though the cost mode has higher cost) and
>> then
>>> moving the values to core registers before operations. if that is the
>>> case, my question is, how do we just make this as spill register
>> class
>>> so that we will replace ldr/str with equal number of fmov when it is
>>> possible.
>>
>> I'm also seeing stuff like this:
>>
>> => 0x7fb72a0928 > Thread*)+2500>:
>> add  x21, x4, x21, lsl #3
>> => 0x7fb72a092c > Thread*)+2504>:
>> fmov w2, s8
>> => 0x7fb72a0930 > Thread*)+2508>:
>> str  w2, [x21,#88]
>>
>> I guess GCC doesn't know how to store an SImode value in an FP register
>> into
>> memory?  This is  4.8.1.
>>
> 
> Please can you try that on trunk and report back.

OK, this is trunk, and I'm no longer seeing that happen.

However, I am seeing:

   0x007fb76dc82c <+160>:   adrp x25, 0x7fb7c8
   0x007fb76dc830 <+164>:   add x25, x25, #0x480
   0x007fb76dc834 <+168>:   fmov d8, x0
   0x007fb76dc838 <+172>:   add x0, x29, #0x160
   0x007fb76dc83c <+176>:   fmov d9, x0
   0x007fb76dc840 <+180>:   add x0, x29, #0xd8
   0x007fb76dc844 <+184>:   fmov d10, x0
   0x007fb76dc848 <+188>:   add x0, x29, #0xf8
   0x007fb76dc84c <+192>:   fmov d11, x0

followed later by:

   0x007fb76dd224 <+2712>:  fmov x0, d9
   0x007fb76dd228 <+2716>:  add x6, x29, #0x118
   0x007fb76dd22c <+2720>:  str x20, [x0,w27,sxtw #3]
   0x007fb76dd230 <+2724>:  fmov x0, d10
   0x007fb76dd234 <+2728>:  str w28, [x0,w27,sxtw #2]
   0x007fb76dd238 <+2732>:  fmov x0, d11
   0x007fb76dd23c <+2736>:  str w19, [x0,w27,sxtw #2]

which seems a bit suboptimal, given that these double registers now have
to be saved in the prologue.



Re: Using particular register class (like floating point registers) as spill register class

2014-05-19 Thread Ramana Radhakrishnan
On Mon, May 19, 2014 at 1:02 PM, Andrew Haley  wrote:
> On 05/16/2014 05:20 PM, Ian Bolton wrote:
>>> On 05/16/2014 12:05 PM, Kugan wrote:


 On 16/05/14 20:40, pins...@gmail.com wrote:
>
>
>> On May 16, 2014, at 3:23 AM, Kugan
>>>  wrote:
>>
>> I would like to know if there is anyway we can use registers from
>> particular register class just as spill registers (in places where
>> register allocator would normally spill to stack and nothing more),
>>> when
>> it can be useful.
>>
>> In AArch64, in some cases, compiling with -mgeneral-regs-only
>>> produces
>> better performance compared not using it. The difference here is
>>> that
>> when -mgeneral-regs-only is not used, floating point register are
>>> also
>> used in register allocation. Then IRA/LRA has to move them to core
>> registers before performing operations as shown below.
>
> Can you show the code with fp register disabled?  Does it use the
>>> stack to spill?  Normally this is due to register to register class
>>> costs compared to register to memory move cost.  Also I think it
>>> depends on the processor rather the target.  For thunder, using the fp
>>> registers might actually be better than using the stack depending if
>>> the stack was in L1.
 Not all the LDR/STR combination match to fmov. In the testcase I
>>> have,

 aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
 grep -c "ldr" sha_dgst.s
 50
 grep -c "str" sha_dgst.s
 42
 grep -c "fmov" sha_dgst.s
 0

 aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
 grep -c "ldr" sha_dgst.s
 42
 grep -c "str" sha_dgst.s
 31
 grep -c "fmov" sha_dgst.s
 105

 I  am not saying that we shouldn't use floating point register here.
>>> But
 from the above, it seems like register allocator is using it as more
 like core register (even though the cost mode has higher cost) and
>>> then
 moving the values to core registers before operations. if that is the
 case, my question is, how do we just make this as spill register
>>> class
 so that we will replace ldr/str with equal number of fmov when it is
 possible.
>>>
>>> I'm also seeing stuff like this:
>>>
>>> => 0x7fb72a0928 >> Thread*)+2500>:
>>> add  x21, x4, x21, lsl #3
>>> => 0x7fb72a092c >> Thread*)+2504>:
>>> fmov w2, s8
>>> => 0x7fb72a0930 >> Thread*)+2508>:
>>> str  w2, [x21,#88]
>>>
>>> I guess GCC doesn't know how to store an SImode value in an FP register
>>> into
>>> memory?  This is  4.8.1.
>>>
>>
>> Please can you try that on trunk and report back.
>
> OK, this is trunk, and I'm no longer seeing that happen.
>
> However, I am seeing:
>
>   0x007fb76dc82c <+160>:   adrp x25, 0x7fb7c8
>   0x007fb76dc830 <+164>:   add x25, x25, #0x480
>   0x007fb76dc834 <+168>:   fmov d8, x0
>   0x007fb76dc838 <+172>:   add x0, x29, #0x160
>   0x007fb76dc83c <+176>:   fmov d9, x0
>   0x007fb76dc840 <+180>:   add x0, x29, #0xd8
>   0x007fb76dc844 <+184>:   fmov d10, x0
>   0x007fb76dc848 <+188>:   add x0, x29, #0xf8
>   0x007fb76dc84c <+192>:   fmov d11, x0
>
> followed later by:
>
>   0x007fb76dd224 <+2712>:  fmov x0, d9
>   0x007fb76dd228 <+2716>:  add x6, x29, #0x118
>   0x007fb76dd22c <+2720>:  str x20, [x0,w27,sxtw #3]
>   0x007fb76dd230 <+2724>:  fmov x0, d10
>   0x007fb76dd234 <+2728>:  str w28, [x0,w27,sxtw #2]
>   0x007fb76dd238 <+2732>:  fmov x0, d11
>   0x007fb76dd23c <+2736>:  str w19, [x0,w27,sxtw #2]
>
> which seems a bit suboptimal, given that these double registers now have
> to be saved in the prologue.

That looks a bit suspicious - is there a pre-processed file you can
put onto bugzilla for someone to take a look at, with command line
options et al?

I had a testcase that I was investigating a few days back from a
benchmark doing SHA2 calculations. From my notes, I'd been playing
with REGISTER_MOVE_COST and MEMORY_MOVE_COST, and additionally the
extra moves appeared to disappear with -fno-schedule-insns.
Remember, however, that on AArch64 we don't have sched-pressure on by
default.
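
For reference, a minimal sketch of the kind of cost experiment mentioned
above (TARGET_REGISTER_MOVE_COST is the real target hook, but the function
body, the class test and the cost numbers below are invented for
illustration and are not the actual aarch64 implementation):

  /* Backend fragment (would live in the target's .c file); illustrative only.
     Make GP<->FP moves look expensive relative to GP<->GP moves, nudging
     the register allocator towards stack spills rather than FP registers.  */
  static int
  sketch_register_move_cost (enum machine_mode mode ATTRIBUTE_UNUSED,
                             reg_class_t from, reg_class_t to)
  {
    bool crosses_files = (from == FP_REGS) != (to == FP_REGS);
    return crosses_files ? 8 : 2;   /* made-up costs, to be tuned per core */
  }

  #undef  TARGET_REGISTER_MOVE_COST
  #define TARGET_REGISTER_MOVE_COST sketch_register_move_cost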


regards
Ramana

>


RE: Using particular register class (like floating point registers) as spill register class

2014-05-19 Thread Ian Bolton
> >
> > Please can you try that on trunk and report back.
> 
> OK, this is trunk, and I'm no longer seeing that happen.
> 
> However, I am seeing:
> 
>   0x007fb76dc82c <+160>: adrp x25, 0x7fb7c8
>   0x007fb76dc830 <+164>: add x25, x25, #0x480
>   0x007fb76dc834 <+168>: fmov d8, x0
>   0x007fb76dc838 <+172>: add x0, x29, #0x160
>   0x007fb76dc83c <+176>: fmov d9, x0
>   0x007fb76dc840 <+180>: add x0, x29, #0xd8
>   0x007fb76dc844 <+184>: fmov d10, x0
>   0x007fb76dc848 <+188>: add x0, x29, #0xf8
>   0x007fb76dc84c <+192>: fmov d11, x0
> 
> followed later by:
> 
>   0x007fb76dd224 <+2712>: fmov x0, d9
>   0x007fb76dd228 <+2716>: add x6, x29, #0x118
>   0x007fb76dd22c <+2720>: str x20, [x0,w27,sxtw #3]
>   0x007fb76dd230 <+2724>: fmov x0, d10
>   0x007fb76dd234 <+2728>: str w28, [x0,w27,sxtw #2]
>   0x007fb76dd238 <+2732>: fmov x0, d11
>   0x007fb76dd23c <+2736>: str w19, [x0,w27,sxtw #2]
> 
> which seems a bit suboptimal, given that these double registers now
> have
> to be saved in the prologue.
> 

Thanks for doing that.  Many AArch64 improvements have gone in since
4.8 was released.

I think we'd have to see the output for the whole function to
determine whether that code is sane. I don't suppose the source
code is shareable or you have a testcase for this you can share?

Cheers,
Ian





Re: Offload Library

2014-05-19 Thread Kirill Yukhin
Hello Ian,
On 16 May 07:07, Ian Lance Taylor wrote:
> On Fri, May 16, 2014 at 4:47 AM, Kirill Yukhin  
> wrote:
> >
> > To support the offloading features for Intel's Xeon Phi cards
> > we need to add a foreign library (liboffload) into the gcc repository.
> > README with build instructions is attached.
> 
> Can you explain why this library should be part of GCC, and how GCC
> would use it?  I'm sure it's obvious to you but it's not obvious to
> me.
Support for the ‘target’ construct of OpenMP 4.0, aka ‘offloading’, is
expected to be part of libgomp. Every target platform that is to be supported
should implement a dedicated plugin for libgomp. The plugin for Xeon Phi is
based on the liboffload functionality.
This library will also provide compatibility with binaries built by ICC.
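
For context, here is a minimal, made-up example of the kind of source this
machinery ultimately serves (plain OpenMP 4.0; the function and variable
names are invented): the target region is compiled for the device and
launched through the libgomp plugin, which for Xeon Phi would sit on top of
liboffload/COI.

  /* Offload a reduction loop to an available device, falling back to the
     host if no device is present.  */
  double vec_sum (const double *a, int n)
  {
    double s = 0.0;
  #pragma omp target map(to: a[0:n]) map(tofrom: s)
  #pragma omp teams distribute parallel for reduction(+: s)
    for (int i = 0; i < n; i++)
      s += a[i];
    return s;
  }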

--
Thanks, K

> 
> Ian


Re: Offload Library

2014-05-19 Thread Kirill Yukhin
Hello, Thomas!

On 16 May 19:30, Thomas Schwinge wrote:
> On Fri, 16 May 2014 15:47:58 +0400, Kirill Yukhin  
> wrote:
> > To support the offloading features for Intel's Xeon Phi cards
> > we need to add a foreign library (liboffload) into the gcc repository.
> 
> As written in the README, this library currently is specific to Intel
> hardware (understandably, of course), and I assume also in the future is
> to remain that way (?) -- should it thus get a more specific name in GCC,
> than the generic liboffload?
Yes, this library generates calls to the Intel-specific Coprocessor Offload
Interface (COI).
I think the name of the library may be changed; we can discuss it when I
submit the patch.

> > Additionally to that sources we going to add few headers [...]
> > and couple of new sources
> 
> For interfacing with GCC, presumably.  You haven't stated it explicitly,
> but do I assume right that this work will be going onto the
> gomp-4_0-branch, integrated with the offloading work developed there, as
> a plugin for libgomp?
Not exactly. I was talking about the COI emulator, which will allow offload
testing without any external library dependency or hardware.
The libgomp <-> liboffload plug-in is also ready, but it needs no such
approval, so it will be submitted as a separate patch.

--
Thanks, K

> Grüße,
>  Thomas




Re: Using particular register class (like floating point registers) as spill register class

2014-05-19 Thread Andrew Haley
On 05/19/2014 01:19 PM, Ramana Radhakrishnan wrote:
> On Mon, May 19, 2014 at 1:02 PM, Andrew Haley  wrote:
>> On 05/16/2014 05:20 PM, Ian Bolton wrote:
 On 05/16/2014 12:05 PM, Kugan wrote:
>
>
> On 16/05/14 20:40, pins...@gmail.com wrote:
>>
>>
>>> On May 16, 2014, at 3:23 AM, Kugan
  wrote:
>>>
>>> I would like to know if there is anyway we can use registers from
>>> particular register class just as spill registers (in places where
>>> register allocator would normally spill to stack and nothing more),
 when
>>> it can be useful.
>>>
>>> In AArch64, in some cases, compiling with -mgeneral-regs-only
 produces
>>> better performance compared not using it. The difference here is
 that
>>> when -mgeneral-regs-only is not used, floating point register are
 also
>>> used in register allocation. Then IRA/LRA has to move them to core
>>> registers before performing operations as shown below.
>>
>> Can you show the code with fp register disabled?  Does it use the
 stack to spill?  Normally this is due to register to register class
 costs compared to register to memory move cost.  Also I think it
 depends on the processor rather the target.  For thunder, using the fp
 registers might actually be better than using the stack depending if
 the stack was in L1.
> Not all the LDR/STR combination match to fmov. In the testcase I
 have,
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S  -mgeneral-regs-only
> grep -c "ldr" sha_dgst.s
> 50
> grep -c "str" sha_dgst.s
> 42
> grep -c "fmov" sha_dgst.s
> 0
>
> aarch64-none-linux-gnu-gcc sha_dgst.c -O2  -S
> grep -c "ldr" sha_dgst.s
> 42
> grep -c "str" sha_dgst.s
> 31
> grep -c "fmov" sha_dgst.s
> 105
>
> I  am not saying that we shouldn't use floating point register here.
 But
> from the above, it seems like register allocator is using it as more
> like core register (even though the cost mode has higher cost) and
 then
> moving the values to core registers before operations. if that is the
> case, my question is, how do we just make this as spill register
 class
> so that we will replace ldr/str with equal number of fmov when it is
> possible.

 I'm also seeing stuff like this:

 => 0x7fb72a0928 >>> Thread*)+2500>:
 add  x21, x4, x21, lsl #3
 => 0x7fb72a092c >>> Thread*)+2504>:
 fmov w2, s8
 => 0x7fb72a0930 >>> Thread*)+2508>:
 str  w2, [x21,#88]

 I guess GCC doesn't know how to store an SImode value in an FP register
 into
 memory?  This is  4.8.1.

>>>
>>> Please can you try that on trunk and report back.
>>
>> OK, this is trunk, and I'm no longer seeing that happen.
>>
>> However, I am seeing:
>>
>>   0x007fb76dc82c <+160>:   adrp x25, 0x7fb7c8
>>   0x007fb76dc830 <+164>:   add x25, x25, #0x480
>>   0x007fb76dc834 <+168>:   fmov d8, x0
>>   0x007fb76dc838 <+172>:   add x0, x29, #0x160
>>   0x007fb76dc83c <+176>:   fmov d9, x0
>>   0x007fb76dc840 <+180>:   add x0, x29, #0xd8
>>   0x007fb76dc844 <+184>:   fmov d10, x0
>>   0x007fb76dc848 <+188>:   add x0, x29, #0xf8
>>   0x007fb76dc84c <+192>:   fmov d11, x0
>>
>> followed later by:
>>
>>   0x007fb76dd224 <+2712>:  fmov x0, d9
>>   0x007fb76dd228 <+2716>:  add x6, x29, #0x118
>>   0x007fb76dd22c <+2720>:  str x20, [x0,w27,sxtw #3]
>>   0x007fb76dd230 <+2724>:  fmov x0, d10
>>   0x007fb76dd234 <+2728>:  str w28, [x0,w27,sxtw #2]
>>   0x007fb76dd238 <+2732>:  fmov x0, d11
>>   0x007fb76dd23c <+2736>:  str w19, [x0,w27,sxtw #2]
>>
>> which seems a bit suboptimal, given that these double registers now have
>> to be saved in the prologue.
> 
> That looks a bit suspicious - Is there a pre-processed file you can
> put on to bugzilla for someone to take a look at with command line
> options et al ?

I'll try, but I'm using precompiled headers so it's a bit tricky.  I'll let
you know.

Andrew.




adding support for vxworks os variants

2014-05-19 Thread Olivier Hainque
Hello,

Here is a quick description of changes we would like to contribute to the
VxWorks ports, with a preliminary query to maintainers on what would be the
most appropriate form for such changes to be deemed acceptable:


On a few CPU families, variants of the VxWorks OS are available.

Typically, there is the base VxWorks 6 or AE (653) kernel & environment,
then also:

- a simulator (VxSim) on some targets,
- a "CERT" variant of the OS to address requirements specific to 
  safety certification standards
- a "MILS" variant of the OS to address requirements specific to
  security standards
- an "SMP" variant of the OS for multiprocessor systems.

We (AdaCore) have been maintaining toolchains for a few of these variants
over the years, with integrated facilities allowing easier use of the toolchain
directly from the command line.

For mils, the set of changes is significant enough to warrant a specific
triplet. I'll be posting the patches soon.

For the other variants, the need for separate triplets is less clear. Indeed,
what the changes do is essentially to control link time behavior, typically:

- for VXSIM or SMP, the crt files and libraries we need to link with are located
  in a different directory

- for CERT, the system entry points available to the application are all
  in a big object and we're not supposed to link in anything else by default

The WindRiver environment drives everything through a GUI and Makefiles. E.g.
for CERT, this explicitly links with -nostdlib to remove all the defaults, then
adds what is really needed/allowed.

Working directly from the command line is often useful, and doing the correct
thing (getting rid of inappropriate defaults, figuring out the correct set of
-Ls, ...) is cumbersome.

For vxsim or smp, having entirely separate toolchains with different triplets
for such minor differences seemed overkill and impractical for users, so we have
added "-vxsim" and "-vxsmp" command line options to our toolchains to help.

We have done the same for the cert variants, with a "-vxcert" command line
option, but wonder if a separate triplet wouldn't actually be better in this
case.

One small concern is that the system toolchains don't know about the new
options, and we think that it might be of interest to minimize the interface
differences.

Thoughts ?

Thanks in advance for your feedback,

With Kind Regards,

Olivier





Re: adding support for vxworks os variants

2014-05-19 Thread Olivier Hainque

On May 19, 2014, at 15:41 , Olivier Hainque  wrote:
> For vxsim or smp, having entirely separate toolchains with different triplets
> for so minor differences seemed overkill and impractical for users, so we have
> added "-vxsim" and "-vxsmp" command line options to our toolchains to help.
> 
> We have done the same for the cert variants, with a "-vxcert" command line
> option, but wonder if a separate triplet wouldn't actually be better in this
> case.
> 
> One small concern is that the system toolchains don't know about the new
> options, and we think that it might be of interest to minimize the interface
> differences.
> 
> Thoughts ?

One point I forgot to mention: we have considered the use of external spec
files as an alternative strategy. We have started experimenting with it and
don't yet have a lot of feedback on this scheme.

Your opinion on this alternate option (how much more viable/flexible/acceptable
it would likely be) would be most appreciated.

I'm of course happy to provide extra details on what we have been doing if
needed.

Olivier



Re: [GSoC] writing test-case

2014-05-19 Thread Michael Matz
Hi,

On Thu, 15 May 2014, Richard Biener wrote:

> To me predicate (and capture without expression or predicate)
> differs from expression in that predicate is clearly a leaf of the
> expression tree while we have to recurse into expression operands.
> 
> Now, if we want to support applying predicates to the midst of an
> expression, like
> 
> (plus predicate(minus @0 @1)
> @2)
> (...)
> 
> then this would no longer be true.  At the moment you'd write
> 
> (plus (minus@3 @0 @1)
> @2)
>   if (predicate (@3))
> (...)
> 
> which makes it clearer IMHO (with the decision tree building
> you'd apply the predicates after matching the expression tree
> anyway I suppose, so code generation would be equivalent).

Syntaxwise I had this idea for adding generic predicates to expressions:

(plus (minus @0 @1):predicate
  @2)
(...)

Whether it's a prefix or a suffix doesn't matter much, but using a different
syntax to separate the expression from the predicate seems to make things
clearer.  Optionally adding things like and/or for predicates might also make
sense:

(plus (minus @0 @1):positive_p(@0) || positive_p(@1)
  @2)
(...)


Ciao,
Michael.


[GSoC] first phase

2014-05-19 Thread Prathamesh Kulkarni
Hi,
   Unfortunately I shall need to take this week off due to university exams,
which run up to 27th May. I will start working from the 28th on pattern
matching with a decision tree, and try to make up for the first week. I
am extremely sorry about this.
I thought I would be able to do both during exam week, but the exam
load has become too much :-(

In the first phase (up to 23rd June), I hope to get genmatch ready:
a) pattern matching with decision tree.
b) Add patterns to test genmatch.
c) Depending upon the patterns, extending the meta-description
d) Other fixes:

* capturing outermost expressions.
For example this pattern does not get simplified
(match_and_simplify
  (plus@2 (negate @0) @1)
  if (!TYPE_SATURATING (TREE_TYPE (@2)))
  (minus @1 @0))
I guess this happens because in write_nary_simplifiers:
  if (s->match->type != OP_EXPR)
continue;
Maybe this is not the correct way to fix this; should we also pass the lhs to
the generated gimple_match_and_simplify? I guess that would be the capture
for the outermost expression.
For the above pattern, I guess @2 represents the lhs.

So for this test-case:
int foo (int x, int y)
{
  int t1 = -x;
  int t2 = t1 + y;
  return t2;
}
t2 would be @2, t1 would be @0 and y would be @1.
Is that correct?
Would this create issues when the lhs is NULL, for example
in calls to built-in functions?

* avoid using statement expressions for code gen of expression
* rewriting code-generator using visitor classes, and other refactoring
(using std::string for example), etc.

I have a very rough time-line in mind, for completing tasks:
28th may - 31st may
a) Have test-case for each pattern present (except COND_EXPR) in match.pd
I guess most of it is already done, a few patterns are remaining.
b) Small fixes (for example, those mentioned above).
c) Have an initial idea/prototype for implementing decision tree

1st June - 15th June
a) Implementing decision tree
b) Adding patterns in match.pd to test the decision tree in match.pd,
and accompanying test-cases in tree-ssa/match-*.c

16th June - 23rd June
a) Support for GENERIC code generation.
b) Refactoring and backup time for backlog.

GENERIC code generation:
I am a bit confused about this. Currently, pattern matching is
implemented for GENERIC. However I believe simplification is done on
GIMPLE.
For example:
(match_and_simplify
  (plus (negate @0) @1)
  (minus @0 @1))
If the given input is GENERIC, it would do matching on GENERIC, but would
transform (minus @0 @1) to its GIMPLE equivalent.
Is that correct?

* Should we have a separate GENERIC match-and-simplify API, like the one for
gimple, instead of having GENERIC matching in gimple_match_and_simplify?

* Do we add another pattern type, something like
generic_match_and_simplify that will do the transform on GENERIC
for example:
(generic_match_and_simplify
  (plus (negate @0) @1)
  (minus @0 @1))
would produce GENERIC equivalent of (minus @0 @1).

or maybe keep match_and_simplify, and tell the transform operand
to produce GENERIC.
Something like:
(match_and_simplify
  (plus (negate @0) @1)
  GENERIC: (minus @0 @1))

Another thing I would like to do in the first phase is figure out the
dependencies of tree-ssa-forwprop on GENERIC folding (for instance the
fold_comparison patterns).

Thanks and Regards,
Prathamesh


Re: we are starting the wide int merge

2014-05-19 Thread Richard Sandiford
Richard Sandiford  writes:
> Gerald Pfeifer  writes:
>> On Sat, 17 May 2014, Richard Sandiford wrote:
>>> To rule out one possibility: which GCC are you using for stage1?
>>
>> I think that may be the smoking gun.  When I use GCC 4.7 to bootstrap,
>> FreeBSD 8, 9 and 10 all build fine on i386 (= i486) and amd64.
>>
>> When I use the system compiler, which is GCC 4.2 on FreeBSD 8 and 9
>> and clang on FreeBSD 10, things fail on FreeBSD 10...
>>
>> ...with a bootstrap comparison failure of stages 2 and 3 on i386:
>> https://redports.org/~gerald/20140518230801-31619-208277/gcc410-4.10.0.s20140518.log
>
> Do you get exactly the same comparison failures using clang and GCC 4.2
> as the stage1 compiler?  That would rule out the system compiler
> miscompiling stage1.

I couldn't reproduce this with GCC 4.2 but I could with clang.
The problem is that the C++ frontend's template instantiation code has
several instances of foo (..., bar (...), bar (...), ...), where bar (...)
can create new decls.  The numbering of the decls can then depend on the
order in which the compiler chooses to evaluate the function arguments.  This
later causes code differences if the decl uids are used as tie-breakers to get
a stable sort.
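
A stand-alone illustration of the underlying language rule (the code below is
invented for illustration and is not the actual front-end code): the two
make_decl () calls may be evaluated in either order, so the ids they hand out
(standing in for decl uids) can differ from compiler to compiler.

  static int next_uid;

  static int
  make_decl (void)              /* stands in for decl creation */
  {
    return ++next_uid;
  }

  static void
  foo (int a, int b)            /* which argument got uid 1 is unspecified */
  {
    (void) a;
    (void) b;
  }

  void
  instantiate (void)
  {
    foo (make_decl (), make_decl ());
  }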

I was just unlucky that this happened to trigger for the new wi:: code. :-)

I'm testing a patch now.  It might need more than one iteration, but hopefully
I'll have something to submit tomorrow.

Thanks,
Richard


dynamic_cast of a reference and -fno-exceptions

2014-05-19 Thread Daniel Gutson
Hi,

  should gcc warn, at least, if a dynamic_cast of a reference is used when
-fno-exceptions is specified?  (A failing dynamic_cast to a reference type
has to throw std::bad_cast, which it cannot do with -fno-exceptions.)

At least 4.8.2 doesn't complain.

If so, I can implement the fix.

Example:

struct Base
{
virtual void f(){}
};

struct Der : Base {};

int main()
{
Der d;
Base& b = d;
dynamic_cast<Der&>(b);
}


  Daniel.

-- 

Daniel F. Gutson
Chief Engineering Officer, SPD


San Lorenzo 47, 3rd Floor, Office 5

Córdoba, Argentina


Phone: +54 351 4217888 / +54 351 4218211

Skype: dgutson



Re: negative latencies

2014-05-19 Thread shmeel gutl

On 19-May-14 01:02 PM, Ajit Kumar Agarwal wrote:

Is it the case of code speculation where the negative latencies are used?
No. It is an exposed pipeline where instructions read registers during
the required cycle. So if one instruction produces its results in the
third pipeline stage and a second instruction reads the register in the
sixth pipeline stage, the second instruction can read the results of the
first instruction even if it is issued three cycles earlier.
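
To make the arithmetic explicit with the numbers above: if the producer is
issued at cycle t, its result is available after stage 3, i.e. at cycle t + 3.
A consumer that reads its operand in stage 6 and is issued at cycle t - 3
performs that read at (t - 3) + 6 = t + 3, just in time, so the dependence
edge effectively has a latency of -3 cycles.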


Thanks & Regards
Ajit
-Original Message-
From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of shmeel 
gutl
Sent: Monday, May 19, 2014 12:23 PM
To: Andrew Pinski
Cc: gcc@gcc.gnu.org; Vladimir Makarov
Subject: Re: negative latencies

On 19-May-14 09:39 AM, Andrew Pinski wrote:

On Sun, May 18, 2014 at 11:13 PM, shmeel gutl
 wrote:

Are there hooks in gcc to deal with negative latencies? In other
words, an architecture that permits an instruction to use a result
from an instruction that will be issued later.

Do you mean bypasses?  If so there is a bypass feature which you can use:
https://gcc.gnu.org/onlinedocs/gccint/Processor-pipeline-description.h
tml#index-data-bypass-3773

Thanks,
Andrew Pinski

Unfortunately, bypasses in the pipeline description are not enough.
They only allow you to calculate the latency of true dependencies, and they
are also forced to be zero or greater. The real question is how the scheduler
and register allocator can deal with negative latencies.

Thanks
Shmeel

At first glance it seems that it will break a few things.
1) The definition of dependencies cannot come from the simple
ordering of rtl.
2) The scheduling problem starts to look like "get off the train 3
stops before me".
3) The definition of live ranges needs to use actual instruction
timing information, not just instruction sequencing.

The hooks in the scheduler seem to be enough to stop damage but not
enough to take advantage of this "feature".

Thanks











Zero/Sign extension elimination using value ranges

2014-05-19 Thread Kugan

This is based on my earlier patch
https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00452.html. Before I post
the new set of patches, I would like to make sure that I understood the
review comments and that my idea makes sense and is acceptable. Please let me
know if I am missing anything or my assumptions are wrong.

To recap the basic idea: when GIMPLE_ASSIGN stmts are expanded to RTL,
if we can prove that the zero/sign extension to fit the type is redundant,
we can generate RTL without it. For example, when an expression is
evaluated and its value is assigned to a variable of type short, the
generated RTL currently looks similar to (set (reg:SI 110)
(zero_extend:SI (subreg:HI (reg:SI 117) 0))). Using value ranges, if we
can show that the value of the expression present in register
117 is within the limits of short and there is no sign conversion, we do
not need to perform the zero_extend.

Cases to handle here are:

1.  Handling NOP_EXPR or CONVERT_EXPR that are in the IL because they
are required for type correctness. We have two cases here:

A) Mode is smaller than word_mode. This is usually where the zero/sign
extensions show up in the final assembly.
For example:
int = (int) short
which usually expands to
 (set (reg:SI 110)
  (sign_extend:SI (subreg:HI (reg:SI 117) 0)))
We can instead expand this to
 (set (reg:SI 110) (reg:SI 117))

if the following is true:
1. The values stored in the RHS and LHS are of the same signedness.
2. The type can hold the value, i.e., in cases like char = (char) short, we
check that the value in the short is representable in the char type (i.e. look
at the value range in the RHS SSA_NAME and see if it can be represented in
the type of the LHS without overflowing).

The subreg here is not a paradoxical subreg. We are removing the subreg and
the zero/sign extend here.

I am assuming here that QI/HI registers are represented in SImode
(basically word_mode), with a zero/sign extend used as in
(zero_extend:SI (subreg:HI (reg:SI 117) 0)).

B) Mode is larger than word_mode
 long = (long) int

which usually expands to
   (set:DI (sext:DI (reg:SI)))
We have to expand this as a paradoxical subreg:
   (set:DI (subreg:DI (reg:SI)))

I am not sure that these cases result in actual zero/sign extensions
being generated. Therefore I think we should skip this case altogether.


2. Second are promotions required by the target (PROMOTE_MODE) that do
arithmetic on wider registers, like:

char = char + char

In this case we will have the value ranges of the RHS char1 and char2. We
will have to compute the value range of (char1 + char2) in the promoted mode
(from the value ranges stored in the char1 SSA_NAME and char2 SSA_NAME) and
see if that value range can be represented in the LHS type.

Once again, if the following is true, we can remove the subreg and zero/sign
extension in the assignment (see the example below):
1. The values stored in the RHS and LHS are of the same signedness.
2. The type can hold the value.
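
As a concrete, made-up instance of case 2 (the function below is only an
illustration): with the masks shown, the value ranges of x and y are known to
be [0, 15], so the promoted sum is in [0, 30], both operands are unsigned, and
the final zero extension of the promoted SImode result back to unsigned char
is redundant.

  unsigned char
  f (unsigned char a, unsigned char b)
  {
    unsigned char x = a & 0x0f;   /* value range [0, 15] */
    unsigned char y = b & 0x0f;   /* value range [0, 15] */
    return x + y;                 /* [0, 30] fits in unsigned char */
  }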

Also, when the LHS is promoted and thus the target is (subreg:XX N), the
RHS has been expanded in XXmode. Depending on the value range, and when mode
XX is bigger than word_mode, set this to a paradoxical subreg of
the expanded result. However, since we are only interested in XXmode
smaller than word_mode (that is where most of the final zero/sign
extension asm comes from), we don’t have to consider paradoxical
subregs here.

Does this make sense?

Thanks,

Kugan


Re: Zero/Sign extension elimination using value ranges

2014-05-19 Thread Jakub Jelinek
On Tue, May 20, 2014 at 12:27:31PM +1000, Kugan wrote:
> 1.  Handling NOP_EXPR or CONVERT_EXPR that are in the IL because they
> are required for type correctness. We have two cases here:
> 
> A) Mode is smaller than word_mode. This is usually from where the
> zero/sign extensions are showing up in final assembly.
> For example :
> int = (int) short
> which usually expands to
>  (set (reg:SI 110)
>   (sign_extend:SI (subreg:HI (reg:SI 117) 0)))
> We can instead expand this to
>  (set (reg:SI 110) (reg:SI 117))
> 
> If following is true:
> 1. Value stored in RHS and LHS are of the same signedness
> 2. Type can hold the value. i.e., In cases like char = (char) short, we
> check that the value in short is representable char type. (i.e. look at
> the value range in RHS SSA_NAME and see if that can be represented in
> types of LHS without overflowing)
> 
> Subreg here is not a paradoxical subreg. We are removing the subreg and
> zero/sign extend here.
> 
> I am assuming here that QI/HI registers are represented in SImode
> (basically word_mode) with zero/sign extend is used as in
> (zero_extend:SI (subreg:HI (reg:SI 117)).

Wouldn't it be better to just set the proper flags on the SUBREG based on value
range info (SUBREG_PROMOTED_VAR_P and SUBREG_PROMOTED_UNSIGNED_P)?
Then not only could the optimizers eliminate the zext/sext when possible, but
all the other optimizations could benefit from that as well.
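
A minimal sketch of what that could look like at expand time (the two macros
are the existing ones; everything else below, including the
value_fits_narrow_type flag, is invented for illustration and is not a patch):

  /* REG holds the value in word_mode; take the HImode lowpart and, when
     value-range information proves the value already fits HImode with the
     right signedness, mark the subreg as promoted so later passes may drop
     the redundant extension.  */
  rtx sub = gen_rtx_SUBREG (HImode, reg, 0);
  if (value_fits_narrow_type)        /* assumed: answer from a VRP query */
    {
      SUBREG_PROMOTED_VAR_P (sub) = 1;
      SUBREG_PROMOTED_UNSIGNED_SET (sub, unsignedp);
    }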

Jakub