Re: Anyone used Graphite of Gentoo recently?

2014-03-31 Thread Richard Biener
On Mon, Mar 31, 2014 at 8:23 AM, Tobias Grosser  wrote:
> On 03/31/2014 06:25 AM, Vladimir Kargov wrote:
>>
>> On 27 March 2014 18:39, Mircea Namolaru  wrote:
>>>
>>> The domain is computed on the basis of the information provided by
>>> number_of_latch_executions, which returns the tree expression
>>>
>>> (unsigned int) maxLen_6(D) - (unsigned int) minLen_4(D)
>>>
>>> For signed integers (without the cast to unsigned int) this seems to be
>>> the correct value. As an unsigned int expression it leads to an
>>> incorrect domain.
>>
>>
>> I've got a couple of questions with regards to wrapping. Pasting the
>> relevant code from graphite-sese-to-poly.c:
>>
>> static isl_pw_aff *
>> extract_affine (scop_p s, tree e, __isl_take isl_space *space)
>> {
>> ...
>>   // e comes from number_of_latch_executions()
>>   type = TREE_TYPE (e);
>>   if (TYPE_UNSIGNED (type))
>>     res = wrap (res, TYPE_PRECISION (type));
>>
>> 1) Any idea why wrapping occurs only for unsigned expressions?
>
>
> In the C standard, unsigned operations have defined overflow semantics in
> the form of wrapping. Signed operations instead have undefined behavior, which
> means we can assume that no wrapping occurs and consequently do not need to
> model it.
>
>> 2) What exactly is wrapping needed for? I can't find it in the old PPL
>> implementation (though it could have been called differently).
>
>
> It is necessary to model the defined overflow behavior of unsigned integers.
>
>> Also, no matter what the logic behind wrapping is, it does seem
>> suspicious that this code checks the signedness of the result of
>> number_of_latch_executions() (which may be arbitrary, as far as I
>> understood), and not of the loop induction variable, for example.
>
>
> It is very suspicious. It is a rather difficult topic and I doubt it is
> implemented correctly. Also, I do not think that implementing wrapping this
> way is the right approach. Instead, we should just model it to
> extract a run-time check that verifies that no wrapping occurs and then go
> on without wrapping.
>
> Regarding this bug there are two directions to go:
>
> 1) It is necessary to understand if we need to do wrapping on the result of
> number_of_latch_executions. Meaning, we should understand the semantics of
> this function and the expression it returns.
>
> 2) There is something else happening?? From my observations it seems
> that the generated code at least for this test case should behave correctly
> and the relevant loop should be executed exactly twice. Surprisingly this is
> not the case. It would be interesting to understand what exactly prevents
> this loop from being executed. (From the clast, this is not obvious to me).

Note that there are always two sides of the "undefined overflow" thing

 1. in the input IL whether an operation may overflow or not (check
TYPE_OVERFLOW_UNDEFINED, _not_ !TYPE_UNSIGNED)
 2. in the GIMPLE IL generated from clast - all operations that satisfy
TYPE_OVERFLOW_UNDEFINED may not have different overflow behavior
than in the original untransformed program (no intermediate overflows
unless they were already present).  Usually that's not very easy to guarantee,
thus re-writing everything into unsigned arithmetic may be easiest.

Richard.

> Tobi
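
A minimal C sketch of the wrapping behaviour discussed in this message. It only
illustrates the arithmetic of the quoted expression
(unsigned int) maxLen - (unsigned int) minLen; the helper name
latch_count_unsigned is hypothetical and is not part of GCC or Graphite.

```c
#include <limits.h>

/* Hypothetical helper: evaluates the quoted tree expression
   (unsigned int) maxLen - (unsigned int) minLen in plain C.  */
static unsigned int
latch_count_unsigned (int min_len, int max_len)
{
  /* Unsigned subtraction is defined to wrap modulo UINT_MAX + 1, so a
     "negative" difference becomes a very large positive value.  */
  return (unsigned int) max_len - (unsigned int) min_len;
}
```

With min_len = 3 and max_len = 5 the helper returns 2. With the arguments
swapped it returns UINT_MAX - 1 rather than -2, which is the kind of value
that would enlarge a domain derived from the unsigned expression.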


Re: Request for discussion: Rewrite of inline assembler docs

2014-03-31 Thread Andrew Haley
On 03/31/2014 05:44 AM, dw wrote:
> So, after looking over this discussion, I have updated the text. This 
> time no undefined terms, while still conveying all the points I had in mind:
> 
> The "memory" clobber tells the compiler that the assembly code performs 
> memory reads or writes to items other than those listed in the input and 
> output operands (for example accessing the memory pointed to by one of 
> the input parameters).  To ensure memory contains correct values, GCC 
> may need to flush specific register values to memory before executing 
> the asm. Further, the compiler will not assume that any values read from 
> memory before the @code{asm} will remain unchanged after the @code{asm}; 
> it will reload them as needed.  This effectively forms a read/write 
> memory barrier for the compiler.
> 
> Note that this clobber does not prevent the @emph{processor} from doing 
> speculative reads past the @code{asm} statement. To stop that, you need 
> processor-specific fence instructions.
> 
> Objections?

No, none.  That's fine.

Andrew.
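
A minimal sketch of the compiler-level barrier described in the proposed text,
assuming GCC's extended asm syntax. The function names compiler_barrier and
sum_of_two_reads are hypothetical and chosen only for illustration.

```c
/* Compiler-only barrier: an empty asm with a "memory" clobber.
   It emits no instructions and is not a processor fence.  */
static inline void
compiler_barrier (void)
{
  __asm__ __volatile__ ("" : : : "memory");
}

/* Hypothetical example: because of the clobber, GCC does not assume
   that *p is unchanged across the asm and reloads it as needed.  */
static int
sum_of_two_reads (const int *p)
{
  int before = *p;
  compiler_barrier ();
  int after = *p;
  return before + after;
}
```

Without the barrier the compiler would be free to read *p once and reuse the
value. Ordering as seen by other processors still needs a processor-specific
fence instruction, as the proposed text notes.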




Re: Anyone used Graphite of Gentoo recently?

2014-03-31 Thread Tobias Grosser

On 03/31/2014 10:10 AM, Richard Biener wrote:
> On Mon, Mar 31, 2014 at 8:23 AM, Tobias Grosser  wrote:
>> On 03/31/2014 06:25 AM, Vladimir Kargov wrote:
>>
>> Regarding this bug there are two directions to go:
>>
>> 1) It is necessary to understand if we need to do wrapping on the result of
>> number_of_latch_executions. Meaning, we should understand the semantics of
>> this function and the expression it returns.
>>
>> 2) There is something else happening?? From my observations it seems
>> that the generated code at least for this test case should behave correctly
>> and the relevant loop should be executed exactly twice. Surprisingly this is
>> not the case. It would be interesting to understand what exactly prevents
>> this loop from being executed. (From the clast, this is not obvious to me).
>
> Note that there are always two sides of the "undefined overflow" thing
>
>   1. in the input IL whether an operation may overflow or not (check
> TYPE_OVERFLOW_UNDEFINED, _not_ !TYPE_UNSIGNED)


Thanks Richi. I was not aware of this, but we really should check for 
TYPE_OVERFLOW_UNDEFINED instead.


I think for now we may want to limit ourselves to 
TYPE_OVERFLOW_UNDEFINED, as the wrapping code is not very reliable and 
will cause very ugly code.



>   2. in the GIMPLE IL generated from clast - all operations that satisfy
> TYPE_OVERFLOW_UNDEFINED may not have different overflow behavior
> than in the original untransformed program (no intermediate overflows
> unless they were already present).  Usually that's not very easy to guarantee,
> thus re-writing everything into unsigned arithmetic may be easiest.


The code we generate should not have any overflows. If the input already 
contains defined (wrapping) overflows, we had best bail out. This is the 
safest approach. Anything else requires more investigation.


Cheers,
Tobias
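
A minimal sketch of the "check, otherwise bail out" idea from this thread,
written as plain C rather than as Graphite code. The helper
latch_count_checked is hypothetical; it only shows a guard that rejects the
case in which the unsigned subtraction would wrap.

```c
#include <stdbool.h>

/* Hypothetical guard: computes max_len - min_len only when the unsigned
   subtraction cannot wrap.  Returns false otherwise, so that the caller
   can bail out and keep the original, untransformed code.  */
static bool
latch_count_checked (unsigned int min_len, unsigned int max_len,
                     unsigned int *count)
{
  if (max_len < min_len)
    return false;               /* The subtraction would wrap.  */
  *count = max_len - min_len;
  return true;
}
```

A caller would take the transformed path only when the function returns true,
and would otherwise fall back to the original loop.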



VREGS fails to handle subreg of mem

2014-03-31 Thread Claudiu Zissulescu
Hi,

In our ARC port, we found the following situation after expand:

(insn 23 22 24 5 (set (reg:SI 176)
        (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 147 virtual-stack-vars)
                    (const_int -268 [0xfef4])) [3 tmpoutst.st_size+0 S8 A32]) 4)) t02.c:64 -1
     (nil))

The virtual-stack-vars register should be handled by GCC's VREGS step, in 
instantiate_virtual_regs_in_insn(). However, this does not happen because the 
function is not designed to handle subregs of a mem. As a consequence, 
virtual-stack-vars is not eliminated, and the compilation fails later on. To 
solve this issue, I am proposing the attached patch to vregs, which makes 
instantiate_virtual_regs_in_insn() handle this situation.

Can you please let me know if this is an acceptable solution for the given 
issue?

//Claudiu


function.c.patch


[Question, IRA] Different IRA behaviour with the same RTL input but trivially different CFG

2014-03-31 Thread Felix Yang
Hi Vladimir,

I think that IRA should give the same result for the same RTL
input, but I find that this is not always true.
I tested IRA with two inputs, say X and Y. The RTL insns are the
same (ignoring the UIDs), and the only difference between the two is the
CFG.
Two blocks in X are merged into one block in Y.
Only one edge exists between the two blocks in X.
After IRA processing, I find that the move insns emitted by ira-emit.c
are different, and this causes performance issues in the target code.
What are the possible reasons for this? Since IRA is somewhat
complex, can you give me some suggestions please?
Many thanks.

Cheers,
Felix


Re: VREGS fails to handle subreg of mem

2014-03-31 Thread Eric Botcazou
> In our ARC port, we found the following situation after expand:
> 
> (insn 23 22 24 5 (set (reg:SI 176)
> (subreg:SI (mem/c:DI (plus:SI (reg/f:SI 147 virtual-stack-vars)
> (const_int -268 [0xfef4])) [3
> tmpoutst.st_size+0 S8 A32]) 4)) t02.c:64 -1 (nil))
> 
> The virtual-stack-vars should be handled by GCC's VREGS step, in
> instantiate_virtual_regs_in_insn(). However, this is not happening as the
> subroutine is not designed to handle subregs of a mem. As a consequence,
> virtual-stack-vars is not eliminated, and the compilation fails later on.
> To solve this issue, I am proposing the attached patch on vregs, that
> implements handling of such situation by
> instantiate_virtual_regs_in_insn().
> 
> Can you please let me know if this is an acceptable solution for the given
> issue?

Very likely not: there should be no SUBREGs of MEMs after expand.

-- 
Eric Botcazou


Re: RL78 sim?

2014-03-31 Thread DJ Delorie

> So far I've been testing with hardware but I'm pretty sure I read 
> somewhere about an RL78 simulator, which would be a useful addition. 
> Does this simulator exist, and if so, how do I run the tests against it?

The simulator is part of the GDB build.

> I tried 'make -k check RUNTESTFLAGS="--target_board=rl78-sim"' but in 
> amongst the errors I see 'ERROR: couldn't load description file for 
> rl78-sim', either it has a different name or I'm missing something on my 
> system (and a quick search didn't seem to find anything but I don't 
> really know what I'm looking for).

You'll need something like this in your local ${DEJAGNU} file:

{ "rl78*-*" } {
    set boards_dir "/home/dj/dejagnu/baseboards"
    set target_list { rl78-sim }
}

Here's my rl78-sim.exp for dejagnu (it goes in whatever directory you
specified above):

# This is a list of toolchains that are supported on this board.
set_board_info target_install {rl78-elf}

# Load the generic configuration for this board. This will define a basic set
# of routines needed by the tool to communicate with the board.
load_generic_config "sim"

# basic-sim.exp is a basic description for the standard Cygnus simulator.
load_base_board_description "basic-sim"

# "rl78" is the name of the sim subdir.
setup_sim rl78

# No multilib options needed by default.
process_multilib_options ""

# We only support newlib on this target. We assume that all multilib
# options have been specified before we get here.

set_board_info compiler  "[find_gcc]"
set_board_info cflags    "[libgloss_include_flags] [newlib_include_flags] -msim"
set_board_info ldflags   "[libgloss_link_flags] [newlib_link_flags]"

# Doesn't pass arguments or signals, can't return results, and doesn't
# do inferiorio.
set_board_info noargs 1
set_board_info gdb,nosignals 1
set_board_info gdb,noresults 1
set_board_info gdb,noinferiorio 1

# Limit the stack size to something real tiny.
set_board_info gcc,stack_size 4096

set_board_info gcc,timeout 300