Re: [GSoC] How to get started with the isl code generation

2014-05-25 Thread Roman Gareev
Hi Tobias,

I tried to incorporate all your comments in the following patch. It
also contains traversing of ISL AST and its dump to a file. You can
find out more about this at the following link
http://romangareev.blogspot.ru/2014/05/gsoc-report-i.html

--

Cheers, Roman Gareev


patch
Description: Binary data


RE: Reducing Register Pressure through Live range Shrinking through Loops!!

2014-05-25 Thread Ajit Kumar Agarwal

On  Friday, May 23, 2014 1:46 AM Vladimir Makarov wrote:
On 05/21/2014 12:25 AM, Ajit Kumar Agarwal wrote:
> Hello All:
>
> Simpson does the Live range shrinking and reduction of register 
> pressure by using the computation that are not load and store but the 
> arithmetic computation. The computation where the operands and registers are 
> live at the entry and exit of the basic block but not touched inside the 
> block then the computation is moved at the end of the block the reducing the 
> register pressure inside the block by one.  Extension of the Simpson work by 
> extending the computation not being touched inside the basic block to the 
> spanning of the Loops. If the Live ranges spans the Loops and live at the 
> entry and exit of the Loop but the computation is not being touched inside 
> the Loops then the computation is moved after the exit of the Loop.
>
> REDUCTION OF REGISTER PRESSURE THROUGH LIVE RANGE SHRINKING INSIDE THE 
> LOOPS
>
> for each Loop starting from inner to outer do the following begin
> RELIEFIN(i) = null if i is the entry of the cfg.
> Else
> For all predecessors j RELIEFOUT(j)
> RELIEFOUT(i) = RELIEFIN(i) exposed union relief
> INSERT(I,j) = RELIEFOUT(i) RELIEFIN(i) Intersection
> Live(i)
> end
>
> The Simpson approach does takes the nesting depth into consideration 
> of placing the computation and the relieve of the register pressure. Simpson 
> approach doesn't takes into consideration the computation which spans 
> throughout the loop and the operands and results are live at the entry of the 
> Loop and exit of the Loop but not touched inside the Loops can be useful in 
> reduction of register pressure inside the Loops. This  approach will be 
> useful in Region Based Register Allocator for Live Range Splitting at the 
> Region Boundaries.
>
> Extension of the Simpson approach is to consider the data flow 
> analysis with respect to the given Loop rather than having it for 
> entire control flow graph. This data flow analysis starts from the 
> inner loop and extends it to the outer loop. If the reference is not 
> through the nested depth or with some depth then the computation can 
> be placed accordingly. For register allocator by Graph coloring the 
> live ranges that are with respect to operands and results of the 
> computation are taken into consideration and for the above approach 
> put into the stack during simplification phase of Graph Coloring so 
> that there is a chance of getting such Live ranges colorable and thus 
> reduces the register pressure. This is extended to splitting approach 
> based on containment of Live ranges
>
> OPTIMAL PLACEMENT OF THE COMPUTATION FOR SINGLE ENTRY AND MULTIPLE 
> EXIT LOOPS
>
> The placement of the computation to reduce the register pressure for 
> Single Entry and Multiple exit by Simpson approach lead to unoptimal 
> solution. The unoptimal Solution is because of the exit node of the loop does 
> not post dominates all the basic block inside the Loops. Due to this the 
> placement of the computation just after the tail block of the Loop will lead 
> to incorrect results. In order to perform the Optimal Solution of the 
> placement of the computation, the computation needs to be placed the block 
> just after all the exit points of the Loop reconverge and which will post 
> dominates all the blocks of the Loops. This will take care of reducing the 
> register pressure for the Loops that are single Entry and Multiple Exit. For 
> irreducible Loops the optimization to convert to reducible is done before the 
> register allocation that reduces the register pressure and will be applicable 
> to structured control flow and thus reduces the register pressure.
>
> The Live range shrinkage reducing register pressure takes load and store into 
> consideration but not computation as proposed by Simpson. I am proposing to 
> extend in GCC for the computation to reduce register pressure and for the 
> Loop as given above for both Single Entry and Single Exit and Single Entry 
> and Multiple Exit Loops.
>
>
>>Thanks for sharing with this.  I have plans to work on a new register 
>>pressure relief pass too this summer (more accurately this work is in my 
>>company plans -- so I should work on this >>anyway).  It will use Simpson's 
>>approach also.  I prefer to do it as a separate pass not as a part of IRA 
>>because IRA is already complicated.  It also permits to rematerialize not 
>>only on loop  >>borders (although it is the most important points).

>>It is hard for me to say what approach will be better as there are too many 
>>transformations even after IRA (e.g. in IRA you can think that pseudos got 
>>hard registers and rematerilize >>from that but LRA may spill this pseudos 
>>and it is more risk of this for x86/x86-64 than for other architectures as 
>>x86/x86-64 uses irregular reg file and has smaller number of regs).

>>So we could implement two approaches and choose the best if you want.


>>As for optimal placement o

Re: [GSoC] How to get started with the isl code generation

2014-05-25 Thread Tobias Grosser



On 25/05/2014 13:12, Roman Gareev wrote:

Hi Tobias,

I tried to incorporate all your comments in the following patch. It
also contains traversing of ISL AST and its dump to a file. You can
find out more about this at the following link
http://romangareev.blogspot.ru/2014/05/gsoc-report-i.html


Hi Roman,

thanks for this update (and expecially the nice blog post).
Also the size of the patch is great. Small enough to be easy to review
and still adding good features.


diff --git a/gcc/common.opt b/gcc/common.opt
index 5c3f834..005042d 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -1250,6 +1250,10 @@ fgraphite-identity
 Common Report Var(flag_graphite_identity) Optimization
 Enable Graphite Identity transformation

+fgraphite-isl-code-gen
+Common Report Var(flag_graphite_isl_code_gen) Optimization
+Enable isl code generator in Graphite


Can we use something like:

-fgraphite-code-generator=[isl|cloog]

This allows us to start with cloog as default and then flip the switch
as soon as we are confident enough.


diff --git a/gcc/graphite-clast-to-gimple.c b/gcc/graphite-clast-to-gimple.c
index 9ac9b67..d16439d 100644
--- a/gcc/graphite-clast-to-gimple.c
+++ b/gcc/graphite-clast-to-gimple.c
@@ -28,6 +28,7 @@ along with GCC; see the file COPYING3.  If not see
 #include 
 #include 
 #include 
+#include 


Can we put this into a new file? clast stands for CLOOG-AST. I would
prefer to have a new file ala graphite-isl-ast-to-gimple.c or
graphite-ast-generation.c



 #include 
 #include 
 #endif
@@ -1657,6 +1658,221 @@ debug_generated_program (scop_p scop)
   print_generated_program (stderr, scop);
 }

+
+/* Integration of ISL code generator. */
+
+/* TODO: eliminate this function as soon as possible */
+
+void
+print_indent (int indent, isl_printer *prn)
+{
+  int i;
+  for (i = 0; i < indent; i++)
+{
+  prn = isl_printer_print_str (prn, " ");
+}
+}


Why is this function necessary in the first place? Are you aware of
isl_printer_set_indent() and isl_printer_set_indent_prefix()?


+static void
+translate_isl_ast (int indent, __isl_keep isl_ast_node *node,
+  isl_printer *prn);
+
+/* Translates a isl_ast_node_for STMT to gimple.
+   TODO: replace output to FILE with translation
+   to GCC representation. */
+
+static void
+translate_isl_ast_node_for (int indent, __isl_keep isl_ast_node *node_for,
+   isl_printer *prn)
+{
+  isl_id *id;
+  const char *name;
+  const char *type;
+  type = isl_options_get_ast_iterator_type (isl_printer_get_ctx (prn));
+  isl_ast_expr *for_iterator = isl_ast_node_for_get_iterator (node_for);
+  isl_ast_expr_free (for_iterator);
+  id = isl_ast_expr_get_id (for_iterator);
+  name = isl_id_get_name (id);
+  isl_id_free (id);
+  prn = isl_printer_print_str (prn, "\n");
+  print_indent (indent, prn);
+  prn = isl_printer_print_str (prn, "for (");
+  prn = isl_printer_print_str (prn, type);
+  prn = isl_printer_print_str (prn, " ");
+  prn = isl_printer_print_str (prn, name);
+  prn = isl_printer_print_str (prn, " = ");
+  isl_ast_expr *for_init = isl_ast_node_for_get_init (node_for);
+  prn = isl_printer_print_ast_expr (prn, for_init);
+  isl_ast_expr_free (for_init);
+  prn = isl_printer_print_str (prn, "; ");
+  isl_ast_expr *for_cond = isl_ast_node_for_get_cond (node_for);
+  prn = isl_printer_print_ast_expr (prn, for_cond);
+  isl_ast_expr_free (for_cond);
+  prn = isl_printer_print_str (prn, "; ");
+  prn = isl_printer_print_str (prn, name);
+  prn = isl_printer_print_str (prn, " += ");
+  isl_ast_expr *for_inc = isl_ast_node_for_get_inc (node_for);
+  prn = isl_printer_print_ast_expr (prn, for_inc);
+  isl_ast_expr_free (for_inc);
+  prn = isl_printer_print_str (prn, ")");
+  isl_ast_node *for_body = isl_ast_node_for_get_body (node_for);
+  translate_isl_ast (indent + 2, for_body, prn);
+  isl_ast_node_free (for_body);
+}
+
+/* Translates a clast guard statement STMT to gimple.
+   TODO: replace output to FILE with translation
+   to GCC representation. */
+
+static void
+translate_isl_ast_node_if (int indent, __isl_keep isl_ast_node *node_if,
+  isl_printer *prn)
+{
+  prn = isl_printer_print_str (prn, "\n");
+  print_indent (indent, prn);
+  prn = isl_printer_print_str (prn, "if (");
+  isl_ast_expr *if_cond = isl_ast_node_if_get_cond (node_if);
+  prn = isl_printer_print_ast_expr (prn, if_cond);
+  isl_ast_expr_free (if_cond);
+  prn = isl_printer_print_str (prn, ")");
+  isl_ast_node *if_then = isl_ast_node_if_get_then (node_if);
+  translate_isl_ast (indent + 2, if_then, prn);
+  isl_ast_node_free (if_then);
+  if (isl_ast_node_if_has_else (node_if))
+{
+  prn = isl_printer_print_str (prn, "else");
+  isl_ast_node *if_else = isl_ast_node_if_get_else(node_if);
+  translate_isl_ast (indent + 2, if_else, prn);
+  isl_ast_node_free (if_else);
+}
+}
+
+/* Translates a clast guard statement STMT to gimple.
+   TODO: replace output to FILE with translation
+   to GCC representati

compiler output conflict

2014-05-25 Thread John
I ran across this puzzling difference between gcc and llvm today and think the 
specification to produce consistent output for this code should be worked out.
http://stackoverflow.com/questions/15929795/llvm-and-gcc-different-output-same-code/23856132#23856132

I presented this in the Freenode #gcc chatroom too.

John T


http://www.mozilla.org  Firefox browser, Thunderbird email, Seamonkey 
all-in-one, Sunbird calendar and more. Free open-source software for Windows, 
Linux, Mac OS and other systems


gcc-4.10-20140525 is now available

2014-05-25 Thread gccadmin
Snapshot gcc-4.10-20140525 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.10-20140525/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.10 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 210914

You'll find:

 gcc-4.10-20140525.tar.bz2Complete GCC

  MD5=ccf9e2bcd55ba1bee825b3c509a48d7d
  SHA1=6ae8bf8f0ef9d563aede862858cca0cc3c380c67

Diffs from 4.10-20140518 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.10
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


RFA: [VAX] SUBREG of MEM with a mode dependent address

2014-05-25 Thread Matt Thomas

GCC 4.8 for VAX is generating a subreg:HI for mem:SI indexed address.  This 
eventually gets caught by an assert in change_address_1.  Since the MEM rtx is 
SI, legimate_address_p thinks it's fine.  

I have a change to vax.md which catches these but it's extremely ugly and I 
have to think there's a better way.  But I have to wonder why is gcc even 
constructing a subreg of a mem with a mode dependent address.  

(gdb) call debug_rtx(insn)
(insn 73 72 374 12 (set (reg/v:HI 0 %r0 [orig:29 iCol ] [29])
(subreg:HI (mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] 
[22])
(const_int 4 [0x4]))
(reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101])) [4 MEM[base: 
_154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movhi_2}
 (nil))

Since this wasn't movstricthi, this could be rewritten to avoid the subreg and 
just treat %r0 as SI as in:

(insn 73 72 374 12 (set (reg/v:SI 0 %r0 [orig:29 iCol ] [29])
(mem/c:SI (plus:SI (mult:SI (reg/v:SI 10 %r10 [orig:22 i ] [22])
(const_int 4 [0x4]))
(reg/v/f:SI 11 %r11 [orig:101 aiCol ] [101]) [4 MEM[base: 
_154, offset: 0B]+0 S4 A32]) 0)) sqlite3.c:92031 13 {movsi_2}

But even if  movhi is a define_expand, as far as I can tell there's isn't 
enough info to know whether that is possible.  At that time, how can I tell 
that operands[0] will be a hard reg or operands[1] will be subreg of a mode 
dependent memory access?

I've tried using secondary_reload and it called called with 

(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

but it dies in change_address_1 before invoking the code returned in sri.

I've tracked this down to reload replacing (reg:SI 113) with reg_equiv_mem 
(133) in the rtx.  However, it doesn't verify the rtx is actually valid.  I 
added a gcc_assert to trap this and got:

#1  0x0089ab87 in eliminate_regs_1 (x=0x7f7fe7b5c498, 
mem_mode=VOIDmode, insn=0x0, may_use_invariant=true, for_costs=true)
at 
/u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:2850(gdb)
 list
2845  && reg_equivs
2846  && reg_equiv_memory_loc (REGNO (SUBREG_REG (x))) != 0)
2847{
2848  new_rtx = SUBREG_REG (x);
2849  rtx z = reg_equiv_memory_loc (REGNO (new_rtx));
2850  gcc_assert (memory_address_addr_space_p (GET_MODE (x),
2851   XEXP (z, 0),
2852   MEM_ADDR_SPACE (z)));
2853}
2854  else
(gdb) call debug_rtx(z)
(mem:SI (plus:SI (mult:SI (reg/v:SI 22 [ i ])
(const_int 4 [0x4]))
(reg/v/f:SI 101 [ aiCol ])) [4 MEM[base: _154, offset: 0B]+0 S4 A32])
(gdb) call debug_rtx(x)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)

#2  0x0089cb31 in elimination_costs_in_insn (insn=0x7f7fe7b5bbd0)
at 
/u1/netbsd-HEAD/src/tools/gcc/../../external/gpl3/gcc/dist/gcc/reload1.c:3751
(gdb) call debug_rtx (insn)
(insn 73 72 374 12 (set (nil)
(subreg:HI (reg:SI 113 [ MEM[base: _154, offset: 0B] ]) 0)) 
/u1/netbsd-HEAD/src/external/public-domain/sqlite/lib/../dist/sqlite3.c:92031 
14 {movhi}
 (expr_list:REG_DEAD (reg:SI 113 [ MEM[base: _154, offset: 0B] ])
(nil)))

And now I'm stymied.  The limits of gcc-ness are now exceeded :)  I'n looking 
for ideas on how to proceed.

Thanks.

Re: negative latencies

2014-05-25 Thread shmeel gutl

On 23-May-14 01:59 PM, Bernd Schmidt wrote:

On 05/23/2014 10:07 AM, shmeel gutl wrote:


Exposed pipeline is not my problem. Negative latency is my problem. I
don't see negative latency for c6x, not in unit reservations and not in
adjust cost. Did I miss something?


You just need to model it differently. Rather than saying instruction 
A has a negative latency relative to instruction B, you need to 
describe that instruction B reads its inputs later than when it is 
actually issued. The mechanism used in the C6X backend is the 
scheduler's record_delay_slot_pair function.


The scheduler would see

B is issued (*)

A is issued, executes and writes its outputs

B reads its inputs (*)

The two insns marked as (*) would be such a delay pair. The first one 
would generate code, the second one exists only for the purposes of 
building the right scheduling dependencies.



Bernd

Okay, I think that I have the idea. But I would still need to backtrack 
if the enabling instruction is not issued on time. I would also need to 
delay the dependent instruction if I can see in advance that the 
producer cannot be issued on time. And, as Vladimir pointed out, I need 
to watch out for various passes inserting unwanted instructions. Sounds 
like a big project.




Re: negative latencies

2014-05-25 Thread shmeel gutl

On 23-May-14 05:20 PM, Vladimir Makarov wrote:

On 2014-05-23, 3:49 AM, shmeel gutl wrote:

On 21-May-14 06:30 PM, Vladimir Makarov wrote:

I am just curious what happens when you put

insn2, insn1.

and insn2 uses a result of insn1 in 6 cycles and insn1 producing the
result in 3 cycles, but there are not ready functional units (e.g.
arithmentic units) necessary for insn1 for 4 or more cycles. It is
quite not trivial to guarantee that everything will be okay in general
case if you put insn2 before insn1.


This is not a problem for this architecture. The units are fully
pipelined and the only conflicts are in the first stage, during
instruction issue. That is, the vliw must be legal. The gcc dfa handles
this case fine.



Another problem is that besides insn-scheduler there are a lot of 
optimizations which can insert some insns between the two insns after 
scheduling.  In this case the result might be not ready for insn2.


So you at least should exclude 1st insn scheduling (before RA) and 
make 2nd insn scheduling as the very last pass.


In general, a traditional approach is to do such things on assembler 
level (e.g. as for older MIPS processor without hardware interlocks).



There is also IRA. I would need to tweak the definition of live range.



Re: compiler output conflict

2014-05-25 Thread Jakub Jelinek
On Sun, May 25, 2014 at 03:55:38PM -0700, John wrote:
> I ran across this puzzling difference between gcc and llvm today and think 
> the specification to produce consistent output for this code should be worked 
> out.
> http://stackoverflow.com/questions/15929795/llvm-and-gcc-different-output-same-code/23856132#23856132

In what way you misunderstand ISO C99, 6.5.2.2/10:
"The order of evaluation of the function designator, the actual arguments, and
subexpressions within the actual arguments is unspecified, but there is a
sequence point before the actual call."

Both compilers give sensible output for the bad testcase.
Please follow on (if at all needed) to gcc-help mailing list, this has
nothing to do with GCC development.

Jakub