missing symbols in libstdc++.so.6 built from the 4.9 branch

2014-07-01 Thread Matthias Klose
On some Linux architectures, some symbols are missing in libstdc++.so.6
built from the 4.9 branch.  I didn't notice this before due to a packaging bug.
Affected are ARM32, HPPA and SPARC.

 - ARM32 (build log [1], both soft and hard float) is missing
     __aeabi_atexit@CXXABI_ARM_1.3.3
     __aeabi_vec_*

   Can these be ignored?

 - HPPA (build log [2]) is missing all the future_base symbols, the
   exception_ptr13exception symbols, current_exception and
   rethrow_exception.

 - SPARC (build log [3]) configured for sparc64-linux-gnu is missing
   symbols in the 32bit multilib build, although these are present
   in a sparc-linux-gnu build. Missing are the same ones as in the HPPA
   build, as well as long double 128 related symbols, numeric_limits, and
   some math symbols.

   It looks like more than one issue is involved; I remember that the
   math symbols were already dropped in earlier versions for other
   architectures. The build is configured --with-long-double-128.

Matthias

[1] https://buildd.debian.org/status/fetch.php?pkg=gcc-4.9&arch=armhf&ver=4.9.0-8&stamp=1403809654
[2] http://buildd.debian-ports.org/status/fetch.php?pkg=gcc-4.9&arch=hppa&ver=4.9.0-9&stamp=1404018503
[3] http://buildd.debian-ports.org/status/fetch.php?pkg=gcc-4.9&arch=sparc64&ver=4.9.0-9&stamp=1404033854



Re: missing symbols in libstdc++.so.6 built from the 4.9 branch

2014-07-01 Thread Jonathan Wakely
On 1 July 2014 09:40, Matthias Klose wrote:
>  - HPPA (build log [2]), is missing all the future_base symbols and
>exception_ptr13exception symbols, current_exception and
>rethrow_exception.

This implies ATOMIC_INT_LOCK_FREE <= 1 for that target. Our future and
exception_ptr implementations rely on usable atomics.

I don't know about the other missing symbols.
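Whether a given target falls into that case can be checked directly. A minimal C11 sketch, assuming a compiler that provides <stdatomic.h>; the helper libstdcxx_builds_exception_ptr is hypothetical and only encodes the condition stated above (future and exception_ptr need ATOMIC_INT_LOCK_FREE to be greater than 1):

```c
#include <assert.h>
#include <stdatomic.h>

/* ATOMIC_INT_LOCK_FREE is 0 (never lock-free), 1 (sometimes lock-free)
   or 2 (always lock-free).  */
static const char *
lock_free_kind (int value)
{
  switch (value)
    {
    case 0: return "never";
    case 1: return "sometimes";
    case 2: return "always";
    default: return "invalid";
    }
}

/* Hypothetical helper mirroring the condition described above:
   future and exception_ptr are only provided when ints are always
   lock-free, i.e. ATOMIC_INT_LOCK_FREE > 1.  */
static int
libstdcxx_builds_exception_ptr (int value)
{
  return value > 1;
}

/* The value for the target this file is compiled for.  */
static const int target_int_lock_free = ATOMIC_INT_LOCK_FREE;
```

For a cross compiler the predefined macro can also be inspected without running anything, e.g. with echo | TARGET-gcc -dM -E - | grep ATOMIC_INT_LOCK_FREE, where TARGET stands for the target triplet.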


Re: Question about GCC's standard dependent optimization

2014-07-01 Thread Bin.Cheng
On Mon, Jun 30, 2014 at 5:42 PM, Jeff Law  wrote:
> On 06/26/14 14:13, Jeff Law wrote:
>>
>> On 06/26/14 02:44, Bin.Cheng wrote:
>>>
>>> Hi,
>>> I ran into PR60947, in which GCC knows that, according to the standard,
>>> the return value of memset is the first argument passed in, and then
>>> performs an optimization like the one below:
>>>  mov    ip, sp
>>>  stmfd  sp!, {r4, r5, r6, r7, r8, r9, r10, fp, ip, lr, pc}
>>>  sub    fp, ip, #4
>>>  sub    sp, sp, #20
>>>  ldr    r8, [r0, #112]
>>>  add    r3, r8, #232
>>>  add    r4, r8, #328
>>> .L1064:
>>>  mov    r0, r3
>>>  mov    r1, #255
>>>  mov    r2, #8
>>>  bl     memset
>>>  add    r3, r0, #32   <X>
>>>  cmp    r3, r4
>>>  bne    .L1064
>>>
>>> For the X insn, GCC takes advantage of the standard by using the returned
>>> r0 directly.
>>>
>>> My question is: is it always safe for GCC to do such an optimization?  Do
>>> we have an option to disable such standard-dependent optimizations?
>>
>> Others have already answered this question.
>>
>> FWIW, I just locally added the capability to track equivalences between
>> the destination argument to the various mem* str* functions and their
>> return value in DOM.  It triggers, but not terribly often.  I'll be
>> looking to see if the additional equivalences actually enable any
>> optimizations before going through the full bootstrap and test.
>
> Just as a follow-up.  This turns out to be a relatively bad idea as it gets
> in the way of tail call optimizations.
>
> Probably the only place where this is going to really be useful is in the
> allocators to allow us to cheaply rematerialize values and/or tie together
> two values that normally wouldn't be seen as related to each other.

It also restricts the inlining of string operation functions at expand
time.  Once we reuse the return value, the inlined code needs to compute
the return value too.  I don't know if this will break expansion/inlining
on some targets now, but it surely increases the cost of inlined code.

Thanks,
bin
>
> Jeff
>
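The equivalence discussed in this thread comes from the C standard, which specifies that memset returns its first argument; so after q = memset (p, c, n) the compiler may use q wherever p is needed, or vice versa. A small C sketch of the loop from the ARM listing above, with the offsets 232, 328, 32 and 8 read off that listing and the function names made up for illustration:

```c
#include <assert.h>
#include <string.h>

/* Source-level shape of the loop: P stays live across the call.  */
static void
fill_keep_pointer (unsigned char *base)
{
  unsigned char *p;

  for (p = base + 232; p != base + 328; p += 32)
    memset (p, 255, 8);
}

/* What the generated code effectively does: the next pointer is derived
   from memset's return value (r0 on ARM) instead of a saved copy of P.
   This is only valid because memset returns its first argument.  */
static void
fill_use_return_value (unsigned char *base)
{
  unsigned char *p = base + 232;

  while (p != base + 328)
    p = (unsigned char *) memset (p, 255, 8) + 32;
}
```

Bin's point about expand-time inlining follows from the second form: once later code depends on the returned pointer, an inline expansion of memset has to produce that value as well.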


Re: missing symbols in libstdc++.so.6 built from the 4.9 branch

2014-07-01 Thread Matthias Klose
On 01.07.2014 11:32, Jonathan Wakely wrote:
> On 1 July 2014 09:40, Matthias Klose wrote:
>>  - HPPA (build log [2]), is missing all the future_base symbols and
>>exception_ptr13exception symbols, current_exception and
>>rethrow_exception.
> 
> This implies ATOMIC_INT_LOCK_FREE <= 1 for that target. Our future and
> exception_ptr implementations rely on usable atomics.

Thanks for the reminder. Then the same missing symbols on SPARC are caused by a
missing --with-cpu-32=ultrasparc.

  Matthias



Re: [GSoC] generation of GCC expression trees from isl ast expressions

2014-07-01 Thread Roman Gareev
Hi Tobias,

could you please advise me how to verify the results of gimple code
generation? I've written the first draft of the generation of loops
with empty bodies and tried to verify the gimple code using the
representation that is dumped to the dump_file at the end of code
generation. If we consider the following example, we'll see that the
CLooG and isl code generators produce similar representations (the one
generated by the isl code generator has no loop body, as expected).

int
main (int n, int *a)
{
  int i;

  for (i = 0; i < 100; i++)
    a[i] = i;

  return 0;
}

gcc/graphite-isl-ast-to-gimple.c

loop_0 (header = 0, latch = 1, niter = )
{
  bb_2 (preds = {bb_0 }, succs = {bb_3 })
  {
    :

  }
  bb_5 (preds = {bb_3 }, succs = {bb_1 })
  {
    :
    # .MEM_10 = PHI <.MEM_3(D)(3)>
    # VUSE <.MEM_10>
    return 0;

  }
  loop_2 (header = 3, latch = 4, niter = )
  {
    bb_3 (preds = {bb_2 bb_4 }, succs = {bb_4 bb_5 })
    {
      :
      # graphite_IV.3_1 = PHI <0(2), graphite_IV.3_14(4)>
      graphite_IV.3_14 = graphite_IV.3_1 + 1;
      if (graphite_IV.3_1 < 99)
        goto ;
      else
        goto ;

    }
    bb_4 (preds = {bb_3 }, succs = {bb_3 })
    {
      :
      goto ;

    }
  }
}

graphite-clast-to-gimple.c

loop_0 (header = 0, latch = 1, niter = )
{
  bb_2 (preds = {bb_0 }, succs = {bb_3 })
  {
    :

  }
  bb_5 (preds = {bb_3 }, succs = {bb_1 })
  {
    :
    # .MEM_18 = PHI <.MEM_11(3)>
    # VUSE <.MEM_18>
    return 0;

  }
  loop_2 (header = 3, latch = 4, niter = )
  {
    bb_3 (preds = {bb_2 bb_4 }, succs = {bb_4 bb_5 })
    {
      :
      # graphite_IV.3_1 = PHI <0(2), graphite_IV.3_14(4)>
      # .MEM_19 = PHI <.MEM_3(D)(2), .MEM_11(4)>
      _2 = (sizetype) graphite_IV.3_1;
      _15 = _2 * 4;
      _16 = a_6(D) + _15;
      _17 = (int) graphite_IV.3_1;
      # .MEM_11 = VDEF <.MEM_19>
      *_16 = _17;
      graphite_IV.3_14 = graphite_IV.3_1 + 1;
      if (graphite_IV.3_1 < 99)
        goto ;
      else
        goto ;

    }
    bb_4 (preds = {bb_3 }, succs = {bb_3 })
    {
      :
      goto ;

    }
  }
}

However, this form doesn't have loop guards which are generated by
graphite_create_new_loop_guard in gcc/graphite-isl-ast-to-gimple.c and
by graphite_create_new_loop_guard in graphite-clast-to-gimple.c.

Below is the code for this generation. (It still uses isl_int to
generate isl_expr_int, because the error related to isl/val_gmp.h
still arises. I've tried isl 0.12.2 and 0.13, but got the
same error.)

--
   Cheers, Roman Gareev
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c	(revision 212194)
+++ gcc/graphite-isl-ast-to-gimple.c	(working copy)
@@ -42,16 +42,620 @@
 #include "cfgloop.h"
 #include "tree-data-ref.h"
 #include "sese.h"
+#include "tree-ssa-loop-manip.h"
+#include "tree-scalar-evolution.h"
 
 #ifdef HAVE_cloog
 #include "graphite-poly.h"
 #include "graphite-isl-ast-to-gimple.h"
+#include "graphite-htab.h"
 
 /* This flag is set when an error occurred during the translation of
    ISL AST to Gimple.  */
 
 static bool graphite_regenerate_error;
 
+/* Converts a GMP constant VAL to a tree and returns it.  */
+
+static tree
+gmp_cst_to_tree (tree type, mpz_t val)
+{
+  tree t = type ? type : integer_type_node;
+  mpz_t tmp;
+
+  mpz_init (tmp);
+  mpz_set (tmp, val);
+  wide_int wi = wi::from_mpz (t, tmp, true);
+  mpz_clear (tmp);
+
+  return wide_int_to_tree (t, wi);
+}
+
+/* Verifies properties that GRAPHITE should maintain during translation.  */
+
+static inline void
+graphite_verify (void)
+{
+#ifdef ENABLE_CHECKING
+  verify_loop_structure ();
+  verify_loop_closed_ssa (true);
+#endif
+}
+
+/* Stores the INDEX in a vector and the loop nesting LEVEL for a given
+   isl_id NAME.  BOUND_ONE and BOUND_TWO represent the exact lower and
+   upper bounds that can be inferred from the polyhedral representation.  */
+
+typedef struct ast_isl_name_index {
+  int index;
+  int level;
+  const char *name;
+  /* If free_name is set, the content of name was allocated by us and needs
+     to be freed.  */
+  char *free_name;
+} *ast_isl_name_index_p;
+
+/* Helper for hashing ast_isl_name_index.  */
+
+struct ast_isl_index_hasher
+{
+  typedef ast_isl_name_index value_type;
+  typedef ast_isl_name_index compare_type;
+  static inline hashval_t hash (const value_type *);
+  static inline bool equal (const value_type *, const compare_type *);
+  static inline void remove (value_type *);
+};
+
+/* Computes a hash function for database element E.  */
+
+inline hashval_t
+ast_isl_index_hasher::hash (const value_type *e)
+{
+  hashval_t hash = 0;
+
+  int length = strlen (e->name);
+  int i;
+
+  for (i = 0; i < length; ++i)
+    hash = hash | (e->name[i] << (i % 4));
+
+  return hash;
+}
+
+/* Compares database elements ELT1 and ELT2.  */
+
+inline bool
+ast_isl_index_hasher::equal (const value_type *elt1, co

Re: [GSoC] Question about unit tests

2014-07-01 Thread Roman Gareev
Thank you for the answer!

--
   Cheers, Roman Gareev


Re: [GSoC] generation of GCC expression trees from isl ast expressions

2014-07-01 Thread Tobias Grosser

On 01/07/2014 14:53, Roman Gareev wrote:
> Hi Tobias,
>
> could you please advise me how to verify the results of gimple code
> generation?


More comments inline, but first something on a very high level.

I personally like testing already on the GIMPLE level and could see us
matching certain expressions in the dumped gimple output.
Unfortunately this kind of testing may be a little fragile, depending on how
often gcc changes its internal dumping (hopefully not too often). On the
other hand, in gcc testing is commonly done by compiling and executing
files. For this to work, we would need at least a simple implementation
of body statements before we can get anything tested and checked in.
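As a sketch of the compile-and-execute approach, a test only needs a loop whose effect can be checked at run time. This assumes the file is built with Graphite enabled (for instance with -floop-nest-optimize); the testsuite directives and the option for selecting the isl code generator are left out because they are not given here. The bound is the run-time parameter n instead of 100, so the loop guard cannot simply be folded to a constant, and the names fill and check_fill are hypothetical:

```c
#include <assert.h>

#define SIZE 101

/* The loop from the example above, but with a run-time bound, so a guard
   such as "n > 0" has to survive into the generated code.  noinline keeps
   the loop from being specialised for the constant arguments below.  */
__attribute__ ((noinline)) void
fill (int n, int *a)
{
  int i;

  for (i = 0; i < n; i++)
    a[i] = i;
}

/* Returns 1 if fill () wrote 0 .. n-1 and left the next element alone.
   Requires 0 <= n < SIZE.  */
static int
check_fill (int n)
{
  int a[SIZE];
  int i;

  assert (n >= 0 && n < SIZE);
  for (i = 0; i < SIZE; i++)
    a[i] = -1;
  fill (n, a);
  for (i = 0; i < n; i++)
    if (a[i] != i)
      return 0;
  return a[n] == -1;
}
```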


> I've written the first draft of the generation of loops
> with empty bodies and tried to verify gimple code using the
> representation, which is dumped at the end of the generation of the
> dump_file. If we consider the following example, we'll see that cloog
> and isl code generator generate similar representation (representation
> generated by isl code generator doesn't have body of the loop, as was
> expected).

> However, this form doesn't have loop guards which are generated by
> graphite_create_new_loop_guard in gcc/graphite-isl-ast-to-gimple.c and
> by graphite_create_new_loop_guard in graphite-clast-to-gimple.c.


Maybe the guards are directly constant folded? Can you try with:

  int
  main (int n, int *a)
  {
    int i;

    for (i = 0; i < n; i++)
      a[i] = i;

    return 0;
  }


> Below is the code of this generation (It still uses isl_int for
> generation of isl_expr_int, because the error related to isl/val_gmp.h
> still arises. I've tried to use isl 0.12.2 and 0.13, but gotten the
> same error).


Did using 'extern "C"' around the include statement not help?



> +/* Stores the INDEX in a vector and the loop nesting LEVEL for a given
> +   isl_id NAME.  BOUND_ONE and BOUND_TWO represent the exact lower and
> +   upper bounds that can be inferred from the polyhedral representation.  */


Why do you mention BOUND_ONE & BOUND_TWO? I do not see any use of them?


+typedef struct ast_isl_name_index {
+  int index;
+  int level;
+  const char *name;
+  /* If free_name is set, the content of name was allocated by us and needs
+     to be freed.  */
+  char *free_name;
+} *ast_isl_name_index_p;
+
+/* Helper for hashing ast_isl_name_index.  */
+
+struct ast_isl_index_hasher
+{
+  typedef ast_isl_name_index value_type;
+  typedef ast_isl_name_index compare_type;
+  static inline hashval_t hash (const value_type *);
+  static inline bool equal (const value_type *, const compare_type *);
+  static inline void remove (value_type *);
+};
+
+/* Computes a hash function for database element E.  */
+
+inline hashval_t
+ast_isl_index_hasher::hash (const value_type *e)
+{
+  hashval_t hash = 0;
+
+  int length = strlen (e->name);
+  int i;
+
+  for (i = 0; i < length; ++i)
+    hash = hash | (e->name[i] << (i % 4));
+
+  return hash;
+}
+
+/* Compares database elements ELT1 and ELT2.  */
+
+inline bool
+ast_isl_index_hasher::equal (const value_type *elt1, const compare_type *elt2)
+{
+  return strcmp (elt1->name, elt2->name) == 0;
+}
+
+/* Free the memory taken by a ast_isl_name_index struct.  */
+
+inline void
+ast_isl_index_hasher::remove (value_type *c)
+{
+  if (c->free_name)
+    free (c->free_name);
+  free (c);
+}
+
+typed
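One property of the hash function in this patch is worth checking: combining characters with | can only set bits, and with shifts of at most 3 an ASCII name always hashes to a value below 1024, so distinct names collide often. A plain-C sketch of the same three hasher operations using a multiplicative string hash in the style of libiberty's htab_hash_string; struct name_index and the function names are stand-ins for the patch's types, not GCC code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef unsigned int hashval_t;   /* assumption: same type as GCC's hashval_t */

/* Stand-in for the patch's ast_isl_name_index.  */
struct name_index
{
  int index;
  int level;
  const char *name;
  char *free_name;   /* non-NULL if NAME was allocated by us */
};

/* Every character is folded in by multiplication, so names of equal length
   that differ in one character always hash differently (67 is odd, so
   multiplying by it loses no information modulo a power of two).  */
static hashval_t
name_index_hash (const struct name_index *e)
{
  const unsigned char *s = (const unsigned char *) e->name;
  hashval_t r = 0;

  while (*s)
    r = r * 67 + *s++ - 113;
  return r;
}

static int
name_index_equal (const struct name_index *a, const struct name_index *b)
{
  return strcmp (a->name, b->name) == 0;
}

/* free (NULL) is a no-op, so free_name needs no separate test.  */
static void
name_index_remove (struct name_index *c)
{
  free (c->free_name);
  free (c);
}
```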

combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

Vladimir,

There are a few patterns which use both the read/write constraint modifier (+) 
and the earlyclobber constraint modifier (&):

...
$ grep -c 'match_operand.*+.*&' gcc/config/*/* | grep -v :0
gcc/config/aarch64/aarch64-simd.md:1
gcc/config/arc/arc.md:1
gcc/config/arm/ldmstm.md:30
gcc/config/rs6000/spe.md:8
...

F.i., this one in gcc/config/aarch64/aarch64-simd.md:
...
(define_insn "vec_pack_trunc_"
  [(set (match_operand: 0 "register_operand" "+&w")
        (vec_concat:
          (truncate: (match_operand:VQN 1 "register_operand" "w"))
          (truncate: (match_operand:VQN 2 "register_operand" "w"]
...

The documentation ( 
https://gcc.gnu.org/onlinedocs/gccint/Modifiers.html#Modifiers ) states:

...
‘&’ does not obviate the need to write ‘=’.
...
which seems to state that '&' implies '='.

An earlyclobber operand is defined as 'modified before the instruction is 
finished using the input operands'. AFAIU that would indeed exclude the 
possibility that the earlyclobber operand is an input/output operand itself, 
but perhaps I misunderstand.
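The same modifier exists for GNU C inline asm, where the definition can be seen in a runnable form. A minimal sketch, assuming x86-64 and a GCC-compatible compiler (other configurations use the plain C fallback): the output is written by the first instruction while input b is still read by the last one, so the output has to be declared earlyclobber with "=&r"; twice_plus is a made-up name:

```c
#include <assert.h>

/* Returns 2 * a + b.  The asm writes %0 before it reads %2, so %0 is
   marked earlyclobber ("=&r"); otherwise the compiler could place %0 and
   %2 in the same register and the first mov would destroy b.  */
static long
twice_plus (long a, long b)
{
  long out;
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("mov %1, %0\n\t"
           "add %1, %0\n\t"
           "add %2, %0"
           : "=&r" (out)
           : "r" (a), "r" (b)
           : "cc");
#else
  out = 2 * a + b;
#endif
  return out;
}
```

Dropping the & does not necessarily make this example fail, because whether the registers actually coincide is up to the register allocator; the modifier is what makes the result independent of that choice.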


So my question is: is the combination of '&' and '+' supported? If so, what is 
the exact semantics? If not, should we warn or give an error?


Thanks,
- Tom


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Jeff Law

On 07/01/14 13:27, Tom de Vries wrote:

> So my question is: is the combination of '&' and '+' supported? If so,
> what is the exact semantics? If not, should we warn or give an error?

I don't think we can define any reasonable semantics for &+.  My 
recommendation would be for this to be considered a hard error.



Jeff


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Jeff Law wrote:


> On 07/01/14 13:27, Tom de Vries wrote:

>> So my question is: is the combination of '&' and '+' supported? If so,
>> what is the exact semantics? If not, should we warn or give an error?
> I don't think we can define any reasonable semantics for &+.  My 
> recommendation would be for this to be considered a hard error.


Uh? The doc explicitly says "An input operand can be tied to an 
earlyclobber operand" and goes on to explain why that is useful. It avoids 
using the same register for another input when they are identical.


--
Marc Glisse


Re: missing symbols in libstdc++.so.6 built from the 4.9 branch

2014-07-01 Thread John David Anglin

On 1-Jul-14, at 5:32 AM, Jonathan Wakely wrote:


> On 1 July 2014 09:40, Matthias Klose wrote:
>
>> - HPPA (build log [2]), is missing all the future_base symbols and
>>   exception_ptr13exception symbols, current_exception and
>>   rethrow_exception.
>
> This implies ATOMIC_INT_LOCK_FREE <= 1 for that target. Our future and
> exception_ptr implementations rely on usable atomics.

ARM and HPPA use kernel-assisted libraries for atomic support.  Not exactly
lock free, but possibly good enough...

Currently, c-cppbuiltin.c doesn't provide proper defines for this support.  We
currently define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, etc, in
pa-linux.h.  I'll experiment with defining ATOMIC_INT_LOCK_FREE there.
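For reference, __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4 advertises that the __sync builtins work on 4-byte objects; as noted above, on ARM and HPPA those are backed by kernel-assisted routines rather than plain lock-free instructions. A sketch of a compare-and-swap retry loop, the usual way to build other atomic operations on top of a CAS primitive, assuming a GCC-compatible compiler; atomic_add_cas and have_cas4 are hypothetical names:

```c
#include <assert.h>

/* 1 if the compiler advertises a 4-byte compare-and-swap (whether native
   or provided through a library/kernel helper), 0 otherwise.  */
#ifdef __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4
static const int have_cas4 = 1;
#else
static const int have_cas4 = 0;
#endif

/* Atomically adds V to *P and returns the new value, retrying whenever
   another thread changed *P between the read and the swap.  */
static int
atomic_add_cas (volatile int *p, int v)
{
  int old;

  do
    old = *p;
  while (!__sync_bool_compare_and_swap (p, old, old + v));
  return old + v;
}
```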

Thanks,
Dave
--
John David Anglin   dave.ang...@bell.net





Re: missing symbols in libstdc++.so.6 built from the 4.9 branch

2014-07-01 Thread Jonathan Wakely
On 1 July 2014 20:58, John David Anglin wrote:
> On 1-Jul-14, at 5:32 AM, Jonathan Wakely wrote:
>
>> On 1 July 2014 09:40, Matthias Klose wrote:
>>>
>>> - HPPA (build log [2]), is missing all the future_base symbols and
>>>   exception_ptr13exception symbols, current_exception and
>>>   rethrow_exception.
>>
>>
>> This implies ATOMIC_INT_LOCK_FREE <= 1 for that target. Our future and
>> exception_ptr implementations rely on usable atomics.
>
>
> ARM and HPPA use kernel assisted libraries for atomic support.  Not exactly
> lock free, but possibly good enough...
>
> Currently, c-cppbuiltin.c doesn't provide proper defines for this support.  We
> currently define __GCC_HAVE_SYNC_COMPARE_AND_SWAP_4, etc, in
> pa-linux.h.  I'll experiment with defining ATOMIC_INT_LOCK_FREE there.

It should already be defined, but its value is what matters for
libstdc++'s purposes.

To be honest I'm not sure if we really need the value to be greater
than one; if it's equal to one, that might work. We'd need to check
though.


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

On 01-07-14 21:58, Marc Glisse wrote:
>>> So my question is: is the combination of '&' and '+' supported? If so,
>>> what is the exact semantics? If not, should we warn or give an error?
>>
>> I don't think we can define any reasonable semantics for &+.  My
>> recommendation would be for this to be considered a hard error.
>
> Uh? The doc explicitly says "An input operand can be tied to an earlyclobber
> operand" and goes on to explain why that is useful. It avoids using the same
> register for other input when they are identical.


Hi Marc,

That part of the doc refers to the mulsi3 insn for ARM as example:
...
;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
(define_insn "*arm_mulsi3"
  [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
        (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
                 (match_operand:SI 1 "s_register_operand" "%0,r")))]
  "TARGET_32BIT && !arm_arch6"
  "mul%?\\t%0, %2, %1"
  [(set_attr "type" "mul")
   (set_attr "predicable" "yes")]
)
...

Note that there's no combination of & and + here.

AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber 
output operand 0 using the '0' matching constraint.


Having said that, I don't understand the comment. AFAIU it should be: 'Use '0' 
to make sure operands 0 and 1 are the same, and use '&' to make sure operands 0 
and 2 are not the same.'


Thanks,
- Tom


Re: reverse bitfield patch

2014-07-01 Thread DJ Delorie

Revisiting an old thread, as I still want to get this feature in...

https://gcc.gnu.org/ml/gcc/2012-10/msg00099.html

> >> Why do you need to change varasm.c at all?  The hunks seem to be
> >> completely separate of the attribute.
> >
> > Because static constructors have fields in the original order, not the
> > reversed order.  Otherwise code like this is miscompiled:
> 
> Err - the struct also has fields in the original order - only the bit
> positions of the fields are different because of the layouting option.

The order of the field decls in the type (stor-layout.c) is not
changed, only the bit position information.  The order here *can't* be
changed, because the C language assumes that parameters, initializers,
etc are presented in the same order as the original declaration,
regardless of the target-specific layout.

When the program includes an initializer:

> > struct foo a = { 1, 2, 3 };

The order of 1, 2, and 3 needs to correspond to the order of the
bitfields in 'a', so we can change neither the order of the bitfields
in 'a' nor the order of constructor fields.

However, when we stream the initializer out to the .S file, we need to
pack the bitfields in the right sequence to generate the right bit
patterns in the final output image.  The code in varasm.c exists to
make sure that the initializers for bitfields are written/packed in
the correct order, to correspond to the bitfield positions.  I.e.  the
1,2,3 initializer needs to be written to the .S file as either 0x0123
or 0x3210 depending on the bit positions.

In neither case do we change the order of the fields in the type
itself, i.e. the array/chain order.

> And you expect no other code looks at fields of a structure and its
> initializer?  It's bad to keep this not in-sync.  Thus I don't think it's
> viable to re-order fields just because bit allocation is reversed.

The fields are in sync.  The varasm.c change sorts the elements as
they're being output into the byte stream in the .S, it doesn't sort
the field definitions themselves.
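A toy model of that output step, not the actual varasm.c code: the initializers stay in declaration order, a sorted copy is made by bit position, and a writer that only moves forward through the output packs the bits. Numbering bits from the least significant bit of each byte is an assumption of the sketch, and struct bf_init and emit_bitfields are hypothetical names:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_FIELDS 16

/* One bitfield initializer, kept in declaration (constructor) order.  */
struct bf_init
{
  unsigned value;
  unsigned bitpos;   /* position of the field's lowest bit */
  unsigned width;
};

static int
by_bitpos (const void *x, const void *y)
{
  const struct bf_init *a = x, *b = y;
  return (a->bitpos > b->bitpos) - (a->bitpos < b->bitpos);
}

/* Packs N initializers into OUT (NBYTES long).  Fields are visited in
   ascending bit order, as a forward-only emitter requires; INIT itself is
   left untouched.  Fields must not overlap.  */
static void
emit_bitfields (const struct bf_init *init, int n,
                unsigned char *out, int nbytes)
{
  struct bf_init sorted[MAX_FIELDS];
  unsigned next = 0, b;
  int i;

  assert (n >= 0 && n <= MAX_FIELDS);
  memcpy (sorted, init, n * sizeof *init);
  qsort (sorted, n, sizeof *sorted, by_bitpos);
  memset (out, 0, nbytes);

  for (i = 0; i < n; i++)
    {
      assert (sorted[i].bitpos >= next);   /* never step backwards */
      for (b = 0; b < sorted[i].width; b++)
        {
          unsigned bit = sorted[i].bitpos + b;
          assert (bit / 8 < (unsigned) nbytes);
          if ((sorted[i].value >> b) & 1u)
            out[bit / 8] |= (unsigned char) (1u << (bit % 8));
        }
      next = sorted[i].bitpos + sorted[i].width;
    }
}
```

With 4-bit fields initialized to 1, 2 and 3, the normal layout (bit positions 0, 4 and 8) produces the bytes 0x21 0x03, while the reversed layout (12, 8 and 4) produces 0x30 0x12, although the constructor lists the values in the same order in both cases.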

> > +	  /* If the bitfield-order attribute has been used on this
> > +	     structure, the fields might not be in bit-order.  In that
> > +	     case, we need a separate representative for each
> > +	     field.  */
> > The typical use-case for this feature is memory-mapped hardware, where
> > pessimum access is preferred anyway.
> 
> I doubt that, looking at constraints for strict volatile bitfields.

The code that handles representatives requires (via an assert, IIRC)
that the bit offsets within a representative be in ascending order.
I.e. gcc ICEs if I don't bypass this.  In the case of volatile
bitfields, which would be the typical use case for a reversed
bitfield, the access mode is going to match the type size regardless,
so performance is not changed by this patch.


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Marc Glisse

On Tue, 1 Jul 2014, Tom de Vries wrote:



> Hi Marc,
>
> That part of the doc refers to the mulsi3 insn for ARM as example:
> ...
> ;; Use `&' and then `0' to prevent the operands 0 and 1 being the same
> (define_insn "*arm_mulsi3"
>   [(set (match_operand:SI  0 "s_register_operand" "=&r,&r")
>         (mult:SI (match_operand:SI 2 "s_register_operand" "r,r")
>                  (match_operand:SI 1 "s_register_operand" "%0,r")))]
>   "TARGET_32BIT && !arm_arch6"
>   "mul%?\\t%0, %2, %1"
>   [(set_attr "type" "mul")
>    (set_attr "predicable" "yes")]
> )
> ...
>
> Note that there's no combination of & and + here.


I think it could have used (match_dup 0) instead of operand 1, if there 
had been only the first alternative. And then the constraint would have 
been +&.


> AFAIU, the 'tie' established here is from input operand 1 to an earlyclobber
> output operand 0 using the '0' matching constraint.


> Having said that, I don't understand the comment, AFAIU it should be: 'Use
> '0' to make sure operands 0 and 1 are the same, and use '&' to make sure
> operands 0 and 2 are not the same.


Well, yeah, the comment doesn't seem completely in sync with the code.

In the first example you gave, looking at the pattern (no match_dup, 
setting the full register), it seems that it may have wanted "=&" instead 
of "+&".
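Marc's equivalence can be tried with GNU C inline asm, which uses the same modifiers. A sketch assuming x86-64 and GCC (other configurations fall back to plain C): both functions compute 2 * acc + a and modify the accumulator before reading a. The first marks the accumulator "+&r"; the second writes it as an earlyclobber output tied to an input through the matching constraint "0", the shape of the first *arm_mulsi3 alternative. Following Marc's explanation, the & is what keeps a out of the accumulator's register when both hold the same value, as in a call with (5, 5). The function names are made up:

```c
#include <assert.h>

#if defined (__GNUC__) && !defined (__clang__) && defined (__x86_64__)
#define USE_ASM 1
#else
#define USE_ASM 0
#endif

/* Read/write earlyclobber: ACC is both input and output and is modified
   (shl) before A is read (add).  */
static long
double_add_rw (long acc, long a)
{
#if USE_ASM
  __asm__ ("shl $1, %0\n\t"
           "add %1, %0"
           : "+&r" (acc)
           : "r" (a)
           : "cc");
#else
  acc = 2 * acc + a;
#endif
  return acc;
}

/* The same computation as an earlyclobber output (%0) tied to the input
   ACC (%1) with the matching constraint "0".  */
static long
double_add_tied (long acc, long a)
{
  long out;
#if USE_ASM
  __asm__ ("shl $1, %0\n\t"
           "add %2, %0"
           : "=&r" (out)
           : "0" (acc), "r" (a)
           : "cc");
#else
  out = 2 * acc + a;
#endif
  return out;
}
```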


(by the way, in the same aarch64-simd.md file, I noticed some 
define_expand with constraints, that looks strange)


--
Marc Glisse


Re: combination of read/write and earlyclobber constraint modifier

2014-07-01 Thread Tom de Vries

On 02-07-14 08:23, Marc Glisse wrote:

> I think it could have used (match_dup 0) instead of operand 1, if there had been
> only the first alternative. And then the constraint would have been +&.



Marc,

isn't that explicitly listed as unsupported here ( 
https://gcc.gnu.org/onlinedocs/gccint/RTL-Template.html#index-match_005fdup-3244 ):

...
Note that match_dup should not be used to tell the compiler that a particular 
register is being used for two operands (example: add that adds one register to 
another; the second register is both an input operand and the output operand). 
Use a matching constraint (see Simple Constraints) for those. match_dup is for 
the cases where one operand is used in two places in the template, such as an 
instruction that computes both a quotient and a remainder, where the opcode 
takes two input operands but the RTL template has to refer to each of those 
twice; once for the quotient pattern and once for the remainder pattern.

...
?
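In inline asm terms, the documented recommendation looks like this: when an instruction's input must occupy the output register, as in a two-operand add, the input is tied to the output with a matching constraint. A sketch assuming x86-64 and a GCC-compatible compiler, with a plain C fallback elsewhere; add_in_place is a hypothetical name:

```c
#include <assert.h>

/* x86 "add" overwrites its destination, so input A is tied to the output
   (%0) with the matching constraint "0" instead of being named twice.  */
static long
add_in_place (long a, long b)
{
  long sum;
#if defined (__GNUC__) && defined (__x86_64__)
  __asm__ ("add %2, %0"
           : "=r" (sum)
           : "0" (a), "r" (b)
           : "cc");
#else
  sum = a + b;
#endif
  return sum;
}
```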

Thanks,
- Tom