Fwd: Problem in using BDD libraries with GCC Plugin

2015-05-19 Thread Nishant Sahni
I am facing an issue compiling a gcc plugin. I am using the cudd
libraries (binary decision diagram libraries), Ubuntu version 14.04
and gcc version is 4.7.2. While running the Makefile, the compiler
fails to recognize the Cudd & BDD data structures whereas all the
necessary header files have been included and the libraries have been
linked. The error messages "Cudd was not declared in this scope" &
"BDD was not declared in this
scope" are displayed. Please find below the contents of the Makefile.



#-- MAKE CHANGES TO BASE_DIR : Please put the path to base
directory of your pristine gcc-4.7.2 build ---#
BASE_DIR = /home/nishant/GCC_BUILDS/gcc_4.7

INSTALL = $(BASE_DIR)/install
CC = $(INSTALL)/bin/g++
NEW_PATH = $(BASE_DIR)/gcc-4.7.2/gcc

# include files and library
PDIR= /home/nishant/Code/cudd-2.5.0
INCLUDE1 = $(PDIR)/include

# for c++
LIBS= $(PDIR)/obj/libobj.a $(PDIR)/cudd/libcudd.a $(PDIR)/mtr/libmtr.a \
  $(PDIR)/st/libst.a $(PDIR)/util/libutil.a $(PDIR)/epd/libepd.a


#- MAKE CHANGES TO OBJS : Add the name of your test file with
extension .o (say test as test.o) #
#--- Multiple dependent files maybe also
be added --#
#OBJS = test1.o
#OBJS = test2.o
#OBJS = test3.o
#OBJS = test4.o
OBJS = test.o

GCCPLUGINS_DIR:= $(shell $(CC) -print-file-name=plugin)
INCLUDE= -I$(GCCPLUGINS_DIR)/include -I$(NEW_PATH)

FLAGS= -fPIC -O0 -flto -flto-partition=none

%.o : %.c
$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

%.o : %.cpp
$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

plugin.so: plugin.o
$(CC) $(INCLUDE) $(FLAGS) -I$(INCLUDE1) -L$(LIBS) -shared $^ -o $@

run:  $(OBJS) plugin.so
$(CC) -o result -flto -flto-partition=none -fplugin=./plugin.so
$(OBJS) -O3 -fdump-ipa-all

clean:
\rm -f plugin.so *~ *.o a.out result* *.cpp.*


Fwd: Problem in using BDD libraries with GCC Plugin

2015-05-19 Thread Nishant Sahni
I am facing an issue compiling a gcc plugin. I am using the cudd
libraries (binary decision diagram libraries) and my gcc version is
4.7.2. While running the Makefile, the compiler fails to recognize the
Cudd & BDD data structures whereas all the necessary header files have
been included and the libraries have been linked. The error messages
"Cudd was not declared in this scope" & "BDD was not declared in this
scope" are displayed. Please find below the contents of the Makefile.



#-- MAKE CHANGES TO BASE_DIR : Please put the path to base
directory of your pristine gcc-4.7.2 build ---#
BASE_DIR = /home/nishant/GCC_BUILDS/gcc_4.7

INSTALL = $(BASE_DIR)/install
CC = $(INSTALL)/bin/g++
NEW_PATH = $(BASE_DIR)/gcc-4.7.2/gcc

# include files and library
PDIR= /home/nishant/Code/cudd-2.5.0
INCLUDE1 = $(PDIR)/include

# for c++
LIBS= $(PDIR)/obj/libobj.a $(PDIR)/cudd/libcudd.a $(PDIR)/mtr/libmtr.a \
  $(PDIR)/st/libst.a $(PDIR)/util/libutil.a $(PDIR)/epd/libepd.a


#- MAKE CHANGES TO OBJS : Add the name of your test file with
extension .o (say test as test.o) #
#--- Multiple dependent files maybe also
be added --#
#OBJS = test1.o
#OBJS = test2.o
#OBJS = test3.o
#OBJS = test4.o
OBJS = test.o

GCCPLUGINS_DIR:= $(shell $(CC) -print-file-name=plugin)
INCLUDE= -I$(GCCPLUGINS_DIR)/include -I$(NEW_PATH)

FLAGS= -fPIC -O0 -flto -flto-partition=none

%.o : %.c
$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

%.o : %.cpp
$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

plugin.so: plugin.o
$(CC) $(INCLUDE) $(FLAGS) -I$(INCLUDE1) -L$(LIBS) -shared $^ -o $@

run:  $(OBJS) plugin.so
$(CC) -o result -flto -flto-partition=none -fplugin=./plugin.so
$(OBJS) -O3 -fdump-ipa-all

clean:
\rm -f plugin.so *~ *.o a.out result* *.cpp.*


Balanced partition map for Firefox

2015-05-19 Thread Martin Liška

Hello.

I've just noticed that we, for default configuration, produce just 30 
partitions.
I'm wondering whether that's fine, or it would be necessary to re-tune 
partitioning
algorithm to produce better balanced map?

Attached patch is used to produce following dump:

Partition sizes:
partition 0 contains 9806 (5.42)% symbols and 232445 (2.37)% insns
partition 1 contains 15004 (8.30)% symbols and 389297 (3.96)% insns
partition 2 contains 13954 (7.71)% symbols and 390076 (3.97)% insns
partition 3 contains 14349 (7.93)% symbols and 390476 (3.97)% insns
partition 4 contains 13852 (7.66)% symbols and 391346 (3.98)% insns
partition 5 contains 10766 (5.95)% symbols and 278110 (2.83)% insns
partition 6 contains 11465 (6.34)% symbols and 396298 (4.03)% insns
partition 7 contains 16467 (9.10)% symbols and 396043 (4.03)% insns
partition 8 contains 12959 (7.16)% symbols and 316753 (3.22)% insns
partition 9 contains 17422 (9.63)% symbols and 402809 (4.10)% insns
partition 10 contains 15431 (8.53)% symbols and 404822 (4.12)% insns
partition 11 contains 15967 (8.83)% symbols and 342655 (3.49)% insns
partition 12 contains 12325 (6.81)% symbols and 409573 (4.17)% insns
partition 13 contains 11876 (6.57)% symbols and 411484 (4.19)% insns
partition 14 contains 20902 (11.56)% symbols and 391188 (3.98)% insns
partition 15 contains 18894 (10.45)% symbols and 339148 (3.45)% insns
partition 16 contains 27028 (14.94)% symbols and 426811 (4.34)% insns
partition 17 contains 19626 (10.85)% symbols and 431548 (4.39)% insns
partition 18 contains 23864 (13.19)% symbols and 437657 (4.45)% insns
partition 19 contains 28677 (15.86)% symbols and 445054 (4.53)% insns
partition 20 contains 32558 (18.00)% symbols and 457975 (4.66)% insns
partition 21 contains 37598 (20.79)% symbols and 470463 (4.79)% insns
partition 22 contains 21612 (11.95)% symbols and 488384 (4.97)% insns
partition 23 contains 18981 (10.49)% symbols and 493152 (5.02)% insns
partition 24 contains 20591 (11.38)% symbols and 493380 (5.02)% insns
partition 25 contains 20721 (11.46)% symbols and 496018 (5.05)% insns
partition 26 contains 26171 (14.47)% symbols and 479232 (4.88)% insns
partition 27 contains 29242 (16.17)% symbols and 530613 (5.40)% insns
partition 28 contains 35817 (19.80)% symbols and 563768 (5.74)% insns
partition 29 contains 42662 (23.59)% symbols and 741133 (7.54)% insns

As seen, there are partitions that are about 3x bigger than a different one.
What do you think about installing the patch to trunk? If yes, I'll test the 
patch
and write a ChangeLog entry.

Thanks,
Martin
diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 235b735..ba86f09 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -73,6 +73,7 @@ new_partition (const char *name)
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
+  part->symbols = 0;
   ltrans_partitions.safe_push (part);
   return part;
 }
@@ -157,6 +158,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
   gcc_assert (c != SYMBOL_EXTERNAL
 	  && (c == SYMBOL_DUPLICATE || !symbol_partitioned_p (node)));
 
+  part->symbols++;
+
   lto_set_symtab_encoder_in_partition (part->encoder, node);
 
   if (symbol_partitioned_p (node))
@@ -274,6 +277,7 @@ undo_partition (ltrans_partition partition, unsigned int n_nodes)
 {
   symtab_node *node = lto_symtab_encoder_deref (partition->encoder,
 		   n_nodes);
+  partition->symbols--;
   cgraph_node *cnode;
 
   /* After UNDO we no longer know what was visited.  */
@@ -462,7 +466,7 @@ lto_balanced_map (int n_lto_partitions)
   auto_vec varpool_order;
   int i;
   struct cgraph_node *node;
-  int total_size = 0, best_total_size = 0;
+  int original_total_size, total_size = 0, best_total_size = 0;
   int partition_size;
   ltrans_partition partition;
   int last_visited_node = 0;
@@ -488,6 +492,8 @@ lto_balanced_map (int n_lto_partitions)
 	  total_size += inline_summaries->get (node)->size;
   }
 
+  original_total_size = total_size;
+
   /* Streaming works best when the source units do not cross partition
  boundaries much.  This is because importing function from a source
  unit tends to import a lot of global trees defined there.  We should
@@ -782,6 +788,23 @@ lto_balanced_map (int n_lto_partitions)
   add_sorted_nodes (next_nodes, partition);
 
   free (order);
+
+  if (symtab->dump_file)
+{
+  fprintf (symtab->dump_file, "\nPartition sizes:\n");
+  unsigned partitions = ltrans_partitions.length ();
+
+  for (i = 0; i < partitions ; i++)
+	{
+	  ltrans_partition p = ltrans_partitions[i];
+	  fprintf (symtab->dump_file, "partition %d contains %d (%2.2f)%%"
+		   " symbols and %d (%2.2f)%% insns\n", i, p->symbols,
+		   100.0 * p->symbols / n_nodes, p->insns,
+		   100.0 * p->insns / original_total_size);
+	}
+
+  fprintf (symtab->dump_file, "\n");
+}
 }
 
 /* Return true if we must not change the name of the NODE.  The name as


target attributes/pragmas changing vector instruction availability and custom types

2015-05-19 Thread Kyrill Tkachov

Hi all,

I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck 
on a particular issue.
I want to be able to use a target pragma to enable SIMD support in a SIMD 
intrinsics header file.
So it will look like this:

$ cat simd_header.h
#pragma GCC push_options
#pragma GCC target ("arch=armv8-a+simd")

#pragma GCC pop_options

I would then include it in a file with a function tagged with a simd target 
attribute:

$ cat foo.c
#inlcude "simd_header.h"

__attribute__((target("arch=armv8-a+simd")))
uint32x4_t
foo (uint32x4_t a)
{
  return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and 
implemented by a target builtin
}

This works fine for me. But if I try to compile this without SIMD support, say: 
aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c
I get an ICE during builtin expansion time.

I think I've tracked it down to the problem that the type uint32x4_t is a 
builtin type that we define in the backend
(with add_builtin_type) during the target builtins initialisation code.

From what I can see, this code gets called early on after the command line 
options have been processed,
but before target pragmas or attributes are processed, so the builtin types are 
laid out assuming that no SIMD is available,
as per the command line option -march=armv8-a+nosimd, but later while expanding 
the builtin in simd_intrinsic with SIMD available
the ICE occurs. I think that is because the types were not re-laid out.

I'm somewhat stumped on ideas to work around this issue.
I notice that rs6000 also defines custom builtin vector types.
Michael, did you notice any issue similar to what I described above?

Would re-laying the builtin vector type on target attribute changes be a valid 
way to go forward here?

Thanks,
Kyrill



Re: optimization question

2015-05-19 Thread Martin Sebor

I'm not very familiar with the optimizations that are done in O2 vs O1,
or even what happens in these optimizations.

So, I'm wondering if this is a bug, or a subtle valid optimization that
I don't understand.  Any help would be appreciated.


Another approach to debugging a suspected optimization bug is
to look at the optimization dumps produced in response to one
or more of the -fdump-tree-xxx and -fdump-rtl-xxx options.
There are many of them and they tend to be verbose but not
as much as the raw assembly and are usually easier to follow.
The dumps make it possible to identify the buggy optimization
pass and avoid the bug by disabling just the problematic
optimization while leaving all the others enabled. With
attribute optimize and/or #pragma GCC optimize, you can then
control specify which optimization to disable for subsets of
functions in a file.

See the following sections in the manual for more:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html

Martin



Re: optimization question

2015-05-19 Thread mark maule



On 5/19/2015 10:09 AM, Martin Sebor wrote:

I'm not very familiar with the optimizations that are done in O2 vs O1,
or even what happens in these optimizations.

So, I'm wondering if this is a bug, or a subtle valid optimization that
I don't understand.  Any help would be appreciated.


Another approach to debugging a suspected optimization bug is
to look at the optimization dumps produced in response to one
or more of the -fdump-tree-xxx and -fdump-rtl-xxx options.
There are many of them and they tend to be verbose but not
as much as the raw assembly and are usually easier to follow.
The dumps make it possible to identify the buggy optimization
pass and avoid the bug by disabling just the problematic
optimization while leaving all the others enabled. With
attribute optimize and/or #pragma GCC optimize, you can then
control specify which optimization to disable for subsets of
functions in a file.

See the following sections in the manual for more:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673 



http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html 



Martin



Thanks again Martin.  I started going down that road yesterday, and got 
lost in the forest of options.   What I was looking for was some option 
that would tell me what was being done with dgHandle specifically.  I 
played around with -fopt-info-all but didn't get very far ...





Re: target attributes/pragmas changing vector instruction availability and custom types

2015-05-19 Thread Kyrill Tkachov


On 19/05/15 15:55, Christian Bruel wrote:

Hi Kiril,

This is funny, I've updated bz65837 today in the same direction.

On 05/19/2015 04:17 PM, Kyrill Tkachov wrote:

Hi all,

I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck 
on a particular issue.
I want to be able to use a target pragma to enable SIMD support in a SIMD 
intrinsics header file.
So it will look like this:

$ cat simd_header.h
#pragma GCC push_options
#pragma GCC target ("arch=armv8-a+simd")

#pragma GCC pop_options

I would then include it in a file with a function tagged with a simd target 
attribute:

$ cat foo.c
#inlcude "simd_header.h"

__attribute__((target("arch=armv8-a+simd")))
uint32x4_t
foo (uint32x4_t a)
{
 return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and 
implemented by a target builtin
}

This works fine for me. But if I try to compile this without SIMD support, say: 
aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c
I get an ICE during builtin expansion time.

I think I've tracked it down to the problem that the type uint32x4_t is a 
builtin type that we define in the backend
(with add_builtin_type) during the target builtins initialisation code.

   From what I can see, this code gets called early on after the command line 
options have been processed,
but before target pragmas or attributes are processed, so the builtin types are 
laid out assuming that no SIMD is available,
as per the command line option -march=armv8-a+nosimd, but later while expanding 
the builtin in simd_intrinsic with SIMD available
the ICE occurs. I think that is because the types were not re-laid out.


I share this analysis.


I'm somewhat stumped on ideas to work around this issue.
I notice that rs6000 also defines custom builtin vector types.
Michael, did you notice any issue similar to what I described above?

Would re-laying the builtin vector type on target attribute changes be a valid 
way to go forward here?

this is what I've done on arm, seems to work locally.

for aarch64, you can try to call aarch64_init_simd_builtins () from your
target hook, then we need to think how to unset them.


Hmmm, calling aarch64_init_simd_builtin_types in the VALID_TARGET_ATTRIBUTE_P 
hook seems to work for me.
Thanks for the suggestion! I think undefining shouldn't be a concern since they 
are target-specific builtins and we make no guarantee on their availability or 
behaviour through any other use other than
the intrinsics in arm_neon.h. Of course, we'd need to massage the 
aarch64_init_simd_builtin_types to only
re-layout the types once, so that we don't end up doing redundant work or 
bloating memory.

Thanks again!
Kyrill



Cheers

Christian


Thanks,
Kyrill





Re: [i386] Scalar DImode instructions on XMM registers

2015-05-19 Thread Vladimir Makarov

On 05/18/2015 08:13 AM, Ilya Enkovich wrote:

2015-05-06 17:18 GMT+03:00 Ilya Enkovich :

2015-04-25 4:32 GMT+03:00 Jan Hubicka :

Hi,
I am adding Vladimir and Richard into CC. I tried to solve similar problem
with FP math years ago by having -mfpmath=sse,i387. The idea was to allow
use of i387 registers when SSE ones run out and possibly also model the fact
that Pentium4 had faster i387 additions than SSE additions. I also had some
plans to extend this one mixed SSE/MMX/GPR integer arithmetics, but never
got to that.

This did not really fly becuase of the regalloc not really being able to
understnad it (I made path to regclass to propagate the classes and figure out
what operations needs to stay in i387 and what in SSE to avoid reloading, but
that never got in).

I believe Vladimir did some work on this with IRA (he is able to spill GPR
regs into SSE and do bit of other tricks).

Also I believe it was kind of Richard's design deicsion to avoid use of
(paradoxical) subregs for vector conversions because these have funny
implications.

The code for handling upper parts of paradoxical subregs is controlled by
macros around SUBREG_PROMOTED_VAR_P but I do not think it will handle
V1DI->V2DI conversions fluently without some middle-end hacking. (it will
probably try to produce zero extensions)

When we are on SSE instructions, it would be great to finally teach
copy_by_pieces/store_by_pieces to use vector instructions (these are more
compact and either equaly fast or faster on some CPUs). I hope to get into
this, but it would be great if someone beat me.

Honza


I'm trying to implement it as separate RTL pass which chooses a
scalar/vector mode for each 64bit computation chain and performs
transformation if we choose to use vectors. I also want to split DI
instructions which are going to be implemented on GPRs before RA
(currently it is done on the second split). Good metrics for such
transformation is a big question but currently I can't even make it
generate correct code when paradoxical subregs are used. It works in
simple cases but I get troubles when spills appear.

Trying to beat the following testcase:

test (long long *arr)
{
   register unsigned long long tmp;
   tmp = arr[0] | arr[1] & arr[2];
   while (tmp)
 {
   counter (tmp);
   tmp = *(arr++) & tmp;
 }
}

RTL I generate seems OK to me (ignoring the fact that it is not optimal):

(insn 6 3 50 2 (set (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ])
 (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
 (const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D)
+ 8B]+0 S8 A64])) pr65105-1.c:22 89 {*movdi_internal}
  (nil))
(insn 50 6 7 2 (set (reg:DI 104)
 (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
 (const_int 16 [0x10])) [2 MEM[(long long int
*)arr_5(D) + 16B]+0 S8 A64])) pr65105-1.c:22 -1
  (nil))
(insn 7 50 51 2 (set (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
 (and:V2DI (subreg:V2DI (reg:DI 98 [ MEM[(long long int
*)arr_5(D) + 8B] ]) 0)
 (subreg:V2DI (reg:DI 104) 0))) pr65105-1.c:22 3487 {*andv2di3}
  (expr_list:REG_DEAD (subreg:V2DI (reg:DI 98 [ MEM[(long long int
*)arr_5(D) + 8B] ]) 0)
 (expr_list:REG_UNUSED (reg:CC 17 flags)
 (expr_list:REG_EQUAL (and:DI (mem:DI (plus:SI (reg/v/f:SI
96 [ arr ])
 (const_int 8 [0x8])) [2 MEM[(long long int
*)arr_5(D) + 8B]+0 S8 A64])
 (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ])
 (const_int 16 [0x10])) [2 MEM[(long long
int *)arr_5(D) + 16B]+0 S8 A64]))
 (nil)
(insn 51 7 8 2 (set (reg:DI 105)
 (mem:DI (reg/v/f:SI 96 [ arr ]) [2 *arr_5(D)+0 S8 A64]))
pr65105-1.c:22 -1
  (nil))
(insn 8 51 46 2 (set (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)
 (ior:V2DI (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
 (subreg:V2DI (reg:DI 105) 0))) pr65105-1.c:22 3489 {*iorv2di3}
  (expr_list:REG_DEAD (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0)
 (expr_list:REG_UNUSED (reg:CC 17 flags)
 (nil
(insn 46 8 47 2 (set (reg:V2DI 103)
 (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)) pr65105-1.c:22 -1
  (nil))
(insn 47 46 48 2 (set (subreg:SI (reg:DI 101) 0)
 (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
  (nil))
(insn 48 47 49 2 (set (reg:V2DI 103)
 (lshiftrt:V2DI (reg:V2DI 103)
 (const_int 32 [0x20]))) pr65105-1.c:22 -1
  (nil))
(insn 49 48 9 2 (set (subreg:SI (reg:DI 101) 4)
 (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1
  (nil))
(note 9 49 10 2 NOTE_INSN_DELETED)
(insn 10 9 11 2 (parallel [
 (set (reg:CCZ 17 flags)
 (compare:CCZ (ior:SI (subreg:SI (reg:DI 101) 4)
 (subreg:SI (reg:DI 101) 0))
 (const_int 0 [0])))
 (clobber (scratch:SI))
 ]) pr65105-1.c:23 447 {*iorsi_3}
  (nil))
(jump_insn 11 10 37 2 (set (pc)
 (if_then_else (ne (reg:CCZ 17 flags)
  

Re: optimization question

2015-05-19 Thread Andrew Haley
On 05/19/2015 04:14 PM, mark maule wrote:
> Thanks again Martin.  I started going down that road yesterday, and got 
> lost in the forest of options.   What I was looking for was some option 
> that would tell me what was being done with dgHandle specifically.  I 
> played around with -fopt-info-all but didn't get very far ...

I think you're going to find it very difficult to debug such a large
chunk of code, and it's quite possible that your problem is in the RTL
optimization anyway.

I don't think you'll find anything better than trying to reduce the test
case.  Also, all those macros rather obfuscate what's going on.  It may
help later to preprocess the source and run it through indent.

Andrew.



Re: optimization question

2015-05-19 Thread mark maule



On 5/19/2015 10:28 AM, Andrew Haley wrote:

On 05/19/2015 04:14 PM, mark maule wrote:

Thanks again Martin.  I started going down that road yesterday, and got
lost in the forest of options.   What I was looking for was some option
that would tell me what was being done with dgHandle specifically.  I
played around with -fopt-info-all but didn't get very far ...

I think you're going to find it very difficult to debug such a large
chunk of code, and it's quite possible that your problem is in the RTL
optimization anyway.

I don't think you'll find anything better than trying to reduce the test
case.  Also, all those macros rather obfuscate what's going on.  It may
help later to preprocess the source and run it through indent.

Andrew.


Understood.  That's my next step, and will repost to gcc-help per 
earlier suggestions.


Thanks for the pointers.



Re: target attributes/pragmas changing vector instruction availability and custom types

2015-05-19 Thread Kyrill Tkachov


On 19/05/15 16:21, Kyrill Tkachov wrote:

On 19/05/15 15:55, Christian Bruel wrote:

Hi Kiril,

This is funny, I've updated bz65837 today in the same direction.

On 05/19/2015 04:17 PM, Kyrill Tkachov wrote:

Hi all,

I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck 
on a particular issue.
I want to be able to use a target pragma to enable SIMD support in a SIMD 
intrinsics header file.
So it will look like this:

$ cat simd_header.h
#pragma GCC push_options
#pragma GCC target ("arch=armv8-a+simd")

#pragma GCC pop_options

I would then include it in a file with a function tagged with a simd target 
attribute:

$ cat foo.c
#inlcude "simd_header.h"

__attribute__((target("arch=armv8-a+simd")))
uint32x4_t
foo (uint32x4_t a)
{
  return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and 
implemented by a target builtin
}

This works fine for me. But if I try to compile this without SIMD support, say: 
aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c
I get an ICE during builtin expansion time.

I think I've tracked it down to the problem that the type uint32x4_t is a 
builtin type that we define in the backend
(with add_builtin_type) during the target builtins initialisation code.

From what I can see, this code gets called early on after the command line 
options have been processed,
but before target pragmas or attributes are processed, so the builtin types are 
laid out assuming that no SIMD is available,
as per the command line option -march=armv8-a+nosimd, but later while expanding 
the builtin in simd_intrinsic with SIMD available
the ICE occurs. I think that is because the types were not re-laid out.


I share this analysis.


I'm somewhat stumped on ideas to work around this issue.
I notice that rs6000 also defines custom builtin vector types.
Michael, did you notice any issue similar to what I described above?

Would re-laying the builtin vector type on target attribute changes be a valid 
way to go forward here?

this is what I've done on arm, seems to work locally.

for aarch64, you can try to call aarch64_init_simd_builtins () from your
target hook, then we need to think how to unset them.

Hmmm, calling aarch64_init_simd_builtin_types in the VALID_TARGET_ATTRIBUTE_P 
hook seems to work for me.


Actually, scratch that. I had used the wrong compiler. Calling the builtin init 
code through
the target attribute hook didn't work :(.
Looking further into int...

Kyrill


Thanks for the suggestion! I think undefining shouldn't be a concern since they 
are target-specific builtins and we make no guarantee on their availability or 
behaviour through any other use other than
the intrinsics in arm_neon.h. Of course, we'd need to massage the 
aarch64_init_simd_builtin_types to only
re-layout the types once, so that we don't end up doing redundant work or 
bloating memory.

Thanks again!
Kyrill


Cheers

Christian


Thanks,
Kyrill





gcc-5-20150519 is now available

2015-05-19 Thread gccadmin
Snapshot gcc-5-20150519 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150519/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch 
revision 223417

You'll find:

 gcc-5-20150519.tar.bz2   Complete GCC

  MD5=84f261b2f23e154ec6d9bd4149851a21
  SHA1=c6e72cc4ebd446df4cc797947107cbefad21bdf5

Diffs from 5-20150512 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Compilers and RCU readers: Once more unto the breach!

2015-05-19 Thread Paul E. McKenney
Hello!

Following up on last year's discussion (https://lwn.net/Articles/586838/,
https://lwn.net/Articles/588300/), I believe that we have a solution.  If
I am wrong, I am sure you all will let me know, and in great detail.  ;-)

The key simplification is to "just say no" to RCU-protected array indexes:
https://lkml.org/lkml/2015/5/12/827, as was suggested by several people.
This simplification means that rcu_dereference (AKA memory_order_consume)
need only return pointers.  This in ture avoids things like (x-x),
(x*0), and (x%1) because if "x" is a pointer, these expressions either
return non-pointers are compilation errors.  With a very few exceptions,
dependency chains can lead -to- non-pointers, but cannot pass -through-
them.

The result is that dependencies are carried only by operations for
which the compiler cannot easily optimize the away those dependencies,
these operations including simple assignment, integer offset (including
indexing), dereferencing, casts, passing as a function argument, return
values from functions and so on.  A complete list with commentary starts
on page 28 of:

http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf

Dependency chains are broken if a pointer compares equal to some other
pointer not part of the same dependency chain, if too many bits are ORed
onto or ANDed off of a intptr_t or uintptr_t, or if the dependency is
explicitly killed (which should now strictly speaking never be necessary,
but which might allow better diagnostics).  These are set out in more
detail on page 30 of the above PDF.

This covers all the uses in the Linux kernel that I am aware of without
any source-code changes (other than to the rcu_dereference() primitives
themselves) and should also work for compilers and standards.

Thoughts?

Thanx, Paul



Re: Compilers and RCU readers: Once more unto the breach!

2015-05-19 Thread Linus Torvalds
On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney
 wrote:
>
> http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf

>From a very quick read-through, the restricted dependency chain in 7.9
seems to be reasonable, and essentially covers "thats' what hardware
gives us anyway", making compiler writers happy.

I would clarify the language somewhat:

 - it says that the result of a cast of a pointer is a dependency. You
need to make an exception for casting to bool, methinks (because
that's effectively just a test-against-NULL, which you later describe
as terminating the dependency).

   Maybe get rid of the "any type", and just limit it to casts to
types of size intptr_t, ie ones that don't drop significant bits. All
the other rules talk about [u]intptr_t anyway.

 - you clarify that the trivial "& 0" and "| ~0" kill the dependency
chain, but if you really want to be a stickler, you might want to
extend it to a few more cases. Things like "& 1" (to extract a tag
from the lot bit of a tagged pointer) should likely also drop the
dependency, since a compiler would commonly end up using the end
result as a conditional even if the code was written to then use
casting etc to look like a dereference.

 - the "you can add/subtract integral values" still opens you up to
language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
dependency, which it clearly doesn't. But language-lawyering it does,
since all those operations (cast to pointer, cast to integer,
subtracting an integer) claim to be dependency-preserving operations.

So I think you want to limit the logical operators to things that
don't mask off too many bits, and you should probably limit the
add/subtract operations some way (maybe specify that the integer value
you add/subtract cannot be related to the pointer). But I think
limiting it to mostly pointer ops (and a _few_ integer operations to
do offsets and remove tag bits) is otherwise a good approach.

   Linus


Re: Compilers and RCU readers: Once more unto the breach!

2015-05-19 Thread Linus Torvalds
On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds
 wrote:
>
>  - the "you can add/subtract integral values" still opens you up to
> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
> dependency, which it clearly doesn't. But language-lawyering it does,
> since all those operations (cast to pointer, cast to integer,
> subtracting an integer) claim to be dependency-preserving operations.
>
> So I think you want to limit the logical operators to things that
> don't mask off too many bits, and you should probably limit the
> add/subtract operations some way (maybe specify that the integer value
> you add/subtract cannot be related to the pointer).

Actually, "not related" doesn't work. For some buddy allocator thing,
you very much might want some of the bits to be related.

So I think you're better off just saying that operations designed to
drop significant bits break the dependency chain, and give things like
"& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such.

Making that just an extension of your existing "& 0" language would
seem to be natural.

Humans will understand, and compiler writers won't care. They will
either depend on hardware semantics anyway (and argue that your
language is tight enough that they don't need to do anything special)
or they will turn the consume into an acquire (on platforms that have
too weak hardware).

 Linus


Re: Compilers and RCU readers: Once more unto the breach!

2015-05-19 Thread Paul E. McKenney
On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote:
> On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney
>  wrote:
> >
> > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf
> 
> >From a very quick read-through, the restricted dependency chain in 7.9
> seems to be reasonable, and essentially covers "thats' what hardware
> gives us anyway", making compiler writers happy.
> 
> I would clarify the language somewhat:
> 
>  - it says that the result of a cast of a pointer is a dependency. You
> need to make an exception for casting to bool, methinks (because
> that's effectively just a test-against-NULL, which you later describe
> as terminating the dependency).
> 
>Maybe get rid of the "any type", and just limit it to casts to
> types of size intptr_t, ie ones that don't drop significant bits. All
> the other rules talk about [u]intptr_t anyway.

Excellent point!  I now say:

If a pointer is part of a dependency chain, then casting it
(either explicitly or implicitly) to any pointer-sized type
extends the chain to the result.

If this approach works out, the people in the Core Working Group will
come up with alternative language-lawyer-proof wording, but this informal
version will hopefully do for the moment.

>  - you clarify that the trivial "& 0" and "| ~0" kill the dependency
> chain, but if you really want to be a stickler, you might want to
> extend it to a few more cases. Things like "& 1" (to extract a tag
> from the lot bit of a tagged pointer) should likely also drop the
> dependency, since a compiler would commonly end up using the end
> result as a conditional even if the code was written to then use
> casting etc to look like a dereference.

Ah, how about the following?

If a value of type intptr_t or uintptr_t is part of a dependency
chain, and if that value is one of the operands to an & or |
infix operator whose result has too few or too many bits set,
then the resulting value will not be part of any dependency
chain.  For example, on a 64-bit system, if p is part of a
dependency chain, then (p & 0x7) provides just the tag bits,
and normally cannot even be legally dereferenced.  Similarly,
(p | ~0) normally cannot be legally dereferenced.

>  - the "you can add/subtract integral values" still opens you up to
> language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
> dependency, which it clearly doesn't. But language-lawyering it does,
> since all those operations (cast to pointer, cast to integer,
> subtracting an integer) claim to be dependency-preserving operations.

My thought was that the result of "(char *)ptr - (intptr_t)ptr" is a
NULL pointer in most environments, and dereferencing a NULL pointer
is undefined behavior.  So it becomes irrelevant whether or not the
NULL pointer carries a dependency.

There are some stranger examples, such as "(char *)ptr - ((intptr_t)ptr)/7",
but in that case, if the resulting pointer happens by chance to reference 
valid memory, I believe a dependency would still be carried.  Of course,
if you are producing code like that, I am guessing that dependencies are
the very least of your concerns.

However, I will give this some more thought.

> So I think you want to limit the logical operators to things that
> don't mask off too many bits, and you should probably limit the
> add/subtract operations some way (maybe specify that the integer value
> you add/subtract cannot be related to the pointer). But I think
> limiting it to mostly pointer ops (and a _few_ integer operations to
> do offsets and remove tag bits) is otherwise a good approach.

Glad you mostly like it!  ;-)

Thanx, Paul



Re: Compilers and RCU readers: Once more unto the breach!

2015-05-19 Thread Paul E. McKenney
On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote:
> On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds
>  wrote:
> >
> >  - the "you can add/subtract integral values" still opens you up to
> > language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
> > dependency, which it clearly doesn't. But language-lawyering it does,
> > since all those operations (cast to pointer, cast to integer,
> > subtracting an integer) claim to be dependency-preserving operations.
> >
> > So I think you want to limit the logical operators to things that
> > don't mask off too many bits, and you should probably limit the
> > add/subtract operations some way (maybe specify that the integer value
> > you add/subtract cannot be related to the pointer).
> 
> Actually, "not related" doesn't work. For some buddy allocator thing,
> you very much might want some of the bits to be related.

Good point, you could do the buddy-allocator computations with add
and subtract instead of exclusive OR.

> So I think you're better off just saying that operations designed to
> drop significant bits break the dependency chain, and give things like
> "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such.
> 
> Making that just an extension of your existing "& 0" language would
> seem to be natural.

Works for me!  I added the following bullet to the list of things
that break dependencies:

If a pointer is part of a dependency chain, and if the values
added to or subtracted from that pointer cancel the pointer
value so as to allow the compiler to precisely determine the
resulting value, then the resulting value will not be part of
any dependency chain.  For example, if p is part of a dependency
chain, then ((char *)p-(uintptr_t)p)+65536 will not be.

Seem reasonable?

> Humans will understand, and compiler writers won't care. They will
> either depend on hardware semantics anyway (and argue that your
> language is tight enough that they don't need to do anything special)
> or they will turn the consume into an acquire (on platforms that have
> too weak hardware).

Agreed.  Plus Core Working Group will hammer out the exact wording,
should this approach meet their approval.

Thanx, Paul