Fwd: Problem in using BDD libraries with GCC Plugin
I am facing an issue compiling a GCC plugin. I am using the CUDD libraries (binary decision diagram libraries) on Ubuntu 14.04, and my GCC version is 4.7.2. When running the Makefile, the compiler fails to recognize the Cudd and BDD data structures even though all the necessary header files have been included and the libraries have been linked. The error messages "'Cudd' was not declared in this scope" and "'BDD' was not declared in this scope" are displayed. Please find below the contents of the Makefile.

#-- MAKE CHANGES TO BASE_DIR : Please put the path to the base directory of your pristine gcc-4.7.2 build --#
BASE_DIR = /home/nishant/GCC_BUILDS/gcc_4.7
INSTALL  = $(BASE_DIR)/install
CC       = $(INSTALL)/bin/g++
NEW_PATH = $(BASE_DIR)/gcc-4.7.2/gcc

# include files and library
PDIR     = /home/nishant/Code/cudd-2.5.0
INCLUDE1 = $(PDIR)/include

# for c++
LIBS = $(PDIR)/obj/libobj.a $(PDIR)/cudd/libcudd.a $(PDIR)/mtr/libmtr.a \
       $(PDIR)/st/libst.a $(PDIR)/util/libutil.a $(PDIR)/epd/libepd.a

#-- MAKE CHANGES TO OBJS : Add the name of your test file with extension .o (say test as test.o) --#
#-- Multiple dependent files may also be added --#
#OBJS = test1.o
#OBJS = test2.o
#OBJS = test3.o
#OBJS = test4.o
OBJS = test.o

GCCPLUGINS_DIR := $(shell $(CC) -print-file-name=plugin)
INCLUDE = -I$(GCCPLUGINS_DIR)/include -I$(NEW_PATH)
FLAGS   = -fPIC -O0 -flto -flto-partition=none

%.o : %.c
	$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

%.o : %.cpp
	$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -L$(LIBS) -c $<

plugin.so: plugin.o
	$(CC) $(INCLUDE) $(FLAGS) -I$(INCLUDE1) -L$(LIBS) -shared $^ -o $@

run: $(OBJS) plugin.so
	$(CC) -o result -flto -flto-partition=none -fplugin=./plugin.so $(OBJS) -O3 -fdump-ipa-all

clean:
	\rm -f plugin.so *~ *.o a.out result* *.cpp.*
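[Editorial note, not part of the original mail.] One Makefile detail stands out and may be related: -L expects a directory, yet here it is handed a list of .a archives, and in any case neither -L nor libraries play any role at the compile (-c) step where a "was not declared in this scope" error arises. That points at a missing #include rather than at linking: CUDD's C++ classes Cudd and BDD are declared in cuddObj.hh. A hypothetical (untested) cleanup of the relevant rules:

```make
# Hypothetical fix: pass the archives as ordinary link inputs, not via -L
# (which takes a directory), and make sure the plugin source itself does
# #include "cuddObj.hh" -- that header declares the C++ classes Cudd and
# BDD; omitting it is what produces "was not declared in this scope".
%.o : %.cpp
	$(CC) $(FLAGS) $(INCLUDE) -I$(INCLUDE1) -c $<

plugin.so: plugin.o
	$(CC) $(FLAGS) -shared $^ $(LIBS) -o $@
```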
Balanced partition map for Firefox
Hello. I've just noticed that, for the default configuration, we produce just 30 partitions. I'm wondering whether that's fine, or whether it would be necessary to re-tune the partitioning algorithm to produce a better balanced map? The attached patch is used to produce the following dump:

Partition sizes:
partition 0 contains 9806 (5.42)% symbols and 232445 (2.37)% insns
partition 1 contains 15004 (8.30)% symbols and 389297 (3.96)% insns
partition 2 contains 13954 (7.71)% symbols and 390076 (3.97)% insns
partition 3 contains 14349 (7.93)% symbols and 390476 (3.97)% insns
partition 4 contains 13852 (7.66)% symbols and 391346 (3.98)% insns
partition 5 contains 10766 (5.95)% symbols and 278110 (2.83)% insns
partition 6 contains 11465 (6.34)% symbols and 396298 (4.03)% insns
partition 7 contains 16467 (9.10)% symbols and 396043 (4.03)% insns
partition 8 contains 12959 (7.16)% symbols and 316753 (3.22)% insns
partition 9 contains 17422 (9.63)% symbols and 402809 (4.10)% insns
partition 10 contains 15431 (8.53)% symbols and 404822 (4.12)% insns
partition 11 contains 15967 (8.83)% symbols and 342655 (3.49)% insns
partition 12 contains 12325 (6.81)% symbols and 409573 (4.17)% insns
partition 13 contains 11876 (6.57)% symbols and 411484 (4.19)% insns
partition 14 contains 20902 (11.56)% symbols and 391188 (3.98)% insns
partition 15 contains 18894 (10.45)% symbols and 339148 (3.45)% insns
partition 16 contains 27028 (14.94)% symbols and 426811 (4.34)% insns
partition 17 contains 19626 (10.85)% symbols and 431548 (4.39)% insns
partition 18 contains 23864 (13.19)% symbols and 437657 (4.45)% insns
partition 19 contains 28677 (15.86)% symbols and 445054 (4.53)% insns
partition 20 contains 32558 (18.00)% symbols and 457975 (4.66)% insns
partition 21 contains 37598 (20.79)% symbols and 470463 (4.79)% insns
partition 22 contains 21612 (11.95)% symbols and 488384 (4.97)% insns
partition 23 contains 18981 (10.49)% symbols and 493152 (5.02)% insns
partition 24 contains 20591 (11.38)% symbols and 493380 (5.02)% insns
partition 25 contains 20721 (11.46)% symbols and 496018 (5.05)% insns
partition 26 contains 26171 (14.47)% symbols and 479232 (4.88)% insns
partition 27 contains 29242 (16.17)% symbols and 530613 (5.40)% insns
partition 28 contains 35817 (19.80)% symbols and 563768 (5.74)% insns
partition 29 contains 42662 (23.59)% symbols and 741133 (7.54)% insns

As seen, there are partitions that are about 3x bigger than others. What do you think about installing the patch to trunk? If yes, I'll test the patch and write a ChangeLog entry.

Thanks,
Martin

diff --git a/gcc/lto/lto-partition.c b/gcc/lto/lto-partition.c
index 235b735..ba86f09 100644
--- a/gcc/lto/lto-partition.c
+++ b/gcc/lto/lto-partition.c
@@ -73,6 +73,7 @@ new_partition (const char *name)
   part->encoder = lto_symtab_encoder_new (false);
   part->name = name;
   part->insns = 0;
+  part->symbols = 0;
   ltrans_partitions.safe_push (part);
   return part;
 }
@@ -157,6 +158,8 @@ add_symbol_to_partition_1 (ltrans_partition part, symtab_node *node)
   gcc_assert (c != SYMBOL_EXTERNAL
	      && (c == SYMBOL_DUPLICATE || !symbol_partitioned_p (node)));
+  part->symbols++;
+
   lto_set_symtab_encoder_in_partition (part->encoder, node);

   if (symbol_partitioned_p (node))
@@ -274,6 +277,7 @@ undo_partition (ltrans_partition partition, unsigned int n_nodes)
     {
       symtab_node *node = lto_symtab_encoder_deref (partition->encoder,
						    n_nodes);
+      partition->symbols--;
       cgraph_node *cnode;

       /* After UNDO we no longer know what was visited. */
@@ -462,7 +466,7 @@ lto_balanced_map (int n_lto_partitions)
   auto_vec varpool_order;
   int i;
   struct cgraph_node *node;
-  int total_size = 0, best_total_size = 0;
+  int original_total_size, total_size = 0, best_total_size = 0;
   int partition_size;
   ltrans_partition partition;
   int last_visited_node = 0;
@@ -488,6 +492,8 @@ lto_balanced_map (int n_lto_partitions)
       total_size += inline_summaries->get (node)->size;
     }

+  original_total_size = total_size;
+
   /* Streaming works best when the source units do not cross partition
      boundaries much.  This is because importing function from a source
      unit tends to import a lot of global trees defined there.  We should
@@ -782,6 +788,23 @@ lto_balanced_map (int n_lto_partitions)
   add_sorted_nodes (next_nodes, partition);

   free (order);
+
+  if (symtab->dump_file)
+    {
+      fprintf (symtab->dump_file, "\nPartition sizes:\n");
+      unsigned partitions = ltrans_partitions.length ();
+
+      for (i = 0; i < partitions; i++)
+	{
+	  ltrans_partition p = ltrans_partitions[i];
+	  fprintf (symtab->dump_file, "partition %d contains %d (%2.2f)%%"
+		   " symbols and %d (%2.2f)%% insns\n", i, p->symbols,
+		   100.0 * p->symbols / n_nodes, p->insns,
+		   100.0 * p->insns / original_total_size);
+	}
+
+      fprintf (symtab->dump_file, "\n");
+    }
 }

 /* Return true if we must not change the name of the NODE.  The name as
target attributes/pragmas changing vector instruction availability and custom types
Hi all,

I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck on a particular issue. I want to be able to use a target pragma to enable SIMD support in a SIMD intrinsics header file. So it will look like this:

$ cat simd_header.h
#pragma GCC push_options
#pragma GCC target ("arch=armv8-a+simd")
#pragma GCC pop_options

I would then include it in a file with a function tagged with a simd target attribute:

$ cat foo.c
#include "simd_header.h"
__attribute__((target("arch=armv8-a+simd")))
uint32x4_t
foo (uint32x4_t a)
{
  return simd_intrinsic (a); /* simd_intrinsic defined in simd_header.h and implemented by a target builtin */
}

This works fine for me. But if I try to compile this without SIMD support, say:

aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c

I get an ICE at builtin expansion time. I think I've tracked it down to the fact that the type uint32x4_t is a builtin type that we define in the backend (with add_builtin_type) during the target builtins initialisation code. From what I can see, this code gets called early on, after the command line options have been processed but before target pragmas or attributes are processed, so the builtin types are laid out assuming that no SIMD is available, as per the command line option -march=armv8-a+nosimd. Later, while expanding the builtin in simd_intrinsic with SIMD available, the ICE occurs. I think that is because the types were not re-laid out.

I'm somewhat stumped on ideas to work around this issue. I notice that rs6000 also defines custom builtin vector types. Michael, did you notice any issue similar to what I described above? Would re-laying out the builtin vector types on target attribute changes be a valid way to go forward here?

Thanks,
Kyrill
Re: optimization question
I'm not very familiar with the optimizations that are done in O2 vs O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated.

Another approach to debugging a suspected optimization bug is to look at the optimization dumps produced in response to one or more of the -fdump-tree-xxx and -fdump-rtl-xxx options. There are many of them and they tend to be verbose, but not as much as the raw assembly, and they are usually easier to follow. The dumps make it possible to identify the buggy optimization pass and avoid the bug by disabling just the problematic optimization while leaving all the others enabled. With attribute optimize and/or #pragma GCC optimize, you can then specify which optimizations to disable for subsets of functions in a file. See the following sections in the manual for more:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html

Martin
Re: optimization question
On 5/19/2015 10:09 AM, Martin Sebor wrote:

I'm not very familiar with the optimizations that are done in O2 vs O1, or even what happens in these optimizations. So, I'm wondering if this is a bug, or a subtle valid optimization that I don't understand. Any help would be appreciated.

Another approach to debugging a suspected optimization bug is to look at the optimization dumps produced in response to one or more of the -fdump-tree-xxx and -fdump-rtl-xxx options. There are many of them and they tend to be verbose, but not as much as the raw assembly, and they are usually easier to follow. The dumps make it possible to identify the buggy optimization pass and avoid the bug by disabling just the problematic optimization while leaving all the others enabled. With attribute optimize and/or #pragma GCC optimize, you can then specify which optimizations to disable for subsets of functions in a file. See the following sections in the manual for more:

http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Debugging-Options.html#index-fdump-tree-673
http://gcc.gnu.org/onlinedocs/gcc-4.8.2/gcc/Function-Specific-Option-Pragmas.html

Martin

Thanks again Martin. I started going down that road yesterday, and got lost in the forest of options. What I was looking for was some option that would tell me what was being done with dgHandle specifically. I played around with -fopt-info-all but didn't get very far ...
Re: target attributes/pragmas changing vector instruction availability and custom types
On 19/05/15 15:55, Christian Bruel wrote:

Hi Kyrill,

This is funny, I've updated bz65837 today in the same direction.

On 05/19/2015 04:17 PM, Kyrill Tkachov wrote: Hi all, I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck on a particular issue. I want to be able to use a target pragma to enable SIMD support in a SIMD intrinsics header file. So it will look like this: $ cat simd_header.h #pragma GCC push_options #pragma GCC target ("arch=armv8-a+simd") #pragma GCC pop_options I would then include it in a file with a function tagged with a simd target attribute: $ cat foo.c #include "simd_header.h" __attribute__((target("arch=armv8-a+simd"))) uint32x4_t foo (uint32x4_t a) { return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and implemented by a target builtin } This works fine for me. But if I try to compile this without SIMD support, say: aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c I get an ICE during builtin expansion time. I think I've tracked it down to the problem that the type uint32x4_t is a builtin type that we define in the backend (with add_builtin_type) during the target builtins initialisation code. From what I can see, this code gets called early on after the command line options have been processed, but before target pragmas or attributes are processed, so the builtin types are laid out assuming that no SIMD is available, as per the command line option -march=armv8-a+nosimd, but later while expanding the builtin in simd_intrinsic with SIMD available the ICE occurs. I think that is because the types were not re-laid out.

I share this analysis.

I'm somewhat stumped on ideas to work around this issue. I notice that rs6000 also defines custom builtin vector types. Michael, did you notice any issue similar to what I described above? Would re-laying the builtin vector type on target attribute changes be a valid way to go forward here?

This is what I've done on arm; it seems to work locally. For aarch64, you can try to call aarch64_init_simd_builtins () from your target hook; then we need to think how to unset them.

Hmmm, calling aarch64_init_simd_builtin_types in the VALID_TARGET_ATTRIBUTE_P hook seems to work for me. Thanks for the suggestion! I think undefining shouldn't be a concern since they are target-specific builtins and we make no guarantee on their availability or behaviour through any use other than the intrinsics in arm_neon.h. Of course, we'd need to massage aarch64_init_simd_builtin_types to only re-layout the types once, so that we don't end up doing redundant work or bloating memory.

Thanks again!
Kyrill

Cheers
Christian

Thanks,
Kyrill
Re: [i386] Scalar DImode instructions on XMM registers
On 05/18/2015 08:13 AM, Ilya Enkovich wrote:
2015-05-06 17:18 GMT+03:00 Ilya Enkovich :
2015-04-25 4:32 GMT+03:00 Jan Hubicka :

Hi, I am adding Vladimir and Richard into CC. I tried to solve a similar problem with FP math years ago by having -mfpmath=sse,i387. The idea was to allow use of i387 registers when SSE ones run out and possibly also model the fact that Pentium4 had faster i387 additions than SSE additions. I also had some plans to extend this to mixed SSE/MMX/GPR integer arithmetic, but never got to that. This did not really fly because of the regalloc not really being able to understand it (I made a patch for regclass to propagate the classes and figure out what operations need to stay in i387 and what in SSE to avoid reloading, but that never got in). I believe Vladimir did some work on this with IRA (he is able to spill GPR regs into SSE and do a bit of other tricks). Also I believe it was kind of Richard's design decision to avoid use of (paradoxical) subregs for vector conversions because these have funny implications. The code for handling upper parts of paradoxical subregs is controlled by macros around SUBREG_PROMOTED_VAR_P, but I do not think it will handle V1DI->V2DI conversions fluently without some middle-end hacking (it will probably try to produce zero extensions). When we are on SSE instructions, it would be great to finally teach copy_by_pieces/store_by_pieces to use vector instructions (these are more compact and either equally fast or faster on some CPUs). I hope to get into this, but it would be great if someone beat me to it.

Honza

I'm trying to implement it as a separate RTL pass which chooses a scalar/vector mode for each 64-bit computation chain and performs the transformation if we choose to use vectors. I also want to split DI instructions which are going to be implemented on GPRs before RA (currently it is done on the second split). Good metrics for such a transformation are a big question, but currently I can't even make it generate correct code when paradoxical subregs are used. It works in simple cases but I get into trouble when spills appear. Trying to beat the following testcase:

test (long long *arr)
{
  register unsigned long long tmp;
  tmp = arr[0] | arr[1] & arr[2];
  while (tmp)
    {
      counter (tmp);
      tmp = *(arr++) & tmp;
    }
}

The RTL I generate seems OK to me (ignoring the fact that it is not optimal):

(insn 6 3 50 2 (set (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ]) (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ]) (const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D) + 8B]+0 S8 A64])) pr65105-1.c:22 89 {*movdi_internal} (nil))
(insn 50 6 7 2 (set (reg:DI 104) (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ]) (const_int 16 [0x10])) [2 MEM[(long long int *)arr_5(D) + 16B]+0 S8 A64])) pr65105-1.c:22 -1 (nil))
(insn 7 50 51 2 (set (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0) (and:V2DI (subreg:V2DI (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ]) 0) (subreg:V2DI (reg:DI 104) 0))) pr65105-1.c:22 3487 {*andv2di3} (expr_list:REG_DEAD (subreg:V2DI (reg:DI 98 [ MEM[(long long int *)arr_5(D) + 8B] ]) 0) (expr_list:REG_UNUSED (reg:CC 17 flags) (expr_list:REG_EQUAL (and:DI (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ]) (const_int 8 [0x8])) [2 MEM[(long long int *)arr_5(D) + 8B]+0 S8 A64]) (mem:DI (plus:SI (reg/v/f:SI 96 [ arr ]) (const_int 16 [0x10])) [2 MEM[(long long int *)arr_5(D) + 16B]+0 S8 A64])) (nil)
(insn 51 7 8 2 (set (reg:DI 105) (mem:DI (reg/v/f:SI 96 [ arr ]) [2 *arr_5(D)+0 S8 A64])) pr65105-1.c:22 -1 (nil))
(insn 8 51 46 2 (set (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0) (ior:V2DI (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0) (subreg:V2DI (reg:DI 105) 0))) pr65105-1.c:22 3489 {*iorv2di3} (expr_list:REG_DEAD (subreg:V2DI (reg:DI 97 [ D.2586 ]) 0) (expr_list:REG_UNUSED (reg:CC 17 flags) (nil
(insn 46 8 47 2 (set (reg:V2DI 103) (subreg:V2DI (reg/v:DI 87 [ tmp ]) 0)) pr65105-1.c:22 -1 (nil))
(insn 47 46 48 2 (set (subreg:SI (reg:DI 101) 0) (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1 (nil))
(insn 48 47 49 2 (set (reg:V2DI 103) (lshiftrt:V2DI (reg:V2DI 103) (const_int 32 [0x20]))) pr65105-1.c:22 -1 (nil))
(insn 49 48 9 2 (set (subreg:SI (reg:DI 101) 4) (subreg:SI (reg:V2DI 103) 0)) pr65105-1.c:22 -1 (nil))
(note 9 49 10 2 NOTE_INSN_DELETED)
(insn 10 9 11 2 (parallel [ (set (reg:CCZ 17 flags) (compare:CCZ (ior:SI (subreg:SI (reg:DI 101) 4) (subreg:SI (reg:DI 101) 0)) (const_int 0 [0]))) (clobber (scratch:SI)) ]) pr65105-1.c:23 447 {*iorsi_3} (nil))
(jump_insn 11 10 37 2 (set (pc) (if_then_else (ne (reg:CCZ 17 flags)
Re: optimization question
On 05/19/2015 04:14 PM, mark maule wrote: > Thanks again Martin. I started going down that road yesterday, and got > lost in the forest of options. What I was looking for was some option > that would tell me what was being done with dgHandle specifically. I > played around with -fopt-info-all but didn't get very far ... I think you're going to find it very difficult to debug such a large chunk of code, and it's quite possible that your problem is in the RTL optimization anyway. I don't think you'll find anything better than trying to reduce the test case. Also, all those macros rather obfuscate what's going on. It may help later to preprocess the source and run it through indent. Andrew.
Re: optimization question
On 5/19/2015 10:28 AM, Andrew Haley wrote: On 05/19/2015 04:14 PM, mark maule wrote: Thanks again Martin. I started going down that road yesterday, and got lost in the forest of options. What I was looking for was some option that would tell me what was being done with dgHandle specifically. I played around with -fopt-info-all but didn't get very far ... I think you're going to find it very difficult to debug such a large chunk of code, and it's quite possible that your problem is in the RTL optimization anyway. I don't think you'll find anything better than trying to reduce the test case. Also, all those macros rather obfuscate what's going on. It may help later to preprocess the source and run it through indent. Andrew. Understood. That's my next step, and will repost to gcc-help per earlier suggestions. Thanks for the pointers.
Re: target attributes/pragmas changing vector instruction availability and custom types
On 19/05/15 16:21, Kyrill Tkachov wrote:

On 19/05/15 15:55, Christian Bruel wrote: Hi Kyrill, This is funny, I've updated bz65837 today in the same direction. On 05/19/2015 04:17 PM, Kyrill Tkachov wrote: Hi all, I'm working on enabling target attributes and pragmas on aarch64 and I'm stuck on a particular issue. I want to be able to use a target pragma to enable SIMD support in a SIMD intrinsics header file. So it will look like this: $ cat simd_header.h #pragma GCC push_options #pragma GCC target ("arch=armv8-a+simd") #pragma GCC pop_options I would then include it in a file with a function tagged with a simd target attribute: $ cat foo.c #include "simd_header.h" __attribute__((target("arch=armv8-a+simd"))) uint32x4_t foo (uint32x4_t a) { return simd_intrinsic (a); //simd_intrinsic defined in simd_header.h and implemented by a target builtin } This works fine for me. But if I try to compile this without SIMD support, say: aarch64-none-elf-gcc -c -march=armv8-a+nosimd foo.c I get an ICE during builtin expansion time. I think I've tracked it down to the problem that the type uint32x4_t is a builtin type that we define in the backend (with add_builtin_type) during the target builtins initialisation code. From what I can see, this code gets called early on after the command line options have been processed, but before target pragmas or attributes are processed, so the builtin types are laid out assuming that no SIMD is available, as per the command line option -march=armv8-a+nosimd, but later while expanding the builtin in simd_intrinsic with SIMD available the ICE occurs. I think that is because the types were not re-laid out. I share this analysis. I'm somewhat stumped on ideas to work around this issue. I notice that rs6000 also defines custom builtin vector types. Michael, did you notice any issue similar to what I described above? Would re-laying the builtin vector type on target attribute changes be a valid way to go forward here?

This is what I've done on arm; it seems to work locally. For aarch64, you can try to call aarch64_init_simd_builtins () from your target hook, then we need to think how to unset them.

Hmmm, calling aarch64_init_simd_builtin_types in the VALID_TARGET_ATTRIBUTE_P hook seems to work for me.

Actually, scratch that. I had used the wrong compiler. Calling the builtin init code through the target attribute hook didn't work :(. Looking further into it...

Kyrill

Thanks for the suggestion! I think undefining shouldn't be a concern since they are target-specific builtins and we make no guarantee on their availability or behaviour through any use other than the intrinsics in arm_neon.h. Of course, we'd need to massage aarch64_init_simd_builtin_types to only re-layout the types once, so that we don't end up doing redundant work or bloating memory.

Thanks again!
Kyrill

Cheers
Christian

Thanks,
Kyrill
gcc-5-20150519 is now available
Snapshot gcc-5-20150519 is now available on ftp://gcc.gnu.org/pub/gcc/snapshots/5-20150519/ and on various mirrors, see http://gcc.gnu.org/mirrors.html for details. This snapshot has been generated from the GCC 5 SVN branch with the following options: svn://gcc.gnu.org/svn/gcc/branches/gcc-5-branch revision 223417 You'll find: gcc-5-20150519.tar.bz2 Complete GCC MD5=84f261b2f23e154ec6d9bd4149851a21 SHA1=c6e72cc4ebd446df4cc797947107cbefad21bdf5 Diffs from 5-20150512 are available in the diffs/ subdirectory. When a particular snapshot is ready for public consumption the LATEST-5 link is updated and a message is sent to the gcc list. Please do not use a snapshot before it has been announced that way.
Compilers and RCU readers: Once more unto the breach!
Hello!

Following up on last year's discussion (https://lwn.net/Articles/586838/, https://lwn.net/Articles/588300/), I believe that we have a solution. If I am wrong, I am sure you all will let me know, and in great detail. ;-)

The key simplification is to "just say no" to RCU-protected array indexes: https://lkml.org/lkml/2015/5/12/827, as was suggested by several people. This simplification means that rcu_dereference (AKA memory_order_consume) need only return pointers. This in turn avoids things like (x-x), (x*0), and (x%1), because if "x" is a pointer, these expressions either return non-pointers or are compilation errors. With very few exceptions, dependency chains can lead -to- non-pointers, but cannot pass -through- them.

The result is that dependencies are carried only by operations for which the compiler cannot easily optimize away those dependencies; these operations include simple assignment, integer offset (including indexing), dereferencing, casts, passing as a function argument, return values from functions, and so on. A complete list with commentary starts on page 28 of:

http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf

Dependency chains are broken if a pointer compares equal to some other pointer not part of the same dependency chain, if too many bits are ORed onto or ANDed off of an intptr_t or uintptr_t, or if the dependency is explicitly killed (which should now, strictly speaking, never be necessary, but which might allow better diagnostics). These are set out in more detail on page 30 of the above PDF.

This covers all the uses in the Linux kernel that I am aware of without any source-code changes (other than to the rcu_dereference() primitives themselves) and should also work for compilers and standards.

Thoughts?

Thanx, Paul
Re: Compilers and RCU readers: Once more unto the breach!
On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney wrote:
>
> http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf

From a very quick read-through, the restricted dependency chain in 7.9 seems to be reasonable, and essentially covers "that's what hardware gives us anyway", making compiler writers happy.

I would clarify the language somewhat:

- it says that the result of a cast of a pointer is a dependency. You need to make an exception for casting to bool, methinks (because that's effectively just a test-against-NULL, which you later describe as terminating the dependency). Maybe get rid of the "any type", and just limit it to casts to types of size intptr_t, ie ones that don't drop significant bits. All the other rules talk about [u]intptr_t anyway.

- you clarify that the trivial "& 0" and "| ~0" kill the dependency chain, but if you really want to be a stickler, you might want to extend it to a few more cases. Things like "& 1" (to extract a tag from the low bit of a tagged pointer) should likely also drop the dependency, since a compiler would commonly end up using the end result as a conditional even if the code was written to then use casting etc to look like a dereference.

- the "you can add/subtract integral values" still opens you up to language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the dependency, which it clearly doesn't. But language-lawyering it does, since all those operations (cast to pointer, cast to integer, subtracting an integer) claim to be dependency-preserving operations.

So I think you want to limit the logical operators to things that don't mask off too many bits, and you should probably limit the add/subtract operations some way (maybe specify that the integer value you add/subtract cannot be related to the pointer). But I think limiting it to mostly pointer ops (and a _few_ integer operations to do offsets and remove tag bits) is otherwise a good approach.

Linus
Re: Compilers and RCU readers: Once more unto the breach!
On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds wrote: > > - the "you can add/subtract integral values" still opens you up to > language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the > dependency, which it clearly doesn't. But language-lawyering it does, > since all those operations (cast to pointer, cast to integer, > subtracting an integer) claim to be dependency-preserving operations. > > So I think you want to limit the logical operators to things that > don't mask off too many bits, and you should probably limit the > add/subtract operations some way (maybe specify that the integer value > you add/subtract cannot be related to the pointer). Actually, "not related" doesn't work. For some buddy allocator thing, you very much might want some of the bits to be related. So I think you're better off just saying that operations designed to drop significant bits break the dependency chain, and give things like "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such. Making that just an extension of your existing "& 0" language would seem to be natural. Humans will understand, and compiler writers won't care. They will either depend on hardware semantics anyway (and argue that your language is tight enough that they don't need to do anything special) or they will turn the consume into an acquire (on platforms that have too weak hardware). Linus
Re: Compilers and RCU readers: Once more unto the breach!
On Tue, May 19, 2015 at 06:57:02PM -0700, Linus Torvalds wrote:
> On Tue, May 19, 2015 at 5:55 PM, Paul E. McKenney
> wrote:
> >
> > http://www.rdrop.com/users/paulmck/RCU/consume.2015.05.18a.pdf
>
> From a very quick read-through, the restricted dependency chain in 7.9
> seems to be reasonable, and essentially covers "that's what hardware
> gives us anyway", making compiler writers happy.
>
> I would clarify the language somewhat:
>
> - it says that the result of a cast of a pointer is a dependency. You
> need to make an exception for casting to bool, methinks (because
> that's effectively just a test-against-NULL, which you later describe
> as terminating the dependency).
>
> Maybe get rid of the "any type", and just limit it to casts to
> types of size intptr_t, ie ones that don't drop significant bits. All
> the other rules talk about [u]intptr_t anyway.

Excellent point! I now say:

	If a pointer is part of a dependency chain, then casting it
	(either explicitly or implicitly) to any pointer-sized type
	extends the chain to the result.

If this approach works out, the people in the Core Working Group will come up with alternative language-lawyer-proof wording, but this informal version will hopefully do for the moment.

> - you clarify that the trivial "& 0" and "| ~0" kill the dependency
> chain, but if you really want to be a stickler, you might want to
> extend it to a few more cases. Things like "& 1" (to extract a tag
> from the low bit of a tagged pointer) should likely also drop the
> dependency, since a compiler would commonly end up using the end
> result as a conditional even if the code was written to then use
> casting etc to look like a dereference.

Ah, how about the following?

	If a value of type intptr_t or uintptr_t is part of a dependency
	chain, and if that value is one of the operands to an & or |
	infix operator whose result has too few or too many bits set,
	then the resulting value will not be part of any dependency
	chain.
For example, on a 64-bit system, if p is part of a dependency chain,
then (p & 0x7) provides just the tag bits, and normally cannot even be
legally dereferenced.  Similarly, (p | ~0) normally cannot be legally
dereferenced.

> - the "you can add/subtract integral values" still opens you up to
>   language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
>   dependency, which it clearly doesn't. But language-lawyering it does,
>   since all those operations (cast to pointer, cast to integer,
>   subtracting an integer) claim to be dependency-preserving operations.

My thought was that the result of "(char *)ptr - (intptr_t)ptr" is a
NULL pointer in most environments, and dereferencing a NULL pointer is
undefined behavior.  So it becomes irrelevant whether or not the NULL
pointer carries a dependency.

There are some stranger examples, such as "(char *)ptr -
((intptr_t)ptr)/7", but in that case, if the resulting pointer happens
by chance to reference valid memory, I believe a dependency would still
be carried.  Of course, if you are producing code like that, I am
guessing that dependencies are the very least of your concerns.
However, I will give this some more thought.

> So I think you want to limit the logical operators to things that
> don't mask off too many bits, and you should probably limit the
> add/subtract operations some way (maybe specify that the integer value
> you add/subtract cannot be related to the pointer). But I think
> limiting it to mostly pointer ops (and a _few_ integer operations to
> do offsets and remove tag bits) is otherwise a good approach.

Glad you mostly like it!  ;-)

							Thanx, Paul
Re: Compilers and RCU readers: Once more unto the breach!
On Tue, May 19, 2015 at 07:10:12PM -0700, Linus Torvalds wrote:
> On Tue, May 19, 2015 at 6:57 PM, Linus Torvalds wrote:
> >
> > - the "you can add/subtract integral values" still opens you up to
> >   language lawyers claiming "(char *)ptr - (intptr_t)ptr" preserving the
> >   dependency, which it clearly doesn't. But language-lawyering it does,
> >   since all those operations (cast to pointer, cast to integer,
> >   subtracting an integer) claim to be dependency-preserving operations.
> >
> > So I think you want to limit the logical operators to things that
> > don't mask off too many bits, and you should probably limit the
> > add/subtract operations some way (maybe specify that the integer value
> > you add/subtract cannot be related to the pointer).
>
> Actually, "not related" doesn't work. For some buddy allocator thing,
> you very much might want some of the bits to be related.

Good point, you could do the buddy-allocator computations with add and
subtract instead of exclusive OR.

> So I think you're better off just saying that operations designed to
> drop significant bits break the dependency chain, and give things like
> "& 1" and "(char *)ptr-(uintptr_t)ptr" as examples of such.
>
> Making that just an extension of your existing "& 0" language would
> seem to be natural.

Works for me!  I added the following bullet to the list of things that
break dependencies:

	If a pointer is part of a dependency chain, and if the values
	added to or subtracted from that pointer cancel the pointer
	value so as to allow the compiler to precisely determine the
	resulting value, then the resulting value will not be part of
	any dependency chain.  For example, if p is part of a dependency
	chain, then ((char *)p-(uintptr_t)p)+65536 will not be.

Seem reasonable?

> Humans will understand, and compiler writers won't care.
> They will either depend on hardware semantics anyway (and argue that
> your language is tight enough that they don't need to do anything
> special) or they will turn the consume into an acquire (on platforms
> that have too weak hardware).

Agreed.  Plus the Core Working Group will hammer out the exact wording,
should this approach meet their approval.

							Thanx, Paul