RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Hello, This is new patch version. Optimization does not use BIT_FIELD_REF any more, instead it generates new COMPONENT_REF and FIELD_DECL. Existing Bit field representative is associated with newly created field declaration. During analysis phase optimization uses bit field representatives when deciding which bit-fields accesses can be merged. Instead of having separate pass optimization is moved to tree-sra.c and executed with sra early. New test case involving unions is added. Also, some other comments received on first patch are applied in new implementation. Example: Original code: D.1351; D.1350; D.1349; D.1349_2 = p1_1(D)->f1; p2_3(D)->f1 = D.1349_2; D.1350_4 = p1_1(D)->f2; p2_3(D)->f2 = D.1350_4; D.1351_5 = p1_1(D)->f3; p2_3(D)->f3 = D.1351_5; Optimized code: D.1358; _16 = pr1_2(D)->_field0; pr2_4(D)->_field0 = _16; Algorithm works on basic block level and consists of following 3 major steps: 1. Go trough basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Modify trees accordingly. New command line option "-ftree-bitfield-merge" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2013-08-22 Zoran Jovanovic (zoran.jovano...@imgtec.com) * Makefile.in : Added tree-sra.c to GTFILES. * common.opt (ftree-bitfield-merge): New option. * doc/invoke.texi: Added reference to "-ftree-bitfield-merge". * tree-sra.c (ssa_bitfield_merge): New function. Entry for (-ftree-bitfield-merge). (bitfield_stmt_access_pair_htab_hash): New function. (bitfield_stmt_access_pair_htab_eq): New function. (cmp_access): New function. (create_and_insert_access): New function. (get_bit_offset): New function. (get_merged_bit_field_size): New function. (add_stmt_access_pair): New function. * dwarf2out.c (simple_type_size_in_bits): moved to tree.c. (field_byte_offset): declaration moved to tree.h, static removed. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h. * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c. (simple_type_size_in_bits): moved from dwarf2out.c. * tree.h (expressions_equal_p): declaration added. (field_byte_offset): declaration added. Patch - diff --git a/gcc/Makefile.in b/gcc/Makefile.in index 6034046..dad9337 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3831,6 +3831,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/vtable-verify.c \ $(srcdir)/asan.c \ $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \ + $(srcdir)/tree-sra.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 9082280..fe0ecd9 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2160,6 +2160,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-bitfield-merge +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Enable bit-field merge on trees + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index dae7605..7abe538 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7646,6 +7646,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -ftree-bitfield-merge +@opindex ftree-bitfield-merge +Combines several adjacent bit-field accesses that copy values +from one memory location to another into single bit-field access. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation (CCP) on trees. This diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index fc1c3f2..f3530ef 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3104,8 +3104,6 @@ static HOST_WIDE_INT ceili
RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Hello, This is new patch version. Comments from Bernhard Reutner-Fischer review applied. Also, test case bitfildmrg2.c modified - it is now execute test. Example: Original code: D.1351; D.1350; D.1349; D.1349_2 = p1_1(D)->f1; p2_3(D)->f1 = D.1349_2; D.1350_4 = p1_1(D)->f2; p2_3(D)->f2 = D.1350_4; D.1351_5 = p1_1(D)->f3; p2_3(D)->f3 = D.1351_5; Optimized code: D.1358; _16 = pr1_2(D)->_field0; pr2_4(D)->_field0 = _16; Algorithm works on basic block level and consists of following 3 major steps: 1. Go trough basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Modify trees accordingly. New command line option "-ftree-bitfield-merge" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com) * Makefile.in : Added tree-sra.c to GTFILES. * common.opt (ftree-bitfield-merge): New option. * doc/invoke.texi: Added reference to "-ftree-bitfield-merge". * tree-sra.c (ssa_bitfield_merge): New function. Entry for (-ftree-bitfield-merge). (bitfield_stmt_access_pair_htab_hash): New function. (bitfield_stmt_access_pair_htab_eq): New function. (cmp_access): New function. (create_and_insert_access): New function. (get_bit_offset): New function. (get_merged_bit_field_size): New function. (add_stmt_access_pair): New function. * dwarf2out.c (simple_type_size_in_bits): moved to tree.c. (field_byte_offset): declaration moved to tree.h, static removed. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h. * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c. (simple_type_size_in_bits): moved from dwarf2out.c. * tree.h (expressions_equal_p): declaration added. (field_byte_offset): declaration added. Patch - diff --git a/gcc/Makefile.in b/gcc/Makefile.in index a2e3f7a..54aa8e7 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/asan.c \ $(srcdir)/ubsan.c \ $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \ + $(srcdir)/tree-sra.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 202e169..afac514 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2164,6 +2164,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-bitfield-merge +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Enable bit-field merge on trees + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index aa0f4ed..e588cae 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7679,6 +7679,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -ftree-bitfield-merge +@opindex ftree-bitfield-merge +Combines several adjacent bit-field accesses that copy values +from one memory location to another into one single bit-field access. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation (CCP) on trees. This diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 95049e4..e74db17 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned int); static tree field_type (const_tree); static unsigned int simple_type_align_in_bits (const_tree); static unsigned int simple_decl_align_in_bits (const_tree); -static unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree); -static HOST_WIDE_INT field_byte_offset (const_tree); static void add_AT_location_description(dw_die_ref, enum dwarf_attribute,
[PING] [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Just to ping this patch. http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01829.html Regards, Zoran Jovanovic From: Zoran Jovanovic Sent: Tuesday, September 24, 2013 11:59 PM To: gcc-patches@gcc.gnu.org Cc: Petar Jovanovic Subject: RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside) Hello, This is new patch version. Comments from Bernhard Reutner-Fischer review applied. Also, test case bitfildmrg2.c modified - it is now execute test. Example: Original code: D.1351; D.1350; D.1349; D.1349_2 = p1_1(D)->f1; p2_3(D)->f1 = D.1349_2; D.1350_4 = p1_1(D)->f2; p2_3(D)->f2 = D.1350_4; D.1351_5 = p1_1(D)->f3; p2_3(D)->f3 = D.1351_5; Optimized code: D.1358; _16 = pr1_2(D)->_field0; pr2_4(D)->_field0 = _16; Algorithm works on basic block level and consists of following 3 major steps: 1. Go trough basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Modify trees accordingly. New command line option "-ftree-bitfield-merge" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com) * Makefile.in : Added tree-sra.c to GTFILES. * common.opt (ftree-bitfield-merge): New option. * doc/invoke.texi: Added reference to "-ftree-bitfield-merge". * tree-sra.c (ssa_bitfield_merge): New function. Entry for (-ftree-bitfield-merge). (bitfield_stmt_access_pair_htab_hash): New function. (bitfield_stmt_access_pair_htab_eq): New function. (cmp_access): New function. (create_and_insert_access): New function. (get_bit_offset): New function. (get_merged_bit_field_size): New function. (add_stmt_access_pair): New function. * dwarf2out.c (simple_type_size_in_bits): moved to tree.c. (field_byte_offset): declaration moved to tree.h, static removed. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h. * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c. (simple_type_size_in_bits): moved from dwarf2out.c. * tree.h (expressions_equal_p): declaration added. (field_byte_offset): declaration added. Patch - diff --git a/gcc/Makefile.in b/gcc/Makefile.in index a2e3f7a..54aa8e7 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/asan.c \ $(srcdir)/ubsan.c \ $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \ + $(srcdir)/tree-sra.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 202e169..afac514 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2164,6 +2164,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +ftree-bitfield-merge +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Enable bit-field merge on trees + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index aa0f4ed..e588cae 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7679,6 +7679,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -ftree-bitfield-merge +@opindex ftree-bitfield-merge +Combines several adjacent bit-field accesses that copy values +from one memory location to another into one single bit-field access. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation (CCP) on trees. This diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c index 95049e4..e74db17 100644 --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned int); static tree field_type (const_tree);
RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
Hello, This is new patch version in which reported issue is fixed. Also, patch is rebased to the revision 216452 and some minor code clean-up is done. -- Lowering is applied only for bit-fields copy sequences that are merged. Data structure representing bit-field copy sequences is renamed and reduced in size. Optimization turned on by default for -O2 and higher. Some comments fixed. Benchmarking performed on WebKit for Android. Code size reduction noticed on several files, best examples are: core/rendering/style/StyleMultiColData (632->520 bytes) core/platform/graphics/FontDescription (1715->1475 bytes) core/rendering/style/FillLayer (5069->4513 bytes) core/rendering/style/StyleRareInheritedData (5618->5346) core/css/CSSSelectorList(4047->3887) core/platform/animation/CSSAnimationData (3844->3440 bytes) core/css/resolver/FontBuilder (13818->13350 bytes) core/platform/graphics/Font (16447->15975 bytes) Example: One of the motivating examples for this work was copy constructor of the class which contains bit-fields. C++ code: class A { public: A(const A &x); unsigned a : 1; unsigned b : 2; unsigned c : 4; }; A::A(const A&x) { a = x.a; b = x.b; c = x.c; } GIMPLE code without optimization: : _3 = x_2(D)->a; this_4(D)->a = _3; _6 = x_2(D)->b; this_4(D)->b = _6; _8 = x_2(D)->c; this_4(D)->c = _8; return; Optimized GIMPLE code: : _10 = x_2(D)->D.1867; _11 = BIT_FIELD_REF <_10, 7, 0>; _12 = this_4(D)->D.1867; _13 = _12 & 128; _14 = (unsigned char) _11; _15 = _13 | _14; this_4(D)->D.1867 = _15; return; Generated MIPS32r2 assembly code without optimization: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x1 andi$2,$2,0xfe or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0xf9 andi$3,$3,0x6 or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0x87 andi$3,$3,0x78 or $2,$2,$3 j $31 sb $2,0($4) Optimized MIPS32r2 assembly code: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x7f andi$2,$2,0x80 or $2,$3,$2 j $31 sb $2,0($4) Algorithm works on basic block level and consists of following 3 major steps: 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Lower bit-field accesses by using new field size for those that can be merged. New command line option "-fmerge-bitfields" is introduced. Tested - passed gcc regression tests for MIPS32r2. Changelog - gcc/ChangeLog: 2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com) * common.opt (fmerge-bitfields): New option. * doc/invoke.texi: Add reference to "-fmerge-bitfields". * doc/invoke.texi: Add "-fmerge-bitfields" to the list of optimization flags turned on at -O2. * tree-sra.c (lower_bitfields): New function. Entry for (-fmerge-bitfields). (part_of_union_p): New function. (bf_access_candidate_p): New function. (lower_bitfield_read): New function. (lower_bitfield_write): New function. (bitfield_stmt_bfcopy_pair::hash): New function. (bitfield_stmt_bfcopy_pair::equal): New function. (bitfield_stmt_bfcopy_pair::remove): New function. (create_and_insert_bfcopy): New function. (get_bit_offset): New function. (add_stmt_bfcopy_pair): New function. (cmp_bfcopies): New function. (get_merged_bit_field_size): New function. * dwarf2out.c (simple_type_size_in_bits): Move to tree.c. (field_byte_offset): Move declaration to tree.h and make it extern. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h. * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c. (simple_type_size_in_bits): Move from dwarf2out.c. * tree.h (expressions_equal_p): Add declaration. (field_byte_offset): Add declaration. Patch - diff --git a/gcc/common.opt b/gcc/common.opt index 5db5e1e..cec145c 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2270,6 +2270,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +fmerge-bitfields +Common Report Var(flag_tree_bitfield_merge) Optimization +Merge loads and stores of consecutive bitfields + ftree-ter Common
[PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Hello, This patch adds new optimization pass that combines several adjacent bit field accesses that copy values from one memory location to another into single bit field access. Example: Original code: D.1351; D.1350; D.1349; D.1349_2 = p1_1(D)->f1; p2_3(D)->f1 = D.1349_2; D.1350_4 = p1_1(D)->f2; p2_3(D)->f2 = D.1350_4; D.1351_5 = p1_1(D)->f3; p2_3(D)->f3 = D.1351_5; Optimized code: D.1358; D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>; BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10; Algorithm works on basic block level and consists of following 3 major steps: 1. Go trough basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Modify trees accordingly. New command line option "-ftree-bitfield-merge" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2013-07-17 Zoran Jovanovic (zoran.jovano...@imgtec.com) * Makefile.in : Added tree-ssa-bitfield-merge.o to OBJS. * common.opt (ftree-bitfield-merge): New option. * doc/invoke.texi: Added reference to "-ftree-bitfield-merge". * dwarf2out.c (field_type): static removed from declaration. (simple_type_size_in_bits): static removed from declaration. (field_byte_offset): static removed from declaration. (field_type): static inline removed from declaration. * passes.c (init_optimization_passes): pass_bitfield_merge pass added. * testsuite/gcc.dg/tree-ssa/bitfldmrg.c: New test. * timevar.def : Added TV_TREE_BITFIELD_MERGE. * tree-pass.h : Added pass_bitfield_merge declaration. * tree-ssa-bitfield-merge.c : New file. Patch - diff --git a/gcc/Makefile.in b/gcc/Makefile.in index d5121f3..5cdd6eb 100644 --- a/gcc/Makefile.in +++ b/gcc/Makefile.in @@ -1417,6 +1417,7 @@ OBJS = \ tree-ssa-dom.o \ tree-ssa-dse.o \ tree-ssa-forwprop.o \ + tree-ssa-bitfield-merge.o \ tree-ssa-ifcombine.o \ tree-ssa-live.o \ tree-ssa-loop-ch.o \ @@ -2312,6 +2313,11 @@ tree-ssa-forwprop.o : tree-ssa-forwprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TREE_FLOW_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \ langhooks.h $(FLAGS_H) $(GIMPLE_H) $(GIMPLE_PRETTY_PRINT_H) $(EXPR_H) \ $(OPTABS_H) tree-ssa-propagate.h +tree-ssa-bitfield-merge.o : tree-ssa-forwprop.c $(CONFIG_H) $(SYSTEM_H) \ + coretypes.h $(TM_H) $(TREE_H) $(TM_P_H) $(BASIC_BLOCK_H) \ + $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(TIMEVAR_H) \ + langhooks.h $(FLAGS_H) $(GIMPLE_H) tree-pretty-print.h \ + gimple-pretty-print.h $(EXPR_H) tree-ssa-phiprop.o : tree-ssa-phiprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \ $(TM_H) $(TREE_H) $(TM_P_H) $(BASIC_BLOCK_H) \ $(TREE_FLOW_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \ @@ -3803,6 +3809,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h $(srcdir)/coretypes.h \ $(srcdir)/ipa-inline.h \ $(srcdir)/asan.c \ $(srcdir)/tsan.c \ + $(srcdir)/tree-ssa-bitfield-merge.c \ @all_gtfiles@ # Compute the list of GT header files from the corresponding C sources, diff --git a/gcc/common.opt b/gcc/common.opt index 4c7933e..e0dbc37 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2088,6 +2088,10 @@ ftree-forwprop Common Report Var(flag_tree_forwprop) Init(1) Optimization Enable forward propagation on trees +ftree-bitfield-merge +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Enable bit field merge on trees + ftree-fre Common Report Var(flag_tree_fre) Optimization Enable Full Redundancy Elimination (FRE) on trees diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index dd82880..7b671aa 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -409,7 +409,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7582,6 +7582,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -ftree-bitfield-merge +@opindex ftree-bitfield-merge +Combines several adjacent bit field accesses that copy values +from one memory location to another into single bit field access. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation
RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Thank you for the reply. I am in the process of modifying the patch according to some comments received. Currently I am considering the usage of DECL_BIT_FIELD_REPRESENTATIVE. I see that they can be used during analysis phase for deciding which accesses can be merged - only accesses with same representative will be merged. I have more dilemmas with usage of representatives for lowering. If my understanding is correct bit field representative can only be associated to a field declaration, and not to a BIT_FIELD_REF node. As a consequence optimization must use COMPONENT_REF to model new bit field access (which should be an equivalent of several merged accesses). To use COMPONENT_REF a new field declaration with appropriate bit size (equal to sum of bit sizes of all merged bit field accesses) must be created and then corresponding bit field representative could be attached. Is my understanding correct? Is creating a new field declaration for every set of merged bit field accesses acceptable? Regards, Zoran Jovanovic From: Richard Biener [richard.guent...@gmail.com] Sent: Thursday, July 18, 2013 11:31 AM To: Zoran Jovanovic; gcc-patches@gcc.gnu.org Cc: Petar Jovanovic Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside) Zoran Jovanovic wrote: >Hello, >This patch adds new optimization pass that combines several adjacent >bit field accesses that copy values from one memory location to another >into single bit field access. > >Example: > >Original code: > D.1351; > D.1350; > D.1349; > D.1349_2 = p1_1(D)->f1; > p2_3(D)->f1 = D.1349_2; > D.1350_4 = p1_1(D)->f2; > p2_3(D)->f2 = D.1350_4; > D.1351_5 = p1_1(D)->f3; > p2_3(D)->f3 = D.1351_5; > >Optimized code: > D.1358; > D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>; > BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10; > >Algorithm works on basic block level and consists of following 3 major >steps: >1. Go trough basic block statements list. If there are statement pairs >that implement copy of bit field content from one memory location to >another record statements pointers and other necessary data in >corresponding data structure. >2. Identify records that represent adjacent bit field accesses and mark >them as merged. >3. Modify trees accordingly. All this should use BITFIELD_REPRESENTATIVE both to decide what accesses are related and for the lowering. This makes sure to honor the appropriate memory models. In theory only lowering is necessary and FRE and DSE will do the job of optimizing - also properly accounting for alias issues that Joseph mentions. The lowering and analysis is strongly related to SRA So I don't believe we want a new pass for this. Richard.
RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)
Hello, This is new patch version. Approach suggested by Richard Biener with lowering bit-field accesses instead of modifying gimple trees is implemented. Although, algorithm still detects adjacent bit field accesses, which copy values from one to another bit-field of same type. If those accesses are detected field size used during lowering is equal to sum of sizes of all adjacent fields that can be merged. Idea is to let dse and cse to remove unnecessary instructions afterwards. I wanted to preserve this behavior because it was one of original goals of this work. Example: Original code: : _2 = pr1.f1; pr2.f1 = _2; _4 = pr1.f2; pr2.f2 = _4; _6 = pr1.f3; pr2.f3 = _6; return; Optimized code: : _8 = pr1.D.1364; _9 = BIT_FIELD_REF <_8, 13, 0>; _10 = pr2.D.1364; _11 = _10 & 4294959104; _12 = (unsigned int) _9; _13 = _11 | _12; pr2.D.1364 = _13; _14 = pr1.D.1364; _15 = BIT_FIELD_REF <_14, 13, 0>; _16 = pr2.D.1364; _17 = _16 & 4294959104; _18 = (unsigned int) _15; _19 = _17 | _18; pr2.D.1364 = _19; _20 = pr1.D.1364; _21 = BIT_FIELD_REF <_20, 13, 0>; _22 = pr2.D.1364; _23 = _22 & 4294959104; _24 = (unsigned int) _21; _25 = _23 | _24; pr2.D.1364 = _25; return; Algorithm works on basic block level and consists of following 3 major steps: 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Lower bit-field accesses by using new field size for those that can be merged. New command line option "-fmerge-bitfields" is introduced. Tested - passed gcc regression tests. Changelog - gcc/ChangeLog: 2014-03-09 Zoran Jovanovic (zoran.jovano...@imgtec.com) * common.opt (fmerge-bitfields): New option. * doc/invoke.texi: Added reference to "-fmerge-bitfields". * tree-sra.c (lower_bitfields): New function. Entry for (-fmerge-bitfields). (bfaccess::hash): New function. (bfaccess::equal): New function. (bfaccess::remove): New function. (bitfield_access_p): New function. (lower_bitfield_read): New function. (lower_bitfield_write): New function. (bitfield_stmt_access_pair_htab_hash): New function. (bitfield_stmt_access_pair_htab_eq): New function. (create_and_insert_access): New function. (get_bit_offset): New function. (get_merged_bit_field_size): New function. (add_stmt_access_pair): New function. (cmp_access): New function. * dwarf2out.c (simple_type_size_in_bits): moved to tree.c. (field_byte_offset): declaration moved to tree.h, static removed. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h. * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c. (simple_type_size_in_bits): moved from dwarf2out.c. * tree.h (expressions_equal_p): declaration added. (field_byte_offset): declaration added. Patch - diff --git a/gcc/common.opt b/gcc/common.opt index 661516d..3331d03 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2193,6 +2193,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +fmerge-bitfields +Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization +Merge loads and stores of consecutive bitfields + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 24bd76e..54bae56 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol --ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol +-fmerge-bitfields -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol @@ -7807,6 +7807,11 @@ pointer alignment information. This pass only operates on local scalar variables and is enabled by default at @option{-O} and higher. It requires that @option{-ftree-ccp} is enabled. +@item -fbitfield-merge +@opindex fmerge-bitfields +Combines several adjacent bit-field accesses that copy values +from one memory location to another into one single bit-field access. + @item -ftree-ccp @opindex ftree-ccp Perform sparse conditional constant propagation (CCP) on trees. This diff --
RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
Hello, This is new patch version. Lowering is applied only for bit-fields copy sequences that are merged. Data structure representing bit-field copy sequences is renamed and reduced in size. Optimization turned on by default for -O2 and higher. Some comments fixed. Benchmarking performed on WebKit for Android. Code size reduction noticed on several files, best examples are: core/rendering/style/StyleMultiColData (632->520 bytes) core/platform/graphics/FontDescription (1715->1475 bytes) core/rendering/style/FillLayer (5069->4513 bytes) core/rendering/style/StyleRareInheritedData (5618->5346) core/css/CSSSelectorList(4047->3887) core/platform/animation/CSSAnimationData (3844->3440 bytes) core/css/resolver/FontBuilder (13818->13350 bytes) core/platform/graphics/Font (16447->15975 bytes) Example: One of the motivating examples for this work was copy constructor of the class which contains bit-fields. C++ code: class A { public: A(const A &x); unsigned a : 1; unsigned b : 2; unsigned c : 4; }; A::A(const A&x) { a = x.a; b = x.b; c = x.c; } GIMPLE code without optimization: : _3 = x_2(D)->a; this_4(D)->a = _3; _6 = x_2(D)->b; this_4(D)->b = _6; _8 = x_2(D)->c; this_4(D)->c = _8; return; Optimized GIMPLE code: : _10 = x_2(D)->D.1867; _11 = BIT_FIELD_REF <_10, 7, 0>; _12 = this_4(D)->D.1867; _13 = _12 & 128; _14 = (unsigned char) _11; _15 = _13 | _14; this_4(D)->D.1867 = _15; return; Generated MIPS32r2 assembly code without optimization: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x1 andi$2,$2,0xfe or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0xf9 andi$3,$3,0x6 or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0x87 andi$3,$3,0x78 or $2,$2,$3 j $31 sb $2,0($4) Optimized MIPS32r2 assembly code: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x7f andi$2,$2,0x80 or $2,$3,$2 j $31 sb $2,0($4) Algorithm works on basic block level and consists of following 3 major steps: 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Lower bit-field accesses by using new field size for those that can be merged. New command line option "-fmerge-bitfields" is introduced. Tested - passed gcc regression tests for MIPS32r2. Changelog - gcc/ChangeLog: 2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com) * common.opt (fmerge-bitfields): New option. * doc/invoke.texi: Add reference to "-fmerge-bitfields". * tree-sra.c (lower_bitfields): New function. Entry for (-fmerge-bitfields). (bf_access_candidate_p): New function. (lower_bitfield_read): New function. (lower_bitfield_write): New function. (bitfield_stmt_bfcopy_pair::hash): New function. (bitfield_stmt_bfcopy_pair::equal): New function. (bitfield_stmt_bfcopy_pair::remove): New function. (create_and_insert_bfcopy): New function. (get_bit_offset): New function. (add_stmt_bfcopy_pair): New function. (cmp_bfcopies): New function. (get_merged_bit_field_size): New function. * dwarf2out.c (simple_type_size_in_bits): Move to tree.c. (field_byte_offset): Move declaration to tree.h and make it extern. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h. * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c. (simple_type_size_in_bits): Move from dwarf2out.c. * tree.h (expressions_equal_p): Add declaration. (field_byte_offset): Add declaration. Patch - diff --git a/gcc/common.opt b/gcc/common.opt index da275e5..52c7f58 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2203,6 +2203,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +fmerge-bitfields +Common Report Var(flag_tree_bitfield_merge) Optimization +Merge loads and stores of consecutive bitfields + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary expressions in the SSA->normal pass diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi index 3fdfeb9..546638e 100644 --- a/gcc/doc/invoke.texi +++ b/gcc/doc/invoke.texi @@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}. -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol -fstack-protector-all -fsta
RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
Hello, My apologies for inconvenience. Removed every appearance of -ftree-bitfield-merge from the patch and fixed an issue with unions. The rest of the patch is the same as before. Regards, Zoran Jovanovic -- Lowering is applied only for bit-fields copy sequences that are merged. Data structure representing bit-field copy sequences is renamed and reduced in size. Optimization turned on by default for -O2 and higher. Some comments fixed. Benchmarking performed on WebKit for Android. Code size reduction noticed on several files, best examples are: core/rendering/style/StyleMultiColData (632->520 bytes) core/platform/graphics/FontDescription (1715->1475 bytes) core/rendering/style/FillLayer (5069->4513 bytes) core/rendering/style/StyleRareInheritedData (5618->5346) core/css/CSSSelectorList(4047->3887) core/platform/animation/CSSAnimationData (3844->3440 bytes) core/css/resolver/FontBuilder (13818->13350 bytes) core/platform/graphics/Font (16447->15975 bytes) Example: One of the motivating examples for this work was copy constructor of the class which contains bit-fields. C++ code: class A { public: A(const A &x); unsigned a : 1; unsigned b : 2; unsigned c : 4; }; A::A(const A&x) { a = x.a; b = x.b; c = x.c; } GIMPLE code without optimization: : _3 = x_2(D)->a; this_4(D)->a = _3; _6 = x_2(D)->b; this_4(D)->b = _6; _8 = x_2(D)->c; this_4(D)->c = _8; return; Optimized GIMPLE code: : _10 = x_2(D)->D.1867; _11 = BIT_FIELD_REF <_10, 7, 0>; _12 = this_4(D)->D.1867; _13 = _12 & 128; _14 = (unsigned char) _11; _15 = _13 | _14; this_4(D)->D.1867 = _15; return; Generated MIPS32r2 assembly code without optimization: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x1 andi$2,$2,0xfe or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0xf9 andi$3,$3,0x6 or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0x87 andi$3,$3,0x78 or $2,$2,$3 j $31 sb $2,0($4) Optimized MIPS32r2 assembly code: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x7f andi$2,$2,0x80 or $2,$3,$2 j $31 sb $2,0($4) Algorithm works on basic block level and consists of following 3 major steps: 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Lower bit-field accesses by using new field size for those that can be merged. New command line option "-fmerge-bitfields" is introduced. Tested - passed gcc regression tests for MIPS32r2. Changelog - gcc/ChangeLog: 2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com) * common.opt (fmerge-bitfields): New option. * doc/invoke.texi: Add reference to "-fmerge-bitfields". * tree-sra.c (lower_bitfields): New function. Entry for (-fmerge-bitfields). (part_of_union_p): New function. (bf_access_candidate_p): New function. (lower_bitfield_read): New function. (lower_bitfield_write): New function. (bitfield_stmt_bfcopy_pair::hash): New function. (bitfield_stmt_bfcopy_pair::equal): New function. (bitfield_stmt_bfcopy_pair::remove): New function. (create_and_insert_bfcopy): New function. (get_bit_offset): New function. (add_stmt_bfcopy_pair): New function. (cmp_bfcopies): New function. (get_merged_bit_field_size): New function. * dwarf2out.c (simple_type_size_in_bits): Move to tree.c. (field_byte_offset): Move declaration to tree.h and make it extern. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h. * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c. (simple_type_size_in_bits): Move from dwarf2out.c. * tree.h (expressions_equal_p): Add declaration. (field_byte_offset): Add declaration. Patch - diff --git a/gcc/common.opt b/gcc/common.opt index da275e5..52c7f58 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2203,6 +2203,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +fmerge-bitfields +Common Report Var(flag_tree_bitfield_merge) Optimization +Merge loads and stores of consecutive bitfields + ftree-ter Common Report Var(flag_tree_ter) Optimization Replace temporary e
RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
Hello, Unfortunately, optimization is limited only to bit-fields that have same bit-field representative (DECL_BIT_FIELD_REPRESENTATIVE), and fields from different classes do have different representatives. In given example optimization would merge accesses to x and y bit-fields from Base class, but not the access to z from Der class. Regards, Zoran From: Daniel Gutson [daniel.gut...@tallertechnologies.com] Sent: Wednesday, April 16, 2014 4:16 PM To: Zoran Jovanovic Cc: Bernhard Reutner-Fischer; Richard Biener; gcc-patches@gcc.gnu.org Subject: Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside) On Wed, Apr 16, 2014 at 8:38 AM, Zoran Jovanovic wrote: > Hello, > This is new patch version. > Lowering is applied only for bit-fields copy sequences that are merged. > Data structure representing bit-field copy sequences is renamed and reduced > in size. > Optimization turned on by default for -O2 and higher. > Some comments fixed. > > Benchmarking performed on WebKit for Android. > Code size reduction noticed on several files, best examples are: > > core/rendering/style/StyleMultiColData (632->520 bytes) > core/platform/graphics/FontDescription (1715->1475 bytes) > core/rendering/style/FillLayer (5069->4513 bytes) > core/rendering/style/StyleRareInheritedData (5618->5346) > core/css/CSSSelectorList(4047->3887) > core/platform/animation/CSSAnimationData (3844->3440 bytes) > core/css/resolver/FontBuilder (13818->13350 bytes) > core/platform/graphics/Font (16447->15975 bytes) > > > Example: > > One of the motivating examples for this work was copy constructor of the > class which contains bit-fields. > > C++ code: > class A > { > public: > A(const A &x); > unsigned a : 1; > unsigned b : 2; > unsigned c : 4; > }; > > A::A(const A&x) > { > a = x.a; > b = x.b; > c = x.c; > } Very interesting. Does this work with inheritance too? E.g. struct Base { uint32_t x:1; uint32_t y:3; Base(const Base& other) { x = other.x; y = other.y; } }; struct Der : Base { Der() = default; Der(const Der& other) : Base(other) { z = other.z; } uint32_t z:9; }; > > GIMPLE code without optimization: > > : > _3 = x_2(D)->a; > this_4(D)->a = _3; > _6 = x_2(D)->b; > this_4(D)->b = _6; > _8 = x_2(D)->c; > this_4(D)->c = _8; > return; > > Optimized GIMPLE code: > : > _10 = x_2(D)->D.1867; > _11 = BIT_FIELD_REF <_10, 7, 0>; > _12 = this_4(D)->D.1867; > _13 = _12 & 128; > _14 = (unsigned char) _11; > _15 = _13 | _14; > this_4(D)->D.1867 = _15; > return; > > Generated MIPS32r2 assembly code without optimization: > lw $3,0($5) > lbu $2,0($4) > andi$3,$3,0x1 > andi$2,$2,0xfe > or $2,$2,$3 > sb $2,0($4) > lw $3,0($5) > andi$2,$2,0xf9 > andi$3,$3,0x6 > or $2,$2,$3 > sb $2,0($4) > lw $3,0($5) > andi$2,$2,0x87 > andi$3,$3,0x78 > or $2,$2,$3 > j $31 > sb $2,0($4) > > Optimized MIPS32r2 assembly code: > lw $3,0($5) > lbu $2,0($4) > andi$3,$3,0x7f > andi$2,$2,0x80 > or $2,$3,$2 > j $31 > sb $2,0($4) > > > Algorithm works on basic block level and consists of following 3 major steps: > 1. Go through basic block statements list. If there are statement pairs that > implement copy of bit field content from one memory location to another > record statements pointers and other necessary data in corresponding data > structure. > 2. Identify records that represent adjacent bit field accesses and mark them > as merged. > 3. Lower bit-field accesses by using new field size for those that can be > merged. > > > New command line option "-fmerge-bitfields" is introduced. > > > Tested - passed gcc regression tests for MIPS32r2. > > > Changelog - > > gcc/ChangeLog: > 2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com) > * common.opt (fmerge-bitfields): New option. > * doc/invoke.texi: Add reference to "-fmerge-bitfields". > * tree-sra.c (lower_bitfields): New function. > Entry for (-fmerge-bitfields). > (bf_access_candidate_p): New function. > (lower_bitfield_read): New function. > (lower_bitfield_write): New function. > (bitfield_stmt_bfcopy_pair::hash): New function. > (bitfield_stmt
RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)
Hello, Updated doc/invoke.texi by stating that new option is enabled by default at -O2 and higher. Also, -fmerge-bitfields added to the list of optimization flags enabled by default at -O2 and higher. Regards, Zoran Jovanovic -- Lowering is applied only for bit-fields copy sequences that are merged. Data structure representing bit-field copy sequences is renamed and reduced in size. Optimization turned on by default for -O2 and higher. Some comments fixed. Benchmarking performed on WebKit for Android. Code size reduction noticed on several files, best examples are: core/rendering/style/StyleMultiColData (632->520 bytes) core/platform/graphics/FontDescription (1715->1475 bytes) core/rendering/style/FillLayer (5069->4513 bytes) core/rendering/style/StyleRareInheritedData (5618->5346) core/css/CSSSelectorList(4047->3887) core/platform/animation/CSSAnimationData (3844->3440 bytes) core/css/resolver/FontBuilder (13818->13350 bytes) core/platform/graphics/Font (16447->15975 bytes) Example: One of the motivating examples for this work was copy constructor of the class which contains bit-fields. C++ code: class A { public: A(const A &x); unsigned a : 1; unsigned b : 2; unsigned c : 4; }; A::A(const A&x) { a = x.a; b = x.b; c = x.c; } GIMPLE code without optimization: : _3 = x_2(D)->a; this_4(D)->a = _3; _6 = x_2(D)->b; this_4(D)->b = _6; _8 = x_2(D)->c; this_4(D)->c = _8; return; Optimized GIMPLE code: : _10 = x_2(D)->D.1867; _11 = BIT_FIELD_REF <_10, 7, 0>; _12 = this_4(D)->D.1867; _13 = _12 & 128; _14 = (unsigned char) _11; _15 = _13 | _14; this_4(D)->D.1867 = _15; return; Generated MIPS32r2 assembly code without optimization: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x1 andi$2,$2,0xfe or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0xf9 andi$3,$3,0x6 or $2,$2,$3 sb $2,0($4) lw $3,0($5) andi$2,$2,0x87 andi$3,$3,0x78 or $2,$2,$3 j $31 sb $2,0($4) Optimized MIPS32r2 assembly code: lw $3,0($5) lbu $2,0($4) andi$3,$3,0x7f andi$2,$2,0x80 or $2,$3,$2 j $31 sb $2,0($4) Algorithm works on basic block level and consists of following 3 major steps: 1. Go through basic block statements list. If there are statement pairs that implement copy of bit field content from one memory location to another record statements pointers and other necessary data in corresponding data structure. 2. Identify records that represent adjacent bit field accesses and mark them as merged. 3. Lower bit-field accesses by using new field size for those that can be merged. New command line option "-fmerge-bitfields" is introduced. Tested - passed gcc regression tests for MIPS32r2. Changelog - gcc/ChangeLog: 2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com) * common.opt (fmerge-bitfields): New option. * doc/invoke.texi: Add reference to "-fmerge-bitfields". * doc/invoke.texi: Add "-fmerge-bitfields" to the list of optimization flags turned on at -O2. * tree-sra.c (lower_bitfields): New function. Entry for (-fmerge-bitfields). (part_of_union_p): New function. (bf_access_candidate_p): New function. (lower_bitfield_read): New function. (lower_bitfield_write): New function. (bitfield_stmt_bfcopy_pair::hash): New function. (bitfield_stmt_bfcopy_pair::equal): New function. (bitfield_stmt_bfcopy_pair::remove): New function. (create_and_insert_bfcopy): New function. (get_bit_offset): New function. (add_stmt_bfcopy_pair): New function. (cmp_bfcopies): New function. (get_merged_bit_field_size): New function. * dwarf2out.c (simple_type_size_in_bits): Move to tree.c. (field_byte_offset): Move declaration to tree.h and make it extern. * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test. * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test. * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c. * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h. * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c. (simple_type_size_in_bits): Move from dwarf2out.c. * tree.h (expressions_equal_p): Add declaration. (field_byte_offset): Add declaration. Patch - diff --git a/gcc/common.opt b/gcc/common.opt index da275e5..52c7f58 100644 --- a/gcc/common.opt +++ b/gcc/common.opt @@ -2203,6 +2203,10 @@ ftree-sra Common Report Var(flag_tree_sra) Optimization Perform scalar replacement of aggregates +fmerge-bitfields +Common Report Var(flag_tree_bitfield_m