RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-08-23 Thread Zoran Jovanovic
Hello,
This is new patch version. 
Optimization does not use BIT_FIELD_REF any more, instead it generates new 
COMPONENT_REF and FIELD_DECL.
Existing Bit field representative is associated with newly created field 
declaration.
During analysis phase optimization uses bit field representatives when deciding 
which bit-fields accesses can be merged.
Instead of having separate pass optimization is moved to tree-sra.c and 
executed with sra early.
New test case involving unions is added.
Also, some other comments received on first patch are applied in new 
implementation.


Example:

Original code:
   D.1351;
   D.1350;
   D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
   D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;
  
Algorithm works on basic block level and consists of following 3 major steps:
1. Go trough basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Modify trees accordingly.

New command line option "-ftree-bitfield-merge" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-08-22 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in : Added tree-sra.c to GTFILES.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
  * tree-sra.c (ssa_bitfield_merge): New function.
  Entry for (-ftree-bitfield-merge).
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (cmp_access): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 6034046..dad9337 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3831,6 +3831,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/vtable-verify.c \
   $(srcdir)/asan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 9082280..fe0ecd9 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2160,6 +2160,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dae7605..7abe538 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7646,6 +7646,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
 
+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index fc1c3f2..f3530ef 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3104,8 +3104,6 @@ static HOST_WIDE_INT ceili

RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-09-24 Thread Zoran Jovanovic
Hello,
This is new patch version. 
Comments from Bernhard Reutner-Fischer review applied.
Also, test case bitfildmrg2.c modified - it is now execute test.


Example:

Original code:
   D.1351;
   D.1350;
   D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
   D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;
  
Algorithm works on basic block level and consists of following 3 major steps:
1. Go trough basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Modify trees accordingly.

New command line option "-ftree-bitfield-merge" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in : Added tree-sra.c to GTFILES.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
  * tree-sra.c (ssa_bitfield_merge): New function.
  Entry for (-ftree-bitfield-merge).
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (cmp_access): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2e3f7a..54aa8e7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 202e169..afac514 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2164,6 +2164,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa0f4ed..e588cae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7679,6 +7679,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
 
+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 95049e4..e74db17 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned 
int);
 static tree field_type (const_tree);
 static unsigned int simple_type_align_in_bits (const_tree);
 static unsigned int simple_decl_align_in_bits (const_tree);
-static unsigned HOST_WIDE_INT simple_type_size_in_bits (const_tree);
-static HOST_WIDE_INT field_byte_offset (const_tree);
 static void add_AT_location_description(dw_die_ref, enum 
dwarf_attribute,

[PING] [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-10-04 Thread Zoran Jovanovic
Just to ping this patch.

http://gcc.gnu.org/ml/gcc-patches/2013-09/msg01829.html

Regards,
Zoran Jovanovic

From: Zoran Jovanovic
Sent: Tuesday, September 24, 2013 11:59 PM
To: gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc 
inside)

Hello,
This is new patch version.
Comments from Bernhard Reutner-Fischer review applied.
Also, test case bitfildmrg2.c modified - it is now execute test.


Example:

Original code:
   D.1351;
   D.1350;
   D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
   D.1358;
  _16 = pr1_2(D)->_field0;
  pr2_4(D)->_field0 = _16;

Algorithm works on basic block level and consists of following 3 major steps:
1. Go trough basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Modify trees accordingly.

New command line option "-ftree-bitfield-merge" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-09-24 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in : Added tree-sra.c to GTFILES.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
  * tree-sra.c (ssa_bitfield_merge): New function.
  Entry for (-ftree-bitfield-merge).
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (cmp_access): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index a2e3f7a..54aa8e7 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -3847,6 +3847,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/asan.c \
   $(srcdir)/ubsan.c \
   $(srcdir)/tsan.c $(srcdir)/ipa-devirt.c \
+  $(srcdir)/tree-sra.c \
   @all_gtfiles@

 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 202e169..afac514 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2164,6 +2164,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates

+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit-field merge on trees
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index aa0f4ed..e588cae 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -412,7 +412,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7679,6 +7679,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.

+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --git a/gcc/dwarf2out.c b/gcc/dwarf2out.c
index 95049e4..e74db17 100644
--- a/gcc/dwarf2out.c
+++ b/gcc/dwarf2out.c
@@ -3108,8 +3108,6 @@ static HOST_WIDE_INT ceiling (HOST_WIDE_INT, unsigned 
int);
 static tree field_type (const_tree);
 

RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

2014-10-29 Thread Zoran Jovanovic
Hello,
This is new patch version in which reported issue is fixed.
Also, patch is rebased to the revision 216452 and some minor code clean-up is 
done.

--

Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in 
size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)


Example:

One of the motivating examples for this work was copy constructor of the class 
which contains bit-fields.

C++ code:
class A
{
public:
A(const A &x);
unsigned a : 1;
unsigned b : 2;
unsigned c : 4;
};

A::A(const A&x)
{
a = x.a;
b = x.b;
c = x.c;
}

GIMPLE code without optimization:

  :
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:
  :
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
 lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x1
andi$2,$2,0xfe
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0xf9
andi$3,$3,0x6
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0x87
andi$3,$3,0x78
or  $2,$2,$3
j   $31
sb  $2,0($4)

Optimized MIPS32r2 assembly code:
lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x7f
andi$2,$2,0x80
or  $2,$3,$2
j   $31
sb  $2,0($4)


Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Lower bit-field accesses by using new field size for those that can be 
merged.


New command line option "-fmerge-bitfields" is introduced.


Tested - passed gcc regression tests for MIPS32r2.


Changelog -

gcc/ChangeLog:
2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to "-fmerge-bitfields".
  * doc/invoke.texi: Add "-fmerge-bitfields" to the list of optimization
flags turned on at -O2.
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (part_of_union_p): New function.
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -


diff --git a/gcc/common.opt b/gcc/common.opt
index 5db5e1e..cec145c 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2270,6 +2270,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common

[PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-07-17 Thread Zoran Jovanovic
Hello,
This patch adds new optimization pass that combines several adjacent bit field 
accesses that copy values from one memory location to another into single bit 
field access.

Example:

Original code:
   D.1351;
   D.1350;
   D.1349;
  D.1349_2 = p1_1(D)->f1;
  p2_3(D)->f1 = D.1349_2;
  D.1350_4 = p1_1(D)->f2;
  p2_3(D)->f2 = D.1350_4;
  D.1351_5 = p1_1(D)->f3;
  p2_3(D)->f3 = D.1351_5;

Optimized code:
   D.1358;
  D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>;
  BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10;

Algorithm works on basic block level and consists of following 3 major steps:
1. Go trough basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Modify trees accordingly.

New command line option "-ftree-bitfield-merge" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2013-07-17 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * Makefile.in : Added tree-ssa-bitfield-merge.o to OBJS.
  * common.opt (ftree-bitfield-merge): New option.
  * doc/invoke.texi: Added reference to "-ftree-bitfield-merge".
  * dwarf2out.c (field_type): static removed from declaration.
  (simple_type_size_in_bits): static removed from declaration.
  (field_byte_offset): static removed from declaration.
  (field_type): static inline removed from declaration.
  * passes.c (init_optimization_passes): pass_bitfield_merge pass added.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg.c: New test.
  * timevar.def : Added TV_TREE_BITFIELD_MERGE.
  * tree-pass.h : Added pass_bitfield_merge declaration.
  * tree-ssa-bitfield-merge.c : New file.

Patch -

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index d5121f3..5cdd6eb 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1417,6 +1417,7 @@ OBJS = \
    tree-ssa-dom.o \
    tree-ssa-dse.o \
    tree-ssa-forwprop.o \
+   tree-ssa-bitfield-merge.o \
    tree-ssa-ifcombine.o \
    tree-ssa-live.o \
    tree-ssa-loop-ch.o \
@@ -2312,6 +2313,11 @@ tree-ssa-forwprop.o : tree-ssa-forwprop.c $(CONFIG_H) 
$(SYSTEM_H) coretypes.h \
    $(TREE_FLOW_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \
    langhooks.h $(FLAGS_H) $(GIMPLE_H) $(GIMPLE_PRETTY_PRINT_H) $(EXPR_H) \
    $(OPTABS_H) tree-ssa-propagate.h
+tree-ssa-bitfield-merge.o : tree-ssa-forwprop.c $(CONFIG_H) $(SYSTEM_H) \
+   coretypes.h $(TM_H) $(TREE_H) $(TM_P_H) $(BASIC_BLOCK_H) \
+   $(TREE_FLOW_H) $(TREE_PASS_H) $(TREE_DUMP_H) $(DIAGNOSTIC_H) $(TIMEVAR_H) \
+   langhooks.h $(FLAGS_H) $(GIMPLE_H) tree-pretty-print.h \
+   gimple-pretty-print.h $(EXPR_H)
 tree-ssa-phiprop.o : tree-ssa-phiprop.c $(CONFIG_H) $(SYSTEM_H) coretypes.h \
    $(TM_H) $(TREE_H) $(TM_P_H) $(BASIC_BLOCK_H) \
    $(TREE_FLOW_H) $(TREE_PASS_H) $(DIAGNOSTIC_H) \
@@ -3803,6 +3809,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/ipa-inline.h \
   $(srcdir)/asan.c \
   $(srcdir)/tsan.c \
+  $(srcdir)/tree-ssa-bitfield-merge.c \
   @all_gtfiles@
 
 # Compute the list of GT header files from the corresponding C sources,
diff --git a/gcc/common.opt b/gcc/common.opt
index 4c7933e..e0dbc37 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2088,6 +2088,10 @@ ftree-forwprop
 Common Report Var(flag_tree_forwprop) Init(1) Optimization
 Enable forward propagation on trees
 
+ftree-bitfield-merge
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Enable bit field merge on trees
+
 ftree-fre
 Common Report Var(flag_tree_fre) Optimization
 Enable Full Redundancy Elimination (FRE) on trees
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index dd82880..7b671aa 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -409,7 +409,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-ftree-bitfield-merge -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7582,6 +7582,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
 
+@item -ftree-bitfield-merge
+@opindex ftree-bitfield-merge
+Combines several adjacent bit field accesses that copy values
+from one memory location to another into single bit field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation 

RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2013-07-30 Thread Zoran Jovanovic
Thank you for the reply.
I am in the process of modifying the patch according to some comments received.
Currently I am considering the usage of DECL_BIT_FIELD_REPRESENTATIVE.
I see that they can be used during analysis phase for deciding which accesses 
can be merged - only accesses with same representative will be merged.
I have more dilemmas with usage of representatives for lowering. If my 
understanding is correct bit field representative can only be associated to a 
field declaration, and not to a BIT_FIELD_REF node. As a consequence 
optimization must use COMPONENT_REF to model new bit field access (which should 
be an equivalent of several merged accesses). To use COMPONENT_REF a new field 
declaration with appropriate bit size (equal to sum of bit sizes of all merged 
bit field accesses) must be created and then corresponding bit field 
representative could be attached.
Is my understanding correct? Is creating a new field declaration for every set 
of merged bit field accesses acceptable?

Regards,
Zoran Jovanovic


From: Richard Biener [richard.guent...@gmail.com]
Sent: Thursday, July 18, 2013 11:31 AM
To: Zoran Jovanovic; gcc-patches@gcc.gnu.org
Cc: Petar Jovanovic
Subject: Re: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc 
inside)

Zoran Jovanovic  wrote:

>Hello,
>This patch adds new optimization pass that combines several adjacent
>bit field accesses that copy values from one memory location to another
>into single bit field access.
>
>Example:
>
>Original code:
>   D.1351;
>   D.1350;
>   D.1349;
>  D.1349_2 = p1_1(D)->f1;
>  p2_3(D)->f1 = D.1349_2;
>  D.1350_4 = p1_1(D)->f2;
>  p2_3(D)->f2 = D.1350_4;
>  D.1351_5 = p1_1(D)->f3;
>  p2_3(D)->f3 = D.1351_5;
>
>Optimized code:
>   D.1358;
>  D.1358_10 = BIT_FIELD_REF <*p1_1(D), 19, 13>;
>  BIT_FIELD_REF <*p2_3(D), 19, 13> = D.1358_10;
>
>Algorithm works on basic block level and consists of following 3 major
>steps:
>1. Go trough basic block statements list. If there are statement pairs
>that implement copy of bit field content from one memory location to
>another record statements pointers and other necessary data in
>corresponding data structure.
>2. Identify records that represent adjacent bit field accesses and mark
>them as merged.
>3. Modify trees accordingly.

All this should use BITFIELD_REPRESENTATIVE both to decide what accesses are 
related and for the lowering. This makes sure to honor the appropriate memory 
models.

In theory only lowering is necessary and FRE and DSE will do the job of 
optimizing - also properly accounting for alias issues that Joseph mentions. 
The lowering and analysis is strongly related to SRA So I don't believe we want 
a new pass for this.

Richard.




RE: [PATCH] Add a new option "-ftree-bitfield-merge" (patch / doc inside)

2014-03-09 Thread Zoran Jovanovic
Hello,
This is new patch version. 
Approach suggested by Richard Biener with lowering bit-field accesses instead 
of modifying gimple trees is implemented.
Although, algorithm still detects adjacent bit field accesses, which copy 
values from one to another bit-field of same type.
If those accesses are detected field size used during lowering is equal to sum 
of sizes of all adjacent fields that can be merged. 
Idea is to let dse and cse to remove unnecessary instructions afterwards.
I wanted to preserve this behavior because it was one of original goals of this 
work.


Example:

Original code:
  :
  _2 = pr1.f1;
  pr2.f1 = _2;
  _4 = pr1.f2;
  pr2.f2 = _4;
  _6 = pr1.f3;
  pr2.f3 = _6;
  return;


Optimized code:
  :
  _8 = pr1.D.1364;
  _9 = BIT_FIELD_REF <_8, 13, 0>;
  _10 = pr2.D.1364;
  _11 = _10 & 4294959104;
  _12 = (unsigned int) _9;
  _13 = _11 | _12;
  pr2.D.1364 = _13;
  _14 = pr1.D.1364;
  _15 = BIT_FIELD_REF <_14, 13, 0>;
  _16 = pr2.D.1364;
  _17 = _16 & 4294959104;
  _18 = (unsigned int) _15;
  _19 = _17 | _18;
  pr2.D.1364 = _19;
  _20 = pr1.D.1364;
  _21 = BIT_FIELD_REF <_20, 13, 0>;
  _22 = pr2.D.1364;
  _23 = _22 & 4294959104;
  _24 = (unsigned int) _21;
  _25 = _23 | _24;
  pr2.D.1364 = _25;
  return;
  

Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Lower bit-field accesses by using new field size for those that can be 
merged.


New command line option "-fmerge-bitfields" is introduced.

Tested - passed gcc regression tests.

Changelog -

gcc/ChangeLog:
2014-03-09 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Added reference to "-fmerge-bitfields".
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (bfaccess::hash): New function.
  (bfaccess::equal): New function.
  (bfaccess::remove): New function.
  (bitfield_access_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_access_pair_htab_hash): New function.
  (bitfield_stmt_access_pair_htab_eq): New function.
  (create_and_insert_access): New function.
  (get_bit_offset): New function.
  (get_merged_bit_field_size): New function.
  (add_stmt_access_pair): New function.
  (cmp_access): New function.
  * dwarf2out.c (simple_type_size_in_bits): moved to tree.c.
  (field_byte_offset): declaration moved to tree.h, static removed.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): moved to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): declaration moved to tree.h.
  * tree.c (expressions_equal_p): moved from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): moved from dwarf2out.c.
  * tree.h (expressions_equal_p): declaration added.
  (field_byte_offset): declaration added.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index 661516d..3331d03 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2193,6 +2193,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Init(0) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 24bd76e..54bae56 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fstack-protector-strong -fstrict-aliasing @gol
 -fstrict-overflow -fthread-jumps -ftracer -ftree-bit-ccp @gol
--ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
+-fmerge-bitfields -ftree-builtin-call-dce -ftree-ccp -ftree-ch @gol
 -ftree-coalesce-inline-vars -ftree-coalesce-vars -ftree-copy-prop @gol
 -ftree-copyrename -ftree-dce -ftree-dominator-opts -ftree-dse @gol
 -ftree-forwprop -ftree-fre -ftree-loop-if-convert @gol
@@ -7807,6 +7807,11 @@ pointer alignment information.
 This pass only operates on local scalar variables and is enabled by default
 at @option{-O} and higher.  It requires that @option{-ftree-ccp} is enabled.
 
+@item -fbitfield-merge
+@opindex fmerge-bitfields
+Combines several adjacent bit-field accesses that copy values
+from one memory location to another into one single bit-field access.
+
 @item -ftree-ccp
 @opindex ftree-ccp
 Perform sparse conditional constant propagation (CCP) on trees.  This
diff --

RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

2014-04-16 Thread Zoran Jovanovic
Hello,
This is new patch version. 
Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in 
size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)


Example:

One of the motivating examples for this work was copy constructor of the class 
which contains bit-fields.

C++ code:
class A
{
public:
A(const A &x);
unsigned a : 1;
unsigned b : 2;
unsigned c : 4;
};

A::A(const A&x)
{
a = x.a;
b = x.b;
c = x.c;
}

GIMPLE code without optimization:

  :
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:
  :
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
 lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x1
andi$2,$2,0xfe
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0xf9
andi$3,$3,0x6
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0x87
andi$3,$3,0x78
or  $2,$2,$3
j   $31
sb  $2,0($4)

Optimized MIPS32r2 assembly code:
lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x7f
andi$2,$2,0x80
or  $2,$3,$2
j   $31
sb  $2,0($4)


Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Lower bit-field accesses by using new field size for those that can be 
merged.


New command line option "-fmerge-bitfields" is introduced.


Tested - passed gcc regression tests for MIPS32r2.


Changelog -

gcc/ChangeLog:
2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to "-fmerge-bitfields".
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary expressions in the SSA->normal pass
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 3fdfeb9..546638e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -411,7 +411,7 @@ Objective-C and Objective-C++ Dialects}.
 -fsplit-ivs-in-unroller -fsplit-wide-types -fstack-protector @gol
 -fstack-protector-all -fsta

RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

2014-04-17 Thread Zoran Jovanovic
Hello,
My apologies for inconvenience.
Removed every appearance of -ftree-bitfield-merge from the patch and fixed an 
issue with unions.
The rest of the patch is the same as before.

Regards,
Zoran Jovanovic

--

Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in 
size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)


Example:

One of the motivating examples for this work was copy constructor of the class 
which contains bit-fields.

C++ code:
class A
{
public:
A(const A &x);
unsigned a : 1;
unsigned b : 2;
unsigned c : 4;
};

A::A(const A&x)
{
a = x.a;
b = x.b;
c = x.c;
}

GIMPLE code without optimization:

  :
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:
  :
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
 lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x1
andi$2,$2,0xfe
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0xf9
andi$3,$3,0x6
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0x87
andi$3,$3,0x78
or  $2,$2,$3
j   $31
sb  $2,0($4)

Optimized MIPS32r2 assembly code:
lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x7f
andi$2,$2,0x80
or  $2,$3,$2
j   $31
sb  $2,0($4)


Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Lower bit-field accesses by using new field size for those that can be 
merged.


New command line option "-fmerge-bitfields" is introduced.


Tested - passed gcc regression tests for MIPS32r2.


Changelog -

gcc/ChangeLog:
2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to "-fmerge-bitfields".
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (part_of_union_p): New function.
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_merge) Optimization
+Merge loads and stores of consecutive bitfields
+
 ftree-ter
 Common Report Var(flag_tree_ter) Optimization
 Replace temporary e

RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

2014-04-17 Thread Zoran Jovanovic
Hello,
Unfortunately, optimization is limited only to bit-fields that have same 
bit-field representative (DECL_BIT_FIELD_REPRESENTATIVE), and fields from 
different classes do have different representatives.
In given example optimization would merge accesses to x and y bit-fields from 
Base class, but not the access to z from Der class.

Regards,
Zoran
  

From: Daniel Gutson [daniel.gut...@tallertechnologies.com]
Sent: Wednesday, April 16, 2014 4:16 PM
To: Zoran Jovanovic
Cc: Bernhard Reutner-Fischer; Richard Biener; gcc-patches@gcc.gnu.org
Subject: Re: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

On Wed, Apr 16, 2014 at 8:38 AM, Zoran Jovanovic
 wrote:
> Hello,
> This is new patch version.
> Lowering is applied only for bit-fields copy sequences that are merged.
> Data structure representing bit-field copy sequences is renamed and reduced 
> in size.
> Optimization turned on by default for -O2 and higher.
> Some comments fixed.
>
> Benchmarking performed on WebKit for Android.
> Code size reduction noticed on several files, best examples are:
>
> core/rendering/style/StyleMultiColData (632->520 bytes)
> core/platform/graphics/FontDescription (1715->1475 bytes)
> core/rendering/style/FillLayer (5069->4513 bytes)
> core/rendering/style/StyleRareInheritedData (5618->5346)
> core/css/CSSSelectorList(4047->3887)
> core/platform/animation/CSSAnimationData (3844->3440 bytes)
> core/css/resolver/FontBuilder (13818->13350 bytes)
> core/platform/graphics/Font (16447->15975 bytes)
>
>
> Example:
>
> One of the motivating examples for this work was copy constructor of the 
> class which contains bit-fields.
>
> C++ code:
> class A
> {
> public:
> A(const A &x);
> unsigned a : 1;
> unsigned b : 2;
> unsigned c : 4;
> };
>
> A::A(const A&x)
> {
> a = x.a;
> b = x.b;
> c = x.c;
> }

Very interesting.

Does this work with inheritance too? E.g.

struct Base
{
uint32_t x:1;
uint32_t y:3;

Base(const Base& other) { x = other.x; y = other.y; }
};

struct Der : Base
{
Der() = default;
Der(const Der& other) : Base(other)
{ z = other.z; }

uint32_t z:9;
};


>
> GIMPLE code without optimization:
>
>   :
>   _3 = x_2(D)->a;
>   this_4(D)->a = _3;
>   _6 = x_2(D)->b;
>   this_4(D)->b = _6;
>   _8 = x_2(D)->c;
>   this_4(D)->c = _8;
>   return;
>
> Optimized GIMPLE code:
>   :
>   _10 = x_2(D)->D.1867;
>   _11 = BIT_FIELD_REF <_10, 7, 0>;
>   _12 = this_4(D)->D.1867;
>   _13 = _12 & 128;
>   _14 = (unsigned char) _11;
>   _15 = _13 | _14;
>   this_4(D)->D.1867 = _15;
>   return;
>
> Generated MIPS32r2 assembly code without optimization:
>  lw  $3,0($5)
> lbu $2,0($4)
> andi$3,$3,0x1
> andi$2,$2,0xfe
> or  $2,$2,$3
> sb  $2,0($4)
> lw  $3,0($5)
> andi$2,$2,0xf9
> andi$3,$3,0x6
> or  $2,$2,$3
> sb  $2,0($4)
> lw  $3,0($5)
> andi$2,$2,0x87
> andi$3,$3,0x78
> or  $2,$2,$3
> j   $31
> sb  $2,0($4)
>
> Optimized MIPS32r2 assembly code:
> lw  $3,0($5)
> lbu $2,0($4)
> andi$3,$3,0x7f
> andi$2,$2,0x80
> or  $2,$3,$2
> j   $31
> sb  $2,0($4)
>
>
> Algorithm works on basic block level and consists of following 3 major steps:
> 1. Go through basic block statements list. If there are statement pairs that 
> implement copy of bit field content from one memory location to another 
> record statements pointers and other necessary data in corresponding data 
> structure.
> 2. Identify records that represent adjacent bit field accesses and mark them 
> as merged.
> 3. Lower bit-field accesses by using new field size for those that can be 
> merged.
>
>
> New command line option "-fmerge-bitfields" is introduced.
>
>
> Tested - passed gcc regression tests for MIPS32r2.
>
>
> Changelog -
>
> gcc/ChangeLog:
> 2014-04-16 Zoran Jovanovic (zoran.jovano...@imgtec.com)
>   * common.opt (fmerge-bitfields): New option.
>   * doc/invoke.texi: Add reference to "-fmerge-bitfields".
>   * tree-sra.c (lower_bitfields): New function.
>   Entry for (-fmerge-bitfields).
>   (bf_access_candidate_p): New function.
>   (lower_bitfield_read): New function.
>   (lower_bitfield_write): New function.
>   (bitfield_stmt_bfcopy_pair::hash): New function.
>   (bitfield_stmt

RE: [PATCH] Add a new option "-fmerge-bitfields" (patch / doc inside)

2014-04-22 Thread Zoran Jovanovic
Hello,
Updated doc/invoke.texi by stating that new option is enabled by default at -O2 
and higher. 
Also, -fmerge-bitfields added to the list  of optimization flags enabled by 
default  at -O2 and higher.

Regards,
Zoran Jovanovic

--

Lowering is applied only for bit-fields copy sequences that are merged.
Data structure representing bit-field copy sequences is renamed and reduced in 
size.
Optimization turned on by default for -O2 and higher.
Some comments fixed.

Benchmarking performed on WebKit for Android.
Code size reduction noticed on several files, best examples are:

core/rendering/style/StyleMultiColData (632->520 bytes)
core/platform/graphics/FontDescription (1715->1475 bytes)
core/rendering/style/FillLayer (5069->4513 bytes)
core/rendering/style/StyleRareInheritedData (5618->5346)
core/css/CSSSelectorList(4047->3887)
core/platform/animation/CSSAnimationData (3844->3440 bytes)
core/css/resolver/FontBuilder (13818->13350 bytes)
core/platform/graphics/Font (16447->15975 bytes)


Example:

One of the motivating examples for this work was copy constructor of the class 
which contains bit-fields.

C++ code:
class A
{
public:
A(const A &x);
unsigned a : 1;
unsigned b : 2;
unsigned c : 4;
};

A::A(const A&x)
{
a = x.a;
b = x.b;
c = x.c;
}

GIMPLE code without optimization:

  :
  _3 = x_2(D)->a;
  this_4(D)->a = _3;
  _6 = x_2(D)->b;
  this_4(D)->b = _6;
  _8 = x_2(D)->c;
  this_4(D)->c = _8;
  return;

Optimized GIMPLE code:
  :
  _10 = x_2(D)->D.1867;
  _11 = BIT_FIELD_REF <_10, 7, 0>;
  _12 = this_4(D)->D.1867;
  _13 = _12 & 128;
  _14 = (unsigned char) _11;
  _15 = _13 | _14;
  this_4(D)->D.1867 = _15;
  return;

Generated MIPS32r2 assembly code without optimization:
 lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x1
andi$2,$2,0xfe
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0xf9
andi$3,$3,0x6
or  $2,$2,$3
sb  $2,0($4)
lw  $3,0($5)
andi$2,$2,0x87
andi$3,$3,0x78
or  $2,$2,$3
j   $31
sb  $2,0($4)

Optimized MIPS32r2 assembly code:
lw  $3,0($5)
lbu $2,0($4)
andi$3,$3,0x7f
andi$2,$2,0x80
or  $2,$3,$2
j   $31
sb  $2,0($4)


Algorithm works on basic block level and consists of following 3 major steps:
1. Go through basic block statements list. If there are statement pairs that 
implement copy of bit field content from one memory location to another record 
statements pointers and other necessary data in corresponding data structure.
2. Identify records that represent adjacent bit field accesses and mark them as 
merged.
3. Lower bit-field accesses by using new field size for those that can be 
merged.


New command line option "-fmerge-bitfields" is introduced.


Tested - passed gcc regression tests for MIPS32r2.


Changelog -

gcc/ChangeLog:
2014-04-22 Zoran Jovanovic (zoran.jovano...@imgtec.com)
  * common.opt (fmerge-bitfields): New option.
  * doc/invoke.texi: Add reference to "-fmerge-bitfields".
  * doc/invoke.texi: Add "-fmerge-bitfields" to the list of optimization
flags turned on at -O2.
  * tree-sra.c (lower_bitfields): New function.
  Entry for (-fmerge-bitfields).
  (part_of_union_p): New function.
  (bf_access_candidate_p): New function.
  (lower_bitfield_read): New function.
  (lower_bitfield_write): New function.
  (bitfield_stmt_bfcopy_pair::hash): New function.
  (bitfield_stmt_bfcopy_pair::equal): New function.
  (bitfield_stmt_bfcopy_pair::remove): New function.
  (create_and_insert_bfcopy): New function.
  (get_bit_offset): New function.
  (add_stmt_bfcopy_pair): New function.
  (cmp_bfcopies): New function.
  (get_merged_bit_field_size): New function.
  * dwarf2out.c (simple_type_size_in_bits): Move to tree.c.
  (field_byte_offset): Move declaration to tree.h and make it extern.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg1.c: New test.
  * testsuite/gcc.dg/tree-ssa/bitfldmrg2.c: New test.
  * tree-ssa-sccvn.c (expressions_equal_p): Move to tree.c.
  * tree-ssa-sccvn.h (expressions_equal_p): Move declaration to tree.h.
  * tree.c (expressions_equal_p): Move from tree-ssa-sccvn.c.
  (simple_type_size_in_bits): Move from dwarf2out.c.
  * tree.h (expressions_equal_p): Add declaration.
  (field_byte_offset): Add declaration.

Patch -

diff --git a/gcc/common.opt b/gcc/common.opt
index da275e5..52c7f58 100644
--- a/gcc/common.opt
+++ b/gcc/common.opt
@@ -2203,6 +2203,10 @@ ftree-sra
 Common Report Var(flag_tree_sra) Optimization
 Perform scalar replacement of aggregates
 
+fmerge-bitfields
+Common Report Var(flag_tree_bitfield_m