[Bug c++/45548] New: Add with carry - missed optimization

2010-09-05 Thread tmartsum at gmail dot com
This is closely related to bug 43892:
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=43892

There are many ways to write an add with carry, and it is difficult to catch
them all. I really tried to 'think like a compiler' when I wrote the following
code (C++ for 32-bit Intel). It is not even strictly correct C++, and it won't
work on AMD64, since long long is 64 bits there, just like unsigned long, and
__int128 isn't quite there yet.

// Data structures:
struct Skew1 // Even
{
  unsigned long long data; // This could be an array
  unsigned long unused;
};

struct Skew2 // Odd
{
  unsigned long unused;
  unsigned long long data; // This could be an array
};

struct ULongLongLong
{
  union
  {
    unsigned long m_data[3];
    Skew1 m_rep1;
    Skew2 m_rep2;
  };
  ULongLongLong()
  {
    m_data[0]=0;
    m_data[1]=0;
    m_data[2]=0;
  }
//  void print() { std::cout << m_data[0] << "," << m_data[1] << "," << m_data[2] << "\n"; }
  void addtest(const ULongLongLong &b); // operator +=
};

The addtest is the important part:
void ULongLongLong::addtest(const ULongLongLong &b)
{
//  if (this==&b) // removed to make the example easier
//    doTimes2();
  m_rep1.data+=b.m_data[0];
  m_rep2.data+=b.m_data[1];
  m_data[2]+=b.m_data[2];
}

The main point of my code also shows up in the compiled code, but the compiler
does not exploit it. What I hoped would happen was that gcc would see that
adding 0 with carry 'quickly' followed by a normal add is the same as just the
last add, but done with carry.

However, I only get this code:
.globl _ZN13ULongLongLong7addtestERKS_
.type   _ZN13ULongLongLong7addtestERKS_, @function
_ZN13ULongLongLong7addtestERKS_:
.LFB964:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl   %ebp
.cfi_def_cfa_offset 8
movl    %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl    12(%ebp), %edx
movl    8(%ebp), %eax
pushl   %ebx
xorl    %ebx, %ebx
.cfi_offset 3, -12
movl    (%edx), %ecx
addl    %ecx, (%eax)
adcl    %ebx, 4(%eax)
xorl    %ebx, %ebx
movl    4(%edx), %ecx
addl    %ecx, 4(%eax)
adcl    %ebx, 8(%eax)
movl    8(%edx), %edx
addl    %edx, 8(%eax)
popl    %ebx
popl    %ebp
ret
.cfi_endproc

What I wanted was this code:
.globl _ZN13ULongLongLong7addtestERKS_
.type   _ZN13ULongLongLong7addtestERKS_, @function
_ZN13ULongLongLong7addtestERKS_:
.LFB1001:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl   %ebp
.cfi_def_cfa_offset 8
movl    %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl    12(%ebp), %edx
movl    8(%ebp), %eax
/*  pushl   %ebx */  /* not needed anymore - we don't use it */
/*  xorl    %ebx, %ebx   No need to reset ebx */
.cfi_offset 3, -12
movl    (%edx), %ecx
addl    %ecx, (%eax)
/*  adcl    %ebx, 4(%eax)   */
/*  xorl    %ebx, %ebx Why do it at all - ebx was already 0 !?*/
movl    4(%edx), %ecx
adcl    %ecx, 4(%eax) /* modified addl to adcl */
/*  adcl    %ebx, 8(%eax)  */
movl    8(%edx), %edx
adcl    %edx, 8(%eax)  /* modified addl to adcl */
/*  popl    %ebx */
popl    %ebp
ret
.cfi_endproc
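
For readers who prefer C++ to assembly, here is a minimal sketch of the
arithmetic that an add/adc/adc chain over three 32-bit limbs performs. The
helper name add3 and the use of fixed-width types are assumptions made for
illustration; they are not part of the report, and this is not a claim about
what gcc emits for it.

```cpp
#include <cstdint>

// Hypothetical helper (not from the report): adds b into a limb by limb,
// least significant limb first, carrying from each limb into the next.
inline void add3(std::uint32_t a[3], const std::uint32_t b[3])
{
    std::uint64_t carry = 0;
    for (int i = 0; i < 3; ++i) {
        // Widen to 64 bits so the carry out of the 32-bit add stays visible.
        std::uint64_t sum = std::uint64_t(a[i]) + b[i] + carry;
        a[i] = static_cast<std::uint32_t>(sum); // keep the low 32 bits
        carry = sum >> 32;                      // 0 or 1
    }
    // The carry out of the top limb is dropped, as in the assembly.
}
```

Calling add3(x, y) on two three-element arrays corresponds to one addl
followed by two adcl instructions on the matching limbs.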

Note: The same seems to apply if the additions are replaced with subtractions.

It may still be better to make carry handling work more generally, and I
understand that this might be a won't-fix, especially if you provide a clear
way to add with carry in general.

However, this particular case might be a much easier peephole(-like)
optimization.
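
As an illustration of what a 'clear way to add with carry' can look like in
source code: GCC 5 and later (and Clang) provide __builtin_add_overflow, which
did not exist in the 4.4.1 release discussed here. The sketch below builds a
carry chain on top of it. The helper names addc32 and add3_builtin are
hypothetical, and whether a given compiler version turns this into add/adc/adc
is not guaranteed.

```cpp
#include <cstdint>

// Hypothetical helper: one add-with-carry step on 32-bit limbs.
// Stores a + b + carry_in (mod 2^32) in *out and returns the carry out.
inline unsigned addc32(std::uint32_t a, std::uint32_t b, unsigned carry_in,
                       std::uint32_t *out)
{
    std::uint32_t t;
    bool c1 = __builtin_add_overflow(a, b, &t);
    bool c2 = __builtin_add_overflow(t, std::uint32_t(carry_in), out);
    return c1 | c2; // at most one of the two additions can overflow
}

// Hypothetical helper: three-limb addition, least significant limb first.
inline void add3_builtin(std::uint32_t a[3], const std::uint32_t b[3])
{
    unsigned c = addc32(a[0], b[0], 0, &a[0]);
    c = addc32(a[1], b[1], c, &a[1]);
    addc32(a[2], b[2], c, &a[2]); // carry out of the top limb is dropped
}
```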

PS: Thanks for a really great compiler.


-- 
           Summary: Add with carry - missed optimization
           Product: gcc
           Version: 4.4.1
            Status: UNCONFIRMED
          Severity: enhancement
          Priority: P3
         Component: c++
        AssignedTo: unassigned at gcc dot gnu dot org
        ReportedBy: tmartsum at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45548



[Bug target/45548] Add with carry - missed optimization on x86

2010-09-12 Thread tmartsum at gmail dot com


--- Comment #2 from tmartsum at gmail dot com  2010-09-12 15:01 ---
With subtraction the situation is very similar:

#include <iostream> // needed by print()

struct Skew1 // Even
{
  unsigned long long data;
  unsigned long unused;
};

struct Skew2 // Odd
{
  unsigned long unused;
  unsigned long long data;
};

struct ULongLongLong
{
  union
  {
    unsigned long m_data[3];
    Skew1 m_rep1;
    Skew2 m_rep2;
  };
  ULongLongLong()
  {
    m_data[0]=0;
    m_data[1]=0;
    m_data[2]=0;
  }
  void print() { std::cout << m_data[0] << "," << m_data[1] << ","
                           << m_data[2] << "\n"; }
//  void addtest(const ULongLongLong &b); // operator +=
  void subtest(const ULongLongLong &b); // operator -=
};

It gives the following code:

.globl _ZN13ULongLongLong7subtestERKS_
.type   _ZN13ULongLongLong7subtestERKS_, @function
_ZN13ULongLongLong7subtestERKS_:
.LFB965:
.cfi_startproc
.cfi_personality 0x0,__gxx_personality_v0
pushl   %ebp
.cfi_def_cfa_offset 8
movl    %esp, %ebp
.cfi_offset 5, -8
.cfi_def_cfa_register 5
movl    12(%ebp), %edx
movl    8(%ebp), %eax
pushl   %ebx
xorl    %ebx, %ebx
.cfi_offset 3, -12
movl    (%edx), %ecx
subl    %ecx, (%eax)
sbbl    %ebx, 4(%eax)
xorl    %ebx, %ebx
movl    4(%edx), %ecx
subl    %ecx, 4(%eax)
sbbl    %ebx, 8(%eax)
movl    8(%edx), %edx
subl    %edx, 8(%eax)
popl    %ebx
popl    %ebp
ret
.cfi_endproc

This could be optimized (just like the addition).
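
The comment does not show the body of subtest or the hoped-for assembly. By
analogy with the addition case, the target would presumably be a sub/sbb/sbb
chain. The sketch below spells out that borrow chain in portable C++. The
helper name sub3 and the fixed-width types are assumptions for illustration
only.

```cpp
#include <cstdint>

// Hypothetical helper (not from the report): subtracts b from a limb by limb,
// least significant limb first, borrowing from each limb into the next.
inline void sub3(std::uint32_t a[3], const std::uint32_t b[3])
{
    std::uint32_t borrow = 0;
    for (int i = 0; i < 3; ++i) {
        // Compute in 64 bits; a result below zero wraps and sets the top bit.
        std::uint64_t diff = std::uint64_t(a[i]) - b[i] - borrow;
        a[i] = static_cast<std::uint32_t>(diff);         // keep the low 32 bits
        borrow = static_cast<std::uint32_t>(diff >> 63); // 1 if we wrapped
    }
    // The borrow out of the top limb is dropped.
}
```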

