[Bug tree-optimization/54978] New: Add ability to provide vectorized functions

2012-10-18 Thread ddesics at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978



 Bug #: 54978
   Summary: Add ability to provide vectorized functions
Classification: Unclassified
   Product: gcc
   Version: unknown
Status: UNCONFIRMED
  Severity: enhancement
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ddes...@gmail.com





Presently, the auto-vectorizer chokes on function calls present in a loop.
It would be nice to be able to provide vectorized versions of functions
alongside the non-vectorized versions, and let the auto-vectorizer pick them
up.

This is already done for transcendental functions when -mveclibabi is used.
However, to do the same for user functions, some standardized form for the
vectorized function would have to be defined.

For instance, taking SSE-style functions on double precision as an example,
this could be done as follows.



C++: the simple case, because overloading is available

double fx(double x);
v2df fx(v2df x);



C: this is more difficult, but something should be doable.  To prevent
accidental name overlap, a new function attribute could be used to define a
vectorized version, e.g.:

double fx(double x) __attribute__((vectorized_alias(fxv2df)));
v2df fxv2df(v2df x);
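
To make the C++ case concrete, here is a minimal sketch of the rewrite the vectorizer would be asked to perform, done by hand.  It assumes the GCC/Clang vector_size extension for v2df; the body of fx (x*x + 1) and the helper apply_fx are made up for illustration, and the vectorized_alias attribute itself is only a proposal, so it does not appear in the code.

```cpp
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: two doubles packed into 16 bytes.
typedef double v2df __attribute__((vector_size(16)));

// Scalar version. The body (x*x + 1) is a made-up stand-in.
double fx(double x) { return x * x + 1.0; }

// Hand-written vector version, overloading the same name.
v2df fx(v2df x) {
    const v2df one = {1.0, 1.0};
    return x * x + one;
}

// The loop "out[i] = fx(in[i])" rewritten by hand, the way a vectorizer
// could rewrite it once it knows the v2df overload exists.
void apply_fx(const double* in, double* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 2 <= n; i += 2) {
        v2df v;
        std::memcpy(&v, in + i, sizeof v);    // load two elements, no alignment assumed
        v = fx(v);                            // resolves to the v2df overload
        std::memcpy(out + i, &v, sizeof v);   // store two results
    }
    for (; i < n; ++i) {
        out[i] = fx(in[i]);                   // scalar tail for odd n
    }
}
```

Calling apply_fx(in, out, n) gives the same results as the plain scalar loop; the point of the request is that the compiler would produce the two-at-a-time loop on its own.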


[Bug tree-optimization/54978] Add ability to provide vectorized functions

2012-10-19 Thread ddesics at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=54978



--- Comment #3 from Daniel Davis 2012-10-19 16:14:36 UTC ---

Obviously, it would be nice if gcc could build the vectorized functions itself
when they are pure functions.  But that would require somehow knowing that
those functions should be built, and having access to the code, which may not
be the case for something like a library.



On second thought, the attribute would best go with the declaration of the
vectorized function.  So, to rewrite my example,

double fx(double x);
v2df fx_v2df(v2df x) __attribute__((vectorized_alias(fx,double,16)));
v4df fx_v4df(v4df x) __attribute__((vectorized_alias(fx,double,32)));



That would let the compiler know exactly what could be replaced, although it
should be able to figure out the argument types from the declarations.
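
As a rough illustration of that last point, the sketch below lets C++ template deduction recover the vector type from the declaration of the vectorized function, and derives the lane count from its size.  The helper apply, the function bodies, and the assumption of double elements are all hypothetical; only the v2df variant is shown, using the GCC/Clang vector_size extension.

```cpp
#include <cstddef>
#include <cstring>

typedef double v2df __attribute__((vector_size(16)));

// Hypothetical scalar/vector pair; the bodies are stand-ins.
double fx(double x) { return 2.0 * x; }
v2df fx_v2df(v2df x) { return x + x; }

// V is deduced from the vector function's own declaration, and the lane
// count follows from sizeof(V). Assumes the elements are doubles.
template <class V>
void apply(double (*scalar)(double), V (*vec)(V),
           const double* in, double* out, std::size_t n) {
    const std::size_t lanes = sizeof(V) / sizeof(double);
    std::size_t i = 0;
    for (; i + lanes <= n; i += lanes) {
        V v;
        std::memcpy(&v, in + i, sizeof v);    // load one full vector
        v = vec(v);                           // call the vector variant
        std::memcpy(out + i, &v, sizeof v);   // store one full vector
    }
    for (; i < n; ++i) {
        out[i] = scalar(in[i]);               // scalar tail
    }
}
```

A call such as apply(fx, fx_v2df, in, out, n) needs no explicit "double,16" information: both the element count and the byte width come from the type of fx_v2df.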



So I see two potential optimizations here.  The first is a way to let the
compiler know that vectorized functions are available.  The second would be to
let the auto-vectorizer create vectorized forms of functions.


[Bug tree-optimization/52112] New: Vectorizer fails when using CRTP

2012-02-03 Thread ddesics at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52112

 Bug #: 52112
   Summary: Vectorizer fails when using CRTP
Classification: Unclassified
   Product: gcc
   Version: 4.6.1
Status: UNCONFIRMED
  Severity: normal
  Priority: P3
 Component: tree-optimization
AssignedTo: unassig...@gcc.gnu.org
ReportedBy: ddes...@gmail.com


Created attachment 26565
  --> http://gcc.gnu.org/bugzilla/attachment.cgi?id=26565
The test case code.

Using the CRTP along with static_cast pointers prevents auto-vectorization.
If the code below is compiled with -ftree-vectorizer-verbose=7, the CRTP method
fails to vectorize with "not vectorized: control flow in loop."  However, if
compiled with -O3 -S -fno-tree-vectorize, both methods produce identical
assembly.

I don't know how difficult this would be to change, but it could certainly
speed up a lot of C++ code.  For instance, this currently prevents Boost uBLAS
from vectorizing.

#include <iostream>

// CRTP base: E is the derived type, Tp the value type.
template<class E, class Tp> class CRTP_base {
  public:
   typedef E& reference;
   typedef Tp value_type;

   // Downcast to the derived type.
   reference operator()() { return *static_cast<E*>(this); }
   value_type square() { return (*this)().x() * (*this)().x(); }
  protected:
   CRTP_base() {}
   ~CRTP_base() {}
};

template<class Tp> class CRTP_child : public CRTP_base<CRTP_child<Tp>,Tp> {
   Tp xval;
   typedef CRTP_base<CRTP_child<Tp>,Tp> parent;
  public:
   CRTP_child(Tp xv = Tp()) : xval(xv) {}
   Tp x() { return xval; }
   using parent::square;
};

int main() {
  const int N = 100;
  double A[N] __attribute__((aligned(16)));
  double B[N] __attribute__((aligned(16)));
  double sum1=0.0, sum2=0.0;

  for(int i = 0; i < N; ++i) { A[i] = i; }

  // Method 1: plain multiplication.
  for(int i = 0; i < N; ++i) { B[i] = A[i]*A[i]; }
  for(int i = 0; i < N; ++i) { sum1 += B[i]; }
  std::cout << "Sum of method 1: " << sum1;

  // Method 2: the same computation through the CRTP classes.
  for(int i = 0; i < N; ++i) { B[i] = CRTP_child<double>(A[i]).square(); }
  for(int i = 0; i < N; ++i) { sum2 += B[i]; }
  std::cout << "\nSum of method 2: " << sum2 << std::endl;
  return 0;
}


[Bug tree-optimization/52112] Vectorizer fails when using CRTP

2012-02-03 Thread ddesics at gmail dot com
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=52112

--- Comment #4 from Daniel Davis 2012-02-03 18:24:49 UTC ---
Any thoughts on why it won't vectorize for me on x86_64 4.6.1?


[Bug rtl-optimization/31485] C complex numbers, amd64 SSE, missed optimization opportunity

2010-04-16 Thread ddesics at gmail dot com


--- Comment #6 from ddesics at gmail dot com  2010-04-17 00:28 ---
Has any work been done on this enhancement?  I'm using gcc 4.3.2, and I noticed
that there is still limited use of SSE instructions for complex arithmetic.  

Unless I'm missing something in my understanding, wouldn't the ideal for all
_Complex double additions with SSE2 be to use addpd, and movapd or movupd for
memory operations?
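
For reference, a small sketch of the packed form being asked about.  It is written in C++ with std::complex<double> rather than C's _Complex double, and uses the GCC/Clang vector_size extension instead of SSE intrinsics; the function name add_complex is made up.  On an SSE2 target the vector add should map onto a single addpd with unaligned loads and stores around it, though the instructions actually emitted depend on compiler version and flags.

```cpp
#include <complex>
#include <cstddef>
#include <cstring>

// GCC/Clang vector extension: two doubles packed into 16 bytes.
typedef double v2df __attribute__((vector_size(16)));

static_assert(sizeof(std::complex<double>) == sizeof(v2df),
              "one complex<double> must fill one two-lane vector");

// Adds n complex numbers, treating each {real, imag} pair as one vector.
void add_complex(const std::complex<double>* a, const std::complex<double>* b,
                 std::complex<double>* out, std::size_t n) {
    // std::complex<double> is laid out as {real, imag}, so an array of n
    // complex values can be addressed as 2*n doubles.
    const double* pa = reinterpret_cast<const double*>(a);
    const double* pb = reinterpret_cast<const double*>(b);
    double* po = reinterpret_cast<double*>(out);
    for (std::size_t i = 0; i < n; ++i) {
        v2df va, vb;
        std::memcpy(&va, pa + 2 * i, sizeof va);     // load, no alignment assumed
        std::memcpy(&vb, pb + 2 * i, sizeof vb);
        const v2df sum = va + vb;                    // both parts added at once
        std::memcpy(po + 2 * i, &sum, sizeof sum);   // store the result
    }
}
```

The real and imaginary parts are added in one vector operation, which is the behaviour the question above expects from the compiler for plain complex additions.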


-- 

ddesics at gmail dot com changed:

   What|Removed |Added

 CC|            |ddesics at gmail dot com


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=31485