http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49363
davidxl <xinliangli at gmail dot com> changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |tmsriram at google dot com
--- Comment #3 from davidxl <xinliangli at gmail dot com> 2011-06-11 06:23:21 UTC ---
(In reply to comment #2)
> These new developments sound interesting. Hope somebody is working on it and
> will publish a testable version soon.
> On the other hand, I was thinking more of exploiting auto-vectorization, for
> which having multiple copies of the very same code looks to me unnecessary and
> error prone. Something, for instance, along these lines:
> float __attribute__ ((__target__ ("sse2","sse3","avx","fma")))
> sum0(float const * __restrict__ x,
>      float const * __restrict__ y, float const * __restrict__ z) {
>   float sum=0;
>   for (int i=0; i!=1024; ++i)
>     sum += z[i]+x[i]*y[i];
>   return sum;
> }
>
> If my understanding of the proposal is correct I will have to copy-paste this
> function four times, one for each target.
The overloading proposal works for both manual versioning and automatic
versioning.
For manual versioning:
// generic version
void foo()
{
}
// version selected by v1
void __attribute__((dispatch_selector(v1))) foo()
{
}
// version selected by v2
void __attribute__((dispatch_selector(v2))) foo()
{
}
void bar()
{
  foo();  // becomes a three-way dispatch
  ...
}
Here both the versions and the dispatcher are defined by the user.
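To make the three-way dispatch concrete, here is a minimal hand-written sketch in plain C of what the call to foo() in bar() might be lowered to. Everything in it is hypothetical and only for illustration: the names foo_generic, foo_v1, foo_v2, selector_v1, selector_v2 and cpu_level do not come from the proposal, and the order in which the selectors are tried is an assumption.

```c
/* Hypothetical stand-ins for the three definitions of foo().
   They return an id so that the chosen version is observable. */
static int foo_generic(void) { return 0; }
static int foo_v1(void)      { return 1; }
static int foo_v2(void)      { return 2; }

/* Hypothetical capability level; a real selector would probe the CPU. */
static int cpu_level = 2;

/* Hypothetical user-defined selectors, one per non-generic version. */
static int selector_v1(void) { return cpu_level >= 1; }
static int selector_v2(void) { return cpu_level >= 2; }

/* What the call foo() might turn into: try the most specific version
   first and fall back to the generic one (assumed ordering). */
static int foo_dispatch(void)
{
    if (selector_v2())
        return foo_v2();
    if (selector_v1())
        return foo_v1();
    return foo_generic();
}
```

A real implementation would more likely resolve the choice once, for example at load time, rather than on every call; the sketch re-evaluates the selectors each time only to keep it short.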
However, when no attribute is specified on the generic version of foo, the
compiler can also version it automatically, according to a command-line option
that specifies the target hardware the binary is going to run on.
For your case, where the standard CPU features are specified,
void __attribute__((__target__("sse2", "avx", "fma"))) foo ()
{ ... }
this is the same as the automatic versioning case mentioned above; the only
difference is that the function to be versioned is chosen by the user, not the
compiler.
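For comparison, the effect can be approximated by hand without copy-pasting the loop. The sketch below is not what the proposal would generate. It assumes an x86 target and a GCC recent enough to provide __builtin_cpu_supports (a later addition that is not part of this thread), and falls back to the generic version elsewhere. The names sum0_body, sum0_generic and sum0_avx are illustrative only.

```c
#define N 1024

/* Single copy of the loop. always_inline lets each wrapper below compile
   its own copy under its own target options. */
static inline __attribute__((always_inline)) float
sum0_body(float const *__restrict__ x, float const *__restrict__ y,
          float const *__restrict__ z)
{
    float sum = 0;
    for (int i = 0; i != N; ++i)
        sum += z[i] + x[i] * y[i];
    return sum;
}

/* Baseline version, built with the options of the translation unit. */
static float sum0_generic(float const *x, float const *y, float const *z)
{
    return sum0_body(x, y, z);
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Same source, but the compiler may use AVX instructions in this copy. */
__attribute__((target("avx")))
static float sum0_avx(float const *x, float const *y, float const *z)
{
    return sum0_body(x, y, z);
}
#endif

/* Hand-written dispatcher: choose a version at run time. */
float sum0(float const *x, float const *y, float const *z)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx"))
        return sum0_avx(x, y, z);
#endif
    return sum0_generic(x, y, z);
}
```

The wrappers and the dispatcher are exactly the boilerplate that compiler-side versioning would remove: the user would write only the loop and the list of targets.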
David