[Bug target/14552] compiled trivial vector intrinsic code is inefficient

ubizjak at gmail dot com Fri, 21 Mar 2008 03:34:46 -0700


------- Comment #36 from ubizjak at gmail dot com  2008-03-21 10:33 -------
(In reply to comment #35)


> Also ffmpeg uses almost entirely asm() instead of intrinsics, so this alone is
> not so much a problem for ffmpeg as it is for others who followed the
> recommendation that "intrinsics are better than asm".
> 
> About trolling: well, I made no attempt to reply politely and diplomatically, no.
> But "solving" a "problem" in some use case by dropping support for that use
> case is kind of extreme.
> 
> The way I see it:
> * It's non-trivial to place emms optimally and automatically
> * there needs to be an emms between MMX code and FPU code
> 
> The solution to this would be any one of:
> A. let the programmer place emms, as has been done in the past
> B. don't support MMX at all
> C. don't support the x87 FPU at all
> D. place emms after every bunch of MMX instructions
> E. solve a quite non-trivial problem and place emms optimally
> 
> The solution that has apparently been selected is B. Why was that chosen
> instead of, let's say, A?
> 
> If I write SIMD code, then I know that I need an emms on x86. It's
> trivial for the programmer to place it optimally.
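
Option A from the list above (the programmer ends each MMX block explicitly) can be sketched in C with the `<mmintrin.h>` intrinsics, where `_mm_empty()` emits EMMS. This is a minimal illustration, not code from either PR: the function name `add4_then_mean` is hypothetical, and the sketch assumes GCC on x86, where `long double` is the 80-bit x87 type that shares its registers with MMX. Modern compilers may lower MMX intrinsics differently (e.g. via SSE on x86-64); the sketch only shows where the hand-placed EMMS goes. On targets without `__MMX__` it falls back to scalar code.

```c
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

/* Hypothetical helper: adds two arrays of four int16_t lanes (wrapping,
   like PADDW), then averages the four sums in long double. */
long double add4_then_mean(const int16_t a[4], const int16_t b[4])
{
    int16_t sums[4];
#if defined(__MMX__)
    __m64 va, vb, vs;
    memcpy(&va, a, sizeof va);   /* load 4 x int16 into an MMX value */
    memcpy(&vb, b, sizeof vb);
    vs = _mm_add_pi16(va, vb);   /* PADDW: lane-wise wrapping add */
    memcpy(sums, &vs, sizeof sums);
    /* Option A: the programmer marks the end of the MMX block.
       EMMS resets the x87 tag word; without it, later x87 loads can
       overflow the register stack and produce NaN results. */
    _mm_empty();
#else
    /* Scalar fallback for targets without MMX (e.g. non-x86 hosts). */
    for (int i = 0; i < 4; i++)
        sums[i] = (int16_t)(a[i] + b[i]);
#endif
    /* Floating-point work after the MMX block. With GCC on x86,
       long double arithmetic uses the x87 unit. */
    long double total = 0.0L;
    for (int i = 0; i < 4; i++)
        total += sums[i];
    return total / 4.0L;
}
```

Called with {1, 2, 3, 4} and {10, 20, 30, 40}, it returns 27.5. Because the programmer knows exactly where the MMX block ends, a single EMMS suffices, rather than one after every group of MMX instructions (option D).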

I don't know where you got the idea that MMX support was dropped in any way. I
won't engage in a discussion with you about autovectorisation, intrinsics,
builtins, generic vectorisation, etc., but please look at PR 21395 for how a
performance PR should be filed. The MMX code in that PR is _far_ from trivial,
but since it is well written using intrinsic instructions, it enables a
jaw-dropping performance increase that is simply not possible when asm blocks
are used.

Now, I'm sure that you have your numbers ready to back up your claims from
Comment #33 about the performance of generated code, and I challenge you to beat
the performance of gcc-4.4-generated code with hand-crafted assembly, using the
example of PR 21395.


-- 


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552

