------- Comment #37 from michaelni at gmx dot at  2008-03-22 02:39 -------
Subject: Re: compiled trivial vector intrinsic code is inefficient
On Fri, Mar 21, 2008 at 10:34:00AM -0000, ubizjak at gmail dot com wrote:
> ------- Comment #36 from ubizjak at gmail dot com  2008-03-21 10:33 -------
> (In reply to comment #35)
>
> > Also ffmpeg uses almost entirely asm() instead of intrinsics, so this alone
> > is not so much a problem for ffmpeg as it is for others who followed the
> > recommendation that "intrinsics are better than asm".
> >
> > About trolling: well, I made no attempt to reply politely and diplomatically, no.
> > But "solving" a "problem" in some use case by dropping support for that use
> > case is kinda extreme.
> >
> > The way I see it:
> > * It's non-trivial to place emms optimally and automatically.
> > * There needs to be an emms between MMX code and FPU code.
> >
> > The solutions to this would be any one of:
> > A. let the programmer place emms, as it has been in the past
> > B. don't support MMX at all
> > C. don't support the x87 FPU at all
> > D. place an emms after every bunch of MMX instructions
> > E. solve a quite non-trivial problem and place emms optimally
> >
> > The solution which has apparently been selected is B. Why was that chosen
> > instead of, let's say, A.?
> >
> > If I do write SIMD code, then I do know that I need an emms on x86. It's
> > trivial for the programmer to place it optimally.
>
> I don't know where you get the idea that MMX support was dropped in any way. I

Maybe because the SIMD code in this PR, compiled with -mmmx, does not use MMX
but very significantly less efficient integer instructions. And you added a
test to gcc which ensures that this case does not use MMX instructions. This
is pretty much the definition of dropping MMX support (for this specific case).

> won't engage in a discussion about autovectorisation, intrinsics, builtins,
> generic vectorisation, etc, etc with you,

And somehow I am glad about that.

> but please look at PR 21395 for how a
> performance PR should be filed.
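Option A from the quoted list is illustrated below. This is a minimal sketch only, assuming GCC or Clang with `<mmintrin.h>` on an x86 target. The programmer issues one `_mm_empty()` (the intrinsic for the `emms` instruction) after the whole MMX loop and before the floating-point code. The function `add_then_mean` and its inputs are hypothetical and are not taken from PR 14552 or PR 21395. On x86-64, `double` arithmetic normally uses SSE2 rather than x87, so the `emms` matters mostly for 32-bit x87 code. The `#else` branch is a plain C fallback, added so the sketch also builds where MMX is unavailable.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>  /* __m64, _mm_add_pi16, _mm_empty */
#endif

/* Hypothetical helper (not from the PR): adds two int16_t arrays four lanes
 * at a time with MMX, then averages the result with ordinary floating point.
 * Assumes n is a multiple of 4. */
double add_then_mean(const int16_t *a, const int16_t *b, int16_t *out, size_t n)
{
    for (size_t i = 0; i < n; i += 4) {
#if defined(__MMX__)
        __m64 va, vb, sum;
        memcpy(&va, a + i, sizeof va);   /* load 4 x int16_t, alignment-safe */
        memcpy(&vb, b + i, sizeof vb);
        sum = _mm_add_pi16(va, vb);      /* paddw: wrapping 16-bit adds */
        memcpy(out + i, &sum, sizeof sum);
#else
        /* Plain C fallback so the sketch builds on non-MMX targets. */
        for (size_t k = 0; k < 4; k++)
            out[i + k] = (int16_t)(a[i + k] + b[i + k]);
#endif
    }

#if defined(__MMX__)
    /* Option A: the programmer places a single emms after the whole MMX
     * block and before any x87 code, because the MMX registers alias the
     * x87 register stack. */
    _mm_empty();
#endif

    double total = 0.0;                  /* x87 on 32-bit x86 without SSE math */
    for (size_t i = 0; i < n; i++)
        total += out[i];
    return n ? total / (double)n : 0.0;
}
```

The sketch uses one `_mm_empty()` per MMX block, rather than one after every few instructions (option D). That is the placement the quoted comment describes as trivial for the programmer to choose.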
> The MMX code in that PR is _far_ from trivial,

Well, that is something I would disagree with.

> but since it is well written using intrinsic instructions, it enables a
> jaw-dropping performance increase that is simply not possible when ASM blocks
> are used.
>
> Now, I'm sure that you have your numbers ready to back up your claims from
> Comment #33 about performance of generated code, and I challenge you to beat
> performance of gcc-4.4 generated code by hand-crafted assembly using the
> example of PR 21395.

Done:
jaw-dropping intrinsics need 2.034s
stinky hand-written asm needs 1.312s

But you can read the details in PR 21395.

[...]

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552