------- Comment #36 from ubizjak at gmail dot com  2008-03-21 10:33 -------
(In reply to comment #35)
> Also ffmpeg uses almost entirely asm() instead of intrinsics, so this alone is
> not so much a problem for ffmpeg as it is for others who followed the
> recommendation that "intrinsics are better than asm".
>
> About trolling: well, I made no attempt to reply politely and diplomatically, no.
> But "solving" a "problem" in some use case by dropping support for that use
> case is kind of extreme.
>
> The way I see it is that
> * it's non-trivial to place emms optimally and automatically
> * there needs to be an emms between MMX code and FPU code
>
> The solutions to this would be any one of
> A. let the programmer place emms, as has been done in the past
> B. don't support MMX at all
> C. don't support the x87 FPU at all
> D. place an emms after every bunch of MMX instructions
> E. solve a quite non-trivial problem and place emms optimally
>
> The solution which has apparently been selected is B. Why was that chosen
> instead of, let's say, A.?
>
> If I do write SIMD code, then I do know that I need an emms on x86. It's
> trivial for the programmer to place it optimally.

I don't know where you get the idea that MMX support was dropped in any way.

I won't engage in a discussion about autovectorisation, intrinsics, builtins,
generic vectorisation, etc. with you, but please look at PR 21395 to see how a
performance PR should be filed. The MMX code in that PR is _far_ from trivial,
but since it is well written using intrinsic instructions, it enables a
jaw-dropping performance increase that is simply not possible when asm blocks
are used.

Now, I'm sure that you have your numbers ready to back up your claims from
Comment #33 about the performance of generated code, and I challenge you to
beat the performance of gcc-4.4-generated code with hand-crafted assembly,
using the example of PR 21395.

-- 
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=14552
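Both sides of this exchange take for granted that an EMMS has to separate MMX code from x87 floating-point code. With intrinsics, option A ("let the programmer place emms") corresponds to an explicit `_mm_empty()` call. The sketch below illustrates that placement; `add_pi16_then_scale` is a hypothetical helper, not code from PR 21395 or this bug, and the scalar fallback for targets without `__MMX__` is an assumption added only so the example builds anywhere.

```c
/* Hypothetical example: add two 4 x int16 vectors with MMX intrinsics,
   then use the result in floating-point code. Assumes GCC or Clang;
   on targets where __MMX__ is not defined, a scalar path is used. */
#include <assert.h>
#include <stdint.h>
#include <string.h>
#if defined(__MMX__)
#include <mmintrin.h>
#endif

static double add_pi16_then_scale(const int16_t a[4], const int16_t b[4],
                                  double scale)
{
    int16_t lanes[4];
    assert(a != NULL && b != NULL);

#if defined(__MMX__)
    __m64 va, vb, vsum;
    memcpy(&va, a, sizeof va);
    memcpy(&vb, b, sizeof vb);
    vsum = _mm_add_pi16(va, vb);      /* PADDW: four wrapping 16-bit adds */
    memcpy(lanes, &vsum, sizeof lanes);
    /* EMMS: the MMX registers alias the x87 register stack, so the MMX
       state is cleared here, by the programmer, before any x87 code runs
       and before returning to the caller. */
    _mm_empty();
#else
    /* Scalar fallback; the narrowing cast wraps on GCC/Clang, like PADDW. */
    for (int i = 0; i < 4; i++)
        lanes[i] = (int16_t)(a[i] + b[i]);
#endif

    /* Floating-point work happens only after the EMMS above. */
    return scale * (double)(lanes[0] + lanes[1] + lanes[2] + lanes[3]);
}
```

On 32-bit x86, scalar `double` arithmetic commonly runs on the x87 stack, which is where a missing EMMS shows up as corrupted results. On x86-64 such math normally uses SSE2, but the System V ABI still requires the CPU to be in x87 mode on function entry, so a function that touches MMX registers should issue EMMS before returning or calling out.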