Paolo Carlini wrote:
> Hi,
>
> we have this long standing issue which really we should solve, one way
> or another: otherwise there are both correctness and performance issues
> which we cannot fix, new features which we cannot implement. I have
> plenty of examples, just ask, in case, if you want more details and
> motivations.

I think this is a somewhat difficult problem because of the tension
between performance and functionality. In particular, as you say, the
code sequence you want to use varies by CPU. I don't think I have good
answers; this email is just me musing out loud.

You probably don't want to inline the assembly code equivalent of:

  if (cpu == i386) ...
  else if (cpu == i486) ...
  else if (cpu == i586) ...
  ...

On the other hand, if you inline, say, the i486 variant, and then run
on an i686, you may not get very good performance. So, the important
thing is to weigh the cost of a function call plus run-time
conditionals (when using a libgcc routine that would contain support
for all the CPUs) against the benefit of getting the fastest code
sequences on the current processors.

And in a workstation distribution you may be concerned about supporting
multiple CPUs; if you're building for a specific hardware board, then
you only care about the CPU actually on that board.

What do you propose that the libgcc routine do for a CPU that cannot
support the builtin at all? Just do a trivial implementation that is
safe only for a single-CPU, single-threaded system?

I think that to satisfy everyone, you may need a configure option to
decide between inlining support for a particular processor (for maximum
performance when you know the target processor) and making a library
call (when you don't).

--
Mark Mitchell
CodeSourcery, LLC
[EMAIL PROTECTED]
(916) 791-8304
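
To make the trade-off above a bit more concrete, here is a rough C
sketch of the two flavors. It is not actual libgcc code. Every name in
it (cpu_kind, set_current_cpu, lib_fetch_add, FETCH_ADD,
TARGET_CPU_KNOWN) is made up for illustration, the CPU "detection" is
just a variable, and both variants are plain C stand-ins for what would
really be CPU-specific instruction sequences.

```c
/* Hypothetical CPU levels, ordered oldest to newest. */
enum cpu_kind { CPU_I386, CPU_I486, CPU_I586, CPU_I686 };

/* Records which variant ran last; for illustration only. */
enum variant { VARIANT_NONE, VARIANT_PLAIN, VARIANT_XADD };
enum variant last_variant = VARIANT_NONE;

/* Assumption: stands in for a real run-time probe of the hardware. */
static enum cpu_kind current_cpu = CPU_I386;

void set_current_cpu(enum cpu_kind cpu)
{
    current_cpu = cpu;
}

/* Fallback for a CPU without a suitable instruction: a plain
   read-modify-write, safe only on a single-CPU, single-threaded
   system. */
static int fetch_add_plain(int *p, int v)
{
    int old = *p;
    *p = old + v;
    last_variant = VARIANT_PLAIN;
    return old;
}

/* Stand-in for the i486-and-later sequence. Real code would use an
   instruction such as "lock; xadd" here rather than plain C. */
static int fetch_add_xadd(int *p, int v)
{
    int old = *p;
    *p = old + v;
    last_variant = VARIANT_XADD;
    return old;
}

/* Library-call flavor: one out-of-line routine that covers every CPU,
   at the cost of a call plus a run-time conditional on each use. */
int lib_fetch_add(int *p, int v)
{
    if (current_cpu == CPU_I386)
        return fetch_add_plain(p, v);
    return fetch_add_xadd(p, v);
}

/* Inline flavor: the variant is fixed at build time.
   TARGET_CPU_KNOWN is a hypothetical macro that a configure option
   might define when the target processor is known in advance. */
#ifdef TARGET_CPU_KNOWN
#define FETCH_ADD(p, v) fetch_add_xadd((p), (v))
#else
#define FETCH_ADD(p, v) lib_fetch_add((p), (v))
#endif
```

Without TARGET_CPU_KNOWN defined, FETCH_ADD goes through the library
routine and picks the variant at run time; with it defined, the
conditional disappears but the resulting code is tied to one class of
processor.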