Hi Mark,

>I think this is a somewhat difficult problem because of the tension
>between performance and functionality. In particular, as you say, the
>code sequence you want to use varies by CPU.
>
>I don't think I have good answers; this email is just me musing out loud.
>
>You probably don't want to inline the assembly code equivalent of:
>
> if (cpu == i386) ...
> else if (cpu == i486) ...
> else if (cpu == i586) ...
> ...
>
>On the other hand, if you inline, say, the i486 variant, and then run on
>a i686, you may not get very good performance.
>
>So, the important thing is to weigh the cost of a function call plus
>run-time conditionals (when using a libgcc routine that would contain
>support for all the CPUs) against the benefit of getting the fastest
>code sequences on the current processors.
>

Actually, as far as I can see, the situation is not that bad: the worst cases are i386 vs. i486+, and old Sparc vs. new Sparc. More generally, a target either cannot implement the builtin at all (hence a trivial fallback, using locks or with no MT support at all), or can implement it in at most one non-trivial way. Then libgcc would contain at most two versions: the trivial one, and another piece of assembly, in principle identical to what the builtin expands to when the inline version is actually desired.
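To make the "at most two versions" point concrete, here is a minimal sketch of what such an out-of-line routine could look like for a single operation, fetch-and-add on an int. The function names are hypothetical (not actual libgcc entry points); the trivial variant assumes one global POSIX mutex, and the non-trivial variant assumes an i486-or-later x86 CPU, where `lock; xadd` is available.

```c
#include <pthread.h>

/* Hypothetical trivial fallback: one global lock serializes every
   atomic operation. Correct on any CPU, but slow under contention. */
static pthread_mutex_t atomic_fallback_lock = PTHREAD_MUTEX_INITIALIZER;

int hypothetical_fetch_and_add_locked(volatile int *p, int v)
{
    pthread_mutex_lock(&atomic_fallback_lock);
    int old = *p;
    *p = old + v;
    pthread_mutex_unlock(&atomic_fallback_lock);
    return old;
}

#if defined(__i386__) || defined(__x86_64__)
/* Hypothetical non-trivial variant: the same instruction sequence the
   builtin would expand to inline. XADD exists only on i486 and later,
   so this variant must never be selected on a plain i386. */
int hypothetical_fetch_and_add_xadd(volatile int *p, int v)
{
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory");
    return v; /* XADD leaves the old value of *p in the register. */
}
#endif
```

The point of the sketch is that the library needs no per-CPU chain of conditionals: only the lock-based body and one assembly body exist.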
>And in a workstation distribution you may be concerned about supporting
>multiple CPUs; if you're building for a specific hardware board, then
>you only care about the CPU actually on that board.
>
>What do you propose that the libgcc routine do for a CPU that cannot
>support the builtin at all? Just do a trivial implementation that is
>safe only for a single-CPU, single-threaded system?
>

Either that or a very low-performance one, using locks. The issue is still open, but I think we can resolve it rather easily.

>I think that to satisfy everyone, you may need a configure option to
>decide between inlining support for a particular processor (for maximum
>performance when you know the target performance) and making a library
>call (when you don't).
>

Yes. For simplicity, let's consider the obnoxious i686: if the user doesn't pass any -march, then libgcc provides either the fallback using locks or, if the specific target (i486+) supports it, the non-trivial implementation; if the user passes -march=i486 or later, then the compiler expands the builtin inline, with no use of libgcc at all. Similarly for Sparc.

Paolo.
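The -march scheme described above can be sketched as a compile-time choice between an inline expansion and a library call. This is only an illustration under stated assumptions: the macro HYPOTHETICAL_MARCH_I486_OR_LATER stands in for what the compiler itself would know from -march, and library_fetch_and_add is a hypothetical stand-in for the libgcc routine (shown here with the lock-based body; a real library could use the non-trivial sequence where the target supports it).

```c
#include <pthread.h>

/* Hypothetical stand-in for the out-of-line libgcc routine: a single
   global lock makes it safe on any CPU, at the cost of speed. */
static pthread_mutex_t fallback_lock = PTHREAD_MUTEX_INITIALIZER;

static int library_fetch_and_add(volatile int *p, int v)
{
    pthread_mutex_lock(&fallback_lock);
    int old = *p;
    *p = old + v;
    pthread_mutex_unlock(&fallback_lock);
    return old;
}

/* HYPOTHETICAL_MARCH_I486_OR_LATER plays the role of -march=i486 or
   later: when defined (on x86), the operation is expanded inline. */
#if defined(HYPOTHETICAL_MARCH_I486_OR_LATER) && \
    (defined(__i386__) || defined(__x86_64__))
static inline int fetch_and_add(volatile int *p, int v)
{
    /* Inline expansion: no call into the library at all. */
    __asm__ __volatile__("lock; xaddl %0, %1"
                         : "+r"(v), "+m"(*p)
                         :
                         : "memory");
    return v;
}
#else
static inline int fetch_and_add(volatile int *p, int v)
{
    /* No -march guarantee: defer to the library routine. */
    return library_fetch_and_add(p, v);
}
#endif
```

Compiling with -DHYPOTHETICAL_MARCH_I486_OR_LATER mimics passing -march=i486+; without it, every call goes through the library, which is the safe default for a distribution that must run on any x86.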