-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

On Tuesday 29 April 2003 07:50, you wrote:
> Arnd Bergmann <[EMAIL PROTECTED]> writes:
> > No, look at my patch again. If you build without i486 optimization,
> > the compiler will see only the extern declaration for
> > __exchange_and_add().
>
> I see. What sonames do you suggest to give to the two copies of
> libstdc++? You once said you'd call them libstdc++-i386.so.5,

Yes, either have libstdc++-i386 for i386 optimized binaries plus
libstdc++ for others or have libstdc++ everywhere. Both should work
in principle. Using libstdc++-i386 will break Debian binaries on other
platforms explicitly, which may or may not be considered a good idea.

> 2. Running Debian binaries on foreign systems won't be easy.
>    In particular, they all link to libstdc++-i386.so.5, so
>    such a library needs to be provided for other systems.
>    Mixing that library with that native libstdc++.so.5 might
>    cause problems, so anybody running a Debian binary on
>    a foreign system would need the binary and all shared libraries
>    it links with, even though those libraries have the same
>    sonames as the libraries available on the foreign system.

There are two ways out of this:

a) The patch gets merged upstream. It won't hurt anyone who is
   building i486+ optimized binaries and fixes a real bug. This
   would mean we should not have libstdc++-i386.so.5.
b) We provide a libstdc++-i386.so.$(version) file that contains
   only the __exchange_and_add function and is linked to
   libstdc++.so.

> 3. Debian i486 binaries take a significant performance hit.
>    The attached program demonstrates that the cost of
>    __atomic_add is roughly twice as much if done out-of-line,
>    compared to the inline version. On my system, I get
>    inline: 2.4061
>    out-of-line: 4.60658

We can shave a bit off by making the function
__attribute__((regparm(2))) and perhaps by using a trivial
non-locking variant when compiling without threads, as the i386
version uses the mutex only in those cases and AFAICS it is
compatible with the i486 version otherwise.

The numbers I get on my P3 now are (in average CPU cycles):

                     non-locked   locked
i486 inline:            6.5        24.2
i486 out-of-line:       7.3        35.8
i386 inline:            4.5       189.9
i386 out-of-line:       9.9       196.4

If we know at compile time that no locking (neither the 'lock;' prefix
nor the mutex call) is ever needed, we can even get much faster than
the current i486 code. Also, if an application or library cares about
this sort of micro-optimization, it probably should be provided in an
optimized version anyway.
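
To illustrate the "know at compile time" idea, here is a minimal sketch
in portable C++11. It is not the libstdc++ code and not the patch: the
names counter, exchange_and_add and get are hypothetical, and
std::atomic merely stands in for the 'lock; xadd' sequence. The
non-threaded specialization is a plain read-modify-write with no lock
at all.

```cpp
#include <atomic>

// Hypothetical illustration only; names are not taken from libstdc++.
// The bool template parameter selects the variant at compile time.
template <bool Threaded>
class counter;

// Non-locking variant: plain load/add/store. Only valid when the
// program is known to be single-threaded.
template <>
class counter<false> {
    int value_;
public:
    explicit counter(int init) : value_(init) {}
    // Adds val and returns the value held before the addition.
    int exchange_and_add(int val) {
        int old = value_;
        value_ = old + val;
        return old;
    }
    int get() const { return value_; }
};

// Locked variant: std::atomic stands in for the locked instruction
// (assumption: on x86 this typically compiles to 'lock; xadd').
template <>
class counter<true> {
    std::atomic<int> value_;
public:
    explicit counter(int init) : value_(init) {}
    // Atomically adds val and returns the previous value.
    int exchange_and_add(int val) { return value_.fetch_add(val); }
    int get() const { return value_.load(); }
};
```

Both variants return the old value, so callers see the same interface
either way; only the cost differs, which is what the non-locked and
locked columns above measure.
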

Arnd <><
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)

iD8DBQE+rnlj5t5GS2LDRf4RAm1SAJ0WwDALgWCpJ6/8l+xfk5oSWeftuwCeOoKz
jsbSsLCw1g4NlK6axPBQwXk=
=iwhP
-----END PGP SIGNATURE-----