At 03:40 AM 5/6/2005 +0200, Eric Auer wrote:
<massive snippage>
Is there a question in all that?
You have missed the point of proper optimization: optimize algorithms first, and only where they matter. I deliberately isolated and offered up a pure portion of code which is stand-alone and directly affects performance on all computers: the memory copy at the heart of the XMS and EMS functions that move kilobytes of data. That is one of the few places where a small, easily digestible optimization makes sense.
It is far too early for line-by-line or function-by-function code optimization, if that time ever should come.
To the details...
As far as the current memory copy goes, there is only one general REP MOVS optimization not present that makes good overall sense. I didn't put it in because it will take time, patience, and speed testing to tweak and smooth. The optimization is the one that tries to align EDI to an eight-byte boundary before the main REP MOVSD. And that optimization only makes sense because once you align EDI, you commonly align ESI along with it, at least in the three areas to be optimized.
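To make the alignment idea concrete, here is a rough sketch in C rather than assembly. It is not the EMM386 code. The names head_bytes_to_align8 and copy_align_dst are made up for illustration, the byte loops and memcpy only stand in for REP MOVSB and the main REP MOVSD, and it assumes a forward copy with no overlap between source and destination.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bytes needed to advance addr to the next 8-byte boundary (0 if aligned).
   Hypothetical helper, not taken from EMM386. */
static size_t head_bytes_to_align8(uintptr_t addr)
{
    return (size_t)((8u - (addr & 7u)) & 7u);
}

/* Hypothetical copy split into three parts:
   1. byte head until the destination (EDI) sits on an 8-byte boundary,
   2. dword body (stand-in for the main REP MOVSD),
   3. byte tail of 0..3 leftover bytes.
   Assumes a forward copy and non-overlapping buffers. */
static void copy_align_dst(void *dst, const void *src, size_t n)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head = head_bytes_to_align8((uintptr_t)d);
    size_t dwords;

    if (head > n)
        head = n;               /* short copies never reach the boundary */
    n -= head;
    while (head--)
        *d++ = *s++;            /* head: destination is now 8-byte aligned */

    dwords = n / 4;
    memcpy(d, s, dwords * 4);   /* body: stand-in for REP MOVSD */
    d += dwords * 4;
    s += dwords * 4;

    n &= 3;
    while (n--)
        *d++ = *s++;            /* tail */
}
```

Note that the head loop aligns only the destination. The source ends up aligned as well only when the two addresses differ by a multiple of eight, which is the "commonly align ESI along with it" case.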
Paraphrasing an x86 optimizing document: With EDI and ESI aligned to an eight-byte boundary, with the direction flag cleared (CLD), with ECX > 64, and with the difference between EDI and ESI >= 32, the Pentium Pro, Pentium II, and Pentium III will move an entire cache line at once: three times faster than when those conditions are not met. Three times faster moves for many of the machines that FreeDOS runs on, now THAT is an optimization which is worth it.
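Those conditions can be written down as a simple check. This is only a sketch in C, using the thresholds exactly as paraphrased above. The function name and parameters are hypothetical, and treating "the difference between EDI and ESI" as an absolute distance is my assumption.

```c
#include <stdbool.h>
#include <stdint.h>

/* Returns true when the paraphrased conditions for the fast cache-line
   move on Pentium Pro / II / III all hold. Hypothetical helper; the
   thresholds are the ones quoted in this message, not independently verified. */
static bool fast_string_conditions_met(uint32_t edi, uint32_t esi,
                                       uint32_t ecx, bool df_clear)
{
    /* Assumption: "difference" means absolute distance between pointers. */
    uint32_t gap = (edi > esi) ? (edi - esi) : (esi - edi);

    return df_clear               /* forward copy, direction flag cleared */
        && (edi & 7u) == 0        /* destination on an 8-byte boundary */
        && (esi & 7u) == 0        /* source on an 8-byte boundary */
        && ecx > 64               /* count large enough to be worth it */
        && gap >= 32;             /* source and destination far enough apart */
}
```

If any one test fails, the copy still works; it just runs at the ordinary REP MOVSD speed.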
Of course, because slower machines need memory copy speed the most, there are considerations. You can't sacrifice performance when lower byte counts, close EDI/ESI addresses, or oddball or mismatched EDI/ESI alignments pop up. You can't optimize for one class of machines and hurt another. So the code surrounding or leading up to the main optimized loop has to work almost as well on an old slug of an 80386SX or 80386DX as an unoptimized loop would. Oh, and you have to live with the BIG_NOP macro for full 80386 stepping compatibility.
Note that carrying around multiple memory copy functions in EMM386, testing the CPU, and dynamically configuring the appropriate version of the memory copy isn't worth the hassle and extra EMM386 code.
The function-by-function allocator code dissection is a waste of time.
