On Mon, Mar 21, 2005 at 01:45:19PM +0100, Richard Guenther wrote:
> I also cannot
> see why we zero the mm registers before loading and why we
> load them high/low separated:

We load hi/lo separate because movlps+movhps is faster than movups.
We zero first to break the insn dependency chain before doing two
half-register modifies.  IIRC such chain breaking is only relevant
to the p4.

r~
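
For illustration only, here is a rough sketch of the sequence described above, written with the SSE intrinsics from <xmmintrin.h> rather than as the compiler's actual output. The helper names load4_split and load4_unaligned are hypothetical, and a compiler is free to pick different instructions than the ones named in the comments.

```c
#include <xmmintrin.h>  /* SSE intrinsics: __m128, _mm_loadl_pi, ... */

/* Hypothetical helper: load four possibly unaligned floats in two halves. */
static __m128 load4_split(const float *p)
{
    /* Zero first (typically xorps) so the half-register loads below do
       not depend on whatever previously wrote this register. */
    __m128 v = _mm_setzero_ps();

    /* Low two floats (typically movlps); upper half is preserved. */
    v = _mm_loadl_pi(v, (const __m64 *)p);

    /* High two floats (typically movhps); lower half is preserved. */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2));
    return v;
}

/* The single-instruction alternative (typically movups). */
static __m128 load4_unaligned(const float *p)
{
    return _mm_loadu_ps(p);
}
```

Both helpers should yield the same register contents for the same pointer; only the instruction sequence, and hence the timing on a given CPU, differs.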