On Mon, Mar 21, 2005 at 01:45:19PM +0100, Richard Guenther wrote:
> I also cannot
> see why we zero the mm registers before loading and why we
> load them high/low separated:
We load hi/lo separate because movlps+movhps is faster than movups.
We zero first to break the insn dependency chain before loading.
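
(For concreteness, a minimal intrinsics sketch of the two alternatives;
the helper names are mine and the instruction names in the comments are
only illustrative.  Note movlps/movhps each write only half of the
register, so without the xorps the first load would carry a false
dependency on whatever the register held before.)

#include <xmmintrin.h>

/* The split load: xorps to break the dependency on the register's
   previous contents, then movlps/movhps for the low and high 64 bits.  */
static __m128
load_split (const float *p)
{
  __m128 v = _mm_setzero_ps ();                   /* xorps  */
  v = _mm_loadl_pi (v, (const __m64 *) p);        /* movlps */
  v = _mm_loadh_pi (v, (const __m64 *) (p + 2));  /* movhps */
  return v;
}

/* The single-instruction alternative, slower on these chips.  */
static __m128
load_unaligned (const float *p)
{
  return _mm_loadu_ps (p);                        /* movups */
}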
On Mon, 21 Mar 2005, Dorit Naishlos wrote:
> On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther
> <[EMAIL PROTECTED]> wrote:
> ...
>
> Uh, and with -funroll-loops we seem to be lost completely, as we
> produce peeling/loops for an eight times four rolling loop! Where has
> the information about the loop counter gone??
>
the thin
> Hi!
>
> On mainline we now use loop versioning and peeling for alignment
> for the following loop (-march=pentium4):
>
we don't yet use loop-versioning in the vectorizer in mainline (we do in
autovect). we do apply peeling.
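
for reference, a hand-written sketch of what peeling for alignment
means (a hypothetical loop, not the vectorizer's actual output): peel
scalar iterations until the pointer reaches a 16-byte boundary, run the
vector-width loop on aligned data, and finish the tail in scalar code.
versioning would instead keep both a vector and a scalar copy of the
whole loop and pick one with a runtime alignment test.

void
add1 (float *__restrict__ a, int n)
{
  int i = 0;

  /* prologue: peel until a+i is 16-byte aligned (at most 3 iterations) */
  for (; i < n && ((unsigned long) (a + i) & 15) != 0; i++)
    a[i] += 1.0f;

  /* main loop: 4 floats per iteration, accesses now known aligned */
  for (; i + 4 <= n; i += 4)
    {
      a[i] += 1.0f;
      a[i+1] += 1.0f;
      a[i+2] += 1.0f;
      a[i+3] += 1.0f;
    }

  /* epilogue: leftover iterations */
  for (; i < n; i++)
    a[i] += 1.0f;
}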
> void foo3(float * __restrict__ a, float * __restrict__ b,
>
On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther
<[EMAIL PROTECTED]> wrote:
> Hi!
>
> On mainline we now use loop versioning and peeling for alignment
> for the following loop (-march=pentium4):
>
> void foo3(float * __restrict__ a, float * __restrict__ b,
> float * __restrict