Re: Useless vectorization of small loops

2005-03-22 Thread Richard Henderson
On Mon, Mar 21, 2005 at 01:45:19PM +0100, Richard Guenther wrote:
> I also cannot
> see why we zero the mm registers before loading and why we
> load them high/low separated:

We load hi/lo separate because movlps+movhps is faster than movups. We zero first to break the insn dependency chain befor
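For readers who want to see the two load strategies side by side, here is a minimal sketch using the SSE intrinsics from `<xmmintrin.h>`. The function names (`load4_split`, `load4_unaligned`, `loads_agree`) are made up for illustration and do not come from the thread or from GCC. The instruction names in the comments are what these intrinsics conventionally map to; a compiler is free to pick a different sequence, so this shows the idea, not the vectorizer's actual output.

```c
#include <string.h>
#include <xmmintrin.h>  /* SSE intrinsics */

/* Hypothetical helper: load four floats from a possibly unaligned
   address as two 64-bit halves. The register is zeroed first so the
   partial loads do not depend on its previous contents. */
static __m128 load4_split(const float *p)
{
    __m128 v = _mm_setzero_ps();                 /* typically xorps  */
    v = _mm_loadl_pi(v, (const __m64 *)p);       /* typically movlps */
    v = _mm_loadh_pi(v, (const __m64 *)(p + 2)); /* typically movhps */
    return v;
}

/* Hypothetical helper: the single unaligned load, for comparison. */
static __m128 load4_unaligned(const float *p)
{
    return _mm_loadu_ps(p);                      /* typically movups */
}

/* Returns 1 if both strategies load the same four floats from p. */
static int loads_agree(const float *p)
{
    float a[4], b[4];
    _mm_storeu_ps(a, load4_split(p));
    _mm_storeu_ps(b, load4_unaligned(p));
    return memcmp(a, b, sizeof a) == 0;
}
```

Both helpers produce the same register contents; which one is faster depends on the target CPU, so the claim in the message should be read in the context of the -march=pentium4 target discussed in this thread.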

Re: Useless vectorization of small loops

2005-03-21 Thread Richard Guenther
On Mon, 21 Mar 2005, Dorit Naishlos wrote:
> > On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther
> > <[EMAIL PROTECTED]> wrote:
> > ...
> >
> > Uh, and with -funroll-loops we seem to be lost completely, as we
> > produce peeling/loops for a eight times four rolling loop! Where is

Re: Useless vectorization of small loops

2005-03-21 Thread Dorit Naishlos
> On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther
> <[EMAIL PROTECTED]> wrote:
> ...
>
> Uh, and with -funroll-loops we seem to be lost completely, as we
> produce peeling/loops for a eight times four rolling loop! Where is
> the information about the loop counter gone??
>

the thin

Re: Useless vectorization of small loops

2005-03-21 Thread Dorit Naishlos
> Hi!
>
> On mainline we now use loop versioning and peeling for alignment
> for the following loop (-march=pentium4):
>

We don't yet use loop versioning in the vectorizer in mainline (we do in autovect). We do apply peeling.

> void foo3(float * __restrict__ a, float * __restrict__ b,
>

Re: Useless vectorization of small loops

2005-03-21 Thread Richard Guenther
On Mon, 21 Mar 2005 13:45:19 +0100 (CET), Richard Guenther <[EMAIL PROTECTED]> wrote:
> Hi!
>
> On mainline we now use loop versioning and peeling for alignment
> for the following loop (-march=pentium4):
>
> void foo3(float * __restrict__ a, float * __restrict__ b,
> float * __restrict