tbp wrote:
On 8/23/07, Tim Prince <[EMAIL PROTECTED]> wrote:
Note that icc9 has a strong bias toward the Pentium 4, which had no stall
penalty for mistyped fp vectors (on Intel parts that penalty arrived with
the Pentium M line), so you see a pxor even when generating code for the Core 2.
# cat autoicc.cc
float foo(const float *a, int n) {
        float sum = 0.f;
        for (int i = 0; i < n; ++i)
                if (a[i] > 0.f)
                        sum += a[i];
        return sum;
}
int main() { return 0; }
# /opt/intel/cce/9.1.051/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3) : (col. 2) remark: LOOP WAS VECTORIZED.
  4007a9:       pxor   %xmm4,%xmm4
  4007ad:       cmpltps %xmm3,%xmm4
  4007b1:       andps  %xmm3,%xmm4
# /opt/intel/cce/10.0.023/bin/icpc -O3 -xT autoicc.cc
autoicc.cc(3): (col. 2) remark: LOOP WAS VECTORIZED.
  400b50:       xorps  %xmm3,%xmm3
  400b53:       cmpltps %xmm4,%xmm3
  400b57:       andps  %xmm3,%xmm4
For what little it's worth, I found no measurable difference between these choices on Core 2 Duo. People I know prefer to use the Intel -xW option, or the gcc default, in the absence of clear evidence that another option could improve performance without reducing the range of supported targets.
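
For anyone reading the listings without the full disassembly: each three-instruction sequence zeroes a register, compares it against the loaded floats, and ANDs the resulting mask with the data, so lanes where a[i] > 0.f is false contribute +0 to the sum. Below is a rough hand-written sketch of that idea with SSE intrinsics. It is not what either icc version actually emits; the function names are made up for illustration, an x86 target is assumed, and the instruction chosen for _mm_setzero_ps (xorps or pxor) is up to the compiler.

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics; assumes an x86 target

// Scalar reference: same logic as foo() in autoicc.cc.
float sum_positive_scalar(const float *a, int n) {
    float sum = 0.f;
    for (int i = 0; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}

// Hypothetical hand-vectorized version of the same loop.
float sum_positive_sse(const float *a, int n) {
    assert(n >= 0);
    const __m128 zero = _mm_setzero_ps();  // the zeroing idiom (xorps or pxor)
    __m128 acc = _mm_setzero_ps();         // four partial sums
    int i = 0;
    for (; i + 4 <= n; i += 4) {
        __m128 v = _mm_loadu_ps(a + i);       // unaligned load of 4 floats
        __m128 mask = _mm_cmplt_ps(zero, v);  // all-ones in lanes where 0 < a[i]
        acc = _mm_add_ps(acc, _mm_and_ps(v, mask));  // masked-out lanes add +0
    }
    // Horizontal reduction of the four partial sums.
    float lanes[4];
    _mm_storeu_ps(lanes, acc);
    float sum = (lanes[0] + lanes[1]) + (lanes[2] + lanes[3]);
    // Scalar tail for the remaining n % 4 elements.
    for (; i < n; ++i)
        if (a[i] > 0.f)
            sum += a[i];
    return sum;
}
```

The vector version adds the elements in a different order than the scalar loop, so results can differ in the last bits for general inputs; a compiler is only allowed to make that transformation automatically under relaxed floating-point rules.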
