> > + p->MUL(stackptr, stackptr, GenRegister::immud(perLaneSize)); //
> > + (threadId * simdWidth + laneId)*perLaneSize
> According to the hardware spec:
> For IVB and HSW, when both src0 and src1 are of type D or UD, only the low
> 16 bits of each element of src1 are used. The accumulator maintains full
> 48-bit precision.
> So it looks like you should place (threadId * simdWidth + laneId) at src1.
> Have you tried this on IVB or HSW?

Yes, DW must be on src0 and W on src1. I will modify it and push it directly.
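
To illustrate why the operand order matters, here is a minimal sketch in plain C++ that models the rule quoted above. It is not Beignet or hardware code: modelMul, mulIsExact, and checkOperandOrder are hypothetical names, and the two sample values (a lane index of 100 and a perLaneSize of 0x20000) are assumptions picked only to show the effect.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the IVB/HSW rule quoted above: with D/UD operands,
// MUL only consumes the low 16 bits of each src1 element.
inline uint32_t modelMul(uint32_t src0, uint32_t src1) {
  const uint32_t src1Low = src1 & 0xFFFFu;  // upper 16 bits of src1 are dropped
  return src0 * src1Low;                    // keep the low 32 bits of the product
}

// True when the modeled MUL agrees with a full 32-bit multiply.
inline bool mulIsExact(uint32_t src0, uint32_t src1) {
  return modelMul(src0, src1) == src0 * src1;
}

// Compares both operand orders for one assumed pair of values.
inline void checkOperandOrder() {
  const uint32_t laneIndex = 100;        // threadId * simdWidth + laneId (assumed to fit in 16 bits)
  const uint32_t perLaneSize = 0x20000;  // assumed value that needs more than 16 bits
  assert(mulIsExact(perLaneSize, laneIndex));   // DW on src0, W on src1: exact
  assert(!mulIsExact(laneIndex, perLaneSize));  // wide value on src1: truncated
}
```

Under this model the product is only correct when the value that fits in 16 bits sits on src1, which matches the "DW on src0, W on src1" ordering.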
> Thanks!
> Ruiling
