craig.topper added inline comments.
================ Comment at: clang/lib/Headers/mmintrin.h:367 { - return (__m64)__builtin_ia32_paddb((__v8qi)__m1, (__v8qi)__m2); + return (__m64)(((__v8qi)__m1) + ((__v8qi)__m2)); } ---------------- I think you we should use __v8qu to match what we do in emmintrin.h. We don't currently set nsw on signed vector arithmetic, but we should be careful in case that changes in the future. ================ Comment at: clang/lib/Headers/mmintrin.h:1097 { - return __builtin_ia32_pand((__v1di)__m1, (__v1di)__m2); + return (__m64)(((__v1di)__m1) & ((__v1di)__m2)); } ---------------- I think we probably want to use a v2su or v2si here. Using v1di scalarizes and splits on 32-bit targets. On 64-bit targets it emits GPR code. ================ Comment at: clang/lib/Headers/mmintrin.h:1242 { - return (__m64)__builtin_ia32_pcmpgtb((__v8qi)__m1, (__v8qi)__m2); + return (__m64)((__v8qi)__m1 > (__v8qi)__m2); } ---------------- Need to use __v8qs here to force "signed char" elements. __v8qi uses "char" which has platform dependent signedness or can be changed with a command line. ================ Comment at: clang/lib/Headers/mmintrin.h:1264 { - return (__m64)__builtin_ia32_pcmpgtw((__v4hi)__m1, (__v4hi)__m2); + return (__m64)((__v4hi)__m1 > (__v4hi)__m2); } ---------------- Same here ================ Comment at: clang/lib/Headers/mmintrin.h:1394 { - return _mm_set_pi32(__i, __i); + return __extension__ (__m64)(__v2si){__i, __i}; } ---------------- Is this needed? ================ Comment at: clang/lib/Headers/mmintrin.h:1452 { - return _mm_set_pi32(__i1, __i0); + return __extension__ (__m64)(__v2si){__i1, __i0}; } ---------------- I don't think this change is needed. And I think the operands are in the wrong order. ================ Comment at: clang/lib/Headers/tmmintrin.h:17 #define __DEFAULT_FN_ATTRS __attribute__((__always_inline__, __nodebug__, __target__("ssse3"), __min_vector_width__(64))) -#define __DEFAULT_FN_ATTRS_MMX __attribute__((__always_inline__, __nodebug__, __target__("mmx,ssse3"), __min_vector_width__(64))) +#define __trunc64(x) (__m64)__builtin_shufflevector((__v2di)(x), __extension__ (__v2di){}, 0) +#define __anyext128(x) (__m128i)__builtin_shufflevector((__v1di)(x), __extension__ (__v1di){}, 0, -1) ---------------- I'm worried that using v1di with the shuffles will lead to scalarization in the type legalizer. Should we use v2si instead? ================ Comment at: clang/lib/Headers/xmmintrin.h:2316 { - return __builtin_ia32_pmovmskb((__v8qi)__a); + return __builtin_ia32_pmovmskb128((__v16qi)__anyext128(__a)); } ---------------- This doesn't guarantee zeroes in bits 15:8 does it? ================ Comment at: clang/lib/Headers/xmmintrin.h:2404 + __m128i __n128 = __zext128(__n); + if (((__SIZE_TYPE__)__p & 0xfff) > 4096-15 && ((__SIZE_TYPE__)__p & 0xfff) < 4096-8) { + __p -= 8; ---------------- Does this work with large pages? Repository: rG LLVM Github Monorepo CHANGES SINCE LAST ACTION https://reviews.llvm.org/D86855/new/ https://reviews.llvm.org/D86855 _______________________________________________ cfe-commits mailing list cfe-commits@lists.llvm.org https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits