rjmccall added inline comments.
================
Comment at: clang/test/CodeGen/x86_32-arguments-win32.c:77
+// CHECK-LABEL: define dso_local void @receive_vec_256(<8 x float> inreg %x, <8 x float> inreg %y, <8 x float> inreg %z, <8 x float>* %0, <8 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_512(<16 x float> inreg %x, <16 x float> inreg %y, <16 x float> inreg %z, <16 x float>* %0, <16 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_1024(<32 x float>* %0, <32 x float>* %1, <32 x float>* %2, <32 x float>* %3, <32 x float>* %4)
----------------
AntonYudintsev wrote:
> rjmccall wrote:
> > AntonYudintsev wrote:
> > > rjmccall wrote:
> > > > rnk wrote:
> > > > > rjmccall wrote:
> > > > > > rnk wrote:
> > > > > > > craig.topper wrote:
> > > > > > > > What happens in the backend with inreg if 512-bit vectors aren't legal?
> > > > > > > LLVM splits the vector up using the largest legal vector size. As many pieces as possible are passed in available XMM/YMM registers, and the rest are passed in memory. MSVC, of course, assumes the user wanted the larger vector size, and uses whatever vector instructions it needs to move the arguments around.
> > > > > > >
> > > > > > > Previously, I advocated for a model where calling an Intel intrinsic function had the effect of implicitly marking the caller with the target attributes of the intrinsic. This falls down if the user tries to write a single function that conditionally branches between code that uses different instruction set extensions. You can imagine the SSE2 codepath accidentally using AVX instructions because the compiler thinks they are better. I'm told that ICC models CPU micro-architectural features in the CFG, but I don't ever expect that LLVM will do that.
> > > > > > > If we're stuck with per-function CPU feature settings, it seems like it would be nice to try to do what the user asked by default, and warn the user if we see them doing a cpuid check in a function that has been implicitly blessed with some target attributes. You could imagine doing a similar thing when large vector type variables are used: if a large vector argument or local is used, implicitly enable the appropriate target features to move vectors of that size around.
> > > > > > >
> > > > > > > This idea didn't get anywhere, and the current situation has persisted.
> > > > > > >
> > > > > > > ---
> > > > > > >
> > > > > > > You know, maybe we should just keep clang the way it is, and just set up a warning in the backend that says "hey, I split your large vector. You probably didn't want that." And then we just continue doing what we do now. Nobody likes backend warnings, but it seems better than the current direction of the frontend knowing every detail of x86 vector extensions.
> > > > > > If target attributes affect ABI, it seems really dangerous to implicitly set attributes based on what intrinsics are called.
> > > > > >
> > > > > > The local CPU-testing problem seems similar to the problems with local `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are looking into. They both have a "this operation is normally fully optimizable, but we might need to be more careful in specific functions" aspect to them. I wonder if there's a reasonable way to unify the approaches, or at least benefit from lessons learned.
> > > > > I agree, we wouldn't want intrinsic usage to change ABI.
> > > > > But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory? I think most users who pass a 512-bit vector want it to either be passed in ZMM registers or fail to compile. Why do we even support passing 1024-bit vectors? Could we make that an error?
> > > > >
> > > > > Anyway, major redesigns aside, should clang do something when large vectors are passed? Maybe we should warn here? Passing by address is usually the safest way to pass something, so that's an option. Implementing splitting logic in clang doesn't seem worth it.
> > > > > I agree, we wouldn't want intrinsic usage to change ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory?
> > > >
> > > > I take it you're implying that the actual (Windows-only?) platform ABI doesn't say anything about this because other compilers don't allow large vectors. How large are the vectors we do have ABI rules for? Do they have the same problem as the SysV ABI, where the ABI rules are sensitive to compiler flags?
> > > >
> > > > Anyway, I didn't realize the i386 Windows ABI *ever* used registers for arguments. (Whether you can convince LLVM to do so for a function signature that Clang isn't supposed to emit for ABI-conforming functions is a separate story.) You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?
> > > > You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?
> > >
> > > Sorry to cut in (I am the one who reported the issue, and so am looking forward to this patch being merged).
> > > Yes, the MSVC x86 Windows ABI uses three registers for the first three arguments.
> > >
> > > https://godbolt.org/z/PZ3dBa
> > Interesting. And anything that would cause the stack to be used is still banned: passing more than 3 vectors or passing them as variadic arguments.
> >
> > I guess they chose not to implement stack realignment when they implemented this, and rather than being stuck with a suboptimal ABI, they just banned the cases that would have required it. Technically that means that they haven't committed to an ABI, so even though LLVM is perfectly happy to realign the stack when required, we shouldn't actually take advantage of that here, and instead we should honor the same restriction.
> As I mentioned in https://reviews.llvm.org/D71915 (and in https://bugs.llvm.org/show_bug.cgi?id=44395), there is at least one particular case where they do align the stack for aligned arguments, although it is not directly a function call.
Oh, I see. Then they *have* in fact implemented stack realignment, sometime between MSVC v19.10 and v19.14, so the ABI is now settled; I should've checked a more recent compiler than the one you linked. So this is now a fairly standard in-registers-up-to-a-limit ABI, except that the limit is only non-zero for vectors. (Highly-aligned struct types are passed indirectly.)


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72114/new/

https://reviews.llvm.org/D72114

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits