AntonYudintsev added inline comments.
================
Comment at: clang/test/CodeGen/x86_32-arguments-win32.c:77
+// CHECK-LABEL: define dso_local void @receive_vec_256(<8 x float> inreg %x, <8 x float> inreg %y, <8 x float> inreg %z, <8 x float>* %0, <8 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_512(<16 x float> inreg %x, <16 x float> inreg %y, <16 x float> inreg %z, <16 x float>* %0, <16 x float>* %1)
+// CHECK-LABEL: define dso_local void @receive_vec_1024(<32 x float>* %0, <32 x float>* %1, <32 x float>* %2, <32 x float>* %3, <32 x float>* %4)
----------------
rjmccall wrote:
> AntonYudintsev wrote:
> > rjmccall wrote:
> > > AntonYudintsev wrote:
> > > > rjmccall wrote:
> > > > > rnk wrote:
> > > > > > rjmccall wrote:
> > > > > > > rnk wrote:
> > > > > > > > craig.topper wrote:
> > > > > > > > > What happens in the backend with inreg if 512-bit vectors aren't legal?
> > > > > > > > LLVM splits the vector up using the largest legal vector size. As many pieces as possible are passed in available XMM/YMM registers, and the rest are passed in memory. MSVC, of course, assumes the user wanted the larger vector size, and uses whatever vector instructions it needs to move the arguments around.
> > > > > > > > 
> > > > > > > > Previously, I advocated for a model where calling an Intel intrinsic function had the effect of implicitly marking the caller with the target attributes of the intrinsic. This falls down if the user tries to write a single function that conditionally branches between code that uses different instruction set extensions. You can imagine the SSE2 codepath accidentally using AVX instructions because the compiler thinks they are better.
> > > > > > > > 
> > > > > > > > I'm told that ICC models CPU micro-architectural features in the CFG, but I don't ever expect that LLVM will do that. If we're stuck with per-function CPU feature settings, it seems like it would be nice to try to do what the user asked by default, and warn the user if we see them doing a cpuid check in a function that has been implicitly blessed with some target attributes. You could imagine doing a similar thing when large vector type variables are used: if a large vector argument or local is used, implicitly enable the appropriate target features to move vectors of that size around.
> > > > > > > > 
> > > > > > > > This idea didn't get anywhere, and the current situation has persisted.
> > > > > > > > 
> > > > > > > > ---
> > > > > > > > 
> > > > > > > > You know, maybe we should just keep clang the way it is, and just set up a warning in the backend that says "hey, I split your large vector. You probably didn't want that." And then we just continue doing what we do now. Nobody likes backend warnings, but it seems better than the current direction of the frontend knowing every detail of x86 vector extensions.
> > > > > > > If target attributes affect ABI, it seems really dangerous to implicitly set attributes based on what intrinsics are called.
> > > > > > > 
> > > > > > > The local CPU-testing problem seems similar to the problems with local `#pragma STDC FENV_ACCESS` blocks that the constrained-FP people are looking into. They both have a "this operation is normally fully optimizable, but we might need to be more careful in specific functions" aspect to them.
> > > > > > > 
> > > > > > > I wonder if there's a reasonable way to unify the approaches, or at least benefit from lessons learned.
> > > > > > I agree, we wouldn't want intrinsic usage to change ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory? I think most users who pass a 512-bit vector want it to either be passed in ZMM registers or fail to compile. Why do we even support passing 1024-bit vectors? Could we make that an error?
> > > > > > 
> > > > > > Anyway, major redesigns aside, should clang do something when large vectors are passed? Maybe we should warn here? Passing by address is usually the safest way to pass something, so that's an option. Implementing splitting logic in clang doesn't seem worth it.
> > > > > > I agree, we wouldn't want intrinsic usage to change ABI. But does anybody actually want the behavior that LLVM implements today, where large vectors get split across registers and memory?
> > > > > 
> > > > > I take it you're implying that the actual (Windows-only?) platform ABI doesn't say anything about this because other compilers don't allow large vectors. How large are the vectors we do have ABI rules for? Do they have the same problem as the SysV ABI, where the ABI rules are sensitive to compiler flags?
> > > > > 
> > > > > Anyway, I didn't realize the i386 Windows ABI *ever* used registers for arguments. (Whether you can convince LLVM to do so for a function signature that Clang isn't supposed to emit for ABI-conforming functions is a separate story.) You're saying it uses them for vectors? Presumably up to some limit, and only when they're non-variadic arguments?
> > > > > You're saying it uses them for vectors?
> > > > > Presumably up to some limit, and only when they're non-variadic arguments?
> > > > Sorry to cut in (I am the one who reported the issue, and so am looking forward to this patch being merged).
> > > > Yes, the MSVC/x86 Windows ABI uses three registers for the first three vector arguments.
> > > > 
> > > > https://godbolt.org/z/PZ3dBa
> > > Interesting. And anything that would cause the stack to be used is still banned: passing more than 3 vectors or passing them as variadic arguments.
> > > 
> > > I guess they chose not to implement stack realignment when they implemented this, and rather than being stuck with a suboptimal ABI, they just banned the cases that would have required it. Technically that means that they haven't committed to an ABI, so even though LLVM is perfectly happy to realign the stack when required, we shouldn't actually take advantage of that here, and instead we should honor the same restriction.
> > As I mentioned in https://reviews.llvm.org/D71915 (and in https://bugs.llvm.org/show_bug.cgi?id=44395), there is at least one particular case where they do align the stack for aligned arguments, although it is not directly a function call.
> Oh, I see. Then they *have* in fact implemented stack realignment, sometime between MSVC v19.10 and v19.14, so the ABI is now settled; I should've checked a more recent compiler than the one you linked. So this is now a fairly standard in-registers-up-to-a-limit ABI, except that the limit is only non-zero for vectors. (Highly-aligned struct types are passed indirectly.)
Sorry for not being clear enough :(. Yes, newer MSVC seems to have implemented stack realignment universally. The older one (19.10) only did that in particular cases (like the one I mentioned in my original patchset).
Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D72114/new/

https://reviews.llvm.org/D72114

_______________________________________________
cfe-commits mailing list
cfe-commits@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits