gchatelet added a subscriber: tejohnson.
gchatelet added a comment.
In D61634#1500453 <https://reviews.llvm.org/D61634#1500453>, @efriedma wrote:
> My main blocker is that I want to make sure we're moving in the right
> direction: towards LLVM IR with clear semantics, towards straightforward
> rules for writing freestanding C code, and towards solutions which behave
> appropriately for all targets. There's clearly a problem here that's worth
> solving, but I want to make sure we're adding small features that interact
> with existing features in an obvious way. The patch as written is adding a
> new attribute that changes the semantics of C and LLVM IR in a subtle way
> that interacts with existing optimizations and lowering in ways that are
> complex and hard to understand.
This makes a lot of sense, I'm totally on board to reduce entropy and confusion
along the way.
> I don't want to mix general restrictions on calling C library functions, with
> restrictions that apply specifically to memcpy: clang generally works on the
> assumption that a "memcpy" symbol is available in freestanding environments,
> and we don't really want to change that.
>
> With -fno-builtin, specifically, we currently apply the restriction that
> optimizations will not introduce memcpy calls that would not exist with
> optimization disabled. This is sort of artificial, and and maybe a bit
> confusing, but it seems to work well enough in practice. gcc does something
> similar.
>
> I don't really want to add an attribute that has a different meaning from
> -fno-builtin. An attribute that has the same meaning is a lot less
> problematic: it reduces potential confusion, and solves a related problem
> that -fno-builtin currently doesn't really work with LTO, because we don't
> encode it into the IR.
Adding @tejohnson to the conversation.
IIUC you're offering to introduce something like
`__attribute__((no-builtin-memcpy))` or `__attribute__((no-builtin("memcpy")))`.
As attributes they would compose nicely with (Thin)LTO.
I believe we still want to turn `-fno-builtin` flags into attributes so there's
no impedance mismatch between the flag and the attribute right?
> Your current patch is using the "AlwaysInline" flag for
> SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline
> implementation. In general, this flag is only supported for constant-size
> memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not
> have some special lowering, like the x86 "rep;movs", you'll hit an assertion
> failure. And we don't really want to add an implementation of
> variable-length memcpy for a lot of targets; it's very inefficient on targets
> which don't have unaligned memory access. You could try to narrowly fix this
> to only apply "AlwaysInline" if the size is a constant integer, but there's a
> related problem: even if a memcpy is constant size in C, optimizations don't
> promise to preserve that, in general. We could theoretically add such a
> promise, I guess, but that's awkward: it would conditionally apply to
> llvm.memcpy where the parameter is already const, so it's not something we
> can easily verify.
Fair enough. This patch was really to get the conversation started : I was
myself not especially convinced about the approach. Hijacking the
`AlwaysInline` parameter was a shortcut that would not work for other mem
function anyways.
> If we added a new builtin `llvm.memcpy.inline`, or something like that, the
> end result ends up being simpler in many ways. It has obvious rules for both
> generating it and lowering it, which don't depend on attributes in the parent
> function. We could mark the size as "immarg", so we don't have to add
> special optimization rules for it. And if we have it, we have a "complete"
> solution for writing memcpy implementations in C: we make `__builtin_memcpy`,
> or a new `__builtin_inline_memcpy`, produce it, and it can be combined with
> -fno-builtin to ensure we don't produce any calls to the "real" memcpy. The
> downside of a new builtin, of course, is that we now have two functions with
> similar semantics, and potentially have to fix a bunch of places to check for
> both of them.
This was one of the approaches I envisioned, it's much cleaner and it's also a
lot more work : )
I'm willing to go that route knowing that it would also work for @theraven's
use case.
> MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops
> into llvm.memcpy). If we want it to perform some transform in circumstances
> where the call "memcpy" isn't available, we have to make sure to restrict it
> based on the target: in the worst case, on some targets without unaligned
> memory access, it could just act as a low-quality loop unroller. This
> applies no matter how the result is actually represented in IR.
Yes if we have to generate loops it need to happen before SelectionDAG.
----
In this framework `-ffreestanding` stays as it is - namely it implies
`-fno-builtin`. I still think that the semantic is somewhat surprising and
unclear to the newcomer but I guess we can't do anything about it at this point
- apart adding more documentation.
Lastly, if we are to introduce new IR intrinsics, how about adding some for
`memcmp` and `bcmp`? It does not have to be part of this patch but I think it's
worth considering for consistency.
Repository:
rG LLVM Github Monorepo
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D61634/new/
https://reviews.llvm.org/D61634
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits