[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Guillaume Chatelet via Phabricator via cfe-commits Tue, 14 May 2019 04:43:09 -0700

gchatelet added a subscriber: tejohnson.
gchatelet added a comment.

In D61634#1500453 <https://reviews.llvm.org/D61634#1500453>, @efriedma wrote:


> My main blocker is that I want to make sure we're moving in the right 
> direction: towards LLVM IR with clear semantics, towards straightforward 
> rules for writing freestanding C code, and towards solutions which behave 
> appropriately for all targets.  There's clearly a problem here that's worth 
> solving, but I want to make sure we're adding small features that interact 
> with existing features in an obvious way.  The patch as written is adding a 
> new attribute that changes the semantics of C and LLVM IR in a subtle way 
> that interacts with existing optimizations and lowering in ways that are 
> complex and hard to understand.


This makes a lot of sense, I'm totally on board to reduce entropy and confusion 
along the way.

> I don't want to mix general restrictions on calling C library functions, with 
> restrictions that apply specifically to memcpy: clang generally works on the 
> assumption that a "memcpy" symbol is available in freestanding environments, 
> and we don't really want to change that.
> 
> With -fno-builtin, specifically, we currently apply the restriction that 
> optimizations will not introduce memcpy calls that would not exist with 
> optimization disabled.  This is sort of artificial, and and maybe a bit 
> confusing, but it seems to work well enough in practice.  gcc does something 
> similar.
> 
> I don't really want to add an attribute that has a different meaning from 
> -fno-builtin.  An attribute that has the same meaning is a lot less 
> problematic: it reduces potential confusion, and solves a related problem 
> that -fno-builtin currently doesn't really work with LTO, because we don't 
> encode it into the IR.

Adding @tejohnson to the conversation.

IIUC you're offering to introduce something like 
`__attribute__((no-builtin-memcpy))` or `__attribute__((no-builtin("memcpy")))`.
As attributes they would compose nicely with (Thin)LTO.

I believe we still want to turn `-fno-builtin` flags into attributes so there's 
no impedance mismatch between the flag and the attribute right?

> Your current patch is using the "AlwaysInline" flag for 
> SelectionDAG::getMemcpy, which forces a memcpy to be lowered to an inline 
> implementation.  In general, this flag is only supported for constant-size 
> memcpy calls; otherwise, on targets where EmitTargetCodeForMemcpy does not 
> have some special lowering, like the x86 "rep;movs", you'll hit an assertion 
> failure.  And we don't really want to add an implementation of 
> variable-length memcpy for a lot of targets; it's very inefficient on targets 
> which don't have unaligned memory access.  You could try to narrowly fix this 
> to only apply "AlwaysInline" if the size is a constant integer, but there's a 
> related problem: even if a memcpy is constant size in C, optimizations don't 
> promise to preserve that, in general.  We could theoretically add such a 
> promise, I guess, but that's awkward: it would conditionally apply to 
> llvm.memcpy where the parameter is already const, so it's not something we 
> can easily verify.

Fair enough. This patch was really to get the conversation started : I was 
myself not especially convinced about the approach. Hijacking the 
`AlwaysInline` parameter was a shortcut that would not work for other mem 
function anyways.

> If we added a new builtin `llvm.memcpy.inline`, or something like that, the 
> end result ends up being simpler in many ways. It has obvious rules for both 
> generating it and lowering it, which don't depend on attributes in the parent 
> function.  We could mark the size as "immarg", so we don't have to add 
> special optimization rules for it.   And if we have it, we have a "complete" 
> solution for writing memcpy implementations in C: we make `__builtin_memcpy`, 
> or a new `__builtin_inline_memcpy`, produce it, and it can be combined with 
> -fno-builtin to ensure we don't produce any calls to the "real" memcpy.  The 
> downside of a new builtin, of course, is that we now have two functions with 
> similar semantics, and potentially have to fix a bunch of places to check for 
> both of them.

This was one of the approaches I envisioned, it's much cleaner and it's also a 
lot more work : )
I'm willing to go that route knowing that it would also work for @theraven's 
use case.

> MemCpyOpt has been mentioned (the pass which transforms memcpy-like loops 
> into llvm.memcpy).  If we want it to perform some transform in circumstances 
> where the call "memcpy" isn't available, we have to make sure to restrict it 
> based on the target: in the worst case, on some targets without unaligned 
> memory access, it could just act as a low-quality loop unroller.  This 
> applies no matter how the result is actually represented in IR.

Yes if we have to generate loops it need to happen before SelectionDAG.

----

In this framework `-ffreestanding` stays as it is - namely it implies 
`-fno-builtin`. I still think that the semantic is somewhat surprising and 
unclear to the newcomer but I guess we can't do anything about it at this point 
- apart adding more documentation.

Lastly, if we are to introduce new IR intrinsics, how about adding some for 
`memcmp` and `bcmp`? It does not have to be part of this patch but I think it's 
worth considering for consistency.


Repository:
  rG LLVM Github Monorepo

CHANGES SINCE LAST ACTION
  https://reviews.llvm.org/D61634/new/

https://reviews.llvm.org/D61634



_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits

[PATCH] D61634: [clang/llvm] Allow efficient implementation of libc's memory functions in C/C++

Reply via email to