jfb added a comment.
Can you add a link to bug 40605 in the commit message?
> I'm not quite sure how to show the resulting difference in code. Do you mean
> we need Clang to support both modes and to compare the resulting assembly?
I only meant tests that show codegen, as you've now added auto-var-init.cpp.
There might be interesting cases which I wasn't testing when I wrote it, so if
you think of any please do make sure you add some. You mentioned a Linux struct
that used to crash? That would be a useful test (I imagine it's one with the
`Loc` adjustment).
================
Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1143
+ const llvm::StructLayout *Layout =
+ CGM.getDataLayout().getStructLayout(cast<llvm::StructType>(Ty));
+ for (unsigned i = 0; i != constant->getNumOperands(); i++) {
----------------
glider wrote:
> jfb wrote:
> > I think you need a heuristic here, where you don't emit multiple stores if
> > `getNumOperands` is greater than some number. In that case you'd keep doing
> > memcpy instead. Note that a struct containing sub-structs (or arrays)
> > should also count the number of sub-structs (so just checking
> > `getNumOperands` isn't sufficient).
> >
> > Other parts of this code chose 6 stores...
> Do we ever need to not split these stores? Can't the MemCpyOpt pass take
> care of them later on?
You're assuming the optimizer is running at all :-)
In general your approach seems to care about `-O2`, and I agree that's usually
what we want to tune. However, clang also support `-O0` which won't touch these
stores at all, and in `-O0` it's usually nice to just see a copy from a global
while debugging. Further, configurations such as `-Os` and `-Oz` would be
hampered by excessive codegen.
You also need to consider the compile-time hit the expansion you're making
would add. More code means more time to compile, and I'd expect little to no
win from emitting more than a handful of stores. If someone has a large thing
on the stack, and it can be optimized, then we should teach the optimizer to
deal with memcpy better. I don't think e.g. a 400+ byte struct makes sense to
initialize with a bunch of store.
So I think you want to check what the size of the struct is (check out when
x86-64 and ARM64 inline memcpy / memset, I assume those are decent guesses for
size to inline).
Further, your current code sill do something silly for code like this:
```
struct S { int j; char arr[8]; float f; };
```
Before you'd have a single `memcpy`, but now it'll be a store, a `memset`, and
a store. I think you really want to have the same heuristic for arrays (so
you'd have just stores), and you want to make sure that if you're going to
split up a struct you do so for all the fields inside that struct.
================
Comment at: tools/clang/lib/CodeGen/CGDecl.cpp:1765
+ emitStoresForConstant(CGM, D, Loc, isVolatile, Builder, constant,
+ /*forInit*/ false);
}
----------------
glider wrote:
> jfb wrote:
> > Can you explain why this case needs to be different? IIUC it's because the
> > types can differ which makes the struct's elements not match up the ones
> > from the constant in the loop above? I think the code above should
> > automatically handle that case instead of passing in a flag here.
> >
> > You also definitely need to test the case where that happens :)
> I just thought we shouldn't break the constants that we didn't create with
> -ftrivial-var-auto-init.
> I'll take a closer look though (indeed, this crashes on one of the structs
> from the Linux kernel)
Yeah I think it's from the `Loc` adjustment above. I think your change is worth
doing for all cases, not just trivial var auto-init. It'll lead us to optimize
all of it much better IMO (and support things like `-O0` `-Os` and `-Oz`
uniformly well too).
================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:37
// PATTERN: @__const.test_small_custom.custom = private unnamed_addr constant
%struct.small { i8 42 }, align 1
// ZERO: @__const.test_small_custom.custom = private unnamed_addr constant
%struct.small { i8 42 }, align 1
struct small { char c; };
----------------
How come `@__const.test_small_custom.custom` still gets emitted? I guess custom
initialization goes through a different code path?
================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:496
// ZERO-LABEL: @test_smallpartinit_uninit()
// ZERO: call void @llvm.memset{{.*}}, i8 0,
----------------
I wonder if (maybe in a separate patch) you also want to avoid `memset` when
something is pretty small.
================
Comment at: tools/clang/test/CodeGenCXX/auto-var-init.cpp:664
+// PATTERN: store i32 -1431655766, {{.*}} align 4
+// PATTERN: call void @llvm.memset.{{.*}}(i8* align 4 {{.*}}, i8 0, i64 0, i1
false)
// ZERO-LABEL: @test_arraytail_uninit()
----------------
This one is particularly silly since it's a zero-sized `memset` :)
CHANGES SINCE LAST ACTION
https://reviews.llvm.org/D57898/new/
https://reviews.llvm.org/D57898
_______________________________________________
cfe-commits mailing list
[email protected]
https://lists.llvm.org/cgi-bin/mailman/listinfo/cfe-commits