hfinkel added inline comments. ================ Comment at: include/clang/Basic/AttrDocs.td:1330 @@ -1328,1 +1329,3 @@ +the loop unroller tries to unroll the loop by a power-of-two factor +subject to a limit on unrolled code size. ---------------- meheff wrote: > hfinkel wrote: > > meheff wrote: > > > broune wrote: > > > > Is #pragma unroll different from #pragma unroll(full) ? If not, why not > > > > make the two doc strings the same? > > > In AttrDocs.td there is not documentation for "#pragma clang loop > > > unroll". Instead it just mentions "#pragma clang loop" and points to > > > LanguageExtensions for details. Regarding LanguageExtensions > > > documentation of "#pragma clang loop unroll" and AttrDocs documentation > > > for "#pragma unroll", I believe they do both say the same thing (or at > > > least aren't conflicting with each other) with the LanguageExtensions > > > documentation providing more context and describing the limits on > > > unrolling. > > > > > > Little bit of background here. There are two forms of the loop unroll > > > pragmas: > > > > > > #pragma unroll > > > #pragma unroll N > > > > > > and: > > > > > > #pragma clang loop unroll(full) > > > #pragma clang loop unroll_count(N) > > > > > > The "#pragma unroll" forms are supported for compatibilty with other > > > compilers (such as nvcc CUDA compiler) where this form is common while > > > "#pragma clang .." is a more cannonical form specific to clang. "#pragma > > > unroll" and "#pragma clang loop unroll(full)" are equivalent. What I > > > propose in the related optimization patch > > > (http://reviews.llvm.org/D10854) is that "#pragma unroll" should enable > > > loop unrolling with a runtime trip count of the loop. So if you have: > > > > > > void foo(int n) { > > > #pragma unroll > > > for (int i=0; i<n; i++) { ... } > > > } > > > > > > Then the optimizer will unroll the loop with some reasonable power-of-two > > > factor. One potentially funny bit about this approach is that "#pragma > > > clang loop unroll(full)" will also enable unrolling of a loop with a > > > runtime trip count. This may be surprising... or maybe not. Full > > > unrolling is clearly not possible in this case so just being more > > > aggressive with unrolling may be the sensible thing to do. > > > One potentially funny bit about this approach is that "#pragma clang loop > > > unroll(full)" will also enable unrolling of a loop with a runtime trip > > > count. This may be surprising... or maybe not. Full unrolling is clearly > > > not possible in this case so just being more aggressive with unrolling > > > may be the sensible thing to do. > > > > I really don't like this. If the user is requesting 'full' unrolling, I'd > > rather we do only that (at least up to some really high limit, as we do > > currently). Runtime unrolling adds extra expense in the low-trip-count case > > (because of the various checks involved) and has a different set of > > trade-offs than simpler unrolling. > > > > I think that we should have 'full' unrolling add > > "llvm.loop.unroll.runtime.disable" metadata. We should add some other mode, > > "aggressive" for example, which is like full unrolling but also allows for > > runtime unrolling. I think the desired end state here is that -Rpass=unroll > > would provide some specific feedback in the case where full unrolling was > > requested, but not possible because of a runtime trip count. > > > I think that makes sense. The issue we were seeing was a loop with runtime > trip count behaving badly because it was marked '#pragma unroll', had a > runtime trip count, and we'd like to enable runtime unrolling for the NVPTX > target (http://reviews.llvm.org/D10855). IIRC, the loop sometimes is > compile-time trip count depending upon template magic. Anyway, your > suggestion (unroll(full) implies llvm.loop.unroll.runtime.disable) fixes that > issue. One question is how do you feel about enabling runtime unrolling with > '#pragma unroll N"? This gives more benefit to loops with higher trip > counts, but I could be convinced either way. > One question is how do you feel about enabling runtime unrolling with > '#pragma unroll N"? This gives more benefit to loops with higher trip counts, > but I could be convinced either way.
I think it makes sense to enable it for '#pragma unroll N', since to user likely expects that behavior in the case of a runtime trip count. Repository: rL LLVM http://reviews.llvm.org/D10857 _______________________________________________ cfe-commits mailing list [email protected] http://lists.cs.uiuc.edu/mailman/listinfo/cfe-commits
