On 05/05/2017 06:13 AM, Richard Sandiford wrote:
Hi Jeff,
Jeff Law <l...@redhat.com> writes:
+/* Compute the number of elements that we can trim from the head and
+   tail of ORIG resulting in a bitmap that is a superset of LIVE.
+
+   Store the number of elements trimmed from the head and tail in
+   TRIM_HEAD and TRIM_TAIL.  */
+
+static void
+compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
+{
+  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
+     extends through ref->size.  So we know that in the original bitmap
+     bits 0..ref->size were true.  We don't actually need the bitmap, just
+     the REF to compute the trims.  */
+
+  /* Now identify how much, if any of the tail we can chop off.  */
+  *trim_tail = 0;
+  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
+  int last_live = bitmap_last_set_bit (live);
+  *trim_tail = (last_orig - last_live) & ~0x1;
+
+  /* Identify how much, if any of the head we can chop off.  */
+  int first_orig = 0;
+  int first_live = bitmap_first_set_bit (live);
+  *trim_head = (first_live - first_orig) & ~0x1;
+}
Can you remember why you needed to force the lengths to be even (the & ~0x1s)?
I was wondering whether it might have been because trimming single bytes
interferes with the later strlen optimisations, which the patch I just
posted should fix.
I guess there's also a risk that trimming a byte from a memcpy that has
a "nice" length could make things less efficient, but that could go both
ways: changing a memcpy of 9 bytes to a memcpy of 8 bytes would be good,
while changing from 8 to 7 might not be. The same goes for even lengths
too though, like 10->8 (good) and 16->14 (maybe not a win). FWIW, it
looks like the strlen pass uses:
  /* Don't adjust the length if it is divisible by 4, it is more efficient
     to store the extra '\0' in that case.  */
  if ((tree_to_uhwi (len) & 3) == 0)
    return;
for that.
Tested on aarch64-linux-gnu and x86_64-linux-gnu. OK if the strlen
patch is OK?
It was primarily to avoid mucking up alignments of the start of the copy
or leaving residuals at the end of a copy. It's an idea I saw while
scanning the LLVM implementation of DSE. The fact that it avoids
mucking things up for tree-ssa-strlen was an unplanned side effect.
I never did any real benchmarking either way. If you've got any hard
data which shows it's a bad idea, then let's remove it and deal with the
tree-ssa-strlen stuff (as I noted you'd done this morning).
jeff