On 05/05/2017 06:13 AM, Richard Sandiford wrote:
Hi Jeff,

Jeff Law <l...@redhat.com> writes:
+/* Compute the number of elements that we can trim from the head and
+   tail of the store that REF describes, such that the bytes that
+   remain still cover every bit set in LIVE.
+
+   Store the number of elements trimmed from the head and tail in
+   TRIM_HEAD and TRIM_TAIL.  */
+
+static void
+compute_trims (ao_ref *ref, sbitmap live, int *trim_head, int *trim_tail)
+{
+  /* We use sbitmaps biased such that ref->offset is bit zero and the bitmap
+     extends through ref->size.  So we know that in the original bitmap
+     bits 0..ref->size were true.  We don't actually need the bitmap, just
+     the REF to compute the trims.  */
+
+  /* Now identify how much, if any, of the tail we can chop off.  */
+  int last_orig = (ref->size / BITS_PER_UNIT) - 1;
+  int last_live = bitmap_last_set_bit (live);
+  *trim_tail = (last_orig - last_live) & ~0x1;
+
+  /* Identify how much, if any, of the head we can chop off.  */
+  int first_orig = 0;
+  int first_live = bitmap_first_set_bit (live);
+  *trim_head = (first_live - first_orig) & ~0x1;
+}
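
To make the rounding concrete, here is a standalone model of that
arithmetic (plain C with a made-up live range, not GCC's sbitmap API):

  #include <stdio.h>

  /* SIZE is the store size in bytes; FIRST_LIVE and LAST_LIVE are the
     first and last live byte indices within the store.  */
  static void
  model_trims (int size, int first_live, int last_live,
               int *trim_head, int *trim_tail)
  {
    int last_orig = size - 1;
    *trim_tail = (last_orig - last_live) & ~0x1;  /* rounded down to even */
    *trim_head = first_live & ~0x1;               /* likewise */
  }

  int
  main (void)
  {
    int head, tail;
    /* A 16-byte store where only bytes 3..12 are live: 3 dead bytes at
       each end, and both trims round down to 2.  */
    model_trims (16, 3, 12, &head, &tail);
    printf ("trim_head=%d trim_tail=%d\n", head, tail);
    return 0;
  }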

Can you remember why you needed to force the lengths to be even (the & ~0x1s)?
I was wondering whether it might have been because trimming single bytes
interferes with the later strlen optimisations, which the patch I just
posted should fix.

I guess there's also a risk that trimming a byte from a memcpy that has
a "nice" length could make things less efficient, but that could go both
ways: changing a memcpy of 9 bytes to a memcpy of 8 bytes would be good,
while changing from 8 to 7 might not be.  The same goes for even lengths
too though, like 10->8 (good) and 16->14 (maybe not a win).  FWIW, it
looks like the strlen pass uses:

       /* Don't adjust the length if it is divisible by 4, it is more efficient
          to store the extra '\0' in that case.  */
       if ((tree_to_uhwi (len) & 3) == 0)
         return;

for that.
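
As a standalone sketch of how that rule plays out (plain C; the
function name and the simplification are mine, not the strlen pass's),
with LEN counting the bytes of the trailing store including the '\0':

  #include <stdio.h>

  static unsigned
  adjusted_store_size (unsigned len)
  {
    if ((len & 3) == 0)
      return len;      /* e.g. 8 stays 8: keep the extra '\0' */
    return len - 1;    /* e.g. 9 becomes 8: drop the '\0' */
  }

  int
  main (void)
  {
    unsigned lens[] = { 7, 8, 9, 16 };
    for (unsigned i = 0; i < 4; i++)
      printf ("%u -> %u\n", lens[i], adjusted_store_size (lens[i]));
    return 0;
  }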

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK if the strlen
patch is OK?
It was primarily to avoid mucking up the alignment of the start of the copy, or leaving residuals at the end of a copy. It's an idea I saw while scanning the LLVM implementation of DSE. That it also avoids mucking things up for tree-ssa-strlen was an unplanned side effect.
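
Something like this (a made-up example, not from the patch) is the
situation the rounding avoids:

  #include <string.h>

  void
  copy_with_dead_first_byte (char *dst, const char *src)
  {
    /* Assume DST is 8-byte aligned and byte 0 is dead.  An odd head
       trim would rewrite this as memcpy (dst + 1, src + 1, 7), an
       unaligned 7-byte copy; rounding the trim down to an even count
       (here zero) keeps the single aligned 8-byte copy.  */
    memcpy (dst, src, 8);
  }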

I never did any real benchmarking either way. If you've got hard data showing it's a bad idea, then let's remove it and deal with the tree-ssa-strlen stuff (which, as I noted this morning, you've already done).

jeff
