Daniel Kurtz <[email protected]> writes:

> Fix both of these by converting width from bits to FbBits, and let the
> pointer arithmetic convert to byte addresses.

Ok, I'm doing a bit more careful review this morning, and I think the
width parameter isn't quite right.

(Note that we can ignore bpp as fbBlt doesn't use it for computation at
all; it's purely a bit-copy function.). Imagine copying at 8-bpp two
pixels one place right:

        dstLine = srcLine
        srcX = 24
        dstX = 32
        width = 16

        width >> FB_SHIFT = 0

        srcLine < dstLine                       TRUE
        &srcLine[width >> FB_SHIFT] > dstLine   FALSE

        dstLine < srcLine                       FALSE
        &dstLine[width >> FB_SHIFT] > srcLine   TRUE

You actually need to check whether there are any overlapping bits in the
copy. There are two ways to do this -- either add in the X offset when
computing the last address or by converting to byte addresses earlier
and using those. I think the byte address version is a lot simpler:

        byteSrc   = (uint8_t *) srcLine + (srcX >> 3);
        byteDst   = (uint8_t *) dstLine + (dstX >> 3);
        byteWidth = width >> 3;

        disjoint = (byteSrc + byteWidth <= byteDst) || (byteDst + byteWidth <= 
byteSrc);

The extra check for (bpp & 7) is not necessary as the memcpy case
already checks for byte alignment of source, dest and width.
        
For our test case above

        dstLine = srcLine
        srcX = 24
        dstX = 32
        width = 16

        byteSrc   = srcLine + 3
        byteDst   = dstLine + 4 
        byteWidth = 2

        disjoint = (srcLine + 3 + 2 <= dstLine + 4) || (dstLine + 4 + 2 <= 
srcLine + 3)
        
                 = (srcLine + 5 <= srcLine + 4) || (srcLine + 6 <= srcLine + 3)

                 = FALSE || FALSE

                 = FALSE

However, if we copy two pixels right:

        dstLine = srcLine
        srcX = 24
        dstX = 40
        width = 16

        byteSrc   = srcLine + 3
        byteDst   = dstLine + 5
        byteWidth = 2

        disjoint = (srcLine + 3 + 2 <= dstLine + 5) || (dstLine + 5 + 2 <= 
srcLine + 3)
        
                 = (srcLine + 5 <= srcLine + 5) || (srcLine + 7 <= srcLine + 3)

                 = TRUE || FALSE

                 = TRUE

Or, if we copy one pixel left:

        dstLine = srcLine
        srcX = 32
        dstX = 24
        width = 16

        byteSrc   = srcLine + 4
        byteDst   = dstLine + 3 
        byteWidth = 2

        disjoint = (srcLine + 4 + 2 <= dstLine + 3) || (dstLine + 3 + 2 <= 
srcLine + 4)
        
                 = (srcLine + 6 <= srcLine + 3) || (srcLine + 5 <= srcLine + 4)

                 = FALSE || FALSE

                 = FALSE

However, if we copy two pixels left:

        dstLine = srcLine
        srcX = 40
        dstX = 24
        width = 16

        byteSrc   = srcLine + 5
        byteDst   = dstLine + 3
        byteWidth = 2

        disjoint = (srcLine + 5 + 2 <= dstLine + 3) || (dstLine + 3 + 2 <= 
srcLine + 5)
        
                 = (srcLine + 7 <= srcLine + 3) || (srcLine + 5 <= srcLine + 5)

                 = FALSE || TRUE

                 = TRUE

Of course, we can only do this in bytes if the copy is byte aligned. The
current code already checks that with

        !(srcX & 7) && !(dstX & 7) && !(width & 7)

So, our final code can look like this:

From 5eb183640017a55b9e507d2f8bbc3f6c6ff18dbe Mon Sep 17 00:00:00 2001
From: Keith Packard <[email protected]>
Date: Tue, 25 Mar 2014 08:21:16 -0700
Subject: [PATCH] fb: fix fast-path blt detection

The width parameter is used to disable the blit fast-path (memcpy) when
source and destination rows overlap in memory. This check was added in [0].

Unfortunately, the calculation to determine if source and destination
lines overlapped was incorrect:
  (1) it converts width from pixels to bytes, but width is actually in
      bits, not pixels.
  (2) it adds this byte offset to dst/srcLine, which implicitly converts
      the offset from bytes to sizeof(FbBits).

Fix both of these by converting addresses to byte pointers and width
to bytes and doing comparisons on the resulting byte address.

For example:
A 32-bpp 1366 pixel-wide row will have
  width = 1366 * 32 = 43712 bits
  bpp = 32
  (bpp >> 3) = 4
  width * (bpp >> 3) = 174848 FbBits
  (FbBits *)width => 699392 bytes

So, "careful" was true if the destination line was within 699392 bytes,
instead of just within its 1366 * 4 = 5464 byte row.

This bug causes us to take the slow path for large non-overlapping rows
that are "close" in memory.  As a data point, XGetImage(1366x768) on my
ARM chromebook was taking ~140 ms, but with this fixed, it now takes
about 60 ms.
  XGetImage() -> exaGetImage() -> fbGetImage -> fbBlt()

[0] commit e32cc0b4c85c78cd8743a6e1680dcc79054b57ce
Author: Adam Jackson <[email protected]>
Date:   Thu Apr 21 16:37:11 2011 -0400

    fb: Fix memcpy abuse

    The memcpy fast path implicitly assumes that the copy walks
    left-to-right.  That's not something memcpy guarantees, and newer glibc
    on some processors will indeed break that assumption.  Since we walk a
    line at a time, check the source and destination against the width of
    the blit to determine whether we can be sloppy enough to allow memcpy.
    (Having done this, we can remove the check for !reverse as well.)

v3: Convert to byte units

This first checks to make sure the blt is byte aligned, converts all
of the data to byte units and then compares for byte address range
overlap between source and dest.

Signed-off-by: Keith Packard <[email protected]>
---
 fb/fbblt.c | 60 +++++++++++++++++++++++++++++++++---------------------------
 1 file changed, 33 insertions(+), 27 deletions(-)

diff --git a/fb/fbblt.c b/fb/fbblt.c
index 72a05f6..c615106 100644
--- a/fb/fbblt.c
+++ b/fb/fbblt.c
@@ -56,42 +56,48 @@ fbBlt(FbBits * srcLine,
     int n, nmiddle;
     Bool destInvarient;
     int startbyte, endbyte;
-    int careful;
 
     FbDeclareMergeRop();
 
+    if (alu == GXcopy && pm == FB_ALLONES &&
+        !(srcX & 7) && !(dstX & 7) && !(width & 7))
+    {
+        CARD8           *src_byte = (CARD8 *) srcLine + (srcX >> 3);
+        CARD8           *dst_byte = (CARD8 *) dstLine + (dstX >> 3);
+        FbStride        src_byte_stride = srcStride << (FB_SHIFT - 3);
+        FbStride        dst_byte_stride = dstStride << (FB_SHIFT - 3);
+        int             width_byte = (width >> 3);
+
+        /* Make sure there's no overlap; we can't use memcpy in that
+         * case as it's not well defined, so fall through to the
+         * general code
+         */
+        if (src_byte + width_byte <= dst_byte ||
+            dst_byte + width_byte <= src_byte)
+        {
+            int i;
+
+            if (!upsidedown)
+                for (i = 0; i < height; i++)
+                    MEMCPY_WRAPPED(dst_byte + i * dst_byte_stride,
+                                   src_byte + i * src_byte_stride,
+                                   width_byte);
+            else
+                for (i = height - 1; i >= 0; i--)
+                    MEMCPY_WRAPPED(dst_byte + i * dst_byte_stride,
+                                   src_byte + i * src_byte_stride,
+                                   width_byte);
+
+            return;
+        }
+    }
+
     if (bpp == 24 && !FbCheck24Pix(pm)) {
         fbBlt24(srcLine, srcStride, srcX, dstLine, dstStride, dstX,
                 width, height, alu, pm, reverse, upsidedown);
         return;
     }
 
-    careful = !((srcLine < dstLine && srcLine + width * (bpp >> 3) > dstLine) ||
-                (dstLine < srcLine && dstLine + width * (bpp >> 3) > srcLine))
-        || (bpp & 7);
-
-    if (alu == GXcopy && pm == FB_ALLONES && !careful &&
-        !(srcX & 7) && !(dstX & 7) && !(width & 7)) {
-        int i;
-        CARD8 *tmpsrc = (CARD8 *) srcLine;
-        CARD8 *tmpdst = (CARD8 *) dstLine;
-
-        srcStride *= sizeof(FbBits);
-        dstStride *= sizeof(FbBits);
-        width >>= 3;
-        tmpsrc += (srcX >> 3);
-        tmpdst += (dstX >> 3);
-
-        if (!upsidedown)
-            for (i = 0; i < height; i++)
-                MEMCPY_WRAPPED(tmpdst + i * dstStride, tmpsrc + i * srcStride, width);
-        else
-            for (i = height - 1; i >= 0; i--)
-                MEMCPY_WRAPPED(tmpdst + i * dstStride, tmpsrc + i * srcStride, width);
-
-        return;
-    }
-
     FbInitializeMergeRop(alu, pm);
     destInvarient = FbDestInvarientMergeRop();
     if (upsidedown) {
-- 
1.9.0

-- 
[email protected]

Attachment: pgp2Y0apvfcn5.pgp
Description: PGP signature

_______________________________________________
[email protected]: X.Org development
Archives: http://lists.x.org/archives/xorg-devel
Info: http://lists.x.org/mailman/listinfo/xorg-devel

Reply via email to