While making radeondrm(4) work on powerpc64 I'm running into an
interesting unaligned access issue.

Modern POWER CPUs generally support unaligned access.  Normal load and
store unstructions work fine with addresses that aren't naturally
aligned when operating on cached memory.  As a result, clang will
optimize code by replacing two 32-bit store instructions with a single
64-bit store instruction even if there is only 32-bit alignment.

However, this doesn't work for memory that is mapped uncachable.  And
there is some code in radeondrm(4) (and also in amdgpu(4)) that
generates alignment exceptions because it is writing to bits of video
memory that are mapped through the graphics aperture.

There are two ways to fix this.  The compiler won't apply this
optimization if memory is accessed through pointers that are marked
volatile.  Hence the fix below.  In my opinion that is the right fix
as rdev->uvd.cpu_addr is a volatile pointer and that aspect shouldn't
be dropped.  The downside of this approach is that we may need to
maintain some additional local fixes.

The alternative is to emulate the access in the kernel.  I fear that
is what Linux does, which is why they don't notice this issue.  As
such, this issue may crop up in more places and the emulation would
catch them all.  But I'm a bit reluctant to add this emulation since
it may hide bugs in other parts of our kernel.

Thoughts?  ok?


Index: dev/pci/drm/radeon/radeon_uvd.c
===================================================================
RCS file: /cvs/src/sys/dev/pci/drm/radeon/radeon_uvd.c,v
retrieving revision 1.3
diff -u -p -r1.3 radeon_uvd.c
--- dev/pci/drm/radeon/radeon_uvd.c     8 Jun 2020 04:48:16 -0000       1.3
+++ dev/pci/drm/radeon/radeon_uvd.c     25 Oct 2020 09:11:33 -0000
@@ -781,7 +781,7 @@ int radeon_uvd_get_create_msg(struct rad
        uint64_t offs = radeon_bo_size(rdev->uvd.vcpu_bo) -
                RADEON_GPU_PAGE_SIZE;
 
-       uint32_t *msg = rdev->uvd.cpu_addr + offs;
+       volatile uint32_t *msg = rdev->uvd.cpu_addr + offs;
        uint64_t addr = rdev->uvd.gpu_addr + offs;
 
        int r, i;
@@ -817,7 +817,7 @@ int radeon_uvd_get_destroy_msg(struct ra
        uint64_t offs = radeon_bo_size(rdev->uvd.vcpu_bo) -
                RADEON_GPU_PAGE_SIZE;
 
-       uint32_t *msg = rdev->uvd.cpu_addr + offs;
+       volatile uint32_t *msg = rdev->uvd.cpu_addr + offs;
        uint64_t addr = rdev->uvd.gpu_addr + offs;
 
        int r, i;

Reply via email to