Siarhei Siamashka <[email protected]> writes: > On Sat, 28 Sep 2013 23:52:36 +0200 > [email protected] (Søren Sandmann) wrote: > >> Siarhei Siamashka <[email protected]> writes: >> >> > The SIMD optimized inner loops in the VMX/Altivec code are trying >> > to emulate unaligned accesses to the destination buffer. For each >> > 4 pixels (which fit into a 128-bit register) the current >> > implementation: >> > 1. first performs two aligned reads, which cover the needed data >> > 2. reshuffles bytes to get the needed data in a single vector register >> > 3. does all the necessary calculations >> > 4. reshuffles bytes back to their original location in two registers >> > 5. performs two aligned writes back to the destination buffer >> > >> > Unfortunately in the case if the destination buffer is unaligned and >> > the width is a perfect multiple of 4 pixels, we may have some writes >> > crossing the boundaries of the destination buffer. In a multithreaded >> > environment this may potentially corrupt the data outside of the >> > destination buffer if it is concurrently read and written by some >> > other thread. >> > >> > It is the primary suspect for the "make check" failure on power7 hardware: >> > http://lists.freedesktop.org/archives/pixman/2013-August/002871.html >> > >> > The valgrind report for blitters-test is full of: >> > >> > =#085== Invalid write of size 8 >> > =#085== at 0x1004B0B4: vmx_combine_add_u (pixman-vmx.c:1089) >> > =#085== by 0x100446EF: general_composite_rect (pixman-general.c:214) >> > =#085== by 0x10002537: test_composite (blitters-test.c:363) >> > =#085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733) >> > =#085== by 0x10004943: fuzzer_test_main (utils.c:728) >> > =#085== by 0x10002C17: main (blitters-test.c:397) >> > =#085== Address 0x5188218 is 0 bytes after a block of size 88 alloc'd >> > =#085== at 0x4051DA0: memalign (vg_replace_malloc.c:581) >> > =#085== by 0x4051E7B: posix_memalign (vg_replace_malloc.c:709) >> > =#085== by 0x10004CFF: aligned_malloc (utils.c:833) >> > =#085== by 0x10001DCB: create_random_image (blitters-test.c:47) >> > =#085== by 0x10002263: test_composite (blitters-test.c:283) >> > =#085== by 0x1000369B: fuzzer_test_main._omp_fn.0 (utils.c:733) >> > =#085== by 0x10004943: fuzzer_test_main (utils.c:728) >> > =#085== by 0x10002C17: main (blitters-test.c:397) >> > >> > This patch addresses the problem by first aligning the destination >> > buffer at a 16 byte boundary in each combiner function. This trick >> > is borrowed from the pixman SSE2 code. >> >> Looks good to me. >> >> The patches here: >> >> http://lists.freedesktop.org/archives/pixman/2013-September/002976.html >> >> introduce a new thread-test program that is intended to exercise the >> bug, and I verified that if a similar issue is introduced on x86: >> >> >> http://people.freedesktop.org/~sandmann/0001-HACK-introduce-bug-in-core_combine_over_u_sse2_no_ma.patch >> >> thread-test does detect it (whereas blitters-test doesn't). However, I >> still don't have a PowerPC (let alone a Power7), so I'm not sure if it >> detects the bug there. If it does, though, it may be worthwhile rebasing >> your fixes on top of it to verify that they do indeed fix the bug. > > Thanks, that was an interesting test. But in its current form, it does > not seem to detect the bug yet.
Okay, new version to follow that incorporates your endianness and undefinedness fixes plus uses wider compositing. Also available in the thread-test branch of my repository. Søren _______________________________________________ Pixman mailing list [email protected] http://lists.freedesktop.org/mailman/listinfo/pixman
