Hi,

On Mon, Jul 23, 2012 at 5:30 PM, Derek Buitenhuis
<[email protected]> wrote:
> From: Yang Wang <[email protected]>
>
> In ff_put_pixels_clamped_mmx(), there are two assembly code blocks.
> In the first block (in the unrolled loop), the instructions
> "movq 8%3, %%mm1 \n\t", and so forth, have problems.
>
> From above instruction, it is clear what the programmer wants: a load from
> p + 8. But this assembly code doesn’t guarantee that. It only works if the
> compiler puts p in a register to produce an instruction like this:
> "movq 8(%edi), %mm1". During compiler optimization, it is possible that the
> compiler will be able to constant propagate into p. Suppose p = &x[10000].
> Then operand 3 can become 10000(%edi), where %edi holds &x. And the 
> instruction
> becomes "movq 810000(%edx)". That is, it will stride by 810000 instead of 8.
>
> This will cause a segmentation fault.
>
> This error was fixed in the second block of the assembly code, but not in
> the unrolled loop.
>
> How to reproduce:
>     This error is exposed when we build the ffmpeg using Intel C++ Compiler,
>     IPO+PGO optimization. Crashed when decoding an MJPEG video.
>
> Signed-off-by: Michael Niedermayer <[email protected]>
> Signed-off-by: Derek Buitenhuis <[email protected]>
> ---
>  libavcodec/x86/dsputil_mmx.c |   18 +++++++++---------
>  1 file changed, 9 insertions(+), 9 deletions(-)
>
> diff --git a/libavcodec/x86/dsputil_mmx.c b/libavcodec/x86/dsputil_mmx.c
> index 5eb4a24..522a565 100644
> --- a/libavcodec/x86/dsputil_mmx.c
> +++ b/libavcodec/x86/dsputil_mmx.c
> @@ -245,14 +245,14 @@ void ff_put_pixels_clamped_mmx(const DCTELEM *block, 
> uint8_t *pixels,
>      pix = pixels;
>      /* unrolled loop */
>      __asm__ volatile (
> -        "movq        %3, %%mm0          \n\t"
> -        "movq       8%3, %%mm1          \n\t"
> -        "movq      16%3, %%mm2          \n\t"
> -        "movq      24%3, %%mm3          \n\t"
> -        "movq      32%3, %%mm4          \n\t"
> -        "movq      40%3, %%mm5          \n\t"
> -        "movq      48%3, %%mm6          \n\t"
> -        "movq      56%3, %%mm7          \n\t"
> +        "movq      (%3), %%mm0          \n\t"
> +        "movq     8(%3), %%mm1          \n\t"
> +        "movq    16(%3), %%mm2          \n\t"
> +        "movq    24(%3), %%mm3          \n\t"
> +        "movq    32(%3), %%mm4          \n\t"
> +        "movq    40(%3), %%mm5          \n\t"
> +        "movq    48(%3), %%mm6          \n\t"
> +        "movq    56(%3), %%mm7          \n\t"
>          "packuswb %%mm1, %%mm0          \n\t"
>          "packuswb %%mm3, %%mm2          \n\t"
>          "packuswb %%mm5, %%mm4          \n\t"
> @@ -262,7 +262,7 @@ void ff_put_pixels_clamped_mmx(const DCTELEM *block, 
> uint8_t *pixels,
>          "movq     %%mm4, (%0, %1, 2)    \n\t"
>          "movq     %%mm6, (%0, %2)       \n\t"
>          :: "r"(pix), "r"((x86_reg)line_size), "r"((x86_reg)line_size * 3),
> -           "m"(*p)
> +           "r"(p)
>          : "memory");
>      pix += line_size * 4;
>      p   += 32;

OK.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel

Reply via email to