Hello,
> I used GCC 7.2. clear_blocks_mmx is slower than c for me as well, but
> not the rest.
> Your compiler seems to have done a much better job than mine. Is it
> Clang? Does it somehow have vectorization enabled perhaps? Because
> that's not supposed to happen.
>
>
Yes, it's Clang 8.1.
I put the clear_blocks_c function in a file and ran
clang -S -O1 test_asm_gen.c
The resulting asm is:
    .section __TEXT,__text,regular,pure_instructions
    .macosx_version_min 10, 12
    .globl _clear_blocks_c
    .p2align 4, 0x90
_clear_blocks_c:                ## @clear_blocks_c
    .cfi_startproc
## BB#0:
    pushq %rbp
Ltmp0:
    .cfi_def_cfa_offset 16
Ltmp1:
    .cfi_offset %rbp, -16
    movq %rsp, %rbp
Ltmp2:
    .cfi_def_cfa_register %rbp
    movl $768, %esi             ## imm = 0x300
    callq ___bzero
    popq %rbp
    retq
    .cfi_endproc
.subsections_via_symbols
It looks like clang replaces the body of clear_blocks_c with a call to an
optimized library function (___bzero).
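
For reference, a minimal sketch of what test_asm_gen.c could look like. The
file contents were not posted, so this is an assumption: it presumes the C
version is a single memset over six 8x8 blocks of int16_t, which is
consistent with the 768-byte length seen in the listing (6 * 64 * 2 = 768).

```c
/* test_asm_gen.c: assumed contents, reconstructed for illustration only. */
#include <stdint.h>
#include <string.h>

/* Zero six 8x8 blocks of 16-bit coefficients.
 * sizeof(int16_t) * 6 * 64 = 768 bytes, the same length that is
 * loaded into %esi before the call to ___bzero in the listing. */
void clear_blocks_c(int16_t *blocks)
{
    memset(blocks, 0, sizeof(int16_t) * 6 * 64);
}
```

With a constant-size memset of zero like this, the compiler is free to emit a
call to a platform zeroing routine instead of a loop, which would explain why
the "C" version is not slower than the hand-written asm here.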
>
> > I also modify several decoder/encoder, in order to fix the
> DECLARE_ALIGNED
> > from 16 to 32
> >
> > I run make fate SAMPLES=fate-suite/
> > i have several errors, but after a check, these errors
> > doesn't seems to be related to this patch
>
> Make sure to clean your build folder if you recently pulled new commits
> from the git repository. Reconfigure if necessary.
>
>
OK, I reran it, and the FATE tests now pass.
2017-10-02 4:05 GMT+02:00 Ronald S. Bultje <[email protected]>:
> Hi,
>
> On Sun, Oct 1, 2017 at 7:46 PM, Martin Vignali <[email protected]>
> wrote:
>
> > I also modify several decoder/encoder, in order to fix the
> DECLARE_ALIGNED
> > from 16 to 32
> >
>
> How did you decide which ones to change?
>
> Ronald
>
After running the FATE tests, it looks like tests fail only when
LOCAL_ALIGNED_16 or DECLARE_ALIGNED(16, ...) is used to declare the block
variable, not in other cases.
Using git grep clear_block, I checked all the files that use this function
and changed LOCAL_ALIGNED_16 to LOCAL_ALIGNED_32,
or DECLARE_ALIGNED(16, ...) to DECLARE_ALIGNED(32, ...).
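
To illustrate why the declared alignment matters, here is a small standalone
sketch. It uses C11 alignas as a stand-in for FFmpeg's alignment macros, and
the names block16, block32 and is_aligned are hypothetical. It also assumes
that the new code path needs 32-byte aligned buffers, which is not spelled out
above.

```c
#include <stdalign.h>
#include <stdint.h>

/* Returns 1 if p is aligned to n bytes (n must be a power of two). */
static int is_aligned(const void *p, uintptr_t n)
{
    return ((uintptr_t)p & (n - 1)) == 0;
}

/* Hypothetical stand-ins for the two kinds of declaration:
 * block16 is only guaranteed to sit on a 16-byte boundary,
 * block32 is guaranteed to sit on a 32-byte boundary. */
static alignas(16) int16_t block16[6 * 64];
static alignas(32) int16_t block32[6 * 64];
```

is_aligned(block32, 32) always holds, while is_aligned(block16, 32) depends on
where the buffer happens to be placed. That would match the observation that
only the 16-byte-aligned declarations cause failures.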
Martin
_______________________________________________
ffmpeg-devel mailing list
[email protected]
http://ffmpeg.org/mailman/listinfo/ffmpeg-devel