Hi,

On Fri, Jul 27, 2012 at 1:01 PM, Loren Merritt <[email protected]> wrote:
> On Fri, 27 Jul 2012, Ronald S. Bultje wrote:
>
>> From: "Ronald S. Bultje" <[email protected]>
>>
>> This completes the conversion of h264dsp to yasm; note that h264 also
>> uses some dsputil functions, most notably qpel. Performance-wise, the
>> yasm-version is ~10 cycles faster (182->172) on x86-64, and ~8 cycles
>> faster (201->193) on x86-32.
>> ---
>>  libavcodec/x86/h264_deblock.asm | 168 +++++++++++++++++++++++++++++++++++++++
>>  libavcodec/x86/h264dsp_mmx.c    | 162 ++-----------------------------------
>>  2 files changed, 175 insertions(+), 155 deletions(-)
>>
>> diff --git a/libavcodec/x86/h264_deblock.asm b/libavcodec/x86/h264_deblock.asm
>> index 1982dc4..77b25d2 100644
>> --- a/libavcodec/x86/h264_deblock.asm
>> +++ b/libavcodec/x86/h264_deblock.asm
>> @@ -27,6 +27,10 @@
>>  %include "x86inc.asm"
>>  %include "x86util.asm"
>>
>> +SECTION_RODATA
>> +
>> +pb_3_1: times 4 db 3, 1
>> +
>>  SECTION .text
>>
>>  cextern pb_0
>> @@ -911,3 +915,167 @@ ff_chroma_intra_body_mmxext:
>>      paddb m1, m5
>>      paddb m2, m6
>>      ret
>> +
>> +;-----------------------------------------------------------------------------
>> +; void h264_loop_filter_strength(int16_t bs[2][4][4], uint8_t nnz[40],
>> +;                                int8_t ref[2][40], int16_t mv[2][40][2],
>> +;                                int bidir, int edges, int step,
>> +;                                int mask_mv0, int mask_mv1, int field);
>> +;
>> +; bidir is 0 or 1
>> +; edges is 1 or 4
>> +; step is 1 or 2
>> +; mask_mv0 is 0 or 3
>> +; mask_mv1 is 0 or 1
>> +; field is 0 or 1
>> +;-----------------------------------------------------------------------------
>> +%macro loop_filter_strength_iteration 7 ; edges, step, mask_mv,
>> +                                        ; dir, d_idx, mask_dir, bidir
>> +%define edgesd %1
>> +%define stepd %2
>> +%define mask_mvd %3
>> +%define dir %4
>> +%define d_idx %5
>> +%define mask_dir %6
>> +%define bidir %7
>> +    xor b_idxd, b_idxd ; for (b_idx = 0; b_idx < edges; b_idx += step)
>> +.b_idx_loop_ %+ dir %+ _ %+ bidir:
>
> %%.b_idx_loop:
> Automatically generates a different label for each instantiation of the macro.
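
For reference, a minimal sketch of the `%%` mechanism described above. The macro `count_down` and the function `demo` are hypothetical names made up for illustration and are not part of the patch; the sketch assumes plain NASM-syntax preprocessing as used by nasm and yasm.

```nasm
; Hypothetical macro, not from the patch: counts ecx down from %1 to zero.
%macro count_down 1
    mov     ecx, %1
%%loop:                     ; macro-local label, unique per invocation
    dec     ecx
    jnz     %%loop          ; refers to the label of this expansion only
%endmacro

SECTION .text

global demo
demo:
    count_down 4            ; first expansion gets its own loop label
    count_down 8            ; second expansion gets a different one
    ret
```

Each invocation expands `%%loop` to a label of the form `..@N.loop`, with a different number N per expansion, so the two loops above do not clash. As far as I understand the NASM manual, labels with the `..@` prefix are not attached to the preceding non-local label.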

My disassembly now looks like this:

0x00000001004c43b4 <ff_h264_loop_filter_strength_mmx2.nofield+25>: jne 0x1004c44d1 <ff_h264_loop_filter_strength_mmx2.bidir>
0x00000001004c43ba <ff_h264_loop_filter_strength_mmx2.nofield+31>: xor %r8d,%r8d
0x00000001004c43bd <[email protected]_idx_loop+0>: pxor %mm0,%mm0
0x00000001004c43c0 <[email protected]_idx_loop+3>: test %r11d,%r8d

Can I somehow keep the function name in it? I find that somewhat
useful when debugging.

Ronald
_______________________________________________
libav-devel mailing list
[email protected]
https://lists.libav.org/mailman/listinfo/libav-devel
