On Tue, Apr 21, 2015 at 06:51:52PM -0300, James Almer wrote: > On 21/04/15 6:38 PM, Michael Niedermayer wrote: > > On Tue, Apr 21, 2015 at 05:45:25PM -0300, James Almer wrote: > >> Signed-off-by: James Almer <[email protected]> > >> --- > >> libavcodec/acelp_vectors.c | 8 +++----- > >> libavcodec/adpcm.c | 2 +- > >> libavcodec/alacenc.c | 3 +-- > >> libavcodec/atrac3plus.c | 4 ++-- > >> libavcodec/dnxhdenc.c | 2 +- > >> libavcodec/dvenc.c | 2 +- > >> libavcodec/ffv1.h | 2 +- > >> libavcodec/ffv1dec.c | 3 +-- > >> libavcodec/g726.c | 2 +- > >> libavcodec/g729dec.c | 2 +- > >> libavcodec/golomb.h | 2 +- > >> libavcodec/h264.c | 5 ++--- > >> libavcodec/h264_refs.c | 2 +- > >> libavcodec/hevc.c | 26 ++++++++++++-------------- > >> libavcodec/hevc_cabac.c | 8 ++++---- > >> libavcodec/hevc_filter.c | 15 ++++++--------- > >> libavcodec/hevc_mvs.c | 4 ++-- > >> libavcodec/hevc_ps.c | 4 ++-- > >> libavcodec/hevc_refs.c | 5 ++--- > >> libavcodec/hevcpred_template.c | 4 ++-- > >> libavcodec/mpeg12enc.c | 8 ++++---- > >> libavcodec/opus.h | 2 +- > >> libavcodec/opus_celt.c | 9 ++++----- > >> libavcodec/proresenc_kostya.c | 6 ++---- > >> libavcodec/put_bits.h | 2 +- > >> libavcodec/vorbisdec.c | 5 ++--- > >> 26 files changed, 61 insertions(+), 76 deletions(-) > >> > >> diff --git a/libavcodec/acelp_vectors.c b/libavcodec/acelp_vectors.c > >> index 86851a3..7aef8c7 100644 > >> --- a/libavcodec/acelp_vectors.c > >> +++ b/libavcodec/acelp_vectors.c > >> @@ -133,12 +133,11 @@ void ff_acelp_fc_pulse_per_track( > >> int pulse_count, > >> int bits) > >> { > >> - int mask = (1 << bits) - 1; > >> int i; > >> > >> for(i=0; i<pulse_count; i++) > >> { > >> - fc_v[i + tab1[pulse_indexes & mask]] += > >> + fc_v[i + tab1[av_mod_uintp2(pulse_indexes, bits)]] += > >> (pulse_signs & 1) ? 8191 : -8192; // +/-1 in (2.13) > >> > >> pulse_indexes >>= bits; > >> @@ -154,13 +153,12 @@ void ff_decode_10_pulses_35bits(const int16_t > >> *fixed_index, > >> int half_pulse_count, int bits) > >> { > >> int i; > >> - int mask = (1 << bits) - 1; > >> > >> fixed_sparse->no_repeat_mask = 0; > >> fixed_sparse->n = 2 * half_pulse_count; > >> for (i = 0; i < half_pulse_count; i++) { > >> - const int pos1 = gray_decode[fixed_index[2*i+1] & mask] + i; > >> - const int pos2 = gray_decode[fixed_index[2*i ] & mask] + i; > >> + const int pos1 = gray_decode[av_mod_uintp2(fixed_index[2*i+1], > >> bits)] + i; > >> + const int pos2 = gray_decode[av_mod_uintp2(fixed_index[2*i ], > >> bits)] + i; > >> const float sign = (fixed_index[2*i+1] & (1 << bits)) ? -1.0 : > >> 1.0; > >> fixed_sparse->x[2*i+1] = pos1; > >> fixed_sparse->x[2*i ] = pos2; > > [...] > >> diff --git a/libavcodec/ffv1.h b/libavcodec/ffv1.h > >> index 5081397..bfc4d71 100644 > >> --- a/libavcodec/ffv1.h > >> +++ b/libavcodec/ffv1.h > >> @@ -143,7 +143,7 @@ static av_always_inline int fold(int diff, int bits) > >> diff = (int8_t)diff; > >> else { > >> diff += 1 << (bits - 1); > >> - diff &= (1 << bits) - 1; > >> + diff = av_mod_uintp2(diff, bits); > >> diff -= 1 << (bits - 1); > >> } > >> > > > > iam not sure some of these changes are are a good idea > > for example above the mask has to be calculated anyway what can be > > gained from av_mod_uintp2() use here ? > > i think this should be bechmarked when its in performance critical code > > "1 << (bits - 1)" is not the same as "(1 << bits) - 1". The latter is not > going to be > calculated if the arch optimized version of av_mod_uintp2 is used.
yep, i should not review patches when i am in a (IRC) meeting
before the patch fold looks like this: (for the non 8bit case)
addl 100(%rsp), %ecx
andl 104(%rsp), %ecx
subl 100(%rsp), %ecx
.L869:
afterwards it looks like this:
addl 104(%rsp), %ecx
andl 64(%rsp), %ecx
subl 104(%rsp), %ecx
.L869:
so it seems ok
>
> The one file where i don't know if there's going to be any gain is in one of
> the opus_celt.c
> changes, where the mask needs to be calculated even if arch optimized
> av_mod_uintp2 is used.
> That one can be removed, but every other change is pretty straightforward.
ok, then iam fine with the patch
Thanks
[...]
--
Michael GnuPG fingerprint: 9FF2128B147EF6730BADF133611EC787040B0FAB
Asymptotically faster algorithms should always be preferred if you have
asymptotical amounts of data
signature.asc
Description: Digital signature
_______________________________________________ ffmpeg-devel mailing list [email protected] http://ffmpeg.org/mailman/listinfo/ffmpeg-devel
