Re: Basic libav profiling
Hi,

> I put a build harness around libav and gathered some profiling data. See:
> bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite

Thanks!

> README.rst has the basic commands for running ffmpeg and initial perf
> results showing the hot functions. Dave, 20 % of the time is spent in
> memcpy() so you might want to have a look.

FWIW, I usually get suspicious when the profiling info shows that helper functions are the hottest. It might be that a bigger input should be used to stress the functions that do the real computation.

Thanks,
Revital

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Linaro GDB 7.3 2011.08 released
The Linaro Toolchain Working Group is pleased to announce the release of Linaro GDB 7.3.

Linaro GDB 7.3 2011.08 is the first release in the 7.3 series. Based on the latest GDB 7.3, it includes a number of ARM-focused bug fixes. This release includes all bug fixes from the latest Linaro GDB 7.2 release that were not already included in FSF GDB 7.3.

In addition, this release fixes:

* LP: #804401 [remote testsuite] Thread support
* LP: #804387 [remote testsuite] Shared library test problems
* LP: #804392 [remote testsuite] Rebuilt executables not copied
* LP: #804396 [remote testsuite] Spurious failures

The source tarball is available at:
https://launchpad.net/gdb-linaro/+milestone/7.3-2011.08

More information on Linaro GDB is available at:
https://launchpad.net/gdb-linaro
Re: Basic libav profiling
Michael Hope writes:
> On Tue, Aug 16, 2011 at 11:32 PM, Richard Sandiford
> wrote:
>> Michael Hope writes:
>>> I put a build harness around libav and gathered some profiling data. See:
>>> bzr branch lp:~linaro-toolchain-dev/+junk/libav-suite
>>>
>>> It includes a Makefile that builds a C only, h.264 only decoder and
>>> two Creative Commons licensed videos to use as input.
>>
>> Thanks for putting this together.
>>
>>> README.rst has the basic commands for running ffmpeg and initial perf
>>> results showing the hot functions. Dave, 20 % of the time is spent in
>>> memcpy() so you might want to have a look.
>>>
>>> The vectoriser has no effect. GCC 4.5 is ~17 % faster than 4.6. I'll
>>> look into extracting and harnessing the functions themselves later
>>> this week.
>>
>> I had a look at why auto-vectorisation wasn't having much effect.
>> It looks from your profile that most of the hot functions are
>> operating on 16x16 blocks of pixels with an unknown line stride.
>> So the C code looks like:
>>
>>    for (i = 0; i < 16; i++)
>>      {
>>        x[0] = OP (x[0]);
>>        ...
>>        x[15] = OP (x[15]);
>>        x += stride;
>>      }
>>
>> Because of the unknown stride, we're relying on SLP rather than
>> loop-based vectorisation to handle this kind of loop. The problem
>> is that SLP is being run _as_ a loop optimisation. At the moment,
>> the gimple data-ref analysis code assumes that, during a loop
>> optimisation, only simple induction variables are of interest,
>> so it treats all of the x[...] references above as unrepresentable.
>> If I move SLP outside the loop optimisations (just as a proof of concept),
>> then that problem goes away.
>>
>> I talked about this with Ira, who said that SLP had been placed
>> where it is because ivopts (a later loop optimisation) obfuscates
>> things too much. As Ira said, we should probably look at (conditionally)
>> removing the assumption that only IVs are of interest during loop
>> optimisations.
>>
>> Another problem is that SLP supports a much smaller range of
>> optimisations than the loop-based vectoriser. There's no support
>> for promotion, demotion, or conditional expressions. This affects
>> things like the weight_h264_pixels* functions, which contain
>> conditional moves.
>
> I had a poke about. GCC isn't too happy about unrolled loops either.

Right. Sorry, I should have been clearer, but this hand-unrolling was the trigger for this loop being SLP's job, rather than the normal loop vectoriser's. So the loop above was exactly the kind of loop you describe (OP was the same for each x[...]). SLP should still (in theory) be able to optimise the loop body as straight-line code. The problem is that it doesn't yet support the same range of operations.

Richard
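As an illustration only (this is not libav's code), here is a minimal C sketch of the two loop shapes discussed above: a 16x16 block with a run-time line stride whose rows are hand-unrolled, so only SLP can vectorise it, and a weighting loop that needs promotion, demotion and a clamp, the operations Richard lists as unsupported by SLP at the time. `OP`, `op_block16`, `clip_u8` and `weight_block16`, and all of their parameters, are hypothetical names chosen for this example.

```c
#include <stdint.h>

/* Hypothetical stand-in for OP in the thread: any per-pixel operation. */
#define OP(v) ((uint8_t)((v) >> 1))

/* A 16x16 block with a run-time line stride.  Each row is hand-unrolled,
   so there is no inner loop for the loop vectoriser to work on; only SLP
   can combine the 16 statements into vector operations. */
static void op_block16(uint8_t *x, int stride)
{
    for (int i = 0; i < 16; i++) {
        x[0]  = OP(x[0]);  x[1]  = OP(x[1]);  x[2]  = OP(x[2]);  x[3]  = OP(x[3]);
        x[4]  = OP(x[4]);  x[5]  = OP(x[5]);  x[6]  = OP(x[6]);  x[7]  = OP(x[7]);
        x[8]  = OP(x[8]);  x[9]  = OP(x[9]);  x[10] = OP(x[10]); x[11] = OP(x[11]);
        x[12] = OP(x[12]); x[13] = OP(x[13]); x[14] = OP(x[14]); x[15] = OP(x[15]);
        x += stride;  /* stride is only known at run time */
    }
}

/* Clamp to 0..255: the kind of conditional expression SLP lacked support for. */
static uint8_t clip_u8(int v)
{
    return v < 0 ? 0 : (v > 255 ? 255 : (uint8_t)v);
}

/* Hypothetical weighting loop in the spirit of weight_h264_pixels*:
   widen each pixel to int (promotion), scale and round, clamp
   (conditional), then store back as 8 bits (demotion). */
static void weight_block16(uint8_t *block, int stride, int log2_denom,
                           int weight, int offset)
{
    int rounding = log2_denom ? 1 << (log2_denom - 1) : 0;

    for (int y = 0; y < 16; y++, block += stride)
        for (int x = 0; x < 16; x++)
            block[x] = clip_u8(((block[x] * weight + rounding) >> log2_denom)
                               + offset);
}
```

Both functions only touch the first 16 bytes of each row, so a buffer whose stride is wider than 16 keeps its remaining columns unchanged. Building such a file with GCC at -O3 and inspecting the vectoriser's diagnostics is one way to see which of the two shapes a given compiler version manages to vectorise.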
Re: Basic libav profiling
On 18/08/11 06:56, Ira Rosen wrote:
>> How can I tell the vectoriser that an input is a multiple of something?
>
> Unfortunately, I don't think you can.

I think you can do something like this:

void multiple(struct image * __restrict dst, struct image * __restrict src, int h)
{
    if (h & 0xf)
        __gcc_unreachable ();

    for (int i = 0; i < h; i++) {
        dst->d[i] = A*src->d[i] + B*src->d[i+1];
    }
}

[Just off the top of my head - you'd have to check the syntax for gcc_unreachable.]

That should allow the value range propagation to do the right thing whilst inserting no real code, but whether that's properly hooked into vectorization I have no idea?

Andrew
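Andrew's sketch needs a few definitions before it compiles, and the GCC builtin is spelled `__builtin_unreachable()` (available since GCC 4.5). Below is a minimal, self-contained version under those assumptions; `struct image`, `A`, `B` and `MAX_H` are hypothetical placeholders, and the hint by itself does not guarantee that the vectoriser will make use of it.

```c
/* Hypothetical placeholders for the names used in the sketch. */
#define A 3
#define B 5
#define MAX_H 256

struct image {
    int d[MAX_H + 1];  /* one spare element so src->d[i + 1] stays in bounds */
};

/* h must not exceed MAX_H. */
void multiple(struct image *__restrict dst,
              const struct image *__restrict src, int h)
{
    /* Promise GCC that h is always a multiple of 16.  This normally
       generates no instructions, but if h is ever not a multiple of 16
       the behaviour is undefined. */
    if (h & 0xf)
        __builtin_unreachable();

    for (int i = 0; i < h; i++)
        dst->d[i] = A * src->d[i] + B * src->d[i + 1];
}
```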
Re: Is the Linaro toolchain useful on x86/x86_64?
On 17/08/11 12:38, Christian Robottom Reis wrote:
> On Wed, Aug 17, 2011 at 04:09:21AM -0700, Bernhard Rosenkranzer wrote:
>> is the Linaro toolchain (esp. gcc) useful on x86/x86_64, or is an
>> attempt to use the Linaro toolchain with such a target just asking for
>> trouble?
>
> Isn't this exactly what Ubuntu do for all their releases (from when we
> started producing a gcc-linaro)?

I believe Linaro GCC is used as the basis for the Ubuntu system compiler on x86, amd64, armel, and ppc. However, they do apply some additional patches, so we can't make any promises that the results will be the same.

That said, most of their patches are customization and hardening, and they do report any bugs they find to us, so the extra patches shouldn't be that important.

Andrew
Re: Basic libav profiling
On 18 August 2011 12:45, Andrew Stubbs wrote:
> On 18/08/11 06:56, Ira Rosen wrote:
>>>
>>> How can I tell the vectoriser that an input is a multiple of something?
>>
>> Unfortunately, I don't think you can.
>
> I think you can do something like this:
>
> void multiple(struct image * __restrict dst, struct image * __restrict
> src, int h)
> {
>     if (h & 0xf)
>         __gcc_unreachable ();
>
>     for (int i = 0; i < h; i++) {
>         dst->d[i] = A*src->d[i] + B*src->d[i+1];
>     }
> }
>
> [Just off the top of my head - you'd have to check the syntax for
> gcc_unreachable.]
>
> That should allow the value range propagation to do the right thing whilst
> inserting no real code, but whether that's properly hooked into
> vectorization I have no idea?

Yes, the problem is that the vectorizer (or more precisely loop iteration analysis in tree-ssa-loop-niter.c) doesn't use this information.

Ira

>
> Andrew
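Since the loop iteration analysis ignores the hint, one possible workaround (not something proposed in the thread) is to make the multiple-of-16 property part of the loop structure itself, so that the inner loop has a constant trip count of 16. This is a minimal sketch with hypothetical `struct image`, `A`, `B` and `MAX_H` placeholders mirroring Andrew's example and a hypothetical function name `multiple_blocked`; whether a particular GCC version then vectorises the inner loop should be checked in its vectoriser diagnostics.

```c
/* Hypothetical placeholders mirroring the names in Andrew's sketch. */
#define A 3
#define B 5
#define MAX_H 256

struct image {
    int d[MAX_H + 1];  /* one spare element so src->d[j + 1] stays in bounds */
};

/* The caller is expected to pass a non-negative h <= MAX_H that is a
   multiple of 16; any remainder past the last full block is skipped. */
void multiple_blocked(struct image *__restrict dst,
                      const struct image *__restrict src, int h)
{
    for (int base = 0; base < h / 16 * 16; base += 16)
        for (int k = 0; k < 16; k++) {  /* constant trip count: 16 */
            int j = base + k;
            dst->d[j] = A * src->d[j] + B * src->d[j + 1];
        }
}
```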
[ACTIVITY] August 14-18
Hi,

- change of default vector size for auto-vectorization on NEON: submitted and approved
- continued working on vectorization of widening shifts
- looked into SLP vectorization for libav
- two vacation days

I'll be on vacation Aug 22-30.

Ira