iDCT implementation in GCC NEON intrinsics, for your perusal
Here's an implementation of an 8x8 integer inverse DCT (iDCT) done with NEON intrinsics -- essentially a translation of the assembly version in libjpeg-turbo trunk:

https://github.com/mkedwards/crosstool-ng/blob/master/patches/libjpeg-turbo/trunk/0001-Implement-jsimd_idct_ifast-using-NEON-intrinsics.patch

It compiles (on Linaro 2011.05 GCC 4.5, anyway; a recent Linaro 4.6 snapshot ICEs) but is otherwise untested. Still, it's interesting to compare the assembly that it generates against the hand-written version.

I thought I'd give linaro-toolchain a heads-up in case y'all could use a test case that generates plenty of pressure on the VFP/NEON register bank. (I intend to use it to see how much performance difference there really is, on the A8 and A9, between NEON code compiled for 16 vs. 32 registers.)

Cheers,
- Michael
___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Is the Linaro toolchain useful on x86/x86_64?
We use the Linaro x86 toolchain to cross-compile full system images for our embedded target, which works great. I apply a subset of the Ubuntu patchset to Linaro gcc and gdb as well as upstream eglibc and binutils. (I omit the multilib patches and some other things that didn't seem necessary for the target.) You can find this at https://github.com/mkedwards/crosstool-ng .
Re: [ACTIVITY] w36
I should be able to equip you soon with a web browsing benchmark based on Firefox's Talos suite (https://wiki.mozilla.org/Buildbot/Talos). The most representative test (tp5) unfortunately depends on a data set that we can't distribute; but there are SunSpider and Dromaeo suites as well, and I am sure that we can arrange nightly tp5 runs within Mozilla.

Let's work out what spectrum of hardware / kernel / userland options you'd like to cover with this benchmark.

Cheers,
- Michael
Re: eglibc and fun with config.sub
Please coordinate with Jon Masters at Red Hat/Fedora and Adam Conrad at Ubuntu/Debian on this. (Cc'ing the cross-distro list, through which the recent ARM summit at Linux Plumbers was organized.)

Cheers,
- Michael

On Sep 16, 2011 8:41 AM, "David Gilbert" wrote:
> OK, so we seem to have agreement here that what we want is autodetect
> for eglibc and
> forget about the triplet; well technically that probably makes my life
> easier, and I don't
> think it's too hard a sell.
>
> Dave
Re: [ACTIVITY] WW42
You may find the extensions at https://github.com/mkedwards/crosstool-ng useful.

Cheers,
- Michael

On Sun, Oct 23, 2011 at 6:09 PM, Zhenqiang Chen wrote:
> Summary:
> * Exercise crosstool-ng and summarize the gaps.
>
> Details:
> * Exercise crosstool-ng
>   (1) Sync with lp:~linaro-toolchain-dev/crosstool-ng/linaro.
>   (2) Try to config linux-host-baremental-target and
>       mingw32-host-baremental-target.
>   (3) Try to build the toolchain for both embedded toolchain and
>       linaro-gcc-4.6-2011.10 with the config.
>       . C compiler for linux and mingw32 hosts and c++ compiler for
>         linux host can be built without any change.
>       . C++ compiler for mingw32 host can be built after PCH is disabled.
>       . GDB-cross build fail due to dependence packages.
> * Gaps in crosstool-ng
>   (1) Improve GDB-cross scripts to download and build the dependence
>       packages: expat and ncurses. Or put expat and ncurses as
>       companion_libraries.
>   (2) To remove dependence, embedded toolchain requires more
>       prerequisites like zlib.
>       New config and scripts are required to support the packages.
>   (3) Currently, the embedded toolchain source packages are released
>       as a tarball, which includes gcc, gmp, etc. New scripts are required
>       to support it.
>   (4) To make sure the toolchain can run with lower version glibc like
>       redhat4/5, the embedded toolchain requires lower version native
>       gcc4.3.6 to build it.
>       To support it,
>       . Users can build the native gcc manually, or
>       . Enhance the scripts to add one step to build native gcc.
>   (5) All the default package configurations are different from
>       embedded toolchain internal build scripts.
>       Since the configurations in embedded toolchain had been tuned
>       and tested, we will change the configurations in crosstool-ng if they
>       do not match and not configurable.
>       The same rule will apply for linaro toolchain.
>
> Plans:
> * Write scripts to re-pack the embedded toolchain source packages.
> * Add the supports for all prerequisites in crosstool-ng menuconfig.
>
> Thanks!
> -Zhenqiang
Re: ARM eglibc: adding ucontext?
The calling convention for makecontext() isn't just an issue for standards lawyers; it's an actual problem on ARM, especially on hard-float systems. According to the standard, "The application shall ensure that the value of argc matches the number of arguments of type int passed to func; otherwise, the behavior is undefined." If you read this to imply that func() must be a non-variadic function that takes zero or more int arguments, and call it as if it took exactly as many int arguments as fit in registers -- stuffing extras onto the stack if necessary -- then you're OK. But supporting more general calling conventions for func() gets hairy, and correctly supporting both variadic and non-variadic functions on a hard-float system is impossible.

If you ask me, the right way to handle this is to replace makecontext() with ucontext_t * makevcontext(ucontext_t *ucp, void (*func)(va_list), va_list args). That is, if you're going to buck the standards committee and fix the interface instead of calling it obsolescent and dropping the functionality.

For what it's worth, mcontext_t on ARMv7-A shouldn't have to be as big a beast as eglibc's implementation for, say, powerpc; you don't have to keep any state which is clobbered by a function call, so you just need stack/frame/return addresses plus the callee-save core and VFP registers.

Note that you will need to save/restore FP state in all swapcontext() calls. User thread context switch code doesn't have access to the FPEXC register and thus can't identify non-FP-using contexts and skip the q4-q7 save/restore. However, the standard is silent on whether you can have context-specific settings of floating point exception control, rounding mode, etc. If QEMU's coroutines need this, you'll need to save/restore FPSCR as well.

Now it seems to me that you can make the "CPU state" part of the context structure drastically simpler than the conventional mcontext_t -- all you really need is the stack pointer.
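
To make the strict reading of makecontext() concrete (func() takes only int arguments, and argc matches their count), here is a minimal sketch in C. It assumes a libc that actually implements the ucontext functions (glibc on x86_64, for instance). The names struct task, trampoline() and run_task() are made up for illustration, and splitting a pointer into two ints is a workaround assumed here, not something the standard blesses or this thread proposes.

```c
#include <stdint.h>
#include <stdlib.h>
#include <ucontext.h>

/* Hypothetical payload handed to the new context. */
struct task {
    int input;
    int result;
};

/* Entry point with int-only arguments, matching the strict reading of
 * the standard. The pointer arrives as two int halves and is rebuilt. */
static void trampoline(int lo, int hi)
{
    /* Shift twice by 16 so the code stays valid when uintptr_t is 32 bits. */
    uintptr_t bits = ((uintptr_t)(unsigned)hi << 16 << 16)
                   | (uintptr_t)(unsigned)lo;
    struct task *t = (struct task *)bits;

    t->result = t->input * 2;
    /* Returning from here resumes the context named by uc_link. */
}

/* Runs trampoline() on its own stack; returns 0 on success, -1 on error
 * (including a libc whose ucontext functions are ENOSYS stubs). */
int run_task(struct task *t)
{
    ucontext_t caller, callee;
    size_t stack_size = 64 * 1024;   /* arbitrary size for this sketch */
    void *stack = malloc(stack_size);
    uintptr_t bits = (uintptr_t)t;
    int rc;

    if (stack == NULL)
        return -1;
    if (getcontext(&callee) != 0) {
        free(stack);
        return -1;
    }
    callee.uc_stack.ss_sp = stack;
    callee.uc_stack.ss_size = stack_size;
    callee.uc_link = &caller;        /* where to go when trampoline returns */

    /* argc is 2 and exactly two int arguments follow. */
    makecontext(&callee, (void (*)(void))trampoline, 2,
                (int)(unsigned)bits, (int)(unsigned)(bits >> 16 >> 16));

    rc = swapcontext(&caller, &callee);
    free(stack);
    return rc;
}
```

Calling run_task() on a struct task whose input is 21 should leave 42 in result. The pointer-splitting step is exactly the kind of contortion that a va_list-based interface would make unnecessary.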
An inline assembly implementation of the innards of swapcontext() can explicitly push the FPSCR and return address; switch stacks; and pop the same explicit state. The rest can be left up to the compiler by marking this assembly block as having clobbered all registers. (The signal mask save/restore should be done in C outside this, to prevent signal delivery during this maneuver.) You might as well make this an inline function, and then you shouldn't wind up with gratuitous register churn.

I suppose I might as well turn this into an eglibc patch. Probably not this week, though, as I have a lot of work to do before my Linaro Connect plenary.

Cheers,
- Michael

On Oct 24, 2011 6:46 AM, "Peter Maydell" wrote:
> At the moment ARM eglibc doesn't support the functions declared
> in ucontext.h: getcontext(), setcontext(), swapcontext() and
> makecontext(). Instead you get implementations which always
> fail and set errno to ENOSYS.
>
> QEMU uses these functions to implement coroutines. Although there
> is a fallback implementation in terms of threads, there are reasons
> why using the fallback is suboptimal:
> * its performance is worse
> * it will be less tested, because x86_64 and i386 both implement
>   the ucontext functions and so QEMU on those hosts will be using
>   different code paths
> * I'm not aware of a good way at configure time to detect whether
>   getcontext() et al will always fail without actually running a
>   test binary, which won't work in a cross-compile setup. (If eglibc
>   just didn't provide the functions at all this would be much
>   simpler...)
>
> We're going to care about performance and reliability of QEMU on
> ARM hosts as we start to support KVM on Cortex-A15, so it would
> be good if we could add ucontext function support to eglibc as
> part of that effort.
>
> Opinions? Have I missed some good reason why there isn't an
> ARM implementation of these functions?
>
> (I'm aware that the ucontext functions have been removed from
> the latest version of the POSIX spec; however AFAIK there's no
> equivalent functionality that replaces them so I think they're
> still worth having implementations of for parity with other
> architectures.)
>
> -- PMM
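
For reference, the run-time probe that the quoted message says cannot be avoided might look like the sketch below. The function name ucontext_is_stubbed() is made up for illustration. It only tells you something when run on the target itself, which is precisely why it is no help to a configure script in a cross-compile setup.

```c
#include <errno.h>
#include <ucontext.h>

/* Returns 1 if getcontext() fails with ENOSYS (the stub behaviour
 * described for ARM eglibc), and 0 otherwise. */
int ucontext_is_stubbed(void)
{
    ucontext_t uc;

    errno = 0;
    if (getcontext(&uc) != 0 && errno == ENOSYS)
        return 1;
    return 0;
}
```

On a libc with working ucontext support this should return 0, so a program could call it once at startup and choose the thread-based fallback only when it returns 1.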