iDCT implementation in GCC NEON intrinsics, for your perusal

2011-06-27 Thread Michael K. Edwards
Here's an implementation of an 8x8 integer inverse DCT (iDCT) done with
NEON intrinsics -- essentially a translation of the assembly version in
libjpeg-turbo trunk:

https://github.com/mkedwards/crosstool-ng/blob/master/patches/libjpeg-turbo/trunk/0001-Implement-jsimd_idct_ifast-using-NEON-intrinsics.patch

It compiles (on Linaro 2011.05 GCC 4.5, anyway; a recent Linaro 4.6
snapshot ICEs) but is otherwise untested.  Still, it's
interesting to compare the assembly that it generates against the
hand-written version.  I thought I'd give linaro-toolchain a heads-up
in case y'all could use a test case that generates plenty of pressure
on the VFP/NEON register bank.  (I intend to use it to see how much
performance difference there really is, on the A8 and A9, between NEON
code compiled for 16 vs. 32 registers.)
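
For readers who have not used the intrinsics: the sketch below is not taken
from the patch.  It shows a single add/subtract butterfly over one row of
eight 16-bit coefficients, the kind of step an iDCT is assembled from,
written with arm_neon.h intrinsics.  The name butterfly8 is made up for
illustration, and the scalar branch is there only so the file also builds on
targets without NEON.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON_SKETCH 1
#else
#define HAVE_NEON_SKETCH 0
#endif

/* Illustrative helper (not from the patch):
   out_sum[i] = a[i] + b[i], out_diff[i] = a[i] - b[i] for 8 coefficients. */
void butterfly8(const int16_t a[8], const int16_t b[8],
                int16_t out_sum[8], int16_t out_diff[8])
{
#if HAVE_NEON_SKETCH
    /* One q register holds all eight 16-bit lanes of a row. */
    int16x8_t va = vld1q_s16(a);
    int16x8_t vb = vld1q_s16(b);
    vst1q_s16(out_sum, vaddq_s16(va, vb));
    vst1q_s16(out_diff, vsubq_s16(va, vb));
#else
    /* Scalar fallback so the sketch builds without NEON. */
    for (int i = 0; i < 8; i++) {
        out_sum[i] = (int16_t)(a[i] + b[i]);
        out_diff[i] = (int16_t)(a[i] - b[i]);
    }
#endif
}
```

A full 8x8 transform keeps many such rows live at once, which is where the
register pressure mentioned above comes from.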

Cheers,
- Michael

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: Is the Linaro toolchain useful on x86/x86_64?

2011-08-17 Thread Michael K. Edwards
We use the Linaro x86 toolchain to cross-compile full system images
for our embedded target, which work great.  I apply a subset of the
Ubuntu patchset to Linaro gcc and gdb as well as upstream eglibc and
binutils.  (I omit the multilib patches and some other things that
didn't seem necessary for the target.)  You can find this at
https://github.com/mkedwards/crosstool-ng .



Re: [ACTIVITY] w36

2011-09-09 Thread Michael K. Edwards
I should be able to equip you soon with a web browsing benchmark based
on Firefox's Talos suite (https://wiki.mozilla.org/Buildbot/Talos).
The most representative test (tp5) unfortunately depends on a data set
that we can't distribute; but there are SunSpider and Dromaeo suites
as well, and I am sure that we can arrange nightly tp5 runs within
Mozilla.  Let's work out what spectrum of hardware / kernel / userland
options you'd like to cover with this benchmark.

Cheers,
- Michael



Re: eglibc and fun with config.sub

2011-09-18 Thread Michael K. Edwards
Please coordinate with Jon Masters at Red Hat/Fedora and Adam Conrad at
Ubuntu/Debian on this.  (Cc'ing the cross-distro list, through which the
recent ARM summit at Linux Plumbers was organized.)

Cheers,
- Michael
On Sep 16, 2011 8:41 AM, "David Gilbert" wrote:
> OK, so we seem to have agreement here that what we want is autodetect
> for eglibc and forget about the triplet; well technically that probably
> makes my life easier, and I don't think it's too hard a sell.
>
> Dave


Re: [ACTIVITY] WW42

2011-10-23 Thread Michael K. Edwards
You may find the extensions at https://github.com/mkedwards/crosstool-ng useful.

Cheers,
- Michael

On Sun, Oct 23, 2011 at 6:09 PM, Zhenqiang Chen wrote:
> Summary:
> * Exercise crosstool-ng and summarize the gaps.
>
> Details:
> * Exercise crosstool-ng
>  (1) Sync with lp:~linaro-toolchain-dev/crosstool-ng/linaro.
>  (2) Try to config linux-host-baremetal-target and
> mingw32-host-baremetal-target.
>  (3) Try to build the toolchain for both embedded toolchain and
> linaro-gcc-4.6-2011.10 with the config.
>     . C compiler for linux and mingw32 hosts and c++ compiler for
> linux host can be built without any change.
>     . C++ compiler for mingw32 host can be built after PCH is disabled.
>     . GDB-cross build fails due to missing dependency packages.
> * Gaps in crosstool-ng
>  (1) Improve GDB-cross scripts to download and build the dependency
> packages: expat and ncurses. Or put expat and ncurses as
> companion_libraries.
>  (2) To remove dependencies, embedded toolchain requires more
> prerequisites like zlib.
>      New config and scripts are required to support the packages.
>  (3) Currently, the embedded toolchain source packages are released
> as a tarball, which includes gcc, gmp, etc. New scripts are required
> to support it.
>  (4) To make sure the toolchain can run with lower version glibc like
> redhat4/5, the embedded toolchain requires lower version native
> gcc4.3.6 to build it.
>      To support it,
>      . Users can build the native gcc manually, or
>      . Enhance the scripts to add one step to build native gcc.
>  (5) All the default package configurations are different from
> embedded toolchain internal build scripts.
>      Since the configurations in embedded toolchain had been tuned
> and tested, we will change the configurations in crosstool-ng if they
> do not match and not configurable.
>      The same rule will apply for linaro toolchain.
>
> Plans:
> * Write scripts to re-pack the embedded toolchain source packages.
> * Add support for all prerequisites in crosstool-ng menuconfig.
>
> Thanks!
> -Zhenqiang


Re: ARM eglibc: adding ucontext?

2011-10-24 Thread Michael K. Edwards
The calling convention for makecontext() isn't just an issue for standards
lawyers, it's an actual problem on ARM, especially on hard-float systems.
According to the standard, "The application shall ensure that the value of
argc matches the number of arguments of type int passed to func; otherwise,
the behavior is undefined."  If you read this to imply that func() must be a
non-variadic function that takes zero or more int arguments, and call it as
if it took exactly as many int arguments as fit in registers -- stuffing
extras onto the stack if necessary -- then you're OK.  But supporting more
general calling conventions for func() gets hairy, and correctly supporting
both variadic and non-variadic functions on a hard-float system is
impossible.
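
A minimal sketch of the conservative reading described above: func() is
non-variadic, takes only int arguments, and argc matches their count.  The
names add3 and run_add3 are illustrative only.  On a libc whose ucontext
functions are stubs, getcontext() fails and the helper simply returns -1
with errno left as set (ENOSYS in the case quoted below).

```c
#include <assert.h>
#include <errno.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static char co_stack[64 * 1024];
static int co_sum;

/* Non-variadic, int-only parameters: the shape the standard clearly allows. */
static void add3(int a, int b, int c)
{
    co_sum = a + b + c;
    /* Returning from here resumes uc_link, i.e. main_ctx. */
}

/* Illustrative helper: runs add3 on its own stack.
   Returns 0 and stores the sum in *out, or -1 with errno set. */
int run_add3(int a, int b, int c, int *out)
{
    if (getcontext(&co_ctx) == -1)
        return -1;                      /* e.g. ENOSYS on a stub libc */

    co_ctx.uc_stack.ss_sp = co_stack;
    co_ctx.uc_stack.ss_size = sizeof co_stack;
    co_ctx.uc_link = &main_ctx;

    /* argc (3) must equal the number of int arguments that follow. */
    makecontext(&co_ctx, (void (*)(void))add3, 3, a, b, c);

    if (swapcontext(&main_ctx, &co_ctx) == -1)
        return -1;

    *out = co_sum;
    return 0;
}
```

Anything beyond this shape (pointers, floating-point arguments, a variadic
func) depends on how a particular libc marshals the arguments, which is the
portability problem at issue.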

If you ask me, the right way to handle this is to replace makecontext() with

    ucontext_t *makevcontext(ucontext_t *ucp, void (*func)(va_list), va_list args);

That is, if you're going to buck the standards committee and fix the
interface instead of calling it obsolescent and dropping the functionality.
For what it's worth, mcontext_t on ARMv7-A shouldn't have to be as big a
beast as eglibc's implementation for, say, powerpc; you don't have to keep
any state which is clobbered by a function call, so you just need
stack/frame/return addresses plus the callee-save core and VFP registers.
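
To give a feel for how small that state is, here is a hypothetical layout.
It is not eglibc's mcontext_t, and the names arm_min_context and
arm_min_context_payload are invented for this sketch.  It assumes the AAPCS
callee-saved set (r4-r11, sp, lr, d8-d15) and includes FPSCR, which is only
needed if per-context floating-point control settings are wanted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical minimal ARMv7-A context record (illustration only). */
typedef struct {
    uint32_t r4_r11[8];  /* callee-saved core registers, incl. the frame pointer */
    uint32_t sp;         /* r13: stack pointer */
    uint32_t lr;         /* r14: return address */
    uint32_t fpscr;      /* optional: FP rounding/exception control */
    uint64_t d8_d15[8];  /* callee-saved VFP/NEON registers (q4-q7) */
} arm_min_context;

/* Bytes of real register state, ignoring any alignment padding. */
size_t arm_min_context_payload(void)
{
    size_t n = sizeof(((arm_min_context *)0)->r4_r11)
             + sizeof(((arm_min_context *)0)->sp)
             + sizeof(((arm_min_context *)0)->lr)
             + sizeof(((arm_min_context *)0)->fpscr)
             + sizeof(((arm_min_context *)0)->d8_d15);
    assert(n <= sizeof(arm_min_context));
    return n;
}
```

That comes to 108 bytes of register state: 44 bytes of core registers and
FPSCR plus 64 bytes for d8-d15.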

Note that you will need to save/restore FP state in all swapcontext()
calls.  User thread context switch code doesn't have access to the FPEXC
register and thus can't identify non-FP-using contexts and skip the q4-q7
save/restore.  However, the standard is silent on whether you can have
context-specific settings of floating point exception control, rounding
mode, etc.  If QEMU's coroutines need this, you'll need to save/restore
FPSCR as well.

Now it seems to me that you can make the "CPU state" part of the context
structure drastically simpler than the conventional mcontext_t -- all you
really need is the stack pointer.  An inline assembly implementation of the
innards of swapcontext() can explicitly push the FPSCR and return address;
switch stacks; and pop the same explicit state.  The rest can be left up to
the compiler by marking this assembly block as having clobbered all
registers.  (The signal mask save/restore should be done in C outside this,
to prevent signal delivery during this maneuver.)  You might as well make
this an inline function, and then you shouldn't wind up with gratuitous
register churn.

I suppose I might as well turn this into an eglibc patch.  Probably not this
week, though, as I have a lot of work to do before my Linaro Connect
plenary.

Cheers,
- Michael
On Oct 24, 2011 6:46 AM, "Peter Maydell" wrote:

> At the moment ARM eglibc doesn't support the functions declared
> in ucontext.h: getcontext(), setcontext(), swapcontext() and
> makecontext(). Instead you get implementations which always
> fail and set errno to ENOSYS.
>
> QEMU uses these functions to implement coroutines. Although there
> is a fallback implementation in terms of threads, there are reasons
> why using the fallback is suboptimal:
>  * its performance is worse
>  * it will be less tested, because x86_64 and i386 both implement
> the ucontext functions and so QEMU on those hosts will be using
> different code paths
>  * I'm not aware of a good way at configure time to detect whether
> getcontext() et al will always fail without actually running a
> test binary, which won't work in a cross-compile setup. (If eglibc
> just didn't provide the functions at all this would be much
> simpler...)
>
> We're going to care about performance and reliability of QEMU on
> ARM hosts as we start to support KVM on Cortex-A15, so it would
> be good if we could add ucontext function support to eglibc as
> part of that effort.
>
> Opinions? Have I missed some good reason why there isn't an
> ARM implementation of these functions?
>
> (I'm aware that the ucontext functions have been removed from
> the latest version of the POSIX spec; however AFAIK there's no
> equivalent functionality that replaces them so I think they're
> still worth having implementations of for parity with other
> architectures.)
>
> -- PMM