Re: Notes on mixing D16/D32 code
On Thu, Nov 25, 2010 at 10:15 PM, Michael Hope wrote:
> On Fri, Nov 26, 2010 at 12:00 AM, Dave Martin wrote:
>> Hi,
>>
>> On Wed, Nov 24, 2010 at 9:49 PM, Michael Hope wrote:
>>> It's a bit of a newbie question, but I've been wondering if you can
>>> intermix hard float VFPv3-D16 code with VFPv3-D32 code. You can as:
>>>
>>> According to the ABI:
>>> * d0-d15 are used for floating point parameters, no matter if you are
>>> D16 or D32
>>> * d0-d15 are not preserved across function calls
>>> * d16-d31 must be preserved across function calls
>>
>> No, I don't think that's correct - see the procedure call standard
>> section 5.1.2.1
>> "VFP register usage conventions (VFP v2, v3 and the Advanced SIMD Extension)"
>>
>> It's not too hard to misread ... my understanding is as follows---
>>
>> * d0-d7 (s0-s15; q0-q3) are not callee-saved and are the only regs
>> used for parameter and return value exchange in the standard ABI
>> variants
>> * d8-d15 (s16-s31; q4-q7) are _callee-saved_
>> * d16-d31 (q8-q15) are _not callee-saved_
>
> Ah, I got the s* and d* registers mixed up. So if you have a function
> which takes doubles, the first eight parameters go in registers. If
> the function takes floats, then the first sixteen go in registers.
> D8-D15 are preserved across calls, D16+ aren't.

yup

>> So basically, D32 code just gets to use d16-d31 for extra scratchpad
>> bandwidth _in between_ external function call sites. (Of course,
>> compiler-generated or hand-written code can relax the rules locally in
>> some circumstances, just as for the integer ABI)
>
> I think the conclusion is the same: you can intermix VFP-D16 and
> VFP-D32 code as D16 code doesn't use D16-D31 and D32 code doesn't
> expect D16-D31 to be preserved across function calls.
Yes--- agreed (actually, I misread your first paragraph and thought you were arguing that they _couldn't_ be intermixed)

The ABI was deliberately specified that way to allow that AFAIK--- so VFPv3-D32 is just another "extension" which islands of code can optionally use or not use; just like v5E, v6 SIMD or NEON.

Cheers
---Dave
___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
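The register roles Dave lists can be written down as a small lookup, which makes the mixing argument mechanical. This is only a sketch in C: the enum and function names below are made up for illustration and do not come from any ABI header or toolchain.

```c
#include <assert.h>

/* Roles of the VFP double registers d0-d31 as summarised in the thread
 * (procedure call standard, section 5.1.2.1). All names are hypothetical. */
enum vfp_role {
    VFP_ARG_SCRATCH,   /* d0-d7: parameters and results, not callee-saved */
    VFP_CALLEE_SAVED,  /* d8-d15: preserved across calls */
    VFP_EXTRA_SCRATCH, /* d16-d31: D32 only, not callee-saved */
    VFP_INVALID        /* not a d register */
};

enum vfp_role vfp_d_role(int d)
{
    if (d < 0 || d > 31)
        return VFP_INVALID;
    if (d <= 7)
        return VFP_ARG_SCRATCH;
    if (d <= 15)
        return VFP_CALLEE_SAVED;
    return VFP_EXTRA_SCRATCH;
}

/* Nonzero if the register exists on a VFPv3-D16 implementation. */
int vfp_d_exists_on_d16(int d)
{
    return d >= 0 && d <= 15;
}

/* Mixing is safe in this model if every register a caller expects to
 * survive a call is one that D16 code can see, and therefore save. */
int vfp_mixing_is_safe(void)
{
    int d;

    for (d = 0; d < 32; d++) {
        if (vfp_d_role(d) == VFP_CALLEE_SAVED && !vfp_d_exists_on_d16(d))
            return 0;
    }
    return 1;
}

/* Sanity check mirroring the conclusion reached in the thread. */
void vfp_selfcheck(void)
{
    assert(vfp_mixing_is_safe());
}
```

The other direction needs no check: D16 code never touches d16-d31, and D32 callers do not expect them to be preserved anyway.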
GCC Optimization Brain Storming Session
Hi All,

As we discussed on Monday, I think it might be helpful to get a number of knowledgeable people together on a call to discuss GCC optimization opportunities.

So, I'd like to get some idea of who would like to attend, and we'll try to find a slot we can all make. I'm on vacation next week, so I expect it'll be in two or three weeks' time.

Before we get there, I'd like to have a list of ideas to discuss. Partly so that we don't forget anything, and partly so that people can have a think about them before the day. I'm really looking for bigger picture stuff, rather than individual poor code generation bugs.

So here's a few to kick off:

* Costs tuning.
  - GCC 4.6 has a new costs model, but are we making full use of it?
  - What about optimizing for size?
  - Do the optimizers take any notice? [1]

* Instruction set coverage.
  - Are there any ARM/Thumb2 instructions that we are not taking advantage of? [2]
  - Do we test that we use the instructions we do have? [3]

* Constant pools
  - it might be a very handy space optimization to have small functions share one constant pool, but the way the passes work one function at a time makes this hard. (LP:625233)

* NEON
  - There's already a lot of work going on here, and I don't want it to hog all our time, but it might be worth touching on.

What else? I'm not the most experienced person with GCC internals, and I'm relatively new to the ARM specific parts of those, so somebody else must be able to come up with something far more exciting!

So, please, get brain-storming!

Andrew

[1] We discovered recently that combine is happy to take two insns and combine them into a pattern that matches a splitter that then explodes into three insns (partly due to being no longer able to generate pseudo-registers).

[2] For example, I just wrote a patch to add addw and subw support (not yet submitted).

[3] LP:643479 is an example of a case where we don't.
[ACTIVITY] 2010-11-26
Hand-crafted a simple strchr and compared it with libc's:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr

Interestingly, it's significantly faster than libc's on A9s, but on A8s it's slower for large sizes. I've not really looked at why yet; my implementation is just the absolute simplest Thumb-2 version.

Did some ltrace profiling to see what typical strchr and strlen sizes were, and got a bit surprised at some of the typical behaviours: lots of cases where strchr is being used in loops to see if another string contains any one of a set of characters, a few cases of strchr being called with Null strings, and the corner case in the spec that allows you to call strchr with \0 as the character to search for.

Trying some other benchmarks (pybench spends very little time in libc; package builds of simple packages seem to have a more interesting mix of libc use).

Sorting out some of the red tape for contributing.

Dave
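For reference, the behaviours seen in the ltrace profile can be shown with a byte-at-a-time strchr in C. This is not the Thumb-2 implementation from the benchmark page, only a sketch of the same semantics; simple_strchr and contains_any are hypothetical names, and the assert treats a NULL pointer as a caller error.

```c
#include <assert.h>
#include <stddef.h>

/* Simplest possible strchr: one byte per iteration. */
char *simple_strchr(const char *s, int c)
{
    const char ch = (char)c;

    assert(s != NULL);
    for (;; s++) {
        if (*s == ch)
            return (char *)s; /* also matches the terminator when ch is '\0' */
        if (*s == '\0')
            return NULL;      /* reached the end without a match */
    }
}

/* The loop pattern from the profile: does s contain any character of set? */
int contains_any(const char *s, const char *set)
{
    assert(s != NULL && set != NULL);
    for (; *s != '\0'; s++) {
        if (simple_strchr(set, *s) != NULL)
            return 1;
    }
    return 0;
}
```

Checking for a match before checking for the terminator is what makes the \0 corner case work: searching for \0 returns a pointer to the end of the string instead of NULL. The contains_any loop does the job the standard strpbrk function exists for.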
[ACTIVITY] weekly status
Hi,

* the ARM __sync_* glibc-ports patch was accepted upstream
* posted proposal for consolidating sync primitives but stdatomic seems to be the future
* used my small gcc testsuite patch to verify __sync_* support of the gcc-linaro
* created: https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations
* looked into GOMP support on ARM:
  - #pragma omp atomic results in proper asm code (dmb, ldrex, strex, dmb)
  - #pragma omp flush results in a DMB instruction
  - #pragma omp barrier results in a call to GOMP_barrier (I'm not sure if this is the desired behavior)
* started to look into #681138

Regards
Ken
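As an illustration of the __sync_* primitives mentioned above, here is a minimal C sketch using the GCC builtins. The wrapper names are hypothetical, and the exact instruction sequence the compiler emits (for example dmb, ldrex, strex, dmb on ARMv7) depends on the target and compiler version, so treat that as an expectation rather than a guarantee.

```c
#include <assert.h>
#include <stddef.h>

/* Atomically add 1 to *counter and return the value it held before. */
int counter_increment(int *counter)
{
    assert(counter != NULL);
    return __sync_fetch_and_add(counter, 1);
}

/* Atomically raise *slot to value if value is larger.
 * Returns the maximum observed once the update has settled. */
int atomic_store_max(int *slot, int value)
{
    assert(slot != NULL);
    for (;;) {
        /* Adding 0 serves as an atomic read of the current value. */
        int seen = __sync_fetch_and_add(slot, 0);

        if (seen >= value)
            return seen;
        /* Retry if another thread changed *slot in the meantime. */
        if (__sync_bool_compare_and_swap(slot, seen, value))
            return value;
    }
}

/* Full memory barrier, comparable in effect to "#pragma omp flush". */
void full_barrier(void)
{
    __sync_synchronize();
}
```

The compare-and-swap retry loop is the usual way to build operations the builtins do not provide directly.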
[ACTIVITY] Weekly status
== This week ==

* More ARM testing of binutils support for STT_GNU_IFUNC.

* Implemented the GLIBC support for STT_GNU_IFUNC. Simple ARM testcases seem to run correctly.

* Ran the GLIBC testsuite -- which includes some ifunc coverage -- but haven't analysed the results yet.

* Started looking at Thumb for STT_GNU_IFUNC. The problem is that BFD internally represents Thumb symbols with an even value and a special st_type (STT_ARM_TFUNC); this is also the old, pre-EABI external representation. We need something different for STT_GNU_IFUNC.

* Tried making BFD represent Thumb symbols as odd-value functions internally. I got it to "work", but I wasn't really happy with the results.

* Looked at alternatives, and in the end decided that it would be better to have extra internal-only information in Elf_Internal_Sym. This "works" too, and seems cleaner to me. Sent an RFC upstream:
  http://sourceware.org/ml/binutils/2010-11/msg00475.html

* Started writing some Thumb tests for STT_GNU_IFUNC.

* Investigated #618684. Turned out to be something that Bernd had already fixed upstream. Tested a backport.

== Next week ==

* More IFUNC tests (ARM and Thumb, EGLIBC and binutils).

Richard
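For readers unfamiliar with the idea behind STT_GNU_IFUNC, the sketch below approximates it in portable C: a resolver picks one of several implementations and later calls go through the chosen pointer. It is not the real mechanism. With a true ifunc symbol the dynamic linker runs the resolver while processing relocations; here the choice is made lazily on first call. Every name is hypothetical, and the cpu_prefers_unrolled flag stands in for whatever hardware probe a real resolver would perform.

```c
#include <assert.h>
#include <stddef.h>

typedef size_t (*strlen_fn)(const char *);

/* Baseline implementation. */
size_t strlen_generic(const char *s)
{
    size_t n = 0;

    while (s[n] != '\0')
        n++;
    return n;
}

/* Stand-in for a CPU-specific variant: four bytes checked per iteration.
 * Bytes are tested in order, so nothing past the terminator is read. */
size_t strlen_unrolled(const char *s)
{
    const char *p = s;

    for (;;) {
        if (p[0] == '\0') return (size_t)(p - s);
        if (p[1] == '\0') return (size_t)(p - s) + 1;
        if (p[2] == '\0') return (size_t)(p - s) + 2;
        if (p[3] == '\0') return (size_t)(p - s) + 3;
        p += 4;
    }
}

/* The "resolver": returns the implementation to use. */
strlen_fn resolve_strlen(int cpu_prefers_unrolled)
{
    return cpu_prefers_unrolled ? strlen_unrolled : strlen_generic;
}

static strlen_fn strlen_impl; /* NULL until resolved */

/* Public entry point; resolves once, then dispatches.
 * Not thread-safe as written; this is an illustration only. */
size_t my_strlen(const char *s)
{
    assert(s != NULL);
    if (strlen_impl == NULL)
        strlen_impl = resolve_strlen(0); /* assumption: no CPU probe available */
    return strlen_impl(s);
}
```

The appeal of the real feature is that the indirection is resolved by the dynamic linker, so callers pay for an ordinary PLT call and not for a check on every call.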
Re: GCC Optimization Brain Storming Session
On 11/26/2010 3:11 AM, Andrew Stubbs wrote:
> As we discussed on Monday, I think it might be helpful to get a number
> of knowledgeable people together on a call to discuss GCC optimization
> opportunities.

I'm not sure I qualify as knowledgeable, and my schedule is a mess, so please don't try to schedule around me -- but please do let me know when the call will be.

For big-picture issues, I'd be interested in looking at inlining, both on its own, and in the context of profile-directed feedback. My guess is that we're not inlining enough in hot spots, and too much in cold spots. (The latter is important, in that I expect that overall system performance is impacted by larger binaries, leading to more cache misses and page faults.)

As an extension of that idea, and tying into the idea of optimizing for size, I suspect that we (a) ought to be optimizing for size automatically in cold code, and (b) that we're probably not doing a great job of optimizing for size. (For example, and making the whole problem circular, I've seen cases where inlining would definitely reduce code size -- but we don't do it. I suspect that cases where many of the arguments to a relatively small function are constants should be considered for inlining even when optimizing for size; you're often going to simplify away the entire function. Perhaps we need to actually be able to try inlining, and back out if it is unprofitable.)

Both of these things require some significant infrastructure work. They aren't ARM-specific back-end changes. But, I suspect that they're important in terms of allowing GCC to take full advantage of ARM CPUs.

--
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713
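A small C example of the constant-argument case Mark describes is given below. The function names are invented for illustration, and whether a given GCC version actually inlines the call at -Os depends on its heuristics, which is the point under discussion.

```c
#include <assert.h>

/* Small general-purpose helper: scale x, then clamp to [lo, hi]. */
static inline int scale_clamp(int x, int scale, int lo, int hi)
{
    int v;

    assert(lo <= hi);
    v = x * scale;
    if (v < lo)
        return lo;
    if (v > hi)
        return hi;
    return v;
}

/* Three of the four arguments are constants at this call site. */
int to_byte_range(int x)
{
    return scale_clamp(x, 2, 0, 255);
}

/* What the call can reduce to once the constants are propagated:
 * no argument setup, no call, and the multiply becomes a doubling. */
int to_byte_range_by_hand(int x)
{
    int v = x * 2;

    return v < 0 ? 0 : (v > 255 ? 255 : v);
}
```

If the helper has only a few such callers, the inlined and simplified body can be smaller than the argument setup plus the call, so inlining may win even when optimizing for size.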
Re: [ACTIVITY] Weekly status
On Fri, Nov 26, 2010 at 05:57:57PM +, Richard Sandiford wrote:
> * More ARM testing of binutils support for STT_GNU_IFUNC.
>
> * Implemented the GLIBC support for STT_GNU_IFUNC. Simple ARM testcases
> seem to run correctly.

Coincidentally I was reading about this exact feature at http://www.airs.com/blog/page/4

In the interest of self-education, could you tell me a little bit about what GLIBC support for ifunc entails -- is it basically infrastructure to allow libc functions to be defined as ifuncs, now that the assembler supports it?

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
Re: GCC Optimization Brain Storming Session
On Fri, Nov 26, 2010 at 10:40:34AM -0800, Mark Mitchell wrote:
> As an extension of that idea, and tying into the idea of optimizing for
> size, I suspect that we (a) ought be optimizing for size automatically
> in cold code, and (b) that we're probably not doing a great job of
> optimizing for size.

Somewhat OT, and I'm not sure this has been mentioned before, though I guess Mark has seen it:

http://embed.cs.utah.edu/embarrassing/

This group basically runs a set of size benchmarks against multiple compilers. I wonder a) how this would look for a survey of ARM compilers and b) if we could get them to run a benchmark for us if we provided them with hardware (and if not, if they'd provide us with the code).

Note also that the same group has a tool for generating random C code with volatiles which we could use for some level of validation:

http://www.cs.utah.edu/~eeide/emsoft08/

I've inlined below a sample program generated with randprog (you'll want to apt-get install libboost-program-options1.40-dev to build it):

/*
 * This is a RANDOMLY GENERATED PROGRAM.
 *
 * Generator: randprog 1.0.0
 * Options: (none)
 * Seed: 3881625688
 */

#include <stdint.h>
uint16_t context = 0;
#if defined(__AVR_ARCH__)
# include "platform_avr.h"
#elif defined (__MSP430__)
# include "platform_msp430.h"
#else
# include "platform_generic.h"
#endif
#include "random_runtime.h"

/* --- GLOBAL VARIABLES --- */
uint8_t g_4 = -4L;

/* --- FORWARD DECLARATIONS --- */
int32_t func_1(void);

/* --- FUNCTIONS --- */
/* -- */
/*
 * reads : g_4
 * writes:
 */
int32_t func_1(void)
{
    {
        int8_t l_2 = 0L;
        for (l_2 = -1; (l_2 == 0); l_2 -= 0)
        {
            uint16_t l_3 = 1L;
            return l_3;
        }
        return g_4;
    }
}

/* */
int main(int argc, char *argv[])
{
    platform_main_begin();
    /* Call the first function */
    func_1();
    crcBytes(g_4);
    platform_main_end(context);
    return 0;
}

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
Silverbell dchroots
David G. requested a few packages installed on silverbell today (the quad-A9 VE porter machine we host in the datacenter). We got dchroots instead:

- Forwarded message from LaMont Jones via RT -

Date: Fri, 26 Nov 2010 21:02:22 +
Subject: [rt.admin.canonical.com #42662] Simple package installs on silverbell

On Fri Nov 26 17:08:28 2010, k...@canonical.com wrote:
> Hi there,
>
> Could we get installed on silverbell:
>
> build-essential
> debhelper
> fakeroot
>
> And could we get deb-src's added to sources.list and do an apt-get
> update to allow us to apt-get source certain packages? This is for
> simple compilation benchmarks. Thanks!

Dchroot environments have been created on silverbell for both maverick and natty. Within the chroot, you can sudo apt-get install to install packages. apt-get update/dist-upgrade and installs that cause package removal will require a GSA to do them.

To build for maverick:
dchroot -c maverick
(and then build however you want...)

lamont

- End forwarded message -

--
Christian Robottom Reis | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko