Re: Notes on mixing D16/D32 code

2010-11-26 Thread Dave Martin
On Thu, Nov 25, 2010 at 10:15 PM, Michael Hope wrote:
> On Fri, Nov 26, 2010 at 12:00 AM, Dave Martin wrote:
>> Hi,
>>
>> On Wed, Nov 24, 2010 at 9:49 PM, Michael Hope wrote:
>>> It's a bit of a newbie question, but I've been wondering if you can
>>> intermix hard float VFPv3-D16 code with VFPv3-D32 code.  You can as:
>>>
>>> According to the ABI:
>>>  * d0-d15 are used for floating point parameters, no matter if you are
>>> D16 or D32
>>>  * d0-d15 are not preserved across function calls
>>>  * d16-d31 must be preserved across function calls
>>
>> No, I don't think that's correct - see the procedure call standard
>> section 5.1.2.1
>> "VFP register usage conventions (VFP v2, v3 and the Advanced SIMD Extension)"
>>
>> It's not too hard to misread ... my understanding is as follows---
>>
>>  * d0-d7 (s0-s15; q0-q3) are not callee-saved and are the only regs
>> used for parameter and return value exchange in the standard ABI
>> variants
>>  * d8-d15 (s16-s31; q4-q7) are _callee-saved_
>>  * d16-d31 (q8-q15) are _not callee-saved_
>
> Ah, I got the s* and d* registers mixed up.  So if you have a function
> which takes doubles, the first eight parameters go in registers.  If
> the function takes floats, then the first sixteen go in registers.
> D8-D15 are preserved across calls, D16+ aren't.

yup

>> So basically, D32 code just gets to use d16-d31 for extra scratchpad
>> bandwidth _in between_ external function call sites.  (Of course,
>> compiler-generated or hand-written code can relax the rules locally in
>> some circumstances, just as for the integer ABI)
>
> I think the conclusion is the same:  you can intermix VFP-D16 and
> VFP-D32 code as D16 code doesn't use D16-D31 and D32 code doesn't
> expect D16-D31 to be preserved across function calls.

Yes, agreed (actually, I misread your first paragraph and thought
you were arguing that they _couldn't_ be intermixed).

AFAIK the ABI was deliberately specified that way to allow this, so
VFPv3-D32 is just another "extension" which islands of code can
optionally use or not use, just like v5E, v6 SIMD or NEON.
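
For reference, the register split discussed in this thread can be written
down as a small C helper. This is only a sketch of the reading of the
procedure call standard given above; the names vfp_role, vfp_dreg_role and
vfp_dreg_clobbered_by_call are made up for illustration and are not part of
any toolchain.

```c
#include <stdbool.h>

/* Hypothetical names, for illustration only. */
enum vfp_role {
    VFP_ARG_SCRATCH,    /* d0-d7: arguments/results, not callee-saved */
    VFP_CALLEE_SAVED,   /* d8-d15: preserved across calls */
    VFP_EXTRA_SCRATCH,  /* d16-d31: present on D32 only, not callee-saved */
    VFP_INVALID         /* not a valid d-register number */
};

/* Classify double-precision register d<n> by its calling-convention role. */
enum vfp_role vfp_dreg_role(int d)
{
    if (d < 0 || d > 31)
        return VFP_INVALID;
    if (d <= 7)
        return VFP_ARG_SCRATCH;
    if (d <= 15)
        return VFP_CALLEE_SAVED;
    return VFP_EXTRA_SCRATCH;
}

/* True if a caller must assume d<n> is clobbered by an external call. */
bool vfp_dreg_clobbered_by_call(int d)
{
    enum vfp_role r = vfp_dreg_role(d);
    return r == VFP_ARG_SCRATCH || r == VFP_EXTRA_SCRATCH;
}
```

D16 code never touches the VFP_EXTRA_SCRATCH group, and D32 code never
relies on it surviving a call, which is why the two can be mixed.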

Cheers
---Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


GCC Optimization Brain Storming Session

2010-11-26 Thread Andrew Stubbs

Hi All,

As we discussed on Monday, I think it might be helpful to get a number 
of knowledgeable people together on a call to discuss GCC optimization 
opportunities.


So, I'd like to get some idea of who would like to attend, and we'll try 
to find a slot we can all make. I'm on vacation next week, so I expect 
it'll be in two or three weeks' time.


Before we get there, I'd like to have a list of ideas to discuss. Partly 
so that we don't forget anything, and partly so that people can have a 
think about them before the day.


I'm really looking for bigger picture stuff, rather than individual poor 
code generation bugs.


So here are a few to kick off:

 * Costs tuning.
   - GCC 4.6 has a new costs model, but are we making full use of it?
   - What about optimizing for size?
   - Do the optimizers take any notice? [1]

 * Instruction set coverage.
   - Are there any ARM/Thumb2 instructions that we are not taking
     advantage of? [2]
   - Do we test that we use the instructions we do have? [3]

 * Constant pools
   - It might be a very handy space optimization to have small
     functions share one constant pool, but the way the passes work one
     function at a time makes this hard. (LP:625233)

 * NEON
   - There's already a lot of work going on here, and I don't want it
     to hog all our time, but it might be worth touching on.


What else? I'm not the most experienced person with GCC internals, and 
I'm relatively new to the ARM specific parts of those, so somebody else 
must be able to come up with something far more exciting!


So, please, get brain-storming!

Andrew


[1] We discovered recently that combine is happy to take two insns and 
combine them into a pattern that matches a splitter that then explodes 
into three insns (partly due to being no longer able to generate 
pseudo-registers).


[2] For example, I just wrote a patch to add addw and subw support (not 
yet submitted).


[3] LP:643479 is an example of a case where we don't.



[ACTIVITY] 2010-11-26

2010-11-26 Thread David Gilbert
Hand-crafted a simple strchr and compared it with libc's:
https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr
Interestingly, it's significantly faster than libc's on A9s, but on
A8s it's slower for large sizes. I've not really looked into why yet; my
implementation is just the absolute simplest Thumb-2 version.

Did some ltrace profiling to see what typical strchr and strlen sizes were,
and was a bit surprised at some of the typical behaviours: lots of cases
where strchr is used in loops to see if another string contains any one
of a set of characters, a few cases of strchr being called with Null
strings, and the corner case in the spec that allows you to call strchr
with \0 as the character to search for.
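
To make the behaviours above concrete, here is a plain C sketch of a
byte-at-a-time strchr plus the looped "contains any of a set" usage. It is
not the Thumb-2 implementation that was benchmarked; simple_strchr and
contains_any are names invented for this example.

```c
#include <stddef.h>

/* Byte-at-a-time strchr; a reference sketch, not a tuned version. */
char *simple_strchr(const char *s, int c)
{
    const char ch = (char)c;

    for (;; s++) {
        if (*s == ch)
            return (char *)s;   /* also matches the terminator when c is \0 */
        if (*s == '\0')
            return NULL;        /* reached the end without a match */
    }
}

/* The looped pattern: does s contain any character from set? */
int contains_any(const char *s, const char *set)
{
    for (; *s != '\0'; s++)
        if (simple_strchr(set, *s) != NULL)
            return 1;
    return 0;
}
```

Note that the match test comes before the terminator test, which is what
makes searching for \0 return a pointer to the end of the string.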


Trying some other benchmarks (pybench spends very little time in
libc; package builds of simple packages seem to have a more interesting
mix of libc use).

Sorting out some of the red tape for contributing.

Dave


[ACTIVITY] weekly status

2010-11-26 Thread Ken Werner
Hi,

 * the ARM __sync_* glibc-ports patch was accepted upstream
 * posted a proposal for consolidating sync primitives, but stdatomic
   seems to be the future
 * used my small gcc testsuite patch to verify __sync_* support in
   gcc-linaro
 * created:
   https://wiki.linaro.org/WorkingGroups/ToolChain/AtomicMemoryOperations
 * looked into GOMP support on ARM:
   - #pragma omp atomic results in proper asm code (dmb, ldrex, strex, dmb)
   - #pragma omp flush results in a DMB instruction
   - #pragma omp barrier results in a call to GOMP_barrier (I'm not sure
     if this is the desired behavior)
 * started to look into #681138
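
For anyone unfamiliar with the __sync_* primitives mentioned above, a
minimal sketch of how they are used from C follows. The builtins are GCC's
legacy atomic builtins; the counter and the function names around them are
made up for illustration, and the exact instructions emitted depend on the
target and compiler version.

```c
/* Shared counter; illustrative only. */
static int counter;

/* Atomically add 1 and return the previous value. */
int counter_increment(void)
{
    return __sync_fetch_and_add(&counter, 1);
}

/* Store desired only if the counter still equals expected.
   Returns nonzero if the swap happened. */
int counter_set_if(int expected, int desired)
{
    return __sync_bool_compare_and_swap(&counter, expected, desired);
}

/* Full memory barrier; typically a DMB when targeting ARMv7. */
void publish_barrier(void)
{
    __sync_synchronize();
}
```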

Regards
Ken

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


[ACTIVITY] Weekly status

2010-11-26 Thread Richard Sandiford
== This week ==

* More ARM testing of binutils support for STT_GNU_IFUNC.

* Implemented the GLIBC support for STT_GNU_IFUNC.  Simple ARM testcases
  seem to run correctly.

* Ran the GLIBC testsuite -- which includes some ifunc coverage --
  but haven't analysed the results yet.

* Started looking at Thumb for STT_GNU_IFUNC.  The problem is that
  BFD internally represents Thumb symbols with an even value and
  a special st_type (STT_ARM_TFUNC); this is also the old, pre-EABI
  external representation.  We need something different for STT_GNU_IFUNC.

* Tried making BFD represent Thumb symbols as odd-value functions
  internally.  I got it to "work", but I wasn't really happy with
  the results.

* Looked at alternatives, and in the end decided that it would be
  better to have extra internal-only information in Elf_Internal_Sym.
  This "works" too, and seems cleaner to me.  Sent an RFC upstream:

http://sourceware.org/ml/binutils/2010-11/msg00475.html

* Started writing some Thumb tests for STT_GNU_IFUNC.

* Investigated #618684.  Turned out to be something that Bernd
  had already fixed upstream.  Tested a backport.
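
The Thumb representation problem above can be sketched in C. With the type
field already occupied by STT_GNU_IFUNC, there is no room left for
STT_ARM_TFUNC, so Thumb-ness has to be carried somewhere else. The toy_sym
struct, its is_thumb field and both functions are hypothetical and much
simpler than BFD's Elf_Internal_Sym; this is not the code from the RFC.

```c
#include <stdint.h>

/* ELF symbol type values, prefixed to avoid clashing with <elf.h>. */
enum {
    TOY_STT_FUNC      = 2,
    TOY_STT_GNU_IFUNC = 10,
    TOY_STT_ARM_TFUNC = 13   /* STT_LOPROC; old marker for Thumb functions */
};

/* Hypothetical internal symbol record. */
struct toy_sym {
    uint32_t value;          /* always even internally */
    unsigned char type;      /* TOY_STT_FUNC or TOY_STT_GNU_IFUNC */
    unsigned char is_thumb;  /* internal-only information */
};

/* Read an external symbol in either the old or the EABI form. */
struct toy_sym toy_from_external(uint32_t st_value, unsigned char st_type)
{
    struct toy_sym s = { st_value, st_type, 0 };

    if (st_type == TOY_STT_ARM_TFUNC) {
        /* Old form: even value, special type. */
        s.type = TOY_STT_FUNC;
        s.is_thumb = 1;
    } else if ((st_type == TOY_STT_FUNC || st_type == TOY_STT_GNU_IFUNC)
               && (st_value & 1u)) {
        /* EABI form: bit 0 of the value marks Thumb. */
        s.value = st_value & ~(uint32_t)1;
        s.is_thumb = 1;
    }
    return s;
}

/* Produce the EABI external value: odd for Thumb, even otherwise. */
uint32_t toy_external_value(struct toy_sym s)
{
    return s.value | (s.is_thumb ? 1u : 0u);
}
```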

== Next week ==

* More IFUNC tests (ARM and Thumb, EGLIBC and binutils).

Richard



Re: GCC Optimization Brain Storming Session

2010-11-26 Thread Mark Mitchell
On 11/26/2010 3:11 AM, Andrew Stubbs wrote:

> As we discussed on Monday, I think it might be helpful to get a number
> of knowledgeable people together on a call to discuss GCC optimization
> opportunities.

I'm not sure I qualify as knowledgeable, and my schedule is a mess, so
please don't try to schedule around me -- but please do let me know when
the call will be.

For big-picture issues, I'd be interested in looking at inlining, both
on its own, and in the context of profile-directed feedback.  My guess
is that we're not inlining enough in hot spots, and too much in cold
spots.  (The latter is important, in that I expect that overall system
performance is impacted by larger binaries, leading to more cache misses
and page faults.)

As an extension of that idea, and tying into the idea of optimizing for
size, I suspect that we (a) ought to be optimizing for size automatically
in cold code, and (b) that we're probably not doing a great job of
optimizing for size.  (For example, and making the whole problem
circular, I've seen cases where inlining would definitely reduce code
size -- but we don't do it.  I suspect that cases where many of the
arguments to a relatively small function are constants should be
considered for inlining even when optimizing for size; you're often
going to simplify away the entire function.  Perhaps we need to actually
be able to try inlining, and back out if it is unprofitable.)
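
A small illustration of the constant-argument case, as a sketch only: the
function names are invented, and whether a given GCC version actually
inlines and folds these at -Os is exactly the open question.

```c
/* Small helper: clamp x to [lo, hi], then scale. */
static int clamp_scale(int x, int lo, int hi, int scale)
{
    if (x < lo)
        x = lo;
    if (x > hi)
        x = hi;
    return x * scale;
}

/* Three of the four arguments are constants. Once inlined, the body is
   two compares and no multiply, which may well be smaller than setting
   up four arguments and making the call. */
int to_percent(int x)
{
    return clamp_scale(x, 0, 100, 1);
}

/* All arguments are constants: after inlining, the whole body can be
   folded to a single constant return. */
int max_percent(void)
{
    return clamp_scale(250, 0, 100, 1);
}
```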

Both of these things require some significant infrastructure work.  They
aren't ARM-specific back-end changes.  But, I suspect that they're
important in terms of allowing GCC to take full advantage of ARM CPUs.

-- 
Mark Mitchell
CodeSourcery
m...@codesourcery.com
(650) 331-3385 x713



Re: [ACTIVITY] Weekly status

2010-11-26 Thread Christian Robottom Reis
On Fri, Nov 26, 2010 at 05:57:57PM +, Richard Sandiford wrote:
> * More ARM testing of binutils support for STT_GNU_IFUNC.
> 
> * Implemented the GLIBC support for STT_GNU_IFUNC.  Simple ARM testcases
>   seem to run correctly.

Coincidentally I was reading about this exact feature at

http://www.airs.com/blog/page/4

In the interest of self-education, could you tell me a little bit about
what GLIBC support for ifunc entails -- is it basically infrastructure
to allow libc functions to be defined as ifuncs, now that the assembler
supports it?
-- 
Christian Robottom Reis   | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko



Re: GCC Optimization Brain Storming Session

2010-11-26 Thread Christian Robottom Reis
On Fri, Nov 26, 2010 at 10:40:34AM -0800, Mark Mitchell wrote:
> As an extension of that idea, and tying into the idea of optimizing for
> size, I suspect that we (a) ought to be optimizing for size automatically
> in cold code, and (b) that we're probably not doing a great job of
> optimizing for size.

Somewhat OT, and I'm not sure this has been mentioned before, though I
guess Mark has seen it:

http://embed.cs.utah.edu/embarrassing/

This group basically runs a set of size benchmarks against multiple
compilers. I wonder a) how this would look for a survey of ARM compilers
and b) whether we could get them to run a benchmark for us if we provided
them with hardware (and if not, whether they'd provide us with the code).

Note also that the same group has a tool for generating random C code
with volatiles which we could use for some level of validation:

http://www.cs.utah.edu/~eeide/emsoft08/

I've inlined below a sample program generated with randprog (you'll want
to apt-get install libboost-program-options1.40-dev to build it):

--
/*
 * This is a RANDOMLY GENERATED PROGRAM.
 *
 * Generator: randprog 1.0.0
 * Options:   (none)
 * Seed:  3881625688
 */

#include <stdint.h>
uint16_t context = 0;

#if defined(__AVR_ARCH__)
#  include "platform_avr.h"
#elif defined (__MSP430__)
#  include "platform_msp430.h"
#else
#  include "platform_generic.h"
#endif

#include "random_runtime.h"

/* --- GLOBAL VARIABLES --- */
uint8_t g_4 = -4L;


/* --- FORWARD DECLARATIONS --- */
int32_t func_1(void);


/* --- FUNCTIONS --- */
/* -- */
/*
 * reads : g_4
 * writes:
 */
int32_t func_1(void)
{
    {
        int8_t l_2 = 0L;
        for (l_2 = -1; (l_2 == 0); l_2 -= 0)
        {
            uint16_t l_3 = 1L;
            return l_3;
        }
        return g_4;
    }
}




/*  */
int main(int argc, char *argv[])
{
    platform_main_begin();
    /* Call the first function */
    func_1();
    crcBytes(g_4);
    platform_main_end(context);
    return 0;
}
-- 
Christian Robottom Reis   | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko



Silverbell dchroots

2010-11-26 Thread Christian Robottom Reis
David G. requested a few packages installed on silverbell today (the
quad-A9 VE porter machine we host in the datacenter). We got dchroots
instead:

- Forwarded message from LaMont Jones via RT -

Date: Fri, 26 Nov 2010 21:02:22 +
Subject: [rt.admin.canonical.com #42662] Simple package installs on silverbell

On Fri Nov 26 17:08:28 2010, k...@canonical.com wrote:
> Hi there,
> 
> Could we get installed on silverbell:
> 
> build-essential
> debhelper
> fakeroot
> 
> And could we get deb-src's added to sources.list and do an apt-get
> update to allow us to apt-get source certain packages? This is for
> simple compilation benchmarks. Thanks!

Dchroot environments have been created on silverbell for both maverick
and natty. Within the chroot, you can sudo apt-get install to install
packages. apt-get update/dist-upgrade and installs that cause package
removal will require a GSA to do them.

To build for maverick: dchroot -c maverick (and then build however you want...)

lamont

- End forwarded message -
-- 
Christian Robottom Reis   | [+55] 16 9112 6430 | http://launchpad.net/~kiko
Linaro Engineering VP | [ +1] 612 216 4935 | http://async.com.br/~kiko
