A9 Neon confusion
Hi, I've been looking at some basic libc routine optimisation and have a curious problem with memset; I wondered if anyone can offer some insights. Some graphs and links to code are at https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialMemset

I've written a simple memset in both with-Neon and without-Neon varieties and tested them on a Beagle (C4) and a Panda board. I'm finding that the Neon version is a bit faster than the non-Neon version on the Beagle but a LOT slower on the Panda, and I'd like to understand why - I'm guessing it's some form of cache interaction.

The graphs on that page are all generated by timing a loop that repeatedly memsets the same area of memory; the X axis is the size of the memset. Prior to the test loop the area is read into cache (I came to the conclusion the A8 doesn't write-allocate?). There are two variants of the graphs: an absolute set, in MB/s on the Y axis, and a relative set (below the absolute) that are relative to the performance of the libc routines. (The ones below those pairs are just older versions.)

If you look at the top-left graph on that page you can see that on the Beagle (left) my Neon routine beats my Thumb routine a bit (both beating libc). If you look at the top right you see the Panda performance, with my Thumb code being the fastest and generally following libc, but the Neon code (red line) topping out at about 2.5GB/s, which is substantially below the peak of the libc and ARM code.

The core loop of the Neon code (see the bzr link for the full thing) is:

    4:  subs    r4, r4, #32
        vst2.8  {d0,d1,d2,d3}, [r3:256]!
        bne     4b

while the core of the non-Neon version is:

    4:  subs    r4, r4, #16
        stmia   r3!, {r1,r5,r6,r7}
        bne     4b

I've also tried vst1 and vstm in the Neon loop and it still won't match the non-Neon version. All suggestions welcome; I'd also appreciate it if anyone can suggest which particular limit it's hitting - does anyone have figures for the theoretical bus, L1 and L2 write bandwidths for a Panda (and Beagle)?

Thanks in advance,

Dave
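For illustration, a minimal sketch of the kind of timing loop described above (this is an assumed reconstruction, not the actual harness - that lives in the bzr branch linked from the wiki page):

    #include <stdio.h>
    #include <string.h>
    #include <time.h>

    #define BUF_SIZE (1024 * 1024)
    #define ITERATIONS 10000

    static unsigned char buffer[BUF_SIZE];

    int main(void)
    {
        struct timespec start, end;
        volatile unsigned char sink;
        size_t size = 16384;   /* one point on the X axis */

        /* Read the area into cache before the timed loop. */
        for (size_t i = 0; i < size; i++)
            sink = buffer[i];

        clock_gettime(CLOCK_MONOTONIC, &start);
        for (int i = 0; i < ITERATIONS; i++)
            memset(buffer, 0, size);
        clock_gettime(CLOCK_MONOTONIC, &end);

        double secs = (end.tv_sec - start.tv_sec)
                    + (end.tv_nsec - start.tv_nsec) / 1e9;
        printf("%zu bytes: %.1f MB/s\n",
               size, (double)size * ITERATIONS / secs / 1e6);
        return 0;
    }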
[ACTIVITY] 2010-11-19
Short week. Finally got an external hard drive for my Beagle - makes it sanely possible to natively build things. Got eglibc cross built (thanks to Wookey for pointing me in the right direction with the magic incantation of dpkg-buildpackage -aarmel --target=binary) and it rebuilds easily. I have a version with the Neon variant of my memset built into it - it doesn't seem to make a noticeable difference to my ghostscript benchmark though. Pandas aren't likely to turn up until mid December; arranging to borrow an A9 is turning out to be difficult, but it looks like we should be able to get access to the one in the London datacentre - although it has a disc problem at the moment. I did manage to get a colleague to try my tests on his own Toshiba AC-10 (Tegra 2 - no Neon); the graphs had approximately the same shape as my previous Panda tests. Memchr looked pretty good on there. Also trying to look at the sign-off I need for various libc access.

Dave
Of instruction timings
Hi Richard, As per the discussion at this morning's call, I've reread the TRM and I agree with you about the LSLS being the same speed as the TST (1 cycle). However, as we agreed, the UXTB does look like 2 cycles versus the AND's 1 cycle. On the space vs. perf theme, one thing that would be interesting to know is whether there are any icache/issue-stage limitations; i.e. if I have a stream of 32-bit Thumb-2 instructions that are all listed as 1 cycle and are all in icache, can they be fetched and issued fast enough, or is there a performance advantage to short instructions?

Dave
[ACTIVITY] 2010-11-26
Hand crafted a simple strchr and compared it with libc's: https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr Interestingly, it's significantly faster than libc's on A9s, but on A8s it's slower for large sizes. I've not really looked into why yet; my implementation is just the absolute simplest Thumb-2 version. Did some ltrace profiling to see what typical strchr and strlen sizes were, and got a bit surprised at some of the typical behaviours (lots of cases where strchr is being used in loops to see if another string contains any one of a set of characters, a few cases of strchr being called with NULL strings, and the corner case in the spec that allows you to call strchr with \0 as the character to search for). Trying some other benchmarks (pybench spends very little time in libc; package builds of simple packages seem to have a more interesting mix of libc use). Sorting out some of the red tape for contributing.

Dave
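Two of those behaviours, illustrated (hedged examples, not taken from the traces): the spec says strchr(s, '\0') must return a pointer to the terminating NUL, and the "set of characters" loop is essentially an open-coded strpbrk:

    #include <assert.h>
    #include <string.h>

    int main(void)
    {
        const char *s = "hello";

        /* Corner case: searching for '\0' finds the terminator. */
        assert(strchr(s, '\0') == s + strlen(s));

        /* The pattern seen in the traces: strchr in a loop to test
         * whether s contains any one of a set of characters - which
         * is exactly what strpbrk() does directly. */
        const char *set = "aeiou";
        int found = 0;
        for (const char *p = set; *p; p++) {
            if (strchr(s, *p)) {
                found = 1;
                break;
            }
        }
        assert(found == (strpbrk(s, set) != NULL));
        return 0;
    }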
Re: GCC Optimization Brain Storming Session
On 29 November 2010 00:18, Michael Hope wrote:
> To add to the mix, some ideas that are logged as blueprints:
> Using ARMv5 saturated instructions
> (https://blueprints.launchpad.net/gcc-linaro/+spec/armv5-saturated-ops)
> Using ARMv6 SIMD instructions
> (https://blueprints.launchpad.net/gcc-linaro/+spec/armv6-simd)

Those are quite nice instructions; certainly they seem useful for string ops of various types if misused creatively.

> Using ARMv7 unaligned accesses
> (https://blueprints.launchpad.net/gcc-linaro/+spec/unaligned-accesses)
> Changing the built-in memcpy to use unaligned
> (https://blueprints.launchpad.net/gcc-linaro/+spec/unaligned-memcpy)

The interesting challenge here is figuring out how expensive unaligned accesses are and whether the cost trade-offs are the same on different chips.

> The following areas have been suggested. I don't know if they're still valid:
>
> Register allocator: The register allocator is designed around the
> needs of architectures with a low register count and restrictive
> register classes. The ARM architecture has many general purpose
> registers. Different assumptions may give better code.
>
> Conditional instructions: The ARM and, to a lesser extent, Thumb-2
> ISAs allow conditional execution of instructions. This can be used in
> many situations to eliminate an expensive branch. The middle end
> expands and transforms branches. The ARM backend tries to recombine
> the RTL back into conditional instructions, but often can't due to the
> middle end transforms.

GCC is quite creative in avoiding branches by doing lots of masking and logic; it'll be interesting to see how much there is to gain here.

Dave
[ACTIVITY] 2010-12-03
* Benchmarking of simple package builds with various string routine versions; not finding enough difference above the noise to draw any large conclusions
* Looking at the string routine behaviour with perf to see where the time is going
- getting hit by the Linaro kernels on silverbell missing perf enablement in the config
- a useful amount of time does seem to be spent outside the main 'fast aligned' chunks of code
- pushing/popping registers does seem to be pretty expensive
* Started looking at libffi and hard float
- Started writing a spec: https://wiki.linaro.org/WorkingGroups/ToolChain/Specs/LibFFI-variadic
- It's going to need an API change to libffi, although the change shouldn't break any existing code on the platforms where it currently works.
* Helping with the image testing

Dave
Hard float chroot
Hi, As mentioned on the standup, I just got an armhf chroot going; thanks to markos for pointing me at multistrap. I put the following in armhfmultistrap.conf and ran:

    multistrap -f armhfmultistrap.conf

Once that's done, chroot in and then do:

    dpkg --configure -a

It's pretty sparse in there, but it's enough to get going.

Dave

==
[General]
arch=armhf
directory=/discs/more/armhf
cleanup=true
noauth=true
unpack=true
explicitsuite=false
aptsources=unstable unreleased
bootstrap=unstable unreleased

[unstable]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unstable
omitdebsrc=true

[unreleased]
packages=
source=http://ftp.de.debian.org/debian-ports/
keyring=debian-archive-keyring
suite=unreleased
omitdebsrc=true
Silverbell
Hi, Those of you who use silverbell may be glad to know it's back up. Be a little careful: if you shovel large amounts of stuff over its network, the network tends to disappear. (Not sure if this is hardware or driver.)

Dave
[ACTIVITY] 2010-12-09
Mostly more work on libffi; swapping some ideas back and forth with Marcus Shawcroft, and it looks like we have a good way forward. Got an armhf chroot going and libffi built; got a testcase failing as expected. Trying to look at other processors' ABIs to understand why varargs works for everyone else. Cut through one layer of red tape; can now do the next level of comparison in the string routine work. Started looking at SPEC; hit problems with network stability on VExpress (turns out to be bug 673820). long long weekend; short weeks = 2; Back in on Tuesday.

Dave
Profile guided and string routines?
Does anyone have any experience of what can be profiled by the profile guided optimisations? One of the problems with some of the string routines is that you can write pretty neat, fast routines that work well for long strings - but most calls actually pass short strings, and the overhead of the fast routine means that in most cases you're slower than you would have been with a simple routine. If profile guiding could spot that a particular callsite to, say, strlen() was often associated with strings of at least 'n' characters, we could call a different implementation.

Dave
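As a hypothetical sketch of the idea (the routine names and the callsite are made up; the word-at-a-time loop uses the classic has-zero-byte test):

    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>
    #include <stdio.h>

    /* Low-overhead byte loop: wins on short strings. */
    static size_t strlen_simple(const char *s)
    {
        const char *p = s;
        while (*p)
            p++;
        return (size_t)(p - s);
    }

    /* Word-at-a-time loop with more startup cost: wins on long strings. */
    static size_t strlen_fast(const char *s)
    {
        const char *p = s;
        while ((uintptr_t)p & 3) {      /* align to 4 bytes first */
            if (!*p)
                return (size_t)(p - s);
            p++;
        }
        const uint32_t *w = (const uint32_t *)p;
        while (!((*w - 0x01010101u) & ~*w & 0x80808080u))
            w++;                        /* no zero byte in this word */
        p = (const char *)w;
        while (*p)
            p++;
        return (size_t)(p - s);
    }

    int main(void)
    {
        /* The PGO idea: if profiling showed this callsite usually sees
         * strings of at least 'n' characters, direct it to the fast
         * version; short-string callsites keep the simple one. */
        const char *line = "a line long enough for the word loop to pay off";
        assert(strlen_simple(line) == strlen_fast(line));
        printf("%zu\n", strlen_fast(line));
        return 0;
    }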
[ACTIVITY] 2010-12-17
Got SPEC2006 building on Silverbell (VExpress) and Canis1 (Orion). There are still some issues; the builds are still going (6 hours so far on a 1GHz A9 for a build and 'test' case), and the Silverbell one has hit an ICE on one of the tests that looks like bug 635409, and also looks like it needs some help getting Perl to work. The build on Canis has only just started, but hasn't got Fortran installed. (The SPEC2006 tools build also failed in the Perl testsuite on sprintf.t and sprintf2.t, which seem to test integer overflow cases in sprintf % fields.) Added a few of the kernel string/memory routines and bionic routines into my string/memory graphs, and also ran the tests on the Orion board (similar to other A9 performance - no surprise). Wrote up a draft of an email to libffi-dev describing the varargs state; as I was doing it I realised that one of the approaches didn't quite work and was messier than I'd thought. Using rdepends to find all packages using ffi; need to figure out if any actually care about varargs.

Dave
[ACTIVITY] 2010-12-23
Continued looking at SPEC 2006. The two ICEs I mentioned last week are gone in the Natty version of the compiler; however the 4 programs that run and give wrong results still fail with the Natty version and the latest version from bzr. The 4 failures are:

h264ref - still fails on bzr 99447 with -O2 or -O0
sphinx3 - still fails on bzr 99447 with -O2 or -O0
gromacs - still fails on bzr 99447 with -O2 but works with -O1; I've followed this through and detailed it in bug 693502; it looks to me like a post-increment gone wrong (it's split, so it's not actually a post-increment, and the original rather than the post-incremented value gets used)
zeusmp - this fails to load the binary; it's got a >1GB bss section. Interestingly it gets further on my Beagle, which has less memory but a bit of swap, even though I think it's not really using all of the BSS in the config I'm using.

I'm hoping to leave a 'ref' run going over the new year. The canis1 Orion board I was also running SPEC on last weekend died during the run and hasn't come back.

perf: We now have silverberry using the -proposed kernel, which has the fixed PERF_EVENT config, and perf seems to work fine.

libffi: I've started building the page https://wiki.linaro.org/WorkingGroups/ToolChain/FFIusers listing things that use FFI (generated by a bit of apt wrangling). There are basically 3 sets: a) apps that just use ffi for something specific; b) languages that then give the users of those languages varying degrees of freedom; c) Haskell - while some of the packages are probably genuine ffi users, I think a lot of these are false dependencies; almost every Haskell package seems to gain a dependency on libffi directly.

I'm back on the 4th January.

Dave
[ACTIVITY] 2011-01-07
Got h264ref working in SPEC; this was another signed-char issue (a very difficult one to find - it didn't crash, it just came out with a subtly wrong result); compiling that and sphinx3 with -fsigned-char makes them happy. With Richard's fix for gromacs, that leaves just the zeusmp binary that's too large to run on silverberry; it seems to start up on canis1 (which has more RAM). So that should be a full set; I just need to get the Fortran stuff going on canis1. Kicked off a discussion on libffi-discuss about variadic calls; people seem OK with the idea of adding it (although unsure exactly how many things really use it - even though there are examples in the Python documentation). Also kicked off some of the internal paperwork to contribute code for it.

Dave
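For background, an illustrative example (not the actual SPEC code): plain char is unsigned by default on ARM, so code written assuming x86's signed char can quietly compute different values - exactly the kind of thing -fsigned-char papers over:

    #include <stdio.h>

    int main(void)
    {
        char c = 0xFF;   /* -1 if char is signed, 255 if unsigned */

        if (c < 0)
            printf("char is signed: %d\n", c);
        else
            printf("char is unsigned: %d\n", c);

        /* The nasty version: a char used as a signed offset or table
         * index computes 255 instead of -1 on ARM, so you get subtly
         * wrong output rather than a crash. */
        int offset = c;
        printf("offset = %d\n", offset);
        return 0;
    }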
[ACTIVITY] 2011-01-14
Got a complete run of SPEC train on the Orion board - all working. Kicked off a SPEC ref run - canis1 died. Gathered a full set of 'perf record's for all of SPEC on silverbell and had a quick look through them; there aren't too many surprises, though there are a few things that might be worth a look. (Not as much time in libc functions as I hoped.) There are some odd bits - chunks of samples landing apparently outside libraries, where it isn't obvious what's going on. Sent a tentative patch for the Thumb perf annotate issue (bug 677547) to lkml for comments. Started on the libffi variadic fixing. Caught the qemu pbx-a9 testing from PM; got qemu built and am getting a handful of lines of output from both a kernel from arm.com's site and a linaro-2.6.37 that I built for it.

Dave
[ACTIVITY] 2011-01-21
= A9 QEmu =
I've spent most of the week looking at QEmu emulation of an SMP A9. The model (of a realview-pbx-a9) doesn't have any working block IO; I spent some time trying to get SD working and got part way, but fell back to using an NFS root. It seems to work OK for basic CPU emulation, and SMP 'works' in the sense that the guest sees multiple CPUs; however QEmu is restricted to using only one host CPU core for multiple guest CPUs, so it's of limited help in debugging SMP code. Video doesn't seem to work either. To get to that point it does need a bunch of patches to QEmu, most of which Peter Maydell already knows of; I've put some notes here: https://wiki.linaro.org/Internal/People/DaveGilbert/QEMUA9SMP Note that the realview-pbx doesn't currently have a Linaro hardware pack; I used a kernel from ARM's website and a 2.6.37 I built myself.

= SPEC =
SPEC ref got quite a long way through on Canis (with half-duplex ether); however 'lbm' failed when running in ref mode (while having worked in test and train), giving different output; it takes quite a long time to fail.

= Perf =
I sent an updated version of my patch for perf's Thumb annotation upstream. And apparently we have a Panda on the way.

Dave
Re: Neon Support in kernel
On 26 January 2011 12:12, Dave Martin wrote:
> Hi Vijay,
>
> On Sat, Jan 22, 2011 at 9:59 AM, Vijay Kilari wrote:
>> Hello Dave,
>>
>> Thanks for this info.
>>
>> I have few more queries after looking at the results of memset on A9 & A8.
>> I agree that externel bus speed matters in comparision across platforms.
>>
>> 1) Why memset is performance is good on A8 than A9?. any justification?
>
> I've CC'd the linaro-toolchain list who have been working on this
> topic and may be able to provide you with more information.

Unfortunately we don't know why Neon was a bad idea for memset etc. on the A9; the tests just show it being worse, and the advice we get says to avoid it - we simply don't have an explanation. The test code is trivially simple for the cases I tried.

Dave
[ACTIVITY] 2011-01-28
SPEC: Tried to track down what was going on with lbm; it doesn't seem to be repeatable on canis1. I'd previously seen it fail at -O1 and work at -O0, and tried to chop down the flags between the two; but after adding all the flags back in on top of -O0 it still worked, and then I tried -O1 again and it worked. Going to try another machine, but it might be uninitialised data somewhere.

Panda: Our Panda arrived; it's now happily nestling near our Beagles and running the 0126 headless snapshot (with the 0127 hwpack). It seems fine except for rather slow USB and SD IO. Tip: Pandas do absolutely nothing (no LEDs, no serial console activity) unless you put in an SD card with the firmware on it.

Libffi: Wrote the changes for armhf. Tested on arm, armhf, i386, ppc and s390x - all happy. (Not too surprisingly, variadic calls just work on everything other than armhf without the API change.) Mailed the Python ctypes list asking how much of a pain the API change will be and for any hints on what might be affected. Awaiting sign-off for submission of the code.

Optimised library routines: Looked at benchmarking 'git'; I'd seen previous discussions where it had been pointed out that it spends a lot of time in library routines, and indeed it does spend useful amounts in memchr, memcpy and friends; a simple git diff v2.6.36 v2.6.37 > /dev/null of the current kernel tree produces a useful ~25 second run. One interesting observation concerns the variation in the times reported by 'time' - i.e. user, system and real: the variation in user+system is much less than in either user or system individually, and is quite stable (within ~0.7% over 10 runs). I've just tried preloading my memchr routine in, and it does get a consistent 1-1.2% improvement, which does look above the noise. Also asked on the libc-help list for suggestions of other benchmarks people actually trust to reflect useful performance increases in core routines, as opposed to totally artificial ones.

Dave
IT block semantic question
Hi, What do people understand to be the expected semantics of IT blocks in the cases below, about which there has been some confusion in relation to a recent Qt issue? The code in question had a sequence something like:

    comparison
    IT... EQ
    blahEQ
    TEQ
    BEQ

The important bits here are that we have an IT EQ block and two special cases:

1) There is a TEQ in the IT block - are all comparisons in the block allowed, and do their effects take effect immediately? As far as I can tell this is allowed and any flag changes are used straight away;

2) There is a BEQ at the end of the IT block; as far as I can tell, as long as the destination of the BEQ is close, it shouldn't make any difference whether the BEQ is included in the IT block or not.

Does that match everyone else's understanding?

Dave
Re: IT block semantic question
On 2 February 2011 10:47, Dave Martin wrote:
> On Tue, Feb 1, 2011 at 12:33 PM, David Gilbert wrote:
>> Hi,
>> What do people understand to be the expected semantics of IT blocks
>> in the cases below, of which there has been some confusion
>> in relation to a recent Qt issue.
>>
>> The code in question had a sequence something like:
>>
>> comparison
>> IT... EQ
>> blahEQ
>> TEQ
>> BEQ
>>
>> The important bits here are that we have an IT EQ block and two special
>> cases:
>>
>> 1) There is a TEQ in the IT block - are all comparisons in the block
>> allowed and do their effects immediately take effect? As far as I can
>> tell this is allowed and any flag changes are used straight away;
>
> Yes; yes; and: you're right. This was a specific intention, since
> there was always a common idiom on ARM of sequences like this:
>
> CMP r0, #1
> CMPEQ r1, #2
> CMPEQ r2, #3
> BEQ ...
>
> with the effect of "if(r0==1 && r1==2 && r2==3) ..."
>
>> 2) There is a BEQ at the end of the IT block, as far as I can tell,
>> as long as the destination of the BEQ is close it shouldn't
>> make any difference if the BEQ is included in the IT block or not.
>
> Again, I believe you're right. The assembler will generate different
> code, because the explicit conditional branch encodings are not
> allowed in IT blocks. But the assembler takes care of this for you:
>
> :
> 0: d001 beq.n 6
>
> versus
>
> 2: bf08 it eq
> 4: e7ff beq.n 6
>
> 0006 :
>
> Both snippets are equivalent, though as you say, with IT you can
> insert more code between the branch and its destination before the
> assembler will barf with a fixup overflow, because the unconditional
> branch encoding (e000..e7ff) has more bits to express the branch
> offset.

Thanks for the confirmation Dave.

Dave
Re: do we consider extravagant memory usage a gcc bug?
On 2 February 2011 12:28, Peter Maydell wrote:
> On 2 February 2011 11:54, Peter Maydell wrote:
>> ie gcc wants (at least) 100M of RAM trying to compile a 190K sourcefile.
>> (and probably more overall since the board has 500MB RAM total and
>> it hit the out-of-memory condition).
>
> On a rerun which hit the kernel OOM-killer the kernel said:
> Killed process 5362 (cc1) vsz:480472kB, anon-rss:469676kB, file-rss:88kB
>
> so gcc's claim that it only wanted 100MB is underreading rather.

480MB does appear excessive; to be a little fair to gcc, that file does look like it's trying to build itself as a vast inlined set of switch statements, so it will be stressing the compiler. Is this 480MB much more than on x86, or than with older versions of the compiler?

Dave (resend, remembering to hit reply-all)
[ACTIVITY] 2011-02-04
== String routines ==
* After some discussions about IT semantics, managed to shave a couple of instructions out of a couple of routines
* Got around to trying a suggestion made some months ago, that LDM is faster than LDRD on A9s; and indeed it does seem to be in some cases, though those cases seem pretty hard to define - LDM is no slower than LDRD, so it seems best to avoid LDRD.
* Digging around eglibc's build/configure system to see how to add assembler routines that only get used under certain build conditions (i.e. v7 and up)

== SPEC ==
* Compiled lbm at -O2 and ran it on our local Panda and on Michael's ursa1 - it seems happy (with a drop of swap); so I'd say that confirms the issues I previously had were local to something on canis. That's a bit of a pain, since canis is the only machine with enough RAM to run the rest of the suite.

== Other ==
* Tested a headless Alpha-2 install on our Beagle C4 - mostly worked
* Tested the qemu-linaro release on the realview-pbx kernel/NFS setup I had
* A simple smoke test for pldw on qemu
* Tripped over ltrace not working while trying to profile git's use of memcpy and memcmp; git does some _very_ odd things; its predominant memcpy size seems to be 1 byte.
[ACTIVITY] 2011-02-11
== String routines ==
* Copied an improvement I'd previously made to memchr (removing a branch using a big IT block) across to strlen
* Modified the benchmark setup to build everything as a library, so everything fairly gets a PLT overhead.
* Pushed the optimised memchr and strlen and the simple strchr into the cortex-strings bzr repo
* Patched eglibc to use the memchr and strchr code - although currently fighting to get an appropriate .changes file

== ffi ==
* Kicked off the TSC request for license permissions

== bugs ==
* Built and recreated the qt4-x11 bug, produced all the dumps and boiled it down to a few lines of suspicious RTL for Richard.

** Away next week.
[ACTIVITY] 2011-02-25
== ffi ==
* Sent the variadic patch for libffi to libffi-discuss
* Worked through some suggestions from Chung-Lin; need to do some rework

== string routines ==
* memchr & strchr patch sent for inclusion in the Ubuntu packages
* Tried sqlite's benchmarks - they don't spend too much time in the C library, although a few % in memcpy and ~1% in memset (also seem to have found an sqlite test case failure on ARM, filed as bug 725052)

== porting jam ==
* There wasn't much traffic on #linaro related to the jam during this
* I closed bug 635850 (fastdep FTBFS), which was already fixed with an explicit fix for ARM in the changelog, and bug 492336 (eglibc's tst-eintr1 failing), which seems to work now, though it's not clear when it was fixed.
* Looking at eglibc's test log, there seem to be a bunch of others that are failing and may well be worth investigating.
* Bug 372121 (qemu/xargs stack/number-of-arguments limit) seems to work OK; however the reporter did say it was quite a fragile test; that needs more investigation to see whether the original cause has actually been fixed.

== misc ==
* Swapping notes with Peter on the PBX SD card investigation

Dave
Re: Getting linaro toolchain binaries
On 2 March 2011 18:44, John Rigby wrote:
> FWIW, Michael's recipe for building here
> https://code.launchpad.net/~michaelh1/+junk/cross-build has worked
> well for me.

It's OK for people familiar with toolchains; the problem is where you have a subject specialist who knows how to write C or ARM assembler but really doesn't know anything about toolchains or wrangling build tools; they just need something that works out of the box.

Dave
[ACTIVITY] 2011-03-04
* Investigated and fixed the sqlite3 testsuite failure on ARM (bug 725052)
* Discussing libffi API changes with the maintainer; hopefully he's going to send out his comments today.
* Looking at how to upstream the string routine changes
* Need to look at big endian testing
* Testing a QEmu pre-release for Peter; looking very nice.

Dave
[ACTIVITY] 2011-03-10
== hard-float ==
* Updated the libffi variadic patch and sent it to the ffi mailing list.

== String routines ==
* Got a big endian build environment going
* Patched up memchr and strlen for big endian; it turned out to be a very small change in the end. Tested on qemu-armeb - note that it didn't work on an older version, but did on a newer one; I'll assume the newer one is correct.
* Fixed a couple of build issues in the cortex-strings test harness

== Other ==
* Kicked off a SPEC2006 train run on canis using the 2011.03 compilers

I'm on holiday tomorrow (Friday) and Monday.

Dave
[ACTIVITY] 2011-03-17
Short week.
* libffi patch accepted upstream
* eglibc integration of the string routine changes - I have something that works, but it's more complex than I'd like (to get it to fall back to the C code for the cases I haven't optimised).
* Trying a Neon memchr; tried a really simple 8-bytes-per-loop version - it's quite slow on both A8 and A9; branching on the result of comparisons done in Neon is not simple.
* Porting jam: bug 735877, chromium using d32 float; it was passing vfpv3 rather than using the default when configured without Neon.

On holiday tomorrow (Friday).

Dave
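A sketch of the branching problem in C with NEON intrinsics (hedged - the version I tried is assembler; this just shows the shape): to branch on a NEON comparison, the result has to come back to a core register, and that NEON-to-ARM transfer is where the time goes:

    #include <arm_neon.h>
    #include <stddef.h>
    #include <string.h>

    /* Compare 16 bytes at a time, then branch on whether anything
     * matched. Build with -mfpu=neon. */
    void *memchr_neon_sketch(const void *src, int c, size_t n)
    {
        const unsigned char *p = src;
        uint8x16_t vc = vdupq_n_u8((unsigned char)c);

        while (n >= 16) {
            uint8x16_t eq = vceqq_u8(vld1q_u8(p), vc);

            /* The awkward bit: moving the comparison result out of
             * NEON so the ARM side can branch on it. */
            uint8x8_t any = vorr_u8(vget_low_u8(eq), vget_high_u8(eq));
            if (vget_lane_u64(vreinterpret_u64_u8(any), 0)) {
                for (int i = 0; i < 16; i++)      /* find the byte */
                    if (p[i] == (unsigned char)c)
                        return (void *)(p + i);
            }
            p += 16;
            n -= 16;
        }
        return memchr(p, c, n);   /* handle the tail bytewise */
    }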
[ACTIVITY] 2011-03-25
== String routines ==
* Wrote a Thumb optimised strchr - as expected it's got nice performance for longer runs, but at sizes <16 bytes it's slower, and a lot of strchr calls are very short, so it's probably not of benefit in most cases ( https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr?action=AttachFile&do=get&target=panda-01-strchr-git44154ec-strchr-abs.png )
* Wrote a Neon memcpy - as previously found with memset, it performs well on A8 but poorly on A9. It does, however, handle the case where the source/destination isn't aligned quite well, even on A9; the vld1 unaligned case works with relatively little penalty. (It performs comparably to the Bionic implementation - mine is a bit faster on shorter calls, Bionic is better on longer ones; I think that's because they've made some careful use of preloads where I have so far got none.)

I'm on holiday up to and including 5th April.

Dave
Tegra2 errata bug
https://bugs.launchpad.net/ubuntu/+source/eglibc/+bug/739374 is the bug relating to the Tegra errata mentioned today.

Dave
[ACTIVITY] 2011-04-08
- Back from holiday, short week.

== Porting jam ==
* We seem to have picked up a lot of FTBFSes in the last couple of weeks - which is unfortunate because it may well be too close to the Natty release to do anything about them
* Bug 745843 is a repeatable segfault in part of the build process of a package called vtk that is used by a few other things; I've got this down to a particular call of one function - although gdb is getting rather confused (r0 & r1 changing as I single-step across a branch)
* Bug 745861: petsc build failure; I'm getting one of two different link errors depending on which mood it's in - MPI related?
* Bug 745873 - a meta package that just didn't have a list of packages to build for armel; easy to do a simple fix (provided a branch that built), but the maintainer says it's too late for Natty anyway and some more thought is needed.

== Other ==
* Reading over some optimisation documents
* Tested the weekly release on the Beagle C4 (still no OTG USB and hence no networking for me)
* Also a simple boot test on the Panda; not much time for a more thorough test (seems to work).

Dave
[ACTIVITY] 2011-04-15
== Bug triaging ==
* Bug 745843 (vtk ftbfs): got it down to a bad ARM/Thumb transition - identified as a linker error and handed off to RichardS
* Bug 758082 (augeas ftbfs): tracked it down to the overwrite of a parameter in a variadic function before it got stacked; identified by Ramana as another instance of the shrink-wrap bug.
* Bug 745861 (petsc ftbfs): isolated the collection of different MPI related problems this is hitting; really need to find an MPI expert for this
* Bugs 745863 & 745891 (ftbfs's) - both were compilations that timed out; verified this was due to using lots of RAM, and that they also use lots of RAM on x86 (> ~500MB) - marked as invalid until the build farm grows more RAM
* Bug 757427: gconf segfault - failed to reproduce under various tests (although Michael has now managed to catch it in the act)

== Optimisation ==
* Neon memcpy tweaking: added prefetches and unrolled the core loop - now comparable perf to the Bionic memcpy in most cases (slower on misaligned destinations, faster in the other cases)
* Tweaked latrace to print the address/length of argument strings so I can get some stats on routine usage.

Dave
[ACTIVITY] 2011-04-21
== String and Memory routines ==
* Profiled denbench with perf and produced a set of stats showing which programs spent how much time in libc and how much time was spent in each routine. While some of the benchmarks are good (like aes) and spend almost no time in libc, some of the others (the MPEG codecs especially) seem to spend significant time in libc.
* Ran all of denbench through latrace to generate sets of library calls; post-processed them to extract the section between the clock() calls (and hence in the timed portion) and analysed the hot library calls. I've looked at some of the output but not all of it yet; I get output like:

    Memcpy stats (dst align/src align/length/number of occurrences/total size copied)
    memcpy: 0,0,1      , 1588520, 1588520
    memcpy: 16,28,4096 ,       1, 4096
    memcpy: 4,20,16384 ,     855, 14008320

This shows that a bunch of tests do an inordinate number of 1-byte memcpy's, and a few hundred larger memcpy's with a destination address %32 of 4 (and a source %32 of 20) - so not aligned, but at least equally misaligned.
* Started writing up a report on some of the stats
* Also started to try to extract the same stuff from SPEC2k6

== QEMU ==
* Tested Peter's QEmu release earlier in the week (on Lucid, so didn't hit his Natty bug)
* Wrote up a couple of specs (one for TrustZone and the other for Device Tree integration)
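One hedged implication of counts like these (a sketch, not one of the actual routines): with 1-byte calls dominating by count, the entry overhead matters more than the big aligned loop, so a memcpy wants a cheap small-size path before it pays for any alignment analysis:

    #include <stddef.h>

    void *memcpy_sketch(void *dst, const void *src, size_t n)
    {
        unsigned char *d = dst;
        const unsigned char *s = src;

        /* Tiny path first: a plain byte loop, no alignment checks,
         * because traces like the above show these calls dominate. */
        if (n < 4) {
            while (n--)
                *d++ = *s++;
            return dst;
        }

        /* The alignment analysis and wide-transfer main loop would
         * only start here; a byte loop stands in for it. */
        while (n--)
            *d++ = *s++;
        return dst;
    }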
[ACTIVITY] 2011-05-06
== Bug fighting ==
* Tracked bug 774175 (apt segfault on armel on Oneiric) down to the Cortex-A8 branch erratum bug that we found as part of the bug jam a few weeks ago (affecting the more obscure vtk package) - Richard's existing binutils fix should fix this.

== String routines ==
* Struggled to get 'perf' to produce sane results from profiling SPEC; some of the samples are obviously being associated with the wrong process somewhere along the line (e.g. it's showing significant samples in the sh process, but in a library that's used by the actual benchmark).
* latrace on SPEC still running on ursa2
* Wrote a non-Neon memcpy; as expected, its aligned performance is very similar to libc/kernel - it's a bit faster in some places but slower in some odd ones (e.g. n*32+1 bytes is a lot slower for some reason). It's also really bad in the misaligned cases; I tried to take advantage of v7's ability to do misaligned loads - but they really are quite slow.

Dave
Re: [ACTIVITY] 2011-05-06
On 8 May 2011 13:55, Hakehuang wrote:
> Can there be something using pragma option to disable neon for each function?

I don't think there is a pragma like that for ARM at the moment; GCC does seem to have a #pragma GCC target and also function attributes for target options, but at the moment these are documented as only being usable on x86 (where they are used to turn things like SSE on and off). What is your use case?

Dave
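For reference, the x86 form looks something like this (GCC-specific, and at the time of writing the ARM backend doesn't accept the equivalent):

    /* Per-function target options - x86 only at the time of writing. */
    __attribute__((target("no-sse3")))
    void scalar_version(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;
    }

    #pragma GCC target("sse4.1")
    void vector_version(float *dst, const float *src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;
    }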
[ACTIVITY] 2011-05-13
== String routines ==
* Gave up on perf on silverbell and redid it on ursa2; I now have a full set of perf figures and have updated the workload report to show the SPEC binaries that spend significant time in libc and the routines they spend it in; a handful of tests spend very significant amounts of time in libm.
* Have ltrace results from about 75% of SPEC - some of the others are fighting back a bit
* Optimised the non-Neon memcpy; it's now quite respectable except in one or two cases (2 bytes misaligned; and, for some odd reason, source offset by 8 bytes and destination by 12 is way down on any other combination) (Current result graphs here: https://wiki.linaro.org/Internal/People/DaveGilbert?action=AttachFile&do=get&target=results-2011-05-13-panda-69321a21.pdf )

Dave
[ACTIVITY] 2011-05-20
* Still profiling SPEC 2k6; about 3/4 of the latrace files are generated, but some of them take hand-holding (e.g. one benchmark makes millions of calls to a library function we're not interested in, generating a huge log, so that function needs excluding).
* Working through the ones I have with analysis scripts and writing the interesting things up.
* Submitted an ARM test suite fix for latrace (an unsigned char-ism)
* Verified that Richard's binutils fix in natty-proposed fixed the vtk FTBFS
* Blueprint for 64-bit sync primitives.

Dave
[ACTIVITY] 2011-05-27
== String routines ==
* Finally finished the ltrace analysis of the whole of SPEC 2k6 and have written it up - I'll proofread it next week and then send it out to the benchmark list.
* Ran memset and memcpy benchmarks at larger-than-cache sizes on A9
* memcpy at larger-than-cache sizes (or probably mainly cache-miss data) does come back to Neon winning over ARM; my suspicion is that with cache hits we run out of bandwidth with Neon, but that doesn't happen in the cache-miss case; why it's faster in that case I'm not sure yet.
* memset is still not faster with Neon, even at large sizes where the destination isn't in the cache.

== Other ==
* Started looking at 64-bit atomics
* Looking at the pot of QEmu work with Peter.

Dave
[ACTIVITY] 2011-06-03
== String routines ==
* Wrote a hybrid ARM/Neon memcpy - it uses Neon for non-aligned cases or large (>=128k) ones
* Polished up and sent out the write-up of the workload analysis of denbench and SPEC
* Ran denbench with all the memcpy and memset variants and graphed the results
- SPEC 2k6 is now cooking with the memcpy set - it'll take all weekend.

== 64 bit atomics ==
* Started looking through the existing non-64-bit atomic code in the GCC backend; I need to understand how registers work in DI mode and what's going to be needed in terms of temporaries.

Dave
[ACTIVITY] 2011-06-10
== String Routines ==
* Completed gathering the SPEC2k6 memcpy results, graphed them, sent them out
* Gathered the SPEC2k6 memset results, graphed them, sent them out

== 64bit Atomics ==
* Modified the gcc backend to do 64-bit atomic ops - the code looks good, but I've not done much testing yet.

== Other ==
* Upstreamed a small ltrace patch

Next week: the plan is to get the gcc tests done and attack libgcc for the pre-v7 fallbacks (the tricky bit there is deciding at runtime what to use). Also run SPEC and denbench for strlen and some other string routines.
Re: Interesting failure of vasprintf test from postler
On 16 June 2011 19:37, Marcin Juszkiewicz wrote:
> On czw, 2011-06-16 at 20:31 +0200, Marcin Juszkiewicz wrote:
>> There is a ftbfs on armel bug for postler:
>> https://bugs.launchpad.net/ubuntu/+source/postler/+bug/791319
>>
>> Attached test compiles fine on amd64 but fails on armel:
>>
>> 20:16 hrw@malenstwo:postler-0.1.1$ gcc _build_/.conf_check_0/test.c
>> _build_/.conf_check_0/test.c: In function ‘main’:
>> _build_/.conf_check_0/test.c:5:1: error: incompatible type for argument
>> 3 of ‘vasprintf’
>> /usr/include/stdio.h:396:12: note: expected ‘__gnuc_va_list’ but
>> argument is of type ‘char *’
>>
>> Can someone explain me why this happens?
>>
>> Ubuntu armel and armhf cross compilers, CSL 2011.03-42 have same
>> problem.
>
> Compiled fine with gcc 4.3 from Poky toolchain. Same with Emdebian
> gcc-4.3 cross compiler.

It looks more to me like the old compilers are the ones with the problem and the new one is correctly throwing an error - vasprintf is supposed to take a va_list there; why would it take a string?

Dave
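For reference, the shape a correct caller has to have (vasprintf is a GNU extension; it consumes a va_list, so callers taking '...' must convert with va_start):

    #define _GNU_SOURCE
    #include <stdarg.h>
    #include <stdio.h>
    #include <stdlib.h>

    /* Wrap the '...' into a va_list; passing a char* as the third
     * argument is what the failing test above was doing. */
    static char *xasprintf(const char *fmt, ...)
    {
        char *out = NULL;
        va_list ap;

        va_start(ap, fmt);
        if (vasprintf(&out, fmt, ap) < 0)
            out = NULL;
        va_end(ap);
        return out;
    }

    int main(void)
    {
        char *s = xasprintf("%s-%d", "test", 42);
        puts(s ? s : "(allocation failed)");
        free(s);
        return 0;
    }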
[ACTIVITY] 2011-06-17
== 64 bit Atomics ==
* Wrote more test cases; I now have a nice 3-thread test that passes - and, more importantly, fails if I replace one of the atomic ops with a non-atomic equivalent.
* Modified the existing atomic helper code in libgcc to do 64-bit
* Added an init function to the 64-bit atomic helper to detect the presence of a new enough kernel and fail if an old one is present. That last one is a bit of a pain: it now correctly exits and aborts on existing kernels, but qemu user mode segfaults because the access to the kernel helper version address is uncaught. So the first thing I need to do is try the early kernel patch Nicolas sent around, and then I really need to see if qemu can be firmly persuaded to run it.

== String routines ==
* Ran denbench with the sets of strlen; started running some SPEC as well.

== QEmu ==
* Tested Peter's pre-release tarball in user mode and a bunch of system emulations - successfully managed to say hello to #linaro from an emulated Overo board using a USB keyboard.

== Other ==
* Booked the 4th July week off.

Dave
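A minimal sketch of that kind of test (hedged - the real test cases travel with the patch set; this one uses the __sync builtins the patches wire up for 64-bit types):

    #include <pthread.h>
    #include <stdio.h>

    #define NTHREADS   3
    #define PER_THREAD 1000000

    static long long counter;   /* 64-bit: ldrexd/strexd territory on ARM */

    static void *worker(void *arg)
    {
        (void)arg;
        for (int i = 0; i < PER_THREAD; i++)
            __sync_add_and_fetch(&counter, 1);
        /* Replacing the atomic op above with a plain 'counter++'
         * makes the final check fail, which shows the test really is
         * exercising the atomic path. */
        return NULL;
    }

    int main(void)
    {
        pthread_t t[NTHREADS];
        long long expected = (long long)NTHREADS * PER_THREAD;

        for (int i = 0; i < NTHREADS; i++)
            pthread_create(&t[i], NULL, worker, NULL);
        for (int i = 0; i < NTHREADS; i++)
            pthread_join(t[i], NULL);

        printf("%lld (expected %lld)\n", counter, expected);
        return counter == expected ? 0 : 1;
    }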
[ACTIVITY] 2011-06-24
== Atomics ==
* Testing the libgcc fallback code with Nicolas's kernel patch - and then fixing my initialisation code to use init_arrays (thanks Richard for the hint)
* Tidying stuff up after a review of my patch by Richard - the sync.md is now smaller than the original was before I started.
* Discussing sync semantics with Michael Edwards - he's spotted that the gcc ARM sync routines need to move their final memory barrier for the compare-exchange case where the compare fails.
* Looking at valgrind; it looks like it should be OK with the commpage changes, but it doesn't currently support ldrexd and strexd; there is a patch for ARM mode but nothing for Thumb yet.

Dave
[ACTIVITY] 2011-07-01
== 64 bit atomics ==
* Submitted patches to the gcc patch list - one comment back already, asking if we should really change ARM to have a VDSO to make checks of the user helper version easier
* Added Thumb ldrexd/strexd to valgrind; patch added to the bug in their bugtracker (KDE bug 266035)
* Came to the conclusion eglibc doesn't actually need any changes
- It's got a placeholder for a 64-bit routine, but it's unused and isn't exposed to libc users
- Note that neither eglibc nor glibc built cleanly from trunk on ARM.
* Started digging into QEmu a bit more to find out how to solve the helper problem

== String routines ==
* Added the SPEC2k6 string routine results to my charts; while most stuff is in the noise, it seems the bionic routine is a bit slower overall than everything else, and my absolutely trivially simple ~5 instruction loop ties for fastest with my smarter 4-bytes-per-loop version using uadd.

== Next week ==
* Sleep, rest, relaxation, getting older
* (Will be polling email for any more follow-ups on my gcc patches)

Dave
[ACTIVITY] 2011-07-15
== String routines ==
* Sent a patch to libc-ports with modified configure scripts to add subdirectories for architecture-specific ARM code, plus the memchr.S from cortex-strings.

== 64 bit atomics ==
* Working through comments on my patches and the set of discussions about the kernel interface for the helper case - not really sure which way that's going to go.

== QEmu ==
* Looking at how tracing works; considering adding tracing to the SD card code to help track down some of the SD card issues.

Dave
[ACTIVITY] 2011-07-22
== 64 bit atomics ==
* Updated the gcc patches as per comments from Ramana and Joseph; a build is currently cooking on the Panda

== Qemu ==
* Testing Peter's pre-release; found a bug on Beagle (which he tracked down to an x-loader change)
* Found the cause of the occasional SD card errors I was seeing ('SD: CMD12 in a wrong state'); I'll cut a patch next week, but the bug is that writing the last sector throws an error and also leaves the card in the wrong state
* Added a bunch of tracing code to the SD card layer
* With the tracing code, and the other bug fixed, I'm starting to understand how it works - and half a dozen reasons the emulation is really slow; whether that's the cause of the reported recoverable lock-ups under load is an interesting question; I plan to fix the obvious problems and see how it goes.

Dave
[ACTIVITY] 2011-07-29
== 64 bit atomics ==
* Sent the updated set of 64-bit atomic patches to the gcc list with fixes from the previous review
* Started hunting for users of 64-bit atomics other than membase; jemalloc, SDL and Boost lockfree look like possibilities, but I've not looked at them hard yet

== QEmu ==
* Released a fix for the last-SD-card-block access error
- Vincent Palatin released a bunch of SD card fixes a few hours later that included a fix for the same bug; it does look like he has a bunch of other stuff we should keep sync'd with.
* Changing the caching mode to writeback on the block layer fixes bug 732223 (hangs on heavy IO) - it goes from 130KB/s to 8MB/s on vexpress
- Asked the mailing list whether that's reasonable to make the default for SD
* Looking at the path from CPU to MMC/SD card - the DMA on OMAP is pretty inefficiently emulated, but the soc_dma code has an unused special case for DMAing to hardware; it looks promising, but I need to figure out how to use it and whether it works.
* Comparing Vincent's SD card patch with the earlier MeeGo patches; partial overlap.

== Other ==
* Pinged libc-ports for comments on my optimised memchr patch
* Image testing

Next week I intend to be in Camborne on the afternoons of Monday, Wednesday and Friday.

Dave
[ACTIVITY] 2011-08-05
== QEMU ==
* After discussion with Peter, started writing the QEMU fixup for the 64-bit atomic helper version location.
* Sent fixes for the soc-dma code to the qemu list
* Trying to understand just how much of omap_dma's code is needed.

== Other ==
* Travelling to/from Connect
* Wanted to dial into some of the sessions in the Corpus and Magdalene rooms, but the remote audio from them was unusable.

Dave
[ACTIVITY] 2011-08-12
== QEMU ==
* Finished off a first cut of the 64-bit helper patch to QEMU
- Gave it to Peter and have reworked most of the things he commented on
* This also led into a bit of a rabbit hole of finding various generic QEMU threading issues
* Tested Peter's 11.08 QEMU release (I used linaro-fetch-image-ui for the first time to grab the release images; quite nice - hit a couple of issues, but much nicer than crawling around the site to find where the hwpacks are).

== Other ==
* Pinged the gcc-patches list for more comments on the 64-bit atomic patch

I'm on holiday the week of the 22nd (i.e. the week after next).

Dave
Re: Is the Linaro toolchain useful on x86/x86_64?
On 17 August 2011 12:09, Bernhard Rosenkranzer wrote:
> Hi,
> is the Linaro toolchain (esp. gcc) useful on x86/x86_64, or is an
> attempt to use the Linaro toolchain with such a target just asking for
> trouble?
> (No, I'm not secretly an Intel spy ;) Just trying to have some fun
> with my desktop machine ;) )

I believe the idea is that while we don't work to improve x86(_64), we shouldn't break it.

Dave
[ACTIVITY] 2011-08-19
== String routines ==
* Working through updating my eglibc patch for memchr; I think I'm nearly there - it took way too long to persuade an eglibc 'make check' to work cross (can't get a native build happy).

== QEMU ==
* Sent a new version of my QEMU patch for the atomic helpers to Peter.
* Tested the Android beagle image on a real Beagle - it fails in pretty much the same way as the QEMU run.

== Other ==
* Had a brief look at bug 825711 (scribus ftbfs on ARM) - this is Qt being built to define qreal as float on ARM when it's double on most other things, scribus having a qreal variable and something else defined as a double, and then passing both to a template that requires two arguments of the same type; not really sure which one is to blame here!

I'm on holiday next week.

Dave
[ACTIVITY] 2011-09-02
== QEmu ==
* Sent the 64-bit atomic helper fix upstream
* Basic boot time and simple benchmarks versus the Panda board
* Tested the prebuilt images and Peter's latest post-merge QEmu tree
- The full Ubuntu desktop on an emulated Overo is a bit slow - it's rather short on RAM
- The full Ubuntu desktop on an emulated VExpress isn't bad; it's got the full 1GB (with a particularly grim line of awk to mount vexpress images, based on Peter's suggestion of using 'file')

== String routines ==
* Pushed memcpy and memset up to the cortex-strings bzr
* Working through the memset issue with Michael
- Made my code a little less sensitive to initial alignment

== Hard float ==
* Testing libffi 3.0.11rc1 - it still hasn't got the variadic patch in, but I'm hoping it will land later in the cycle.

== Other ==
* Excavating my inbox after the week off.
* Built LMbench and kicked a run off on the Panda. (It got stuck in some heuristics under emulation.)

Dave
Re: Benchmarking / justifying cortex-strings
On 5 September 2011 04:21, Michael Hope wrote:
> On Fri, Sep 2, 2011 at 4:08 PM, Michael Hope wrote:
>> Hi Dave. I've been hacking away and have checked in a couple of
>> benchmarking and plotting scripts to lp:cortex-strings. The current
>> results are at:
>> http://people.linaro.org/~michaelh/incoming/strings-performance/
>>
>> All are done on an A9. The results are very incomplete due to how
>> long things take to run. I'll leave ursa3 doing these over the
>> weekend which should flesh this out for the other routines.
>
> Right, that's done. The new graphs are up at:
> http://people.linaro.org/~michaelh/incoming/strings-performance/
>
> The original data is at:
> http://people.linaro.org/~michaelh/incoming/strings-performance/epic.txt
>
> Here's the relative performance for all routines with eight byte
> aligned data and 128 byte blocks:
> http://people.linaro.org/~michaelh/incoming/strings-performance/top-000128.png
>
> memchr, memcpy, strcpy, and strlen all look good at this block size.

Good.

> Here's the speed versus block size for eight byte aligned data:
> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-memchr-08.png

Nice; odd dip between 8 and 16 chars - I don't switch to the smarter stuff until 16 bytes.

> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-memset-08.png

Hmm, yes, the short ones could be a bit faster - I always tended to use log X scales :-) The really small ones I wouldn't worry too much about; the interesting stuff is 32-512, where I'd have expected it to have got its act in gear.

> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-strchr-08.png

The version of strchr that's in there is the simple-as-possible strchr; it's byte-at-a-time. I also have a version that uses similar code to memchr, which goes fast at large sizes but is slower for small matches - see: https://wiki.linaro.org/WorkingGroups/ToolChain/Benchmarks/InitialStrchr?action=AttachFile&do=get&target=panda-01-strchr-git44154ec-strchr-abs.png I'd made the call that performance on smaller strings was probably more important.

> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-strcmp-08.png

Huh? I haven't written a strcmp - that looks like newlib's?

> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-strcpy-08.png

Ditto.

> http://people.linaro.org/~michaelh/incoming/strings-performance/sizes-strlen-08.png

That's very nice - although quite bizarre; even the lower end of the steps is suitably fast, so not really anything to worry about; but it would be great to understand where the 1500-cycle difference is going at the large end.

Dave
Re: Benchmarking / justifying cortex-strings
On 5 September 2011 17:40, Christian Robottom Reis wrote:
> On Mon, Sep 05, 2011 at 03:21:49PM +1200, Michael Hope wrote:
>> memchr is good. memset could be better for blocks of less than 1k.
>> strchr gets second place but is eclipsed by newlib's version. strcmp
>> need work. strcmp is good.
>
> It's strcpy which is good in this last sentence, though it basically
> matches newlib's version.

I think that's because it IS newlib's version - I've not done a strcpy.

> I'm curious about the "political" side of cortexstrings -- is there
> active interest by the library maintainers in picking up our versions?

There is interest from partners in having optimised versions; I think the library maintainers are happy to take them if you can convince them that they are improvements.

Dave
[ACTIVITY] 2011-09-09
== String routines ==
* Trying to understand the strlen behaviour Michael identified
- Found lots of ways of making the faster case slower, but none of making the slower case faster!
- perf not being available on the Panda (bugs 702999/843628) made it difficult to dig down
* Fixing standards corner cases for strchr/memchr - the input match character needs to be truncated to a char (fixes bugs 842258 & 791274)
* Tidying up the formatting for a cortex-strings release
* Looking at eglibc integration again - getting confused by what has to happen in config.sub, and how other users of it cope with triplets like armv7 even though it's not in config.sub

== QEMU ==
* Testing Peter's QEMU release - all good
- Lost a few hours due to the broken version of l-i-f-ui in Oneiric - the PPA version works OK
* A little bit of perf profiling

== Other ==
* Managed to get hold of a nice fast build machine
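The corner case being fixed, illustrated (this follows the C standard's wording: memchr converts its search character to unsigned char, and strchr to char, before comparing):

    #include <assert.h>
    #include <string.h>

    int main(void)
    {
        const char buf[] = "abc";

        /* 0x100 + 'a' must match 'a': an implementation that compares
         * the full int would wrongly return NULL here. */
        assert(memchr(buf, 0x100 + 'a', 3) == buf);
        assert(strchr(buf, 0x100 + 'b') == buf + 1);
        return 0;
    }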
eglibc and fun with config.sub
As mentioned on the standup call this morning, I've been trying to get my head around the way different parts of the toolchain use the config scripts and the triplets. I'd appreciate some thoughts on what the right thing to do is, especially since there was some unease at some of the ideas.

My aim here is to add an armv7 specific set of routines to the eglibc ports and get this picked up only when eglibc is built for armv7; but it's getting a bit tricky.

eglibc shares with gcc and binutils a script called config.sub (which lives in a separate repository) that munges the triplet into a $basic_machine and validates it against a set of known triplets. So for example it has the (shell) pattern:

    arm | arm[bl]e | arme[lb] | armv[2345] | armv[345][lb]

to recognise triplets of the form arm-, armbe-, armle-, armel-, armeb-, armv5-, armv5l- or armv5b-. It also knows more obscure things, such as that if you're configuring for a netwinder it's an armv4l- system running Linux - but frankly most of that type of thing is a decade or two out of date. Note it doesn't yet know about armv6 or armv7.

eglibc builds a search path that at the moment includes a path under the 'ports' directory of the form arm/eabi/$machine, where $machine is typically the first part of your triplet; however at the moment eglibc doesn't have any ARM version specific subdirectories. If I just added a ports/sysdeps/arm/eabi/armv7 directory it wouldn't be used, because eglibc searches arm/eabi/arm if configured with the triplet arm-linux-gnueabi. --with-cpu sets $submachine (NOT $machine) - so if you pass --with-cpu=armv7 it ends up searching arm/eabi/arm/armv7 with the triplet arm-linux-gnueabi. If you had a triplet like armel- then I think it would be searching arm/eabi/armel/armv7.

So my original patch ( http://old.nabble.com/-ARM--architecture-specific-subdirectories,-optimised-memchr-and-some-questions-td32070289.html ) did the following:

* Modified the paths searched to be arm/eabi (rather than arm/eabi/$machine)
* If $submachine hadn't been set by --with-cpu, autodetected it from gcc's #defines

which meant that it ignored the start of the triplet and let you specify --with-cpu=armv7.

After some discussion with Joseph Myers, he's convinced me that isn't what eglibc is expecting (see later in the thread linked above); what it should be doing is that $machine should be armv7, and $submachine should be used if we wanted, say, a cortex-a8 or cortex-a9 specific version. My current patch:

* adds armv6 and armv7 to config.sub
* adds arm/eabi/armv7 and arm/eabi/armv6t2 and one assembler routine in there
* if $machine is just 'arm', autodetects from gcc's #defines
* else if $machine is armv* then that's still used as $machine

So if you use:

* a triplet like arm-linux-gnueabi, it looks at gcc, and if that's configured for armv7-a it searches arm/eabi/armv7
* a triplet like armv7-linux-gnueabi, it searches arm/eabi/armv7 irrespective of what gcc was configured for
* a triplet like armv7-linux-gnueabi and --with-cpu=cortex-a9, it searches arm/eabi/armv7/cortex-a9 then arm/eabi/armv7

As far as I can tell, gcc ignores the first part of the triplet other than noting it's arm and spotting if it ends with b for big endian (i.e. configuring gcc with armv4-linux-gnueabi and armv7-linux-gnueabi ends up with the same compiler).
binutils also mostly ignores the first part of the triplet - although it's a bit of a mess, with different parts parsing it differently (it seems to spot arm9e for some odd reason); as far as I can tell, gold will accept armbe* for big-endian whereas ld takes arm*b!

If you're still reading, then the questions are:
1) Does the approach I've suggested make sense - in particular, that the machine directory chosen is based either on the triplet or, where the triplet doesn't specify it, on the configuration of gcc? That's my interpretation of what Joseph is suggesting.
2) Doing (1) would seem to suggest I should give config.sub armv6t2 and some of the other complex names.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
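To make the autodetection idea concrete, here is a hypothetical probe of the sort a configure script could run: feed a fragment through the preprocessor and read off which word survives. The __ARM_ARCH_7A__ and __ARM_ARCH_6T2__ macros are real gcc predefines; the probe file itself is invented for illustration and is not the actual patch:

    /* probe.c - try: gcc -E probe.c | grep -v '^#' */
    #if defined(__ARM_ARCH_7A__)
    armv7
    #elif defined(__ARM_ARCH_6T2__)
    armv6t2
    #else
    arm
    #endif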
Re: eglibc and fun with config.sub
OK, so we seem to have agreement here that what we want is autodetection for eglibc, forgetting about the triplet; technically that probably makes my life easier, and I don't think it's too hard a sell.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-09-16
== String routines ==
* Tidying up bits of cortex-strings for the release process
* Nailing down the behaviour of config.sub and the config systems in gcc, binutils and eglibc

== Other ==
* A discussion on synchronisation primitives on various CPUs that started on the gcc list
  - looking at http://www.cl.cam.ac.uk/~pes20/cpp/cpp0xmappings.html
  - pointing out the 64-bit instructions
  - asking why they used ISBs when neither the kernel nor gcc uses them (answer: DMBs should be fine as well, though there is some debate over which is quicker - and DMBs are turned into slower DSBs on most A9s due to an erratum); one DMB-based mapping is sketched after this message
* Looking for docs on the non-core bits of current SoCs
* Extracting some denbench stats from a few months back for Ramana

About a day of non-Linaro IBM stuff.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
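For reference, a minimal sketch of one DMB-based mapping from that table: a C/C++0x load-acquire realised as a plain load followed by a DMB. This assumes gcc inline asm and an ARMv7 target (compile with -march=armv7-a); it illustrates the mapping rather than reproducing code from the discussion:

    /* Load-acquire on ARMv7: plain load, then a data memory barrier.
       The "memory" clobber stops the compiler reordering around it. */
    static inline int load_acquire(const int *p)
    {
        int v = *(const volatile int *)p;
        __asm__ volatile("dmb" ::: "memory");
        return v;
    }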
Re: eglibc and fun with config.sub
On 19 September 2011 00:48, Michael K. Edwards wrote:
> Please coordinate with Jon Masters at RedHat/Fedora and Adam Conrad at
> Ubuntu/Debian on this. (Cc'ing the cross-distro list, through which the
> recent ARM summit at Linux Plumbers was organized.)

OK, let me summarise for the new people on the cc. I'm looking at adding a few ARM-optimised string routines to eglibc (memchr initially); they require ARMv6T2 or newer, hence I started looking at how eglibc finds architecture-specific code. At the moment eglibc uses the architecture from the first part of the triplet as part of the search path in the source tree, so configuring for armv5-linux-gnueabi will end up looking in an arm/eabi/armv5 directory.

What I was going to do (after some discussion with Joseph Myers: http://old.nabble.com/-ARM--architecture-specific-subdirectories,-optimised-memchr-and-some-questions-td32070289.html ) was:
1) Add armv6/v7 to config.sub
2) If the version wasn't specified in the first part of the triplet, use gcc's #defines to autodetect what we're building for
3) If the version was specified in the triplet, use it

A bit of digging, however, showed that neither gcc nor binutils seems to use the version in the triplet to determine anything, and the discussion on the linaro-toolchain list that started this thread has come to the conclusion that people would actually prefer to always ignore the triplet (like binutils and gcc) and just autodetect from gcc's #defines.

Thoughts?

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-09-23
== String routines ==
* Having got agreement on ignoring the triplet when picking routines, I'm just testing a patch, but fighting a qemu setup.
* Found the binfmt binding for armeb was wrong (it runs the little-endian version); filed a bug with the fix included.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Use of memcpy() in libpng
On 27 September 2011 14:16, Christian Robottom Reis wrote:
> On Tue, Sep 27, 2011 at 09:47:33AM +0100, Ramana Radhakrishnan wrote:
>> On 26 September 2011 21:51, Michael Hope wrote:
>> > Saw this on the linaro-multimedia list:
>> > http://lists.linaro.org/pipermail/linaro-multimedia/2011-September/74.html
>> >
>> > libpng spends a significant amount of time in memcpy(). This might
>> > tie in with Ramana's investigation or the unaligned access work by
>> > allowing more memcpy()s to be inlined.
>>
>> It's the unaligned access and the change / improvements to the memcpy
>> that *might* help in this case. But that ofcourse depends on the
>> compiler knowing when it can do such a thing. Ofcourse what might be
>> more interesting is the kind of workload analysis that Dave's done in
>> the past with memcpy to know what the alignment and size of the buffer
>> being copied is.
>
> If you guys could take a look at this there is a potential requirement
> for the MMWG around libpng optimization; we could fit this in along with
> other work (possible vectorizing, etc) on that component.

It wouldn't take long to analyse the memcpy calls - life would be easier if we had the test program and some details, such as the sizes of the images used in these benchmarks. (A sketch of the kind of interposer such an analysis could use follows this message.)

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
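For what it's worth, a sketch of such an interposer: an LD_PRELOAD shim that logs the size and alignment of every memcpy() call. All names here are illustrative, and the recursion guard is needed because fprintf() may itself call memcpy():

    #define _GNU_SOURCE
    #include <dlfcn.h>
    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    void *memcpy(void *dst, const void *src, size_t n)
    {
        static void *(*real)(void *, const void *, size_t);
        static __thread int busy;

        if (!real)
            real = (void *(*)(void *, const void *, size_t))
                       dlsym(RTLD_NEXT, "memcpy");
        if (!busy) {
            busy = 1;
            /* Log length and the low 3 bits of each pointer. */
            fprintf(stderr, "memcpy n=%zu dst&7=%u src&7=%u\n", n,
                    (unsigned)((uintptr_t)dst & 7),
                    (unsigned)((uintptr_t)src & 7));
            busy = 0;
        }
        return real(dst, src, n);
    }

Built with 'gcc -shared -fPIC -o cpylog.so cpylog.c -ldl' and run under LD_PRELOAD=./cpylog.so, this writes one line per call to stderr, which is enough to histogram sizes and alignments afterwards.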
[ACTIVITY] 2011-09-28
== String routines ==
* Got the eglibc testing setup happy at last
  - Note that -O3 builds generally seem to give a few more errors, which are probably worth looking at
  - -march=armv6 -mthumb hit some non-Thumb-1 instructions (typically using non-lo registers); again worth looking at
  - Cross-testing to QEMU user mode often stalls, mostly on nptl tests that abort/fail when run in system mode or natively
* Sent a new version of the eglibc/memchr patch upstream
* Now have a working newlib test setup and reference set - the next step is to try adding my memchr there

== Other ==
* Testing a QEMU patch with Peter
* Looking at bug 861296 (difference in mmap layouts)
* Adding a few suggestions to the set of cpu hotplug tests
* Dealing with the Manchester lab cold

Short week; back on Monday

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-10-07
== String Routines ==
* Built and tested a newlib with my memchr in - ready to go after a bit of tidy-up.
* Followed up on my eglibc patch submission, where a comment suggested using --with-cpu, by pointing back at the previous discussion.

== 64 Bit atomics ==
* Updated the gcc patch based on Ramana's comments, retested, and posted the new version (a minimal example of what it enables follows this message).
  - Lost half a day to a failing SD card in our Panda.

== QEMU ==
* Posted a patch making one variable thread-local using __thread, which fixes multi-threaded user-mode ARM programs (e.g. Firefox); on the list this seems to have mutated into a patch for more general thread-local support.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
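For context, a minimal example of what the patch enables: gcc's existing __sync builtins applied to a 64-bit type on ARM. The builtins and their signatures are standard gcc; the example assumes a toolchain with the patch applied:

    #include <stdio.h>

    static long long counter;   /* 64-bit on a 32-bit target */

    int main(void)
    {
        __sync_fetch_and_add(&counter, 1LL);               /* counter = 1 */
        __sync_val_compare_and_swap(&counter, 1LL, 100LL); /* counter = 100 */
        printf("%lld\n", counter);
        return 0;
    }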
Screwy panda timing?
I've just tried rerunning some benchmarks on my Panda, which I reinstalled recently, and am getting some odd behaviour. The kernel is 3.0.0-1404-linaro-lt-omap. For example:

simple_strlen: ,102400, loops of ,62, bytes=6.054688 MB, transferred in ,20324707.00 ns, giving, 297.897898 MB/s
simple_strlen: ,102400, loops of ,32, bytes=3.125000 MB, transferred in ,7904053.00 ns, giving, 395.366782 MB/s
simple_strlen: ,102400, loops of ,16, bytes=1.562500 MB, transferred in ,7354736.00 ns, giving, 212.448142 MB/s
simple_strlen: ,102400, loops of ,8, bytes=0.781250 MB, transferred in ,91553.00 ns, giving, 8533.308575 MB/s
simple_strlen: ,102400, loops of ,4, bytes=0.390625 MB, transferred in ,1495361.00 ns, giving, 261.224547 MB/s
simple_strlen: ,102400, loops of ,2, bytes=0.195312 MB, transferred in ,1983643.00 ns, giving, 98.461518 MB/s

Note the 8-byte case is apparently 40 times faster. And for true oddness:

smarter_strlen_ldrd: ,102400, loops of ,62, bytes=6.054688 MB, transferred in ,3936768.00 ns, giving, 1537.984331 MB/s
smarter_strlen_ldrd: ,102400, loops of ,32, bytes=3.125000 MB, transferred in ,0.00 ns, giving, inf MB/s
smarter_strlen_ldrd: ,102400, loops of ,16, bytes=1.562500 MB, transferred in ,4180909.00 ns, giving, 373.722557 MB/s

Now, while I like infinite transfer rates, I suspect they're wrong. Anyone else seeing this? (A sketch of a timer sanity check follows this message.)

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
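A sketch of the sort of timer sanity check that can separate a broken clock from a slow machine: time a fixed workload repeatedly and flag zero or wildly varying deltas. This assumes CLOCK_MONOTONIC as the timestamp source (link with -lrt on older glibc); it is not the actual benchmark harness:

    #include <stdio.h>
    #include <time.h>

    static long long now_ns(void)
    {
        struct timespec ts;
        clock_gettime(CLOCK_MONOTONIC, &ts);
        return ts.tv_sec * 1000000000LL + ts.tv_nsec;
    }

    int main(void)
    {
        volatile unsigned sink = 0;
        long long t0, dt;
        int run;
        unsigned i;

        for (run = 0; run < 10; run++) {
            t0 = now_ns();
            for (i = 0; i < 10 * 1000 * 1000; i++)
                sink += i;                  /* fixed workload */
            dt = now_ns() - t0;
            printf("run %d: %lld ns%s\n", run, dt,
                   dt <= 0 ? "  <-- suspicious" : "");
        }
        return 0;
    }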
[ACTIVITY] 2011-10-14
== 64 bit atomics ==
* Thanks to Ramana for OKing my gcc patches, and Richard for committing them
  - I've backported these to the gcc-linaro branch and pushed it - hopefully they will pass OK!

== String routines ==
* Sent my memchr patch to upstream newlib; received comments, tweaked, and resent
* Sent the strlen patch to upstream newlib
* Spent some time getting confused by timing issues on our Panda; it was reinstalled with 11.09 a few weeks ago and is now showing some odd behaviours. In particular I'm seeing some tests complete in 0ns (and my code isn't -that- fast!), and others where the times vary wildly - it's almost as if a timer interrupt is delayed or missing; the same test binary works fine on one of Michael's Ursas running an older install.

== QEMU ==
* Tested Peter's QEMU image for release

== Other ==
* Spent an afternoon reading through the System Trace docs

On holiday next week; I'll poll email occasionally.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-10-28
== 64 bit atomics ==
* I've been building and testing membase
* The 1.7.1.1 source builds OK (after turning off -Werror due to some of their curious type naming)
* The git version fails to build - and not consistently
* 1.7.1.1 passes simple tests, but three tests in its test suite intermittently fail on ARM and seem solid on x86 (there are also some that just need timeouts increased due to the relatively slow machine)
* t/issue_163.t turned out to be a timing race in the test itself, made worse by running on a relatively slow machine and probably by the Panda's odd idea of timing. I reported it to them with a breakdown, and upstream has fixed the test ( http://code.google.com/p/memcached/issues/detail?id=230 )
* t/issue_67.t is proving tougher; once in a while memcached will lock up during init in thread_init, and there is one particular point where adding a printf makes it work apparently reliably. I've got one or two ideas, but I need to check my understanding of pthread_cond_wait first (the canonical pattern is sketched after this message)
* There's an assert I've seen triggered once - not looked at that yet

== String routines ==
* While I was off last week, my memchr and strlen were accepted into newlib
* Joseph has responded to my eglibc mail, with a couple of small queries

== Other ==
* Wrote a more detailed test case for bug 873453 (odd timing behaviour on the Panda); it's quite odd - I can get a timing discrepancy of more than ~80ms, so it's not a clock-granularity issue
* Replicated a QEMU crash for Peter

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
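For reference, the canonical pthread_cond_wait() pattern relevant to a hang like this one: the predicate must be re-checked in a loop under the mutex, because waits can wake spuriously and a signal sent before the wait starts is otherwise lost. The names here are illustrative, not memcached's actual code:

    #include <pthread.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;
    static int threads_ready;   /* the predicate carries the state */

    void wait_for_threads(int wanted)
    {
        pthread_mutex_lock(&lock);
        while (threads_ready < wanted)      /* a loop, never a bare if */
            pthread_cond_wait(&cond, &lock);
        pthread_mutex_unlock(&lock);
    }

    void announce_ready(void)
    {
        pthread_mutex_lock(&lock);
        threads_ready++;
        pthread_cond_signal(&cond);         /* signal with the lock held */
        pthread_mutex_unlock(&lock);
    }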
[ACTIVITY] 2011-11-04
== 64 bit atomics ==
* I got the race in membase down to a futex issue; asking dmart pointed me at a kernel bug affecting recent kernels, for which a fix had gone in about a month ago. That was a nasty one!
* I've still got a few bugs left; most are turning out to be timing races in the test code - e.g. one that times out after 2 seconds when the code takes around 1.7 seconds, so if something else gets in it trips over the line; and another where the test did a recv_from on a socket but only got the start of a message, presumably because the sender had used multiple sends (the general fix is sketched after this message). It's tricky going because the tests are a combination of most scripting languages (Perl, Python, Ruby, with a splash of Erlang). I've so far found no bugs in the atomic code.
* I looked at apr and SDL-1.3; both use atomics but end up not using 64-bit ones - the tendency is to ensure atomics work on long and on void*, both of which are 32-bit for us.

== String routines ==
* I've got the newlib A15-optimised memcpy running in a test harness at the moment for comparison.

== Listening to Connect ==
* I listened in to a few Connect sessions each day; the first day or so was three-quarters lost to audio systems that didn't work (I'm especially annoyed at not being able to hear the QEMU for A15/KVM session and the toolchain support for the kernel one). The Rypple session was rather lost through the lack of any screen share or slides.

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
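The general fix for that short-read race is worth spelling out: TCP is a byte stream, so one recv() need not correspond to one send(), and a reader has to loop until it has a complete message. An illustrative helper, not the test suite's code:

    #include <sys/types.h>
    #include <sys/socket.h>

    /* Read exactly len bytes, looping over partial receives. */
    ssize_t recv_all(int fd, void *buf, size_t len)
    {
        size_t got = 0;

        while (got < len) {
            ssize_t n = recv(fd, (char *)buf + got, len - got, 0);
            if (n <= 0)
                return n;       /* error, or peer closed the socket */
            got += n;
        }
        return (ssize_t)got;
    }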
[ACTIVITY] 2011-11-11
== 64 bit atomics ==
* Nailed one more of the membase tests; again this was a test-harness race condition (which I've reported here: http://code.google.com/p/moxi/issues/detail?id=2&thanks=2&ts=1321037460 ). In this case the server performed two calls to write(), yet the test client performed a single read and compared the result to what it was expecting; it got lucky on x86, and about half the time on ARM, in that all the server's data managed to get read by the first read. I think this leaves one more case - one that I've seen rarely.

== QEMU ==
* Tested Peter's 11.11 pre-release; ran into a couple of issues (vexpress without sound causing hangs, and the Linaro 11.10 Beagle and Overo images not running X). Also filed a couple of bugs in l-i-f-ui that I tripped over while testing it.

== String routines ==
* The new newlib A15-optimised memcpy is slower on an A9 than my routines; posted to the newlib list asking what the normal way of dealing with a bunch of different routines is. Would it make sense to get gcc to define a GCC_ARM_TUNE_CORTEX_A-whatever? (A sketch of what such a define could drive follows this message.)

== Other ==
* Watched the YouTube video of the kernel/toolchain discussion - for those who didn't attend, I'd encourage a look at the YouTube videos; they're pretty nicely done.
* Got pulled away on non-Linaro work for about half the week.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
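A hypothetical sketch of the build-time dispatch being asked about. None of these names exist today - GCC_ARM_TUNE_CORTEX_* is exactly the kind of define the question proposes, and the memcpy_* variants are imagined tuned routines:

    #include <stddef.h>

    void *memcpy_a15(void *, const void *, size_t);     /* imagined */
    void *memcpy_a9(void *, const void *, size_t);      /* imagined */
    void *memcpy_generic(void *, const void *, size_t); /* imagined */

    /* Pick the routine at build time from a per-core tune macro. */
    #if defined(GCC_ARM_TUNE_CORTEX_A15)
    # define memcpy_best memcpy_a15
    #elif defined(GCC_ARM_TUNE_CORTEX_A9)
    # define memcpy_best memcpy_a9
    #else
    # define memcpy_best memcpy_generic
    #endif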
[ACTIVITY] 2011-11-18
== 64 bit atomics ==
* Still fighting membase
* Cleaned up a bunch of other issues, but I'm back at an 'expiry' issue, where the test stores some data with a fixed expiry time, waits until after it should have expired, and checks that it has. Except on ARM it sometimes doesn't expire quickly enough. I've got enough debug now to see that the server process's view of time (which it updates via an event about every second) is sometimes well behind gettimeofday()'s view of time - and I have a small test for it (an illustrative reduction follows this message). This doesn't seem to happen on x86. The good part is that it's now a much smaller test; the bad part is that it fails rarely - somewhere between 1/1000 and 1/100 depending on its mood.
* Looked at a few other things to see if they might use 64-bit atomics:
  - spice's (the VNC-like protocol) FAQ said it needed 64-bit atomics and didn't work on 32-bit machines because of that; but the source appears to have been fixed for 32-bit.
  - Looked at Boost lock-free; it does have an implementation using gcc's __sync primitives, but for ARM it uses a hand-coded set that is missing the 64-bit operations - and the contributor of the ARM code said the Boost lock-free author preferred not to use the gcc primitives.

== Other ==
* Testing the latest libffi rc
  - It had most of my varargs-for-hard-float fix in (one part of a test had been missed)
* 1 day of non-Linaro work

I'm on holiday next week.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
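An illustrative reduction of the small drift test mentioned above (the real test lives alongside membase; everything here is invented): wake up about once a second, the way the server's timer event does, and report how late each wakeup is against gettimeofday():

    #include <stdio.h>
    #include <unistd.h>
    #include <sys/time.h>

    int main(void)
    {
        struct timeval tv;
        long long prev, now;
        int i;

        gettimeofday(&tv, NULL);
        prev = tv.tv_sec * 1000000LL + tv.tv_usec;

        for (i = 0; i < 30; i++) {
            sleep(1);                      /* stands in for the ~1s timer event */
            gettimeofday(&tv, NULL);
            now = tv.tv_sec * 1000000LL + tv.tv_usec;
            if (now - prev > 1200000)      /* fired more than ~200ms late */
                printf("tick %d: %lldus late\n", i, now - prev - 1000000);
            prev = now;
        }
        return 0;
    }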
[ACTIVITY] 2011-12-02
== String routines ==
* Sent an updated memchr to the eglibc list

== 64 bit atomics ==
* Ran a set of timing-consistency tests that a colleague had sent me while I was off; the Panda passed, so time doesn't appear to be going backwards or anything - that's not the problem with membase.
* Pushed the code into linaro-gcc.

== QEMU ==
* Tested Peter's pre-release - all good.
* Started looking at the issues for running in TCG mode on ARM.

== Other ==
* Read through the ARMv8 instruction docs that landed on arm.com; quite interesting. Note that multiple-instruction IT blocks are listed as deprecated for 32-bit mode on v8 (they still work, but the core can be put into a mode that faults on them, to make the uses easy to find).
* Some debugging of the Panda's odd timing issue with Paul McKenney.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
Re: Release notes for GCC 4.6
On 7 December 2011 20:36, Andrew Stubbs wrote:
> Hi all,
>
> I've copied all those who made commits to GCC 4.6 this month.
>
> Could you please give me a sentence or two for the release notes?

Support for 64-bit __sync* primitives on ARM.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-12-09
== QEMU ==
* Wrote a fix for bug 883133 (code buffer/libc conflict); spent some time testing it because I wasn't sure whether the crash I was seeing afterwards was my fix being incomplete or actually bug 893208.
* Got it to boot with -cpu 486; without that it triple-faults in a divide just after a load of timestamp reads, which makes me suspicious that 893208 is a timer problem.
* (It also fails when used with VNC graphics, but works in SDL and curses - I'll leave that bug for another time.)

== String routines ==
* With one more tweak to my memchr, it finally made it into eglibc.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-12-16
== General ==
* Tidying things up and updating my list of statuses

== String routines ==
* Adding strchr and strlen to eglibc; tests running at the moment.

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
[ACTIVITY] 2011-12-23 - and goodbye
== QEMU ==
* Wrote the context routines for eglibc, including those that QEMU uses. These pass all the context tests I could find, including QEMU's coroutine tests, and with them QEMU seems to boot OK. I've got a full eglibc test run going at the moment, but I don't think anything else uses them. I posted them with comments and a question to libc-ports; I'll try and chase follow-ups. (A minimal example of the API follows this message.)

== String routines ==
* I posted the strchr and strlen routines to eglibc (libc-ports).
* On strchr, the question came up of whether it was worth using the longer version that's faster for longer strings (but slower for shorter ones). I posted some stats and observations, and the discussion is still ongoing.
* For strlen, rth noted the same trick of a quicker end-of-string sequence using clz that I'd originally seen in newlib (and which RichardS and Ramana had also suggested). I'd avoided it because I'd first seen it in newlib and didn't want to copy it; but since three people have independently suggested it, it would seem worth using.

== Goodbye! ==
Thank you all for a fun & interesting year! I'm sure many of us will meet online again in the future. I'll try and follow my linaro.org address while it's still live to check for any replies to patches/comments etc. Feel free to mail me at david...@uk.ibm.com (work) or d...@treblig.org (home); for Linaro people I've also added some more contact methods at: https://wiki.linaro.org/Internal/People/DaveGilbert/Contact

Thanks again!

Dave

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain
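As a postscript, a minimal round trip through the context API those routines implement - QEMU's coroutines do essentially this. These are the standard ucontext calls, not the eglibc test code:

    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, co_ctx;

    static void coroutine(void)
    {
        puts("in coroutine");
        swapcontext(&co_ctx, &main_ctx);    /* yield back to main */
        puts("coroutine resumed");
    }                                       /* returning follows uc_link */

    int main(void)
    {
        static char stack[64 * 1024];

        getcontext(&co_ctx);
        co_ctx.uc_stack.ss_sp = stack;
        co_ctx.uc_stack.ss_size = sizeof stack;
        co_ctx.uc_link = &main_ctx;         /* where to go when it returns */
        makecontext(&co_ctx, coroutine, 0);

        swapcontext(&main_ctx, &co_ctx);    /* prints "in coroutine" */
        puts("back in main");
        swapcontext(&main_ctx, &co_ctx);    /* prints "coroutine resumed" */
        puts("done");
        return 0;
    }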