Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Matthew Gretton-Dann

Paul,

I've been having some thoughts about CBuild and Lava and the TCWG 
integration of them both.  I wish to share them and open them up for general 
discussion.


The background to this has been the flakiness of the Pandas (due to heat) 
and the Arndales (due to board 'set-up' issues), and getting a batch of 
Calxeda nodes working.


The following discussion refers to building and testing only, *not* 
benchmarking.


If you look at http://cbuild.validation.linaro.org/helpers/scheduler you 
will see a bunch of calxeda01_* nodes have been added to CBuild.  After a 
week of sorting them out they provide builds twice as fast as the Panda 
boards.  However, during the setup of the boards I came to the conclusion 
that we set build slaves up incorrectly, and that there is a better way.


The issues I encountered were:
 * The Calxedas run quantal - yet we want to build on precise.
 * It's hard to get a machine running hard-float to bootstrap a 
soft-float compiler, and vice-versa.
 * My understanding of how the Lava integration works is that it runs the 
cbuild install scripts each time, and so we can't necessarily reproduce a 
build if the upstream packages have been changed.


Having thought about this a bit I came to the conclusion that the simple 
solution is to use chroots (managed by schroot), and to change the 
architecture a bit.  In the old architecture, everything is put into the main 
file-system as one layer.  The new architecture would split this into two:


 1. Rootfs - Contains just enough to boot the system and knows how to 
download an appropriate chroot and start it.
 2. Chroots - these contain a setup build system that can be used for 
particular builds.


The rootfs can be machine-type specific (as necessary), and for builds can 
be a stock Linaro root filesystem.  It will contain scripts to set up the 
required users, and then to download an appropriate chroot and run it.


The chroot will be set up for a particular type of build (soft-float vs 
hard-float) and will be the same for all platforms.  The advantage of this 
is that I can then download a chroot to my ChromeBook and reproduce a build 
locally in the same environment to diagnose issues.
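Concretely, I'd imagine each chroot being registered with schroot via an entry along these lines (the names, paths, and users here are purely illustrative, not the actual CBuild setup):

```ini
# /etc/schroot/chroot.d/precise-armhf  (illustrative)
[precise-armhf]
description=Precise armhf build environment
type=directory
directory=/srv/chroots/precise-armhf
users=cbuild
root-users=cbuild
```

The same directory tree could then be copied to any board (or a ChromeBook) and entered with `schroot -c precise-armhf`.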


The Calxeda nodes in cbuild use this type of infrastructure - the rootfs is 
running quantal (and I have no idea how it is configured - it is what Steve 
supplied me with).  Each node then runs two chroots (precise armel and 
precise armhf) which take it in turns to ask the cbuild scheduler whether 
there is a job available.
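The "take it in turns" polling could be sketched roughly like this; note that `cbuild-client --poll` is a hypothetical stand-in for whatever the real CBuild scripts run - only the schroot wrapping is the point:

```shell
#!/bin/sh
# Build the schroot invocation for one chroot.  'cbuild-client --poll'
# is a hypothetical placeholder for the real CBuild job-polling script.
poll_cmd() {
    echo "schroot -c $1 -- cbuild-client --poll"
}

# On a node, each chroot would take turns asking the scheduler for work:
#   while true; do
#       $(poll_cmd precise-armel)
#       $(poll_cmd precise-armhf)
#       sleep 60
#   done
```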


So my first question is does any of the above make sense?

Next steps as I see it are:

 1. Paul/Dave - what stage has getting the Pandaboards in the Lava farm 
cooled reached?  One advantage of the above architecture is that we could use 
a stock Pandaboard kernel & rootfs with thermal limiting turned on for builds, 
so that things don't fall over all the time.


 2. Paul - how hard would it be to try and fire up a Calxeda node into 
Lava?  We can use one of the ones assigned to me.  I don't need any fancy 
multinode stuff that Michael Hudson-Doyle is working on - each node can be 
considered a separate board.  I feel guilty that I put the nodes into CBuild 
without looking at Lava - but it was easier to do and got me going - and I 
think correcting that is important.


 3. Generally - What's the state of the Arndale boards in Lava?  Fathi has 
got GCC building reliably, although I believe he is now facing networking 
issues.


 4. Paul - If Arndale boards are available in Lava - how much effort would 
it be to make them available to CBuild?


One issue the above doesn't solve as far as I see it is being able to say to 
Lava that we can do a build on any ARMv7-A CBuild compatible board.  I don't 
generally care whether the build happens on an Arndale, Panda, or Calxeda 
board - I want the result in the shortest possible time.


A final note on benchmarking.  I think the above scheme could work for 
benchmarking targets; all we need to do is build a kernel/rootfs that is 
set up to provide a system that produces repeatable benchmarking results.


Comments welcome from all.

Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro

___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


Re: [Linaro-validation] Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Antonio Terceiro
On Tue, Apr 16, 2013 at 10:49:23AM +0100, Matthew Gretton-Dann wrote:
>  2. Paul - how hard would it be to try and fire up a Calxeda node
> into Lava?  We can use one of the ones assigned to me.  I don't need
> any fancy multinode stuff that Michael Hudson-Doyle is working on -
> each node can be considered a separate board.  I feel guilty that I
> put the nodes into CBuild without looking at Lava - but it was
> easier to do and got me going - I think correcting that is important

Support for the Calxeda nodes is being worked on (at code review stage),
and as you would expect that's orthogonal to multi-node testing. It
should land soonish.

> One issue the above doesn't solve as far as I see it is being able
> to say to Lava that we can do a build on any ARMv7-A CBuild
> compatible board.  I don't generally care whether the build happens
> on an Arndale, Panda, or Calxeda board - I want the result in the
> shortest possible time.

Good point, right now you have to explicitly ask for some device type
... but if you want the quickest response, your best bet is to submit to
the faster devices. :-)

-- 
Antonio Terceiro
Software Engineer - Linaro
http://www.linaro.org




Re: [Linaro-validation] Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Renato Golin
On 16 April 2013 12:37, Antonio Terceiro wrote:

> Good point, right now you have to explicitly ask for some device type
> ... but if you want the quickest response, your best bet is to submit to
> the faster devices. :-)
>

This is not the point, I think.

For toolchain testing, specific CPU matters less than for kernel testing.
Even less important is which particular board revision or flavour. If the
build system is smart and can figure out which CPU it's running (most are),
it should make no difference if we run builds on dual-A9, quad-A9 or even
A15, as long as it builds and passes the tests.

For instance, fixing Panda-ES on LAVA means I'll wait on a long queue,
because there were only a few of them, while the old Panda had 15 idle all
the time. They might be slower, but it's much quicker to get results from
them than waiting for the ES to free up.

In the past, I have used a language that describes system properties to
reserve boards (like "A9 & NEON & RAM >= 1GB"); it would give me a list of
available boards, from which I'd choose one based on my own criteria. So, if
you know how long it usually takes to build on boards X, Y and Z, and you
have a list of jobs waiting on each of them, with their own average build
times, you can estimate which will be freed first, and list the boards
sorted in that order. I could then pick the one I think is best and add
my build to that board's queue.
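That estimate could be sketched as follows (board names and timings are invented for illustration; the real scheduler would feed in live queue data):

```shell
#!/bin/sh
# Pick the board whose queue will drain soonest.  Input lines are
# "board queued_jobs avg_build_minutes"; estimated wait is simply
# queued_jobs * avg_build_minutes.
best_board() {
    awk '{ wait = $2 * $3
           if (best == "" || wait < min) { min = wait; best = $1 } }
         END { print best }'
}

# Example: arndale (3 x 25 = 75 min) wins over panda (2 x 40 = 80)
# and calxeda (5 x 20 = 100):
#   printf '%s\n' 'panda 2 40' 'arndale 3 25' 'calxeda 5 20' | best_board
```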

With the number of different boards going up and the total number of boards
in the racks also going up, including virtual machines, I assume this will
save a lot of time in the future, even though it looks quite daunting right
now to implement.

cheers,
--renato

PS: I've used this system fully automatically for our regression tests,
in parallel across many developers and benchmarks at the same time, and it
worked a charm.


Re: g++ 4.7.3 ICEs building SNU

2013-04-16 Thread Matthew Gretton-Dann

On 15/04/13 21:29, Tom Gall wrote:

Hi,

Feel free to point me at a newer toolchain. Was building the SNU
OpenCL SDK native on my chromebook running ubuntu raring when I hit
the following:



make: Entering directory `/home/tgall/opencl/SNU/src/runtime/build/cpu'
arm-linux-gnueabihf-g++  -
-mfpu=neon -ftree-vectorize -ftree-vectorizer-verbose=0 -fsigned-char
-fPIC -DDEF_INCLUDE_ARM -g -c -o smoothstep.o
/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/common/smoothstep.c
-I/home/tgall/opencl/SNU/inc
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/async
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/atomic
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/common
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/conversion
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/geometric
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/integer
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/math
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/reinterpreting
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/relational
-I/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/vector  -O0 -g
In file included from
/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/cl_cpu_ops.h:47:0,
  from
/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/common/smoothstep.c:34:
/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/type/cl_ops_floatn.h:
In function 'float2 operator-(float, float2)':
/home/tgall/opencl/SNU/src/runtime/hal/device/cpu/type/cl_ops_floatn.h:114:1:
internal compiler error: output_operand: invalid operand for code 'P'
Please submit a full bug report,
with preprocessed source if appropriate.
See  for instructions.
Preprocessed source stored into /tmp/cciluYVq.out file, please attach
this to your bugreport.
Traceback (most recent call last):
   File "/usr/share/apport/gcc_ice_hook", line 34, in 
 pr.write(open(apport.fileutils.make_report_path(pr), 'w'))
   File "/usr/lib/python2.7/dist-packages/problem_report.py", line 254, in write
 self._assert_bin_mode(file)
   File "/usr/lib/python2.7/dist-packages/problem_report.py", line 632,
in _assert_bin_mode
 assert (type(file) == BytesIO or 'b' in file.mode), 'file stream
must be in binary mode'
AssertionError: file stream must be in binary mode
make: *** [smoothstep.o] Error 1


I can reproduce this with upstream 4.7 and trunk (and so presumably with 
4.8).  I've raised it upstream with a reduced testcase.  See 
http://gcc.gnu.org/PR56979.


Thanks,

Matt



--
Matthew Gretton-Dann
Toolchain Working Group, Linaro



Re: [Linaro-validation] Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Peter Maydell
On 16 April 2013 13:19, Renato Golin  wrote:
> In the past, I have used a language that describes system properties to
> reserve boards (like "A9 & NEON & RAM >= 1GB") that would give me a list of
> available boards, when I'd choose one based on my own criteria.

The trouble with this approach (as you may be aware :-)) is that
if the board farm includes a few 'rare' board types that happen
to be covered by the broad system-property criteria used by most
people, it can be tricky to schedule jobs which really require
the 'rare' board type, because the rare resource can get
monopolised by a big job which could have run on anything but
happened to get scheduled to the rare board because it was
temporarily free. This is particularly acute if the rare board
is also a rather slow one.

-- PMM



Re: [Linaro-validation] Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Renato Golin
On 16 April 2013 13:28, Peter Maydell  wrote:

> The trouble with this approach (as you may be aware :-)) is that
> if the board farm includes a few 'rare' board types that happen
> to be covered by a broad system property criteria used by most
> people, it can be tricky to schedule jobs which really require
> the 'rare' board type, because the rare resource can get
> monopolised by a big job which could have run on anything but
> happened to get scheduled to the rare board because it was
> temporarily free. This is particularly acute if the rare board
> is also a rather slow one.
>

There are a number of ways you can overcome this, for example:
 * by not listing this particular board by components or configurations,
but solely by name, so it can only be scheduled by specific jobs that call
it by name,
 * adding a huge weight to it, making it always fall to the bottom of most
lists, and only show up when your search is so specific that only that board
appears.

There are other problems too, and they can be dealt with reasonably
quickly, but validating each one is not a trivial task and gets
incrementally harder. I'm not claiming this should be top priority, just
that it's a possible future we might want to be in. ;)

cheers,
--renato


Re: Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Matthias Klose
Am 16.04.2013 11:49, schrieb Matthew Gretton-Dann:
> The issues I encountered were:
>  * Its hard to get a machine running in hard-float to bootstrap a soft-float
> compiler and vice-versa.

hmm, why?

when using precise or quantal as the build environment, having these
packages installed should be good enough:

  libc6-dev-armhf [armel], libc6-dev-armel [armhf]
  binutils
  g++-multilib

Although I still have a local patch to support the multilib configuration:

http://anonscm.debian.org/viewvc/gcccvs/branches/sid/gcc-4.8/debian/patches/arm-multilib-defaults.diff?revision=6640&view=markup

  Matthias




Re: g++ 4.7.3 ICEs building SNU

2013-04-16 Thread Matthias Klose
[CC'ing Martin]

Am 15.04.2013 22:29, schrieb Tom Gall:
> internal compiler error: output_operand: invalid operand for code 'P'
> Please submit a full bug report,
> with preprocessed source if appropriate.
> See  for instructions.
> Preprocessed source stored into /tmp/cciluYVq.out file, please attach
> this to your bugreport.
> Traceback (most recent call last):
>   File "/usr/share/apport/gcc_ice_hook", line 34, in 
> pr.write(open(apport.fileutils.make_report_path(pr), 'w'))
>   File "/usr/lib/python2.7/dist-packages/problem_report.py", line 254, in 
> write
> self._assert_bin_mode(file)
>   File "/usr/lib/python2.7/dist-packages/problem_report.py", line 632,
> in _assert_bin_mode
> assert (type(file) == BytesIO or 'b' in file.mode), 'file stream
> must be in binary mode'
> AssertionError: file stream must be in binary mode
> make: *** [smoothstep.o] Error 1

the exception from the apport hook here is strange, this should be fixed. which
versions of apport and python-problem-report are installed? any from the Linaro
overlays?

  Matthias




Re: Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Matthew Gretton-Dann

On 16/04/13 14:08, Matthias Klose wrote:

Am 16.04.2013 11:49, schrieb Matthew Gretton-Dann:

The issues I encountered were:
  * Its hard to get a machine running in hard-float to bootstrap a soft-float
compiler and vice-versa.


hmm, why?

when using precise or quantal as the build environment, then having these
packages installed should be good enough:

   libc6-dev-armhf [armel], libc6-dev-armel [armhf]
   binutils
   g++-multilib

Although I still have a local patch to support the multilib configuration:

http://anonscm.debian.org/viewvc/gcccvs/branches/sid/gcc-4.8/debian/patches/arm-multilib-defaults.diff?revision=6640&view=markup


I honestly don't know what the issue is - except that when I try to 
bootstrap a vanilla FSF GCC arm-none-linux-gnueabi with the initial host 
compiler as arm-none-linux-gnueabihf, I get failures during library builds 
in stage 1.


Also, given that we try to build vanilla compilers - which for 4.6 & 4.7 
requires fiddling with links in /usr/lib and /usr/include to point into the 
multiarch stuff - doing this in a chroot is safer than on the main system.
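The sort of link fiddling meant here might look like the following; the sketch works against a scratch directory so it is safe to run, and the file names are illustrative - the real set depends on the distro's multiarch layout:

```shell
#!/bin/sh
# Make pre-multiarch default paths under $ROOT/usr/lib resolve to the
# multiarch directory.  ROOT defaults to a temp dir for safety; in a
# real chroot it would be the chroot's root.
ROOT="${ROOT:-$(mktemp -d)}"
mkdir -p "$ROOT/usr/lib/arm-linux-gnueabihf"
for f in crt1.o crti.o crtn.o libc.so; do
    # relative link, so it stays valid when entered via chroot
    ln -sf "arm-linux-gnueabihf/$f" "$ROOT/usr/lib/$f"
done
```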


Thanks,

Matt

--
Matthew Gretton-Dann
Toolchain Working Group, Linaro



Re: Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Matthias Klose
Am 16.04.2013 15:46, schrieb Matthew Gretton-Dann:
> On 16/04/13 14:08, Matthias Klose wrote:
>> Am 16.04.2013 11:49, schrieb Matthew Gretton-Dann:
>>> The issues I encountered were:
>>>   * Its hard to get a machine running in hard-float to bootstrap a 
>>> soft-float
>>> compiler and vice-versa.
>>
>> hmm, why?
>>
>> when using precise or quantal as the build environment, then having these
>> packages installed should be good enough:
>>
>>libc6-dev-armhf [armel], libc6-dev-armel [armhf]
>>binutils
>>g++-multilib
>>
>> Although I still have a local patch to support the multilib configuration:
>>
>> http://anonscm.debian.org/viewvc/gcccvs/branches/sid/gcc-4.8/debian/patches/arm-multilib-defaults.diff?revision=6640&view=markup
>>
> 
> I honestly don't know what the issue is - except that when I try to bootstrap 
> a
> vanilla FSF GCC arm-none-linux-gnueabi with the initial host compiler as
> arm-none-linux-gnueabihf I get failures during libraries builds in stage 1.
> 
> Also given that we try to build vanilla compilers, and so for 4.6 & 4.7 that
> requires fiddling with links in /usr/lib and /usr/include to point into the
> multiarch stuff, doing this in a chroot is safer than on the main system.

this is not true. afaics all the active gcc linaro releases do have the
multiarch patches merged from upstream. So knowing the root cause would be
better than tampering with the links.

  Matthias




Re: Thoughts on progressing Toolchain Working Group Lava integration

2013-04-16 Thread Paul Sokolovsky
Hello Matt,


There were quite a few responses already, so I'll try to focus on the
questions to which I think I may contribute something useful.


On Tue, 16 Apr 2013 10:49:23 +0100
Matthew Gretton-Dann  wrote:

> Paul,
> 
> I've been having some thoughts about CBuild and Lava and the TCWG 
> integration of them both.  I wish to share them and open them up for
> general discussion.
> 
> The background to this has been the flakiness of the Panda's (due to
> heat), the Arndale (due to board 'set-up' issues), and getting a
> batch of Calxeda nodes working.
> 
> The following discussion refers to building and testing only, *not* 
> benchmarking.
> 
> If you look at http://cbuild.validation.linaro.org/helpers/scheduler
> you will see a bunch of calxeda01_* nodes have been added to CBuild.
> After a week of sorting them out they provide builds twice as fast as
> the Panda boards.  However, during the setup of the boards I came to
> the conclusion that we set build slaves up incorrectly, and that
> there is a better way.
> 
> The issues I encountered were:
>   * The Calxeda's run quantal - yet we want to build on precise.
>   * Its hard to get a machine running in hard-float to bootstrap a 
> soft-float compiler and vice-versa.
>   * My understanding of how the Lava integration works is that it
> runs the cbuild install scripts each time, and so we can't
> necessarily reproduce a build if the upstream packages have been
> changed.
> 
> Having thought about this a bit I came to the conclusion that the
> simple solution is to use chroots (managed by schroot), and to change
> the architecture a bit.  The old architecture is everything is put
> into the main file-system as one layer.  The new architecture would
> be to split this into two:
> 
>   1. Rootfs - Contains just enough to boot the system and knows how
> to download an appropriate chroot and start it.
>   2. Chroots - these contain a setup build system that can be used
> for particular builds.
> 
> The rootfs can be machine type specific (as necessary), and for
> builds can be a stock linaro root filesystem.  It will contain
> scripts to set the users needed up, and then to download an
> appropriate chroot and run it.
> 
> The chroot will be set up for a particular type of build (soft-float
> vs hard-float) and will be the same for all platforms.  The advantage
> of this is that I can then download a chroot to my ChromeBook and
> reproduce a build locally in the same environment to diagnose issues.
> 
> The Calxeda nodes in cbuild use this type of infrastructure - the
> rootfs is running quantal (and I have no idea how it is configured -
> it is what Steve supplied me with).  Each node then runs two chroots
> (precise armel and precise armhf) which take it in turns to ask the
> cbuild scheduler whether there is a job available.
> 
> So my first question is does any of the above make sense?

If you propose that LAVA builds use such a chroot setup, then it
should be technically possible, but in practice it will be quite a
chore to set up and maintain. If we want to use LAVA, why don't we
follow its way directly? It already allows using (and easily switching)
any rootfs directly. There should be distro methods to pin packages to
specific versions. If you want to run LAVA's rootfs in a chroot on a
Chromebook, you can do just that - take one, transform it to a chroot, and
use it (the "transform" stage may take a bit of effort initially, but as the
LAVA rootfs is wholly based on Linaro's standard linaro-media-create
technology, once done it's reusable for all Linaro builds).
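The "transform" step could be roughly this (the destination path and the schroot registration details are assumptions, not a tested recipe):

```shell
#!/bin/sh
# Unpack a rootfs tarball into a directory that schroot can be pointed
# at.  The destination is parameterised so the sketch needs no root
# access; /srv/chroots is an assumed convention, not a LAVA one.
rootfs_to_chroot() {
    tarball="$1" name="$2" dest="${3:-/srv/chroots}"
    mkdir -p "$dest/$name"
    tar -xf "$tarball" -C "$dest/$name"
    # An /etc/schroot/chroot.d/$name entry with type=directory and
    # directory=$dest/$name would then make it usable via 'schroot -c'.
}
```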

> 
> Next steps as I see it are:
> 
>   1. Paul/Dave - what stage is getting the Pandaboards in the Lava
> farm cooled at?  One advantage of the above architecture is we could
> use a stock Pandaboard kernel & rootfs that has thermal limiting
> turned on for builds, so that things don't fall over all the time.

I'm currently focusing on critical android-build issues, so anything
else is in backlog. And next up in my queue is supporting IT with
global Linaro services EC2 migration ;-I.

But the problem we have is not that we can't get reliable *builds* in
LAVA - it's that the *complete* CBuild picture doesn't work in LAVA.
Benchmarking specifically is the culprit. If you want reliable builds, just
use the "lava-panda-usbdrive" queue - that will use those 15 standard Panda
boards mentioned by Renato, with a known good rootfs/kernel. The problem is
that the gcc, etc. binaries produced by those builds won't run on the
benchmarking image, because the OS versions of the "known good Panda
rootfs" and the "validated CBuild PandaES rootfs" are different.

> 
>   2. Paul - how hard would it be to try and fire up a Calxeda node
> into Lava? 

As other folks answered, that completely depends on work which
(old-time) LAVA people do, not something I (a former Infra engineer)
can influence so far.

> We can use one of the ones assigned to me.  I don't need
> any fancy multinode stuff that Michael Hudson-Doyle is working on -
> each node can be considered a separate board