[RFC] NEON vs. ARM register selection

2012-03-02 Thread Andrew Stubbs

Hi All,

As you know, the compiler currently has difficulty deciding whether or
not to do an operation in NEON.


As I see it there are three problems:

  1. Simply, is it profitable?

 NEON can do many DImode operations in one or two instructions
 where 2 to 10 normal ARM/Thumb instructions would be required
 (not to mention the added register pressure), but there is a
 cost associated with moving the inputs to NEON, and the results
 back.

 If the data can stay in NEON for more than one operation,
 then that's even better.

 If the data must be loaded from memory, and the result stored back
 to memory, then it's only a question of whether the register space
 is available, or not.

 Currently these decisions are made in the IRA/reload passes.

  2. Values that originate in hard-registers stay there.

 This applies to function parameters, mostly, but also in general
 where the result of an operation is allocated first.

 If there is no instruction that can use the value there then the
 value is 'reloaded' to a more suitable register. If there is any
 alternative that avoids the move then the register allocator will
 use it, regardless of the relative costs of the other
 alternatives.

 This problem is reduced where an operation and move can happen in
 one instruction, but NEON instructions do not do this much. We can
 write insns that appear to do it, but these output multiple
 instructions (see my recent core-SI=>NEON-DI extend patch).

  3. It all happens too late.

 The decision whether to use NEON or not is not made until register
 allocation time. Naturally this means that most of the optimization
 passes are already completed.

 Part of the problem is that the operation almost certainly needs
 splitting (into whatever form was chosen) and this might not be
 straightforward, post-reload. (However, the split1 pass is
 already quite late, so perhaps this isn't such a big deal.)

 Another part of the problem is that passes such as the two
 lower-subreg passes make assumptions about the register width which
 are not accurate if the operation is to end up in NEON.
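
To make problem 1 a bit more concrete, here is a rough sketch (Python,
not GCC code) of the break-even question: a chain of operations is only
worth doing in NEON if the saving outweighs the moves in and out. The
function name and all the cost numbers are made up for illustration;
real figures would have to come from the target's cost model.

```python
def neon_is_profitable(core_cost, neon_cost, moves_in, moves_out, move_cost):
    """Return True if doing a chain of operations in NEON looks cheaper.

    core_cost / neon_cost: total cost of the operations in each register file.
    moves_in / moves_out:  values that must cross between core and NEON.
    move_cost:             cost of one such transfer (hypothetical unit).
    """
    neon_total = neon_cost + (moves_in + moves_out) * move_cost
    return neon_total < core_cost


# One DImode operation: 4 core insns vs. 1 NEON insn, but two inputs must be
# moved in and one result moved back (illustrative numbers only).
single_op = neon_is_profitable(core_cost=4, neon_cost=1,
                               moves_in=2, moves_out=1, move_cost=2)

# Three chained operations sharing the same transfers: the data stays in NEON.
chained = neon_is_profitable(core_cost=12, neon_cost=3,
                             moves_in=2, moves_out=1, move_cost=2)

# Loaded from and stored to memory: no transfers between register files.
from_memory = neon_is_profitable(core_cost=4, neon_cost=1,
                                 moves_in=0, moves_out=0, move_cost=2)
```

With these invented numbers the single operation is not worth moving,
while the chained and from-memory cases are.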

There are other, lesser problems, such as it being hard to adjust the 
costs for different cores (A8 in particular) and the cost of generating 
an immediate constant can't be known until it's known what instructions 
will be used to generate it.


These problems are not specific to NEON, of course. I believe IWMMXT 
suffers from the same issues. Likewise the C6X port, and also the i386 
MMX to some degree. Anything that has instructions that only operate on 
a subset of registers, basically.



So, Bernd has suggested an outline of a solution. I've quizzed him on 
this, added a few of my own ideas, and probably a good selection of 
misunderstandings, bad assumptions, and general cock ups, and come up 
with something I can write here for comment. I can post something to 
upstream later if it doesn't get totally shot down now.



The basic idea is that we add a new RTL optimization pass (or two) that 
assesses the usage of pseudo registers, and makes recommendations about 
what register class each should end up in, if there's a choice. These 
recommendations would then be used by later passes to get a better use 
of NEON. I might call this the "prealloc" pass, or something.



Firstly, for each pseudo-register in a function, the pass would look at 
the insn constraints for each "def" and "use", and see how the registers 
relate to one another. This might determine things like "if rN is in 
class A, then rM must be also in class A".


E.g. if you have two registers with constraints like this:

 "r,w"
 "r,w"

.. (and 'r' and 'w' do not overlap) then you know that there is a choice 
between one mode or another, whereas this:


 "r,w,r,w"
 "r,w,w,r"

.. would impose no restrictions and we can carry on as normal.
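
As a sketch of that check (Python, purely illustrative): alternative i
of one operand can only be chosen together with alternative i of the
other, so the two operands are tied exactly when the allowed pairs do
not cover every combination of classes. The helper names are
hypothetical, and this assumes single-letter constraints with no
modifiers such as "=" or "&".

```python
from itertools import product


def allowed_pairs(constraints_a, constraints_b):
    """Return the set of (class_a, class_b) pairs an insn permits.

    Each argument is a comma-separated constraint string; alternative i of
    one operand is only valid together with alternative i of the other.
    """
    alts_a = constraints_a.split(",")
    alts_b = constraints_b.split(",")
    if len(alts_a) != len(alts_b):
        raise ValueError("operands must have the same number of alternatives")
    return set(zip(alts_a, alts_b))


def classes_are_tied(constraints_a, constraints_b):
    """True if choosing a class for one operand restricts the other."""
    pairs = allowed_pairs(constraints_a, constraints_b)
    classes_a = {a for a, _ in pairs}
    classes_b = {b for _, b in pairs}
    # Unrestricted means every combination of the available classes is allowed.
    return pairs != set(product(classes_a, classes_b))


tied = classes_are_tied("r,w", "r,w")          # only (r,r) and (w,w) allowed
free = classes_are_tied("r,w,r,w", "r,w,w,r")  # all four combinations allowed
```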

Having done that we'd end up with sets of pseudo-registers that must 
make a decision one way or the other, and we'd know where the operations 
are that would force a move from one class to the other.
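
One plausible way to build those sets is a union-find over the pairs of
pseudos that the constraint analysis found to be tied. This is only a
sketch in Python with made-up pseudo numbers, not a description of any
existing GCC pass, and it ignores the overlapping-class complications.

```python
def group_tied_pseudos(pseudos, tied_pairs):
    """Group pseudo-registers that must end up in the same register class.

    pseudos:    iterable of pseudo-register numbers.
    tied_pairs: iterable of (a, b) pairs that some insn forces into one class.
    Returns a list of sets, ordered by the smallest pseudo in each set.
    """
    pseudos = list(pseudos)
    parent = {p: p for p in pseudos}

    def find(p):
        # Walk up to the representative, halving the path as we go.
        while parent[p] != p:
            parent[p] = parent[parent[p]]
            p = parent[p]
        return p

    for a, b in tied_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for p in pseudos:
        groups.setdefault(find(p), set()).add(p)
    return sorted(groups.values(), key=min)


# Hypothetical pseudos: 100-102 are chained together, 103-104 are a pair,
# and 105 is not tied to anything.
groups = group_tied_pseudos(range(100, 106),
                            [(100, 101), (101, 102), (103, 104)])
```

Each resulting set then gets a single decision, one class or the other.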


There's a fair amount of handwavium in there at present, because I've 
not worked out what to do with overlapping register classes (think 
VFP_LO_REGS) and all the other complications.



Secondly, the pass would consider the costs of each alternative, and 
store a recommended register class for each pseudo-register in a table 
somewhere. It would also create new pseudos and insert extra move 
instructions at the register file boundaries where an existing register 
would have had split recommendations (this would solve problem 2 above).
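
The splitting step might look something like the following sketch
(Python, illustrative only). It assumes a pseudo is defined in one
class and that each use already carries a recommended class; uses that
want a different class get a fresh pseudo plus one explicit move. The
function name, the tuple layouts and the pseudo numbers are all
hypothetical.

```python
def split_at_boundary(pseudo, def_class, uses, first_new_pseudo):
    """Split one pseudo whose uses were recommended different classes.

    uses: list of (insn_id, recommended_class) in program order.
    Returns (moves, rewritten) where
      moves     = [(new_pseudo, source_pseudo, class_of_new_pseudo), ...]
      rewritten = [(insn_id, pseudo_to_use), ...]
    """
    copies = {}      # class -> new pseudo holding a copy in that class
    moves = []
    rewritten = []
    next_new = first_new_pseudo
    for insn_id, wanted in uses:
        if wanted == def_class:
            # No boundary crossed: keep using the original pseudo.
            rewritten.append((insn_id, pseudo))
            continue
        if wanted not in copies:
            # First use in the other register file: one move, one new pseudo.
            copies[wanted] = next_new
            moves.append((next_new, pseudo, wanted))
            next_new += 1
        rewritten.append((insn_id, copies[wanted]))
    return moves, rewritten


# Pseudo 100 arrives in a core register (say, a function parameter); insns 1
# and 2 are recommended for NEON ("w"), insn 3 stays in core ("r").
moves, rewritten = split_at_boundary(100, "r",
                                     [(1, "w"), (2, "w"), (3, "r")], 200)
```

Here a single move into new pseudo 200 serves both NEON uses, and insn
3 keeps using pseudo 100.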


Again, there's handwavium in "consider the costs". This isn't too hard 
for size-optimization (assuming the "length" attribute on the insn is 
correct), but more difficult for speed optimization. Factors to include 
would be the move costs (here the A8 

Re: Please pull upstream rev. 184603 into gcc-linaro 4.7

2012-03-02 Thread Andrew Stubbs

On Wed 29 Feb 2012 18:05:46 GMT, Andrew Stubbs wrote:
> On 29/02/12 17:23, Bernhard Rosenkränzer wrote:
> > Hi,
> > 184603 fixes an ICE we're running into with Android test builds.
> > Please pull it in ASAP so I don't have to mess with the CFLAGS as a
> > workaround.
>
> There's a merge from r184662 begun testing today. That should cover
> your revision.
>
> I'll commit it if it's not borked when the tests come back. That'll be
> tomorrow or Friday, I expect.

Now committed.

Bero, give it a go and see if it does what you need, please.

Andrew


___
linaro-toolchain mailing list
linaro-toolchain@lists.linaro.org
http://lists.linaro.org/mailman/listinfo/linaro-toolchain


lp:gcc/4.7

2012-03-02 Thread Andrew Stubbs

Hi Matthias,

GCC 4.7.0 has branched upstream. SVN trunk is now 4.8.

Could you please create lp:gcc/4.7 from the release branch.

Thanks

Andrew



[ACTIVITY] report week 09

2012-03-02 Thread Peter Maydell
Current Milestones:
||                          || Planned    || Estimate   || Actual     ||
||cp15-rework               || 2012-01-06 || 2012-??-?? ||            ||
(new blueprints & reestimate for this one pending)

Historical Milestones:
||a15-usermode-support      || 2011-11-10 || 2011-11-10 || 2011-10-27 ||
||upstream-omap3-cleanup    || 2011-11-10 || 2011-12-15 || 2011-12-12 ||
||initial-a15-system-model  || 2012-01-27 || 2012-01-27 || 2012-01-17 ||
||qemu-kvm-getting-started  || 2012-03-04?|| 2012-03-04?|| 2012-02-01 ||

 == cp15-rework ==
  * ploughing through conversion of cp15 registers to new design:
    patchset now 20 patches long, still TODO crn={0,1,6,7,9}
 == other ==
  * reviewed more Xilinx Zynq model patches
  * looking at BE8 support: Paul Brook has posted some patches
    to support this in user mode
  * LP:944645: fixed bug where we weren't clearing the IT bits when
    entering an M profile exception handler
  * sent out an arm-devs.next pullreq
  * trying to track down why linux-user is failing brk() and thus
    causing bash segfaults



Re: [RFC] NEON vs. ARM register selection

2012-03-02 Thread Ramana Radhakrishnan
On 2 March 2012 12:29, Andrew Stubbs wrote:
> Hi All,
>
> As you know, the compiler currently has difficulties choosing between
> whether to do an operation in NEON or not.
>

I have put this on the agenda for Tuesday's call - There is a bit of
detail here that I haven't digested fully which is why I didn't respond in
any detail earlier.

Ramana



Agenda for next performance call

2012-03-02 Thread Ramana Radhakrishnan
The agenda is now here:

https://wiki.linaro.org/WorkingGroups/ToolChain/Meetings/2012-03-06

Please add any topics that you might consider interesting for next time.

Ramana



[Activity ] Week 9

2012-03-02 Thread Ramana Radhakrishnan
=== Progress ===

* Finished off PGO patch - sent upstream.
* Finished off the ABI tests - sent upstream.
* Investigated fixes for LP 942307 - a problem with kernel builds for
android. Backported a fix from Uli last year.
* Upstream patch review.
* Small configury done for SPEC2k as far as HC partitioning goes.
* Some Android benchmark investigations.
* Recovered from a broken upgrade from natty to oneiric on my laptop
and then went all the way to Precise. It works reasonably!


=== Plans ===

* Commit all approved and tested patches.
* Check on hc partitioning results from SPEC2k and make sure there is
an improvement and the feature works!
* Investigate https://bugs.launchpad.net/gcc-linaro/+bug/924726 in a
little more detail.
* Get back to partial-partial PRE.


=== Absences ===

* 1 week holiday sometime before that - to be booked.
* Linaro Connect Q2.12 - May 28 - June 1 - travel booked - hotel to be booked.



[ACTIVITY] Feb 27 - Mar 2

2012-03-02 Thread Ulrich Weigand

== GCC ==

 * Fixed mainline regression causing ICE in certain outer-loop
   vectorization cases.

 * Merged fwprop-subreg patch into Linaro GCC 4.7.

 * Completed patch to generate usat/ssat instructions
   where appropriate; checked into GCC mainline.
   Merge requests to Linaro GCC 4.6 and 4.7 pending.

 * Ongoing work on improving end-of-loop value computation.


Mit freundlichen Gruessen / Best Regards

Ulrich Weigand

--
  Dr. Ulrich Weigand | Phone: +49-7031/16-3727
  STSM, GNU compiler and toolchain for Linux on System z and Cell/B.E.
  IBM Deutschland Research & Development GmbH
  Vorsitzender des Aufsichtsrats: Martin Jetter | Geschäftsführung: Dirk
Wittkopp
  Sitz der Gesellschaft: Böblingen | Registergericht: Amtsgericht
Stuttgart, HRB 243294




[ACTIVITY] weekly status

2012-03-02 Thread Ken Werner

Hi,

OpenEmbedded:
 * Worked on the meta-linaro layer and added libgcc and crosssdk
   recipes to satisfy some bitbake dependencies
   * I had to apply a few patches to build the linaro toolchain the OE
     way (mostly gcc configury)
   * successfully built the sato and Qt images
 * Moved on to test the February release of the linaro binary toolchain
   and (probably) hit an issue with unaligned SD card images to be used
   with QEMU
   * the guest kernel fails with: attempt to access beyond end of device
   * /proc/partitions shows different block sizes (host vs. guest)
   * the image size gets calculated on the fly by OE
   * patch posted that allows specifying a rootfs size alignment
   * not seen on trunk as they use IDE
 * Started to rebase the meta-linaro layer against current OE-core
 * created https://wiki.linaro.org/KenWerner/Sandbox/OEMetaLinaroCard
   based on the existing card of David R.


Regards,
Ken
