== Progress ==

* Connect last week.

  * Worked through the open issues and work items related to
performance; we now have a clear list of what is currently in flight.
The notes are at
https://wiki.linaro.org/RamanaRadhakrishnan/Sandbox//RRQ212ConnectNotes
The next step is to track this better and move it off the wiki page
into a form that we can use in our regular performance meetings.
  * Created blueprints, closed down old issues and reprioritized
issues with Ulrich and others.
  * Had a number of interesting conversations during Connect about
compiler-related issues.
  * Other sessions that I attended included the Android optimization
sessions. While there was quite a bit about toolchain performance, it
is important that we keep looking at the performance profiles to
find areas where the toolchain can be improved. However, this can't be
done without getting more testcases from other groups. There were a
couple of interesting comments that skia is CPU bound, which would
indicate that the paint function is CPU bound. But why, and how?
Someone should look at reproducing these numbers and see where we get
to in this area. Pointed out that it might be good to get
cortex-strings into bionic.
  * Fixed the vrev off-by-one error and committed the fix to FSF trunk.
However, it couldn't make it in time for FSF 4.7.1 as the merge window
had closed by then.
  * Set up my panda board to be identical to what runs in our
validation labs.

* This week

   * Worked through the merge requests and moved some patches
upstream, out of the "toreview" state.
   * Landed a few merge requests that had been approved but not yet
merged. Took care of merging the upstream 4.7 branch.
   * Given that I only had a few hours back in the office this week, I
worked on regenerating arm_neon.h to use __builtin_shuffle for
vrev64, vrev32, vtrn, vzip and vuzp. A follow-up patch needs to do
the same for vext, but that also needs generic support in
vec_perm_const_ok. Once that is done I think we can safely start
rewriting. It still needs some more testing and polishing, but the
initial results on the testcase from PR48941 are kind of neat. The
results for some of the other testcases that I've looked at also look
much better than where we were a few weeks back. So all in all, nice
progress on that front. However, we also have to find a way of getting
these generated at -O0, which does not appear to happen cleanly enough
with this approach.

For one example the output looks like the following. Notice those
spills beginning to disappear .... :)


New:

sqrlen4D_16u8:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 0, uses_anonymous_args = 0
        @ link register save eliminated.
        vabd.u8 q1, q0, q1
        vmull.u8        q0, d2, d2
        vmull.u8        q8, d3, d3
        vuzp.32 q0, q8
        vpaddl.u16      q0, q0
        vpadal.u16      q0, q8
        bx      lr

Old:

sqrlen4D_16u8:
        @ args = 0, pretend = 0, frame = 0
        @ frame_needed = 1, uses_anonymous_args = 0
        @ link register save eliminated.
        vabd.u8 q1, q0, q1
        stmfd   sp!, {r4, fp}
        add     fp, sp, #4
        sub     sp, sp, #48
        add     r3, sp, #15
        vmull.u8        q0, d2, d2
        bic     r3, r3, #15
        vmull.u8        q8, d3, d3
        vuzp.32 q0, q8
        vstmia  r3, {d0-d1}
        vstr    d16, [r3, #16]
        vstr    d17, [r3, #24]
        vpaddl.u16      q0, q0
        vpadal.u16      q0, q8
        sub     sp, fp, #4
        ldmfd   sp!, {r4, fp}
        bx      lr


   *  Attended platform / WG sync-up.
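
As a rough illustration of the arm_neon.h item above: GCC's generic
vector extension provides __builtin_shuffle, which takes one or two
vectors plus an index mask, and a constant mask is what lets the
compiler match the permute to a single instruction. The sketch below
shows vrev64-, vrev32- and vzip-style permutes written that way. The
type and function names (u8x8, rev64_u8, rev32_u8, zip_lo_u8) are made
up for this example and are not the definitions in the regenerated
arm_neon.h. It assumes GCC (other compilers may not provide
__builtin_shuffle) but does not need an ARM target.

```c
#include <stdint.h>

/* GCC generic vector type: 8 x uint8_t, the same shape as NEON's
   uint8x8_t.  The name u8x8 is hypothetical. */
typedef uint8_t u8x8 __attribute__((vector_size(8)));

/* vrev64-style permute: reverse all eight bytes of the 64-bit vector.
   The mask is a constant, so the permute is known at compile time. */
static inline u8x8 rev64_u8(u8x8 v)
{
    const u8x8 mask = {7, 6, 5, 4, 3, 2, 1, 0};
    return __builtin_shuffle(v, mask);
}

/* vrev32-style permute: reverse the bytes inside each 32-bit half. */
static inline u8x8 rev32_u8(u8x8 v)
{
    const u8x8 mask = {3, 2, 1, 0, 7, 6, 5, 4};
    return __builtin_shuffle(v, mask);
}

/* vzip-style permute (low half only): interleave the low four bytes of
   a and b.  With two inputs, indices 0..7 select from a and 8..15
   select from b. */
static inline u8x8 zip_lo_u8(u8x8 a, u8x8 b)
{
    const u8x8 mask = {0, 8, 1, 9, 2, 10, 3, 11};
    return __builtin_shuffle(a, b, mask);
}
```

For a = {0,...,7}, rev64_u8(a) puts element 7 first and element 0
last, and zip_lo_u8(a, b) starts a0, b0, a1, b1.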

== Plans ==

 * Clean up the ml bits of rewiring the intrinsics and try some proper testcases.
 * Work on the auto-inc-dec scheduler patches.
 * Rework the sched-pressure patch upstream.
 * Review the Android benchmarking writeups.
