https://bugs.kde.org/show_bug.cgi?id=427404

Carl Love <c...@us.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |will_schm...@vnet.ibm.com

--- Comment #4 from Carl Love <c...@us.ibm.com> ---
The functional support for the outer product instructions uses a clean helper
to implement the functionality.  In total there will be something like 30 of
these outer product instructions that need to be supported.  They are similar
to a matrix multiply operation, which is rather complicated.  It seems very
unlikely that another architecture will implement something similar, so adding
a bunch of ppc-specific IOps is not appealing.

The other issue is that these instructions read their operands from, and write
their results to, the new accumulator registers.  There are eight 512-bit
accumulators, each of which is read/written as four 128-bit words.  Each of
the eight accumulators is associated with four VSR registers.  For example,
ACC[0][0] is associated with VSR[0], i.e. the first 128-bit word in ACC zero
is associated with VSR0, ACC[0][1] is associated with VSR1, ACC[0][2] is
associated with VSR2, ACC[0][3] is associated with VSR3, ACC[1][0] is
associated with VSR4, etc.
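
The mapping above works out to ACC[i][j] <-> VSR[4*i + j].  A minimal sketch
in C, just to make the indexing concrete.  The names NUM_ACC, ROWS_PER_ACC and
acc_row_to_vsr are made up for illustration and are not identifiers from the
patch:

```c
/* Hypothetical names, for illustration only. */
enum { NUM_ACC = 8, ROWS_PER_ACC = 4 };   /* 8 ACCs x 4 x 128 bits = 8 x 512 bits */

/* Index of the VSR associated with the 128-bit word ACC[acc][row]. */
static unsigned acc_row_to_vsr(unsigned acc, unsigned row)
{
    return acc * ROWS_PER_ACC + row;
}
```

With this indexing the eight accumulators cover VSR0 through VSR31.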

There are instructions for moving values to/from ACC[i] and its corresponding
VSRs.  If we were to try to implement the outer product instructions as IOps,
it would require a bit of overhead to set up the guest ACC[i] entry before
issuing the instruction and then to save the results back to the guest ACC[i]
entry.  I think it would be possible but a little messy.

At this point, it appears these instructions will only show up in hand-coded
library routines.  These are not instructions that GCC will emit for normal
user code.  So it seems reasonable to do these as clean helpers instead of
creating a bunch of new IOps.

The one issue with the implementation is that the clean helper interface can
only return 64 bits.  So the helper is called twice: the first time it returns
the upper 64 bits of the result, the second time the lower 64 bits of the
result.  Not an ideal solution.  I did look at trying to extend the clean
helper interface to return 128 bits, but that did not appear to be very
straightforward.
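
To illustrate the two-call pattern, here is a rough stand-alone sketch in C.
It is not the code from the patch: u128_t, compute_result and helper_get_half
are hypothetical names, and a 64x64 -> 128-bit multiply stands in for the real
outer product computation.  The point is only that the full result is
recomputed on each call and one 64-bit half is returned per call:

```c
#include <stdint.h>

/* Hypothetical 128-bit result type, for illustration only. */
typedef struct { uint64_t hi, lo; } u128_t;

/* Stand-in for the real computation: full 128-bit product of a and b. */
static u128_t compute_result(uint64_t a, uint64_t b)
{
    uint64_t a_lo = a & 0xFFFFFFFFULL, a_hi = a >> 32;
    uint64_t b_lo = b & 0xFFFFFFFFULL, b_hi = b >> 32;
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;
    /* Sum of the three contributions to bits 32..63, with carries. */
    uint64_t mid = (p0 >> 32) + (p1 & 0xFFFFFFFFULL) + (p2 & 0xFFFFFFFFULL);
    u128_t r;
    r.lo = (mid << 32) | (p0 & 0xFFFFFFFFULL);
    r.hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return r;
}

/* A helper limited to a 64-bit return value: the caller selects which
   half it wants, so getting all 128 bits takes two calls. */
static uint64_t helper_get_half(uint64_t a, uint64_t b, unsigned want_upper)
{
    u128_t r = compute_result(a, b);
    return want_upper ? r.hi : r.lo;
}
```

A caller would then issue helper_get_half(a, b, 1) followed by
helper_get_half(a, b, 0) and combine the two halves, which is where the
duplicated work comes from.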

Anyway, the code review needs to decide if this is really the best
implementation or not for these instructions given the issues outlined above.
