https://bugs.kde.org/show_bug.cgi?id=427404
Carl Love <c...@us.ibm.com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |will_schm...@vnet.ibm.com

--- Comment #4 from Carl Love <c...@us.ibm.com> ---
The functional support for the outer product instructions uses a clean helper to implement the functionality. In total there will be something like 30 of these outer product instructions that need to be supported. They are similar to a matrix multiply operation, which is rather complicated. It seems very unlikely that another architecture will implement something similar, so adding a bunch of ppc-specific IOps is not appealing.

The other issue is that these instructions read and write their results to the new accumulator registers. There are eight 512-bit accumulators, each read and written as four 128-bit words. Each of the eight accumulators is associated with four VSR registers. For example, ACC[0][0] is associated with VSR[0], i.e. the first 128-bit word in ACC zero is associated with VSR0, ACC[0][1] is associated with VSR1, ACC[0][2] is associated with VSR2, ACC[0][3] is associated with VSR3, ACC[1][0] is associated with VSR4, etc. There are instructions for moving values to/from ACC[i] and its corresponding VSRs.

If we were to try to implement these instructions as IOps, it would require a bit of overhead to set up the guest ACC[i] entry before issuing the instruction and then to save the results to the guest ACC[i] entry. I think it would be possible but a little messy.

At this point, it appears these instructions will only show up in hand-coded library routines. These are not instructions that GCC will issue for normal user code.

So anyway, doing these as clean helpers seems reasonable instead of creating a bunch of new IOps. The one issue with the implementation is that the clean helper interface can only return 64 bits.
So the helper is called twice: the first call returns the upper 64 bits of the result, and the second call returns the lower 64 bits. Not an ideal solution. I did look at trying to extend the clean helper interface to return 128 bits, but that did not appear to be very straightforward.

Anyway, the code review needs to decide whether this is really the best implementation for these instructions, given the issues outlined above.
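
To make the two points above concrete, here is a minimal sketch in plain C. It is not code from the patch or from Valgrind: the names acc_row_to_vsr, V128, compute_row, helper_row_half and call_helper_twice are all hypothetical, and compute_row is a trivial stand-in for the real outer product arithmetic. It only illustrates the ACC-to-VSR index mapping described in the comment, and the pattern of calling a 64-bit-returning helper twice to assemble a 128-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: index of the VSR associated with 128-bit word
   `row` (0..3) of accumulator `acc` (0..7).
   ACC[0][0] -> VSR0, ACC[0][3] -> VSR3, ACC[1][0] -> VSR4, ... */
unsigned acc_row_to_vsr(unsigned acc, unsigned row)
{
    assert(acc < 8 && row < 4);
    return acc * 4 + row;
}

/* Hypothetical 128-bit value split into two 64-bit halves. */
typedef struct {
    uint64_t hi;
    uint64_t lo;
} V128;

/* Stand-in for the real computation; the arithmetic here is arbitrary
   and exists only so both halves carry distinct values. */
V128 compute_row(uint64_t a, uint64_t b)
{
    V128 r;
    r.hi = a + b;
    r.lo = a ^ b;
    return r;
}

/* Hypothetical helper limited to a 64-bit return value.
   `get_upper` selects which half of the 128-bit result is returned. */
uint64_t helper_row_half(uint64_t a, uint64_t b, int get_upper)
{
    V128 r = compute_row(a, b);
    return get_upper ? r.hi : r.lo;
}

/* Caller side: two calls with identical inputs, one per half.
   In this sketch the full result is computed twice. */
V128 call_helper_twice(uint64_t a, uint64_t b)
{
    V128 r;
    r.hi = helper_row_half(a, b, 1);  /* first call: upper 64 bits  */
    r.lo = helper_row_half(a, b, 0);  /* second call: lower 64 bits */
    return r;
}
```

In this sketch each 128-bit word costs two full evaluations of the computation, which is one way to read the "not an ideal solution" remark above; whether the actual patch pays the same cost is not stated in the comment.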