rmuir commented on code in PR #13572:
URL: https://github.com/apache/lucene/pull/13572#discussion_r1685214866
##########
lucene/core/build.gradle:
##########
@@ -14,12 +14,59 @@
* See the License for the specific language governing permissions and
* limitations under the License.
*/
+plugins {
+ id "c"
+}
apply plugin: 'java-library'
+apply plugin: 'c'
description = 'Lucene core library'
+model {
+ toolChains {
+ gcc(Gcc) {
+ target("linux_aarch64"){
+ path '/usr/bin/'
+ cCompiler.executable 'gcc10-cc'
+ cCompiler.withArguments { args ->
+ args << "--shared"
+ << "-O3"
+ << "-march=armv8.2-a+dotprod"
Review Comment:
Oh, the other likely explanation for the performance is that the integer dot
product in Java is not AS HORRIBLE on 256-bit SVE as it is on 128-bit
NEON. It more closely resembles how it behaves on 256-bit AVX (AVX2): two vectors of
8x8-bit integers ("64-bit vectors") are multiplied into an intermediate 8x16-bit
result (128-bit vector), which is then added into an 8x32-bit accumulator (256-bit
vector). Of course, it does not use the SDOT instruction, which is sad, since that is
the CPU instruction intended precisely for this purpose.
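To make the widening scheme concrete, here is a scalar sketch that models only the per-lane arithmetic: 8 x int8 inputs, 8 x int16 products, 8 x int32 accumulator lanes. The class and method names are hypothetical. Each array element stands in for one vector lane. It does not use `jdk.incubator.vector`, and it is not the code from this PR or from Lucene.

```java
import java.util.Arrays;

// Hypothetical scalar model of the widening int8 dot product described above.
// Each array element stands in for one vector lane; no SIMD or Vector API is used.
class WideningDotProductModel {

    // 8 lanes: 8 x int8 input (64 bits) -> 8 x int16 products (128 bits) -> 8 x int32 sums (256 bits)
    static final int LANES = 8;

    static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] acc = new int[LANES]; // models the 8x32-bit (256-bit) accumulator
        int i = 0;
        for (; i + LANES <= a.length; i += LANES) {
            short[] prod = new short[LANES]; // models the 8x16-bit (128-bit) intermediate
            for (int lane = 0; lane < LANES; lane++) {
                // an int8 * int8 product always fits in int16 (range -16256 .. 16384)
                prod[lane] = (short) (a[i + lane] * b[i + lane]);
            }
            for (int lane = 0; lane < LANES; lane++) {
                acc[lane] += prod[lane]; // widen int16 -> int32 and add lane-wise
            }
        }
        int sum = Arrays.stream(acc).sum(); // horizontal reduction across the 8 lanes
        for (; i < a.length; i++) {
            sum += a[i] * b[i]; // scalar tail when the length is not a multiple of 8
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = {1, 2, 3, 4, 5, 6, 7, 8, -128, 127};
        byte[] b = {1, 1, 1, 1, 1, 1, 1, 1, -128, -128};
        System.out.println(dotProduct(a, b)); // prints 164
    }
}
```

The int16 intermediate is safe because no product of two int8 values leaves the int16 range. Even so, every 8-byte chunk needs two widening steps before it reaches the 32-bit accumulator.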
On 128-bit NEON, there is no way with Java's Vector API to
process 4x8-bit integers ("32-bit vectors") the way the SDOT instruction does:
https://developer.arm.com/documentation/102651/a/What-are-dot-product-intructions-
Nor is it even performant to take a 64-bit vector and process "part 0" and then
"part 1". The situation is really sad, and the performance reflects that.
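For contrast, here is a scalar sketch of what a single 128-bit SDOT computes, following the Arm documentation linked above. Each int32 lane of the accumulator receives the sum of four adjacent int8 x int8 products, so one instruction handles sixteen byte pairs with no int16 step. The class and method names are hypothetical. This models the instruction's arithmetic only. It is not a Vector API implementation, which, as noted above, has no operation that maps onto this four-way grouping.

```java
import java.util.Arrays;

// Hypothetical scalar model of one 128-bit NEON SDOT (vector form: Vd.4S, Vn.16B, Vm.16B).
// Each 32-bit accumulator lane receives the sum of four adjacent int8 x int8 products,
// i.e. one "32-bit vector" of 4x8-bit integers per lane.
class SdotModel {

    static void sdot(int[] acc, byte[] n, byte[] m, int offset) {
        for (int lane = 0; lane < 4; lane++) {
            int groupSum = 0;
            for (int k = 0; k < 4; k++) {
                int idx = offset + 4 * lane + k;
                groupSum += n[idx] * m[idx]; // int8 * int8, computed in int
            }
            acc[lane] += groupSum; // accumulate into the matching int32 lane
        }
    }

    // Full dot product: one modeled SDOT per 16-byte block, then a scalar tail.
    static int dotProduct(byte[] a, byte[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("length mismatch");
        }
        int[] acc = new int[4]; // models the 4x32-bit (128-bit) accumulator register
        int i = 0;
        for (; i + 16 <= a.length; i += 16) {
            sdot(acc, a, b, i);
        }
        int sum = Arrays.stream(acc).sum();
        for (; i < a.length; i++) {
            sum += a[i] * b[i];
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] a = new byte[16];
        byte[] b = new byte[16];
        for (int i = 0; i < 16; i++) {
            a[i] = (byte) i;
            b[i] = 1;
        }
        int[] acc = new int[4];
        sdot(acc, a, b, 0);
        System.out.println(Arrays.toString(acc)); // prints [6, 22, 38, 54]
    }
}
```

In the usage example, lane 0 sums bytes 0..3, lane 1 sums bytes 4..7, and so on. That per-lane grouping is the part the comment says cannot be expressed efficiently with 128-bit Java vectors.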
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]