[GitHub] [lucene] rmuir opened a new issue, #12302: vector API integration, plan B

via GitHub Tue, 16 May 2023 17:30:19 -0700


rmuir opened a new issue, #12302:
URL: https://github.com/apache/lucene/issues/12302


   ### Description
   
   For years we have explored using the vector API to actually take advantage 
of SIMD units on the hardware. Several approaches have been attempted so 
far:
   1. try to coerce the hotspot superword autovectorization into giving better 
results: I think @jpountz may know details of this. It is currently limited 
to postings list decode, attempting to use 64-bit long. It is better than 
nothing, but not good enough, especially for stuff like vectors encoding and 
other bottlenecks.
   2. vectorize code and hope that the openjdk feature will "graduate" soon. 
This is not happening. Java is dying and becoming the next COBOL, in my 
opinion, as a result of ignoring the problem and delaying until "perfection". 
For evidence, simply look at the vector api being in "6th incubation" with 
really no api changes happening, just waiting on other features (some wet dream 
project valhalla or whatever) which will probably never land before we retire.
   3. hackedy-häcks: this is stuff to bypass the problem, such as the prototype 
I wrote in frustration *two years ago* here: 
https://github.com/apache/lucene/pull/18 . It is dirty and insecure, but it 
demonstrates that we can potentially make stuff easier on the user and take 
advantage of the hardware.
   
   Currently, unless the user is extremely technical, they can't make use of 
the vector support their hardware has, which is terribly sad. If they have 
immense resources/funding/etc, they can fork lucene and patch the source code, 
and maintain a fork, hooking in incubator openjdk stuff, but that's too hard on 
users.
   
   I think we have to draw a line in the sand: basically we cannot rely upon 
openjdk to be managed as a performant project. Their decisions make no sense, 
so we have to play a little less nice and apply some hacks! Otherwise, give up 
and switch to a different programming language with better perf!
   
   So I'd suggest to look at the work @uschindler has done with mmap and the 
preview apis, and let's carve out a path where we use the vector api *IFF* the 
user opts in via the command-line.
   
   Proposal (depends entirely upon user's jdk version and supported flags): 
   1. user does nothing and runs lucene without special flags: they get a 
warning message logged to the console (once!) telling them they need to add 
some stuff to the command line for best performance. Something such as "vector 
falling back to scalar implementation: please add '--add-modules .x.y.z....'"
   2. user supplies that command-line argument, and lucene is faster and uses 
the correct incubating vector api associated with their jdk version.
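   
   To make the two cases above concrete, here is a minimal sketch of the opt-in check, not a proposed implementation. The class `VectorOptIn` and its methods are hypothetical names. It assumes the elided flag refers to the JDK's incubating module `jdk.incubator.vector`, and the warning text is only illustrative. The vectorized branch is left as a comment, so the sketch always runs the scalar path:

```java
import java.util.concurrent.atomic.AtomicBoolean;

/** Hypothetical sketch: warn once and fall back to scalar code unless the user opted in. */
final class VectorOptIn {
  // Assumption: the flag in the proposal refers to the JDK's incubating Vector API module.
  static final String VECTOR_MODULE = "jdk.incubator.vector";

  // Guards the warning so it is printed at most once per JVM.
  private static final AtomicBoolean WARNED = new AtomicBoolean();

  /** True only if the module was resolved at startup, e.g. via --add-modules. */
  static boolean vectorModuleEnabled() {
    return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
  }

  /** Scalar implementation that works on every JDK. */
  static float scalarDotProduct(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** Entry point: logs the hint once when the user has not opted in. */
  static float dotProduct(float[] a, float[] b) {
    if (!vectorModuleEnabled() && WARNED.compareAndSet(false, true)) {
      System.err.println(
          "vector falling back to scalar implementation: please add --add-modules "
              + VECTOR_MODULE);
    }
    // A real implementation would dispatch to vectorized code here when the
    // module is enabled; this sketch always uses the scalar path.
    return scalarDotProduct(a, b);
  }
}
```

   Checking the boot module layer keeps the decision entirely in the user's hands: nothing changes unless they pass the flag.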
   
   Actually, I think the system @uschindler developed is the correct design for 
this; the only trick is that the incubating api is more difficult than the 
preview api. So we need more build system support: it could require more stuff 
to be downloaded or the build to be slower. But I think it's the right decision? 
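   
   As a rough illustration of picking an implementation per jdk version, the sketch below looks up a version-specific class by name and falls back to scalar code when it is missing or cannot be linked. This is only an assumption about the general shape, not a description of the existing mmap code. `DotProductProvider`, `ScalarProvider`, `ProviderLookup`, and the `VectorizedProvider<N>` naming scheme are all hypothetical:

```java
/** Hypothetical provider interface; one implementation per supported JDK version. */
interface DotProductProvider {
  float dot(float[] a, float[] b);
}

/** Scalar fallback that needs no incubating API. */
final class ScalarProvider implements DotProductProvider {
  @Override
  public float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }
}

final class ProviderLookup {
  /** Tries a class compiled for the running JDK's feature version, else falls back. */
  static DotProductProvider load() {
    int feature = Runtime.version().feature(); // e.g. 20 for JDK 20
    // Hypothetical naming scheme: one class per JDK feature version.
    String className = "VectorizedProvider" + feature;
    try {
      Class<?> cls = Class.forName(className);
      return (DotProductProvider) cls.getDeclaredConstructor().newInstance();
    } catch (ReflectiveOperationException | LinkageError e) {
      // Class missing for this JDK, or the incubating module was not enabled.
      return new ScalarProvider();
    }
  }
}
```

   Resolving the class reflectively means the core code never links against the incubating api directly, so it still loads on a jdk where that api is absent or different.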
   
   We don't want to have base64-encoded hackiness that is hard to maintain. At 
the same time, we need to give users the option to opt in to actually making 
use of their hardware. I think we should suffer the complexity to make this 
easy on them. It fucking sucks that openjdk makes this almost impossible, but 
we need to do it for our users. That's what being a library is all about.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
