rmuir opened a new issue, #12302: URL: https://github.com/apache/lucene/issues/12302
### Description

For years we have explored using the vector api to actually take advantage of SIMD units on the hardware. A couple of approaches have been attempted so far:

1. Try to coerce the hotspot superword autovectorization into giving better results: I think @jpountz may know details of this. It is currently limited to postings list decode, attempting to use 64-bit long. It is better than nothing, but not good enough, especially for stuff like vectors encoding and other bottlenecks.
2. Vectorize code and hope that the openjdk feature will "graduate" soon. This is not happening. Java is dying and becoming the next COBOL, in my opinion, as a result of ignoring the problem and delaying until "perfection". For evidence, simply look at vector api being in "6th incubation" with really no api changes happening, just waiting on other features (some wet dream project valhalla or whatever) which will prolly never land before we retire.
3. hackedy-häcks: this is stuff to bypass the problem, such as the prototype I wrote in frustration *two years ago* here: https://github.com/apache/lucene/pull/18 . It is dirty and insecure but it demonstrates we can potentially make stuff easier on the user and take advantage of the hardware.

Currently, unless the user is extremely technical, they can't make use of the vector support their hardware has, which is terribly sad. If they have immense resources/funding/etc, they can fork lucene, patch the source code, and maintain that fork, hooking in incubator openjdk stuff, but that's too hard on users.

I think we have to draw a line in the sand: basically we cannot rely upon openjdk to be managed as a performant project, their decisions make no sense, so we have to play a little less nice and apply some hacks! Otherwise give up and switch to a different programming language with better perf!
So I'd suggest to look at the work @uschindler has done with mmap and the preview apis, and let's carve out a path where we use the vector api *IFF* the user opts in via the command-line.

Proposal (depends entirely upon user's jdk version and supported flags):

1. User does nothing and runs lucene without special flags: they get a warning message logged to the console (once!) telling them they need to add some stuff to the command line for best performance. Something such as "vector falling back to scalar implementation: please add --add-modules .x.y.z...."
2. User supplies that command-line argument and lucene is faster and uses the correct incubating vector api associated with their jdk version.

Actually the system @uschindler developed I think is the correct design for this, the only trick is that the incubating api is more difficult than the preview api. So we need more build system support, and it could require more stuff to be downloaded or the build to be slower. But I think it's the right decision? We don't want to have base64-encoded hackiness that is hard to maintain; at the same time, we need to give the users the option to opt in to actually making use of their hardware. I think we should suffer the complexity to make this easy on them. It fucking sucks that openjdk makes this almost impossible, but we need to do it for our users. That's what being a library is all about.
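
The two-step proposal above (warn once and fall back to scalar, or use the vectorized path when the user opts in) could be sketched roughly as follows. This is only an illustration, not Lucene code: `VectorOptInSketch`, `DotProduct`, `ScalarDotProduct`, and `lookup` are hypothetical names, and the sketch assumes the opt-in is detected by checking whether the JDK's incubating `jdk.incubator.vector` module was resolved at startup. The vectorized implementation itself is left as a supplier, since it would have to be compiled separately against the incubating api.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Hypothetical sketch of opt-in vectorization with a scalar fallback; not Lucene code. */
final class VectorOptInSketch {
  // Assumption: the opt-in is detected via the JDK's incubating Vector API module.
  static final String VECTOR_MODULE = "jdk.incubator.vector";

  private static final Logger LOG = Logger.getLogger(VectorOptInSketch.class.getName());

  // Flips to true the first time the fallback warning is logged.
  static final AtomicBoolean WARNED = new AtomicBoolean(false);

  /** Hypothetical abstraction over one hot loop. */
  interface DotProduct {
    float dot(float[] a, float[] b);
  }

  /** Plain Java loop, used when the user has not opted in. */
  static final class ScalarDotProduct implements DotProduct {
    @Override
    public float dot(float[] a, float[] b) {
      if (a.length != b.length) {
        throw new IllegalArgumentException("length mismatch: " + a.length + " != " + b.length);
      }
      float sum = 0f;
      for (int i = 0; i < a.length; i++) {
        sum += a[i] * b[i];
      }
      return sum;
    }
  }

  /** True only if the incubator module was resolved into the boot layer at startup. */
  static boolean vectorModuleEnabled() {
    return ModuleLayer.boot().findModule(VECTOR_MODULE).isPresent();
  }

  /** Returns the vectorized implementation if the user opted in, else warns once and falls back. */
  static DotProduct lookup(Supplier<DotProduct> vectorized) {
    if (vectorModuleEnabled()) {
      return vectorized.get();
    }
    if (WARNED.compareAndSet(false, true)) {
      LOG.warning("vector falling back to scalar implementation: please add --add-modules "
          + VECTOR_MODULE);
    }
    return new ScalarDotProduct();
  }

  public static void main(String[] args) {
    // Placeholder supplier: a real build would hand in a class compiled against the incubating api.
    Supplier<DotProduct> vectorized = ScalarDotProduct::new;
    DotProduct impl = lookup(vectorized);
    lookup(vectorized); // a second lookup does not log the warning again
    System.out.println(impl.dot(new float[] {1f, 2f, 3f}, new float[] {4f, 5f, 6f})); // prints 32.0
  }
}
```

Keeping the vectorized class behind a supplier means the scalar path never touches the incubating api, so the sketch compiles and runs without any special flags; how the vectorized class gets built per jdk version is the build-system question raised above.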