[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250432743 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -pac

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250432743 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -pac

[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250445083 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -packa

[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250445083 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -packa

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617641279 I added a small refactoring for the tests: I added a base class which does the Panama Check in BeforeClass. All subclasses have access to Panama and default impl (if available). If Pa

[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250504901 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -packa

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250507755 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -pac

[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250509250 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/BaseVectorizationTestCase.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Found

[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250512698 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/BaseVectorizationTestCase.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundat

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617677720 P.S.: I checked: ForUtil in 9.x is different than in 8.x because the byte order and some other slight changes. So I would only vectorize the core one and leave out the backwards compa

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617678206 P.S.: I checked: ForUtil in 9.x is different than in 8.x because the byte order and some other slight changes. So I would only vectorize the core one (9.x) and leave out the back

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617715841 I will merge this once CI is happy. I only changed some messages and added a test for the caller sensitive check (like `TestTestSecrets`). -- This is an automated message from the A

[GitHub] [lucene] jpountz commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
jpountz commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617720876 What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to today, b

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617730474 > What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to today,

[GitHub] [lucene] uschindler merged pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler merged PR #12410: URL: https://github.com/apache/lucene/pull/12410 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617742564 Thanks @ChrisHegarty ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific com

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617754157 The framework to plug in in custom formats into `org.apache.lucene.internal.vectorization` was merged, so stage is open to implement something. Hints: - Both the scalar

[GitHub] [lucene] ChrisHegarty commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
ChrisHegarty commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617801212 ++ Let's use the format that vectorizes the best. And yes, we will need a scalar implementation of that too, but to get started let's assume that the format changeable. -- T

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617805484 > What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to tod

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617809930 I am also not sure if we really need to unroll all that loops and have different impls for each size. -- This is an automated message from the Apache Git Service. To respond to

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617817314 One small suggestion: Please stay with classical switch statement, as backporting the python code and java code to Java 11 is getting harder (please remember: also the vectorized

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617844821 Thank you for your suggestion! I will take note of it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

[GitHub] [lucene] mikemccand commented on pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-03 Thread via GitHub
mikemccand commented on PR #12409: URL: https://github.com/apache/lucene/pull/12409#issuecomment-1618101728 Whoa, thanks for tackling this @stefanvodita -- I'll try to review soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitH

[GitHub] [lucene] mikemccand commented on pull request #12408: Initialize facet counting data structures lazily

2023-07-03 Thread via GitHub
mikemccand commented on PR #12408: URL: https://github.com/apache/lucene/pull/12408#issuecomment-1618109228 +1, this makes sense to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specifi

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618213152 I believe that the Lucene 5.0 format may not be appropriate due to its rounding up of bitsPerValue. For example, it uses 8 bits to compress a 3-bit value, resulting in larger index

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618230924 Hi, I just noticed that the forUtil.gradle had a bug in Lucene 9. It does not regenerate anything because the actual python script is never executed. I will povide a separa

[GitHub] [lucene] uschindler opened a new pull request, #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script

2023-07-03 Thread via GitHub
uschindler opened a new pull request, #12411: URL: https://github.com/apache/lucene/pull/12411 When we ported the build to gradle, the forUtil generator script was never executed, because the script name was missing as argument. Actually the script was also buggy, because it produced type e

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618261123 Here is the bugfix for the generator. This bug caused a lot trouble here because I did not understand why the script wasn't executed at all. -- This is an automated message fro

[GitHub] [lucene] uschindler commented on pull request #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script

2023-07-03 Thread via GitHub
uschindler commented on PR #12411: URL: https://github.com/apache/lucene/pull/12411#issuecomment-1618415341 > Hmm... was it like this since the very beginning?! Darn. Strange. Thanks, Uwe. I checked history of that file, no change since the beginning. -- This is an automated messag

[GitHub] [lucene] uschindler merged pull request #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script

2023-07-03 Thread via GitHub
uschindler merged PR #12411: URL: https://github.com/apache/lucene/pull/12411 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618548308 Hi, the bug in the generator is fixed. I was able to include the ForUtil generator into a local branch and made everything working. What I am wondering about: The genera

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618577928 > What I am wondering about: The generated PanamaForUtil is immensely huge (258 KiB) and it looks only encoding is vectorized, while decoding looks like an very old version of the c

[GitHub] [lucene] uschindler opened a new pull request, #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler opened a new pull request, #12412: URL: https://github.com/apache/lucene/pull/12412 DRAFT: This uses code from #12396 to vectorize ForUtil for the 9.0 codec (only encode!). It should show as example how it works, this is not ready for productions, although all tests pass. -- T

[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618683855 Here is the example what I did with your code: #12412 It should demonstrate how I inlcuded it. Most interesting is the cleanup inside the lucene90 codec. I also defined the

[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618687032 Important: I don't intend to merge this, it just shows WIP. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use th

[GitHub] [lucene] uschindler commented on a diff in pull request #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler commented on code in PR #12412: URL: https://github.com/apache/lucene/pull/12412#discussion_r1251052189 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -162,11 +165,8 @@ private static boolean isClientVM() { }

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618743022 I figured out that the caller sensitive checking how it wa simplemented does not work with package private classes, so I plan to fix this in a followup commit: ```java pri

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618745499 This is fine as Maven Shade plugin and similar will rewrite constants with class names automatically, so theres no need to fallback to `LDC` bytecode. -- This is an automated messag

[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene

2023-07-03 Thread via GitHub
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618761750 Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubs

[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618779454 What's also missing in the code: Some randomized test that feeds both implementations with random data and verifies that both results are identical. -- This is an automated message

[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized

2023-07-03 Thread via GitHub
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618783707 Thank you so much! I will take a look at this PR tomorrow. It's getting late for me now, so I need to go to sleep. 😊 -- This is an automated message from the Apache Git Service. T

[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618784994 to run this branch with vectorization enabled: ```sh $ export CI=true $ export RUNTIME_JAVA_HOME=/path/to/jdk20or21 $ gradlew :lucene:core:test ``` -- This is an a

[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618786477 @tang-hi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[GitHub] [lucene] ChrisHegarty commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)

2023-07-03 Thread via GitHub
ChrisHegarty commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618799648 Thanks for putting this together @uschindler. I'm going to move my experiments over to a clone of this branch. -- This is an automated message from the Apache Git Service. To

[GitHub] [lucene] mikemccand commented on a diff in pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)

2023-07-03 Thread via GitHub
mikemccand commented on code in PR #12409: URL: https://github.com/apache/lucene/pull/12409#discussion_r1251132208 ## lucene/core/src/java/org/apache/lucene/util/IntBlockPool.java: ## @@ -156,206 +155,6 @@ public void nextBuffer() { bufferUpto++; intUpto = 0; -in

[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor

2023-07-03 Thread via GitHub
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1618983956 @romseygeek @dweiss I'd love to get feedback from you on the subject -- This is an automated message from the Apache Git Service. To respond to the message, please log on to Git

[GitHub] [lucene] benwtrent opened a new pull request, #12413: Fix HNSW graph visitation limit bug

2023-07-03 Thread via GitHub
benwtrent opened a new pull request, #12413: URL: https://github.com/apache/lucene/pull/12413 We have some weird behavior in HNSW searcher when finding the candidate entry point for the zeroth layer. While trying to find the best entry point to gather the full candidate sets, we don