[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250432743 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: I'm ok with "vectorization". Yet another alternative ;-) Another thought I had was to actually leave the default scalar implementation where it was. Then the provider would be a provider of vectorized implementations, Something like, `VectorizedProvider::getInstanceOrNull`. That way the new package consists of just Panama code and the light-weight interfaces (no scalar code). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250432743 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: I'm ok with "vectorization". Yet another alternative ;-) Leave the default scalar implementation where it was. Then the provider would be a provider of vectorized implementations, Something like, `VectorizedProvider::getInstanceOrNull`. That way the new package consists of just Panama code and the light-weight interfaces (no scalar code). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250445083 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: I thought about the same. I was not too happy about it because it moves a lot if/else code back to the original package. Because you need to implement the scalar code anyways as interface. It also makes testing hard because I am at moment working for a BaseVectorizationTest case that allows to write compare impls like `TestVectorUtilSupport`. This only works in a generic way if the code is easy to locate and uses same interfaces. I think keeping the code that should do the same together is preferable (at least to me). From what I figured out: Most of the code affected is in java.util anyways. The special case with ForUtil is different. At moment I am not fully sure if the code really differs in Lucene 5, Lucene 8 and Lucene 9 codecs and ForUtil can move to utils package? About the name: I leave the name for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250445083 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: I thought about the same. I was not too happy about it because it moves a lot if/else code back to the original package. Because you need to implement the scalar code anyways as interface. It also makes testing hard because I am at moment working for a BaseVectorizationTest case that allows to write compare impls like `TestVectorUtilSupport`. This only works in a generic way if the code is easy to locate and uses same interfaces. I think keeping the code that should do the same together is preferable (at least to me). From what I figured out: Most of the code affected is in `oal.util` anyways. The special case with ForUtil is different. At moment I am not fully sure if the code really differs in Lucene 5, Lucene 8 and Lucene 9 codecs and ForUtil can move to utils package? About the name: I leave the name for now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617641279 I added a small refactoring for the tests: I added a base class which does the Panama Check in BeforeClass. All subclasses have access to Panama and default impl (if available). If Panama is not available it will prevent test from executing. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250504901 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: One addition: By returning null or similar for the provider we miss some flexibility: Based on Panama's options we can opt in / opt out to implement some methods. With the current approach we can fall back to the default implementation only for some interface or only for one method, if Panama does not have a specific feature enabled (or not in that version). The current VectorUtilSupport for integer vectors already opts out if AVX2 is not available. Currently it is an if statment, but it could also call `DefaultVectorUtilSupport#dotProduct()` if we have no AVX2. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250507755 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -15,79 +15,97 @@ * limitations under the License. */ -package org.apache.lucene.util; +package org.apache.lucene.internal.vectorization; import java.lang.Runtime.Version; +import java.lang.StackWalker.StackFrame; import java.lang.invoke.MethodHandles; import java.lang.invoke.MethodType; import java.security.AccessController; import java.security.PrivilegedAction; import java.util.Locale; import java.util.Objects; import java.util.logging.Logger; +import java.util.stream.Stream; +import org.apache.lucene.util.SuppressForbidden; +import org.apache.lucene.util.VectorUtil; + +/** + * A provider of vectorization implementations. Depending on the Java version and availability of + * vectorization modules in the Java runtime this class provides optimized implementations (using + * SIMD) of several algorithms used throughout Apache Lucene. + * + * @lucene.internal + */ +public abstract class VectorizationProvider { Review Comment: Thanks for the detailed reply. I agree with the current approach and naming. Let's go for it! :-) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12410: Refactor vectorization support in Lucene
ChrisHegarty commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250509250 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/BaseVectorizationTestCase.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.BeforeClass; + +public abstract class BaseVectorizationTestCase extends LuceneTestCase { + + protected static final VectorizationProvider LUCENE_PROVIDER = new DefaultVectorizationProvider(); + protected static final VectorizationProvider PANAMA_PROVIDER = VectorizationProvider.lookup(true); + + @BeforeClass + public static void beforeClass() throws Exception { +assumeTrue( +"Test only works when JDK's vector incubator module is enabled.", +PANAMA_PROVIDER.getClass() != LUCENE_PROVIDER.getClass()); Review Comment: Noice! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12410: Refactor vectorization support in Lucene
uschindler commented on code in PR #12410: URL: https://github.com/apache/lucene/pull/12410#discussion_r1250512698 ## lucene/core/src/test/org/apache/lucene/internal/vectorization/BaseVectorizationTestCase.java: ## @@ -0,0 +1,33 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.internal.vectorization; + +import org.apache.lucene.tests.util.LuceneTestCase; +import org.junit.BeforeClass; + +public abstract class BaseVectorizationTestCase extends LuceneTestCase { + + protected static final VectorizationProvider LUCENE_PROVIDER = new DefaultVectorizationProvider(); + protected static final VectorizationProvider PANAMA_PROVIDER = VectorizationProvider.lookup(true); + + @BeforeClass + public static void beforeClass() throws Exception { +assumeTrue( +"Test only works when JDK's vector incubator module is enabled.", +PANAMA_PROVIDER.getClass() != LUCENE_PROVIDER.getClass()); Review Comment: Hehe, yes much better than before as it is not negated. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617677720 P.S.: I checked: ForUtil in 9.x is different than in 8.x because the byte order and some other slight changes. So I would only vectorize the core one and leave out the backwards compatibility. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617678206 P.S.: I checked: ForUtil in 9.x is different than in 8.x because the byte order and some other slight changes. So I would only vectorize the core one (9.x) and leave out the backwards compatibility impl. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617715841 I will merge this once CI is happy. I only changed some messages and added a test for the caller sensitive check (like `TestTestSecrets`). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on issue #12396: Make ForUtil Vectorized
jpountz commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617720876 What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to today, but users who care about performance would most likely be able to do that? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617730474 > What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to today, but users who care about performance would most likely be able to do that? Perhaps I could attempt to implement both scalar and vectorized versions of different compression formats in order to assess the extent of performance degradation in the scalar version and performance enhancement in the vectorized version. We could then make a decision based on the results of the experiment. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #12410: Refactor vectorization support in Lucene
uschindler merged PR #12410: URL: https://github.com/apache/lucene/pull/12410 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1617742564 Thanks @ChrisHegarty ! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617754157 The framework to plug in in custom formats into `org.apache.lucene.internal.vectorization` was merged, so stage is open to implement something. Hints: - Both the scalar and the vectorized implementation need to be moved to that package and share a common interface. - A getter needs to be added to VectorizationProvider to return a singleton (if stateless) or an instance (if stateful like `ForUtil`, like `newForUtil98()` for the Lucene 9.8 codec). - The calling class that gets the provider needs to be added for caller sensitive checks to prevent misuse. - A test comparing both impementations with random data extending from BaseVectorizationTestCase. I can mock it up for ForUtil (looks simple, just a bit copy-paste and defining interface) and replacing `new ForUtil()` by `VectorizationProvider.getInstance().newForUtil98()`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on issue #12396: Make ForUtil Vectorized
ChrisHegarty commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617801212 ++ Let's use the format that vectorizes the best. And yes, we will need a scalar implementation of that too, but to get started let's assume that the format changeable. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617805484 > What about changing the format now, as part of this change? This would mean that users who do not enable the Parama vector API would get a bit worse performance compared to today, but users who care about performance would most likely be able to do that? We could start and go back to the original Lucene 5.0 format. Plain simple... It is still in backwards-codecs. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617809930 I am also not sure if we really need to unroll all that loops and have different impls for each size. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617817314 One small suggestion: Please stay with classical switch statement, as backporting the python code and java code to Java 11 is getting harder (please remember: also the vectorized code is compiled with Java 11 compiler in 9.x branch!!!). I just noticed that in @tang-hi's code. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1617844821 Thank you for your suggestion! I will take note of it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)
mikemccand commented on PR #12409: URL: https://github.com/apache/lucene/pull/12409#issuecomment-1618101728 Whoa, thanks for tackling this @stefanvodita -- I'll try to review soon. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #12408: Initialize facet counting data structures lazily
mikemccand commented on PR #12408: URL: https://github.com/apache/lucene/pull/12408#issuecomment-1618109228 +1, this makes sense to me. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618213152 I believe that the Lucene 5.0 format may not be appropriate due to its rounding up of bitsPerValue. For example, it uses 8 bits to compress a 3-bit value, resulting in larger index files. However, I have already implemented a vectorized version of various compression formats that maintain the same size. The outcome appears promising | Benchmark | Type | Iterations | Mean ops/us | Error | | --- | --- | --- | --- | --- | | Encode1 | thrpt | 15 | 37.023 | 5.060 | | VectorizedEncode1 | thrpt | 15 | 53.261 | 0.850 | | Encode2 | thrpt | 15 | 45.017 | 0.887 | | VectorizedEncode2 | thrpt | 15 | 56.111 | 2.234 | | Encode3 | thrpt | 15 | 38.750 | 0.521 | | VectorizedEncode3 | thrpt | 15 | 49.589 | 2.518 | | Encode4 | thrpt | 15 | 48.074 | 1.390 | | VectorizedEncode4 | thrpt | 15 | 55.654 | 2.580 | | Encode5 | thrpt | 15 | 35.517 | 0.815 | | VectorizedEncode5 | thrpt | 15 | 50.508 | 0.887 | | Encode6 | thrpt | 15 | 38.698 | 0.301 | | VectorizedEncode6 | thrpt | 15 | 48.876 | 0.926 | | Encode7 | thrpt | 15 | 36.697 | 0.942 | | VectorizedEncode7 | thrpt | 15 | 50.016 | 2.425 | | Encode8 | thrpt | 15 | 59.180 | 2.208 | | VectorizedEncode8 | thrpt | 15 | 54.895 | 0.497 | | Encode9 | thrpt | 15 | 24.215 | 0.670 | | VectorizedEncode9 | thrpt | 15 | 49.292 | 0.616 | | Encode10 | thrpt | 15 | 25.509 | 0.274 | | VectorizedEncode10 | thrpt | 15 | 46.777 | 0.762 | | Encode11 | thrpt | 15 | 25.165 | 0.635 | | VectorizedEncode11 | thrpt | 15 | 46.798 | 2.554 | | Encode12 | thrpt | 15 | 29.170 | 0.671 | | VectorizedEncode12 | thrpt | 15 | 47.331 | 0.994 | | Encode13 | thrpt | 15 | 23.749 | 1.126 | | VectorizedEncode13 | thrpt | 15 | 46.587 | 2.468 | | Encode14 | thrpt | 15 | 27.283 | 0.235 | | VectorizedEncode14 | thrpt | 15 | 44.704 | 0.805 | | Encode15 | thrpt | 15 | 27.459 | 1.035 | | VectorizedEncode15 | thrpt | 15 | 45.335 | 3.178 | | Encode16 | thrpt | 15 | 58.192 | 0.557 | | VectorizedEncode16 | thrpt | 15 | 52.698 | 0.918 | | Encode17 | thrpt | 15 | 16.265 | 0.168 | | VectorizedEncode17 | thrpt | 15 | 45.757 | 2.126 | | Encode18 | thrpt | 15 | 15.261 | 0.167 | | VectorizedEncode18 | thrpt | 15 | 44.386 | 0.807 | | Encode19 | thrpt | 15 | 12.531 | 0.138 | | VectorizedEncode19 | thrpt | 15 | 45.403 | 0.854 | | Encode20 | thrpt | 15 | 15.863 | 0.351 | | VectorizedEncode20 | thrpt | 15 | 42.607 | 3.698 | | Encode21 | thrpt | 15 | 15.772 | 0.154 | | VectorizedEncode21 | thrpt | 15 | 45.122 | 0.777 | | Encode22 | thrpt | 15 | 15.863 | 0.210 | | VectorizedEncode22 | thrpt | 15 | 42.802 | 1.240 | | Encode23 | thrpt | 15 | 15.638 | 0.095 | | VectorizedEncode23 | thrpt | 15 | 44.411 | 0.536 | | Encode24 | thrpt | 15 | 17.091 | 0.713 | | VectorizedEncode24 | thrpt | 15 | 42.151 | 2.151 | | Encode25 | thrpt | 15 | 15.206 | 0.163 | | VectorizedEncode25 | thrpt | 15 | 43.440 | 2.078 | | Encode26 | thrpt | 15 | 15.110 | 0.188 | | VectorizedEncode26 | thrpt | 15 | 40.758 | 0.416 | | Encode27 | thrpt | 15 | 14.794 | 0.192 | | VectorizedEncode27 | thrpt | 15 | 43.261 | 0.494 | | Encode28 | thrpt | 15 | 17.531 | 0.393 | | VectorizedEncode28 | thrpt | 15 | 41.578 | 0.838 | | Encode29 | thrpt | 15 | 14.423 | 0.173 | | VectorizedEncode29 | thrpt | 15 | 36.044 | 10.191 | | Encode30 | thrpt | 15 | 17.426 | 0.297 | | VectorizedEncode30 | thrpt | 15 | 40.087 | 0.791 | | Encode31 | thrpt | 15 | 18.489 | 0.180 | | VectorizedEncode31 | thrpt | 15 | 42.166 | 0.625 | | Encode32 | thrpt | 15 | 47.742 | 4.446 | | VectorizedEncode32 | thrpt | 15 | 54.260 | 1.806 | the code is straightforward, as shown below ```Java public void encode(long[] values, int bitsPerValue, long[] output) { int MASK = (int) ((1L << bitsPerValue) - 1); int bitsRemaining = 64; int upto = 0; int totalCompressedLine = 2 * bitsPerValue; int next = 0; LongVector input = LongVector.zero(LANE4_SPECIES); while (next < 128) { if (bitsRemaining >= bitsPerValue) { input = input.or(LongVector.fromArray(LANE4_SPECIES, values, next).and(MASK) .lanewise(VectorOperators.LSHL, bitsRemaining - bitsPerValue)); bitsRemaining -= bitsPerValue; } else { LongVector valueVector = LongVector.fromArray(LANE4_SPECIES, values, next).and(MASK); input = input.or(valueVector.lanewise(VectorOperators.LSHR, bitsPerValue - bitsRemaining)); input.intoArray(output, upto); upto += numEncodeLength; input = valueVector.lanewise(VectorOperators.LSHL, 64 - bitsPerValue + bitsRemaining); bitsRemai
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618230924 Hi, I just noticed that the forUtil.gradle had a bug in Lucene 9. It does not regenerate anything because the actual python script is never executed. I will povide a separate fix in a minute. Interestingly the script does not execute correctly: ``` > Task :lucene:core:generateForUtilInternal FAILED Caching disabled for task ':lucene:core:generateForUtilInternal' because: Build cache is disabled Task ':lucene:core:generateForUtilInternal' is not up-to-date because: Task has failed previously. External tool 'python3' resolved to: C:/cygwin/bin/python3.6m.exe Starting process 'command 'C:/cygwin/bin/python3.6m.exe''. Working directory: C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\src\java\org\apache\lucene\codecs\lucene90 Command: C:/cygwin/bin/python3.6m.exe C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\src\java\org\apache\lucene\codecs\lucene90\gen_ForUtil.py Successfully started process 'command 'C:/cygwin/bin/python3.6m.exe'' Traceback (most recent call last): File "C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\src\java\org\apache\lucene\codecs\lucene90\gen_ForUtil.py", line 438, in writeDecode(i, f) File "C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\src\java\org\apache\lucene\codecs\lucene90\gen_ForUtil.py", line 362, in writeDecode writeRemainder(bpv, next_primitive, shift + bpv, o, 128/num_values_per_long - o, f) File "C:\Users\Uwe Schindler\Projects\lucene\lucene\lucene\core\src\java\org\apache\lucene\codecs\lucene90\gen_ForUtil.py", line 320, in writeRemainder for i in range(num_values): TypeError: 'float' object cannot be interpreted as an integer ``` Thats not good! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request, #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script
uschindler opened a new pull request, #12411: URL: https://github.com/apache/lucene/pull/12411 When we ported the build to gradle, the forUtil generator script was never executed, because the script name was missing as argument. Actually the script was also buggy, because it produced type error (with newer python versions) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618261123 Here is the bugfix for the generator. This bug caused a lot trouble here because I did not understand why the script wasn't executed at all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script
uschindler commented on PR #12411: URL: https://github.com/apache/lucene/pull/12411#issuecomment-1618415341 > Hmm... was it like this since the very beginning?! Darn. Strange. Thanks, Uwe. I checked history of that file, no change since the beginning. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #12411: Fix forUtil.gradle to actually execute python script and also fix type error in script
uschindler merged PR #12411: URL: https://github.com/apache/lucene/pull/12411 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618548308 Hi, the bug in the generator is fixed. I was able to include the ForUtil generator into a local branch and made everything working. What I am wondering about: The generated PanamaForUtil is immensely huge (258 KiB) and it looks only encoding is vectorized, while decoding looks like an very old version of the code, it explodes somehow - where did you get that code from? The vectorized encode seems to work, the indexes and tests are working. I also noticed a small problem with the new framework regarding package private caller classes. I will think about a good solution. Anywys, I will make the branch available as Draft PR so you can see how it was done. It was a bit more work because ForUtil was used at many places and each call to PForUtil needs a ForUtil instance. I refactored this and was able to make it work. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618577928 > What I am wondering about: The generated PanamaForUtil is immensely huge (258 KiB) and it looks only encoding is vectorized, while decoding looks like an very old version of the code, it explodes somehow - where did you get that code from? The vectorized encode seems to work, the indexes and tests are working. I have currently only provided a vectorized version for encoding, while keeping decoding unchanged. This is because I wanted to ensure that the encoding part is working before implementing a compressed version for decoding. However, based on @jpountz's response, I am now trying to use a more concise version (with a different compression format but the same space size). I will update the Python script with this version later. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler opened a new pull request, #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)
uschindler opened a new pull request, #12412: URL: https://github.com/apache/lucene/pull/12412 DRAFT: This uses code from #12396 to vectorize ForUtil for the 9.0 codec (only encode!). It should show as example how it works, this is not ready for productions, although all tests pass. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on issue #12396: Make ForUtil Vectorized
uschindler commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618683855 Here is the example what I did with your code: #12412 It should demonstrate how I inlcuded it. Most interesting is the cleanup inside the lucene90 codec. I also defined the interface very quicky, but actually I did not check that all is used there. I defined some of the constants in the interface (`BLOCK_SIZE`), too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618687032 Important: I don't intend to merge this, it just shows WIP. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12412: DRAFT: This vectorize ForUtil for the 9.0 codec (same format)
uschindler commented on code in PR #12412: URL: https://github.com/apache/lucene/pull/12412#discussion_r1251052189 ## lucene/core/src/java/org/apache/lucene/internal/vectorization/VectorizationProvider.java: ## @@ -162,11 +165,8 @@ private static boolean isClientVM() { } } - private static boolean isValidCaller(String cn) { -// add any class that is allowed to call getInstance() -// NOTE: the list here is lazy -return Stream.of(VectorUtil.class).map(Class::getName).anyMatch(cn::equals); - } + private static final Set VALID_CALLERS = + Set.of("org.apache.lucene.util.VectorUtil", "org.apache.lucene.codecs.lucene90.PForUtil"); Review Comment: This is the bug I found in my last PR: You can't refer to package private classes with `.class` operator. So I had to hardcode the full class names here. Maven Shaders will replace those names, so I see no problem with making them strings. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618743022 I figured out that the caller sensitive checking how it wa simplemented does not work with package private classes, so I plan to fix this in a followup commit: ```java private static final Set VALID_CALLERS = Set.of("org.apache.lucene.util.VectorUtil", "org.apache.lucene.codecs.lucene90.PForUtil"); private static void ensureCaller() { final boolean validCaller = StackWalker.getInstance() .walk( s -> s.skip(2) .limit(1) .map(StackFrame::getClassName) .allMatch(VALID_CALLERS::contains)); if (!validCaller) { throw new UnsupportedOperationException( "VectorizationProvider is internal and can only be used by known Lucene classes."); } } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618745499 This is fine as Maven Shade plugin and similar will rewrite constants with class names automatically, so theres no need to fallback to `LDC` bytecode. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12410: Refactor vectorization support in Lucene
uschindler commented on PR #12410: URL: https://github.com/apache/lucene/pull/12410#issuecomment-1618761750 Fixed. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618779454 What's also missing in the code: Some randomized test that feeds both implementations with random data and verifies that both results are identical. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tang-hi commented on issue #12396: Make ForUtil Vectorized
tang-hi commented on issue #12396: URL: https://github.com/apache/lucene/issues/12396#issuecomment-1618783707 Thank you so much! I will take a look at this PR tomorrow. It's getting late for me now, so I need to go to sleep. 😊 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618784994 to run this branch with vectorization enabled: ```sh $ export CI=true $ export RUNTIME_JAVA_HOME=/path/to/jdk20or21 $ gradlew :lucene:core:test ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)
uschindler commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618786477 @tang-hi -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on pull request #12412: DRAFT: Vectorize ForUtil encoding for the 9.0 codec (same format)
ChrisHegarty commented on PR #12412: URL: https://github.com/apache/lucene/pull/12412#issuecomment-1618799648 Thanks for putting this together @uschindler. I'm going to move my experiments over to a clone of this branch. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on a diff in pull request #12409: Move sliced int buffer functionality to MemoryIndex (#11248)
mikemccand commented on code in PR #12409: URL: https://github.com/apache/lucene/pull/12409#discussion_r1251132208 ## lucene/core/src/java/org/apache/lucene/util/IntBlockPool.java: ## @@ -156,206 +155,6 @@ public void nextBuffer() { bufferUpto++; intUpto = 0; -intOffset += INT_BLOCK_SIZE; - } - - /** - * Creates a new int slice with the given starting size and returns the slices offset in the pool. - * - * @see SliceReader - */ - private int newSlice(final int size) { -if (intUpto > INT_BLOCK_SIZE - size) { - nextBuffer(); - assert assertSliceBuffer(buffer); -} - -final int upto = intUpto; -intUpto += size; -buffer[intUpto - 1] = 16; -return upto; - } - - private static boolean assertSliceBuffer(int[] buffer) { -int count = 0; -for (int i = 0; i < buffer.length; i++) { - count += buffer[i]; // for slices the buffer must only have 0 values -} -return count == 0; - } - - // no need to make this public unless we support different sizes - - /** - * An array holding the offset into the {@link IntBlockPool#LEVEL_SIZE_ARRAY} to quickly navigate - * to the next slice level. - */ - private static final int[] NEXT_LEVEL_ARRAY = {1, 2, 3, 4, 5, 6, 7, 8, 9, 9}; - - /** An array holding the level sizes for int slices. */ - private static final int[] LEVEL_SIZE_ARRAY = {2, 4, 8, 16, 16, 32, 32, 64, 64, 128}; - - /** The first level size for new slices */ - private static final int FIRST_LEVEL_SIZE = LEVEL_SIZE_ARRAY[0]; - - /** Allocates a new slice from the given offset */ - private int allocSlice(final int[] slice, final int sliceOffset) { -final int level = slice[sliceOffset] & 15; -final int newLevel = NEXT_LEVEL_ARRAY[level]; -final int newSize = LEVEL_SIZE_ARRAY[newLevel]; -// Maybe allocate another block -if (intUpto > INT_BLOCK_SIZE - newSize) { - nextBuffer(); - assert assertSliceBuffer(buffer); -} - -final int newUpto = intUpto; -final int offset = newUpto + intOffset; -intUpto += newSize; -// Write forwarding address at end of last slice: -slice[sliceOffset] = offset; - -// Write new level: -buffer[intUpto - 1] = 16 | newLevel; - -return newUpto; - } - - /** - * A {@link SliceWriter} that allows to write multiple integer slices into a given {@link - * IntBlockPool}. - * - * @see SliceReader - * @lucene.internal - */ - public static class SliceWriter { - -private int offset; -private final IntBlockPool pool; - -public SliceWriter(IntBlockPool pool) { - this.pool = pool; -} -/** */ -public void reset(int sliceOffset) { - this.offset = sliceOffset; -} - -/** Writes the given value into the slice and resizes the slice if needed */ -public void writeInt(int value) { - int[] ints = pool.buffers[offset >> INT_BLOCK_SHIFT]; - assert ints != null; - int relativeOffset = offset & INT_BLOCK_MASK; - if (ints[relativeOffset] != 0) { -// End of slice; allocate a new one -relativeOffset = pool.allocSlice(ints, relativeOffset); -ints = pool.buffer; -offset = relativeOffset + pool.intOffset; - } - ints[relativeOffset] = value; - offset++; -} - -/** - * starts a new slice and returns the start offset. The returned value should be used as the - * start offset to initialize a {@link SliceReader}. - */ -public int startNewSlice() { - return offset = pool.newSlice(FIRST_LEVEL_SIZE) + pool.intOffset; -} - -/** - * Returns the offset of the currently written slice. The returned value should be used as the - * end offset to initialize a {@link SliceReader} once this slice is fully written or to reset - * the this writer if another slice needs to be written. - */ -public int getCurrentOffset() { - return offset; -} - } - - /** - * A {@link SliceReader} that can read int slices written by a {@link SliceWriter} - * - * @lucene.internal - */ - public static final class SliceReader { - -private final IntBlockPool pool; -private int upto; -private int bufferUpto; -private int bufferOffset; -private int[] buffer; -private int limit; -private int level; -private int end; - -/** Creates a new {@link SliceReader} on the given pool */ -public SliceReader(IntBlockPool pool) { - this.pool = pool; -} - -/** Resets the reader to a slice give the slices absolute start and end offset in the pool */ -public void reset(int startOffset, int endOffset) { - bufferUpto = startOffset / INT_BLOCK_SIZE; - bufferOffset = bufferUpto * INT_BLOCK_SIZE; - this.end = endOffset; - level = 0; - - buffer = pool.buffers[bufferUpto]; - upto = startOffset & INT_BLOCK_MASK; - - final int firstSize = IntBlockPool.LEVEL_SIZE_ARRAY[0]; - if (startOffset + firstSize >= endOffset) { -
[GitHub] [lucene] almogtavor commented on issue #12406: Register nested queries (ToParentBlockJoinQuery) to Lucene Monitor
almogtavor commented on issue #12406: URL: https://github.com/apache/lucene/issues/12406#issuecomment-1618983956 @romseygeek @dweiss I'd love to get feedback from you on the subject -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent opened a new pull request, #12413: Fix HNSW graph visitation limit bug
benwtrent opened a new pull request, #12413: URL: https://github.com/apache/lucene/pull/12413 We have some weird behavior in HNSW searcher when finding the candidate entry point for the zeroth layer. While trying to find the best entry point to gather the full candidate sets, we don't filter based on the acceptableOrds bitset. Consequently, if we exist the search early (before hitting the zeroth layer), the results that are returned may contain documents NOT within that bitset. Luckily since the results are marked as incomplete, the `*VectorQuery` logic switches back to an exact scan and throws away the results. However, if any user called the leaf searcher directly, bypassing the query, they could run into this bug. I ran performance tests and there were no significant latency increases. There do seem to be observable latency decreases though at higher `maxConn` levels. I am getting slightly different recall. Usually better by 0.001, but worse by `0.001` on glove 100 with ``` nDoc fanout maxConn beamWidth 10 20 96 500 120 ``` so I am digging into why that may be. Any help there is appreciated. Data (lucene util knnPerf): ``` dim = 100 doc_vectors = constants.GLOVE_VECTOR_DOCS_FILE query_vectors = '%s/util/tasks/vector-task-100d.vec' % constants.BASE_DIR ``` Settings ran: ``` VALUES = { 'ndoc': (10,), 'maxConn': (32, 96), 'beamWidthIndex': (250, 500,), 'fanout': (20, 100,), } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org