[GitHub] [lucene] abhirkz commented on issue #9883: Add IndexSearcher Level Query Max Clause Count Limits [LUCENE-8839]
abhirkz commented on issue #9883: URL: https://github.com/apache/lucene/issues/9883#issuecomment-1586719249 Any update on this ? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586750531 Hi, so this looks like 2 different issues: - incorrect validation of query and indexed field contents. - some vectors cause NaN when the cosine is calculated. I am not sure how this can happen, because cosine should only be NaN when the argument of the square root is negative or NaN. To me this indeed looks like a rounding error. Maybe we should guard the cosine to not allow negative numbers. This can only happen for special cases, so maybe `Math.abs()` is fine to work around. Actually the cosine should never be outside `[-1 .. 1]`. I think instead of throwing an exception (which would be crazy to the end user we should make sure that due to rounding errors the argument of the cosine cannot get negative. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586774660 I think I know why it happens, in both providers we calculate the cosine like that and when applying the sqrt it gets negative for one of those reasons: https://github.com/apache/lucene/blob/ef35e6edf4d38d88603b6ca2540f53913c5d9c2d/lucene/core/src/java/org/apache/lucene/util/VectorUtilDefaultProvider.java#L105 - norm1 or norm2 is negative - we don't cast both factors to double before multiplication, so it overflows the exponent of the float The other similarity functions are not affected, but we can possibly add some guard her by casting both floats to double before multiplying them. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1586803881 I debugged through it: The vector causes `norm1` and `norm2`, as well as `sum` to get `Infinity`. `Infinity/Infinity` results in `NaN`. So it is not caused by sqrt. In general you are right this might happen with any vector if the exponent overflows while summing up the component squares. I am not sure how to work around that. Maybe it should return 1, if the sqrt gets infinity? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587138822 I think the related issue found with some vectors creating an NaN cosine (happens when the floats are too large by exponent and the result gets infinity after multiplication) is a separate one. I think we should open an issue. This vector causes the assert to trigger: ``` public void testCosineNaN() { final float[] v = new float[] { 1.E31f }; assertEquals(1f, VectorUtil.cosine(v, v), DELTA); } ``` I figured out that I did not check explicitely for "empty" vectors. I may add a test for that. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587147803 I think we could decide to also disallow such vectors. If the square of one its components gets infinity it should maybe also be rejected. What do you think? ```java float y= 1e31f * 1e31f; System.err.println(y); // prints "Infinity" ``` Maybe we change the infinite check to use square? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587159643 Actually with the current similarity methods we should make sure that for each vector component `v`, the following is true: `Math.isFinite(v * v * vector.length)`. This is quite easy to implement in the isFinite check, -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587183519 I modified the function like that, but have not yet committed: ```java /** * Checks if a float vector only has finite components and the square of its components multiplied by vector dimensions is finite. * * @param v bytes containing a vector * @return the vector for call-chaining * @throws IllegalArgumentException if any component of vector violates the contract */ public static float[] checkFinite(float[] v) { for (int i = 0; i < v.length; i++) { float f = v[i] * v[i] * v.length; if (!Float.isFinite(f)) { throw new IllegalArgumentException("non-finite or too large (with respect to dimensions) value at vector[" + i + "]=" + v[i]); } } return v; } ``` What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-158716 The above check should auto-vectorize in Hotspot, so the check during indexing/searching should be cheap. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
benwtrent commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226584452 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } + + @Override + public byte[] byteVectorVal(int doc) throws IOException { +byte[] bytesVector = null; +if (byteValues != null && exists(byteValues, doc)) { + bytesVector = byteValues.vectorValue(); +} +return bytesVector; Review Comment: This further shows that we should not do under the hood casting. Casting a `byte` encoded vector to float should mean we can then cast a `float` encoded as `byte` for the user. ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorValueSource.java: ## @@ -0,0 +1,99 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.List; +import java.util.Map; +import java.util.Objects; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; + +/** function that returns a constant vector value for every document. */ +public class KnnVectorValueSource extends ValueSource { Review Comment: I am not sure the need of this class at all. However, we should indicate its constant in the name. Additionally, we should separate out the `float[]
[GitHub] [lucene] benwtrent commented on pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
benwtrent commented on PR #12253: URL: https://github.com/apache/lucene/pull/12253#issuecomment-1587256870 @alessandrobenedetti , I see many `to discuss` messages, but no discussion clearly labeled in Lucene or the dev lists. My searching skills may be failing me. Could you link these discussions or at least highlight the reasoning for the decisions made and which decisions were made? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226632668 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: Yes, please do not do this! You cannot simply cast a byte[] to float without transforming its value in a valid way. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226634599 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: It should throw exception that the field type does not match the query vector. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] eliaporciani commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
eliaporciani commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226642488 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: Initially, we had the two implementations separated for byte and float vectors. This is the commit in which we changed the approach [cb4b77c9f72551cb014d155586e399365fbcba5d](https://github.com/apache/lucene/pull/12253/commits/cb4b77c9f72551cb014d155586e399365fbcba5d). We decided to make an extra step and check if it was possible to abstract over the vector encoding in order to provide a simpler API to the caller. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
benwtrent commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587319629 @uschindler I think this change (`v[i] * v[i] * v.length`) is getting complicated. It really makes me think about if we should do any complex infinite checking other than `isFinite(float)`. For example, this check would prevent vectors that may be valid for `squareDistance`. If a specific similarity has `NaN` issues, than a more complicated `isFinite` check should account for those. But, this then opens the door to how `KnnFloatVectorQuery` would even know about the similarity used, or how that could be checked at all. I would prefer us doing the simple float value validation for now and we open an issue to see if we can get a better API for checking similarity infinity checks (e.g. `squareDistance` vs `cosine`) and if we should have those at all. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] eliaporciani commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
eliaporciani commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226655236 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: Initially, we had separate implementations for byte and float vectors. The change in approach can be seen in the commit [cb4b77c9f72551cb014d155586e399365fbcba5d](https://github.com/apache/lucene/pull/12253/commits/cb4b77c9f72551cb014d155586e399365fbcba5d). We made the decision to explore the possibility of abstracting over the vector encoding to create a simpler API for the caller. There were two main reasons for pursuing this change: - Code duplication: Having separate implementations resulted in duplicate code for byte and float vectors. - Difficulty in using float[] and byte[] vectors interchangeably: The knn vectors, whether byte or float, essentially represented the same field. The difference lay in the underlying encoding and the precision used to store the vectors. The decision to use the byte vector format over the float vector format was primarily driven by non-functional aspects such as space occupancy and performance. The proposed approach strikes a good balance in terms of abstraction level for both types of vectors. If there are compelling reasons for not proceeding with this approach, we could consider reverting to the point where the two implementations were separate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
benwtrent commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226688194 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: > Code duplication: Having separate implementations resulted in duplicate code for byte and float vectors. This is a non-issue. A consistent API is way better than implementing "DRY". > Difficulty in using float[] and byte[] vectors interchangeably: They should not be used interchangeably. `float[]` should return `float[]`, `byte[]` should return `byte[]`. Users stored them that way, that is the way they should be retrieved. > The knn vectors, whether byte or float, essentially represented the same field This is making a big assumption on the user. I don't understand this jump in logic. They are not the same field nor the same field type. > The difference lay in the underlying encoding and the precision used to store the vectors This is not true. `byte[]` means you stored them as `int8` values. They COULD be quantized `float[]` values, or the user just has `byte[]` values. Linear quantization could be construed as a "precision" thing, but that is simplifying what quantization is, and how its implemented. As an argument against "just precision" users should NEVER attempt to store something like `float[]{1.22348f, 0.2384f}` as a `byte` encoded vector without some quantization process in between creating the vector and storing them. Simply truncating is bad, will cause errors, and is a surprising result. > The decision to use the byte vector format over the float vector format was primarily driven by non-functional aspects such as space occupancy and performance. This is partially true. Space and performance are important, but casting between `float` and `byte` breaks an otherwise strong contract with the API calls. Additionally, I would say that generally, there are other valid reasons for `int8` storage other than "Hey I quantized the output of some floats". What if the user just wanted to store `int8` because it worked best for their data? > If there are compelling reasons for not proceeding with this app
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587489458 That would be my plan! Let's open a new issue and discuss that there. So we should merge this PR for now. Before doing that I will only add the "vector size>0" check in the API, because that may be missing at some places. A zero length vector also causes havoc. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] eliaporciani commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
eliaporciani commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226796096 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: As mentioned earlier, I appreciate your input, and I would like to further explain the rationale behind our attempt to abstract over the two encodings. I understand your concerns regarding quantization, and I agree that in the majority of cases, casting from byte to float may not be meaningful. However, it's important to note that the single API we proposed does not restrict users from utilizing the API appropriately in the case of quantization. Regarding code duplication, I think that it can be problematic. Currently, we have two encodings, but there may be a need to introduce additional encodings in the future. Duplicating the same code for each encoding increases the likelihood of errors in future contributions. I want to emphasize that I do not hold a strong opinion about which approach is the best. I acknowledge that there are advantages and disadvantages to both options. I am happy to revert the last commits and provide separate implementations for byte and float vectors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] eliaporciani commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
eliaporciani commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226796096 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: As mentioned earlier, I appreciate your input, and I would like to further explain the rationale behind our attempt to abstract over the two encodings. I understand your concerns regarding quantization, and I agree that in the majority of cases, casting from byte to float may not be meaningful. However, it's important to note that the single API we proposed does not restrict users from utilizing the API appropriately in the case of quantization. Regarding code duplication, I think that it can be problematic. Currently, we have two encodings, but there may be a need to introduce additional encodings in the future. Duplicating the same code for each encoding increases the likelihood of errors in future contributions. I want to emphasize that I have not a strong opinion about which approach is the best. I acknowledge that there are advantages and disadvantages to both options. I am happy to revert the last commits and provide separate implementations for byte and float vectors. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] cfournie commented on a diff in pull request #12249: Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth
cfournie commented on code in PR #12249: URL: https://github.com/apache/lucene/pull/12249#discussion_r1226823890 ## lucene/core/src/test/org/apache/lucene/util/graph/TestGraphTokenStreamFiniteStrings.java: ## @@ -16,6 +16,9 @@ */ package org.apache.lucene.util.graph; +import static org.apache.lucene.util.automaton.Operations.MAX_RECURSION_LEVEL; Review Comment: Removed, thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] cfournie commented on a diff in pull request #12249: Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth
cfournie commented on code in PR #12249: URL: https://github.com/apache/lucene/pull/12249#discussion_r1226824178 ## lucene/CHANGES.txt: ## @@ -80,6 +80,8 @@ Bug Fixes * GITHUB#12220: Hunspell: disallow hidden title-case entries from compound middle/end +* LUCENE-10181: Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth. (Chris Fournier) Review Comment: Done; also rebased to pick up the latest changes in this file -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
benwtrent commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226828445 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: > I understand your concerns regarding quantization, and I agree that in the majority of cases, casting from byte to float may not be meaningful. However, it's important to note that the single API we proposed does not restrict users from utilizing the API appropriately in the case of quantization. API's by definition restrict users to the correct use case. I think auto-casting is a bad idea. It gives users the ability to shoot themselves in the foot at runtime. > Currently, we have two encodings, but there may be a need to introduce additional encodings in the future. Duplicating the same code for each encoding increases the likelihood of errors in future contributions. I say we cross that bridge when we get to it. There are ways to deduplicate the code while still giving the users a nice, safe, and correct API. > I want to emphasize that I have not a strong opinion about which approach is the best. I acknowledge that there are advantages and disadvantages to both options. We need to do a Float version and a Byte version. They can rely on similar code to cover your duplication concerns (as long as this doesn't break the design). But, we should have consistency with the existing API design and protect users from making easy mistakes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587551398 Hi, I added the dimension check to the constructor which uses a predefined field type. For the query it can't be done in the constructor, as we do not know the field type. The query will fail later, so an explicit check is not needed. I think this is ready to be merged. @jbellis we should open a separate issue to make sure that we find a solution for vectors where the float overflows (infinity) leading to infinite scores. I strongly disagree to use double math, we should maybe have some documentation checks like proposed above. We could also enforce it, but that's harder to decide! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226849047 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: I fully agree with @benwtrent. We also have separate queries for byte and float vectors, so we should not mix the types together in function queries. It wil really hurt users. I know Solr people tend to sometimes "try to change their schema", but this is a no-go for byte and float vectors as other queries will fail but this one suddenly works magically. No, we should not do this. Field types in Lucene were relaxed at beginning but we spent more and more time into making them more strict. We only have some autoconverts in docvalues (e.g., SortedDocValues can be queries and sorted using SortedSetDocValues which simplifies code - bt there is no surprises and loss or convert of information). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz commented on a diff in pull request #12249: Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth
jpountz commented on code in PR #12249: URL: https://github.com/apache/lucene/pull/12249#discussion_r1226846973 ## lucene/core/src/java/org/apache/lucene/util/graph/GraphTokenStreamFiniteStrings.java: ## @@ -45,6 +45,8 @@ * different paths of the {@link Automaton}. */ public final class GraphTokenStreamFiniteStrings { + /** Maximum level of recursion allowed in recursive operations. */ + public static final int MAX_RECURSION_LEVEL = 1000; Review Comment: let's make it private? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587577046 I talked with @mcimadamore in the meantime. At least for the Panama Foreign API (FFI) there are no changes planned anymore, so this is safe to merge into main branch. Keep in mind: It could always happen that there is a major API bug that requires a change, but this sounds very unlikely to me. There is still a PR to update FFI Javadocs, but the API is final. For the vector API (which will be a separate PR), I see no problem. The vector part is uncritical, as users have to "opt in". If the API changes till JDK 21 release, it won't hurt anybody. We can update the impl in a followup release so it works. @ChrisHegarty, are you fine with merging this now, so we can port the vector part? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] mikemccand commented on pull request #12357: Better paging when random reads go backwards
mikemccand commented on PR #12357: URL: https://github.com/apache/lucene/pull/12357#issuecomment-1587578171 `wikimedium10k` is a tiny corpus, really just for quick testing that your benchy is setup correctly. If you look at the QPS of each task they are ridiculously high :) In the future it's better to run on at least `wikimedium10M` or `wikimediumall`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jbellis commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
jbellis commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1587588219 SGTM, thank you for taking the lead on investigating the root cause of the NaNs! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on pull request #12360: Update comment about IndexOptions ordinals
msokolov commented on PR #12360: URL: https://github.com/apache/lucene/pull/12360#issuecomment-1587648171 It sounds to me as if order is no longer important. We wouldn't change the order because these are stored in the index, but if say someone was adding a new option for some reason, they shouldn't be encouraged to try to maintain a priority order? Just at it at the end right? I think you could entirely delete the comment? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
alessandrobenedetti commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226909040 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: > Yes, please do not do this! You cannot simply cast a byte[] to float without transforming its value in a valid way. It was not converting a byte[] to float, it was converting a byte (containing a number between -128 and +127) to a float: floatVector[i] = byteVector[i]; What would be the valid way of doing it? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
alessandrobenedetti commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226910532 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: thanks, everyone for the valuable feedback, @eliaporciani will revert the pull request to the commit where we had the working version with duplicated byte/float code in different classes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
alessandrobenedetti commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226910532 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: thanks @benwtrent and @uschindler for the valuable feedback, we discussed with @eliaporciani and he will revert the pull request to the commit where we had the working version with duplicated byte/float code in different classes, I'll do a review tomorrow when ready and I would love your opinion before merging. That's also the reason we kept this PR open so long before merging, additional feedback is always welcome! Thanks! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz merged pull request #12249: Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth
jpountz merged PR #12249: URL: https://github.com/apache/lucene/pull/12249 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jpountz closed issue #11218: GraphTokenStreamFiniteStrings#articulationPointsRecurse can run into stack overflows [LUCENE-10181]
jpountz closed issue #11218: GraphTokenStreamFiniteStrings#articulationPointsRecurse can run into stack overflows [LUCENE-10181] URL: https://github.com/apache/lucene/issues/11218 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] msokolov commented on issue #12358: Optimize `count()` for BooleanQuery disjunction
msokolov commented on issue #12358: URL: https://github.com/apache/lucene/issues/12358#issuecomment-1587661349 > No no. There is no specific capitalization :). I was correcting tantivity -> tantivy. Just to keep everyone accurate. the original misspelling was "tantitvy" not "tantivity"! :) > I checked the repository. When Tantitvy says it is 2 times faster ... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
benwtrent commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1226928216 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: @alessandrobenedetti I am not saying that `floatVector[i] = byteVector[i];` would return invalid data. I am saying there are API assumptions: - That if `float` can do this, then so can `byte`?? (which it cannot, seems weird from an API standpoint) - Users accidentally abusing `byte[]` with a float query (they forgot to quantize their float query?!?!?). Keeping these separate clarifies their usage and makes it easier for users (but I agree its more code for us all to write :) ) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on a diff in pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors
benwtrent commented on code in PR #12281: URL: https://github.com/apache/lucene/pull/12281#discussion_r1226928953 ## lucene/core/src/java/org/apache/lucene/document/KnnByteVectorField.java: ## @@ -137,7 +137,12 @@ public KnnByteVectorField(String name, byte[] vector, FieldType fieldType) { + " using byte[] but the field encoding is " + fieldType.vectorEncoding()); } -fieldsData = Objects.requireNonNull(vector, "vector value must not be null"); +Objects.requireNonNull(vector, "vector value must not be null"); +if (vector.length != fieldType.vectorDimension()) { Review Comment: Good catch! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation
sohami commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1587721480 @jpountz and @javanna Seems like there is plan of feature freeze by this Fri (06/16) for next release. I was really hoping if we can get it in the 9.x release and would appreciate your help on this issue and how to progress on it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] jmazanec15 commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores
jmazanec15 commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1587866962 @benwtrent Thanks for taking a look! Interesting, I am not too familiar with MBW. Ill take a look. The main reason I wanted to avoid returning the dot product was to avoid negative scores, as ref by https://github.com/apache/lucene/issues/9044. Also, what do you mean by scaling continuously? The above formula gives negative and positive scores the same number of possible scores, reducing overall precision by 2 (please correct me if I am wrong). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1227084420 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/KnnVectorFieldSource.java: ## @@ -0,0 +1,112 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.queries.function.valuesource; + +import java.io.IOException; +import java.util.Map; +import org.apache.lucene.index.ByteVectorValues; +import org.apache.lucene.index.FloatVectorValues; +import org.apache.lucene.index.LeafReaderContext; +import org.apache.lucene.queries.function.FunctionValues; +import org.apache.lucene.queries.function.ValueSource; +import org.apache.lucene.search.DocIdSetIterator; + +/** An implementation for retrieving {@link FunctionValues} instances for knn vectors fields. */ +public class KnnVectorFieldSource extends ValueSource { + private final String fieldName; + + public KnnVectorFieldSource(String fieldName) { +this.fieldName = fieldName; + } + + @Override + public FunctionValues getValues(Map context, LeafReaderContext readerContext) + throws IOException { + +final FloatVectorValues floatValues = readerContext.reader().getFloatVectorValues(fieldName); +final ByteVectorValues byteValues = readerContext.reader().getByteVectorValues(fieldName); + +if (floatValues == null && byteValues == null) { + throw new IllegalArgumentException( + "no vector value is indexed for field '" + fieldName + "'"); +} +return new FunctionValues() { + int lastDocID; + + @Override + public float[] floatVectorVal(int doc) throws IOException { +float[] floatVector = null; +if (floatValues != null && exists(floatValues, doc)) { + floatVector = floatValues.vectorValue(); +} else { + byte[] byteVector = byteVectorVal(doc); + if (byteVector != null) { +floatVector = new float[byteVector.length]; +for (int i = 0; i < byteVector.length; i++) { + floatVector[i] = byteVector[i]; +} + } +} +return floatVector; + } Review Comment: I think it should be possible to do it in same way like for field types and query: They both have an abstract pkg-private base class. Maybe thats a way here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores
benwtrent commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1587878704 > Also, what do you mean by scaling continuously? Your algorithm is piecewise vs. continuous. But, I am not sure how we could do a continuous transformation (everything is on the same scale). > I am not too familiar with MBW. Yeah, MBW and MAXSCORE get all mixed up in my brain. But, yes MAXSCORE is why we disallow negative scoring. Forgive my misdirection. > The above formula gives negative and positive scores the same number of possible scores, reducing overall precision by 2 My main concern overall is this: we are changing the scoring methodology period for positive scores (and thus considered "valid"). I think this means that it cannot go into a Lucene 9.x release (correct @jpountz ?). What do you think @msokolov ? Maybe we have a new `MAX_INNER_PRODUCT` scoring that uses @jmazanec15 suggestion? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler merged pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler merged PR #12294: URL: https://github.com/apache/lucene/pull/12294 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587934813 Hi @ChrisHegarty, i think we can start with a JDK 21 branch for Panama Vectors, the directories are there. Of course, you need to regenerate the apijars with vector classes (that's not included here). Backport to 9.x was done, so we can release this with Lucene 9.7. I will do a crosscheck shortly before release to verify that apijar is uptodate. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
uschindler commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587947849 I also removed a relic of the old filename of the apijar in the 9.x folder. It was not deleted while cherry-picking when we integrated vector API. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on pull request #12294: Implement MMapDirectory with Java 21 Project Panama Preview API
ChrisHegarty commented on PR #12294: URL: https://github.com/apache/lucene/pull/12294#issuecomment-1587973337 > Hi @ChrisHegarty, i think we can start with a JDK 21 branch for Panama Vectors, the directories are there. Of course, you need to regenerate the apijars with vector classes (that's not included here). I'll start on this, and hopefully put up a PR by tomorrow. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty opened a new pull request, #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
ChrisHegarty opened a new pull request, #12363: URL: https://github.com/apache/lucene/pull/12363 Port of the Java 20 version of this code to Java 21. * cut'n'paste VectorUtilPanamaProvider - there are opportunities to eventually remove some workarounds, but this is ok for now * Updated jdk21.apijar by `./gradlew :lucene:core:regenerate -Porg.gradle.java.installations.paths=/Users/chegar/binaries/jdk-21.jdk/Contents/Home/ ` ``` $ /Users/chegar/binaries/jdk-21.jdk/Contents/Home/bin/java -version openjdk version "21-ea" 2023-09-19 OpenJDK Runtime Environment (build 21-ea+26-2328) OpenJDK 64-Bit Server VM (build 21-ea+26-2328, mixed mode, sharing) ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
ChrisHegarty commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588061449 I verified this locally by running the tests : ``` JENKINS_XX=true ./gradlew :lucene:core:test --tests "org.apache.lucene.util.TestVectorUtil**" -Pvalidation.git.failOnModified=false --info ``` The JDK 21 version is picked up. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
ChrisHegarty commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227226190 ## lucene/CHANGES.txt: ## @@ -139,8 +139,8 @@ New Features * GITHUB#12257: Create OnHeapHnswGraphSearcher to let OnHeapHnswGraph to be searched in a thread-safety manner. (Patrick Zhai) * GITHUB#12302, GITHUB#12311: Add vectorized implementations of VectorUtil.dotProduct(), - squareDistance(), cosine() with Java 20 jdk.incubator.vector APIs. Applications started - with command line parameter "java --add-modules jdk.incubator.vector" on exactly Java 20 + squareDistance(), cosine() with Java 20 or 21 jdk.incubator.vector APIs. Applications started + with command line parameter "java --add-modules jdk.incubator.vector" on exactly Java 20 or 21 will automatically use the new vectorized implementations if running on a supported platform Review Comment: Since this was introduced and updated in the same minor release we can probably just update the wording here to cover 21 also (rather than a separate change log entry). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588081418 That was fast. ππ I will check tomorrow morning but this looks great. @rmuir wanted to run the benchmark with 21, too. Did you use latest openjdk 21-ea build to extract? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227231734 ## lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,493 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; +import java.util.logging.Logger; +import jdk.incubator.vector.ByteVector; +import jdk.incubator.vector.FloatVector; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.ShortVector; +import jdk.incubator.vector.Vector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorShape; +import jdk.incubator.vector.VectorSpecies; + +/** A VectorUtil provider implementation that leverages the Panama Vector API. */ +final class VectorUtilPanamaProvider implements VectorUtilProvider { + + private static final int INT_SPECIES_PREF_BIT_SIZE = IntVector.SPECIES_PREFERRED.vectorBitSize(); + + private static final VectorSpecies PREF_FLOAT_SPECIES = FloatVector.SPECIES_PREFERRED; + private static final VectorSpecies PREF_BYTE_SPECIES; + private static final VectorSpecies PREF_SHORT_SPECIES; + + /** + * x86 and less than 256-bit vectors. + * + * it could be that it has only AVX1 and integer vectors are fast. it could also be that it has + * no AVX and integer vectors are extremely slow. don't use integer vectors to avoid landmines. + */ + private static final boolean IS_AMD64_WITHOUT_AVX2 = + Constants.OS_ARCH.equals("amd64") && INT_SPECIES_PREF_BIT_SIZE < 256; + + static { +if (INT_SPECIES_PREF_BIT_SIZE >= 256) { + PREF_BYTE_SPECIES = + ByteVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 2)); + PREF_SHORT_SPECIES = + ShortVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 1)); +} else { + PREF_BYTE_SPECIES = null; + PREF_SHORT_SPECIES = null; +} + } + + // Extracted to a method to be able to apply the SuppressForbidden annotation + @SuppressWarnings("removal") + @SuppressForbidden(reason = "security manager") + private static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } + + VectorUtilPanamaProvider() { +if (INT_SPECIES_PREF_BIT_SIZE < 128) { + throw new UnsupportedOperationException( + "Vector bit size is less than 128: " + INT_SPECIES_PREF_BIT_SIZE); +} + +// hack to work around for JDK-8309727: Review Comment: We can remove this check if you are confident that your change makes it into the release. On the other hand this check does not hurt. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227232392 ## lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,493 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; +import java.util.logging.Logger; +import jdk.incubator.vector.ByteVector; +import jdk.incubator.vector.FloatVector; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.ShortVector; +import jdk.incubator.vector.Vector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorShape; +import jdk.incubator.vector.VectorSpecies; + +/** A VectorUtil provider implementation that leverages the Panama Vector API. */ +final class VectorUtilPanamaProvider implements VectorUtilProvider { + + private static final int INT_SPECIES_PREF_BIT_SIZE = IntVector.SPECIES_PREFERRED.vectorBitSize(); + + private static final VectorSpecies PREF_FLOAT_SPECIES = FloatVector.SPECIES_PREFERRED; + private static final VectorSpecies PREF_BYTE_SPECIES; + private static final VectorSpecies PREF_SHORT_SPECIES; + + /** + * x86 and less than 256-bit vectors. + * + * it could be that it has only AVX1 and integer vectors are fast. it could also be that it has + * no AVX and integer vectors are extremely slow. don't use integer vectors to avoid landmines. + */ + private static final boolean IS_AMD64_WITHOUT_AVX2 = + Constants.OS_ARCH.equals("amd64") && INT_SPECIES_PREF_BIT_SIZE < 256; + + static { +if (INT_SPECIES_PREF_BIT_SIZE >= 256) { + PREF_BYTE_SPECIES = + ByteVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 2)); + PREF_SHORT_SPECIES = + ShortVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 1)); +} else { + PREF_BYTE_SPECIES = null; + PREF_SHORT_SPECIES = null; +} + } + + // Extracted to a method to be able to apply the SuppressForbidden annotation + @SuppressWarnings("removal") + @SuppressForbidden(reason = "security manager") + private static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } + + VectorUtilPanamaProvider() { +if (INT_SPECIES_PREF_BIT_SIZE < 128) { + throw new UnsupportedOperationException( + "Vector bit size is less than 128: " + INT_SPECIES_PREF_BIT_SIZE); +} + +// hack to work around for JDK-8309727: Review Comment: If we remove, the doPrivileged forbidden wrapper can go away, too. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227235078 ## lucene/CHANGES.txt: ## @@ -139,8 +139,8 @@ New Features * GITHUB#12257: Create OnHeapHnswGraphSearcher to let OnHeapHnswGraph to be searched in a thread-safety manner. (Patrick Zhai) * GITHUB#12302, GITHUB#12311: Add vectorized implementations of VectorUtil.dotProduct(), Review Comment: Add this PR number here. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588094332 > That was fast. ππ > > I will check tomorrow morning but this looks great. @rmuir wanted to run the benchmark with 21, too. > > Did you use latest openjdk 21-ea build to extract? Oh you pasted it. Same like mine. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
ChrisHegarty commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227243997 ## lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,493 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; +import java.util.logging.Logger; +import jdk.incubator.vector.ByteVector; +import jdk.incubator.vector.FloatVector; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.ShortVector; +import jdk.incubator.vector.Vector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorShape; +import jdk.incubator.vector.VectorSpecies; + +/** A VectorUtil provider implementation that leverages the Panama Vector API. */ +final class VectorUtilPanamaProvider implements VectorUtilProvider { + + private static final int INT_SPECIES_PREF_BIT_SIZE = IntVector.SPECIES_PREFERRED.vectorBitSize(); + + private static final VectorSpecies PREF_FLOAT_SPECIES = FloatVector.SPECIES_PREFERRED; + private static final VectorSpecies PREF_BYTE_SPECIES; + private static final VectorSpecies PREF_SHORT_SPECIES; + + /** + * x86 and less than 256-bit vectors. + * + * it could be that it has only AVX1 and integer vectors are fast. it could also be that it has + * no AVX and integer vectors are extremely slow. don't use integer vectors to avoid landmines. + */ + private static final boolean IS_AMD64_WITHOUT_AVX2 = + Constants.OS_ARCH.equals("amd64") && INT_SPECIES_PREF_BIT_SIZE < 256; + + static { +if (INT_SPECIES_PREF_BIT_SIZE >= 256) { + PREF_BYTE_SPECIES = + ByteVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 2)); + PREF_SHORT_SPECIES = + ShortVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 1)); +} else { + PREF_BYTE_SPECIES = null; + PREF_SHORT_SPECIES = null; +} + } + + // Extracted to a method to be able to apply the SuppressForbidden annotation + @SuppressWarnings("removal") + @SuppressForbidden(reason = "security manager") + private static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } + + VectorUtilPanamaProvider() { +if (INT_SPECIES_PREF_BIT_SIZE < 128) { + throw new UnsupportedOperationException( + "Vector bit size is less than 128: " + INT_SPECIES_PREF_BIT_SIZE); +} + +// hack to work around for JDK-8309727: Review Comment: The JDK change was merged, itβll be in the next jdk 21 es build. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] tflobbe commented on pull request #12360: Update comment about IndexOptions ordinals
tflobbe commented on PR #12360: URL: https://github.com/apache/lucene/pull/12360#issuecomment-1588112203 But we do things like: > @Override public boolean hasFreqs() { return fieldInfo.getIndexOptions().compareTo(IndexOptions.DOCS_AND_FREQS) >= 0; } So we need to keep those ordinals in the right order, right? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on a diff in pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on code in PR #12363: URL: https://github.com/apache/lucene/pull/12363#discussion_r1227252561 ## lucene/core/src/java21/org/apache/lucene/util/VectorUtilPanamaProvider.java: ## @@ -0,0 +1,493 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.util; + +import java.security.AccessController; +import java.security.PrivilegedAction; +import java.util.logging.Logger; +import jdk.incubator.vector.ByteVector; +import jdk.incubator.vector.FloatVector; +import jdk.incubator.vector.IntVector; +import jdk.incubator.vector.ShortVector; +import jdk.incubator.vector.Vector; +import jdk.incubator.vector.VectorOperators; +import jdk.incubator.vector.VectorShape; +import jdk.incubator.vector.VectorSpecies; + +/** A VectorUtil provider implementation that leverages the Panama Vector API. */ +final class VectorUtilPanamaProvider implements VectorUtilProvider { + + private static final int INT_SPECIES_PREF_BIT_SIZE = IntVector.SPECIES_PREFERRED.vectorBitSize(); + + private static final VectorSpecies PREF_FLOAT_SPECIES = FloatVector.SPECIES_PREFERRED; + private static final VectorSpecies PREF_BYTE_SPECIES; + private static final VectorSpecies PREF_SHORT_SPECIES; + + /** + * x86 and less than 256-bit vectors. + * + * it could be that it has only AVX1 and integer vectors are fast. it could also be that it has + * no AVX and integer vectors are extremely slow. don't use integer vectors to avoid landmines. + */ + private static final boolean IS_AMD64_WITHOUT_AVX2 = + Constants.OS_ARCH.equals("amd64") && INT_SPECIES_PREF_BIT_SIZE < 256; + + static { +if (INT_SPECIES_PREF_BIT_SIZE >= 256) { + PREF_BYTE_SPECIES = + ByteVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 2)); + PREF_SHORT_SPECIES = + ShortVector.SPECIES_MAX.withShape( + VectorShape.forBitSize(IntVector.SPECIES_PREFERRED.vectorBitSize() >> 1)); +} else { + PREF_BYTE_SPECIES = null; + PREF_SHORT_SPECIES = null; +} + } + + // Extracted to a method to be able to apply the SuppressForbidden annotation + @SuppressWarnings("removal") + @SuppressForbidden(reason = "security manager") + private static T doPrivileged(PrivilegedAction action) { +return AccessController.doPrivileged(action); + } + + VectorUtilPanamaProvider() { +if (INT_SPECIES_PREF_BIT_SIZE < 128) { + throw new UnsupportedOperationException( + "Vector bit size is less than 128: " + INT_SPECIES_PREF_BIT_SIZE); +} + +// hack to work around for JDK-8309727: Review Comment: Great, so we can remove it for 21. It is not urgent! -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588134825 I tried it out: I reverted the APIJAR changes and only left in following changes: - VectorUtilsProvider.java to enable of 21 - `vectorIncubatorJavaVersions = [ JavaVersion.VERSION_20, JavaVersion.VERSION_21 ] as Set` Everything else was reverted and the tests worked fine. So to me it looks like we can spare the separate implementation as it is 100% identical. The hack does not hurt. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] ChrisHegarty commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
ChrisHegarty commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588172519 > I tried it out: I reverted the APIJAR changes and only left in following changes: > > * VectorUtilsProvider.java to enable of 21 > * `vectorIncubatorJavaVersions = [ JavaVersion.VERSION_20, JavaVersion.VERSION_21 ] as Set` > > Everything else was reverted and the tests worked fine. So to me it looks like we can spare the separate implementation as it is 100% identical. The hack does not hurt. Thatβs great - I had a similar thought. π -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588191838 Maybe just leave a readme file in the folder of the Java file stating that the impl is identical to Java 20. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] uschindler commented on pull request #12363: Implement VectorUtilProvider with Java 21 Project Pamana Vector API
uschindler commented on PR #12363: URL: https://github.com/apache/lucene/pull/12363#issuecomment-1588209511 I checked the commits: https://github.com/openjdk/jdk21/commits/master/src/jdk.incubator.vector/share/classes There were some changes, but nothing that affects us. It is mostly addition of VectorMask.XOR and improvementa in VectorShuffle. This aligns with the JEP: https://openjdk.org/jeps/448 And the Turkish locale bug is in the provider lookup. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 merged pull request #12324: Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance
gf2121 merged PR #12324: URL: https://github.com/apache/lucene/pull/12324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] gf2121 opened a new pull request, #12364: Speed up IndexedDISI Sparse #AdvanceExactWithinBlock for tiny step advance (Backport 9x)
gf2121 opened a new pull request, #12364: URL: https://github.com/apache/lucene/pull/12364 backport of https://github.com/apache/lucene/pull/12324 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org