[GitHub] [lucene] dweiss commented on pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes
dweiss commented on PR #11960: URL: https://github.com/apache/lucene/pull/11960#issuecomment-1323272071 Sorry if I lost track among all these patches - I thought there would be user facing changes as well. If there are none - no need to add anything. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[GitHub] [lucene] donnerpeter commented on pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes
donnerpeter commented on PR #11960: URL: https://github.com/apache/lucene/pull/11960#issuecomment-1323273222 There are some, but they're just API additions which should be backward-compatible.
[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323273394 I wonder if this should be reported to infra though. The fact you can close an upload repository with top-level artifacts is strange... I wonder what would happen if you clicked release on that, would it actually promote those JARs to here? It seems crazy! https://repo1.maven.org/maven2/
[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
jpountz commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323280633 @dweiss So I did release the staged repository with top-level artifacts, and Nexus did not give me any errors either. :( After browsing the content, I saw all the artifacts that I had expected; it was only after clicking and realizing that publication to Central wasn't working that I thought the lack of hierarchy was weird. The good news is that I released with the correct directory layout earlier today and can now see the artifacts on Maven Central.
[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323295591 Darn. I see them here: https://repository.apache.org/content/repositories/releases/ But they have not been synced with Maven Central. I honestly think it's Nexus that's to blame for not verifying permissions here. I've no idea how this works internally.
[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323302321 I filed this question to infra: https://issues.apache.org/jira/browse/INFRA-23931
[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
jpountz commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323310172 Thanks Dawid!
[GitHub] [lucene] stefanvodita closed issue #11814: Support deletes in IndexRearranger
stefanvodita closed issue #11814: Support deletes in IndexRearranger URL: https://github.com/apache/lucene/issues/11814
[GitHub] [lucene] stefanvodita commented on issue #11814: Support deletes in IndexRearranger
stefanvodita commented on issue #11814: URL: https://github.com/apache/lucene/issues/11814#issuecomment-1323341662 `IndexRearranger` now supports selecting docs for deletion from the original index and applying the deletes to the rearranged index.
[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323346570 Just FYI - Sonatype's Nexus wouldn't let you release that staging repository, I've tried.
[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
dweiss commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323347876 I'll follow up with Gavin on the Infra ticket. Seems like the rules at Apache are not as strict. And sorry for this.
[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)
jpountz commented on PR #11947: URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323348632 Again, no worries at all, not your fault.
[GitHub] [lucene] jpountz commented on a diff in pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes
jpountz commented on code in PR #11958: URL: https://github.com/apache/lucene/pull/11958#discussion_r1029188569
## lucene/core/src/java/org/apache/lucene/store/FilterIndexOutput.java: ##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * IndexOutput implementation that delegates calls to another directory. This class can be used to
+ * add limitations on top of an existing {@link IndexOutput} implementation such as {@link
+ * ByteBuffersIndexOutput} or to add additional sanity checks for tests. However, if you plan to
+ * write your own {@link IndexOutput} implementation, you should consider extending directly {@link
+ * IndexOutput} or {@link DataOutput} rather than try to reuse functionality of existing {@link
+ * IndexOutput}s by extending this class.
+ *
+ * @lucene.internal
+ */
+public class FilterIndexOutput extends IndexOutput {
+
+  public static IndexOutput unwrap(IndexOutput out) {
+    while (out instanceof FilterIndexOutput) {
+      out = ((FilterIndexOutput) out).out;
+    }
+    return out;
+  }
+
+  protected final IndexOutput out;
+
+  protected FilterIndexOutput(String resourceDescription, String name, IndexOutput out) {
+    super(resourceDescription, name);
+    this.out = out;
+  }
+
+  public final IndexOutput getDelegate() {
+    return out;
+  }
+
+  @Override
+  public void close() throws IOException {
+    out.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+    return out.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+    return out.getChecksum();
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    out.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+    out.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int length) throws IOException {
Review Comment: My recollection from past discussions is that we prefer `FilterXXX` classes to only delegate abstract methods, not methods that have a default implementation.
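To illustrate the preference voiced in the review comment above, here is a minimal sketch in which a filter class forwards only the abstract method and leaves the default implementation alone. All types (`SimpleOutput`, `FilterSimpleOutput`, `CountingOutput`, `ListOutput`) are hypothetical stand-ins, not Lucene classes, and the rationale shown (a subclass that overrides one abstract method still sees every byte) is an assumption, since the comment does not spell out its reasoning.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an output API; this is not Lucene's IndexOutput. */
abstract class SimpleOutput {
  /** Abstract primitive: every implementation must provide it. */
  abstract void writeByte(byte b);

  /** Default implementation built on top of the abstract primitive. */
  void writeBytes(byte[] b, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      writeByte(b[i]);
    }
  }
}

/** Filter that forwards only the abstract method to its delegate. */
class FilterSimpleOutput extends SimpleOutput {
  protected final SimpleOutput out;

  FilterSimpleOutput(SimpleOutput out) {
    this.out = out;
  }

  @Override
  void writeByte(byte b) {
    out.writeByte(b);
  }
  // writeBytes is deliberately not overridden: the inherited default
  // routes every byte through writeByte above.
}

/** Subclass that customizes a single method: it counts written bytes. */
class CountingOutput extends FilterSimpleOutput {
  long count;

  CountingOutput(SimpleOutput out) {
    super(out);
  }

  @Override
  void writeByte(byte b) {
    count++;
    super.writeByte(b);
  }
}

/** Sink that records bytes in memory. */
class ListOutput extends SimpleOutput {
  final List<Byte> bytes = new ArrayList<>();

  @Override
  void writeByte(byte b) {
    bytes.add(b);
  }
}
```

In this sketch, calling `writeBytes` on a `CountingOutput` still increments the counter for each byte. Had the filter also forwarded `writeBytes` straight to the delegate, the bulk call would bypass the overridden `writeByte`.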
[GitHub] [lucene] jpountz commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller
jpountz commented on code in PR #11954: URL: https://github.com/apache/lucene/pull/11954#discussion_r1029194303
## lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java: ##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {
Review Comment: I don't think it's worth accepting `null` query timeouts. Let's reject `null` queryTimeout objects entirely?
## lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java: ##
@@ -292,7 +296,11 @@ public void testExitablePointValuesIndexReader() throws Exception {
     // Not checking the validity of the result, all we are bothered about in this test is the timing
     // out.
     directoryReader = DirectoryReader.open(directory);
-    exitableDirectoryReader = new ExitableDirectoryReader(directoryReader, disabledQueryTimeout());
+    exitableDirectoryReader = directoryReader;
+    if (disabledQueryTimeout() != null) {
Review Comment: It is always null so we should be able to remove this code path?
[GitHub] [lucene] jpountz commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
jpountz commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029197613
## lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java: ##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
   }
   private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException {
-    BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+    FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+    if (info == null) {
+      return null;
+    }
+    BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);
Review Comment: I'm not sure I understand why we need to retrieve field infos, `getBinaryDocValues` already returns `null` when the field doesn't exist, so doing the following should be enough?
```
BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
if (binaryDocValues == null) {
  return null;
}
return new BinaryRangeDocValues(binaryDocValues, numDims, numBytesPerDimension);
```
## lucene/CHANGES.txt: ##
@@ -143,6 +143,9 @@ Bug Fixes
 * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent)
+* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the queried field doesn't exist
+  in a segment or is of the wrong type. (Greg Miller)
Review Comment: Is it true that there would be an NPE when the field is of the wrong type? I would have expected `CodecReader#getBinaryDocValues` to catch the problem?
[GitHub] [lucene] jpountz opened a new pull request, #11961: Remove VectorValues#EMPTY.
jpountz opened a new pull request, #11961: URL: https://github.com/apache/lucene/pull/11961 This instance is illegal as it reports a number of dimensions equal to zero.
[GitHub] [lucene] jpountz opened a new pull request, #11962: Enforce VectorValues.cost() is equal to size().
jpountz opened a new pull request, #11962: URL: https://github.com/apache/lucene/pull/11962 `VectorValues` have a `cost()` method that reports an approximate number of documents that have a vector, but also a `size()` method that reports the accurate number of vectors in the field. Since KNN vectors only support single-valued fields we should enforce that `cost()` returns the `size()`.
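The PR description above does not say how the invariant is enforced. One possible way, sketched here under that caveat, is to declare `cost()` final in the base class so that it can only ever return `size()`. `SketchVectorValues` and `ArraySketchVectorValues` are hypothetical stand-ins and do not reproduce Lucene's `VectorValues`.

```java
/** Hypothetical stand-in for a single-valued vector iterator; not Lucene's VectorValues. */
abstract class SketchVectorValues {
  /** Exact number of vectors in the field. */
  abstract int size();

  /** Declared final so that no subclass can report a cost that differs from size(). */
  final long cost() {
    return size();
  }
}

/** In-memory implementation backed by an array of vectors (illustration only). */
final class ArraySketchVectorValues extends SketchVectorValues {
  private final float[][] vectors;

  ArraySketchVectorValues(float[][] vectors) {
    this.vectors = vectors;
  }

  @Override
  int size() {
    return vectors.length;
  }
}
```

With single-valued fields, each document contributes at most one vector, so the number of documents with a vector and the number of vectors coincide, which is why the two methods can be tied together.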
[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller
shubhamvishu commented on code in PR #11954: URL: https://github.com/apache/lucene/pull/11954#discussion_r1029256452
## lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java: ##
@@ -292,7 +296,11 @@ public void testExitablePointValuesIndexReader() throws Exception {
     // Not checking the validity of the result, all we are bothered about in this test is the timing
     // out.
     directoryReader = DirectoryReader.open(directory);
-    exitableDirectoryReader = new ExitableDirectoryReader(directoryReader, disabledQueryTimeout());
+    exitableDirectoryReader = directoryReader;
+    if (disabledQueryTimeout() != null) {
Review Comment: Yes, we are not using the `ExitableDirectoryReader`, so let's completely remove this code for testing the disabled timeout.
## lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java: ##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {
Review Comment: Sure, so let's throw an `ExitingReaderException` if queryTimeout is null?
[GitHub] [lucene] jpountz opened a new issue, #11963: Improve vector quantization API
jpountz opened a new issue, #11963: URL: https://github.com/apache/lucene/issues/11963
### Description
Follow-up of https://github.com/apache/lucene/pull/11860#discussion_r1027106953: the API for quantization of vectors is a bit surprising at times in that you can index bytes but then still get float[] arrays whose values are bytes at search time. Can we make sure it's either bytes all the way, or floats all the way?

I'm not entirely sure how best to move it forward. One option would be to have a disjoint set of APIs for the binary encoding, like for doc values: different `Field` class, different `VectorValues` class, different `searchNearestNeighbors` method.

I wonder if another option would be to make it floats all the way all the time, including for the byte encoding, i.e. the `Field` and `VectorValues` class would still take and expose `float`s but `VectorValues#binaryValue` would be removed and the `Field` constructor would barf if the float values do not represent exact bytes.
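The second option in the issue above mentions a constructor that rejects float values which "do not represent exact bytes". A minimal sketch of what such a check could look like follows; `ByteVectorCheck` and `toExactBytes` are hypothetical names introduced for illustration, and the signed-byte range is an assumption, since the issue does not state which byte range applies.

```java
/** Hypothetical helper; not part of Lucene. Assumes signed bytes in [-128, 127]. */
final class ByteVectorCheck {
  private ByteVectorCheck() {}

  /** Converts floats to bytes, rejecting any value that is not an exact signed byte. */
  static byte[] toExactBytes(float[] vector) {
    byte[] out = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float v = vector[i];
      byte b = (byte) v;
      // The range check catches overflow; b != v catches fractions and NaN
      // (any comparison of NaN with != evaluates to true).
      if (v < Byte.MIN_VALUE || v > Byte.MAX_VALUE || b != v) {
        throw new IllegalArgumentException(
            "value at index " + i + " is not an exact byte: " + v);
      }
      out[i] = b;
    }
    return out;
  }
}
```

For example, `{-128f, 0f, 127f}` would pass, while `{1.5f}` or `{300f}` would be rejected with an `IllegalArgumentException`.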
[GitHub] [lucene] jpountz commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections
jpountz commented on code in PR #11860: URL: https://github.com/apache/lucene/pull/11860#discussion_r1029282728
## lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java: ##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements. See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License. You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.backward_codecs.lucene94;
+
+import java.io.IOException;
+import org.apache.lucene.index.FilterVectorValues;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BytesRef;
+
+/** reads from byte-encoded data */
+public class ExpandingVectorValues extends FilterVectorValues {
+
+  private final float[] value;
+
+  /**
+   * Constructs ExpandingVectorValues with passed byte encoded VectorValues iterator
+   *
+   * @param in the wrapped values
+   */
+  protected ExpandingVectorValues(VectorValues in) {
Review Comment: I still opened an issue to discuss how we should go about it: https://github.com/apache/lucene/issues/11963.
[GitHub] [lucene] jpountz commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller
jpountz commented on code in PR #11954: URL: https://github.com/apache/lucene/pull/11954#discussion_r1029300137
## lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java: ##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {
Review Comment: I would keep it simple and use `Objects#requireNonNull`.
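The suggestion above can be sketched as a factory method that fails fast on a `null` timeout. The types below (`SketchTimeout`, `SketchReader`, `SketchExitableReader`) are hypothetical stand-ins rather than the Lucene classes under review; only `java.util.Objects.requireNonNull` is a real JDK call, and the error message is an illustrative choice.

```java
import java.util.Objects;

/** Hypothetical stand-in for a timeout policy; not Lucene's QueryTimeout. */
interface SketchTimeout {
  boolean shouldExit();
}

/** Hypothetical stand-in for a reader. */
class SketchReader {}

/** Wrapper whose factory method rejects null timeouts up front. */
final class SketchExitableReader extends SketchReader {
  final SketchReader in;
  final SketchTimeout timeout;

  private SketchExitableReader(SketchReader in, SketchTimeout timeout) {
    this.in = in;
    this.timeout = timeout;
  }

  static SketchReader wrap(SketchReader in, SketchTimeout timeout) {
    // Throws NullPointerException with this message when timeout is null.
    Objects.requireNonNull(timeout, "queryTimeout must not be null");
    return new SketchExitableReader(in, timeout);
  }
}
```

Compared with a conditional that silently returns the unwrapped reader, this makes a missing timeout a programming error at the call site instead of a wrapper that quietly does nothing.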
[GitHub] [lucene] benwtrent commented on issue #11963: Improve vector quantization API
benwtrent commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1323680557 This is a naive question, but is there a design principle I am missing that makes `VectorValues` out of line? I don't know how often folks will read the vector values directly instead of just searching them. But, transforming everything to float has a performance impact during ingest. Additionally, if the user is expecting bytes, they may have to transform their floats again when reading values, which is another performance overhead. So, I am against 'floats everywhere', but I realize that other things in Lucene have unified types. I would prefer 'bytes everywhere' instead if we were pushing for a single array kind.
[GitHub] [lucene] jpountz opened a new pull request, #11964: Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API.
jpountz opened a new pull request, #11964: URL: https://github.com/apache/lucene/pull/11964
`RandomAccessVectorValues` is internally used in our HNSW implementation to provide random access to vectors, both at index and search time. In order to better reflect this, this change does the following:
- `RandomAccessVectorValues` moves to `org.apache.lucene.util.hnsw`.
- `BufferingKnnVectorsWriter` no longer has a dependency on `RandomAccessVectorValues` and moves to `org.apache.lucene.codecs` since it's more of a utility class for KNN vector file formats than an index API. Maybe we should think of moving it near each file format that uses it instead.
- `SortingCodecReader` no longer has a dependency on `RandomAccessVectorValues`.
[GitHub] [lucene] jpountz commented on issue #11963: Improve vector quantization API
jpountz commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1323732143 I had not considered generics, I guess it would be an option indeed by passing the expected vector array class to `getVectorValues` and `searchNearestNeighbors`.
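A rough sketch of the generics idea mentioned above, where the caller passes the expected array class and gets back a value of that type. `SketchVectorStore` and its methods are hypothetical and only illustrate the shape such an API might take; the issue does not define one, and failing with `IllegalArgumentException` on a mismatch is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical in-memory store keyed by field name; not a Lucene API. */
final class SketchVectorStore {
  private final Map<String, Object> fields = new HashMap<>();

  void put(String field, float[] vector) {
    fields.put(field, vector);
  }

  void put(String field, byte[] vector) {
    fields.put(field, vector);
  }

  /**
   * Returns the vector for the field typed as the requested array class, null when the field is
   * absent, and fails when the stored encoding does not match the request.
   */
  <T> T getVector(String field, Class<T> arrayType) {
    Object value = fields.get(field);
    if (value == null) {
      return null;
    }
    if (!arrayType.isInstance(value)) {
      throw new IllegalArgumentException(
          "field \"" + field + "\" is stored as " + value.getClass().getSimpleName()
              + ", not " + arrayType.getSimpleName());
    }
    return arrayType.cast(value);
  }
}
```

A caller would write `store.getVector("f", float[].class)` or `store.getVector("b", byte[].class)`, so the element type stays the same from indexing to reading without converting between bytes and floats.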
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029458836
## lucene/core/src/test/org/apache/lucene/search/TestRangeFieldsDocValuesQuery.java: ##
@@ -226,4 +228,30 @@ public void testToString() {
     Query q4 = LongRangeDocValuesField.newSlowIntersectsQuery("foo", longMin, longMax);
     assertEquals("foo:[[101, 124, 137] TO [138, 145, 156]]", q4.toString());
   }
+
+  public void testNoData() throws IOException {
+    Directory dir = newDirectory();
Review Comment: Yeah, that's a nice pattern and a good suggestion. I'd prefer to be consistent though with the approach taken by the other unit tests in this class, and would rather not increase the scope of this bug fix to include changing the resource management in all these unit tests.
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029466354
## lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java: ##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
   }
   private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException {
-    BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+    FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+    if (info == null) {
Review Comment: Sure, I'll simplify. Thanks!
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029470218
## lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java: ##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
   }
   private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException {
-    BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+    FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+    if (info == null) {
+      return null;
+    }
+    BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);
Review Comment: Right, that's a better approach. Thanks for the suggestion. I work on a codebase that forbids loading doc-values directly from a `LeafReader` (enforcing going through `DocValues`) as a safeguard against the null references returned by a reader. So I'm in the habit of using the `DocValues` factory methods (and checking `FieldInfo` if _really_ needing to see if the field exists in a segment). I'll simplify this.
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029471303 ## lucene/CHANGES.txt: ## @@ -143,6 +143,9 @@ Bug Fixes * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent) +* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the queried field doesn't exist + in a segment or is of the wrong type. (Greg Miller) Review Comment: Let me test. I don't think I actually ran that part of the unit test prior to applying the fix. I'll make sure the changes entry is accurate.
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029481972 ## lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java: ## @@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException { } private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException { -BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field); +FieldInfo info = reader.getFieldInfos().fieldInfo(field); +if (info == null) { + return null; +} +BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field); Review Comment: Actually, I suppose it depends if we want to throw an `IllegalArgumentException` if the field has been indexed but as a different type. `DocValues#getBinary` will do the type checking and throw. If we go with your suggestion, we'll just silently ignore the field (treating it the same as if it doesn't exist in the segment). I'd prefer we do the type checking. I think it makes more sense for users, and it's also consistent with other similar code paths. For example, `SortedNumericDocValuesField#newSlowRangeQuery` is a similar use-case, and relies on the factory methods for type checking.
[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
gsmiller commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029494212 ## lucene/CHANGES.txt: ## @@ -143,6 +143,9 @@ Bug Fixes * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent) +* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the queried field doesn't exist + in a segment or is of the wrong type. (Greg Miller) Review Comment: So I checked this scenario, and indexing with the wrong type does in fact result in an NPE. `reader.getBinaryDocValues` returns `null` if the field exists in the segment but wasn't indexed with BDV. In my test case, I'm indexing a `StringField` called "foo" and then trying to load "foo" through `reader.getBinaryDocValues`, which just returns `null`.
[GitHub] [lucene] jpountz commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type
jpountz commented on code in PR #11950: URL: https://github.com/apache/lucene/pull/11950#discussion_r1029501693 ## lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java: ## @@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException { } private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException { -BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field); +FieldInfo info = reader.getFieldInfos().fieldInfo(field); +if (info == null) { + return null; +} +BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field); Review Comment: Oh sorry, I was confused and thought that `LeafReader#getBinaryDocValues` already performed the type check!
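The two lookup styles contrasted in this thread can be sketched with a toy model. Nothing below is Lucene code: `ToySegment`, `ValueType`, `rawBinary`, and `checkedBinary` are hypothetical names, and the exception type simply follows the wording of the review comment rather than any verified Lucene behavior. `rawBinary` mimics a reader that returns `null` both for a missing field and for a field of the wrong type; `checkedBinary` mimics a factory method that returns an empty value for a missing field and throws on a type mismatch.

```java
import java.util.HashMap;
import java.util.Map;

/** Toy stand-in for a segment; an illustration only, not a Lucene class. */
final class ToySegment {
  /** Hypothetical field types, for illustration only. */
  enum ValueType { STRING, BINARY }

  private final Map<String, ValueType> types = new HashMap<>();
  private final Map<String, byte[]> binaryValues = new HashMap<>();

  void addString(String field) {
    types.put(field, ValueType.STRING);
  }

  void addBinary(String field, byte[] value) {
    types.put(field, ValueType.BINARY);
    binaryValues.put(field, value);
  }

  /** Null-returning style: a missing field and a wrong type look the same. */
  byte[] rawBinary(String field) {
    return binaryValues.get(field);
  }

  /** Checked style: a missing field yields an empty value, a wrong type throws. */
  byte[] checkedBinary(String field) {
    ValueType actual = types.get(field);
    if (actual == null) {
      return new byte[0]; // field is absent from this segment
    }
    if (actual != ValueType.BINARY) {
      throw new IllegalArgumentException(
          "field '" + field + "' has type " + actual + ", expected BINARY");
    }
    return binaryValues.get(field);
  }
}
```

With the null-returning style, every caller has to remember the null check, and a misconfigured field is silently skipped. The checked style moves both concerns into one place, which is the trade-off the reviewers were weighing.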
[GitHub] [lucene] donnerpeter merged pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes
donnerpeter merged PR #11960: URL: https://github.com/apache/lucene/pull/11960
[GitHub] [lucene-solr] risdenk opened a new pull request, #2675: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly (#1184)
risdenk opened a new pull request, #2675: URL: https://github.com/apache/lucene-solr/pull/2675 Backport to branch_8_11 for https://issues.apache.org/jira/browse/SOLR-16555
[GitHub] [lucene-solr] itygh commented on pull request #2675: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly (#1184)
itygh commented on PR #2675: URL: https://github.com/apache/lucene-solr/pull/2675#issuecomment-1324112666 This is an automatic vacation reply from QQ Mail. Hello, I am currently on vacation and cannot reply to your email personally. I will reply as soon as possible after my vacation ends.
[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes
mdmarshmallow commented on code in PR #11958: URL: https://github.com/apache/lucene/pull/11958#discussion_r1029879211 ## lucene/core/src/java/org/apache/lucene/store/FilterIndexOutput.java: ## @@ -0,0 +1,113 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ + +package org.apache.lucene.store; + +import java.io.IOException; +import java.util.Map; +import java.util.Set; + +/** + * IndexOutput implementation that delegates calls to another directory. This class can be used to + * add limitations on top of an existing {@link IndexOutput} implementation such as {@link + * ByteBuffersIndexOutput} or to add additional sanity checks for tests. However, if you plan to + * write your own {@link IndexOutput} implementation, you should consider extending directly {@link + * IndexOutput} or {@link DataOutput} rather than try to reuse functionality of existing {@link + * IndexOutput}s by extending this class. 
+ * + * @lucene.internal + */ +public class FilterIndexOutput extends IndexOutput { + + public static IndexOutput unwrap(IndexOutput out) { +while (out instanceof FilterIndexOutput) { + out = ((FilterIndexOutput) out).out; +} +return out; + } + + protected final IndexOutput out; + + protected FilterIndexOutput(String resourceDescription, String name, IndexOutput out) { +super(resourceDescription, name); +this.out = out; + } + + public final IndexOutput getDelegate() { +return out; + } + + @Override + public void close() throws IOException { +out.close(); + } + + @Override + public long getFilePointer() { +return out.getFilePointer(); + } + + @Override + public long getChecksum() throws IOException { +return out.getChecksum(); + } + + @Override + public void writeByte(byte b) throws IOException { +out.writeByte(b); + } + + @Override + public void writeBytes(byte[] b, int offset, int length) throws IOException { +out.writeBytes(b, offset, length); + } + + @Override + public void writeBytes(byte[] b, int length) throws IOException { Review Comment: That makes sense to me but I wasn't sure which approach to take when initially writing this. I will fix it and make a new revision.
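The delegating-wrapper pattern in the quoted diff, including its `unwrap` loop, can be tried out in isolation with a minimal sketch. The classes below (`ToyOutput`, `CountingOutput`, `FilterToyOutput`, `LimitedOutput`) are hypothetical stand-ins and do not reproduce Lucene's `IndexOutput` API; the limit check is just one assumed example of the "limitations" the Javadoc mentions.

```java
/** Toy output abstraction; a stand-in, not Lucene's IndexOutput. */
abstract class ToyOutput {
  abstract void writeByte(byte b);

  abstract long position();
}

/** Simple in-memory sink that only counts the bytes written to it. */
final class CountingOutput extends ToyOutput {
  private long count;

  @Override
  void writeByte(byte b) {
    count++;
  }

  @Override
  long position() {
    return count;
  }
}

/** Delegating wrapper: forwards every call to the wrapped output. */
class FilterToyOutput extends ToyOutput {
  protected final ToyOutput out;

  FilterToyOutput(ToyOutput out) {
    this.out = out;
  }

  /** Peels off any number of nested wrappers and returns the innermost output. */
  static ToyOutput unwrap(ToyOutput o) {
    while (o instanceof FilterToyOutput) {
      o = ((FilterToyOutput) o).out;
    }
    return o;
  }

  @Override
  void writeByte(byte b) {
    out.writeByte(b);
  }

  @Override
  long position() {
    return out.position();
  }
}

/** Example subclass: rejects writes once a fixed byte limit is reached. */
final class LimitedOutput extends FilterToyOutput {
  private final long limit;

  LimitedOutput(ToyOutput out, long limit) {
    super(out);
    this.limit = limit;
  }

  @Override
  void writeByte(byte b) {
    if (position() >= limit) {
      throw new IllegalStateException("limit of " + limit + " bytes reached");
    }
    super.writeByte(b);
  }
}
```

Because `unwrap` loops rather than unwrapping once, it reaches the innermost output no matter how many filters are stacked on top of it.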
[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API
rmuir commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1324355332 as i mentioned on the original issue(s) (there were several PRs closed and opened and many comments were lost...), the problem is right in the title: quantization. there shouldn't be any quantization happening. otherwise it is broken. 8-bits in, 8-bits out, 32-bits in, 32-bits out. Additionally as a minimum there need to be two separate field classes (e.g. ByteVectorValues/FloatVectorValues) to give type safety. Really, the types should be distinguished in fieldinfos today as well (not sure if that is currently the case). Basically just follow the examples of every other part of the index. vectors aren't special.
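The "8-bits in, 8-bits out, 32-bits in, 32-bits out" idea can be illustrated with two separate holder types. This is only a sketch under assumptions: `ToyByteVector` and `ToyFloatVector` are hypothetical names, not the `ByteVectorValues`/`FloatVectorValues` classes proposed in the comment, and the `dot` methods are added purely to show that each type keeps its own arithmetic.

```java
/** Hypothetical holder for 8-bit vectors: bytes are stored and returned unchanged. */
final class ToyByteVector {
  private final byte[] values;

  ToyByteVector(byte[] values) {
    this.values = values.clone();
  }

  byte[] vector() {
    return values.clone();
  }

  /** Integer dot product; no conversion to float takes place. */
  int dot(ToyByteVector other) {
    int sum = 0;
    for (int i = 0; i < values.length; i++) {
      sum += values[i] * other.values[i];
    }
    return sum;
  }
}

/** Hypothetical holder for 32-bit float vectors: floats are stored and returned unchanged. */
final class ToyFloatVector {
  private final float[] values;

  ToyFloatVector(float[] values) {
    this.values = values.clone();
  }

  float[] vector() {
    return values.clone();
  }

  /** Float dot product. */
  float dot(ToyFloatVector other) {
    float sum = 0f;
    for (int i = 0; i < values.length; i++) {
      sum += values[i] * other.values[i];
    }
    return sum;
  }
}
```

With separate classes, passing a float vector where a byte vector is expected is a compile-time error, so no implicit conversion between the two encodings can happen by accident.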
[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API
rmuir commented on issue #11963: URL: https://github.com/apache/lucene/issues/11963#issuecomment-1324355812 we can break index backwards compatibility to fix this.