date:20221122

[GitHub] [lucene] dweiss commented on pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes

2022-11-22 Thread GitBox



dweiss commented on PR #11960:
URL: https://github.com/apache/lucene/pull/11960#issuecomment-1323272071

   Sorry if I lost track among all these patches - I thought there would be 
user facing changes as well. If there are none - no need to add anything.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] donnerpeter commented on pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes

2022-11-22 Thread GitBox



donnerpeter commented on PR #11960:
URL: https://github.com/apache/lucene/pull/11960#issuecomment-1323273222

   There are some, but they're just API additions which should be 
backward-compatible.



[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



dweiss commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323273394

   I wonder if this should be reported to infra though. The fact you can close 
an upload repository with top-level artifacts is strange... I wonder what would 
happen if you clicked release on that, would it actually promote those JARs to 
here? It seems crazy!
   https://repo1.maven.org/maven2/



[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



jpountz commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323280633

   @dweiss So I did release the staged repository with top-level artifacts, and 
Nexus did not give me any errors either. :( After browsing the content, I saw 
all the artifacts that I had expected; it was only after clicking and realizing 
that publication to Central wasn't working that I thought that the lack of 
hierarchy was weird.
   
   Good news is that I have released with the correct directory layout earlier 
today and now I can see the artifacts on Maven Central.
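
   For readers unfamiliar with the "correct directory layout": Maven 
repositories conventionally place artifacts under a 
groupId/artifactId/version hierarchy rather than at the top level. A minimal 
sketch of how such a path is derived; the class name `MavenLayout` and the 
coordinates are hypothetical and are not taken from the upload script in this 
PR:

```java
/** Illustrative helper (hypothetical, not part of the Lucene upload script). */
final class MavenLayout {

  /** Builds the repository-relative path of an artifact in the standard Maven 2 layout. */
  static String path(String groupId, String artifactId, String version, String extension) {
    // Dots in the groupId become directory separators.
    String groupPath = groupId.replace('.', '/');
    return groupPath + "/" + artifactId + "/" + version + "/"
        + artifactId + "-" + version + "." + extension;
  }

  public static void main(String[] args) {
    // Made-up coordinates, used only to show the shape of the path.
    System.out.println(path("org.example", "demo-core", "1.0.0", "jar"));
  }
}
```

   An artifact uploaded without this hierarchy ends up at the repository root, 
which matches the "top-level artifacts" situation described above.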



[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



dweiss commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323295591

   Darn. I see them here:
   https://repository.apache.org/content/repositories/releases/
   But they have not been synced with Maven Central. I honestly think it's 
Nexus that's to blame for not verifying permissions here. I've no idea how 
this works internally.



[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



dweiss commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323302321

   I filed this question to infra: 
https://issues.apache.org/jira/browse/INFRA-23931



[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



jpountz commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323310172

   Thanks Dawid!



[GitHub] [lucene] stefanvodita closed issue #11814: Support deletes in IndexRearranger

2022-11-22 Thread GitBox



stefanvodita closed issue #11814: Support deletes in IndexRearranger
URL: https://github.com/apache/lucene/issues/11814



[GitHub] [lucene] stefanvodita commented on issue #11814: Support deletes in IndexRearranger

2022-11-22 Thread GitBox



stefanvodita commented on issue #11814:
URL: https://github.com/apache/lucene/issues/11814#issuecomment-1323341662

   `IndexRearranger` now supports selecting docs for deletion from the original 
index and applying the deletes to the rearranged index.



[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



dweiss commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323346570

   Just FYI - Sonatype's nexus wouldn't let you release that staging 
repository, I've tried:
   
![image](https://user-images.githubusercontent.com/199470/203274558-06abae96-ee76-4b56-b919-a0a73aa0d6c5.png)
   



[GitHub] [lucene] dweiss commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



dweiss commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323347876

   I'll follow up with Gavin on the Infra ticket. Seems like the rules at 
Apache are not as strict. And sorry for this.



[GitHub] [lucene] jpountz commented on pull request #11947: Add self-contained artifact upload script for apache nexus (#11329)

2022-11-22 Thread GitBox



jpountz commented on PR #11947:
URL: https://github.com/apache/lucene/pull/11947#issuecomment-1323348632

   Again, no worries at all, not your fault.



[GitHub] [lucene] jpountz commented on a diff in pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes

2022-11-22 Thread GitBox



jpountz commented on code in PR #11958:
URL: https://github.com/apache/lucene/pull/11958#discussion_r1029188569


##
lucene/core/src/java/org/apache/lucene/store/FilterIndexOutput.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * IndexOutput implementation that delegates calls to another directory. This class can be used to
+ * add limitations on top of an existing {@link IndexOutput} implementation such as {@link
+ * ByteBuffersIndexOutput} or to add additional sanity checks for tests. However, if you plan to
+ * write your own {@link IndexOutput} implementation, you should consider extending directly {@link
+ * IndexOutput} or {@link DataOutput} rather than try to reuse functionality of existing {@link
+ * IndexOutput}s by extending this class.
+ *
+ * @lucene.internal
+ */
+public class FilterIndexOutput extends IndexOutput {
+
+  public static IndexOutput unwrap(IndexOutput out) {
+    while (out instanceof FilterIndexOutput) {
+      out = ((FilterIndexOutput) out).out;
+    }
+    return out;
+  }
+
+  protected final IndexOutput out;
+
+  protected FilterIndexOutput(String resourceDescription, String name, IndexOutput out) {
+    super(resourceDescription, name);
+    this.out = out;
+  }
+
+  public final IndexOutput getDelegate() {
+    return out;
+  }
+
+  @Override
+  public void close() throws IOException {
+    out.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+    return out.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+    return out.getChecksum();
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+    out.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+    out.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int length) throws IOException {

Review Comment:
   My recollection from past discussions is that we prefer `FilterXXX` classes 
to only delegate abstract methods, not methods that have a default 
implementation.
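
   A minimal sketch of that convention, using hypothetical stand-in classes 
(`SimpleOutput`, `FilterSimpleOutput`, `ListOutput`, `CountingOutput`) rather 
than Lucene's `IndexOutput`. The benefit shown here, that a subclass 
overriding only the abstract method still sees every call, is my reading of 
the convention and is not quoted from the past discussions:

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for an output API; not Lucene's IndexOutput. */
abstract class SimpleOutput {
  /** Abstract primitive that every implementation must provide. */
  abstract void writeByte(byte b);

  /** Default implementation built on top of the abstract primitive. */
  void writeBytes(byte[] bytes, int length) {
    for (int i = 0; i < length; i++) {
      writeByte(bytes[i]);
    }
  }
}

/** Filter that forwards only the abstract method and inherits the default one. */
class FilterSimpleOutput extends SimpleOutput {
  protected final SimpleOutput out;

  FilterSimpleOutput(SimpleOutput out) {
    this.out = out;
  }

  @Override
  void writeByte(byte b) {
    out.writeByte(b);
  }
  // writeBytes is intentionally not overridden: the inherited default calls
  // this.writeByte, so subclasses that override writeByte see every byte.
}

/** Collects written bytes in memory so the behavior can be observed. */
class ListOutput extends SimpleOutput {
  final List<Byte> bytes = new ArrayList<>();

  @Override
  void writeByte(byte b) {
    bytes.add(b);
  }
}

/** Subclass that counts bytes by overriding only writeByte. */
class CountingOutput extends FilterSimpleOutput {
  int count;

  CountingOutput(SimpleOutput out) {
    super(out);
  }

  @Override
  void writeByte(byte b) {
    count++;
    super.writeByte(b);
  }
}
```

   Had `FilterSimpleOutput` also forwarded `writeBytes` straight to the 
delegate, `CountingOutput` would miss bytes written through the bulk method 
unless it overrode that method too.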




[GitHub] [lucene] jpountz commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller

2022-11-22 Thread GitBox



jpountz commented on code in PR #11954:
URL: https://github.com/apache/lucene/pull/11954#discussion_r1029194303


##
lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java:
##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {

Review Comment:
   I don't think it's worth accepting `null` query timeouts. Let's reject 
`null` queryTimeout objects entirely?



##
lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java:
##
@@ -292,7 +296,11 @@ public void testExitablePointValuesIndexReader() throws Exception {
     // Not checking the validity of the result, all we are bothered about in this test is the timing
     // out.
     directoryReader = DirectoryReader.open(directory);
-    exitableDirectoryReader = new ExitableDirectoryReader(directoryReader, disabledQueryTimeout());
+    exitableDirectoryReader = directoryReader;
+    if (disabledQueryTimeout() != null) {

Review Comment:
   It is always null so we should be able to remove this code path?




[GitHub] [lucene] jpountz commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



jpountz commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029197613


##
lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java:
##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException {
   }
 
   private BinaryRangeDocValues getValues(LeafReader reader, String field) throws IOException {
-    BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+    FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+    if (info == null) {
+      return null;
+    }
+    BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);

Review Comment:
   I'm not sure I understand why we need to retrieve field infos, 
`getBinaryDocValues` already returns `null` when the field doesn't exist, so 
doing the following should be enough?
   
   ```
   BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
   if (binaryDocValues == null) {
     return null;
   }
   return new BinaryRangeDocValues(binaryDocValues, numDims, numBytesPerDimension);
   ```



##
lucene/CHANGES.txt:
##
@@ -143,6 +143,9 @@ Bug Fixes
 
 * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent)
 
+* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the queried field doesn't exist
+  in a segment or is of the wrong type. (Greg Miller)

Review Comment:
   Is it true that there would be an NPE when the field is of the wrong type? I 
would have expected `CodecReader#getBinaryDocValues` to catch the problem?




[GitHub] [lucene] jpountz opened a new pull request, #11961: Remove VectorValues#EMPTY.

2022-11-22 Thread GitBox



jpountz opened a new pull request, #11961:
URL: https://github.com/apache/lucene/pull/11961

   This instance is illegal as it reports a number of dimensions equal to zero.
   



[GitHub] [lucene] jpountz opened a new pull request, #11962: Enforce VectorValues.cost() is equal to size().

2022-11-22 Thread GitBox



jpountz opened a new pull request, #11962:
URL: https://github.com/apache/lucene/pull/11962

   `VectorValues` have a `cost()` method that reports an approximate number of 
documents that have a vector, but also a `size()` method that reports the 
accurate number of vectors in the field. Since KNN vectors only support 
single-valued fields we should enforce that `cost()` returns the `size()`.
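
   One way such an invariant could be expressed, sketched with a hypothetical 
`SimpleVectorValues` class rather than the real `VectorValues` (how this PR 
actually enforces it is not shown in this message):

```java
/** Hypothetical minimal API; not Lucene's VectorValues. */
abstract class SimpleVectorValues {

  /** Exact number of vectors in the field. */
  abstract int size();

  /**
   * With single-valued fields, each vector belongs to exactly one document,
   * so the exact size can double as the cost. Declaring the method final
   * prevents subclasses from reporting a different estimate.
   */
  final long cost() {
    return size();
  }
}

/** Trivial implementation used to exercise the sketch. */
final class FixedSizeVectorValues extends SimpleVectorValues {
  private final int size;

  FixedSizeVectorValues(int size) {
    this.size = size;
  }

  @Override
  int size() {
    return size;
  }
}
```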



[GitHub] [lucene] shubhamvishu commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller

2022-11-22 Thread GitBox



shubhamvishu commented on code in PR #11954:
URL: https://github.com/apache/lucene/pull/11954#discussion_r1029256452


##
lucene/core/src/test/org/apache/lucene/index/TestExitableDirectoryReader.java:
##
@@ -292,7 +296,11 @@ public void testExitablePointValuesIndexReader() throws Exception {
     // Not checking the validity of the result, all we are bothered about in this test is the timing
     // out.
     directoryReader = DirectoryReader.open(directory);
-    exitableDirectoryReader = new ExitableDirectoryReader(directoryReader, disabledQueryTimeout());
+    exitableDirectoryReader = directoryReader;
+    if (disabledQueryTimeout() != null) {

Review Comment:
   Yes, we are not using the `ExitableDirectoryReader`, so let's completely 
remove this code for testing disabled timeout.



##
lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java:
##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {

Review Comment:
   Sure, so let's throw an `ExitingReaderException` if queryTimeout is null?




[GitHub] [lucene] jpountz opened a new issue, #11963: Improve vector quantization API

2022-11-22 Thread GitBox



jpountz opened a new issue, #11963:
URL: https://github.com/apache/lucene/issues/11963

   ### Description
   
   Follow-up of 
https://github.com/apache/lucene/pull/11860#discussion_r1027106953: the API for 
quantization of vectors is a bit surprising at times in that you can index 
bytes but then still get float[] arrays whose values are bytes at search time. 
Can we make sure it's either bytes all the way, or floats all the way?
   
   I'm not entirely sure how best to move it forward. One option would be to 
have a disjoint set of APIs for the binary encoding, like for doc values: 
different `Field` class, different `VectorValues` class, different 
`searchNearestNeighbors` method. I wonder if another option would be to make it 
floats all the way all the time, including for the byte encoding, ie. the 
`Field` and `VectorValues` class would still take and expose `float`s but 
`VectorValues#binaryValue` would be removed and the `Field` constructor would 
barf if the float values do not represent exact bytes.
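
   To make the second option concrete, here is a rough sketch of the kind of 
check a constructor could run to reject floats that are not exact bytes. The 
class and method names (`ByteVectorCheck`, `toExactBytes`) are hypothetical 
and not an existing or proposed Lucene API:

```java
/** Hypothetical helper; the name and behavior are illustrative only. */
final class ByteVectorCheck {

  /** Converts floats to bytes, rejecting any value that is not an exact byte. */
  static byte[] toExactBytes(float[] vector) {
    byte[] bytes = new byte[vector.length];
    for (int i = 0; i < vector.length; i++) {
      float f = vector[i];
      byte b = (byte) f;
      // The cast drops fractions, wraps out-of-range values and maps NaN to 0,
      // so comparing the result back against the float detects all three.
      if (b != f) {
        throw new IllegalArgumentException(
            "value at index " + i + " is not an exact byte: " + f);
      }
      bytes[i] = b;
    }
    return bytes;
  }
}
```

   With this check, `{-128f, 0f, 127f}` is accepted while `1.5f` and `128f` 
are rejected.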



[GitHub] [lucene] jpountz commented on a diff in pull request #11860: GITHUB-11830 Better optimize storage for vector connections

2022-11-22 Thread GitBox



jpountz commented on code in PR #11860:
URL: https://github.com/apache/lucene/pull/11860#discussion_r1029282728


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene94/ExpandingVectorValues.java:
##
@@ -0,0 +1,49 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.backward_codecs.lucene94;
+
+import java.io.IOException;
+import org.apache.lucene.index.FilterVectorValues;
+import org.apache.lucene.index.VectorValues;
+import org.apache.lucene.util.BytesRef;
+
+/** reads from byte-encoded data */
+public class ExpandingVectorValues extends FilterVectorValues {
+
+  private final float[] value;
+
+  /**
+   * Constructs ExpandingVectorValues with passed byte encoded VectorValues iterator
+   *
+   * @param in the wrapped values
+   */
+  protected ExpandingVectorValues(VectorValues in) {

Review Comment:
   I still opened an issue to discuss how we should go about it: 
https://github.com/apache/lucene/issues/11963.




[GitHub] [lucene] jpountz commented on a diff in pull request #11954: Remove QueryTimeout#isTimeoutEnabled method and move check to caller

2022-11-22 Thread GitBox



jpountz commented on code in PR #11954:
URL: https://github.com/apache/lucene/pull/11954#discussion_r1029300137


##
lucene/core/src/java/org/apache/lucene/index/ExitableDirectoryReader.java:
##
@@ -784,7 +770,10 @@ protected DirectoryReader doWrapDirectoryReader(DirectoryReader in) throws IOExc
    */
   public static DirectoryReader wrap(DirectoryReader in, QueryTimeout queryTimeout)
       throws IOException {
-    return new ExitableDirectoryReader(in, queryTimeout);
+    if (queryTimeout != null) {

Review Comment:
   I would keep it simple and use `Objects#requireNonNull`.
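
   Roughly what that could look like, sketched with hypothetical stand-in 
types (`SimpleReader`, `SimpleTimeout`, `ExitableReader`) instead of the real 
`DirectoryReader`, `QueryTimeout` and `ExitableDirectoryReader`; the error 
message is also made up:

```java
import java.util.Objects;

/** Hypothetical stand-in for a timeout callback; not Lucene's QueryTimeout. */
interface SimpleTimeout {
  boolean shouldExit();
}

/** Hypothetical stand-in for a reader. */
final class SimpleReader {
  final String name;

  SimpleReader(String name) {
    this.name = name;
  }
}

/** Hypothetical wrapper that always requires a timeout. */
final class ExitableReader {
  final SimpleReader in;
  final SimpleTimeout timeout;

  private ExitableReader(SimpleReader in, SimpleTimeout timeout) {
    this.in = in;
    this.timeout = timeout;
  }

  /** Rejects null timeouts up front instead of silently skipping the wrapping. */
  static ExitableReader wrap(SimpleReader in, SimpleTimeout timeout) {
    Objects.requireNonNull(timeout, "timeout must not be null");
    return new ExitableReader(in, timeout);
  }
}
```

   `Objects.requireNonNull` throws a `NullPointerException` carrying the given 
message, so no dedicated exception type is needed for this case.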




[GitHub] [lucene] benwtrent commented on issue #11963: Improve vector quantization API

2022-11-22 Thread GitBox



benwtrent commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1323680557

   This is a naive question, but is there a design principle I am missing that 
makes `VectorValues` out of line?
   
   I don't know how often folks will read the vector values directly instead of 
just searching them. But, transforming everything to float has a performance 
impact during ingest. Additionally, if the user is expecting bytes, they may 
have to transform their floats again when reading values, which is another 
performance overhead.
   
   So, I am against 'floats everywhere', but I realize that other things in 
Lucene have unified types. 
   
   I would prefer 'bytes everywhere' instead if we were pushing for a single 
array kind.



[GitHub] [lucene] jpountz opened a new pull request, #11964: Make RandomAccessVectorValues an implementation detail of HNSW implementations rather than a proper API.

2022-11-22 Thread GitBox



jpountz opened a new pull request, #11964:
URL: https://github.com/apache/lucene/pull/11964

   `RandomAccessVectorValues` is internally used in our HNSW implementation to 
provide random access to vectors, both at index and search time. In order to 
better reflect this, this change does the following:
- `RandomAccessVectorValues` moves to `org.apache.lucene.util.hnsw`.
- `BufferingKnnVectorsWriter` no longer has a dependency on 
`RandomAccessVectorValues` and moves to `org.apache.lucene.codecs` since it's 
more of a utility class for KNN vector file formats than an index API. Maybe we 
should think of moving it near each file format that uses it instead.
- `SortingCodecReader` no longer has a dependency on 
`RandomAccessVectorValues`.
   



[GitHub] [lucene] jpountz commented on issue #11963: Improve vector quantization API

2022-11-22 Thread GitBox



jpountz commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1323732143

   I had not considered generics, I guess it would be an option indeed by 
passing the expected vector array class to `getVectorValues` and 
`searchNearestNeighbors`.
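
   A rough illustration of the class-token idea, using a hypothetical 
`VectorStore` with a `vectorValue` method. These names and signatures are 
assumptions made for the sketch, not a proposed shape for `getVectorValues` 
or `searchNearestNeighbors`:

```java
/** Hypothetical reader; illustrates the idea only, not a Lucene signature. */
final class VectorStore {
  private final byte[] stored;

  VectorStore(byte[] stored) {
    this.stored = stored.clone();
  }

  /** Returns the stored vector as the array type requested by the caller. */
  <T> T vectorValue(Class<T> arrayClass) {
    if (arrayClass == byte[].class) {
      // Bytes all the way: hand back a copy of the stored encoding.
      return arrayClass.cast(stored.clone());
    }
    if (arrayClass == float[].class) {
      // Widen each byte to a float only when the caller asks for floats.
      float[] floats = new float[stored.length];
      for (int i = 0; i < stored.length; i++) {
        floats[i] = stored[i];
      }
      return arrayClass.cast(floats);
    }
    throw new IllegalArgumentException("unsupported vector type: " + arrayClass);
  }
}
```

   A caller would then write `store.vectorValue(byte[].class)` or 
`store.vectorValue(float[].class)` and get a correctly typed array without 
casting.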



[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029458836


##
lucene/core/src/test/org/apache/lucene/search/TestRangeFieldsDocValuesQuery.java:
##
@@ -226,4 +228,30 @@ public void testToString() {
     Query q4 = LongRangeDocValuesField.newSlowIntersectsQuery("foo", longMin, longMax);
     assertEquals("foo:[[101, 124, 137] TO [138, 145, 156]]", q4.toString());
   }
+
+  public void testNoData() throws IOException {
+    Directory dir = newDirectory();

Review Comment:
   Yeah, that's a nice pattern and a good suggestion. I'd prefer to be 
consistent though with the approach taken by the other unit tests in this 
class, and would rather not increase the scope of this bug fix to include 
changing the resource management in all these unit tests.




[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029466354


##
lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java:
##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws 
IOException {
   }
 
   private BinaryRangeDocValues getValues(LeafReader reader, String field) 
throws IOException {
-BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+if (info == null) {

Review Comment:
   Sure, I'll simplify. Thanks!




[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029470218


##
lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java:
##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws 
IOException {
   }
 
   private BinaryRangeDocValues getValues(LeafReader reader, String field) 
throws IOException {
-BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+if (info == null) {
+  return null;
+}
+BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);

Review Comment:
   Right, that's a better approach. Thanks for the suggestion. I work on a 
codebase that forbids loading doc values directly from a `LeafReader` 
(it enforces going through `DocValues`) as a safeguard against the null 
references a reader can return. So I'm in the habit of using the `DocValues` 
factory methods (and checking `FieldInfo` only when I _really_ need to know 
whether the field exists in a segment). I'll simplify this.
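
A minimal sketch of the safeguard described above, assuming a reader whose lookup can return `null` and a factory-style accessor that substitutes an empty result. All names here (`NullSafeDocValuesSketch`, `SegmentReader`, `getBinaryValuesOrNull`, `getBinary`) are hypothetical stand-ins for illustration, not Lucene classes or signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class NullSafeDocValuesSketch {

  /** Stand-in for a per-segment reader whose lookup may return null. */
  static final class SegmentReader {
    private final Map<String, byte[][]> binaryFields = new HashMap<>();

    void addBinaryField(String field, byte[][] values) {
      binaryFields.put(field, values);
    }

    /** Returns null when the segment has no binary values for the field. */
    byte[][] getBinaryValuesOrNull(String field) {
      return binaryFields.get(field);
    }
  }

  private static final byte[][] EMPTY = new byte[0][];

  /** Factory-style accessor: callers never see null, only an empty result. */
  static byte[][] getBinary(SegmentReader reader, String field) {
    byte[][] values = reader.getBinaryValuesOrNull(field);
    return values == null ? EMPTY : values;
  }

  /** Counts values without a null check, relying on getBinary's guarantee. */
  static int countValues(SegmentReader reader, String field) {
    return getBinary(reader, field).length;
  }

  public static void main(String[] args) {
    SegmentReader reader = new SegmentReader();
    reader.addBinaryField("foo", new byte[][] {{1, 2}, {3}});
    System.out.println(countValues(reader, "foo"));
    System.out.println(countValues(reader, "missing"));
  }
}
```

Routing every lookup through one accessor like this keeps the null handling in a single place, which is the kind of rule a codebase can enforce mechanically.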




[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029471303


##
lucene/CHANGES.txt:
##
@@ -143,6 +143,9 @@ Bug Fixes
 
 * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent)
 
+* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the 
queried field doesn't exist
+  in a segment or is of the wrong type. (Greg Miller)

Review Comment:
   Let me test. I don't think I actually ran that part of the unit test prior 
to applying the fix. I'll make sure the changes entry is accurate.




[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029481972


##
lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java:
##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws 
IOException {
   }
 
   private BinaryRangeDocValues getValues(LeafReader reader, String field) 
throws IOException {
-BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+if (info == null) {
+  return null;
+}
+BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);

Review Comment:
   Actually, I suppose it depends on whether we want to throw an 
`IllegalArgumentException` if the field has been indexed but as a different 
type. `DocValues#getBinary` will do the type checking and throw. If we go with 
your suggestion, we'll just silently ignore the field (treating it the same as 
if it doesn't exist in the segment).
   
   I'd prefer we do the type checking. I think it makes more sense for users, 
and it's also consistent with other similar code paths. For example, 
`SortedNumericDocValuesField#newSlowRangeQuery` is a similar use-case, and 
relies on the factory methods for type checking.
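
To make the two options concrete, here is a minimal sketch that contrasts a lenient lookup (wrong type treated like a missing field) with a strict one (wrong type reported to the caller). The names `TypeCheckedLookupSketch`, `SegmentSchema`, `hasBinaryLenient` and `hasBinaryStrict` are hypothetical and are not Lucene APIs. The sketch throws `IllegalArgumentException` only because that is the exception named in this comment; the exception the real factory method throws should be confirmed against the Lucene source.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-ins for illustration only; these are not Lucene classes.
final class TypeCheckedLookupSketch {

  enum ValueType { BINARY, NUMERIC }

  /** Records the value type each field was indexed with in one segment. */
  static final class SegmentSchema {
    private final Map<String, ValueType> types = new HashMap<>();

    void index(String field, ValueType type) {
      types.put(field, type);
    }

    /** Returns null when the field does not exist in this segment. */
    ValueType typeOrNull(String field) {
      return types.get(field);
    }
  }

  /** Lenient: a field of the wrong type is treated the same as a missing field. */
  static boolean hasBinaryLenient(SegmentSchema schema, String field) {
    return schema.typeOrNull(field) == ValueType.BINARY;
  }

  /** Strict: a missing field is fine, but a wrong type is reported to the caller. */
  static boolean hasBinaryStrict(SegmentSchema schema, String field) {
    ValueType actual = schema.typeOrNull(field);
    if (actual == null) {
      return false; // field absent from this segment: nothing to match
    }
    if (actual != ValueType.BINARY) {
      // Assumption: exception type follows the wording of the review comment.
      throw new IllegalArgumentException(
          "field '" + field + "' was indexed as " + actual + ", expected BINARY");
    }
    return true;
  }

  public static void main(String[] args) {
    SegmentSchema schema = new SegmentSchema();
    schema.index("range", ValueType.BINARY);
    schema.index("foo", ValueType.NUMERIC);

    System.out.println(hasBinaryLenient(schema, "foo"));
    try {
      hasBinaryStrict(schema, "foo");
    } catch (IllegalArgumentException e) {
      System.out.println(e.getMessage());
    }
  }
}
```

With the strict variant a query against a mistyped field fails loudly instead of quietly matching nothing, which is the user-facing behavior argued for above.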




[GitHub] [lucene] gsmiller commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



gsmiller commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029494212


##
lucene/CHANGES.txt:
##
@@ -143,6 +143,9 @@ Bug Fixes
 
 * GITHUB#11907: Fix latent casting bugs in BKDWriter. (Ben Trent)
 
+* GITHUB#11950: Fix NPE in BinaryRangeFieldRangeQuery variants when the 
queried field doesn't exist
+  in a segment or is of the wrong type. (Greg Miller)

Review Comment:
   So I checked this scenario, and indexing with the wrong type does in fact 
result in an NPE. `reader.getBinaryDocValues` returns `null` if the field 
exists in the segment but wasn't indexed with binary doc values. In my test 
case, I'm indexing a `StringField` called "foo" and then trying to load "foo" 
through `reader.getBinaryDocValues`, which just returns `null`.




[GitHub] [lucene] jpountz commented on a diff in pull request #11950: Fix NPE in BinaryRangeFieldRangeQuery when field does not exist or is of wrong type

2022-11-22 Thread GitBox



jpountz commented on code in PR #11950:
URL: https://github.com/apache/lucene/pull/11950#discussion_r1029501693


##
lucene/core/src/java/org/apache/lucene/document/BinaryRangeFieldRangeQuery.java:
##
@@ -91,7 +92,11 @@ public Query rewrite(IndexSearcher indexSearcher) throws 
IOException {
   }
 
   private BinaryRangeDocValues getValues(LeafReader reader, String field) 
throws IOException {
-BinaryDocValues binaryDocValues = reader.getBinaryDocValues(field);
+FieldInfo info = reader.getFieldInfos().fieldInfo(field);
+if (info == null) {
+  return null;
+}
+BinaryDocValues binaryDocValues = DocValues.getBinary(reader, field);

Review Comment:
   Oh sorry, I was confused and thought that `LeafReader#getBinaryDocValues` 
already performed the type check!




[GitHub] [lucene] donnerpeter merged pull request #11960: hunspell: support empty dictionaries, adapt to the hunspell/C++ repo changes

2022-11-22 Thread GitBox



donnerpeter merged PR #11960:
URL: https://github.com/apache/lucene/pull/11960



[GitHub] [lucene-solr] risdenk opened a new pull request, #2675: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly (#1184)

2022-11-22 Thread GitBox



risdenk opened a new pull request, #2675:
URL: https://github.com/apache/lucene-solr/pull/2675

   Backport to branch_8_11 for https://issues.apache.org/jira/browse/SOLR-16555



[GitHub] [lucene-solr] itygh commented on pull request #2675: SOLR-16555: SolrIndexSearcher - FilterCache intersections/andNot should not clone bitsets repeatedly (#1184)

2022-11-22 Thread GitBox



itygh commented on PR #2675:
URL: https://github.com/apache/lucene-solr/pull/2675#issuecomment-1324112666

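   This is an automatic vacation reply from QQ Mail. Hello, I am currently on 
vacation and cannot reply to your email personally. I will reply to you as 
soon as possible once my vacation ends.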



[GitHub] [lucene] mdmarshmallow commented on a diff in pull request #11958: GITHUB-11868: Add FilterIndexInput and FilterIndexOutput wrapper classes

2022-11-22 Thread GitBox



mdmarshmallow commented on code in PR #11958:
URL: https://github.com/apache/lucene/pull/11958#discussion_r1029879211


##
lucene/core/src/java/org/apache/lucene/store/FilterIndexOutput.java:
##
@@ -0,0 +1,113 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import java.util.Map;
+import java.util.Set;
+
+/**
+ * IndexOutput implementation that delegates calls to another directory. This 
class can be used to
+ * add limitations on top of an existing {@link IndexOutput} implementation 
such as {@link
+ * ByteBuffersIndexOutput} or to add additional sanity checks for tests. 
However, if you plan to
+ * write your own {@link IndexOutput} implementation, you should consider 
extending directly {@link
+ * IndexOutput} or {@link DataOutput} rather than try to reuse functionality 
of existing {@link
+ * IndexOutput}s by extending this class.
+ *
+ * @lucene.internal
+ */
+public class FilterIndexOutput extends IndexOutput {
+
+  public static IndexOutput unwrap(IndexOutput out) {
+while (out instanceof FilterIndexOutput) {
+  out = ((FilterIndexOutput) out).out;
+}
+return out;
+  }
+
+  protected final IndexOutput out;
+
+  protected FilterIndexOutput(String resourceDescription, String name, 
IndexOutput out) {
+super(resourceDescription, name);
+this.out = out;
+  }
+
+  public final IndexOutput getDelegate() {
+return out;
+  }
+
+  @Override
+  public void close() throws IOException {
+out.close();
+  }
+
+  @Override
+  public long getFilePointer() {
+return out.getFilePointer();
+  }
+
+  @Override
+  public long getChecksum() throws IOException {
+return out.getChecksum();
+  }
+
+  @Override
+  public void writeByte(byte b) throws IOException {
+out.writeByte(b);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int offset, int length) throws IOException {
+out.writeBytes(b, offset, length);
+  }
+
+  @Override
+  public void writeBytes(byte[] b, int length) throws IOException {

Review Comment:
   That makes sense to me but I wasn't sure which approach to take when 
initially writing this. I will fix it and make a new revision.




[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API

2022-11-22 Thread GitBox



rmuir commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1324355332

   As I mentioned on the original issue(s) (there were several PRs closed and 
opened, and many comments were lost...), the problem is right in the title: 
quantization.
   
   There shouldn't be any quantization happening; otherwise it is broken. 
8 bits in, 8 bits out; 32 bits in, 32 bits out. 
   
   Additionally, as a minimum there need to be two separate field classes (e.g. 
ByteVectorValues/FloatVectorValues) to give type safety. Really, the types 
should be distinguished in fieldinfos today as well (not sure if that is 
currently the case).
   
   Basically, just follow the example of every other part of the index. 
Vectors aren't special.
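
A rough sketch of what "two separate classes, no quantization" could look like, under the assumption that each class accepts and returns exactly one element type. The class names reuse the ones floated above, but the constructors and the `vectorValue()` accessors are hypothetical and do not describe the Lucene API.

```java
import java.util.Arrays;

// Hypothetical classes for illustration only; this is not the Lucene API.
final class TypedVectorSketch {

  /** Holds byte vectors only: 8 bits in, 8 bits out. */
  static final class ByteVectorValues {
    private final byte[] vector;

    ByteVectorValues(byte[] vector) {
      this.vector = vector.clone(); // defensive copy, stored unchanged
    }

    byte[] vectorValue() {
      return vector.clone();
    }
  }

  /** Holds float vectors only: 32 bits in, 32 bits out. */
  static final class FloatVectorValues {
    private final float[] vector;

    FloatVectorValues(float[] vector) {
      this.vector = vector.clone(); // defensive copy, stored unchanged
    }

    float[] vectorValue() {
      return vector.clone();
    }
  }

  public static void main(String[] args) {
    ByteVectorValues bytes = new ByteVectorValues(new byte[] {1, -2, 127});
    FloatVectorValues floats = new FloatVectorValues(new float[] {0.5f, -1.25f});

    // The compiler rejects mixing the two, e.g. new ByteVectorValues(new float[] {1f}).
    System.out.println(Arrays.toString(bytes.vectorValue()));
    System.out.println(Arrays.toString(floats.vectorValue()));
  }
}
```

Because the element type is part of each class's signature, no conversion step exists where values could be silently narrowed or widened.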



[GitHub] [lucene] rmuir commented on issue #11963: Improve vector quantization API

2022-11-22 Thread GitBox



rmuir commented on issue #11963:
URL: https://github.com/apache/lucene/issues/11963#issuecomment-1324355812

   We can break index backwards compatibility to fix this.


