Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529079912

   This was my initial guess too. But if we hardcode it and then they fix the 
JDK with some magic Windows API that returns the right value? Should we try to 
detect if isLoaded returns true  on Windows and return the detected value?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


rmuir commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529106991

   It seems the commented-out code with hardcoded `false` is also in the JDK 
master? I think we should at least report the bug since it is impacting us.
   
   I am not sure it needs a magic windows API, just looks like maybe the wrong 
one (VirtualQuery) was attempted instead of e.g. `QueryWorkingSetEx`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529066466

   Yep, I think I understand the reason it returns Optional - a missing value 
would indicate "don't know", so it's a tri-state knob: yes, no, don't know. The 
problem is that on Windows it returns "no" and that logic in BaseDirectoryTest 
assumes that if it's a mmapdir, it should always be a hard "true".


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


rmuir commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529068203

   yeah, i'm suggesting adding this to the code:
   ```
   if (Constants.WINDOWS) {
 return DONT_KNOW;
   }
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529159503

   I've peeked at the flags returned by QueryWorkingSetEx - just wondering if 
it'd be possible to query it directly using ffi. :) But I can't tell which of 
these flags would correspond to the loaded/ swapped out state. Fun.
   
   
https://learn.microsoft.com/en-us/windows/win32/api/psapi/ns-psapi-psapi_working_set_ex_block
   
   But I agree - reporting back to JDK is probably a saner idea, I'm sure they 
have better win32 api experts there.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


rmuir commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529176829

   I think it won't return flags at all in that case. 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


ChrisHegarty commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529358488

   Oops! Sorry, clearly I added this test and didn't noticed it fail on 
Windows.  For Lucene, I'm perfectly fine with returning "don't know" for all 
Windows.   Lemme look into why this is not working in the JDK.  


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss opened a new issue, #14050:
URL: https://github.com/apache/lucene/issues/14050

   ### Description
   
   There is a check added to BaseDirectoryTest case in #13998 which has been 
failing since that PR was merged (nearly all Windows builds failing on jenkins):
   ```
   var loaded = in.isLoaded();
   if (FilterDirectory.unwrap(dir) instanceof MMapDirectory
   // direct IO wraps MMap but does not support isLoaded
   && !(dir.getClass().getName().contains("DirectIO"))) {
 assertTrue(loaded.isPresent());
 assertTrue(loaded.get());
   } else {
 assertFalse(loaded.isPresent());
   }
   ```
   
   The isLoaded optional has a value of false, even though it is a mmap 
directory. Seems like isLoaded in MemorySegment always returns false on Windows?
   
   ### Version and environment details
   
   _No response_


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Randomize KNN vector format parameters in tests [lucene]

2024-12-09 Thread via GitHub


msokolov closed issue #14047: Randomize KNN vector format parameters in tests
URL: https://github.com/apache/lucene/issues/14047


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Randomize KnnVector codec params in RandomCodec; addresses gh-14047 [lucene]

2024-12-09 Thread via GitHub


msokolov merged PR #14049:
URL: https://github.com/apache/lucene/pull/14049


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2528605123

   Right. Seems like it's always false on Windows.
   
   
https://github.com/openjdk/jdk21/blob/890adb6410dab4606a4f26a942aed02fb2f55387/src/java.base/windows/native/libnio/MappedMemoryUtils.c#L34-L44


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Refactor dummy scorables. [lucene]

2024-12-09 Thread via GitHub


jpountz merged PR #14046:
URL: https://github.com/apache/lucene/pull/14046


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2528617386

   While the test can be fixed easily (bypass the check on Windows), I wonder 
if it's a good fix. In essence, this method lies on Windows... What are the 
side-effects of this always returning false?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Randomize KNN vector format parameters in tests [lucene]

2024-12-09 Thread via GitHub


msokolov commented on issue #14047:
URL: https://github.com/apache/lucene/issues/14047#issuecomment-2528860574

   also pushed 4f8035c01393735e347f42c0bdc5e7b932db7447 to address a build 
failure by disabling randomization in  more tests


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


rmuir commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529138823

   At least not everywhere: `Minimum supported server: Windows Server 2008`. So 
maybe this one can be revisited in 2024 and fixed...


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529232048

   I don't have access to openjdk's Jira, maybe @ChrisHegarty or Uwe (or you?) 
can file an issue there?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-12-09 Thread via GitHub


jpountz commented on code in PR #13948:
URL: https://github.com/apache/lucene/pull/13948#discussion_r1876152549


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java:
##
@@ -791,25 +792,26 @@ public BinaryDocValues getBinary(FieldInfo field) throws 
IOException {
   return DocValues.emptyBinary();
 }
 
-final RandomAccessInput bytesSlice = 
data.randomAccessSlice(entry.dataOffset, entry.dataLength);
+final RandomAccessInputRef bytesSlice =
+new RandomAccessInputRef(data.randomAccessSlice(entry.dataOffset, 
entry.dataLength));

Review Comment:
   Nit, maybe extract two variables: `bytesSlice` and `byteSliceRef` to make it 
a bit easier to read?



##
lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java:
##
@@ -383,17 +392,31 @@ public long cost() {
 
   @Override
   public int advance(int target) throws IOException {
-return docsWithField.advance(target);
+int doc = docsWithField.advance(target);
+if (doc != NO_MORE_DOCS) {
+  consume(values.apply(doc));
+}
+return doc;
+  }
+
+  private void consume(BytesRef bytesRef) {
+bytes.reset(bytesRef.bytes);
+inputRef.offset = bytesRef.offset;
+inputRef.length = bytesRef.length;
   }
 
   @Override
   public boolean advanceExact(int target) throws IOException {
-return docsWithField.advanceExact(target);
+if (docsWithField.advanceExact(target)) {
+  consume(values.apply(target));
+  return true;
+}
+return false;
   }
 
   @Override
-  public BytesRef binaryValue() throws IOException {
-return values.apply(docsWithField.docID());
+  public RandomAccessInputRef randomAccessInputValue() {
+return inputRef;

Review Comment:
   What was the problem with calling `values#apply` in 
`randomAccessInputValue()` rather than in 
`nextDoc()`/`advance()`/`advanceExact()`?



##
lucene/core/src/java/org/apache/lucene/util/UnicodeUtil.java:
##
@@ -627,35 +629,58 @@ public static String toHexString(String s) {
   }
 
   /**
-   * Interprets the given byte array as UTF-8 and converts to UTF-16. It is 
the responsibility of
-   * the caller to make sure that the destination array is large enough.
+   * Interprets the given {@link RandomAccessInput} slice as UTF-8 and 
converts to UTF-16. It is the
+   * responsibility of the caller to make sure that the destination array is 
large enough.
+   *
+   * NOTE: Full characters are read, even if this reads past the length 
passed (and can result in
+   * an IOException if invalid UTF-8 is passed). Explicit checks for valid 
UTF-8 are not performed.
+   */
+  // TODO: broken if chars.offset != 0
+  public static int UTF8toUTF16(RandomAccessInput input, long offset, int 
length, char[] out)

Review Comment:
   Having `UnicodeUtil` need to work on a `RandomAccessInput` doesn't feel 
right. Do you know if this is performance-sensitive? Otherwise I'd rather make 
`RandomAccessInputRef` implement `utf8ToString()` by converting to a `BytesRef` 
first.



##
lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java:
##
@@ -0,0 +1,103 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.store;
+
+import java.io.IOException;
+import org.apache.lucene.util.RandomAccessInputRef;
+
+/**
+ * DataInput backed by a {@link RandomAccessInput}. WARNING: This class 
omits all low-level
+ * checks.
+ *
+ * @lucene.experimental
+ */
+public final class RandomAccessInputDataInput extends DataInput {
+
+  private RandomAccessInput input;
+  private long offset;
+  private long length;
+
+  private long pos;

Review Comment:
   I wonderif `pos` should start at `offset` rather than `0` so that we don't 
have to sum up `offset` and `pos` whenever we need to know the current position?



##
lucene/core/src/java/org/apache/lucene/index/BinaryDocValues.java:
##
@@ -18,19 +18,20 @@
 package org.apache.lucene.index;
 
 import java.io.IOException;
-import org.apache.lucene.util.BytesRef;
+import org.apache.lucene.util.R

Re: [PR] Randomize KnnVector codec params in RandomCodec; addresses gh-14047 [lucene]

2024-12-09 Thread via GitHub


msokolov commented on PR #14049:
URL: https://github.com/apache/lucene/pull/14049#issuecomment-2528438728

   makes sense, I'll add a CHANGES entry and commit


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]

2024-12-09 Thread via GitHub


dweiss commented on issue #14050:
URL: https://github.com/apache/lucene/issues/14050#issuecomment-2530596008

   Thanks, Chris. The only Windows tests running at the moment are on Uwe's 
vbox. Github's Windows testing is so slow as to be almost unusable. The results 
from Uwe's box are sent to this list -
   https://lists.apache.org/list.html?bui...@lucene.apache.org


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org