Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529079912 This was my initial guess too. But if we hardcode it and then they fix the JDK with some magic Windows API that returns the right value? Should we try to detect if isLoaded returns true on Windows and return the detected value? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
rmuir commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529106991 It seems the commented-out code with hardcoded `false` is also in the JDK master? I think we should at least report the bug since it is impacting us. I am not sure it needs a magic windows API, just looks like maybe the wrong one (VirtualQuery) was attempted instead of e.g. `QueryWorkingSetEx` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529066466 Yep, I think I understand the reason it returns Optional - a missing value would indicate "don't know", so it's a tri-state knob: yes, no, don't know. The problem is that on Windows it returns "no" and that logic in BaseDirectoryTest assumes that if it's a mmapdir, it should always be a hard "true". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
rmuir commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529068203 yeah, i'm suggesting adding this to the code: ``` if (Constants.WINDOWS) { return DONT_KNOW; } ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529159503 I've peeked at the flags returned by QueryWorkingSetEx - just wondering if it'd be possible to query it directly using ffi. :) But I can't tell which of these flags would correspond to the loaded/ swapped out state. Fun. https://learn.microsoft.com/en-us/windows/win32/api/psapi/ns-psapi-psapi_working_set_ex_block But I agree - reporting back to JDK is probably a saner idea, I'm sure they have better win32 api experts there. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
rmuir commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529176829 I think it won't return flags at all in that case. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
ChrisHegarty commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529358488 Oops! Sorry, clearly I added this test and didn't noticed it fail on Windows. For Lucene, I'm perfectly fine with returning "don't know" for all Windows. Lemme look into why this is not working in the JDK. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
[I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss opened a new issue, #14050: URL: https://github.com/apache/lucene/issues/14050 ### Description There is a check added to BaseDirectoryTest case in #13998 which has been failing since that PR was merged (nearly all Windows builds failing on jenkins): ``` var loaded = in.isLoaded(); if (FilterDirectory.unwrap(dir) instanceof MMapDirectory // direct IO wraps MMap but does not support isLoaded && !(dir.getClass().getName().contains("DirectIO"))) { assertTrue(loaded.isPresent()); assertTrue(loaded.get()); } else { assertFalse(loaded.isPresent()); } ``` The isLoaded optional has a value of false, even though it is a mmap directory. Seems like isLoaded in MemorySegment always returns false on Windows? ### Version and environment details _No response_ -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Randomize KNN vector format parameters in tests [lucene]
msokolov closed issue #14047: Randomize KNN vector format parameters in tests URL: https://github.com/apache/lucene/issues/14047 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Randomize KnnVector codec params in RandomCodec; addresses gh-14047 [lucene]
msokolov merged PR #14049: URL: https://github.com/apache/lucene/pull/14049 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2528605123 Right. Seems like it's always false on Windows. https://github.com/openjdk/jdk21/blob/890adb6410dab4606a4f26a942aed02fb2f55387/src/java.base/windows/native/libnio/MappedMemoryUtils.c#L34-L44 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Refactor dummy scorables. [lucene]
jpountz merged PR #14046: URL: https://github.com/apache/lucene/pull/14046 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2528617386 While the test can be fixed easily (bypass the check on Windows), I wonder if it's a good fix. In essence, this method lies on Windows... What are the side-effects of this always returning false? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] Randomize KNN vector format parameters in tests [lucene]
msokolov commented on issue #14047: URL: https://github.com/apache/lucene/issues/14047#issuecomment-2528860574 also pushed 4f8035c01393735e347f42c0bdc5e7b932db7447 to address a build failure by disabling randomization in more tests -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
rmuir commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529138823 At least not everywhere: `Minimum supported server: Windows Server 2008`. So maybe this one can be revisited in 2024 and fixed... -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2529232048 I don't have access to openjdk's Jira, maybe @ChrisHegarty or Uwe (or you?) can file an issue there? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
jpountz commented on code in PR #13948: URL: https://github.com/apache/lucene/pull/13948#discussion_r1876152549 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/Lucene90DocValuesProducer.java: ## @@ -791,25 +792,26 @@ public BinaryDocValues getBinary(FieldInfo field) throws IOException { return DocValues.emptyBinary(); } -final RandomAccessInput bytesSlice = data.randomAccessSlice(entry.dataOffset, entry.dataLength); +final RandomAccessInputRef bytesSlice = +new RandomAccessInputRef(data.randomAccessSlice(entry.dataOffset, entry.dataLength)); Review Comment: Nit, maybe extract two variables: `bytesSlice` and `byteSliceRef` to make it a bit easier to read? ## lucene/codecs/src/java/org/apache/lucene/codecs/simpletext/SimpleTextDocValuesReader.java: ## @@ -383,17 +392,31 @@ public long cost() { @Override public int advance(int target) throws IOException { -return docsWithField.advance(target); +int doc = docsWithField.advance(target); +if (doc != NO_MORE_DOCS) { + consume(values.apply(doc)); +} +return doc; + } + + private void consume(BytesRef bytesRef) { +bytes.reset(bytesRef.bytes); +inputRef.offset = bytesRef.offset; +inputRef.length = bytesRef.length; } @Override public boolean advanceExact(int target) throws IOException { -return docsWithField.advanceExact(target); +if (docsWithField.advanceExact(target)) { + consume(values.apply(target)); + return true; +} +return false; } @Override - public BytesRef binaryValue() throws IOException { -return values.apply(docsWithField.docID()); + public RandomAccessInputRef randomAccessInputValue() { +return inputRef; Review Comment: What was the problem with calling `values#apply` in `randomAccessInputValue()` rather than in `nextDoc()`/`advance()`/`advanceExact()`? ## lucene/core/src/java/org/apache/lucene/util/UnicodeUtil.java: ## @@ -627,35 +629,58 @@ public static String toHexString(String s) { } /** - * Interprets the given byte array as UTF-8 and converts to UTF-16. It is the responsibility of - * the caller to make sure that the destination array is large enough. + * Interprets the given {@link RandomAccessInput} slice as UTF-8 and converts to UTF-16. It is the + * responsibility of the caller to make sure that the destination array is large enough. + * + * NOTE: Full characters are read, even if this reads past the length passed (and can result in + * an IOException if invalid UTF-8 is passed). Explicit checks for valid UTF-8 are not performed. + */ + // TODO: broken if chars.offset != 0 + public static int UTF8toUTF16(RandomAccessInput input, long offset, int length, char[] out) Review Comment: Having `UnicodeUtil` need to work on a `RandomAccessInput` doesn't feel right. Do you know if this is performance-sensitive? Otherwise I'd rather make `RandomAccessInputRef` implement `utf8ToString()` by converting to a `BytesRef` first. ## lucene/core/src/java/org/apache/lucene/store/RandomAccessInputDataInput.java: ## @@ -0,0 +1,103 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or more + * contributor license agreements. See the NOTICE file distributed with + * this work for additional information regarding copyright ownership. + * The ASF licenses this file to You under the Apache License, Version 2.0 + * (the "License"); you may not use this file except in compliance with + * the License. You may obtain a copy of the License at + * + * http://www.apache.org/licenses/LICENSE-2.0 + * + * Unless required by applicable law or agreed to in writing, software + * distributed under the License is distributed on an "AS IS" BASIS, + * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. + * See the License for the specific language governing permissions and + * limitations under the License. + */ +package org.apache.lucene.store; + +import java.io.IOException; +import org.apache.lucene.util.RandomAccessInputRef; + +/** + * DataInput backed by a {@link RandomAccessInput}. WARNING: This class omits all low-level + * checks. + * + * @lucene.experimental + */ +public final class RandomAccessInputDataInput extends DataInput { + + private RandomAccessInput input; + private long offset; + private long length; + + private long pos; Review Comment: I wonderif `pos` should start at `offset` rather than `0` so that we don't have to sum up `offset` and `pos` whenever we need to know the current position? ## lucene/core/src/java/org/apache/lucene/index/BinaryDocValues.java: ## @@ -18,19 +18,20 @@ package org.apache.lucene.index; import java.io.IOException; -import org.apache.lucene.util.BytesRef; +import org.apache.lucene.util.R
Re: [PR] Randomize KnnVector codec params in RandomCodec; addresses gh-14047 [lucene]
msokolov commented on PR #14049: URL: https://github.com/apache/lucene/pull/14049#issuecomment-2528438728 makes sense, I'll add a CHANGES entry and commit -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
Re: [I] IndexInput.isLoaded seems to return false for mmap index inputs on Windows [lucene]
dweiss commented on issue #14050: URL: https://github.com/apache/lucene/issues/14050#issuecomment-2530596008 Thanks, Chris. The only Windows tests running at the moment are on Uwe's vbox. Github's Windows testing is so slow as to be almost unusable. The results from Uwe's box are sent to this list - https://lists.apache.org/list.html?bui...@lucene.apache.org -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org