Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071537608


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java:
##
@@ -72,7 +72,7 @@ public ChecksumIndexInput openChecksumInput(String name) 
throws IOException {
 
   @Override
   public IndexInput openInput(String name, IOContext context) throws 
IOException {
-ReadAdvice readAdvice = 
context.readAdvice().orElse(Constants.DEFAULT_READADVICE);
+ReadAdvice readAdvice = MMapDirectory.toReadAdvice(context);

Review Comment:
   @jpountz Are these tests necessary as-is, or can they be removed & replaced 
with some other check?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071728838


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene101/Lucene101PostingsReader.java:
##
@@ -149,9 +150,9 @@ public Lucene101PostingsReader(SegmentReadState state) 
throws IOException {
 IndexFileNames.segmentFileName(
 state.segmentInfo.name, state.segmentSuffix, 
Lucene101PostingsFormat.DOC_EXTENSION);
 try {
-  // Postings have a forward-only access pattern, so pass 
ReadAdvice.NORMAL to perform
-  // readahead.
-  docIn = state.directory.openInput(docName, 
state.context.withReadAdvice(ReadAdvice.NORMAL));
+  docIn =
+  state.directory.openInput(
+  docName, state.context.withHints(FileTypeHint.DATA, 
FileDataHint.POSTINGS));

Review Comment:
   This doesn't set a DataAccessHint as it's not RANDOM or SEQUENTIAL, just 
NORMAL.
   
   I've tried to keep the initial hints as simple as possible here, without 
changing behaviour, so we can normalise and possibly modify behaviour in a 
subsequent PR



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Remove RANDOM_PRELOAD read advice, which is not actually used [lucene]

2025-05-02 Thread via GitHub


ChrisHegarty merged PR #14593:
URL: https://github.com/apache/lucene/pull/14593


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [Bug] Fix for stored fields force merge regression [lucene]

2025-05-02 Thread via GitHub


benwtrent commented on PR #14512:
URL: https://github.com/apache/lucene/pull/14512#issuecomment-2847040963

   I don't like the read-advice update, even for vectors (I am talking about 
the referenced PR). It seems we are trying to fix a self-inflicted bug.
   
   This has been an incredibly trappy API that has (it seems) provided nothing 
but woe. I have yet to see any prevailing evidence that changing read-advice 
for MMAP has provided any benefit at all :(. 
   
   The actual fix seems to be fixing the underlying APIs: 
https://github.com/apache/lucene/pull/14510
   
   I do not like the idea of updating a very general format API (stored fields) 
to fix an esoteric bug.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071743905


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {
+if (context.context() == IOContext.Context.MERGE
+|| context.context() == IOContext.Context.FLUSH) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(DataAccessHint.RANDOM)) {
+  return ReadAdvice.RANDOM;
+}
+if (context.hints().contains(DataAccessHint.SEQUENTIAL)) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(FileTypeHint.DATA)
+|| context.hints().contains(FileTypeHint.INDEX)) {
+  return ReadAdvice.NORMAL;
+}
+// Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL 
to perform
+// readahead.
+if (context.hints().contains(FileDataHint.POSTINGS)) {
+  return ReadAdvice.NORMAL;
+}

Review Comment:
   Currently, `ReadAdvice.NORMAL` doesn't have a corresponding 
`DataAccessHint`, so you get that by not specifying the `DataAccessHint` but 
then as the default is now `ReadAdvice.RANDOM`, you need some way to explicitly 
specify 'I want the OS default behaviour here'.
   
   Maybe we do have a `DataAccessHint.DEFAULT`, but then doesn't 
`DataAccessHint` just mirror `ReadAdvice`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071743905


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {
+if (context.context() == IOContext.Context.MERGE
+|| context.context() == IOContext.Context.FLUSH) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(DataAccessHint.RANDOM)) {
+  return ReadAdvice.RANDOM;
+}
+if (context.hints().contains(DataAccessHint.SEQUENTIAL)) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(FileTypeHint.DATA)
+|| context.hints().contains(FileTypeHint.INDEX)) {
+  return ReadAdvice.NORMAL;
+}
+// Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL 
to perform
+// readahead.
+if (context.hints().contains(FileDataHint.POSTINGS)) {
+  return ReadAdvice.NORMAL;
+}

Review Comment:
   Currently, `ReadAdvice.NORMAL` doesn't have a corresponding 
`DataAccessHint`, so you get that by not specifying the `DataAccessHint` but 
then as the default is now `ReadAdvice.RANDOM` (or configurable), you need some 
way to explicitly specify 'I want the OS default behaviour here'.
   
   Maybe we do have a `DataAccessHint.DEFAULT`, but then doesn't 
`DataAccessHint` just mirror `ReadAdvice`?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Reduce NeighborArray heap memory [lucene]

2025-05-02 Thread via GitHub


benwtrent commented on PR #14527:
URL: https://github.com/apache/lucene/pull/14527#issuecomment-2847706746

   @weizijun 
   
   It should be fixed. Having an estimation that is more than 2x off is pretty 
bad.
   
   this estimation is used to determine how often flushes should occur, etc.
   
   There are a couple of ways this can be fixed. 
   
   A simple way could be providing an optional call-back to `NeighborArray` 
that accesses package-private method on `OnHeapHnswGraph` that allows for their 
individual estimation to be adjusted during array growth.
   
   `NeighborArray(OnHeapHnswGraph::updateEstimate...)` or something. Then the 
ram estimation in OnHeapHnswGraph becomes the accumulation of those estimates 
as the inner estimates evolve. 
   
   We need to be cautious there with multi-threadedness as many node updates 
could be occuring at a time. So, likely this inner accumulator needs to be 
`LongAccumulator` and it should also assert that its always a positive number
   
   Please also adjust the inner arrays to enforce their maximal length. This 
way we never over-allocate.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] OptimisticKnnVectorQuery [lucene]

2025-05-02 Thread via GitHub


benwtrent commented on PR #14226:
URL: https://github.com/apache/lucene/pull/14226#issuecomment-2848006782

   I did a run over 4M vectors. One with bit quantization. In both cases, this 
optimistic pattern beats the baseline (sharing). Consistently, it gives better 
recall at better latency. 
   
   I think that the vector ops count method I am using is "double counting" the 
vector ops...
   
   
https://docs.google.com/spreadsheets/d/1B5L0Ob9Q_V7zwryYUJhk_T3d4zhbTLh17LezFWba08A/edit?usp=sharing
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] MultiRange query for SortedNumeric DocValues [lucene]

2025-05-02 Thread via GitHub


mkhludnev merged PR #14404:
URL: https://github.com/apache/lucene/pull/14404


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] MultiRange query for SortedNumeric DocValues (#14404) [lucene]

2025-05-02 Thread via GitHub


mkhludnev opened a new pull request, #14605:
URL: https://github.com/apache/lucene/pull/14605

   * Numeric Multi-Range DocValues query
   
   (cherry picked from commit 2eeb71877da4cbbc3033fe9a2c0a18f03550a070)
   
   ### Description
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



[PR] Add a preload hint for preloading mmap data on specific open calls [lucene]

2025-05-02 Thread via GitHub


thecoop opened a new pull request, #14604:
URL: https://github.com/apache/lucene/pull/14604

   (no comment)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] gradle-wrapper.jar will not be updated when its sha/version changes [lucene]

2025-05-02 Thread via GitHub


ChrisHegarty commented on issue #14598:
URL: https://github.com/apache/lucene/issues/14598#issuecomment-2846847520

   I noticed problems with the gradle wrapper also, e.g.
   ```
   ./gradlew check
   no main manifest attribute, in 
/Users/chegar/git/lucene/gradle/wrapper/gradle-wrapper.jar
   ```
   
   I just deleted and re-cloned and all is fine now. It downloaded a new 
wrapper, e.g.
   ```
   Downloading gradle-wrapper.jar from 
https://raw.githubusercontent.com/gradle/gradle/v8.14.0/gradle/wrapper/gradle-wrapper.jar
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] MultiRange query for SortedNumeric DocValues (#14404) [lucene]

2025-05-02 Thread via GitHub


mkhludnev merged PR #14605:
URL: https://github.com/apache/lucene/pull/14605


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Refactor doc values to expose a `DocIdSetIterator` instead of extending `DocIdSetIterator`. [lucene]

2025-05-02 Thread via GitHub


github-actions[bot] commented on PR #14475:
URL: https://github.com/apache/lucene/pull/14475#issuecomment-2848311502

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] [Bug] Fix for postings force merge regression [lucene]

2025-05-02 Thread via GitHub


github-actions[bot] commented on PR #14513:
URL: https://github.com/apache/lucene/pull/14513#issuecomment-2848311449

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Supporting both parentQuery and childQuery and adding childLimitPerParent in BlockJoin queries [lucene]

2025-05-02 Thread via GitHub


asimmahmood1 commented on issue #14565:
URL: https://github.com/apache/lucene/issues/14565#issuecomment-2848314981

   I think new query type is good idea, you're already mentioned new reasons 
like which score to use, and infact we used combination of parent and child 
score. For the short term you can limit how many children to match for each 
parent. 
   
   Since ParentChildrenBlockJoinQuery returns all children, you could name new 
query ParentChildBlockJoinQuery, althought the name doesn't make to exactly 
obvious what it does. 
   
   @msfroh Yes sounds like the same problem  😄 


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


jpountz commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071603158


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {
+if (context.context() == IOContext.Context.MERGE
+|| context.context() == IOContext.Context.FLUSH) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(DataAccessHint.RANDOM)) {
+  return ReadAdvice.RANDOM;
+}
+if (context.hints().contains(DataAccessHint.SEQUENTIAL)) {
+  return ReadAdvice.SEQUENTIAL;
+}
+
+if (context.hints().contains(FileTypeHint.DATA)
+|| context.hints().contains(FileTypeHint.INDEX)) {
+  return ReadAdvice.NORMAL;
+}
+// Postings have a forward-only access pattern, so pass ReadAdvice.NORMAL 
to perform
+// readahead.
+if (context.hints().contains(FileDataHint.POSTINGS)) {
+  return ReadAdvice.NORMAL;
+}

Review Comment:
   I feel like this default implementation should be dumb and trust the 
`DataAccessHint`, and the `FileDataHint` hint should only be used by users to 
override defaults when they know better about their data / access pattern?



##
lucene/test-framework/src/java/org/apache/lucene/tests/store/SerialIOCountingDirectory.java:
##
@@ -72,7 +72,7 @@ public ChecksumIndexInput openChecksumInput(String name) 
throws IOException {
 
   @Override
   public IndexInput openInput(String name, IOContext context) throws 
IOException {
-ReadAdvice readAdvice = 
context.readAdvice().orElse(Constants.DEFAULT_READADVICE);
+ReadAdvice readAdvice = MMapDirectory.toReadAdvice(context);

Review Comment:
   I would expect this class to make the following assumptions
- METADATA implies that the file is read once and its content is loaded 
into heap,
- INDEX implies that the file should fit in the page cache, so count only 
one I/O (potentially per block) at open time
- DATA implies that the file may be larger than the page cache, so it needs 
to increment the counter when moving to a different page.
   
   (Related to my other comment about the default implementation in 
`MMapDirectory` needing to be simple and not looking at the kind of data it's 
working on.)
   
   Currently, the test tries to differenciate between random-access data files 
and sequential-access data files, I'd be happy with either trusting the 
`DataAccessHint` or always assuming sequential access.



##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundReader.java:
##
@@ -74,7 +74,7 @@ public Lucene50CompoundReader(Directory directory, 
SegmentInfo si) throws IOExce
 }
 expectedLength += CodecUtil.footerLength();
 
-handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withReadAdvice(ReadAdvice.NORMAL));
+handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withHints(FileTypeHint.DATA));

Review Comment:
   `FileTypeHint.DATA` feels confusing as this file has `METADATA` and `INDEX` 
sub files?



##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene101/Lucene101PostingsReader.java:
##
@@ -149,9 +150,9 @@ public Lucene101PostingsReader(SegmentReadState state) 
throws IOException {
 IndexFileNames.segmentFileName(
 state.segmentInfo.name, state.segmentSuffix, 
Lucene101PostingsFormat.DOC_EXTENSION);
 try {
-  // Postings have a forward-only access pattern, so pass 
ReadAdvice.NORMAL to perform
-  // readahead.
-  docIn = state.directory.openInput(docName, 
state.context.withReadAdvice(ReadAdvice.NORMAL));
+  docIn =
+  state.directory.openInput(
+  docName, state.context.withHints(FileTypeHint.DATA, 
FileDataHint.POSTINGS));

Review Comment:
   Should this call also pass a `DataAccessHint`? The fact that it's sometimes 
provided and sometimes not is a bit confusing to me.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071769318


##
lucene/backward-codecs/src/java/org/apache/lucene/backward_codecs/lucene50/Lucene50CompoundReader.java:
##
@@ -74,7 +74,7 @@ public Lucene50CompoundReader(Directory directory, 
SegmentInfo si) throws IOExce
 }
 expectedLength += CodecUtil.footerLength();
 
-handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withReadAdvice(ReadAdvice.NORMAL));
+handle = directory.openInput(dataFileName, 
IOContext.DEFAULT.withHints(FileTypeHint.DATA));

Review Comment:
   Here, the handle isn't really used direct - its used to slice a new 
IndexInput, but then a replacement `IOContext` is specified there. So we don't 
actually need a hint/readadvice at all here?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Speed up flush of softdelete by intoBitset [lucene]

2025-05-02 Thread via GitHub


jpountz commented on code in PR #14552:
URL: https://github.com/apache/lucene/pull/14552#discussion_r2071284659


##
lucene/core/src/java/org/apache/lucene/index/SingletonSortedNumericDocValues.java:
##
@@ -57,6 +58,16 @@ public int advance(int target) throws IOException {
 return in.advance(target);
   }
 
+  @Override
+  public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+in.intoBitSet(upTo, bitSet, offset);
+  }
+
+  @Override
+  public int docIDRunEnd() throws IOException {
+return in.docIDRunEnd();
+  }

Review Comment:
   Maybe extract this change to a separate PR and cover all wrappers?



##
lucene/core/src/java/org/apache/lucene/index/ReadersAndUpdates.java:
##
@@ -526,6 +540,60 @@ public int nextDoc() throws IOException {
   } while (hasValue == false);
   return docIDOut;
 }
+
+@Override
+public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+  if (onDiskDocValues == null) {
+super.intoBitSet(upTo, bitSet, offset);
+return;
+  }
+
+  // we need a scratch bitset because the param bitset doesn't allow bits 
to be cleared.
+  if (scratch == null || scratch.length() < upTo - offset) {

Review Comment:
   Should we compare scratch.length() with bitSet.length() rather than `upTo - 
offset`? My concern is that there are multiple call sites that call intoBitSet 
with upTo=NO_MORE_DOCS, so this could create a giant bit set? 



##
lucene/core/src/java/org/apache/lucene/index/ReadersAndUpdates.java:
##
@@ -526,6 +540,60 @@ public int nextDoc() throws IOException {
   } while (hasValue == false);
   return docIDOut;
 }
+
+@Override
+public void intoBitSet(int upTo, FixedBitSet bitSet, int offset) throws 
IOException {
+  if (onDiskDocValues == null) {
+super.intoBitSet(upTo, bitSet, offset);
+return;
+  }
+
+  // we need a scratch bitset because the param bitset doesn't allow bits 
to be cleared.
+  if (scratch == null || scratch.length() < upTo - offset) {
+// intoBitSet is usually called with fixed window size so we do not do 
overSize here.
+scratch = new FixedBitSet(upTo - offset);

Review Comment:
   Maybe use FixedBitSet#ensureCapacity?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


ChrisHegarty commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071306868


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   Actually, this seems more like it's a default implementation of a static 
READ_ADVICE function, similar to other override-defaults pattern in this class. 
Otherwise, the user cannot override with empty, since the chaining will 
delegate.



##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   Can you please add some javadoc for the public method.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Change uses of withReadAdvice to use hints instead [lucene]

2025-05-02 Thread via GitHub


thecoop commented on code in PR #14510:
URL: https://github.com/apache/lucene/pull/14510#discussion_r2071325241


##
lucene/core/src/java/org/apache/lucene/store/MMapDirectory.java:
##
@@ -277,6 +277,32 @@ public final long getMaxChunkSize() {
 return 1L << chunkSizePower;
   }
 
+  public static ReadAdvice toReadAdvice(IOContext context) {

Review Comment:
   It's public so it can be accessed by `SerialIOCountingDirectory` - see 
comments there



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [I] Supporting both parentQuery and childQuery in ToChildBlockJoinQuery and adding childLimitPerParent [lucene]

2025-05-02 Thread via GitHub


Jinny-Wang commented on issue #14565:
URL: https://github.com/apache/lucene/issues/14565#issuecomment-2848301176

   With more explorations in the code, I think it would be cleaner to just 
introduce a new query operator instead of trying to wrap both parentQuery and 
childQuery in ToChildBlockJoinQuery.
   - The current ToChildBlockJoinQuery uses 
[FilterWeight](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/search/FilterWeight.java)
 which only support one contained weight. If we start introducing another 
query, we will need 2 contained weights. Which means a custom `Weight` class 
would be required that does not extend the `FilterWeight`. This would make it 
harder to be 100% backward compatible with current code behavior. 
   - The current score uses just the parentQuery's score to score. Introduction 
of childQuery will open up the option to use childQuery score / childQuery 
combined with parentQuery score etc. If we want to make the scoring behavior 
configurable, that's even more customization on top of exiting 
ToChildBlockJoinQuery. 
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org