Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-29 Thread via GitHub


jpountz commented on code in PR #14412:
URL: https://github.com/apache/lucene/pull/14412#discussion_r2019755560


##
lucene/CHANGES.txt:
##
@@ -47,6 +47,7 @@ Other
 -
 
 * GITHUB#14229: Bump minimum required Java version to 25
+* GITHUB#14412: Allow skip cache factor to be updated dynamically. (Sagar 
Upadhyaya)

Review Comment:
   Can you move the CHANGES entry under version 10.2 below, section "New 
Features"?



##
lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java:
##
@@ -142,6 +142,20 @@ public LRUQueryCache(
 missCount = new LongAdder();
   }
 
+  public float getSkipCacheFactor() {
+return skipCacheFactor;
+  }
+
+  /**
+   * This setter enables the skipCacheFactor to be updated dynamically.
+   *
+   * @param skipCacheFactor determines whether we need to skip the cache 
operation based on cost and
+   * lead cost.
+   */
+  public void setSkipCacheFactor(float skipCacheFactor) {

Review Comment:
   Can you explain what this factor does, e.g. by copying this from the ctor 
docs: "clauses whose cost is {@code skipCacheFactor} times more than the cost 
of the top-level query will not be cached in order to not slow down queries too 
much."



##
lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java:
##
@@ -142,6 +142,20 @@ public LRUQueryCache(
 missCount = new LongAdder();
   }
 
+  public float getSkipCacheFactor() {

Review Comment:
   Can you add javadocs here too? Lucene aims at having all public APIs 
documented to make javadocs nicer to browse by users. In this case it could 
mostly refer to the setter, something like
   
   ```
   Get the skip cache factor.
   @see #setSkipCacheFactor
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-29 Thread via GitHub


kkewwei commented on PR #14412:
URL: https://github.com/apache/lucene/pull/14412#issuecomment-2763254947

   I am wondering if we should also make `maxRamBytesUsed` dynamic as well.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] cache preset dict for LZ4WithPresetDictDecompressor [lucene]

2025-03-29 Thread via GitHub


kkewwei commented on code in PR #14397:
URL: https://github.com/apache/lucene/pull/14397#discussion_r2019766795


##
lucene/core/src/java/org/apache/lucene/codecs/lucene90/compressing/Lucene90CompressingStoredFieldsReader.java:
##
@@ -512,6 +512,7 @@ private void doReset(int docID) throws IOException {
   bytes.offset = bytes.length = 0;
   for (int decompressed = 0; decompressed < totalLength; ) {
 final int toDecompress = Math.min(totalLength - decompressed, 
chunkSize);
+decompressor.reset();
 decompressor.decompress(fieldsStream, toDecompress, 0, 
toDecompress, spare);

Review Comment:
   It seems that `reset` is essential. When the block changes, we must discard 
the cache in time, this operation can only be detected from external.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] quick exit on filter query matching no docs when rewriting knn query [lucene]

2025-03-29 Thread via GitHub


bugmakerr commented on PR #14418:
URL: https://github.com/apache/lucene/pull/14418#issuecomment-2763256416

   > Can you help me understand what work this change helps save?
   
   Thanks for your reply. I think if the filter query matches no docs, the knn 
query must also match no docs. Before this small change, we always executed 
search tasks for each leaf reader even if the filter query is 
`MatchNoDocsQuery`.
   
   
https://github.com/apache/lucene/blob/a7ff87fdb5479a2f1e89746d88c80023b0aecbb1/lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java#L89-L98


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-29 Thread via GitHub


sgup432 commented on code in PR #14412:
URL: https://github.com/apache/lucene/pull/14412#discussion_r2019868684


##
lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java:
##
@@ -142,6 +142,20 @@ public LRUQueryCache(
 missCount = new LongAdder();
   }
 
+  public float getSkipCacheFactor() {

Review Comment:
   Sure!



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Fix HistogramCollector to not create zero-count buckets. [lucene]

2025-03-29 Thread via GitHub


jpountz commented on code in PR #14421:
URL: https://github.com/apache/lucene/pull/14421#discussion_r2019755410


##
lucene/sandbox/src/java/org/apache/lucene/sandbox/facet/plain/histograms/HistogramCollector.java:
##
@@ -279,7 +279,9 @@ public void collect(DocIdStream stream) throws IOException {
 public void finish() throws IOException {
   // Put counts that we computed in the int[] back into the hash map.
   for (int i = 0; i < counts.length; ++i) {
-collectorCounts.addTo(leafMinBucket + i, counts[i]);
+if (counts[i] != 0) {
+  collectorCounts.addTo(leafMinBucket + i, counts[i]);
+}
   }
   checkMaxBuckets(collectorCounts.size(), maxBuckets);

Review Comment:
   I think it's ok the way it is. The end goal is to prevent unbounded heap 
allocation. In this case, the amount of excess heap we may allocate is bounded 
by 1024 entries, so I'd err on the side of simplicity by not checking the 
number of buckets in the loop?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-29 Thread via GitHub


sgup432 commented on code in PR #14412:
URL: https://github.com/apache/lucene/pull/14412#discussion_r2019868684


##
lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java:
##
@@ -142,6 +142,20 @@ public LRUQueryCache(
 missCount = new LongAdder();
   }
 
+  public float getSkipCacheFactor() {

Review Comment:
   Sure. Added. 



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Allow skip cache factor to be updated dynamically [lucene]

2025-03-29 Thread via GitHub


sgup432 commented on code in PR #14412:
URL: https://github.com/apache/lucene/pull/14412#discussion_r2019870708


##
lucene/CHANGES.txt:
##
@@ -47,6 +47,7 @@ Other
 -
 
 * GITHUB#14229: Bump minimum required Java version to 25
+* GITHUB#14412: Allow skip cache factor to be updated dynamically. (Sagar 
Upadhyaya)

Review Comment:
   Added in desired section.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org