Re: [I] TestTaxonomyFacetValueSource.testRandom fails [lucene]
iamsanjay commented on issue #13191: URL: https://github.com/apache/lucene/issues/13191#issuecomment-2006026200

The `FloatTaxonomyFacets.getTopChildrenForPath` method counts the children and applies the aggregate function to each child node only IF the child's value is greater than 0.

https://github.com/apache/lucene/blob/d393b9d03952c69d9641cd4a38bbe83939a25942/lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java#L217-L227

On the test side, we have a method that prepares the expected aggregation result; however, it does not filter the children.

https://github.com/apache/lucene/blob/d393b9d03952c69d9641cd4a38bbe83939a25942/lucene/facet/src/test/org/apache/lucene/facet/taxonomy/TestTaxonomyFacetValueSource.java#L691-L700

Therefore, the number of children differs between the actual and the expected results. Below is a possible fix at the test level, assuming that `FloatTaxonomyFacets.getTopChildrenForPath` behaves as intended:

```java
List<FacetResult> expected = new ArrayList<>();
for (int i = 0; i < numDims; i++) {
  List<LabelAndValue> labelValues = new ArrayList<>();
  float aggregatedValue = 0;
  for (Map.Entry<String, Float> ent : expectedValues[i].entrySet()) {
    if (ent.getValue() > 0) {
      labelValues.add(new LabelAndValue(ent.getKey(), ent.getValue()));
      aggregatedValue = aggregationFunction.aggregate(aggregatedValue, ent.getValue());
    }
  }
```
Re: [PR] Made DocIdsWriter use DISI when reading documents with an IntersectVisitor [lucene]
jpountz commented on PR #13149: URL: https://github.com/apache/lucene/pull/13149#issuecomment-2006375971

> I will try adding a visit() method taking an IntsRef (I believe that is what you meant @jpountz?).

This is what I meant indeed. Before merging, I'd be curious to better understand why the JVM doesn't optimize this better. Presumably, it should be able to resolve the virtual call once for the entire for loop rather than doing it again on every iteration? I wonder if there is actually a performance bug, or if we are just insufficiently warming up the JVM and this for loop never gets compiled by C2?
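For context, a minimal sketch of what such a bulk `visit()` overload could look like: a default method on the visitor interface that takes an `IntsRef` of doc IDs and falls back to the per-doc callback. The interface name here is a stand-in for illustration, not the actual `PointValues.IntersectVisitor` change from the PR.

```java
import java.io.IOException;
import org.apache.lucene.util.IntsRef;

// Stand-in for the relevant slice of PointValues.IntersectVisitor.
interface BulkVisitor {
  // Existing per-document callback.
  void visit(int docID) throws IOException;

  // Proposed bulk callback: the default simply loops, so existing
  // implementations keep working; optimized ones can override it and
  // pay the virtual-call cost once per block instead of once per doc.
  default void visit(IntsRef docIDs) throws IOException {
    for (int i = docIDs.offset; i < docIDs.offset + docIDs.length; i++) {
      visit(docIDs.ints[i]);
    }
  }
}
```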
Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]
epotyom commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2006430761

Thanks @jpountz for [bringing this up as a feature for Lucene 10](https://lists.apache.org/list.html?d...@lucene.apache.org). It would be great to have if we can build it!

A few of us in Amazon Product Search (+cc @stefanvodita, @slow-J) had been looking into it out of curiosity. We discussed implementing Cloneable for Scorer/BulkScorer as [previously suggested](https://github.com/apache/lucene/issues/9721#issuecomment-1223928468), but it looks like even implementing it just for the TermQuery scorer requires significant code changes, as there are quite a few dependencies that would have to be Cloneable too.

Maybe there is a hybrid approach? For example, when concurrent segment search is being initialized, it can try calling clone() for Scorer/BulkScorer, but if that throws CloneNotSupportedException, we fall back to creating a new Scorer/BulkScorer instance? Then we can implement the clone method only for scorers that are expensive to initialize, e.g. the range query scorers mentioned in the comments above. This approach still takes a lot of up-front effort. I'm curious if anyone has better ideas.

The original patch seems to have been lost; it would have been really useful to have. Tagging @atris on the off-chance that he still has the patch or remembers what the implementation looked like.
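A minimal sketch of that hybrid fallback, under stated assumptions: the `SliceableScorer` interface and its `copyForNewSlice()` method are hypothetical (Java's `Object.clone()` is protected, so an explicit copy method is easier to call than raw `Cloneable`); only the fallback path uses the real `Weight#scorer` API.

```java
import java.io.IOException;
import org.apache.lucene.index.LeafReaderContext;
import org.apache.lucene.search.Scorer;
import org.apache.lucene.search.Weight;

final class ScorerSlicing {
  /** Hypothetical opt-in interface for scorers that are cheap to copy. */
  interface SliceableScorer {
    Scorer copyForNewSlice() throws IOException;
  }

  private ScorerSlicing() {}

  /** Reuses the prototype's expensive setup when possible, else rebuilds. */
  static Scorer scorerForSlice(Weight weight, LeafReaderContext ctx, Scorer prototype)
      throws IOException {
    if (prototype instanceof SliceableScorer sliceable) {
      return sliceable.copyForNewSlice(); // cheap path: copy existing state
    }
    return weight.scorer(ctx); // fallback: build a fresh scorer for this slice
  }
}
```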
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
jpountz commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2006570648

I understand that we need to make changes to account for the fact that multiple threads may be contributing to the same merge concurrently, but I would not expect `RateLimitedIndexOutput` to need to be thread-safe: the caller needs to manage synchronization themselves anyway if they want bytes to be written in the correct order?
Re: [PR] [GITHUB-11915] Make Lucene smarter about long runs of matches via new API on DISI [lucene]
jpountz commented on PR #12194: URL: https://github.com/apache/lucene/pull/12194#issuecomment-2006594221

Sorry for the long time without a reply. I had some hesitation about moving this change forward, since it's a big API change and I didn't see much appeal for it (besides you and me, I guess). But I'd like to also move sparse/zone indexing forward, and that would need the same API at search time, so I'm now keen on moving it forward. If you're interested in updating this branch, I'll be interested in reviewing.

Since we last worked on this branch, conjunctions gained a `BulkScorer`: `ConjunctionBulkScorer`. IMO it would be a better place to take advantage of this new API, since it can more naturally check the value of `peekNextNonMatchingDocID()` every N docs without adding per-doc overhead.
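To illustrate the idea, a hedged sketch of a scoring loop built on the proposed API. `peekNextNonMatchingDocID()` does not exist in Lucene; the iterator and collector interfaces below are stand-ins, and a real implementation would presumably only peek every N docs, as suggested above, rather than on every run.

```java
import java.io.IOException;

final class RunAwareScoringLoop {
  static final int NO_MORE_DOCS = Integer.MAX_VALUE;

  /** Stand-in iterator exposing the proposed peek method. */
  interface RunAwareIterator {
    int nextDoc() throws IOException;
    int advance(int target) throws IOException;
    /** First doc at or after the current doc that does NOT match. */
    int peekNextNonMatchingDocID() throws IOException;
  }

  interface Collector {
    void collect(int doc) throws IOException;
  }

  static void scoreAll(RunAwareIterator it, Collector collector, int maxDoc)
      throws IOException {
    int doc = it.nextDoc();
    while (doc != NO_MORE_DOCS) {
      // Everything in [doc, runEnd) is known to match, so the whole run can
      // be collected without a nextDoc()/advance() call per document.
      int runEnd = Math.min(it.peekNextNonMatchingDocID(), maxDoc);
      for (int d = doc; d < runEnd; d++) {
        collector.collect(d);
      }
      doc = runEnd == maxDoc ? NO_MORE_DOCS : it.advance(runEnd);
    }
  }
}
```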
Re: [I] Support for building materialized views using Lucene formats [lucene]
jpountz commented on issue #13188: URL: https://github.com/apache/lucene/issues/13188#issuecomment-2006649404

Figuring out the right API for this idea sounds challenging, but I like the idea.
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
benwtrent commented on PR #13190: URL: https://github.com/apache/lucene/pull/13190#issuecomment-2007100589

@jpountz

> the caller needs to manage synchronization themselves anyway if they want bytes to be written in the correct order?

The thread-safety concern isn't really about the bytes; I agree with your assessment there. It's more about the throttling. The way I read the class, different threads could be writing to different files, but both could experience throttling, and we should account for that.
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
jpountz commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1530384809

`lucene/core/src/java/org/apache/lucene/index/MergePolicy.java`:

```diff
@@ -136,13 +136,13 @@ public boolean isAborted() {
    */
   public void pauseNanos(long pauseNanos, PauseReason reason, BooleanSupplier condition)
       throws InterruptedException {
-    if (Thread.currentThread() != owner) {
+    /* if (Thread.currentThread() != owner) {
       throw new RuntimeException(
           "Only the merge owner thread can call pauseNanos(). This thread: "
               + Thread.currentThread().getName()
               + ", owner thread: "
               + owner);
-    }
+    }*/
```

Review Comment: I guess not, but now we need to make sure that the rate is applied across all threads rather than per-thread?

`lucene/core/src/java/org/apache/lucene/store/RateLimitedIndexOutput.java`:

```diff
@@ -28,30 +30,30 @@ public final class RateLimitedIndexOutput extends FilterIndexOutput {
   private final RateLimiter rateLimiter;

   /** How many bytes we've written since we last called rateLimiter.pause. */
-  private long bytesSinceLastPause;
+  private final AtomicLong bytesSinceLastPause = new AtomicLong(0);
```

Review Comment: I'm still unclear why this needs to become an `AtomicLong`. If `bytesSinceLastPause` could be updated/read concurrently, that would imply that multiple `writeXXX` methods are also called concurrently, which doesn't make much sense, as bytes would be written to disk in a non-deterministic order?

`lucene/core/src/java/org/apache/lucene/index/MergeRateLimiter.java`:

```diff
@@ -118,24 +118,32 @@ private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedE
       throw new MergePolicy.MergeAbortedException("Merge aborted.");
     }

-    double rate = mbPerSec; // read from volatile rate once.
-    double secondsToPause = (bytes / 1024. / 1024.) / rate;
-
-    // Time we should sleep until; this is purely instantaneous
-    // rate (just adds seconds onto the last time we had paused to);
-    // maybe we should also offer decayed recent history one?
-    long targetNS = lastNS + (long) (1000000000 * secondsToPause);
-
-    long curPauseNS = targetNS - curNS;
-
-    // We don't bother with thread pausing if the pause is smaller than 2 msec.
-    if (curPauseNS <= MIN_PAUSE_NS) {
-      // Set to curNS, not targetNS, to enforce the instant rate, not
-      // the "averaged over all history" rate:
-      lastNS = curNS;
+    final double rate = mbPerSec; // read from volatile rate once.
+    final double secondsToPause = (bytes / 1024. / 1024.) / rate;
+
+    AtomicLong curPauseNSSetter = new AtomicLong();
+    lastNS.updateAndGet(
```

Review Comment: If we used a lock for this whole method, then this may help pause all threads that share the same rate limiter, and effectively apply the rate limit across all threads that run the same merge, rather than applying the rate limit on a per-thread basis?
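A minimal sketch of that locking alternative, assuming simplified fields; this is not the actual `MergeRateLimiter` code. Making the whole method `synchronized` serializes the `lastNS` bookkeeping across writers sharing the limiter, so the configured MB/sec budget applies to the merge as a whole rather than to each thread separately.

```java
import java.util.concurrent.TimeUnit;

final class LockingRateLimiter {
  private static final long MIN_PAUSE_NS = TimeUnit.MILLISECONDS.toNanos(2);

  private volatile double mbPerSec = 10.0; // configured rate, may be updated
  private long lastNS;                     // guarded by "this"

  /** Returns how long the caller should pause, in nanoseconds (0 = no pause). */
  synchronized long maybePause(long bytes, long curNS) {
    double secondsToPause = (bytes / 1024. / 1024.) / mbPerSec;
    // Earliest time this write is allowed to complete, given the last pause.
    long targetNS = lastNS + (long) (1_000_000_000 * secondsToPause);
    long pauseNS = targetNS - curNS;
    if (pauseNS <= MIN_PAUSE_NS) {
      lastNS = curNS; // no pause: enforce the instant rate, not the average
      return 0;
    }
    // Each caller advances lastNS before sleeping, so concurrent callers
    // queue up behind it and the budget is shared, not per-thread.
    lastNS = targetNS;
    return pauseNS;
  }
}
```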
Re: [I] Decouple within-query concurrency from the index's segment geometry [LUCENE-8675] [lucene]
jpountz commented on issue #9721: URL: https://github.com/apache/lucene/issues/9721#issuecomment-2007387101

> Maybe there is a hybrid approach? For example, when concurrent segment search is being initialized, it can try calling clone() for Scorer/BulkScorer, but if it throws CloneNotSupportedException, we fall back to creating a new Scorer/BulkScorer instance?

My gut feeling is that it would work internally like that, so that we would not have to migrate all queries in one go. But hopefully, on the caller side, there would be a single API to call. Instead of the clone() approach, I wonder if we could also allow `ScorerSupplier#get` to be called multiple times, and document that these scorers may be used in different threads. (We'd probably need to add `ScorerSupplier#getBulkScorer` as well.)
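Sketched against the shape of the real `ScorerSupplier`, the relaxed contract might look as follows. This is an illustration only: `getBulkScorer()` is the proposed addition and does not exist in Lucene today.

```java
import java.io.IOException;
import org.apache.lucene.search.BulkScorer;
import org.apache.lucene.search.Scorer;

/**
 * Sketch of a supplier whose get() may be called once per slice, possibly
 * from different threads, with each call returning an independent scorer.
 */
abstract class MultiUseScorerSupplier {
  /** May be called multiple times, unlike the current single-use contract. */
  abstract Scorer get(long leadCost) throws IOException;

  /** Proposed addition: bulk-scorer variant with the same multi-call contract. */
  abstract BulkScorer getBulkScorer() throws IOException;

  /** Cost estimate, mirroring ScorerSupplier#cost. */
  abstract long cost();
}
```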
Re: [PR] Doc reordering avoid iterations: cooling using simulated annealing [lucene]
rishabhmaurya commented on PR #13186: URL: https://github.com/apache/lucene/pull/13186#issuecomment-2007593466

@jpountz that's a good point. Since we want the tolerance to be `iter`, we can check that `pivot - bias <= iter/2` and `pivot - bias >= -iter/2`.
Re: [PR] Make the HitQueue size more appropriate for KNN exact search [lucene]
benwtrent merged PR #13184: URL: https://github.com/apache/lucene/pull/13184
Re: [PR] Doc reordering avoid iterations: cooling using simulated annealing [lucene]
jpountz commented on PR #13186: URL: https://github.com/apache/lucene/pull/13186#issuecomment-2007653367

@rishabhmaurya Actually, I'm not comfortable with passing a comparison function that is not transitive: with iter=1, this would give 1.2 = 1.6 (since they differ by less than 0.5) and 1.6 = 2.0, but 1.2 != 2.0. I don't remember if our selection algorithm makes any assumptions about the comparison function, but in any case I'd prefer the comparison function to be well-behaved. What about comparing rounded values instead, e.g. `int cmp = Float.compare(Math.round(pivot / iter * 2), Math.round(bias / iter * 2))`?
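To make the transitivity problem concrete, a small self-contained demo using the numbers from the comment; both helpers are hypothetical illustrations, not code from the PR.

```java
final class ComparatorDemo {
  /** Tolerance-based comparison: "equal" when values differ by <= iter/2. */
  static int toleranceCompare(float pivot, float bias, float iter) {
    float d = pivot - bias;
    if (d <= iter / 2 && d >= -iter / 2) {
      return 0;
    }
    return Float.compare(pivot, bias);
  }

  /** Rounding first puts values into fixed buckets, so equality is transitive. */
  static int roundedCompare(float pivot, float bias, float iter) {
    return Integer.compare(Math.round(pivot / iter * 2), Math.round(bias / iter * 2));
  }

  public static void main(String[] args) {
    float iter = 1f;
    // The tolerance comparator is not transitive:
    System.out.println(toleranceCompare(1.2f, 1.6f, iter)); // 0  -> "equal"
    System.out.println(toleranceCompare(1.6f, 2.0f, iter)); // 0  -> "equal"
    System.out.println(toleranceCompare(1.2f, 2.0f, iter)); // -1 -> not equal!
    // The bucketed comparator never produces such a cycle:
    System.out.println(roundedCompare(1.2f, 2.0f, iter));   // -1, consistently
  }
}
```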
Re: [PR] Support getMaxScore of DisjunctionSumScorer for non top level scoring clause [lucene]
jpountz commented on PR #13066: URL: https://github.com/apache/lucene/pull/13066#issuecomment-2007674297

> This means we can skip more items with this change.

Indeed!
Re: [PR] Support getMaxScore of DisjunctionSumScorer for non top level scoring clause [lucene]
jpountz commented on code in PR #13066: URL: https://github.com/apache/lucene/pull/13066#discussion_r1530752132

`lucene/core/src/java/org/apache/lucene/search/DisjunctionSumScorer.java`:

```diff
@@ -43,10 +47,23 @@ protected float score(DisiWrapper topList) throws IOException {
     return (float) score;
   }

+  @Override
+  public int advanceShallow(int target) throws IOException {
+    int min = DocIdSetIterator.NO_MORE_DOCS;
+    for (Scorer scorer : scorers) {
+      if (scorer.docID() <= target) {
+        min = Math.max(min, scorer.advanceShallow(target));
+      }
+    }
+    return min;
+  }
+
   @Override
   public float getMaxScore(int upTo) throws IOException {
-    // It's ok to return a bad upper bound here since we use WANDScorer when
-    // we actually care about block scores.
-    return Float.MAX_VALUE;
+    double maxScore = 0;
+    for (Scorer scorer : scorers) {
+      maxScore += scorer.getMaxScore(upTo);
```

Review Comment: We should not call `getMaxScore` on scorers whose doc ID is already beyond `upTo`. See `WANDScorer#getMaxScore`.
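For illustration, a hedged version of that loop with the suggested guard, extracted into a standalone helper; the actual fix would live inside `DisjunctionSumScorer#getMaxScore`, and the list parameter here is an assumption.

```java
import java.io.IOException;
import java.util.List;
import org.apache.lucene.search.Scorer;

final class MaxScoreHelper {
  /** Sums per-clause upper bounds, ignoring scorers already past upTo. */
  static float maxScore(List<Scorer> scorers, int upTo) throws IOException {
    double maxScore = 0;
    for (Scorer scorer : scorers) {
      if (scorer.docID() <= upTo) { // skip scorers whose doc ID is beyond upTo
        maxScore += scorer.getMaxScore(upTo);
      }
    }
    return (float) maxScore;
  }
}
```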
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
benwtrent commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1530907881

`lucene/core/src/java/org/apache/lucene/store/RateLimitedIndexOutput.java`:

```diff
@@ -28,30 +30,30 @@ public final class RateLimitedIndexOutput extends FilterIndexOutput {
   private final RateLimiter rateLimiter;

   /** How many bytes we've written since we last called rateLimiter.pause. */
-  private long bytesSinceLastPause;
+  private final AtomicLong bytesSinceLastPause = new AtomicLong(0);
```

Review Comment: Ah, yeah, I could remove all the concurrency checks here. I was just concerned because the above assertions tripped due to the threads. But you are correct: they are still ultimately only written to by one thread.
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
benwtrent commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1531011510

`lucene/core/src/java/org/apache/lucene/index/MergePolicy.java`:

```diff
@@ -136,13 +136,13 @@ public boolean isAborted() {
    */
   public void pauseNanos(long pauseNanos, PauseReason reason, BooleanSupplier condition)
       throws InterruptedException {
-    if (Thread.currentThread() != owner) {
+    /* if (Thread.currentThread() != owner) {
       throw new RuntimeException(
           "Only the merge owner thread can call pauseNanos(). This thread: "
               + Thread.currentThread().getName()
               + ", owner thread: "
               + owner);
-    }
+    }*/
```

Review Comment:

> I guess not, but now we need to make sure that the rate is applied across all threads rather than per-thread?

Looking at the code, it seems it was previously assumed that only one `createOutput` was active at a time for a merge. Now more than one can be. I guess this means we need a `RateLimitingDirectory` that passes things in to the `RateLimitedIndexOutput` to ensure global rate limiting is controlled across all outputs. Is this what you are talking about? I am not sure that `pauseNanos` should directly know about other threads and pause them. It seems better to put this up at the directory level.
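A hedged sketch of that "RateLimitingDirectory" idea, under stated assumptions: the class below is hypothetical (it does not exist in Lucene), and it simply gives every output created through it the same `RateLimiter`, so pauses reflect the combined write rate of all threads contributing to the merge.

```java
import java.io.IOException;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FilterDirectory;
import org.apache.lucene.store.IOContext;
import org.apache.lucene.store.IndexOutput;
import org.apache.lucene.store.RateLimitedIndexOutput;
import org.apache.lucene.store.RateLimiter;

/** Hypothetical directory wrapper sharing one rate limiter across outputs. */
final class RateLimitingDirectory extends FilterDirectory {
  private final RateLimiter sharedLimiter;

  RateLimitingDirectory(Directory in, RateLimiter sharedLimiter) {
    super(in);
    this.sharedLimiter = sharedLimiter;
  }

  @Override
  public IndexOutput createOutput(String name, IOContext context) throws IOException {
    // Every output shares the same limiter, so throttling is global to the
    // merge rather than local to whichever thread created the output.
    return new RateLimitedIndexOutput(sharedLimiter, in.createOutput(name, context));
  }
}
```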
Re: [PR] Add new parallel merge task executor for parallel actions within a single merge action [lucene]
benwtrent commented on code in PR #13190: URL: https://github.com/apache/lucene/pull/13190#discussion_r1531018293

`lucene/core/src/java/org/apache/lucene/index/MergeRateLimiter.java`:

```diff
@@ -118,24 +118,32 @@ private long maybePause(long bytes, long curNS) throws MergePolicy.MergeAbortedE
       throw new MergePolicy.MergeAbortedException("Merge aborted.");
     }

-    double rate = mbPerSec; // read from volatile rate once.
-    double secondsToPause = (bytes / 1024. / 1024.) / rate;
-
-    // Time we should sleep until; this is purely instantaneous
-    // rate (just adds seconds onto the last time we had paused to);
-    // maybe we should also offer decayed recent history one?
-    long targetNS = lastNS + (long) (1000000000 * secondsToPause);
-
-    long curPauseNS = targetNS - curNS;
-
-    // We don't bother with thread pausing if the pause is smaller than 2 msec.
-    if (curPauseNS <= MIN_PAUSE_NS) {
-      // Set to curNS, not targetNS, to enforce the instant rate, not
-      // the "averaged over all history" rate:
-      lastNS = curNS;
+    final double rate = mbPerSec; // read from volatile rate once.
+    final double secondsToPause = (bytes / 1024. / 1024.) / rate;
+
+    AtomicLong curPauseNSSetter = new AtomicLong();
+    lastNS.updateAndGet(
```

Review Comment: So, the limits are on a per-`RateLimitedIndexOutput` basis, not necessarily per thread. Since all the `RateLimitedIndexOutput` instances constructed share the same rate limiter, it might pause all threads? But I don't think so, as each thread could have its own `RateLimitedIndexOutput` that tracks its own bytes locally, and while the combined total of all threads might exceed the throttling limit, the output isn't really throttled until a single thread exceeds the limit.
Re: [PR] Test case that exposes bug in PR #11724 [lucene]
github-actions[bot] commented on PR #13157: URL: https://github.com/apache/lucene/pull/13157#issuecomment-2008387603

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Add Facets#getBulkSpecificValues method [lucene]
github-actions[bot] commented on PR #12862: URL: https://github.com/apache/lucene/pull/12862#issuecomment-2008388050

This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!
Re: [PR] Restrict GraphTokenStreamFiniteStrings#articulationPointsRecurse recursion depth [lucene]
dlavant commented on PR #12249: URL: https://github.com/apache/lucene/pull/12249#issuecomment-2008538507

I think I ran into this issue with Elastic 7.17 + Lucene 8.11.1. I am aware that this version of Lucene does not have the fix. To confirm this, I tried to reproduce the StackOverflowError on demand by feeding Elastic an absurdly long query. But strangely, I cannot reproduce it consistently.

To be specific, I am running queries that are ~200,000 characters long, with a single word repeated over and over (e.g. "cat,cat,cat,catcat,cat,cat,cat"). I would expect a query like this to trigger the StackOverflowError every time, but it doesn't. I suspect there is some fact about Lucene tokenization that I am overlooking. Could anyone provide an example of a query *string* that would be likely to cause the StackOverflowError?
Re: [PR] Support getMaxScore of DisjunctionSumScorer for non top level scoring clause [lucene]
mrkm4ntr commented on PR #13066: URL: https://github.com/apache/lucene/pull/13066#issuecomment-2008586471

@jpountz Thank you for your review! I fixed the issue.
Re: [PR] Doc reordering avoid iterations: cooling using simulated annealing [lucene]
rishabhmaurya commented on PR #13186: URL: https://github.com/apache/lucene/pull/13186#issuecomment-2008789628

@jpountz +1, a comparison function that is not transitive could be problematic. Do you think that recomputing biases on each swap also breaks the transitivity of the comparison function?

On reading about Bentley-McIlroy 3-way partitioning (https://algs4.cs.princeton.edu/lectures/demo/23DemoPartitioning.pdf), I don't think our swap logic makes sense here, especially in phase 1 of the algorithm. It does a 3-way partitioning and runs in two phases. In phase 1 (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L100-L118), it creates 4 partitions: `[elements equal to pivot][elements smaller than pivot][elements greater than pivot][elements equal to pivot]`. Then in phase 2 (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L119-L133), it rearranges these 4 partitions into 3 by swapping elements from the left and right ends toward the middle, so it looks like `[smaller elements][equal elements][greater elements]`.

All swaps performed in phase 1 happen when an element is found equal to the pivot, so that it can be moved toward either the left or the right end. Recomputing biases on these swaps is almost unnecessary, since the elements will be swapped again toward the middle in phase 2.

If we want to continue using IntroSelector, I propose that we not perform any recomputation of gains in phase 1 of the algorithm, and only recompute in phase 2 when swaps are being performed. For that, we need to distinguish the `swap()` calls made in the two phases. For `comparePivot()`, which is only used in phase 1, I like your approach of rounding values, `int cmp = Float.compare(Math.round(pivot / iter * 2), Math.round(bias / iter * 2))`, so we can try it out. Let me know what you think.
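To make the two phases concrete, here is the canonical Bentley-McIlroy 3-way partition (following the Sedgewick demo linked above), annotated with where each phase begins. It illustrates the scheme on ints rather than showing Lucene's `IntroSelector` itself; under the proposal, bias recomputation would be skipped for the phase-1 swaps and performed only for the phase-2 swaps.

```java
final class BentleyMcIlroyPartition {
  private static void swap(int[] a, int i, int j) {
    int t = a[i]; a[i] = a[j]; a[j] = t;
  }

  /**
   * Partitions a[lo..hi] (hi > lo) around the pivot a[lo] and returns
   * {from, to}, the bounds of the pivot-equal block in the middle.
   */
  static int[] partition(int[] a, int lo, int hi) {
    int v = a[lo];
    int i = lo, j = hi + 1; // scan pointers
    int p = lo, q = hi + 1; // ends of the pivot-equal blocks parked at the edges
    // Phase 1: build [equal][smaller]...[greater][equal].
    while (true) {
      while (a[++i] < v) if (i == hi) break;
      while (v < a[--j]) if (j == lo) break;
      if (i == j && a[i] == v) swap(a, ++p, i);
      if (i >= j) break;
      swap(a, i, j);
      if (a[i] == v) swap(a, ++p, i); // park pivot-equal element on the left end
      if (a[j] == v) swap(a, --q, j); // park pivot-equal element on the right end
    }
    // Phase 2: move the equal blocks from the ends into the middle,
    // yielding [smaller][equal][greater].
    i = j + 1;
    for (int k = lo; k <= p; k++) swap(a, k, j--);
    for (int k = hi; k >= q; k--) swap(a, k, i++);
    return new int[] {j + 1, i - 1}; // bounds of the pivot-equal middle block
  }
}
```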