rishabhmaurya commented on PR #13186:
URL: https://github.com/apache/lucene/pull/13186#issuecomment-2008789628

   @jpountz  +1 on comparison function that is not transitive could be 
problematic. 
   Do you think, given we are recomputing biases on each swap, that also breaks 
the transitivity of compare function? 
   
   On reading Bentley-McIlroy 3-way partitioning 
(https://algs4.cs.princeton.edu/lectures/demo/23DemoPartitioning.pdf), I don't 
think our swap logic makes sense here especially in phase-1 of the algo. 
   It does a 3 way partitioning and runs in 2 phases. In phase 1 
(https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L100-L118),
 it creates 4  partitions - `[elements equal to pivot] [elements smaller than 
pivot][elements greater than pivot] [elements equal to pivot]`. 
   Then in phase-2 
(https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L119-L133)
 it rearranges these 4 partitions into 3 partitions by swapping elements from 
the left and right end to the middle, so it looks like - `[smaller elements] 
[equal elements] [greater element]`. 
   
   All swaps performed in phase-1 is when the element is found equal to the 
pivot so that they can be swapped towards either left or the right end. 
Recomputing biases on swap is almost unnecessary in this phase since these 
elements will be swapped again towards the middle in phase-2. 
   
   If we want to continue using IntroSelector, I propose that we should not 
perform any recomputation of gains in phase-1 of the algorithm, and only 
perform recomputation in phase-2 when swaps are being performed. 
   For it, we need to distinguish `swap()` from both the phases of the algo. 
   
   For `comparePivot()`, which is only used in phase-1 of the algo, I like your 
approach of rounding values - `int cmp = Float.compare(Math.round(pivot / iter 
* 2), Math.round(bias / iter * 2))`, so we can try it out. 
   
   Let me know what do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to