rishabhmaurya commented on PR #13186: URL: https://github.com/apache/lucene/pull/13186#issuecomment-2008789628
@jpountz +1 on comparison function that is not transitive could be problematic. Do you think, given we are recomputing biases on each swap, that also breaks the transitivity of compare function? On reading Bentley-McIlroy 3-way partitioning (https://algs4.cs.princeton.edu/lectures/demo/23DemoPartitioning.pdf), I don't think our swap logic makes sense here especially in phase-1 of the algo. It does a 3 way partitioning and runs in 2 phases. In phase 1 (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L100-L118), it creates 4 partitions - `[elements equal to pivot] [elements smaller than pivot][elements greater than pivot] [elements equal to pivot]`. Then in phase-2 (https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/IntroSelector.java#L119-L133) it rearranges these 4 partitions into 3 partitions by swapping elements from the left and right end to the middle, so it looks like - `[smaller elements] [equal elements] [greater element]`. All swaps performed in phase-1 is when the element is found equal to the pivot so that they can be swapped towards either left or the right end. Recomputing biases on swap is almost unnecessary in this phase since these elements will be swapped again towards the middle in phase-2. If we want to continue using IntroSelector, I propose that we should not perform any recomputation of gains in phase-1 of the algorithm, and only perform recomputation in phase-2 when swaps are being performed. For it, we need to distinguish `swap()` from both the phases of the algo. For `comparePivot()`, which is only used in phase-1 of the algo, I like your approach of rounding values - `int cmp = Float.compare(Math.round(pivot / iter * 2), Math.round(bias / iter * 2))`, so we can try it out. Let me know what do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org