uschindler commented on code in PR #13052: URL: https://github.com/apache/lucene/pull/13052#discussion_r1470981973
########## lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java: ########## @@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges( // the resulting set contains all segments that are left over // and will be merged to one additional segment: for (final OneMerge om : spec.merges) { - oldSegments.keySet().removeAll(om.segments); + // om.segments.forEach(::remove) is used here instead of oldSegments.keySet().removeAll() + // for performance reasons; om.segments size is always greater than oldSegments size, and Review Comment: New comment looks much better. To conclude, because I was also on wrong trip yesterday late evening: The code `oldSegments.keySet().removeAll(om.segments);` behaves like the following according to the docs: > This implementation determines which is the smaller of this set and the specified collection, by invoking the size method on each. If this set has fewer elements, then the implementation iterates over this set, checking each element returned by the iterator in turn to see if it is contained in the specified collection. If it is so contained, it is removed from this set with the iterator's remove method. If the specified collection has fewer elements, then the implementation iterates over the specified collection, removing from this set each element returned by the iterator, using this set's remove method. Translated to the example: If `oldSegments.keySet()` set has fewer elements, then the implementation iterates over `oldSegments.keySet()` set, checking each element returned by the iterator in turn to see if it is contained in `om.segments`. If it is so contained, it is removed from this set with the iterator's remove method. **If `om.segments` has fewer elements, then the implementation iterates over `om.segments`, removing from this set each element returned by the iterator, using this set's remove method.** So we would like to see the second (bold) part to be executed. Unfortunetely the spec only talks about "fewer", so we have no idea what happens on the boundary! Let's check here: https://github.com/openjdk/jdk/blob/fd8adf308357355bd33916ad80e2328c35434e5a/src/java.base/share/classes/java/util/AbstractSet.java#L166-L182 On the boundary, unfortunately it executes the slow code (contains in list, lower in the above code part). So the problem only exists if both collections have equal size. Nevertheless I will open a bug report. It should only prefer the shorter collection if both are sets. Changing this would be fine in later JDK releases, because the spec of `Set` does not specify how the method should behave. The code has more problems, because the symmetry of `contains()` for both sides is not always guaranteed (e.g. for TreeSets with case insensitive order). -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org