Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

via GitHub Tue, 30 Jan 2024 02:38:03 -0800


uschindler commented on code in PR #13052:
URL: https://github.com/apache/lucene/pull/13052#discussion_r1470981973



##########
lucene/core/src/java/org/apache/lucene/index/UpgradeIndexMergePolicy.java:
##########
@@ -106,7 +106,11 @@ public MergeSpecification findForcedMerges(
       // the resulting set contains all segments that are left over
       // and will be merged to one additional segment:
       for (final OneMerge om : spec.merges) {
-        oldSegments.keySet().removeAll(om.segments);
+        // om.segments.forEach(::remove) is used here instead of 
oldSegments.keySet().removeAll()
+        // for performance reasons; om.segments size is always greater than 
oldSegments size, and

Review Comment:
   New comment looks much better. To conclude, because I was also on wrong trip 
yesterday late evening:
   
   The code `oldSegments.keySet().removeAll(om.segments);` behaves like the 
following according to the docs:
   
   > This implementation determines which is the smaller of this set and the 
specified collection, by invoking the size method on each. If this set has 
fewer elements, then the implementation iterates over this set, checking each 
element returned by the iterator in turn to see if it is contained in the 
specified collection. If it is so contained, it is removed from this set with 
the iterator's remove method. If the specified collection has fewer elements, 
then the implementation iterates over the specified collection, removing from 
this set each element returned by the iterator, using this set's remove method.
   
   Translated to the example: If `oldSegments.keySet()` set has fewer elements, 
then the implementation iterates over `oldSegments.keySet()` set, checking each 
element returned by the iterator in turn to see if it is contained in 
`om.segments`. If it is so contained, it is removed from this set with the 
iterator's remove method. **If `om.segments` has fewer elements, then the 
implementation iterates over `om.segments`, removing from this set each element 
returned by the iterator, using this set's remove method.**
   
   So we would like to see the second (bold) part to be executed.
   
   Unfortunetely the spec only talks about "fewer", so we have no idea what 
happens on the boundary! Let's check here: 
https://github.com/openjdk/jdk/blob/fd8adf308357355bd33916ad80e2328c35434e5a/src/java.base/share/classes/java/util/AbstractSet.java#L166-L182
   
   On the boundary, unfortunately it executes the slow code (contains in list, 
lower in the above code part).
   
   So the problem only exists if both collections have equal size. Nevertheless 
I will open a bug report. It should only prefer the shorter collection if both 
are sets.
   
   Changing this would be fine in later JDK releases, because the spec of `Set` 
does not specify how the method should behave. The code has more problems, 
because the symmetry of `contains()` for both sides is not always guaranteed 
(e.g. for TreeSets with case insensitive order).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Change `set.removeAll(list)` to `list.forEach(set::remove)` [lucene]

Reply via email to