On Tue, 24 Mar 2026 18:45:39 GMT, Andy Goryachev <[email protected]> wrote:

>> `distinct()` uses a temporary HashSet to track the entries, which conveys 
>> all the overhead of the collection, hashing, and so on. 
>> 
>> I could test it later tonight, but I am virtually certain `distinct` will 
>> slow things down (not to mention additional memory usage and garbage 
>> collection).
>
>> But wouldn't we save some `BitSet` creations then?
> 
> `BitSet` is much, much better for this.  It's compact, and fast, and some 
> operations (like distinct and sort) are effectively free.  Even for 
> selections that span 1,000,000 items it needs something like 122 kB.

I benchmarked this over the last hour because, while I think so too, I can't 
be sure without benchmarking it ;-)

Results look very promising:


1_000_000 items, 2 duplicates each

Benchmark                  Mode  Cnt    Score     Error  Units
BitSetBenchmark.or        thrpt   25  514,423 ± 100,789  ops/s
BitSetBenchmark.distinct  thrpt   25   54,505 ±   1,056  ops/s

50_000 items, 2 duplicates each

Benchmark                  Mode  Cnt     Score     Error  Units
BitSetBenchmark.or        thrpt   25  7180,546 ± 169,226  ops/s
BitSetBenchmark.distinct  thrpt   25  1772,690 ±  46,704  ops/s
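
For context, here is a minimal sketch of the two approaches being compared. It is not the benchmark source: the class and method names (`DedupSketch`, `toBitSet`, `mergeWithOr`, `mergeWithDistinct`) are hypothetical, and it assumes the inputs are non-negative selection indices that may repeat across the two arrays.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

public class DedupSketch {
    // Hypothetical helper: one bit per index; indices must be non-negative.
    static BitSet toBitSet(int[] indices) {
        BitSet bits = new BitSet();
        for (int i : indices) {
            bits.set(i);
        }
        return bits;
    }

    // BitSet approach: or() merges the two sets in place. Duplicates collapse
    // because a bit is either set or not, and stream() yields the set bits in
    // ascending order, so no separate distinct or sort step is needed.
    static int[] mergeWithOr(int[] first, int[] second) {
        BitSet merged = toBitSet(first);
        merged.or(toBitSet(second));
        return merged.stream().toArray();
    }

    // Stream approach: distinct() has to remember every value it has seen,
    // and sorted() is needed to get the same ordering as the BitSet version.
    static int[] mergeWithDistinct(int[] first, int[] second) {
        return IntStream.concat(IntStream.of(first), IntStream.of(second))
                .distinct()
                .sorted()
                .toArray();
    }

    public static void main(String[] args) {
        int[] first = {4, 1, 7};
        int[] second = {7, 4, 9};
        System.out.println(Arrays.toString(mergeWithOr(first, second)));
        System.out.println(Arrays.toString(mergeWithDistinct(first, second)));
    }
}
```

Both methods return the same sorted, duplicate-free indices; the difference the numbers above point at is only in how much work it takes to get there.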

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/2100#discussion_r2983657166
