On Tue, 24 Mar 2026 18:45:39 GMT, Andy Goryachev <[email protected]> wrote:
>> `distinct()` uses a temporary HashSet to track the entries, which conveys
>> all the overhead of the collection, hashing, and so on.
>>
>> I could test it later tonight, but I am virtually certain `distinct` will
>> slow things down (not to mention additional memory usage and garbage
>> collection).
>
>> But wouldn't we save some `BitSet` creations then?
>
> `BitSet` is much, much better for this. It's compact, and fast, and some
> operations (like distinct and sort) are effectively free. Even for
> selections that span 1,000,000 items it needs something like 122 kB.

I benchmarked this over the last hour, because while I think so too, I can't be sure without benchmarking it ;-)

Results look very promising:

1_000_000 items, 2 duplicates each

Benchmark                  Mode  Cnt     Score     Error  Units
BitSetBenchmark.or        thrpt   25   514,423 ± 100,789  ops/s
BitSetBenchmark.distinct  thrpt   25    54,505 ±   1,056  ops/s

50_000 items, 2 duplicates each

Benchmark                  Mode  Cnt     Score     Error  Units
BitSetBenchmark.or        thrpt   25  7180,546 ± 169,226  ops/s
BitSetBenchmark.distinct  thrpt   25  1772,690 ±  46,704  ops/s

-------------

PR Review Comment: https://git.openjdk.org/jfx/pull/2100#discussion_r2983657166
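
For readers who want to see the two approaches side by side, here is a minimal sketch. The benchmark source is not included in this message, so the class and method names below (`DedupSketch`, `dedupWithBitSet`, `dedupWithDistinct`, `approxBackingBytes`) are hypothetical and only illustrate the idea: merging index sets with `BitSet.or` versus removing duplicates with `IntStream.distinct()`. It is not the code that was measured.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical illustration only; not the benchmark code from the PR.
class DedupSketch {

    // Merge two index arrays through BitSets. Setting an already-set bit is a
    // no-op, so duplicates vanish, and BitSet.stream() yields indices in
    // ascending order, so the result is also sorted.
    static int[] dedupWithBitSet(int[] first, int[] second) {
        BitSet a = new BitSet();
        for (int i : first) {
            a.set(i);
        }
        BitSet b = new BitSet();
        for (int i : second) {
            b.set(i);
        }
        a.or(b); // word-wise OR of the backing long[] arrays
        return a.stream().toArray();
    }

    // Merge the same arrays with a stream and distinct(). The result keeps
    // encounter order and is not sorted.
    static int[] dedupWithDistinct(int[] first, int[] second) {
        return IntStream.concat(Arrays.stream(first), Arrays.stream(second))
                .distinct()
                .toArray();
    }

    // Rough size of the bit storage for nbits bits: one 64-bit word per 64 bits.
    // Ignores object headers and any over-allocation when the set grows.
    static long approxBackingBytes(int nbits) {
        return ((nbits + 63L) / 64L) * 8L;
    }

    public static void main(String[] args) {
        int[] first = {5, 1, 3};
        int[] second = {3, 5, 9};
        System.out.println(Arrays.toString(dedupWithBitSet(first, second)));   // [1, 3, 5, 9]
        System.out.println(Arrays.toString(dedupWithDistinct(first, second))); // [5, 1, 3, 9]
        System.out.println(approxBackingBytes(1_000_000));                     // 125000
    }
}
```

The last line matches the figure quoted above: 1,000,000 bits is 125,000 bytes, or roughly 122 KiB, for the bit storage alone.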
