uschindler commented on PR #12135: URL: https://github.com/apache/lucene/pull/12135#issuecomment-1422922194
Actually, instead of taking a collection or array of `BytesRef`, we could make the ctor take a `Stream<BytesRef> stream` and consume it by calling `stream.sorted().distinct().doOurStuffByForEach()`. The cool trick is that streams track "sorted" and "distinct" flags internally, so `sorted()` and `distinct()` do nothing if the stream is already marked with those flags. If it is not, each method returns a new stream that does the work and carries the correct flag. Always call them in the order "first sort, then distinct" (see some comments in JDK issues: it is no longer mandatory for optimal perf, but it is still better to sort first and then dedup).

The array ctor would just call `Stream.of(array)`. That stream is not flagged as sorted or distinct, so the ctor taking the stream would sort and dedup for us. If we already have a sorted list, we can trick the stream API into setting the sorted and distinct flags with a helper method that uses a [spliterator](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Spliterators.html#spliterator(java.util.Iterator,long,int)) with the correct size and the correct characteristics.

In the end, the ctor taking a stream is plain and simple: it requests a sorted, distinct stream, and it is up to the caller to supply one that is already presorted.
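To make the idea concrete, here is a minimal sketch of the two pieces described above. It uses `String` instead of `BytesRef` so that it runs without Lucene on the classpath; the class name `SortedDistinctStreams` and the methods `presorted` and `consume` are hypothetical names for illustration, not anything from the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class SortedDistinctStreams {

  /**
   * Hypothetical helper: wraps a list that the caller guarantees is already
   * sorted in natural order and free of duplicates. The characteristics
   * passed to the spliterator become the stream's sorted/distinct flags.
   */
  static <T extends Comparable<? super T>> Stream<T> presorted(List<T> sortedUnique) {
    int flags = Spliterator.ORDERED | Spliterator.SORTED | Spliterator.DISTINCT;
    Spliterator<T> split =
        Spliterators.spliterator(sortedUnique.iterator(), sortedUnique.size(), flags);
    return StreamSupport.stream(split, false); // sequential stream
  }

  /**
   * Stand-in for the stream-taking ctor: always asks for sorted, then
   * distinct, and lets the stream decide whether any work is needed.
   */
  static <T extends Comparable<? super T>> List<T> consume(Stream<T> stream) {
    List<T> out = new ArrayList<>();
    stream.sorted().distinct().forEachOrdered(out::add);
    return out;
  }

  public static void main(String[] args) {
    // Unflagged input, as Stream.of(array) would produce: sorted and deduped here.
    System.out.println(consume(Stream.of("b", "a", "b", "c")));
    // Flagged input: the caller vouches for order and uniqueness.
    System.out.println(consume(presorted(List.of("a", "b", "c"))));
  }
}
```

Both calls in `main` print `[a, b, c]`. One caveat with this approach: the flags are a promise, not a check. If a caller passes an unsorted list to `presorted`, the stream skips the sort and the consumer sees the elements in the wrong order.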