uschindler commented on PR #12135:
URL: https://github.com/apache/lucene/pull/12135#issuecomment-1422922194

   Actually, instead of taking a collection or array of `BytesRef`, we could 
make the ctor take a `Stream<BytesRef> stream` and consume the stream by 
calling: `stream.sorted().distinct().doOurStuffByForEach()`.
   
   The cool trick is that streams track "distinct" and "sorted" flags 
internally. So `sorted()` and `distinct()` do nothing if the stream is already 
marked with those flags. If not, the method returns a new Stream which sorts 
(or dedups) and has the correct flags. You should always call them in the 
order "first sort, then distinct" (see some comments in JDK issues: it is no 
longer mandatory for optimal perf, but it is still better to sort first and 
then dedup).
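
   To make the flag behaviour concrete, here is a minimal sketch that inspects 
the spliterator characteristics of a few streams. It uses `String` as a 
stand-in for `BytesRef` so it runs without Lucene; the class and method names 
are made up for illustration and are not part of the PR.

```java
import java.util.List;
import java.util.Spliterator;
import java.util.TreeSet;
import java.util.stream.Collectors;
import java.util.stream.Stream;

final class StreamFlagsDemo {
    /** Reports whether the stream advertises both SORTED and DISTINCT. Consumes the stream. */
    static boolean isSortedAndDistinct(Stream<?> stream) {
        Spliterator<?> sp = stream.spliterator();
        return sp.hasCharacteristics(Spliterator.SORTED)
            && sp.hasCharacteristics(Spliterator.DISTINCT);
    }

    /** The chain from the comment above: sort first, then dedup. */
    static List<String> normalize(Stream<String> stream) {
        return stream.sorted().distinct().collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // A stream over plain values carries neither flag.
        System.out.println(isSortedAndDistinct(Stream.of("b", "a", "b")));
        // A TreeSet-backed stream already advertises both flags.
        System.out.println(isSortedAndDistinct(new TreeSet<>(List.of("b", "a")).stream()));
        // After sorted().distinct() the pipeline advertises both flags.
        System.out.println(isSortedAndDistinct(Stream.of("b", "a", "b").sorted().distinct()));
        System.out.println(normalize(Stream.of("b", "a", "b")));
    }
}
```

   Note that the no-arg `sorted()` can only be skipped when the source is 
flagged as sorted in natural order, which is the case assumed here.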
   
   The array ctor would just call `Stream.of(array)`. Such a stream carries 
neither the sorted nor the distinct flag, so the ctor taking the stream would 
sort and dedup for us. If we have an already sorted list, we can trick the 
stream API into setting the sorted and distinct flags with a helper method 
that uses a 
[spliterator](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Spliterators.html#spliterator(java.util.Iterator,long,int))
 of the correct size and with the correct flags.
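
   A rough sketch of such a helper, assuming the caller really passes a list 
that is sorted in natural order and free of duplicates (nothing verifies 
this). `PresortedStreams` and `presorted` are hypothetical names, and `String` 
again stands in for `BytesRef`:

```java
import java.util.List;
import java.util.Spliterator;
import java.util.Spliterators;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import java.util.stream.StreamSupport;

final class PresortedStreams {
    /**
     * Wraps a list in a stream flagged SORTED and DISTINCT.
     * Assumption: the list is already in natural order without duplicates;
     * if it is not, downstream sorted()/distinct() calls will not repair it.
     */
    static <T extends Comparable<? super T>> Stream<T> presorted(List<T> sortedUnique) {
        int flags = Spliterator.ORDERED | Spliterator.SORTED | Spliterator.DISTINCT;
        Spliterator<T> sp =
            Spliterators.spliterator(sortedUnique.iterator(), sortedUnique.size(), flags);
        return StreamSupport.stream(sp, false); // sequential stream
    }

    public static void main(String[] args) {
        List<String> terms = List.of("a", "b", "c");
        // sorted() and distinct() see the flags and have no work to do here.
        System.out.println(
            presorted(terms).sorted().distinct().collect(Collectors.toList()));
    }
}
```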
   
   In the end, the ctor taking the stream is plain and simple: it requests a 
sorted, distinct stream, and it is up to the caller to provide it presorted.
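
   As an illustration only, the two ctors could look roughly like this. 
`TermSetSketch` is a hypothetical class, not the one touched by this PR, and 
it collects `String` values instead of `BytesRef`. The sketch uses 
`forEachOrdered`, since plain `forEach` does not guarantee encounter order:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Stream;

final class TermSetSketch {
    private final List<String> terms = new ArrayList<>();

    /** Requests a sorted, distinct view of the stream and consumes it in order. */
    TermSetSketch(Stream<String> stream) {
        stream.sorted().distinct().forEachOrdered(terms::add);
    }

    /** The array variant just delegates; the stream ctor sorts and dedups. */
    TermSetSketch(String... array) {
        this(Stream.of(array));
    }

    /** Returns an immutable copy of the collected terms. */
    List<String> terms() {
        return List.copyOf(terms);
    }

    public static void main(String[] args) {
        System.out.println(new TermSetSketch("b", "a", "b").terms()); // prints [a, b]
    }
}
```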


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

