gsmiller commented on PR #12427:
URL: https://github.com/apache/lucene/pull/12427#issuecomment-1629757874

   Thanks @shubhamvishu for working on this! As I look at the PR, I'm wondering 
if accepting a `List` is really the proper thing to do here. If users already 
have a sorted `Collection` (like `SortedSet`), are we forcing them to make 
unnecessary copies of their data to adhere to our API?
   
   I wonder if the ctor for `TermInSetQuery` might serve as a reasonable 
example of handling sortedness? `TermInSetQuery` accepts a `Collection` but 
tries to see if it's already sorted (and de-duped). If it is, it uses it as-is. 
If not, it sorts and de-dupes on behalf of the user. The automaton building 
logic right now only has a simple `assert` down in the internals, which could 
be disabled depending on the runtime environment, resulting in tricky debugging 
if the user didn't handle sorting their data properly.
   
   Since `StringsToAutomaton` is an internal class, maybe one approach would be 
to leave it untouched, letting it expect pre-sorted data and having a simple 
`assert` statement as it does today to check that (for easier debugging), but 
add sorting logic to `Automata#makeStringUnion`? What if that method accepted a 
`Collection` as it does today but had similar logic as the `TermInSetQuery` 
ctor related to sorting?
   
   Apologies for leading us in the wrong direction in the issue I opened. I 
think a `List` is a bit more semantically appropriate than a `Collection`, but 
I'm not sure it's best from a pragmatic standpoint? What do you think?


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to