gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1629757874
Thanks @shubhamvishu for working on this! As I look at the PR, I'm wondering if accepting a `List` is really the proper thing to do here. If users already have a sorted `Collection` (like `SortedSet`), are we forcing them to make unnecessary copies of their data to adhere to our API? I wonder if the ctor for `TermInSetQuery` might serve as a reasonable example of handling sortedness? `TermInSetQuery` accepts a `Collection` but tries to see if it's already sorted (and de-duped). If it is, it uses it as-is. If not, it sorts and de-dupes on behalf of the user. The automaton building logic right now only has a simple `assert` down in the internals, which could be disabled depending on the runtime environment, resulting in tricky debugging if the user didn't handle sorting their data properly. Since `StringsToAutomaton` is an internal class, maybe one approach would be to leave it untouched, letting it expect pre-sorted data and having a simple `assert` statement as it does today to check that (for easier debugging), but add sorting logic to `Automata#makeStringUnion`? What if that method accepted a `Collection` as it does today but had similar logic as the `TermInSetQuery` ctor related to sorting? Apologies for leading us in the wrong direction in the issue I opened. I think a `List` is a bit more semantically appropriate than a `Collection`, but I'm not sure it's best from a pragmatic standpoint? What do you think? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org