gsmiller commented on PR #12427: URL: https://github.com/apache/lucene/pull/12427#issuecomment-1631550554
Yeah, good points/questions! I'd be curious how much overhead sorting would actually add when the input is already sorted. But to take a step back for a moment, we also have automaton-building methods that accept a `BytesRefIterator`, whose input must also be sorted. That's a situation where we really cannot sort on behalf of the caller, so it might be a bit confusing/trappy to sort for some flavors of this method but not others. Maybe it's best to leave these methods as they are?

If we want to make these functions a bit more user-friendly, we could look at changing the `assert` on line 276 of `StringsToAutomaton` to throw an explicit `IllegalArgumentException` so that we don't silently build a corrupt automaton on unordered input (with asserts disabled). This _would_ add overhead since we'd have to keep track of the previous term all the time, but maybe it's worth benchmarking and considering this change? I _do_ think it's best to explicitly let the user know they passed invalid input in a case like this, so it would be a nice change if it didn't introduce a significant performance drag.
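To make the trade-off concrete, here is a rough sketch of what an explicit ordering check could look like. It is not the actual `StringsToAutomaton` code: `SortedTermsCheck`, `consumeSorted`, and `utf8` are hypothetical names, terms are modeled as plain `byte[]` rather than `BytesRef`, and I'm assuming duplicates are tolerated and that terms compare as unsigned bytes. The `clone()` stands in for the "keep track of the previous term" cost, since an iterator may reuse its buffer between calls.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Iterator;

/** Hypothetical helper for illustration only; not part of Lucene. */
final class SortedTermsCheck {

  /**
   * Walks the terms once and throws if any term sorts before its predecessor
   * (unsigned byte order; equal neighbors are allowed by assumption).
   * Returns the number of terms consumed.
   */
  static int consumeSorted(Iterator<byte[]> terms) {
    byte[] previous = null;
    int count = 0;
    while (terms.hasNext()) {
      byte[] current = terms.next();
      // An explicit exception instead of an assert, so the check still
      // runs when assertions are disabled.
      if (previous != null && Arrays.compareUnsigned(previous, current) > 0) {
        throw new IllegalArgumentException(
            "Input must be in sorted order, but term " + count + " sorts before its predecessor");
      }
      // This copy is the per-term overhead in question: the previous term
      // has to be retained because the caller may reuse the buffer.
      previous = current.clone();
      count++;
      // ... a real builder would add `current` to the automaton here ...
    }
    return count;
  }

  /** Convenience for building UTF-8 test terms. */
  static byte[] utf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }
}
```

With something like this, passing `["b", "a"]` fails fast with an `IllegalArgumentException` regardless of whether asserts are enabled, and the extra work per term is one comparison plus one copy, which is what we'd want to benchmark.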