[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044009#comment-17044009 ]
David Smiley commented on LUCENE-9212:
--------------------------------------

> Automatons can be defined in both binary and unicode space, and there's no
> way of telling which it is when it comes to compiling them

Isn't that a problem with our API -- more of a root cause? I've been bitten by the un-typed nature of byte vs char automatons.

> Intervals.multiterm() should take a CompiledAutomaton
> -----------------------------------------------------
>
>                 Key: LUCENE-9212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.
> This isn't necessarily correct behaviour, however, because Automatons can be
> defined in both binary and unicode space, and there's no way of telling which
> it is when it comes to compiling them. In particular, for automatons
> produced by FuzzyTermsEnum, we need to convert them to unicode before
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we
> should deprecate the methods that take `Automaton` and remove in master.
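
To illustrate the binary-vs-unicode ambiguity discussed above, here is a minimal sketch in plain Java. It does not use Lucene's API: `LabelSpaceDemo` and both methods are hypothetical names, and the "automaton" is reduced to a single transition label. The point is only that the same integer label accepts a term when read as a code point but rejects it when read as a UTF-8 byte, and nothing in the label itself says which reading is intended.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is not Lucene code.
public class LabelSpaceDemo {

    // "Unicode space": the single transition label is compared against code points.
    static boolean acceptsCodePoints(String term, int label) {
        int[] codePoints = term.codePoints().toArray();
        return codePoints.length == 1 && codePoints[0] == label;
    }

    // "Binary space": the same label is compared against the term's UTF-8 bytes.
    static boolean acceptsUtf8Bytes(String term, int label) {
        byte[] bytes = term.getBytes(StandardCharsets.UTF_8);
        return bytes.length == 1 && (bytes[0] & 0xFF) == label;
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // U+00E9, encoded in UTF-8 as the two bytes C3 A9
        int label = 0xE9;       // an untyped label: code point or byte?

        System.out.println("unicode space: " + acceptsCodePoints(term, label));
        System.out.println("binary space:  " + acceptsUtf8Bytes(term, label));
    }
}
```

For pure ASCII terms the two readings agree, which is one reason this kind of mismatch can go unnoticed until a non-ASCII term shows up.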