[ https://issues.apache.org/jira/browse/LUCENE-9212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044009#comment-17044009 ]

David Smiley commented on LUCENE-9212:
--------------------------------------

> Automatons can be defined in both binary and unicode space, and there's no
> way of telling which it is when it comes to compiling them

Isn't that a problem with our API -- more of a root cause? I've been bitten by
the untyped nature of byte vs. char automatons.
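The following minimal Java sketch shows what a typed API along these lines might look like. None of these types exist in Lucene. `UnicodeAutomaton`, `ByteAutomaton`, `toUtf8` and `compile` are hypothetical names, and the conversion is a stub that only records the step. In Lucene, the `UTF32ToUTF8` class does the real code-point-to-byte conversion. The point is that once the label space is part of the static type, the compiler forces the caller to convert explicitly before compiling. It assumes Java 16+ for records.

```java
public class TypedAutomataSketch {
    // Hypothetical marker types: the label space is part of the static type,
    // so passing a char-space automaton where a byte-space one is required
    // is a compile error. Neither type exists in Lucene.
    record UnicodeAutomaton(String description) {}
    record ByteAutomaton(String description) {}

    // Stub standing in for a UTF-32 -> UTF-8 conversion; it only records
    // that the conversion step happened.
    static ByteAutomaton toUtf8(UnicodeAutomaton a) {
        return new ByteAutomaton(a.description() + " (as UTF-8 bytes)");
    }

    // A compile step that accepts byte-space input only, so there is
    // nothing left to guess about which space the labels are in.
    static String compile(ByteAutomaton a) {
        return "compiled: " + a.description();
    }

    public static void main(String[] args) {
        UnicodeAutomaton regexp = new UnicodeAutomaton("regexp f.o");
        // compile(regexp);  // rejected by javac: wrong label space
        System.out.println(compile(toUtf8(regexp)));
    }
}
```

This design makes a wrong call fail at compile time instead of silently producing an automaton that matches the wrong terms. A lighter-weight alternative would be an enum tag on a single type, checked at runtime.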

> Intervals.multiterm() should take a CompiledAutomaton
> -----------------------------------------------------
>
>                 Key: LUCENE-9212
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9212
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Alan Woodward
>            Assignee: Alan Woodward
>            Priority: Major
>             Fix For: 8.5
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> LUCENE-9028 added a `multiterm` factory method for intervals that accepts an 
> arbitrary Automaton, and converts it internally into a CompiledAutomaton.  
> This isn't necessarily correct behaviour, however, because Automatons can be 
> defined in both binary and unicode space, and there's no way of telling which 
> it is when it comes to compiling them.  In particular, for automatons 
> produced by FuzzyTermsEnum, we need to convert them to unicode before 
> compilation.
> The `multiterm` factory should just take `CompiledAutomaton` directly, and we
> should deprecate the methods that take `Automaton` and remove them in master.
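The minimal, self-contained Java sketch below makes the binary-vs-unicode ambiguity concrete. It does not use Lucene. `Transition`, `DOT`, `run`, `codePoints` and `utf8Bytes` are hypothetical toy names. The toy automaton is meant to match any single character, like `.` in a regular expression. Its edges store only integer ranges, so the same object gives different answers depending on whether the caller feeds it code points or UTF-8 bytes. It assumes Java 16+ for records.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;

public class LabelSpaceDemo {
    // Toy range-labelled edge, loosely modelled on automata whose transitions
    // carry [min, max] integer labels. Hypothetical type, not Lucene's.
    record Transition(int from, int min, int max, int to) {}

    // Intended as "." over code points: state 0 --[0, 0x10FFFF]--> state 1.
    // Nothing in the structure says whether labels are code points or bytes.
    static final List<Transition> DOT = List.of(new Transition(0, 0, 0x10FFFF, 1));
    static final int ACCEPT = 1;

    // Runs the deterministic toy automaton over a sequence of integer symbols.
    static boolean run(List<Transition> transitions, int[] symbols) {
        int state = 0;
        for (int symbol : symbols) {
            int next = -1;
            for (Transition t : transitions) {
                if (t.from() == state && symbol >= t.min() && symbol <= t.max()) {
                    next = t.to();
                    break;
                }
            }
            if (next < 0) return false; // no matching edge: reject
            state = next;
        }
        return state == ACCEPT;
    }

    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    static int[] utf8Bytes(String s) {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        int[] out = new int[bytes.length];
        for (int i = 0; i < bytes.length; i++) out[i] = bytes[i] & 0xFF; // unsigned
        return out;
    }

    public static void main(String[] args) {
        String term = "\u00e9"; // one code point, two UTF-8 bytes (0xC3 0xA9)
        System.out.println("unicode space: " + run(DOT, codePoints(term))); // true
        System.out.println("binary space:  " + run(DOT, utf8Bytes(term)));  // false
    }
}
```

An ASCII term such as "a" is accepted either way, because it is one code point and one byte. A mismatch like this can therefore go unnoticed until a multi-byte character appears in the index.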



