mikemccand commented on issue #12957:
URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864448812

   > > Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm 
is accepted by the incoming automaton, yet the way CheckIndex is calling it can 
clearly violate that.
   > 
   > I wondered about that, but the automaton is `Automata.makeAnyBinary()`, 
shouldn't it accept any term?
   
   Oh, you're right!  I missed that `Automata.makeAnyBinary()` there!
   
   > Oh I see, I created binary automata, but the API implicitly treats 
automata as UTF32 automata, so you need to tell it explicitly that it's a 
binary automaton. And something like that should fix the problem?
   
   Oh, you are also right!  Specifically, `CompiledAutomaton` assumes the 
automaton is UTF32 and converts it to UTF8, unless you pass `isBinary=true`.  
OK I like your fix!  I'll confirm it fixes the `DirectPostingsFormat` failure too.
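
A minimal sketch of why the UTF32-versus-binary interpretation matters. It uses plain Java with no Lucene dependency, and the class and method names are hypothetical, chosen only for illustration. The same transition label, here `0xFF`, maps to different byte sequences depending on whether it is read as a Unicode code point (then encoded as UTF8) or as a raw byte, so an automaton built over bytes would presumably stop matching arbitrary binary terms if its labels were treated as code points:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class LabelInterpretation {

    // Interpret the label as a Unicode code point and encode it as UTF-8,
    // roughly what a UTF32-to-UTF8 conversion does to each label.
    static byte[] asUtf8(int codePoint) {
        return new String(Character.toChars(codePoint)).getBytes(StandardCharsets.UTF_8);
    }

    // Interpret the label as a single raw byte, which is what a binary
    // automaton means by a label in the range 0..255.
    static byte[] asBinary(int label) {
        if (label < 0 || label > 0xFF) {
            throw new IllegalArgumentException("binary label out of range: " + label);
        }
        return new byte[] {(byte) label};
    }

    public static void main(String[] args) {
        int label = 0xFF;
        // The two interpretations agree for ASCII labels but diverge above 0x7F.
        System.out.println("as UTF-8 code point: " + Arrays.toString(asUtf8(label)));
        System.out.println("as raw byte:         " + Arrays.toString(asBinary(label)));
    }
}
```

For labels up to `0x7F` both readings yield the same single byte, which is likely why a mismatch of this kind only shows up with terms that contain high bytes.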


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

