mikemccand commented on issue #12957: URL: https://github.com/apache/lucene/issues/12957#issuecomment-1864448812
> > Terms.intersect(Automaton a, BytesRef startTerm) requires that startTerm is accepted by the incoming automaton, yet the way CheckIndex is calling it can clearly violate that.
>
> I wondered about that, but the automaton is `Automata.makeAnyBinary()`, shouldn't it accept any term?

Oh, you're right! I missed that `Automata.makeAnyBinary()` there!

> Oh I see, I created binary automata, but the API implicitly treats automata as UTF32 automata, so you need to tell it explicitly that it's a binary automaton. And something like that should fix the problem?

Oh, you are also right! Specifically, `CompiledAutomaton` assumes the automaton is UTF32 and converts it to UTF8, unless you pass `isBinary=true`.

OK, I like your fix! I'll confirm it fixes the `DirectPostingsFormat` failure too.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
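
To illustrate the UTF32-versus-binary point from the comment, here is a minimal, Lucene-free sketch in plain Java. It assumes, as the discussion suggests, that a non-binary automaton's labels are read as Unicode code points and converted to UTF-8, while a binary automaton's labels are the term bytes themselves. The class and method names (`LabelInterpretation`, `asBinaryLabel`, `asUtf8OfCodePoint`) are hypothetical and are not part of Lucene.

```java
import java.nio.charset.StandardCharsets;

public class LabelInterpretation {

    // Binary reading: the label is the term byte itself.
    static byte[] asBinaryLabel(int label) {
        return new byte[] {(byte) label};
    }

    // UTF32 reading: the label is a Unicode code point, which is then
    // encoded as UTF-8 (one or more bytes).
    static byte[] asUtf8OfCodePoint(int label) {
        return new String(Character.toChars(label)).getBytes(StandardCharsets.UTF_8);
    }

    // Formats bytes as space-separated upper-case hex, e.g. "C3 BF".
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Labels in the range an "any binary" automaton covers (0..255).
        for (int label : new int[] {0x41, 0x80, 0xFF}) {
            System.out.println(
                "label 0x" + Integer.toHexString(label)
                    + " binary=[" + hex(asBinaryLabel(label)) + "]"
                    + " utf8=[" + hex(asUtf8OfCodePoint(label)) + "]");
        }
    }
}
```

For labels below 0x80 the two readings agree, but from 0x80 upward they differ (0xFF becomes the two bytes C3 BF). That is why an "any binary" automaton that is treated as UTF32 would no longer accept arbitrary term bytes.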