dweiss commented on issue #12542: URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711706053
With regard to automata/ FSTs - they're nearly the same thing, conceptually. Automata are logically transducers producing a constant epsilon value (no value). This knowledge can be used to make them smaller but they're the same animal, really. The root of the automaton/fst difference in Lucene is historically the source of code: brics package for constructing arbitrary automata, Daciuk/Mihov algorithm implementing minimal (at least at first) FST construction directly from sorted data (no intermediate "unoptimized" transitions). > Non-minimal FST means the index is a wee bit bigger, and perhaps lookups through the FST are a bit slower since we must have more bytes hot / fewer bytes cache local. But it's likely these effects are miniscule relative to the RAM savings during construction. If we allow non-optimal "transition graphs" then in theory we could also build FSTs in parallel: just build them independently from different prefix blocks. Yes, it wouldn't be optimal, but it if we don't care then so be it. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org