[GitHub] [lucene] dweiss commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

via GitHub Fri, 08 Sep 2023 06:50:39 -0700


dweiss commented on issue #12542:
URL: https://github.com/apache/lucene/issues/12542#issuecomment-1711706053


   With regard to automata/ FSTs - they're nearly the same thing, conceptually. 
Automata are logically transducers producing a constant epsilon value (no 
value). This knowledge can be used to make them smaller but they're the same 
animal, really.
   
   The root of the automaton/fst difference in Lucene is historically the 
source of code: brics package for constructing arbitrary automata, Daciuk/Mihov 
algorithm implementing minimal (at least at first) FST construction directly 
from sorted data (no intermediate "unoptimized" transitions).
   
   > Non-minimal FST means the index is a wee bit bigger, and perhaps lookups 
through the FST are a bit slower since we must have more bytes hot / fewer 
bytes cache local. But it's likely these effects are miniscule relative to the 
RAM savings during construction.
   
   If we allow non-optimal "transition graphs" then in theory we could also 
build FSTs in parallel: just build them independently from different prefix 
blocks. Yes, it wouldn't be optimal, but it if we don't care then so be it.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] dweiss commented on issue #12542: Lucene's FST Builder should have a simpler "knob" to trade off memory/CPU required against minimality

Reply via email to