mikemccand commented on PR #12633: URL: https://github.com/apache/lucene/pull/12633#issuecomment-1753705229
Translating/merging the above two tables into a graph:

Some observations:

* The PR mostly uses less RAM to make the same size FST, yay!
* It is a smoother, more predictable, monotonic tradeoff: the larger the `NodeHash` size, the smaller the FST. Whereas on `main`, using the god-like parameters, it's more dicey/spiky/unpredictable. It's like you are the co-pilot trying to land a 747 alone using only toothpicks.
* At the "spend all the RAM necessary to get a truly minimal FST" end (the right of the chart) the PR looks like it uses a bit more RAM than `main`. I think I can improve on this by not wastefully using `long[]` but rather one of Lucene's many cool bit-packing dynamic/growable array thingys, like `main` does for its `NodeHash`. Or maybe @msokolov's idea to somehow do a reversed suffix lookup against the growing FST. I'll try that.
* Bang for the buck tapers off like you'd expect: the early MB of RAM you spend have a bigger payoff in reducing the FST size, while later MB of RAM have less and less impact. This is nice 80/20-like behavior...
* With the PR, you unfortunately cannot easily say "give me a minimal FST at all costs", like you can with `main` today. You'd have to keep trying larger and larger `NodeHash` sizes until the final FST size gets no smaller. I don't really like this regression -- I'll think about how to somehow keep that capability in the PR. E.g. we would want to use this option when compiling FSTs for Kuromoji, or users may want this when compiling synonym maps.
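The "keep trying larger and larger `NodeHash` sizes" workaround from the last bullet could look roughly like the sketch below. It is only an illustration, not code from the PR: `findSmallestUsefulHashSize` and the `buildFstAndGetSize` callback are hypothetical names, the callback stands in for "compile the FST with this node-hash budget and return its size in bytes", and the doubling schedule is an assumption. It also relies on the monotonic behavior noted above, since it stops at the first size that brings no further shrinkage.

```java
import java.util.function.LongUnaryOperator;

public final class MinimalFstSearch {

  /**
   * Doubles the node-hash budget until the resulting FST stops shrinking.
   * buildFstAndGetSize is a hypothetical callback: given a hash size, it
   * builds the FST and returns the final FST size in bytes.
   */
  public static long findSmallestUsefulHashSize(
      LongUnaryOperator buildFstAndGetSize, long startSize, long maxSize) {
    if (startSize < 1 || maxSize < startSize) {
      throw new IllegalArgumentException("need 1 <= startSize <= maxSize");
    }
    long bestHashSize = startSize;
    long bestFstSize = buildFstAndGetSize.applyAsLong(startSize);
    // The bound maxSize / 2 keeps the doubling step from overflowing.
    for (long hashSize = startSize; hashSize <= maxSize / 2; ) {
      hashSize *= 2;
      long fstSize = buildFstAndGetSize.applyAsLong(hashSize);
      if (fstSize >= bestFstSize) {
        break; // no further shrinkage: treat the previous result as minimal
      }
      bestFstSize = fstSize;
      bestHashSize = hashSize;
    }
    return bestHashSize;
  }

  public static void main(String[] args) {
    // Toy stand-in for a real build: size halves as the hash doubles,
    // then bottoms out at 1000 bytes.
    LongUnaryOperator toyBuild = h -> Math.max(1_000L, 64_000L / h);
    System.out.println(findSmallestUsefulHashSize(toyBuild, 1, 1L << 20));
  }
}
```

Each probe is a full FST build, so this costs several builds instead of one, which is part of why a direct "minimal at all costs" option would still be nicer.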
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org