[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17064815#comment-17064815 ]
Dawid Weiss commented on LUCENE-9286:
-------------------------------------

I confirm my original problem (memory blowup) is related to stored copies of arcs. What was previously fairly cheap (copyOf) has become fairly heavy and blows up memory when you have data structures that require storing intermediate Arcs during processing.

I also noticed something else that worries me. We have very specific FSTs that are shallow (4-8 levels) but have a very high fan-out on arc labels (labels are ints). I don't know if this is related at all, but when I timed automaton construction and traversals I saw a significant slowdown.

I created a snippet of code that rebuilds the automaton and does a TermEnum enumeration scan with IntsRefFSTEnum; the "Arc transition" entry below is a bit more complex code walking the FST. With the default oversizing factor (1) the results are:

{code}
[Task]            [Time]    [%]    [+T₀]
FST construction  7s        42.3%  0ms
  @ FST RAM: [52.40MB allocated, 52.40MB utilized (100.0 %)]
  @ Oversizing factor: 1.00
TermEnum scan     4s 260ms  25.1%  7s
Arc transition    5s        32.6%  11s
{code}

Recompiled with the oversizing factor of 0 the results are:

{code}
[Task]            [Time]    [%]    [+T₀]
FST construction  2s 957ms  60.1%  0ms
  @ FST RAM: [53.46MB allocated, 53.46MB utilized (100.0 %)]
  @ Oversizing factor: 0.00
TermEnum scan     298ms     6.1%   2s
Arc transition    1s 663ms  33.8%  3s
{code}

This is fairly consistent across runs. The automaton is consistently faster to create and walk if setDirectAddressingMaxOversizingFactor is set to 0. The automaton is also not much larger (53.46MB compared to 52.4MB). I don't know how specific this is to the kind of automata we're building and I can't offer much in terms of improving this situation. I can share the automaton if you guys would like to take a closer look.

One other lesson from dealing with FST code is that mutable Arc classes make everything much more complex and error-prone...
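The point about mutable arcs and the cost of copying them can be illustrated with a minimal sketch. It uses a toy `Arc` class, not Lucene's `FST.Arc`: the `bitTable` field and the `collectAliased`/`collectCopied` helpers are hypothetical names invented for this example. The assumption is only that a walker reuses one arc instance and overwrites it in place, so any code that stores intermediate arcs has to copy them, and each copy costs as much as the state the arc carries.

```java
import java.util.ArrayList;
import java.util.List;

public class MutableArcDemo {

    /** Toy stand-in for a mutable FST arc (hypothetical, not Lucene's FST.Arc). */
    public static final class Arc {
        public int label;
        public long target;
        public long[] bitTable = new long[0]; // hypothetical per-arc state

        /** Deep copy: the cost grows with the size of bitTable. */
        public Arc copyOf(Arc other) {
            label = other.label;
            target = other.target;
            bitTable = other.bitTable.clone();
            return this;
        }
    }

    /** Buggy: stores the same reused instance for every step of the walk. */
    public static List<Arc> collectAliased(int[] labels) {
        Arc scratch = new Arc();
        List<Arc> path = new ArrayList<>();
        for (int label : labels) {
            scratch.label = label; // the walker overwrites the arc in place
            path.add(scratch);     // every list entry aliases the same object
        }
        return path;
    }

    /** Correct, but allocates a full copy for each stored arc. */
    public static List<Arc> collectCopied(int[] labels) {
        Arc scratch = new Arc();
        List<Arc> path = new ArrayList<>();
        for (int label : labels) {
            scratch.label = label;
            path.add(new Arc().copyOf(scratch));
        }
        return path;
    }

    public static void main(String[] args) {
        int[] labels = {10, 20, 30};
        System.out.println(collectAliased(labels).get(0).label); // prints 30
        System.out.println(collectCopied(labels).get(0).label);  // prints 10
    }
}
```

With an immutable arc the aliased variant could not go wrong, at the price of allocating a new object on every transition.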
I don't know what the performance penalty would be for giving up mutability here, but it would definitely help in tracking down odd cases like this one.

> FST construction explodes memory in BitTable
> --------------------------------------------
>
> Key: LUCENE-9286
> URL: https://issues.apache.org/jira/browse/LUCENE-9286
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 8.5
> Reporter: Dawid Weiss
> Assignee: Dawid Weiss
> Priority: Major
> Attachments: screen-[1].png
>
>
> I see a dramatic increase in the amount of memory required for construction
> of (arguably large) automata. It currently OOMs with 8GB of memory consumed
> for bit tables. I am pretty sure this didn't require so much memory before
> (the automaton is ~50MB after construction).
> Something bad happened in between. Thoughts, [~broustant], [~sokolov]?

--
This message was sent by Atlassian Jira (v8.3.4#803005)