[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17078075#comment-17078075 ]
Dawid Weiss commented on LUCENE-9286:
-------------------------------------

I love the patch and the idea, Bruno. But the cost of construction and traversal has gone way up for me. On a slower-ish Dell laptop I get the following on master (same code as in that tiny repo before):

{code}
FST construction (of=0.8)   2s 299ms    9.4%   11s
TermEnum scan (of=0.8)      391ms       1.6%   14s
FST construction (of=1.0)   5s         24.0%   14s
TermEnum scan (of=1.0)      3s 938ms   16.1%   20s
{code}

whereas after the patch, compilation and enumeration go up to a whopping 35+ seconds!

{code}
FST construction (of=0.8)   2s 357ms    2.6%   13s
TermEnum scan (of=0.8)      457ms       0.5%   16s
FST construction (of=1.0)   37s        41.8%   16s
TermEnum scan (of=1.0)      35s        39.6%   54s
{code}

This is a bit strange, isn't it? I don't think I've changed anything regarding FST parameters:

https://github.com/dweiss/lucene9286/commit/1fb899e018712e9637984d95937c50b4bc9ffa97#diff-32e7e4311056421eef9f3a87a4ec51f7R56-R66

I'm not sure where the difference comes from and why it's so slow -- I'll try to get an execution profile later today.

> FST arc.copyOf clones BitTables and this can lead to excessive memory use
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-9286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9286
>             Project: Lucene - Core
>          Issue Type: Bug
>   Affects Versions: 8.5
>            Reporter: Dawid Weiss
>            Assignee: Bruno Roustant
>            Priority: Major
>         Attachments: screen-[1].png
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> I see a dramatic increase in the amount of memory required for construction
> of (arguably large) automata. It currently OOMs with 8GB of memory consumed
> for bit tables. I am pretty sure this didn't require so much memory before
> (the automaton is ~50MB after construction).
> Something bad happened in between. Thoughts, [~broustant], [~sokolov]?