[
https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17073959#comment-17073959
]
Dawid Weiss commented on LUCENE-9286:
-------------------------------------
Hi Bruno. Thank you for looking into it. The problem is not during construction
of the FST but later on, when the FST is used. Our algorithms keep a
significant number of arcs in memory. Previously they were cheap to keep, now
they are not: arc.copyOf copies the entire underlying bit table:
bq. What was previously fairly cheap (copyOf) has become fairly heavy and blows
up memory when you have data structures that require storing intermediate Arcs
during processing
I haven't looked into this, but if these bit tables are immutable once the FST
is constructed, then copyOf could just copy the reference. As a side note,
copyOf doesn't fully reset the state of an arc: for example, it doesn't clear
the bit table reference when the copied arc has no bit table.
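
To make the suggestion above concrete, here is a minimal sketch of the two copy
strategies. It does not use Lucene's classes: SketchArc, copyDeep and copyShared
are hypothetical names invented for illustration, and the sketch assumes the bit
table is never modified after the FST is built (which, as noted above, has not
been verified).

```java
/** Hypothetical stand-in for an FST arc; this is NOT Lucene's FST.Arc. */
final class SketchArc {
    int label;
    long target;
    // Assumption: immutable once the automaton is built, so it may be shared.
    // Null when the arc's node has no bit table.
    byte[] bitTable;

    /** Deep copy: every retained arc holds its own duplicate of the table. */
    SketchArc copyDeep(SketchArc other) {
        label = other.label;
        target = other.target;
        bitTable = (other.bitTable == null) ? null : other.bitTable.clone();
        return this;
    }

    /**
     * Shared copy: only the reference is copied. The unconditional assignment
     * also clears a stale table when the source arc has none.
     */
    SketchArc copyShared(SketchArc other) {
        label = other.label;
        target = other.target;
        bitTable = other.bitTable;
        return this;
    }
}

final class CopySketch {
    public static void main(String[] args) {
        SketchArc source = new SketchArc();
        source.bitTable = new byte[1024];

        SketchArc deep = new SketchArc().copyDeep(source);
        SketchArc shared = new SketchArc().copyShared(source);

        // The deep copy allocates a second table; the shared copy does not.
        System.out.println("deep copy shares table:   " + (deep.bitTable == source.bitTable));
        System.out.println("shared copy shares table: " + (shared.bitTable == source.bitTable));
    }
}
```

With many intermediate arcs retained during processing, the deep variant
allocates one table per retained arc, while the shared variant adds only a
reference per arc. Sharing is safe only if nothing mutates the table after
construction.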
> FST construction explodes memory in BitTable
> --------------------------------------------
>
> Key: LUCENE-9286
> URL: https://issues.apache.org/jira/browse/LUCENE-9286
> Project: Lucene - Core
> Issue Type: Bug
> Affects Versions: 8.5
> Reporter: Dawid Weiss
> Assignee: Bruno Roustant
> Priority: Major
> Attachments: screen-[1].png
>
>
> I see a dramatic increase in the amount of memory required for construction
> of (arguably large) automata. It currently OOMs with 8GB of memory consumed
> for bit tables. I am pretty sure this didn't require so much memory before
> (the automaton is ~50MB after construction).
> Something bad happened in between. Thoughts, [~broustant], [~sokolov]?