[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17074612#comment-17074612 ]

Bruno Roustant edited comment on LUCENE-9286 at 4/3/20, 2:12 PM:
-----------------------------------------------------------------

Ok, now I understand better.

I found the cause of the performance slowdown in FSTEnum. It is specific to 
FSTEnum traversing direct-addressing nodes. I'll try to fix that.

The memory issue is indeed triggered only by calls to FST.Arc.copyFrom(). And the 
FST you provided is a worst case for it: the root node uses direct addressing 
with 500K arcs, so its bit table holds about 500K bits (~62K bytes, or 7.8K longs).
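A quick sanity check of these sizes, assuming the bit table reserves one bit per label over a range of about 500K labels and is packed into 64-bit longs. The class and helper names are hypothetical, not Lucene's API, and the copy count is an illustrative number, not a measurement:

```java
// Hypothetical helper, not part of Lucene: sizes a direct-addressing bit table.
class BitTableSize {
    // Number of 64-bit words needed to hold numBits bits, rounded up.
    static long longsForBits(long numBits) {
        return (numBits + 63) / 64;
    }

    public static void main(String[] args) {
        long labelRange = 500_000;                 // assumption: one bit per label
        long longs = longsForBits(labelRange);
        long bytes = longs * Long.BYTES;
        System.out.println("longs = " + longs);    // prints "longs = 7813"
        System.out.println("bytes = " + bytes);    // prints "bytes = 62504"

        // Illustrative only: every deep copy of such an Arc duplicates the
        // whole table, so memory grows linearly with the number of live copies.
        long copies = 100_000;
        System.out.println("bytes for " + copies + " deep copies = " + copies * bytes);
    }
}
```

The point of the arithmetic is that a single deep copy is cheap (~62KB), but any code path that keeps many deep copies of this root arc alive multiplies that cost.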

FSTEnum navigation and seek calls never invoke FST.Arc.copyFrom(), so they are 
not affected by the memory issue. copyFrom() is called by the fst.Util methods.

[~dweiss] What if we had two different methods: copyFrom(), which would deep 
copy the BitTable, and immutableCopyFrom(), which would create an immutable 
Arc sharing a reference to the same BitTable? An immutable Arc copy could not 
be passed as the modifiable parameter of the FST methods that modify an Arc in 
place; it would have to be deep copied first.
Would that fit your use case? How do you use the Arc copies in your code?
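A minimal sketch of the proposed split, using a hypothetical, simplified Arc class (not Lucene's FST.Arc; the long[] table, readNode() and isSet() are assumptions for illustration). copyFrom() deep-copies the table, immutableCopyFrom() shares the reference, and an attempt to modify an immutable copy in place is rejected until it has been deep copied:

```java
import java.util.Arrays;

// Hypothetical, simplified stand-in for an FST arc; NOT Lucene's FST.Arc.
final class Arc {
    private long[] bitTable;      // direct-addressing presence bits
    private boolean immutable;    // true for shallow copies that share a table

    Arc() { this(new long[0]); }

    Arc(long[] bitTable) { this.bitTable = bitTable; }

    // Deep copy: this arc gets its own private copy of the table.
    Arc copyFrom(Arc other) {
        this.bitTable = Arrays.copyOf(other.bitTable, other.bitTable.length);
        this.immutable = false;
        return this;
    }

    // Shallow copy: shares the table reference and is marked read-only.
    Arc immutableCopyFrom(Arc other) {
        this.bitTable = other.bitTable;
        this.immutable = true;
        return this;
    }

    // Stand-in for an FST method that repositions an Arc in place. It installs
    // a new table instead of writing into the old one, so shallow copies taken
    // earlier keep seeing a consistent table.
    void readNode(long[] newTable) {
        if (immutable) {
            throw new IllegalStateException("deep copy an immutable Arc before modifying it");
        }
        this.bitTable = newTable;
    }

    // Read-only access is safe on both kinds of copies.
    boolean isSet(int label) {
        return (bitTable[label >>> 6] & (1L << label)) != 0;
    }

    boolean isImmutable() { return immutable; }

    boolean sharesTableWith(Arc other) { return this.bitTable == other.bitTable; }
}
```

In this sketch, sharing is only safe because the shared array is never written to after the copy is taken; an immutable Arc that needs to be passed to a modifying method would go through new Arc().copyFrom(immutableArc) first.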

I'm debugging the code in fst.Util to see whether we could sometimes avoid the 
deep copy and use shallow immutable copies instead. I'd like to evaluate the 
current risk of an OOM with this code.

> FST construction explodes memory in BitTable
> --------------------------------------------
>
>                 Key: LUCENE-9286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9286
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.5
>            Reporter: Dawid Weiss
>            Assignee: Bruno Roustant
>            Priority: Major
>         Attachments: screen-[1].png
>
>
> I see a dramatic increase in the amount of memory required for construction 
> of (arguably large) automata. It currently OOMs with 8GB of memory consumed 
> for bit tables. I am pretty sure this didn't require so much memory before 
> (the automaton is ~50MB after construction).
> Something bad happened in between. Thoughts, [~broustant], [~sokolov]?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
