[ https://issues.apache.org/jira/browse/LUCENE-9286?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17141249#comment-17141249 ]
Tomoko Uchida commented on LUCENE-9286:
---------------------------------------

Thanks Robert and Mike for your comments.

bq. To get the benchmark to cover JapaneseAnalyzer (and the other CJK analyzers too, maybe?) we'd need to incorporate some documents that include text in ideographic scripts.

I can work on preparing the corpus, but I'm unusually busy for a while here; maybe I can start on it next month...

> FST arc.copyOf clones BitTables and this can lead to excessive memory use
> -------------------------------------------------------------------------
>
>                 Key: LUCENE-9286
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9286
>             Project: Lucene - Core
>          Issue Type: Bug
>    Affects Versions: 8.5
>            Reporter: Dawid Weiss
>            Assignee: Bruno Roustant
>            Priority: Major
>             Fix For: 8.6
>
>         Attachments: screen-[1].png
>
>          Time Spent: 1h 50m
>  Remaining Estimate: 0h
>
> I see a dramatic increase in the amount of memory required for construction
> of (arguably large) automata. It currently OOMs with 8GB of memory consumed
> for bit tables. I am pretty sure this didn't require so much memory before
> (the automaton is ~50MB after construction).
> Something bad happened in between. Thoughts, [~broustant], [~sokolov]?

--
This message was sent by Atlassian Jira (v8.3.4#803005)