On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
> While trying out some synonyms.txt files I noticed a huge increase in
> heap usage.
>
> synonyms_1.txt --> 6645 lines (2826104 bytes in size)
> results in 66364 entries in SynonymMap with 730MB heap usage.
> Startup time about 2 minutes.
>
> synonyms_2.txt --> 6645 lines (5384884 bytes in size)
> results in 115168 entries in SynonymMap with 3.3GB heap usage.
> Startup time about 4 minutes.
>
> How large is your synonyms.txt?
>
> Are there any limitations (e.g. file size, number of synonyms, ...)?
>
> How do you deal with _really_ large numbers of synonyms?
>
> To the experts:
> Why not read synonyms straight from a file, just because memory is faster?
Hi, I think we should look at implementing synonyms with an FST, to reduce the RAM usage. I also think this would make it easier for us to minimize the number of captureState/restoreState calls the filter makes, because an FST would just be a more natural way to handle all the multi-word cases... this could actually speed up analysis time for this filter too.
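To make the idea concrete, here is a rough sketch (not a finished design, just an illustration): store each synonym term as a path in an FST whose output is the id of its synonym group, so terms share prefix/suffix structure in the FST's byte arcs instead of sitting as separate String keys in a HashMap. This uses the org.apache.lucene.util.fst API; exact class names and signatures have shifted across versions (Builder vs. FSTCompiler, the PositiveIntOutputs factory method), so treat the details loosely. The term-to-group mapping and class name here are hypothetical.

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstSynonymSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical input: each term mapped to the id of its synonym group.
    // The FST builder requires terms added in sorted order, hence the
    // TreeMap (String order matches UTF-8 byte order for ASCII terms).
    TreeMap<String, Long> termToGroup = new TreeMap<>();
    termToGroup.put("automobile", 1L);
    termToGroup.put("car", 1L);
    termToGroup.put("fast", 2L);
    termToGroup.put("quick", 2L);

    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    IntsRefBuilder scratch = new IntsRefBuilder();
    for (Map.Entry<String, Long> e : termToGroup.entrySet()) {
      // Each term becomes a byte path in the FST; the group id is its output.
      builder.add(Util.toIntsRef(new BytesRef(e.getKey()), scratch), e.getValue());
    }
    FST<Long> fst = builder.finish();

    // Lookup walks the arcs byte by byte; returns null if the term is absent.
    Long group = Util.get(fst, new BytesRef("automobile"));
    System.out.println("automobile -> group " + group);
  }
}

The multi-word case is where this should really pay off: instead of capturing state on every token, the filter could walk the FST arcs token by token and only capture/restore when the walk shows a possible longer match in progress.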