On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
<bernd.fehl...@uni-bielefeld.de> wrote:
> While trying out some synonyms.txt files I noticed a huge increase in
> heap usage.
>
> synonyms_1.txt --> 6645 lines (2826104 bytes in size)
> results in 66364 entries in SynonymMap with 730MB heap usage.
> Startup time about 2 minutes.
>
> synonyms_2.txt --> 6645 lines (5384884 bytes in size)
> results in 115168 entries in SynonymMap with 3.3GB heap usage.
> Startup time about 4 minutes.
>
> How large is your synonyms.txt?
>
> Are there any limitations (e.g. file size, number of synonyms, ...)?
>
> How do you deal with _really_ large numbers of synonyms?
>
> To the experts:
> Why not read synonyms straight from a file, just because memory is faster?
Hi, I think we should look at implementing synonyms with an FST, to reduce the RAM usage. I also think this would make it easier for us to minimize the number of captureState/restoreState calls the filter makes, because an FST would just be a more natural way to handle all the multi-word cases... this could actually speed up analysis time for this filter too.
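To make the idea concrete, here is a rough sketch (not a finished design, just an illustration): store each synonym term as a path in an FST whose output is the id of its synonym group, so terms share prefix/suffix structure in the FST's byte arcs instead of sitting as separate String keys in a HashMap. This uses the org.apache.lucene.util.fst API; exact class names and signatures have shifted across versions (Builder vs. FSTCompiler, the PositiveIntOutputs factory method), so treat the details loosely. The term-to-group mapping and class name here are hypothetical.

import java.io.IOException;
import java.util.Map;
import java.util.TreeMap;

import org.apache.lucene.util.BytesRef;
import org.apache.lucene.util.IntsRefBuilder;
import org.apache.lucene.util.fst.Builder;
import org.apache.lucene.util.fst.FST;
import org.apache.lucene.util.fst.PositiveIntOutputs;
import org.apache.lucene.util.fst.Util;

public class FstSynonymSketch {
  public static void main(String[] args) throws IOException {
    // Hypothetical input: each term mapped to the id of its synonym group.
    // The FST builder requires terms added in sorted order, hence the
    // TreeMap (String order matches UTF-8 byte order for ASCII terms).
    TreeMap<String, Long> termToGroup = new TreeMap<>();
    termToGroup.put("automobile", 1L);
    termToGroup.put("car", 1L);
    termToGroup.put("fast", 2L);
    termToGroup.put("quick", 2L);

    PositiveIntOutputs outputs = PositiveIntOutputs.getSingleton();
    Builder<Long> builder = new Builder<>(FST.INPUT_TYPE.BYTE1, outputs);
    IntsRefBuilder scratch = new IntsRefBuilder();
    for (Map.Entry<String, Long> e : termToGroup.entrySet()) {
      // Each term becomes a byte path in the FST; the group id is its output.
      builder.add(Util.toIntsRef(new BytesRef(e.getKey()), scratch), e.getValue());
    }
    FST<Long> fst = builder.finish();

    // Lookup walks the arcs byte by byte; returns null if the term is absent.
    Long group = Util.get(fst, new BytesRef("automobile"));
    System.out.println("automobile -> group " + group);
  }
}

The multi-word case is where this should really pay off: instead of capturing state on every token, the filter could walk the FST arcs token by token and only capture/restore when the walk shows a possible longer match in progress.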