Re: size of synonyms.txt

Bernd Fehling Wed, 22 Jun 2011 11:04:49 -0700

> On Wed, Jun 22, 2011 at 10:14 AM, Bernd Fehling
> <[email protected]> wrote:
> > While trying some synonyms.txt files I noticed a huge increase of heap
> > usage.
> >
> > synonyms_1.txt --> 6645 lines (2826104 bytes in size)
> > results in 66364 entries in SynonymMap with 730MB heap usage.
> > Startup time about 2 minutes.
> >
> > synonyms_2.txt --> 6645 lines (5384884 bytes in size)
> > results in 115168 entries in SynonymMap with 3.3GB heap usage.
> > Startup time about 4 minutes.
> >
> >
> > What is your size of synonyms.txt?
> >
> >
> > Any limitations (e.g. file size, number of synonyms, ...)?
> >
> >
> > How to deal with _really_ large numbers of synonyms?
> >
> >
> > To the experts:
> > Why not use synonyms directly from a file? Is it just because memory is faster?
> >
> 
> Hi,
> 
> I think we should look at implementing synonyms with an FST (finite state
> transducer) to reduce the RAM usage.
> I also think this would make it easier for us to minimize the number
> of captureState/restoreState calls that it makes,
> because it would just be a more natural way to handle all the
> multi-word cases... this could actually speed up the analysis time
> for this filter.


Wow, you can read between the lines ;-)
That's exactly what I have in mind.
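As a rough check on the heap figures quoted above, the cost per SynonymMap entry can be worked out directly. This sketch assumes binary units (1 MB = 2**20 bytes, 1 GB = 2**30 bytes), which the thread does not state. With either convention, total heap grows about 4.5 to 4.6 times while the entry count grows only about 1.7 times, so each entry from the second file costs roughly 2.6 to 2.7 times as much heap.

```python
# Heap figures as reported in the thread. Assumption: binary units,
# 1 MB = 2**20 bytes and 1 GB = 2**30 bytes (the thread does not say).
MB = 2**20
GB = 2**30

runs = {
    "synonyms_1.txt": {"lines": 6645, "file_bytes": 2_826_104,
                       "entries": 66_364, "heap_bytes": 730 * MB},
    "synonyms_2.txt": {"lines": 6645, "file_bytes": 5_384_884,
                       "entries": 115_168, "heap_bytes": 3.3 * GB},
}


def per_entry_heap(run):
    """Average heap bytes per SynonymMap entry."""
    return run["heap_bytes"] / run["entries"]


def ratio(key, a="synonyms_1.txt", b="synonyms_2.txt"):
    """How many times larger file b's figure is than file a's."""
    return runs[b][key] / runs[a][key]


for name, run in runs.items():
    print(f"{name}: {per_entry_heap(run) / 1024:.1f} KiB per entry")

print(f"entries grew {ratio('entries'):.2f}x, "
      f"file size {ratio('file_bytes'):.2f}x, heap {ratio('heap_bytes'):.2f}x")
```

Both files have the same number of lines, so the growth comes from longer lines expanding into more (and apparently heavier) map entries, not from more rules.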
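The quoted suggestion about handling multi-word cases more naturally can be illustrated with a token-level prefix tree: each path of words leads to the synonyms for that phrase, and the matcher walks the tree once per position and takes the longest match. This is a minimal sketch of the matching idea only, with hypothetical names (`SynonymTrie`, `add`, `longest_match`, `expand`). It is a plain trie, not an FST: an FST would additionally share common suffixes and pack the structure into compact arrays, which is where the RAM savings would come from, and that part is not shown.

```python
from typing import Dict, List, Optional, Tuple


class _Node:
    """One trie node; children are keyed by whole tokens, not characters."""
    __slots__ = ("children", "outputs")

    def __init__(self) -> None:
        self.children: Dict[str, "_Node"] = {}
        self.outputs: Optional[List[str]] = None  # synonyms if a phrase ends here


class SynonymTrie:
    """Hypothetical token-level trie for multi-word synonym lookup (not an FST)."""

    def __init__(self) -> None:
        self.root = _Node()

    def add(self, phrase: str, synonyms: List[str]) -> None:
        node = self.root
        for token in phrase.split():
            node = node.children.setdefault(token, _Node())
        if node.outputs is None:
            node.outputs = []
        node.outputs.extend(synonyms)

    def longest_match(self, tokens: List[str], start: int) -> Optional[Tuple[int, List[str]]]:
        """Return (tokens matched, synonyms) for the longest phrase beginning at start."""
        node = self.root
        best = None
        for i in range(start, len(tokens)):
            node = node.children.get(tokens[i])
            if node is None:
                break
            if node.outputs is not None:
                best = (i - start + 1, node.outputs)
        return best


def expand(trie: SynonymTrie, tokens: List[str]) -> List[List[str]]:
    """Walk the tokens once; at each position emit the longest match plus its synonyms."""
    out: List[List[str]] = []
    pos = 0
    while pos < len(tokens):
        match = trie.longest_match(tokens, pos)
        if match is None:
            out.append([tokens[pos]])
            pos += 1
        else:
            length, synonyms = match
            out.append([" ".join(tokens[pos:pos + length])] + synonyms)
            pos += length
    return out


trie = SynonymTrie()
trie.add("new york", ["ny"])
trie.add("new york city", ["nyc"])
trie.add("heap", ["memory"])
print(expand(trie, "new york city heap usage".split()))
# [['new york city', 'nyc'], ['heap', 'memory'], ['usage']]
```

The longest-match walk consumes a whole matched phrase in one step instead of buffering tokens and replaying them. A real token filter would also need to emit position increments and deal with overlapping matches, which this sketch ignores.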
