Re: Synonym filters memory usage

Erick Erickson Mon, 30 Sep 2019 04:38:41 -0700

Solr/Lucene had _better_ not have a copy of the synonym map for every segment; 
if it does, that’s a JIRA for sure. I’ve seen indexes with 100s of segments. 
With a large synonym file it’d be terrible.
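
To put rough numbers on it, here is a back-of-the-envelope sketch of the two 
hypotheses in this thread, using Bernd's figures quoted below (2 cores, 3 
segments per core, 2 field types with a synonym filter). The function names and 
the 100-segment case are mine, purely for illustration; this is arithmetic, not 
a description of what Solr actually does.

```python
def maps_if_per_type(cores: int, synonym_field_types: int) -> int:
    """One SynonymMap per field type per core (what Andrea expects)."""
    return cores * synonym_field_types


def maps_if_per_segment(cores: int, segments_per_core: int,
                        synonym_field_types: int) -> int:
    """One SynonymMap per field type per open segment (what Bernd reports)."""
    return cores * segments_per_core * synonym_field_types


# Bernd's setup: 2 cores, 3 segments each, 2 synonym field types.
print(maps_if_per_type(2, 2))          # 4
print(maps_if_per_segment(2, 3, 2))    # 12

# Hypothetical index with 100 segments per core (assumed figure).
print(maps_if_per_segment(2, 100, 2))  # 400
```

The per-type count stays flat as segments pile up, while the per-segment count 
grows with every flush until a merge happens.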


I would be really, really, really surprised if this is the case. The Lucene 
people are very careful with memory usage and would hop on this in an instant 
if it were true, I’d guess.
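
What I would expect is something like the toy model below: the factory is built 
once when the schema loads, and every filter it hands out just holds a 
reference to the same map. This is plain Python with made-up class names 
(`ToySynonymFactory`, `ToySynonymFilter`), not the real Lucene classes, so 
treat it as a picture of the expected lifecycle rather than as what the code 
does.

```python
class ToySynonymFactory:
    """Hypothetical stand-in for a per-field-type filter factory."""

    def __init__(self, synonym_lines):
        # Parsed once, when the (toy) schema is loaded.
        self.synonym_map = self._parse(synonym_lines)

    @staticmethod
    def _parse(lines):
        # Map each term to the other terms on the same comma-separated line.
        table = {}
        for line in lines:
            terms = [t.strip() for t in line.split(",") if t.strip()]
            for term in terms:
                table[term] = [t for t in terms if t != term]
        return table

    def create(self):
        # Each new filter borrows the factory's map; nothing is copied.
        return ToySynonymFilter(self.synonym_map)


class ToySynonymFilter:
    """Hypothetical stand-in for a filter instance."""

    def __init__(self, synonym_map):
        self.synonym_map = synonym_map


factory = ToySynonymFactory(["tv, television", "car, automobile"])

# Pretend one filter is created per segment per field.
filters = [factory.create() for _ in range(6)]

# Count how many distinct map objects the six filters refer to.
distinct_maps = {id(f.synonym_map) for f in filters}
print(len(distinct_maps))
```

In this model the count of distinct maps does not depend on how many filters 
are created. If a heap dump of a real instance shows one map per segment, 
something is breaking that pattern and it deserves a closer look.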

Best,
Erick

> On Sep 30, 2019, at 5:27 AM, Andrea Gazzarini <a.gazzar...@sease.io> wrote:
> 
> That sounds really strange to me. 
> Segments are created gradually depending on changes applied to the index, 
> while the Schema should have a completely different lifecycle, independent 
> from that.
> If that is true, that would mean each time a new segment is created Solr 
> would instantiate a new Schema instance (or at least, assuming this is valid 
> only for synonyms, one SynonymFilterFactory, one SynonymFilter, one 
> SynonymMap), which again, sounds really strange.
> 
> Thanks for the point, I'll check and I'll let you know
> 
> Cheers, 
> Andrea
> 
> On 30/09/2019 09:58, Bernd Fehling wrote:
>> Yes, I think so. 
>> While integrating a thesaurus as synonyms.txt I saw massive memory usage. 
>> A heap dump analyzed with MemoryAnalyzer showed that the SynonymMap was 
>> held 3 times, each copy taking a huge amount of memory, one for each 
>> open index segment. 
>> Just try it and check for yourself with a heap dump and MemoryAnalyzer. 
>> 
>> Regards 
>> Bernd 
>> 
>> 
>> Am 30.09.19 um 09:44 schrieb Andrea Gazzarini: 
>>> mmm, ok for the core but are you sure things in this case are working 
>>> per-segment? I would expect a FilterFactory instance per index, initialized 
>>> at schema loading time. 
>>> 
>>> On 30/09/2019 09:04, Bernd Fehling wrote: 
>>>> And I think this is per core per index segment. 
>>>> 
>>>> With 2 cores per instance and 3 index segments per core, that is 6 times 
>>>> the 2 SynonymMaps, i.e. 12 SynonymMaps in total. 
>>>> 
>>>> Regards 
>>>> Bernd 
>>>> 
>>>> 
>>>> Am 30.09.19 um 08:41 schrieb Andrea Gazzarini: 
>>>>> Hi, 
>>>>> looking at the stateful nature of the SynonymGraphFilter/FilterFactory 
>>>>> classes, the answer should be 2 times (one time per type instance). 
>>>>> The SynonymMap, which internally holds the synonyms table, is a private 
>>>>> member of the filter factory and it is loaded each time the factory needs 
>>>>> to create a type. 
>>>>> 
>>>>> Best, 
>>>>> Andrea 
>>>>> 
>>>>> On 29/09/2019 23:49, Dominique Bejean wrote: 
>>>>> 
>>>>> Hi, 
>>>>> 
>>>>> My concern is about the memory used by the synonym filter, especially if 
>>>>> the synonym resource files are large. 
>>>>> 
>>>>> Suppose my schema has two field types, "TypeSyno1" and "TypeSyno2", both 
>>>>> using a synonym filter with the same synonym file. 
>>>>> Each of these two field types is used by two fields: 
>>>>> 
>>>>> Field1 type is TypeSyno1 
>>>>> Field2 type is TypeSyno1 
>>>>> Field3 type is TypeSyno2 
>>>>> Field4 type is TypeSyno2 
>>>>> 
>>>>> How many times is the synonym file loaded in memory? 
>>>>> 4 times, i.e. one time per field? 
>>>>> 2 times, i.e. one time per instantiated type? 
>>>>> 
>>>>> Regards 
>>>>> 
>>>>> Dominique 
>>> 
> 
> -- 
> Andrea Gazzarini
> Search Consultant, R&D Software Engineer
> 
> 
> 
> mobile: +39 349 513 86 25
> email: a.gazzar...@sease.io 
>
