Well, you can always throw more replicas at the problem as well.

But Andrea's comment is spot on. When Solr stores a field, it
compresses it. So to fetch the stored info, it has to:
1> seek to the data on disk
2> decompress at least a 16K block
3> assemble the response.

All the while perhaps causing memory to be consumed, adding to GC
issues and the like.

One possibility is to implement a doc transformer. See the class
ValueAugmenterFactory for a model. For each doc returned in the result
set, Solr calls the transformer's transform method.
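
A very rough sketch of what such a factory could look like (the class name,
the [kvstore] alias and the lookup helper below are all made up, and the
exact transform() signature varies a bit between Solr versions, so treat it
as a starting point rather than finished code):

import java.io.IOException;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class ExternalStoreAugmenterFactory extends TransformerFactory {

  @Override
  public DocTransformer create(String field, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() {
        return field;
      }

      // Called once per document in the result set; attach the large
      // payload fetched from wherever you keep it (Redis, another core...).
      @Override
      public void transform(SolrDocument doc, int docid) throws IOException {
        Object id = doc.getFirstValue("id");
        doc.setField(field, lookupPayload(id));
      }

      // Some Solr versions pass the score as well; just delegate.
      public void transform(SolrDocument doc, int docid, float score) throws IOException {
        transform(doc, docid);
      }
    };
  }

  private String lookupPayload(Object id) {
    return null; // placeholder for your KV-store client call
  }
}

You register the factory in solrconfig.xml with a <transformer name="kvstore" .../>
element and ask for it per request with something like fl=id,big_field:[kvstore].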

Another approach would be to keep only the first, say, 1K characters in
Solr and return just _that_, along with a link to the full doc that you
fetch from another store, or indeed from Solr itself since that would
only be one doc at a time. If you put the truncated value in as a string
type with docValues=true you would avoid most of the disk
seek/decompression issues.
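
For the indexing side, a minimal SolrJ sketch of that idea might look like
the following (the field names, the 1K cutoff and the URL are placeholders;
the preview field would be declared in the schema as a string type with
docValues=true):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class PreviewIndexer {
  public static void main(String[] args) throws Exception {
    // Point this at your collection; CloudSolrClient works just as well.
    try (SolrClient client =
        new HttpSolrClient.Builder("http://localhost:8983/solr/mycollection").build()) {

      String fullJson = "...the big API response for this document...";

      SolrInputDocument doc = new SolrInputDocument();
      doc.addField("id", "some-doc-id");
      // Keep only a short preview in Solr; the full payload lives elsewhere
      // (or is fetched from Solr one document at a time when actually needed).
      doc.addField("_store_preview",
          fullJson.substring(0, Math.min(1024, fullJson.length())));
      doc.addField("_store_url", "https://example.org/api/doc/some-doc-id");

      client.add(doc);
      client.commit();
    }
  }
}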

Best,
Erick

On Mon, Jun 4, 2018 at 12:27 PM, Andrea Gazzarini <a.gazzar...@sease.io> wrote:
> Hi Sam, I have been in a similar scenario (not recently, so my answer could
> be outdated). As far as I remember, caching, at least in that scenario,
> didn't help much, probably because of the field size.
>
> So we went with the second option: a custom SearchComponent connected with
> Redis. I'm not aware of such a component being available anywhere but, trust
> me, it's a very easy thing to write.
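>
> For what it's worth, the skeleton is roughly the following (class and field
> names are made up, Jedis is just one possible Redis client, and the exact
> SearchComponent API details vary a little between Solr versions):
>
> import java.io.IOException;
> import org.apache.solr.common.util.NamedList;
> import org.apache.solr.handler.component.ResponseBuilder;
> import org.apache.solr.handler.component.SearchComponent;
> import org.apache.solr.search.DocIterator;
> import org.apache.solr.search.SolrIndexSearcher;
> import redis.clients.jedis.Jedis;
>
> public class RedisStoreComponent extends SearchComponent {
>   // Placeholder: a real component would configure and pool this in init().
>   private final Jedis redis = new Jedis("localhost");
>
>   @Override
>   public void prepare(ResponseBuilder rb) {
>     // nothing to do before the query runs
>   }
>
>   @Override
>   public void process(ResponseBuilder rb) throws IOException {
>     SolrIndexSearcher searcher = rb.req.getSearcher();
>     NamedList<Object> payloads = new NamedList<>();
>     DocIterator it = rb.getResults().docList.iterator();
>     while (it.hasNext()) {
>       int docid = it.nextDoc();
>       String id = searcher.doc(docid).get("id");
>       payloads.add(id, redis.get("store:" + id)); // payload keyed by doc id
>     }
>     rb.rsp.add("stored_docs", payloads); // extra section in the Solr response
>   }
>
>   @Override
>   public String getDescription() {
>     return "Fetches large per-document payloads from Redis";
>   }
> }
>
> You register it as a <searchComponent> in solrconfig.xml and add it to the
> components of your request handler.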
>
> Best,
> Andrea
>
> On Mon, 4 Jun 2018, 20:45 Sambhav Kothari, <samb...@metabrainz.org> wrote:
>
>> Hi everyone,
>>
>> We at MetaBrainz are trying to scale our SolrCloud instance but are
>> hitting a bottleneck.
>>
>> Each of the documents in our Solr index is accompanied by a '_store' field
>> that stores our API-compatible response for that document (which is
>> basically parsed and displayed by our custom response writer).
>>
>> The main problem is that this field is very large (it takes up to 60-70% of
>> our index) and because of this, Solr is struggling to keep up with our
>> required reqs/s.
>>
>> Any ideas on how to improve upon this?
>>
>> I have a couple of options in mind -
>>
>> 1. Use caches extensively.
>> 2. Have Solr return only a doc id and fetch the response string from a KV
>> store/fast db.
>>
>> About 2 - are there any Solr plugins that will allow me to do this?
>>
>> Thanks,
>> Sam
>>
