The Streaming API may have been throwing exceptions because JSON special characters in the response were not being escaped. This was fixed in Solr 6.0.
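For reference, reading a TupleStream from SolrJ looks roughly like the sketch below. This is a minimal example assuming the SolrJ 6.x API; the ZooKeeper host string, collection name, and field names are placeholders, not values from this thread.

import java.io.IOException;

import org.apache.solr.client.solrj.io.SolrClientCache;
import org.apache.solr.client.solrj.io.Tuple;
import org.apache.solr.client.solrj.io.stream.CloudSolrStream;
import org.apache.solr.client.solrj.io.stream.StreamContext;
import org.apache.solr.common.params.ModifiableSolrParams;

public class StreamRead {
  public static void main(String[] args) throws IOException {
    // Placeholders: point these at your own cluster and collection.
    String zkHost = "zk1:2181,zk2:2181,zk3:2181";
    String collection = "mycollection";

    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", "*:*");
    params.set("fl", "id");
    params.set("sort", "id asc");
    params.set("qt", "/export"); // stream the full sorted result set

    CloudSolrStream stream = new CloudSolrStream(zkHost, collection, params);
    StreamContext context = new StreamContext();
    SolrClientCache clientCache = new SolrClientCache();
    context.setSolrClientCache(clientCache);
    stream.setStreamContext(context);

    try {
      stream.open();
      while (true) {
        // Pre-6.0, unescaped JSON in the response could make this read fail.
        Tuple tuple = stream.read();
        if (tuple.EOF) {
          break;
        }
        System.out.println(tuple.getString("id"));
      }
    } finally {
      stream.close();
      clientCache.close();
    }
  }
}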
Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Dec 16, 2016 at 4:34 PM, Chetas Joshi <chetas.jo...@gmail.com> wrote:

> Hello,
>
> I am running Solr 5.5.0. It is a SolrCloud of 50 nodes, and I have the
> following config for all the collections:
>
> maxShardsPerNode: 1
> replicationFactor: 1
>
> I was using the Streaming API to get results back from Solr. It worked
> fine for a while, until the index data size grew beyond 40 GB per shard
> (i.e. per node). At that point it started throwing JSON parsing exceptions
> while reading the TupleStream data. FYI: I have other services (YARN,
> Spark) deployed on the same boxes on which the Solr shards are running.
> The Spark jobs also use a lot of disk cache, so the free disk cache
> available on each box varies a lot depending on what else is running
> there.
>
> Because of this issue, I moved to the cursor approach. It works fine,
> but as we all know it is much slower than the streaming approach.
>
> Currently the index size per shard is 80 GB (the machine has 512 GB of
> RAM, shared by different services/programs: heap, off-heap, and the disk
> cache requirements).
>
> When there is enough RAM available on the machine (more than 80 GB, so
> that all the index data can fit in memory), the Streaming API succeeds
> without running into any exceptions.
>
> Questions:
> How does the index data caching mechanism (for HDFS) differ between the
> Streaming API and the cursorMark approach?
> Why does the cursor work every time, while streaming works only when
> there is a lot of free disk cache?
>
> Thank you.
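For comparison, the cursorMark approach mentioned above is typically driven by a loop like the following sketch, assuming the SolrJ 5.5 API (where CloudSolrClient is built directly from a ZooKeeper host string); again, the hosts, collection, and fields are placeholders.

import java.io.IOException;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class CursorRead {
  public static void main(String[] args) throws SolrServerException, IOException {
    // Placeholders: point these at your own cluster and collection.
    CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
    client.setDefaultCollection("mycollection");

    SolrQuery query = new SolrQuery("*:*");
    query.setFields("id");
    query.setRows(1000);
    // Cursors require a stable sort that includes the uniqueKey field.
    query.setSort(SolrQuery.SortClause.asc("id"));

    String cursorMark = CursorMarkParams.CURSOR_MARK_START;
    while (true) {
      query.set(CursorMarkParams.CURSOR_MARK_PARAM, cursorMark);
      QueryResponse rsp = client.query(query);
      for (SolrDocument doc : rsp.getResults()) {
        System.out.println(doc.getFieldValue("id"));
      }
      String next = rsp.getNextCursorMark();
      if (cursorMark.equals(next)) {
        break; // the cursor stops advancing once the result set is exhausted
      }
      cursorMark = next;
    }
    client.close();
  }
}

Unlike the /export-based stream, each cursor page is a normal paginated query, which may explain why it is slower but less sensitive to disk cache pressure.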