Hi Erick,

Thanks for your input.

A while back we made a conscious decision to skip the SolrJ client and use
plain HTTP. If I recall correctly, it was because at the time the SolrJ
client was queueing updates in memory.

Nonetheless, we will give the latest SolrJ client + CloudSolrServer a
try.
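
For reference, here is a rough sketch of what the switch might look like,
assuming SolrJ 5.x; the ZooKeeper addresses, collection name, and field names
below are placeholders for our actual setup:

```java
// Sketch: replacing plain-HTTP updates with CloudSolrClient (SolrJ 5.x).
// ZK hosts, collection name, and fields are placeholders, not our real config.
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.util.ArrayList;
import java.util.List;

public class CloudIndexer {
    public static void main(String[] args) throws Exception {
        // Point the client at ZooKeeper so it can route batches directly
        // to the shard leader instead of bouncing off a random node.
        CloudSolrClient client =
                new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181");
        client.setDefaultCollection("mycollection");

        // Batch of 1,000 docs, per Erick's suggested starting point.
        List<SolrInputDocument> batch = new ArrayList<>();
        for (int i = 0; i < 1000; i++) {
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-" + i);
            doc.addField("value_s", "payload-" + i);
            batch.add(doc);
        }
        client.add(batch);  // sent to the leader, saving a network hop
        client.close();
    }
}
```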

* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there
are a lot of minor GCs, sometimes adding up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for insert/updates.
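
To answer the GC-logging question, we could confirm whether any of those
minor GCs are stop-the-world pauses with flags along these lines (a sketch
for a Java 7/8 JVM; the log path and launch command are placeholders for
our actual startup script):

```shell
# Hypothetical Solr launch with GC pause logging enabled; adjust
# the heap, log path, and start command to the real environment.
java -Xms12g -Xmx12g -XX:+UseG1GC \
     -Xloggc:/var/log/solr/gc.log \
     -XX:+PrintGCDetails \
     -XX:+PrintGCDateStamps \
     -XX:+PrintGCApplicationStoppedTime \
     -jar start.jar
```

`PrintGCApplicationStoppedTime` records how long application threads were
actually paused, which is the number that matters for ZK timeouts.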

I will make the SolrJ client change first and test it.

Thanks
Vinay

On 3 May 2015 at 09:37, Erick Erickson <erickerick...@gmail.com> wrote:

> First, you shouldn't be using HttpSolrClient, use CloudSolrServer
> (CloudSolrClient in 5.x). That takes
> the ZK address and routes the docs to the leader, reducing the network
> hops docs have to go
> through. AFAIK, in cloud setups it is in every way superior to http.
>
> I'm guessing your docs aren't huge. You haven't really told us what
> "high indexing rates" and
> "high query rates" are in your environment, so it's hard to say much.
> For comparison I get
> 2-3K docs/sec on my laptop (no query load though).
>
> The most frequent problem for nodes going into recovery in this
> scenario is the ZK timeout
> being exceeded. This is often triggered by excessive GC pauses, some
> more details would
> help here:
>
> How much memory are you allocating to Solr? Have you turned on GC
> logging to see whether
> you're getting "stop the world" GC pauses? What rates _are_ you seeing?
>
> Personally, I'd concentrate on the nodes going into recovery before
> anything else. Until that's
> fixed any other things you do will not be predictive of much.
>
> BTW, I typically start with batch sizes of 1,000 FWIW. Sometimes
> that's too big, sometimes
> too small but it seems pretty reasonable most of the time.
>
> Best,
> Erick
>
> On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> > Hello,
> >
> > I have a usecase with the following characteristics:
> >
> >  - High index update rate (adds/updates)
> >  - High query rate
> >  - Low index size (~800MB for 2.4Million docs)
> >  - The documents that are created at the high rate eventually "expire"
> and
> > are deleted regularly at half hour intervals
> >
> > I currently have a solr cloud set up with 1 shard and 4 replicas.
> >  * My index updates are sent to a VIP/loadbalancer (round robins to one
> of
> > the 4 solr nodes)
> >  * I am using http client to send the updates
> >  * Using batch size of 100 and 8 to 10 threads sending the batch of
> updates
> > to solr.
> >
> > When I try to run tests to scale out the indexing rate, I see the
> following:
> >  * solr nodes go into recovery
> >  * updates are taking really long to complete.
> >
> > As I understand, when a node receives an update:
> >  * If it is the leader, it forwards the update to all the replicas and
> > waits until it receives a reply from all of them before responding to
> > the client that sent the request.
> >  * If it is not the leader, it forwards the update to the leader, which
> > THEN does the above steps mentioned.
> >
> > How do I go about scaling the index updates:
> >  * As I add more replicas, my updates would get slower and slower?
> >  * Is there a way I can configure the leader to wait for say N out of M
> > replicas only?
> >  * Should I be targeting the updates to only the leader?
> >  * Any other approach I should be considering?
> >
> > Thanks
> > Vinay
>