Hi Erick,

Thanks for your input.
I think we made a conscious decision long ago to skip the SolrJ client and use plain HTTP. It might have been because, at the time, the SolrJ client was queueing updates in its memory or something. Nonetheless, we will give the latest SolrJ client + CloudSolrServer a try.

* Yes, the documents are pretty small.
* We are using the G1 collector and there are no major GCs; however, there are a lot of minor GCs, sometimes adding up to 2s per minute overall.
* We are allocating 12G of memory.
* Query rate: 3750 TPS (transactions per second)
* I need to get the exact rate for inserts/updates.

I will make the SolrJ client change first and give it a test.

Thanks
Vinay

On 3 May 2015 at 09:37, Erick Erickson <erickerick...@gmail.com> wrote:

> First, you shouldn't be using HttpSolrClient; use CloudSolrServer
> (CloudSolrClient in 5.x). That takes the ZK address and routes the docs
> to the leader, reducing the network hops docs have to go through.
> AFAIK, in cloud setups it is in every way superior to plain HTTP.
>
> I'm guessing your docs aren't huge. You haven't really told us what
> "high indexing rates" and "high query rates" are in your environment,
> so it's hard to say much. For comparison, I get 2-3K docs/sec on my
> laptop (no query load though).
>
> The most frequent cause of nodes going into recovery in this scenario
> is the ZK timeout being exceeded. This is often triggered by excessive
> GC pauses, so some more details would help here:
>
> How much memory are you allocating to Solr? Have you turned on GC
> logging to see whether you're getting "stop the world" GC pauses? What
> rates _are_ you seeing?
>
> Personally, I'd concentrate on the nodes going into recovery before
> anything else. Until that's fixed, any other things you do will not be
> predictive of much.
>
> BTW, I typically start with batch sizes of 1,000 FWIW. Sometimes that's
> too big, sometimes too small, but it seems pretty reasonable most of
> the time.
>
> Best,
> Erick
>
> On Thu, Apr 30, 2015 at 12:20 PM, Vinay Pothnis <poth...@gmail.com> wrote:
> > Hello,
> >
> > I have a use case with the following characteristics:
> >
> > - High index update rate (adds/updates)
> > - High query rate
> > - Low index size (~800MB for 2.4 million docs)
> > - The documents that are created at the high rate eventually "expire"
> >   and are deleted regularly at half-hour intervals
> >
> > I currently have a SolrCloud setup with 1 shard and 4 replicas.
> > * My index updates are sent to a VIP/load balancer (round-robins to
> >   one of the 4 Solr nodes)
> > * I am using an HTTP client to send the updates
> > * Using a batch size of 100, with 8 to 10 threads sending the batches
> >   of updates to Solr.
> >
> > When I try to run tests to scale out the indexing rate, I see the
> > following:
> > * Solr nodes go into recovery
> > * Updates are taking really long to complete.
> >
> > As I understand it, when a node receives an update:
> > * If it is the leader, it forwards the update to all the replicas and
> >   waits until it receives a reply from all of them before replying to
> >   the client that sent the update.
> > * If it is not the leader, it forwards the update to the leader, which
> >   then does the steps mentioned above.
> >
> > How do I go about scaling the index updates?
> > * As I add more replicas, will my updates get slower and slower?
> > * Is there a way I can configure the leader to wait for only N out of
> >   M replicas?
> > * Should I be targeting the updates to only the leader?
> > * Any other approach I should be considering?
> >
> > Thanks
> > Vinay
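The batching Erick recommends (start around 1,000 docs per batch rather than 100) can be sketched independently of SolrJ. This is a minimal illustration, not the SolrJ API itself: the sink callback stands in for what a real indexer would do with a `CloudSolrClient` (roughly `client.add(docs)` followed by a periodic commit), and the document type is simplified to `String` so the sketch runs without a cluster.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Accumulates documents and sends them in batches, as suggested in the
// thread. The sink stands in for the SolrJ client so this sketch runs
// without a cluster; real code would wrap something like client.add(docs)
// on a CloudSolrClient created with the ZooKeeper address.
public class BatchedIndexer {
    private final Consumer<List<String>> sink;
    private final int batchSize;
    private final List<String> buffer = new ArrayList<>();

    public BatchedIndexer(Consumer<List<String>> sink, int batchSize) {
        this.sink = sink;
        this.batchSize = batchSize;
    }

    public void add(String doc) {
        buffer.add(doc);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Send whatever is buffered, including a final partial batch.
    public void flush() {
        if (!buffer.isEmpty()) {
            sink.accept(new ArrayList<>(buffer));
            buffer.clear();
        }
    }

    // Returns {batchesSent, docsSent} after indexing totalDocs documents.
    public static int[] run(int totalDocs, int batchSize) {
        int[] counts = new int[2];
        BatchedIndexer indexer = new BatchedIndexer(batch -> {
            counts[0]++;
            counts[1] += batch.size();
        }, batchSize);
        for (int i = 0; i < totalDocs; i++) {
            indexer.add("doc-" + i);
        }
        indexer.flush();
        return counts;
    }

    public static void main(String[] args) {
        int[] r = run(2500, 1000);
        System.out.println(r[0] + " batches, " + r[1] + " docs");
        // prints: 3 batches, 2500 docs
    }
}
```

With a batch size of 1,000, the 2,500 documents above go out as two full batches plus one final partial batch of 500, i.e. 3 round trips instead of 2,500.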
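For the GC-logging question in the thread, one way to enable it on a Java 7/8 HotSpot JVM (the era of this thread) is to add flags like the following to Solr's start parameters. This is a sketch, not a prescription: the log path is a placeholder, and `PrintGCApplicationStoppedTime` is included because it surfaces the "stop the world" pauses Erick asks about.

```
-Xloggc:/var/log/solr/gc.log
-XX:+PrintGCDetails
-XX:+PrintGCDateStamps
-XX:+PrintGCApplicationStoppedTime
```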