Re: Async exceptions during distributed update

2018-05-14 Thread Jay Potharaju
Adding some more context to my last email. Solr: 6.6.3; 2 nodes, 3 shards each; no replication. Can someone answer the following questions? 1) Any ideas on why the following errors keep happening? AFAIK the streaming Solr client error is caused by timeouts when connecting to other nodes. Async e

Re: Async exceptions during distributed update

2018-05-13 Thread Jay Potharaju
Hi, I restarted both my Solr servers but I am seeing the async error again. In the older 5.x version of SolrCloud, Solr would normally recover gracefully in case of network errors, but Solr 6.6.3 does not seem to be doing that. At this time I am not doing only a small percentage of deletebyquery operat

Re: Async exceptions during distributed update

2018-05-09 Thread Emir Arnautović
Hi Jay, A network blip might be the cause, but it could also be a consequence of this issue. Maybe you can try avoiding DBQ while indexing and see if it is the cause. You can take a thread dump on “the other” node and see if there are blocked threads; that can give you more clues about what’s going on. Thanks, Em
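
A minimal sketch of the thread-dump check suggested above, assuming the dump was taken with the JDK's `jstack <pid>` and saved as text. The function name `blocked_threads` and the sample dump are made up for illustration; they do not come from this thread or from Solr.

```python
import re

# Made-up sample in jstack's text format, for illustration only.
SAMPLE_DUMP = """
"qtp-1" #21 prio=5 os_prio=0 tid=0x01 nid=0x02 waiting for monitor entry [0x03]
   java.lang.Thread.State: BLOCKED (on object monitor)
        at example.Hypothetical.update(Hypothetical.java:1)

"qtp-2" #22 prio=5 os_prio=0 tid=0x04 nid=0x05 runnable [0x06]
   java.lang.Thread.State: RUNNABLE
"""


def blocked_threads(dump_text):
    """Return the names of threads whose reported state is BLOCKED."""
    blocked = []
    current = None
    for line in dump_text.splitlines():
        header = re.match(r'^"([^"]+)"', line)
        if header:
            # A thread section starts with its quoted name.
            current = header.group(1)
        elif current and "java.lang.Thread.State: BLOCKED" in line:
            blocked.append(current)
            current = None  # count each thread at most once
    return blocked


if __name__ == "__main__":
    print(blocked_threads(SAMPLE_DUMP))
```

Taking a few dumps several seconds apart and comparing the lists would show whether the same threads stay blocked, which is usually more telling than a single snapshot.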

Re: Async exceptions during distributed update

2018-05-08 Thread Jay Potharaju
Hi Emir, I was seeing this error as long as the indexing was running. Once I stopped the indexing the errors also stopped. Yes, we do monitor both hosts & solr but have not seen anything out of the ordinary except for a small network blip. In my experience solr generally recovers after a network b

Re: Async exceptions during distributed update

2018-05-08 Thread Emir Arnautović
Hi Jay, This is a low ingestion rate. What is the size of your index? What is the heap size? I am guessing that this is not a huge index, so I am leaning toward what Shawn mentioned - some combination of DBQ/merge/commit/optimise that is blocking indexing. Though, it is strange that it is happening o

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I have about 3-5 updates per second. > On May 7, 2018, at 5:02 PM, Shawn Heisey wrote: > >> On 5/7/2018 5:05 PM, Jay Potharaju wrote: >> There are some deletes by query. I have not had any issues with DBQ, >> currently have 5.3 running in production. > > Here's the big problem with DBQ. Imagi

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Thanks for explaining that, Shawn! Emir, I use a PHP library called Solarium to send updates/deletes to Solr. The request is sent to any of the available nodes in the cluster. > On May 7, 2018, at 5:02 PM, Shawn Heisey wrote: > >> On 5/7/2018 5:05 PM, Jay Potharaju wrote: >> There are some deletes b

Re: Async exceptions during distributed update

2018-05-07 Thread Shawn Heisey
On 5/7/2018 5:05 PM, Jay Potharaju wrote: > There are some deletes by query. I have not had any issues with DBQ, > currently have 5.3 running in production. Here's the big problem with DBQ.  Imagine this sequence of events with these timestamps: 13:00:00: A commit for change visibility happens. 1

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
How many concurrent updates can be sent? Do you always send updates to the same node? Do you use solrj? Emir On Tue, May 8, 2018, 1:02 AM Jay Potharaju wrote: > The updates are pushed in real time not batched. No complex analysis and > everything is committed using autocommit settings in solr.

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
There are some deletes by query. I have not had any issues with DBQ, currently have 5.3 running in production. Thanks Jay Potharaju On Mon, May 7, 2018 at 4:02 PM, Jay Potharaju wrote: > The updates are pushed in real time not batched. No complex analysis and > everything is committed using au

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
The updates are pushed in real time not batched. No complex analysis and everything is committed using autocommit settings in solr. Thanks Jay Potharaju On Mon, May 7, 2018 at 4:00 PM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > How do you send documents? Large batches? Complex ana

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
How do you send documents? Large batches? Complex analysis? Do you send all batches to the same node? How do you commit? Do you delete by query while indexing? Emir On Tue, May 8, 2018, 12:30 AM Jay Potharaju wrote: > I didn't see any OOM errors in the logs on either of the nodes. I saw GC > pa

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
I didn't see any OOM errors in the logs on either of the nodes. I saw a GC pause of 1 second on the box that was throwing the error, but nothing on the other node. Any other recommendations? Thanks Jay Potharaju On Mon, May 7, 2018 at 9:48 AM, Jay Potharaju wrote: > Ah thanks for explainin

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Ah thanks for explaining that! Thanks Jay Potharaju On Mon, May 7, 2018 at 9:45 AM, Emir Arnautović < emir.arnauto...@sematext.com> wrote: > Node A receives batch of documents to index. It forwards documents to > shards that are on the node B. Node B is having issues with GC so it takes > a whi

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
Node A receives a batch of documents to index. It forwards documents to the shards that are on node B. Node B is having issues with GC, so it takes a while to respond. Node A sees this as a read timeout and reports it in its logs. So the issue is on node B, not node A. Emir -- Monitoring - Log Management
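
The sequence described above can be imitated with a small simulation. This is only a sketch, not Solr's code: `node_b_index`, `node_a_forward`, the pause length and the timeout value are all hypothetical stand-ins, and a sleep plays the role of the GC pause.

```python
import concurrent.futures
import time


def node_b_index(doc, pause_seconds):
    """Hypothetical node B: a pause (standing in for GC) delays the reply."""
    time.sleep(pause_seconds)
    return f"indexed {doc}"


def node_a_forward(doc, pause_seconds, read_timeout):
    """Hypothetical node A: forwards a document and waits up to read_timeout."""
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(node_b_index, doc, pause_seconds)
    try:
        return future.result(timeout=read_timeout)
    except concurrent.futures.TimeoutError:
        # The error is logged on node A, although the slowness is on node B.
        return f"node A log: read timeout forwarding {doc} to node B"
    finally:
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(node_a_forward("doc1", pause_seconds=0.0, read_timeout=1.0))
    print(node_a_forward("doc2", pause_seconds=0.5, read_timeout=0.05))
```

The second call reports a timeout on the forwarding side even though only the receiving side was slow, which is why the logs on the node that throws the error can point away from the node that actually has the problem.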

Re: Async exceptions during distributed update

2018-05-07 Thread Jay Potharaju
Yes, the nodes are well balanced. I am just using these boxes for indexing the data; they are not serving any traffic at this time. The error indicates it is having issues on the shards that are hosted on the box and not on the other box. I will check GC logs to see if there were any issues. t

Re: Async exceptions during distributed update

2018-05-07 Thread Emir Arnautović
Hi Jay, My first guess would be that there was a major GC on the other box, so it did not respond in time. Are your nodes well balanced, i.e. do they serve an equal amount of data? Thanks, Emir -- Monitoring - Log Management - Alerting - Anomaly Detection Solr & Elasticsearch Consulting Support Training