Hi Emir,
I was only seeing this error while the indexing was running; once I stopped
the indexing, the errors stopped as well. Yes, we do monitor both the hosts
and Solr, but have not seen anything out of the ordinary except for a small
network blip. In my experience Solr generally recovers after a network blip,
and there are a few errors from the streaming Solr client... but I have
never seen this error before.
Thanks
Jay Potharaju

On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> This is a low ingestion rate. What is the size of your index? What is the
> heap size? I am guessing that this is not a huge index, so I am leaning
> toward what Shawn mentioned - some combination of DBQ/merge/commit/optimise
> that is blocking indexing. Though, it is strange that it is happening only
> on one node if you are sending updates randomly to both nodes. Do you
> monitor your hosts/Solr? Do you see anything different at the time when
> the timeouts happen?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> > On 8 May 2018, at 03:23, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > I have about 3-5 updates per second.
> >
> >
> >> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>
> >>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>> There are some deletes by query. I have not had any issues with DBQ;
> >>> I currently have 5.3 running in production.
> >>
> >> Here's the big problem with DBQ. Imagine this sequence of events with
> >> these timestamps:
> >>
> >> 13:00:00: A commit for change visibility happens.
> >> 13:00:00: A segment merge is triggered by the commit.
> >> (It's a big merge that takes exactly 3 minutes.)
> >> 13:00:05: A deleteByQuery is sent.
> >> 13:00:15: An update to the index is sent.
> >> 13:00:25: An update to the index is sent.
> >> 13:00:35: An update to the index is sent.
> >> 13:00:45: An update to the index is sent.
> >> 13:00:55: An update to the index is sent.
> >> 13:01:05: An update to the index is sent.
> >> 13:01:15: An update to the index is sent.
> >> 13:01:25: An update to the index is sent.
> >> {time passes, more updates might be sent}
> >> 13:03:00: The merge finishes.
> >>
> >> Here's what would happen in this scenario: the DBQ and all of the
> >> update requests sent *after* the DBQ will block until the merge
> >> finishes. That means it's going to take up to three minutes for
> >> Solr to respond to those requests. If the client sending the request
> >> is configured with a 60-second socket timeout (which inter-node
> >> requests made by Solr are by default), then it is going to experience
> >> a timeout error. The request will probably complete successfully once
> >> the merge finishes, but the connection is gone, and the client has
> >> already received an error.
> >>
> >> Now imagine what happens if an optimize (forced merge of the entire
> >> index) is requested on an index that's 50GB. That optimize may take
> >> 2-3 hours, possibly longer. A deleteByQuery started on that index
> >> after the optimize begins (and any updates requested after the DBQ)
> >> will pause until the optimize is done. A pause of 2 hours or more is
> >> a BIG problem.
> >>
> >> This is why deleteByQuery is not recommended.
> >>
> >> If the deleteByQuery were changed into a two-step process involving a
> >> query to retrieve ID values and then one or more deleteById requests,
> >> then none of that blocking would occur. The deleteById operation can
> >> run at the same time as a segment merge, so neither it nor subsequent
> >> update requests will have the significant pause. From what I
> >> understand, you can even do commits in this scenario and have changes
> >> be visible before the merge completes. I haven't verified that this
> >> is the case.
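For reference, the two-step approach described above could look roughly like
the SolrJ sketch below. The collection URL, the "id" uniqueKey field, the
page size, and the status:expired query are all illustrative assumptions,
not anything from this thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class TwoStepDelete {
    public static void main(String[] args) throws Exception {
        // Assumed collection URL and uniqueKey field name.
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {

            // Step 1: run the would-be deleteByQuery as a plain query that
            // fetches only uniqueKey values, paging with a cursor so large
            // result sets don't blow up memory.
            SolrQuery q = new SolrQuery("status:expired"); // hypothetical query
            q.setFields("id");                             // only the uniqueKey
            q.setRows(1000);                               // page size
            q.setSort(SolrQuery.SortClause.asc("id"));     // cursors need a uniqueKey sort

            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(q);

                List<String> ids = new ArrayList<>();
                for (SolrDocument doc : rsp.getResults()) {
                    ids.add((String) doc.getFieldValue("id"));
                }

                // Step 2: deleteById does not block behind a running
                // merge the way deleteByQuery does.
                if (!ids.isEmpty()) {
                    solr.deleteById(ids);
                }

                String next = rsp.getNextCursorMark();
                if (next.equals(cursor)) {
                    break; // cursor did not advance: no more pages
                }
                cursor = next;
            }
            solr.commit();
        }
    }
}

Because nothing is committed inside the loop, the cursor pages over a stable
view of the index, and the single commit at the end makes all the deletes
visible together.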
> >>
> >> Experienced devs: Can we fix this problem with DBQ? On indexes with a
> >> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> >>
> >> Thanks,
> >> Shawn
> >>
> >
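Separately from avoiding DBQ, the timeout symptom itself can be softened by
raising the socket timeout on the client doing the indexing, so a request
stalled behind a long merge gets a slow response instead of an error. A
minimal sketch, assuming SolrJ; the URL and the 5-minute value are
illustrative assumptions:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Raise the client-side timeouts; values are in milliseconds.
SolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection")
    .withSocketTimeout(300000)     // wait up to 5 minutes for a response
    .withConnectionTimeout(15000)  // time allowed to establish the connection
    .build();

Note that this only covers the indexing client's own requests; the inter-node
timeouts Shawn mentions are configured on the Solr side (for example,
distribUpdateSoTimeout in solr.xml).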