Hi Emir,
I was only seeing this error while the indexing was running; once I stopped
the indexing, the errors stopped as well. Yes, we do monitor both the hosts
and Solr, but have not seen anything out of the ordinary except for a small
network blip. In my experience Solr generally recovers after a network blip,
and there are a few errors from the streaming Solr client... but I have
never seen this error before.
Thanks
Jay Potharaju

On Tue, May 8, 2018 at 12:56 AM, Emir Arnautović <
emir.arnauto...@sematext.com> wrote:

> Hi Jay,
> This is a low ingestion rate. What is the size of your index? What is the
> heap size? I am guessing that this is not a huge index, so I am leaning
> toward what Shawn mentioned - some combination of DBQ/merge/commit/optimise
> that is blocking indexing. Though, it is strange that it is happening only
> on one node if you are sending updates randomly to both nodes. Do you
> monitor your hosts/Solr? Do you see anything different at the time when
> the timeouts happen?
>
> Thanks,
> Emir
> --
> Monitoring - Log Management - Alerting - Anomaly Detection
> Solr & Elasticsearch Consulting Support Training - http://sematext.com/
>
>
> > On 8 May 2018, at 03:23, Jay Potharaju <jspothar...@gmail.com> wrote:
> >
> > I have about 3-5 updates per second.
> >
> >
> >> On May 7, 2018, at 5:02 PM, Shawn Heisey <apa...@elyograg.org> wrote:
> >>
> >>> On 5/7/2018 5:05 PM, Jay Potharaju wrote:
> >>> There are some deletes by query. I have not had any issues with DBQ;
> >>> I currently have 5.3 running in production.
> >>
> >> Here's the big problem with DBQ. Imagine this sequence of events with
> >> these timestamps:
> >>
> >> 13:00:00: A commit for change visibility happens.
> >> 13:00:00: A segment merge is triggered by the commit.
> >> (It's a big merge that takes exactly 3 minutes.)
> >> 13:00:05: A deleteByQuery is sent.
> >> 13:00:15: An update to the index is sent.
> >> 13:00:25: An update to the index is sent.
> >> 13:00:35: An update to the index is sent.
> >> 13:00:45: An update to the index is sent.
> >> 13:00:55: An update to the index is sent.
> >> 13:01:05: An update to the index is sent.
> >> 13:01:15: An update to the index is sent.
> >> 13:01:25: An update to the index is sent.
> >> {time passes, more updates might be sent}
> >> 13:03:00: The merge finishes.
> >>
> >> Here's what would happen in this scenario: the DBQ and all of the
> >> update requests sent *after* the DBQ will block until the merge
> >> finishes. That means it's going to take up to three minutes for
> >> Solr to respond to those requests. If the client sending the request
> >> is configured with a 60-second socket timeout (which inter-node
> >> requests made by Solr are by default), then it is going to experience
> >> a timeout error. The request will probably complete successfully once
> >> the merge finishes, but the connection is gone, and the client has
> >> already received an error.
> >>
> >> Now imagine what happens if an optimize (forced merge of the entire
> >> index) is requested on an index that's 50GB. That optimize may take
> >> 2-3 hours, possibly longer. A deleteByQuery started on that index
> >> after the optimize begins (and any updates requested after the DBQ)
> >> will pause until the optimize is done. A pause of 2 hours or more is
> >> a BIG problem.
> >>
> >> This is why deleteByQuery is not recommended.
> >>
> >> If the deleteByQuery were changed into a two-step process involving a
> >> query to retrieve ID values and then one or more deleteById requests,
> >> then none of that blocking would occur. The deleteById operation can
> >> run at the same time as a segment merge, so neither it nor subsequent
> >> update requests will have the significant pause. From what I
> >> understand, you can even do commits in this scenario and have changes
> >> be visible before the merge completes. I haven't verified that this
> >> is the case.
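For reference, the two-step approach described above could look roughly like
the SolrJ sketch below. The collection URL, the "id" uniqueKey field, the
page size, and the status:expired query are all illustrative assumptions,
not anything from this thread:

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.CursorMarkParams;

public class TwoStepDelete {
    public static void main(String[] args) throws Exception {
        // Assumed collection URL and uniqueKey field name.
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycollection").build()) {

            // Step 1: run the would-be deleteByQuery as a plain query that
            // fetches only uniqueKey values, paging with a cursor so large
            // result sets don't blow up memory.
            SolrQuery q = new SolrQuery("status:expired"); // hypothetical query
            q.setFields("id");                             // only the uniqueKey
            q.setRows(1000);                               // page size
            q.setSort(SolrQuery.SortClause.asc("id"));     // cursors need a uniqueKey sort

            String cursor = CursorMarkParams.CURSOR_MARK_START;
            while (true) {
                q.set(CursorMarkParams.CURSOR_MARK_PARAM, cursor);
                QueryResponse rsp = solr.query(q);

                List<String> ids = new ArrayList<>();
                for (SolrDocument doc : rsp.getResults()) {
                    ids.add((String) doc.getFieldValue("id"));
                }

                // Step 2: deleteById does not block behind a running
                // merge the way deleteByQuery does.
                if (!ids.isEmpty()) {
                    solr.deleteById(ids);
                }

                String next = rsp.getNextCursorMark();
                if (next.equals(cursor)) {
                    break; // cursor did not advance: no more pages
                }
                cursor = next;
            }
            solr.commit();
        }
    }
}

Because nothing is committed inside the loop, the cursor pages over a stable
view of the index, and the single commit at the end makes all the deletes
visible together.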
> >>
> >> Experienced devs: Can we fix this problem with DBQ? On indexes with a
> >> uniqueKey, can DBQ be changed to use the two-step process I mentioned?
> >>
> >> Thanks,
> >> Shawn
> >>
> >
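Separately from avoiding DBQ, the timeout symptom itself can be softened by
raising the socket timeout on the client doing the indexing, so a request
stalled behind a long merge gets a slow response instead of an error. A
minimal sketch, assuming SolrJ; the URL and the 5-minute value are
illustrative assumptions:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;

// Raise the client-side timeouts; values are in milliseconds.
SolrClient solr = new HttpSolrClient.Builder(
        "http://localhost:8983/solr/mycollection")
    .withSocketTimeout(300000)     // wait up to 5 minutes for a response
    .withConnectionTimeout(15000)  // time allowed to establish the connection
    .build();

Note that this only covers the indexing client's own requests; the inter-node
timeouts Shawn mentions are configured on the Solr side (for example,
distribUpdateSoTimeout in solr.xml).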