Hi all,

We recently went live with CDCR on our 2-node SolrCloud cluster (7.2.1).
From a CDCR perspective, everything seems to be working fine: collections
are staying in sync across the cluster.

The issue we are seeing is with one collection in particular. Since we set
up CDCR, we are getting extremely slow response times when retrieving
documents. Debugging the query shows QTime is almost nothing, but the
overall response time is about 5x what it should be. The problem gets worse
with larger result sets: retrieving 25 results is almost normal, but 200
results is much slower than normal. I can run the exact same query multiple
times in a row (so everything should be cached), and I still see response
times far higher than in another environment that is not using CDCR. It
doesn't seem to matter whether CDCR is enabled or disabled, only that we
are using the CdcrUpdateLog. The problem started even before we enabled
CDCR.
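
For anyone who wants to reproduce the measurement, here is a minimal
sketch (Python, standard library only) of how QTime can be compared with
the wall-clock time of the whole request. The helper names and the URL in
the usage note are placeholders I made up for illustration, not our actual
setup, and it assumes the JSON response format (wt=json).

```python
import json
import time
import urllib.request


def split_latency(body, elapsed_ms):
    """Split the measured round trip into Solr's QTime and the remainder."""
    # QTime is reported by Solr in the responseHeader, in milliseconds.
    header = json.loads(body)["responseHeader"]
    qtime_ms = header["QTime"]
    return {
        "qtime_ms": qtime_ms,
        "total_ms": elapsed_ms,
        # Everything not accounted for by QTime (response writing,
        # network transfer, reading the body on the client, ...).
        "non_query_ms": elapsed_ms - qtime_ms,
    }


def time_query(url):
    """Fetch a Solr select URL and time the full request, body included."""
    start = time.perf_counter()
    with urllib.request.urlopen(url) as resp:
        body = resp.read()
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return split_latency(body, elapsed_ms)
```

Calling time_query() with something like
"http://localhost:8983/solr/mycollection/select?q=*:*&rows=200&wt=json"
(hypothetical host and collection) for rows=25 and rows=200 should show
whether the gap between QTime and total time grows with the result size.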

In a lower environment we noticed that the transaction logs were huge
(multiple GB), so we tried stopping Solr, deleting the tlogs, and
restarting, and that seemed to fix the performance issue. We tried the same
thing in production the other day but it had no effect, so now I don't know
whether it was a coincidence or not.
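
In case it helps to compare tlog sizes between environments, this is a
rough sketch of how they could be totalled per core. It assumes the usual
layout where each core keeps its transaction logs in a directory named
"tlog" under its data directory; the function name and the path in the
usage note are hypothetical.

```python
import os


def tlog_sizes(solr_home):
    """Return {tlog_directory: total_bytes} for every 'tlog' directory found."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(solr_home):
        # Only look at directories literally named "tlog".
        if os.path.basename(dirpath) == "tlog":
            sizes[dirpath] = sum(
                os.path.getsize(os.path.join(dirpath, name))
                for name in filenames
            )
    return sizes
```

For example, tlog_sizes("/var/solr/data") (placeholder path) returns one
entry per core, which makes it easy to spot a single core whose tlogs have
grown far beyond the others.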

Things that we have tried:

-Completely deleting the collection and rebuilding from scratch
-Running the query directly from the Solr admin UI to eliminate other causes
-Doing a tcpdump on the Solr node to eliminate a network issue

None of these has helped. The behavior is very inconsistent: we can
reproduce it in some environments but not in others, even though hardware,
configuration, and network are exactly the same across all environments.
The only thing we have narrowed it down to is that we are pretty sure it
has something to do with CDCR, as the issue only started when we started
using it.

I'm wondering if any of this sparks ideas for anyone, or if people have
suggestions as to how I can figure out what is causing this long query
response time. The debug flag on the query seems more geared towards seeing
where time is spent in the actual query, which is negligible in my case.
The time is spent retrieving the results, which I don't have much
information on. I have tried increasing the log level, but nothing jumps
out at me in the Solr logs. Is there something I can look for specifically
to help debug this?

Thanks,

Chris
