Hi all,

We recently went live with CDCR on our 2-node SolrCloud cluster (7.2.1). From a CDCR perspective, everything seems to be working fine: collections are staying in sync across the cluster, and everything looks good.

The issue we are seeing is with one collection in particular: after we set up CDCR, we started getting extremely slow response times when retrieving documents. Debugging the query shows QTime is almost nothing, but the overall response time is roughly 5x what it should be. The problem is exacerbated by larger result sizes. For example, retrieving 25 results is almost normal, but retrieving 200 results is much slower than normal. I can run the exact same query multiple times in a row (so everything should be cached) and still see response times far higher than in another environment that is not using CDCR.

It doesn't seem to matter whether CDCR is enabled or disabled, only that we are using the CdcrUpdateLog. The problem started happening even before we enabled CDCR.

In a lower environment we noticed that the transaction logs were huge (multiple gigabytes), so we tried stopping Solr, deleting the tlogs, and restarting, and that seemed to fix the performance issue. We tried the same thing in production the other day but it had no effect, so now I don't know whether it was a coincidence.

Things that we have tried:
- Completely deleting the collection and rebuilding it from scratch
- Running the query directly from Solr admin to eliminate other causes
- Doing a tcpdump on the Solr node to eliminate a network issue

None of these things have yielded any results. It seems very inconsistent: some environments we can reproduce it in, others we can't. Hardware, configuration, and network are exactly the same across all environments. The only thing we have narrowed it down to is that we are pretty sure it has something to do with CDCR, as the issue only started when we started using it.

I'm wondering if any of this sparks ideas from anyone, or if people have suggestions as to how I can figure out what is causing this long query response time. The debug flag on the query seems more geared towards seeing where time is spent in the actual query, which is nothing in my case.
The time is spent retrieving the results, and I don't have much information about that phase. I have tried increasing the log level, but nothing jumps out at me in the Solr logs. Is there something specific I can look for to help debug this?

Thanks,
Chris
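
For reference, the gap described above (tiny QTime, large overall response time, worse at 200 rows than at 25) can be measured with something like the following. This is only a minimal sketch, not the exact tooling used here: the function names (`build_select_url`, `http_fetch`, `measure`, `compare_row_counts`) are made up for illustration, and the base URL and collection name are placeholders. It relies on the `QTime` value that Solr returns in `responseHeader`, which as far as I understand does not cover the time spent loading stored fields and writing the response.

```python
import json
import time
import urllib.parse
import urllib.request


def build_select_url(base_url, collection, query, rows):
    """Build a /select URL. base_url and collection are placeholders."""
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base_url}/{collection}/select?{params}"


def http_fetch(url):
    """Return the raw response body for a URL as bytes."""
    with urllib.request.urlopen(url) as resp:
        return resp.read()


def measure(fetch, rows):
    """Time one request and compare wall-clock time with Solr's QTime.

    `fetch` is any callable that takes a row count and returns the raw
    JSON response body as bytes, so it can be swapped for a stub.
    """
    start = time.perf_counter()
    body = fetch(rows)
    wall_ms = (time.perf_counter() - start) * 1000.0

    data = json.loads(body)
    qtime_ms = data["responseHeader"]["QTime"]  # search time reported by Solr
    return {
        "rows": rows,
        "docs": len(data["response"]["docs"]),
        "bytes": len(body),
        "qtime_ms": qtime_ms,
        "wall_ms": wall_ms,
        # Time not accounted for by QTime (document retrieval,
        # response writing, transfer).
        "overhead_ms": wall_ms - qtime_ms,
    }


def compare_row_counts(fetch, row_counts):
    """Run `measure` once per row count and return the results in order."""
    return [measure(fetch, rows) for rows in row_counts]
```

Against a real node this could be called as `compare_row_counts(lambda n: http_fetch(build_select_url("http://localhost:8983/solr", "mycollection", "*:*", n)), (25, 200))`, where the host and collection are assumptions. Running it on an environment with and without the CDCR update log, and comparing `overhead_ms` and `bytes` for the same query, should show whether the extra time scales with the number of documents returned.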