Did you try to see which component (query, facet, highlight, ...) is taking
the time by running with debugQuery=on when performance is slow? Just to rule
out that some other component is the culprit...
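
For example, something along these lines should show it (the host, collection
name, and query here are just placeholders):

curl "http://localhost:8983/solr/yourcollection/select?q=*:*&rows=200&debugQuery=on"

The "timing" section of the debug output breaks the time down into
prepare/process per search component (query, facet, highlight, debug...), so
it should be clear whether the extra time is spent inside a component or
outside the search itself.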

Thnx

On Mon, Jun 25, 2018 at 2:06 PM, Chris Troullis <[email protected]>
wrote:

> FYI to all, just as an update, we rebuilt the index in question from
> scratch for a second time this weekend and the problem went away on 1 node,
> but we were still seeing it on the other node. After restarting the
> problematic node, the problem went away. Still makes me a little uneasy as
> we weren't able to determine the cause, but at least we are back to normal
> query times now.
>
> Chris
>
> On Fri, Jun 15, 2018 at 8:06 AM, Chris Troullis <[email protected]>
> wrote:
>
> > Thanks Shawn,
> >
> > As mentioned previously, we are hard committing every 60 seconds, which we
> > have been doing for years, and have had no issues until enabling CDCR. We
> > have never seen large tlog sizes before, and even manually issuing a hard
> > commit to the collection does not reduce the size of the tlogs. I believe
> > this is because when using the CDCRUpdateLog the tlogs are not purged until
> > the docs have been replicated over. Anyway, since we manually purged the
> > tlogs they seem to now be staying at an acceptable size, so I don't think
> > that is the cause. The documents are not abnormally large, maybe ~20
> > string/numeric fields with simple whitespace tokenization.
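> >
> > (For reference, by a manual hard commit I just mean an explicit commit
> > request against the collection's update handler, something like:
> >
> > curl "http://localhost:8983/solr/ourcollection/update?commit=true"
> >
> > with "ourcollection" standing in for the real collection name, and checking
> > the tlog directory size on disk before and after.)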
> >
> > To answer your questions:
> >
> > -Solr version: 7.2.1
> > -What OS vendor and version Solr is running on: CentOS 6
> > -Total document count on the server (counting all index cores): 13
> > collections totaling ~60 million docs
> > -Total index size on the server (counting all cores): ~60GB
> > -What the total of all Solr heaps on the server is - 16GB heap (we had to
> > increase for CDCR because it was using a lot more heap).
> > -Whether there is software other than Solr on the server - No
> > -How much total memory the server has installed - 64 GB
> >
> > All of this has been consistent for multiple years across multiple Solr
> > versions and we have only started seeing this issue once we started using
> > the CDCRUpdateLog and CDCR, hence why that is the only real thing we can
> > point to. And again, the issue is only affecting 1 of the 13 collections on
> > the server, so if it was hardware/heap/GC related then I would think we
> > would be seeing it for every collection, not just one, as they all share
> > the same resources.
> >
> > I will take a look at the GC logs, but I don't think that is the cause.
> > The consistent nature of the slow performance doesn't really point to GC
> > issues, and we have profiling set up in New Relic and it does not show any
> > long/frequent GC pauses.
> >
> > We are going to try and rebuild the collection from scratch again this
> > weekend as that has solved the issue in some lower environments, although
> > it's not really consistent. At this point it's all we can think of to do.
> >
> > Thanks,
> >
> > Chris
> >
> >
> > On Thu, Jun 14, 2018 at 6:23 PM, Shawn Heisey <[email protected]> wrote:
> >
> >> On 6/12/2018 12:06 PM, Chris Troullis wrote:
> >> > The issue we are seeing is with 1 collection in particular, after we
> >> > set up CDCR, we are getting extremely slow response times when
> >> > retrieving documents. Debugging the query shows QTime is almost
> >> > nothing, but the overall responseTime is like 5x what it should be.
> >> > The problem is exacerbated by larger result sizes. IE retrieving 25
> >> > results is almost normal, but 200 results is way slower than normal. I
> >> > can run the exact same query multiple times in a row (so everything
> >> > should be cached), and I still see response times way higher than
> >> > another environment that is not using CDCR. It doesn't seem to matter
> >> > if CDCR is enabled or disabled, just that we are using the
> >> > CDCRUpdateLog. The problem started happening even before we enabled
> >> > CDCR.
> >> >
> >> > In a lower environment we noticed that the transaction logs were huge
> >> > (multiple gigs), so we tried stopping solr and deleting the tlogs then
> >> > restarting, and that seemed to fix the performance issue. We tried the
> >> > same thing in production the other day but it had no effect, so now I
> >> > don't know if it was a coincidence or not.
> >>
> >> There is one other cause besides CDCR buffering that I know of for huge
> >> transaction logs, and it has nothing to do with CDCR:  A lack of hard
> >> commits.  It is strongly recommended to have autoCommit set to a
> >> reasonably short interval (about a minute in my opinion, but 15 seconds
> >> is VERY common).  Most of the time openSearcher should be set to false
> >> in the autoCommit config, and other mechanisms (which might include
> >> autoSoftCommit) should be used for change visibility.  The example
> >> autoCommit settings might seem superfluous because they don't affect
> >> what's searchable, but it is actually a very important configuration to
> >> keep.
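> >>
> >> As a rough sketch, that usually looks something like this in
> >> solrconfig.xml (the intervals below are only example values):
> >>
> >>   <autoCommit>
> >>     <maxTime>60000</maxTime>
> >>     <openSearcher>false</openSearcher>
> >>   </autoCommit>
> >>
> >>   <autoSoftCommit>
> >>     <maxTime>120000</maxTime>
> >>   </autoSoftCommit>
> >>
> >> with the soft commit (or explicit commits from the indexing application)
> >> handling visibility of changes.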
> >>
> >> Are the docs in this collection really big, by chance?
> >>
> >> As I went through previous threads you've started on the mailing list, I
> >> have noticed that none of your messages provided some details that would
> >> be useful for looking into performance problems:
> >>
> >>  * What OS vendor and version Solr is running on.
> >>  * Total document count on the server (counting all index cores).
> >>  * Total index size on the server (counting all cores).
> >>  * What the total of all Solr heaps on the server is.
> >>  * Whether there is software other than Solr on the server.
> >>  * How much total memory the server has installed.
> >>
> >> If you name the OS, I can use that information to help you gather some
> >> additional info which will actually show me most of that list.  Total
> >> document count is something that I cannot get from the info I would help
> >> you gather.
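> >>
> >> For example, on most Linux systems something like the following covers
> >> the bulk of that list (the index path below is just a placeholder for
> >> wherever your cores actually live):
> >>
> >>   cat /etc/redhat-release            # OS vendor/version (RHEL/CentOS)
> >>   free -m                            # total installed and used memory
> >>   du -sh /path/to/solr/home/*/data   # on-disk index size per core
> >>   ps -eo rss,args | grep [j]ava      # Solr JVM size and its -Xmx setting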
> >>
> >> Something else that can cause performance issues is GC pauses.  If you
> >> provide a GC log (The script that starts Solr logs this by default), we
> >> can analyze it to see if that's a problem.
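> >>
> >> (On a default install the GC log normally ends up alongside solr.log,
> >> typically something like server/logs/solr_gc.log* unless SOLR_LOGS_DIR has
> >> been changed, so running "ls -lh server/logs/solr_gc.log*" from the Solr
> >> install directory should turn it up.)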
> >>
> >> Attachments to messages on the mailing list typically do not make it to
> >> the list, so a file sharing website is a better way to share large
> >> logfiles.  A paste website is good for log data that's smaller.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
> >
>
