Apache's e-mail server is pretty aggressive about stripping attachments, so none of your images came through. You can put them somewhere else and provide a link.
I happened to see the original e-mail, so here are a couple of possibilities.

CPU usage: Does the start of each spike correlate with a commit, either hard or soft? Hard commits can trigger a background merge, which can be CPU-intensive (as well as I/O-intensive). Hard commits with openSearcher=true, or soft commits, will trigger autowarming, as they both open a new searcher. The length of your CPU spikes hints that, if it is autowarming, you may have excessive autowarm counts configured in solrconfig.xml.

Although, looking again, you have almost zero usage between spikes, which instead suggests that your indexing process is bursting docs to Solr. Assuming you have a SolrJ program that fires docs in batches (and it should), my guess is that you send a batch, then your SolrJ program spends time assembling the next batch, during which time Solr is just idling. Ditto if you're using some other process to send docs to Solr.

The fact that the first and third graphs track each other so closely (assuming that they cover the exact same time interval) really looks like your indexing process is bursting docs to Solr.

I don't quite know what to say about the second graph.

The third graph could be related to what roles the replicas on each node have, how you're indexing, and the like. Updates for a shard are forwarded to the leader. From there they're sent to the followers, so the machines the leaders are on can have considerably more traffic: they get the original input and then send it on, whereas the followers don't redistribute the input.

If you're not using SolrJ (and specifically CloudSolrClient), then the documents just land on some node. From there they're forwarded to the appropriate leader, and the above is repeated from that point. So another possibility is that the third machine isn't the target for, say, HTTP updates. Are you updating by sending docs to a specific node?

A third possibility is a skew in the kinds of replicas on each node.
Say all your leaders are on two nodes and all your followers are on the third node. Then there'll likely be more traffic on the first two. This all assumes NRT replicas (the default). If you specify TLOG or PULL replicas, where each type lives could also be part of the reason.

All this is a guess, of course, since I don't know much about your indexing process.

Best,
Erick

On Sat, Apr 28, 2018 at 3:26 PM, Nicola Gordon <nicola.gor...@d2l.com> wrote:
> Hello,
>
> Hoping someone has some insight on this. I need to understand resource
> usage patterns seen at solr cluster.
>
> Any insight/any info on what solr is doing would be much appreciated!
> Here’s what I see:
>
> This pattern of CPU usage is seen throughout indexing – each of the lines is
> one of the 3 SOLR instances in my cluster.
>
> Same seen for other resources eg:
>
> Also, why is one of the solr instance receiving (and sending) consistently
> less network data than the others?
>
> Thanks for any insight!
>
> - Nicola
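
P.S. One way to check the replica-skew theory from the reply above is to look at which node hosts each leader. The sketch below assumes you have already fetched the JSON from the Collections API CLUSTERSTATUS action (/solr/admin/collections?action=CLUSTERSTATUS) and parsed it into a dict. The collection name, node names, and the sample response are made up for illustration, and the helper name replica_roles_by_node is hypothetical, not part of Solr or SolrJ.

```python
from collections import Counter


def replica_roles_by_node(cluster_status):
    """Count replicas per (node, replica type, role) from a parsed CLUSTERSTATUS response."""
    counts = Counter()
    collections = cluster_status["cluster"]["collections"]
    for collection in collections.values():
        for shard in collection["shards"].values():
            for replica in shard["replicas"].values():
                # The leader flag is normally the string "true"; accept a boolean too.
                is_leader = str(replica.get("leader", "false")).lower() == "true"
                role = "leader" if is_leader else "follower"
                # Assumption: if "type" is missing, treat the replica as NRT (the default).
                replica_type = replica.get("type", "NRT")
                counts[(replica["node_name"], replica_type, role)] += 1
    return counts


# Hypothetical, heavily trimmed response: both leaders sit on node1/node2,
# and node3 only hosts followers.
sample = {
    "cluster": {
        "collections": {
            "docs": {
                "shards": {
                    "shard1": {"replicas": {
                        "core_node1": {"node_name": "node1:8983_solr", "type": "NRT", "leader": "true"},
                        "core_node2": {"node_name": "node3:8983_solr", "type": "NRT"},
                    }},
                    "shard2": {"replicas": {
                        "core_node3": {"node_name": "node2:8983_solr", "type": "NRT", "leader": "true"},
                        "core_node4": {"node_name": "node3:8983_solr", "type": "NRT"},
                    }},
                }
            }
        }
    }
}

for (node, replica_type, role), n in sorted(replica_roles_by_node(sample).items()):
    print(node, replica_type, role, n)
```

If the output shows one node holding only followers (or only PULL replicas), that would be consistent with it seeing less network traffic than the nodes holding leaders, though it does not rule out the other explanations.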