I apologize for the late reply.

The query load is more or less stable during the spikes. There are always
fluctuations, but nothing on the order of magnitude that could explain this
spike. In fact, the latest spike occured last night when there were almost
noone using it.
To test a hunch of mine, I tried to deactivate all caches by commenting out
all cache entries in solrconfig.xml. It still spikes, so I dont think it
has anything to do with cache warming or hits/misses or anything of the
sort.

One interesting thing GC though. This is our latest spike with cpu load
(this server has 8 cores, so a load higher than 8 is potentially
troublesome):

2012.Nov.27 19:58:18    2.27
2012.Nov.27 19:57:17    4.06
2012.Nov.27 19:56:18    8.95
2012.Nov.27 19:55:17    19.97
2012.Nov.27 19:54:17    32.27
2012.Nov.27 19:53:18    1.67
2012.Nov.27 19:52:17    1.6
2012.Nov.27 19:51:18    1.77
2012.Nov.27 19:50:17    1.89

This is what the GC was doing around that time:

2012-11-27T19:50:04.933+0100: [GC [PSYoungGen: 4777586K->277641K(4969216K)]
8887542K->4387693K(9405824K), 0.0856360 secs] [Times: user=0.54 sys=0.00,
real=0.09 secs]
2012-11-27T19:50:30.785+0100: [GC [PSYoungGen: 4749769K->325171K(5068096K)]
8859821K->4435320K(9504704K), 0.0992160 secs] [Times: user=0.63 sys=0.00,
real=0.10 secs]
2012-11-27T19:51:12.293+0100: [GC [PSYoungGen: 4911603K->306181K(5071168K)]
9021752K->4416617K(9507776K), 0.0957890 secs] [Times: user=0.62 sys=0.00,
real=0.09 secs]
2012-11-27T19:51:52.817+0100: [GC [PSYoungGen: 4892613K->376175K(5075328K)]
9003049K->4486755K(9511936K), 0.1099830 secs] [Times: user=0.79 sys=0.01,
real=0.11 secs]
2012-11-27T19:52:29.454+0100: [GC [PSYoungGen: 4972847K->271468K(4868160K)]
9083427K->4383520K(9304768K), 0.0699660 secs] [Times: user=0.48 sys=0.01,
real=0.07 secs]
2012-11-27T19:53:08.176+0100: [GC [PSYoungGen: 4868140K->336421K(5090944K)]
8980192K->4448572K(9527552K), 0.0824350 secs] [Times: user=0.56 sys=0.01,
real=0.08 secs]
2012-11-27T19:54:53.534+0100: [GC [PSYoungGen: 4950373K->340513K(5092864K)]
9062524K->4468215K(9529472K), 0.1016770 secs] [Times: user=0.71 sys=0.00,
real=0.10 secs]
2012-11-27T19:55:02.906+0100: [GC [PSYoungGen: 4954465K->480488K(4952000K)]
9082167K->4684537K(9388608K), 0.1813290 secs] [Times: user=1.23 sys=0.09,
real=0.19 secs]
2012-11-27T19:55:09.114+0100: [GC [PSYoungGen: 4951976K->560434K(5031936K)]
9156025K->5075285K(9547072K), 0.3511090 secs] [Times: user=2.32 sys=0.12,
real=0.35 secs]
2012-11-27T19:55:09.465+0100: [Full GC [PSYoungGen: 560434K->0K(5031936K)]
[PSOldGen: 4514851K->2793342K(5047296K)] 5075285K->2793342K(10079232K)
[PSPermGen: 35285K->35285K(44864K)], 5.2310820 secs] [Times: user=5.23
sys=0.00, real=5.23 secs]
2012-11-27T19:55:21.359+0100: [GC [PSYoungGen: 4471488K->87249K(4647168K)]
7264830K->2880592K(9694464K), 0.0426640 secs] [Times: user=0.27 sys=0.00,
real=0.04 secs]
2012-11-27T19:55:35.399+0100: [GC [PSYoungGen: 4173969K->171306K(4839552K)]
6967312K->2964649K(9886848K), 0.0493570 secs] [Times: user=0.34 sys=0.00,
real=0.05 secs]
2012-11-27T19:55:41.397+0100: [GC [PSYoungGen: 4258026K->156193K(4802368K)]
7051369K->2949536K(9849664K), 0.0643190 secs] [Times: user=0.37 sys=0.00,
real=0.06 secs]
2012-11-27T19:55:55.609+0100: [GC [PSYoungGen: 4254753K->156028K(4826880K)]
7048096K->2949371K(9874176K), 0.0590410 secs] [Times: user=0.35 sys=0.00,
real=0.06 secs]
2012-11-27T19:56:20.519+0100: [GC [PSYoungGen: 4254588K->413148K(4860288K)]
7047931K->3206491K(9907584K), 0.1189810 secs] [Times: user=0.91 sys=0.01,
real=0.12 secs]
2012-11-27T19:56:56.649+0100: [GC [PSYoungGen: 4564892K->412801K(4866432K)]
7358235K->3206144K(9913728K), 0.1089870 secs] [Times: user=0.82 sys=0.00,
real=0.11 secs]
2012-11-27T19:57:24.304+0100: [GC [PSYoungGen: 4564545K->434702K(4870592K)]
7357888K->3228045K(9917888K), 0.1172090 secs] [Times: user=0.86 sys=0.00,
real=0.12 secs]
2012-11-27T19:57:41.489+0100: [GC [PSYoungGen: 4591822K->500171K(4657344K)]
7385165K->3293513K(9704640K), 0.1663600 secs] [Times: user=1.12 sys=0.00,
real=0.16 secs]
2012-11-27T19:58:24.062+0100: [GC [PSYoungGen: 4657291K->453204K(4849536K)]
7450633K->3246547K(9896832K), 0.1099200 secs] [Times: user=0.83 sys=0.01,
real=0.11 secs]
2012-11-27T19:58:46.081+0100: [GC [PSYoungGen: 4560404K->533101K(4640320K)]
7353747K->3326443K(9687616K), 0.1252060 secs] [Times: user=0.96 sys=0.00,
real=0.13 secs]
2012-11-27T19:59:01.488+0100: [GC [PSYoungGen: 4640301K->480969K(4822720K)]
7433643K->3274312K(9870016K), 0.1495860 secs] [Times: user=0.99 sys=0.01,
real=0.15 secs]
2012-11-27T19:59:25.155+0100: [GC [PSYoungGen: 4534281K->572295K(4625664K)]
7327624K->3365638K(9672960K), 0.1492270 secs] [Times: user=1.10 sys=0.01,
real=0.15 secs]
2012-11-27T19:59:39.923+0100: [GC [PSYoungGen: 4625607K->486964K(4800640K)]
7418950K->3280306K(9847936K), 0.1244440 secs] [Times: user=0.94 sys=0.00,
real=0.12 secs]

So the GC was doing a full collection around that time, but that only
accounts for around 5 secs. Not the couple of minutes the spike lasts.

I am really at a complete loss as to what this could be. Google has not
given me any clues.


Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Kundeservice: +45 9610 2824
p...@mcb.dk
www.mcb.dk



On Fri, Nov 23, 2012 at 1:56 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Strange indeed. What about query load/ayes during that time? What about GC?
> And does cache hit rate drop?
>
> Otis
> --
> SOLR Performance Monitoring - http://sematext.com/spm
> On Nov 23, 2012 2:45 AM, "John Nielsen" <j...@mcb.dk> wrote:
>
> > Hi all,
> >
> > We are seeing a strange CPU spike on one of our solr4 servers which we
> are
> > unable to explain. The spike, which only lasts for a couple of minutes,
> > sends the disks racing. This happens a few times a times a day. This is
> > what the load looks like:
> >
> > 2012.Nov.14 13:37:17    2.77
> > 2012.Nov.14 13:36:17    3.65
> > 2012.Nov.14 13:35:18    3.92
> > 2012.Nov.14 13:34:17    3.95
> > 2012.Nov.14 13:33:18    6.56
> > 2012.Nov.14 13:32:17    10.79
> > 2012.Nov.14 13:31:17    24.38
> > 2012.Nov.14 13:30:17    63.35
> > 2012.Nov.14 13:29:17    24.68
> > 2012.Nov.14 13:28:17    2.44
> > 2012.Nov.14 13:27:18    3.51
> > 2012.Nov.14 13:26:17    5.26
> > 2012.Nov.14 13:25:18    5.71
> > 2012.Nov.14 13:24:17    2.7
> >
> > The problem is that out of a 3 minute spike, I get about 40 seconds of
> > silence in the logs. This log usually adds like a thousand lines every
> > second. Not being able to communicate with the server for this long,
> breaks
> > our use case.
> >
> > We have two servers, varnish01 and varnish02. We used to feed data to
> > varnish02, replicate it to varnish01 where the data is then read from.
> When
> > we discovered this issue, we moved all traffic to varnish02 so that data
> is
> > being replicated to varnish01, but other than that, gets zero traffic.
> The
> > spike did not disappear.
> >
> > The spike we are seeing is on varnish01 only.
> >
> > Please note that our use case requires us to continuously feed large
> > amounts of data from our main system in the order of up to 1.000
> registers
> > every minute. Our solrconfig.xml is attached.
> >
> > Has anyone seen this phenomenon before?
> >
> > Med venlig hilsen / Best regards
> >
> > *John Nielsen*
> > Programmer
> >
> >
> >
> > *MCB A/S*
> > Enghaven 15
> > DK-7500 Holstebro
> >
> > Kundeservice: +45 9610 2824
> > p...@mcb.dk
> > www.mcb.dk
> >
> >
>

Reply via email to