Indexing about 15 million documents per day across 100 shards on 45
servers. Up until about 350 million documents, each of the solr
instances was taking up about 1 core (100% CPU). Recently, they all
jumped to 700%. Is this normal? Anything that I can check for?
I don't see anything unusual in the solr logs. Sample from the GC logs:
---------------
2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE
(1.8.0_141-b16), built on Jul 20 2017 11:14:57 by "mockbuild" with gcc
4.4.7 20120313 (Red Hat 4.4.7-18)
Memory: 4k page, physical 99016188k(796940k free), swap
33554428k(32614048k free)
CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12
-XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216
-XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184
-XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304
-XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300
-XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200
-XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9
-XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100
/opt/solr6/server/logs -XX:ParallelGCThreads=16
-XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintGC
-XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps
-XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC
-XX:+PrintTenuringDistribution -XX:-ResizePLAB -XX:ThreadStackSize=256
-XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC
-XX:+UseGCLogFileRotation -XX:+UseLargePages
{Heap before GC invocations=559440 (full 0):
garbage-first heap total 29360128K, used 24944705K
[0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
Metaspace used 95460K, capacity 97248K, committed 97744K,
reserved 1134592K
class space used 11616K, capacity 12104K, committed 12240K,
reserved 1048576K
2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation
Pause) (young)
Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age 1: 23419920 bytes, 23419920 total
- age 2: 9355296 bytes, 32775216 total
- age 3: 2455384 bytes, 35230600 total
- age 4: 38246704 bytes, 73477304 total
- age 5: 47064408 bytes, 120541712 total
- age 6: 13228864 bytes, 133770576 total
- age 7: 23990800 bytes, 157761376 total
- age 8: 1031416 bytes, 158792792 total
- age 9: 17011128 bytes, 175803920 total
- age 10: 7371888 bytes, 183175808 total
- age 11: 6226576 bytes, 189402384 total
- age 12: 637184 bytes, 190039568 total
- age 13: 11577864 bytes, 201617432 total
- age 14: 9519224 bytes, 211136656 total
- age 15: 672304 bytes, 211808960 total
, 0.0391210 secs]
[Parallel Time: 32.1 ms, GC Workers: 16]
[GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max:
522594835.2, Diff: 0.2]
[Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7,
Sum: 12.2]
[Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
[Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
[Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff:
0.0, Sum: 0.1]
[Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8,
Sum: 453.5]
[Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
[Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3,
Sum: 2.4]
[GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff:
0.4, Sum: 507.9]
[GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max:
522594867.0, Diff: 0.2]
[Code Root Fixup: 0.1 ms]
[Code Root Purge: 0.0 ms]
[Clear CT: 1.7 ms]
[Other: 5.2 ms]
[Choose CSet: 0.0 ms]
[Ref Proc: 2.9 ms]
[Ref Enq: 0.1 ms]
[Redirty Cards: 0.2 ms]
[Humongous Register: 0.1 ms]
[Humongous Reclaim: 0.0 ms]
[Free CSet: 1.6 ms]
[Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap:
23.8G(28.0G)->7371.4M(28.0G)]
Heap after GC invocations=559441 (full 0):
garbage-first heap total 29360128K, used 7548353K
[0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
region size 16384K, 13 young (212992K), 13 survivors (212992K)
Metaspace used 95460K, capacity 97248K, committed 97744K,
reserved 1134592K
class space used 11616K, capacity 12104K, committed 12240K,
reserved 1048576K
}
[Times: user=0.54 sys=0.00, real=0.04 secs]
2017-08-18T11:53:16.024-0400: 522594.874: Total time for which
application threads were stopped: 0.0471187 seconds, Stopping threads
took: 0.0001739 seconds
2017-08-18T11:53:16.811-0400: 522595.661: Total time for which
application threads were stopped: 0.0019163 seconds, Stopping threads
took: 0.0001631 seconds
2017-08-18T11:53:17.210-0400: 522596.060: Total time for which
application threads were stopped: 0.0010934 seconds, Stopping threads
took: 0.0001659 seconds
----------------
-Joe