Indexing about 15 million documents per day across 100 shards on 45 servers. Up until about 350 million documents, each of the solr instances was taking up about 1 core (100% CPU). Recently, they all jumped to 700%. Is this normal? Anything that I can check for?

I don't see anything unusual in the solr logs.  Sample from the GC logs:

---------------

2017-08-18 11:53:15 GC log file created /opt/solr6/server/logs/solr_gc.log.2
OpenJDK 64-Bit Server VM (25.141-b16) for linux-amd64 JRE (1.8.0_141-b16), built on Jul 20 2017 11:14:57 by "mockbuild" with gcc 4.4.7 20120313 (Red Hat 4.4.7-18) Memory: 4k page, physical 99016188k(796940k free), swap 33554428k(32614048k free) CommandLine flags: -XX:+AggressiveOpts -XX:CICompilerCount=12 -XX:ConcGCThreads=4 -XX:G1HeapRegionSize=16777216 -XX:GCLogFileSize=20971520 -XX:InitialHeapSize=17179869184 -XX:InitiatingHeapOccupancyPercent=75 -XX:MarkStackSize=4194304 -XX:MaxDirectMemorySize=3221225472 -XX:MaxGCPauseMillis=300 -XX:MaxHeapSize=30064771072 -XX:MaxNewSize=18035507200 -XX:MinHeapDeltaBytes=16777216 -XX:NumberOfGCLogFiles=9 -XX:OnOutOfMemoryError=/opt/solr6/bin/oom_solr.sh 9100 /opt/solr6/server/logs -XX:ParallelGCThreads=16 -XX:+ParallelRefProcEnabled -XX:+PerfDisableSharedMem -XX:+PrintGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCDateStamps -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintHeapAtGC -XX:+PrintTenuringDistribution -XX:-ResizePLAB -XX:ThreadStackSize=256 -XX:+UseCompressedClassPointers -XX:+UseCompressedOops -XX:+UseG1GC -XX:+UseGCLogFileRotation -XX:+UseLargePages
{Heap before GC invocations=559440 (full 0):
garbage-first heap total 29360128K, used 24944705K [0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
  region size 16384K, 1075 young (17612800K), 13 survivors (212992K)
Metaspace used 95460K, capacity 97248K, committed 97744K, reserved 1134592K class space used 11616K, capacity 12104K, committed 12240K, reserved 1048576K 2017-08-18T11:53:15.985-0400: 522594.835: [GC pause (G1 Evacuation Pause) (young)
Desired survivor size 1132462080 bytes, new threshold 15 (max 15)
- age   1:   23419920 bytes,   23419920 total
- age   2:    9355296 bytes,   32775216 total
- age   3:    2455384 bytes,   35230600 total
- age   4:   38246704 bytes,   73477304 total
- age   5:   47064408 bytes,  120541712 total
- age   6:   13228864 bytes,  133770576 total
- age   7:   23990800 bytes,  157761376 total
- age   8:    1031416 bytes,  158792792 total
- age   9:   17011128 bytes,  175803920 total
- age  10:    7371888 bytes,  183175808 total
- age  11:    6226576 bytes,  189402384 total
- age  12:     637184 bytes,  190039568 total
- age  13:   11577864 bytes,  201617432 total
- age  14:    9519224 bytes,  211136656 total
- age  15:     672304 bytes,  211808960 total
, 0.0391210 secs]
   [Parallel Time: 32.1 ms, GC Workers: 16]
[GC Worker Start (ms): Min: 522594835.0, Avg: 522594835.1, Max: 522594835.2, Diff: 0.2] [Ext Root Scanning (ms): Min: 0.5, Avg: 0.8, Max: 2.2, Diff: 1.7, Sum: 12.2]
      [Update RS (ms): Min: 0.9, Avg: 2.3, Max: 3.2, Diff: 2.2, Sum: 36.6]
         [Processed Buffers: Min: 3, Avg: 4.7, Max: 8, Diff: 5, Sum: 75]
      [Scan RS (ms): Min: 0.1, Avg: 0.2, Max: 0.4, Diff: 0.3, Sum: 3.0]
[Code Root Scanning (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1] [Object Copy (ms): Min: 27.7, Avg: 28.3, Max: 28.6, Diff: 0.8, Sum: 453.5]
      [Termination (ms): Min: 0.0, Avg: 0.0, Max: 0.0, Diff: 0.0, Sum: 0.1]
         [Termination Attempts: Min: 1, Avg: 1.3, Max: 2, Diff: 1, Sum: 21]
[GC Worker Other (ms): Min: 0.0, Avg: 0.1, Max: 0.3, Diff: 0.3, Sum: 2.4] [GC Worker Total (ms): Min: 31.6, Avg: 31.7, Max: 32.0, Diff: 0.4, Sum: 507.9] [GC Worker End (ms): Min: 522594866.7, Avg: 522594866.8, Max: 522594867.0, Diff: 0.2]
   [Code Root Fixup: 0.1 ms]
   [Code Root Purge: 0.0 ms]
   [Clear CT: 1.7 ms]
   [Other: 5.2 ms]
      [Choose CSet: 0.0 ms]
      [Ref Proc: 2.9 ms]
      [Ref Enq: 0.1 ms]
      [Redirty Cards: 0.2 ms]
      [Humongous Register: 0.1 ms]
      [Humongous Reclaim: 0.0 ms]
      [Free CSet: 1.6 ms]
[Eden: 16.6G(16.6G)->0.0B(16.6G) Survivors: 208.0M->208.0M Heap: 23.8G(28.0G)->7371.4M(28.0G)]
Heap after GC invocations=559441 (full 0):
garbage-first heap total 29360128K, used 7548353K [0x00000000c0000000, 0x00000000c1003800, 0x00000007c0000000)
  region size 16384K, 13 young (212992K), 13 survivors (212992K)
Metaspace used 95460K, capacity 97248K, committed 97744K, reserved 1134592K class space used 11616K, capacity 12104K, committed 12240K, reserved 1048576K
}
 [Times: user=0.54 sys=0.00, real=0.04 secs]
2017-08-18T11:53:16.024-0400: 522594.874: Total time for which application threads were stopped: 0.0471187 seconds, Stopping threads took: 0.0001739 seconds 2017-08-18T11:53:16.811-0400: 522595.661: Total time for which application threads were stopped: 0.0019163 seconds, Stopping threads took: 0.0001631 seconds 2017-08-18T11:53:17.210-0400: 522596.060: Total time for which application threads were stopped: 0.0010934 seconds, Stopping threads took: 0.0001659 seconds

----------------

-Joe

Reply via email to