Hi Shawn,

I have shared a tarball with you (apa...@elyograg.org) on Google Drive. It includes the log directories of the 4 nodes, solrconfig.xml, solr.in.sh, and a screenshot of the top command. The log files cover about one day, although I restarted the Solr cloud several times during that period.

To be clear, I don’t have 4 physical machines. I have a single 48-core server, and all 4 Solr nodes run on that one machine. Each node has 1 shard and 1 replica. I also have a ZooKeeper ensemble running on the same machine on 3 different ports.

I am curious to know what Solr is doing when CPU usage is at or above 100%, because for some queries I think even looping through all the documents without using any index might be faster.

If you have a problem accessing the tarball, please let me know.

Thanks a lot!

Chuming

On Nov 2, 2018, at 6:56 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/2/2018 1:38 PM, Chuming Chen wrote:
>> I am running a Solr cloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g
>> -Xmx40g"), each shard has 32 million documents and 32Gbytes in size.
>
> A 40GB heap is probably completely unnecessary for an index of that size.
> Does each machine have one replica on it or two? If you are trying for high
> availability, then it will be at least two shard replicas per machine.
>
> The values on -Xms and -Xmx should normally be set the same. Java will
> always tend to allocate the entire max heap it has been allowed, so it's
> usually better to just let it have the whole amount right up front.
>
>> For a given query (I use complexphrase query), typically, the first time it
>> took a couple of seconds to return the first 20 docs. However, for the
>> following page, or sorting by a field, even run the same query again took a
>> lot longer to return results. I can see my 4 solr nodes running crazy with
>> more than 100%CPU.
>
> Can you obtain a screenshot of a process listing as described at the
> following URL, and provide the image using a file sharing site?
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
>
> There are separate instructions there for Windows and for Linux/UNIX
> operating systems.
>
> Also useful are the GC logs that are written by Java when Solr is started
> using the included scripts. I'm looking for logfiles that cover several days
> of runtime. You'll need to share them with a file sharing website -- files
> will not normally make it to the mailing list if attached to a message.
>
> Getting a copy of the solrconfig.xml in use on your collection can also be
> helpful.
>
>> My understanding is that Solr has query cache, run same query should be
>> faster.
>
> If the query is absolutely identical in *every* way, then yes, it can be
> satisfied from Solr caches, if their size is sufficient. If you change
> ANYTHING, including things like rows or start, filters, sorting, facets, and
> other parameters, then the query probably cannot be satisfied completely from
> cache. At that point, Solr is very reliant on how much memory has NOT been
> allocated to programs -- it must be a sufficient quantity of memory that the
> Solr index data can be effectively cached.
>
>> What could be wrong here? How do I debug? I checked solr.log in all nodes
>> and didn’t see anything unusual. Most frequent log entry looks like this.
>>
>> INFO - 2018-11-02 19:32:55.189; [ ] org.apache.solr.servlet.HttpSolrCall;
>> [admin] webapp=null path=/admin/metrics
>> params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
>> status=0 QTime=7
>> INFO - 2018-11-02 19:32:55.192; [ ] org.apache.solr.servlet.HttpSolrCall;
>> [admin] webapp=null path=/admin/metrics
>> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
>> status=0 QTime=1
>
> That is not a query. It is a call to the Metrics API. When I've made this
> call on a production Solr machine, it seems to be very resource-intensive,
> taking a long time. I don't think it should be made frequently. Probably no
> more than once a minute. If you are seeing that kind of entry in your logs a
> lot, then that might be contributing to your performance issues.
>
> Thanks,
> Shawn
>
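
One way to check how often those Metrics API calls show up is to count them per minute in solr.log. The following is only a sketch: it assumes each entry occupies a single line in the actual log file (unlike the wrapped quote above) and that the timestamp layout matches the entries quoted above. The function name and the command-line usage are made up for illustration.

```python
import re
import sys
from collections import Counter

# Matches the timestamp layout seen in the quoted entries,
# e.g. "2018-11-02 19:32:55.189;", and captures it down to the minute.
TIMESTAMP = re.compile(r"(\d{4}-\d{2}-\d{2} \d{2}:\d{2}):\d{2}\.\d{3};")


def metrics_calls_per_minute(lines):
    """Count log entries for path=/admin/metrics, grouped by minute.

    Assumption: one log entry per line, with timestamp and path on that line.
    """
    counts = Counter()
    for line in lines:
        if "path=/admin/metrics" not in line:
            continue  # not a Metrics API call
        match = TIMESTAMP.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage: python metrics_calls.py /path/to/solr.log
    with open(sys.argv[1], encoding="utf-8", errors="replace") as log:
        for minute, count in sorted(metrics_calls_per_minute(log).items()):
            print(minute, count)
```

Minutes with counts well above one would suggest the calls are being made more often than the "no more than once a minute" mentioned above.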