Hi Shawn,

I have shared a tarball with you (apa...@elyograg.org) on Google Drive. The 
tarball includes the log directories of all 4 nodes, solrconfig.xml, solr.in.sh, 
and a screenshot of the top command. The log files cover about one day. However, 
I restarted the SolrCloud cluster several times during that period.

To be clear, I don't have 4 physical machines. I have a single 48-core 
server, and all 4 Solr nodes run on that machine. Each node hosts 1 shard, and 
each shard has 1 replica. A ZooKeeper ensemble also runs on the same machine, 
on 3 different ports.

I am curious to know what Solr is doing when its CPU usage reaches or exceeds 
100%. For some queries, I suspect that simply looping through all the 
documents without using any index might be faster.
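For reference, on Linux `top` shows per-process %CPU relative to a single core by default, so a multi-threaded JVM can legitimately exceed 100%. A small sketch of that arithmetic for the 48-core server above (the function names are made up for illustration):

```python
def cores_busy(top_cpu_percent: float) -> float:
    """Convert top's per-process %CPU (100% == one fully busy core) into cores."""
    return top_cpu_percent / 100.0


def machine_utilization(top_cpu_percent: float, num_cores: int) -> float:
    """Fraction of the whole machine in use, where 1.0 means every core is busy."""
    return cores_busy(top_cpu_percent) / num_cores


# One Solr JVM shown at 250% on a 48-core server is using 2.5 cores,
# which is only about 5% of the machine's total CPU capacity.
print(cores_busy(250.0))                         # 2.5
print(round(machine_utilization(250.0, 48), 3))  # 0.052
```

In other words, 100% on one node means roughly one core's worth of work, and a single process on this machine could in principle reach 4800%.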

If you have any problems accessing the tarball, please let me know.

Thanks a lot!

Chuming


On Nov 2, 2018, at 6:56 PM, Shawn Heisey <apa...@elyograg.org> wrote:

> On 11/2/2018 1:38 PM, Chuming Chen wrote:
>> I am running SolrCloud 7.4 with 4 shards and 4 nodes (JVM "-Xms20g 
>> -Xmx40g"); each shard has 32 million documents and is 32 GB in size.
> 
> A 40GB heap is probably completely unnecessary for an index of that size.  
> Does each machine have one replica on it or two? If you are trying for high 
> availability, then it will be at least two shard replicas per machine.
> 
> The values on -Xms and -Xmx should normally be set the same.  Java will 
> always tend to allocate the entire max heap it has been allowed, so it's 
> usually better to just let it have the whole amount right up front.
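Because all four nodes share one machine in this setup, their heaps and the OS page cache compete for the same RAM. A rough budget sketch, assuming each JVM grows to its full -Xmx as described above, one copy of each 32 GB shard, and a hypothetical 256 GB of RAM (the real figure is not given in this thread):

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NodeSpec:
    max_heap_gb: float  # the JVM's -Xmx value
    index_gb: float     # on-disk index size served by this node


def page_cache_budget(total_ram_gb: float, nodes: List[NodeSpec],
                      other_processes_gb: float = 0.0) -> Tuple[float, float]:
    """Return (GB left for the OS page cache, total index GB) for one machine.

    Heap is counted at -Xmx, on the assumption that each JVM grows to its max.
    """
    heap = sum(n.max_heap_gb for n in nodes)
    index = sum(n.index_gb for n in nodes)
    free_for_cache = max(0.0, total_ram_gb - heap - other_processes_gb)
    return free_for_cache, index


# Four nodes as described in this thread, on a hypothetical 256 GB machine.
nodes = [NodeSpec(max_heap_gb=40, index_gb=32) for _ in range(4)]
cache_gb, index_total_gb = page_cache_budget(256, nodes)
print(f"page cache {cache_gb:.0f} GB for {index_total_gb:.0f} GB of index")
# page cache 96 GB for 128 GB of index
```

With a hypothetical 8 GB heap per node instead, the same calculation leaves 224 GB, more than the entire 128 GB index. Memory used by the ZooKeeper processes can be passed as `other_processes_gb`.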
> 
>> For a given query (I use the complexphrase query parser), the first run 
>> typically took a couple of seconds to return the first 20 docs. However, 
>> fetching the following page, sorting by a field, or even running the same 
>> query again took a lot longer. I can see my 4 Solr nodes working flat out at 
>> more than 100% CPU.
> 
> Can you obtain a screenshot of a process listing as described at the 
> following URL, and provide the image using a file sharing site?
> 
> https://wiki.apache.org/solr/SolrPerformanceProblems#Asking_for_help_on_a_memory.2Fperformance_issue
> 
> There are separate instructions there for Windows and for Linux/UNIX 
> operating systems.
> 
> Also useful are the GC logs that are written by Java when Solr is started 
> using the included scripts.  I'm looking for logfiles that cover several days 
> of runtime.  You'll need to share them with a file sharing website -- files 
> will not normally make it to the mailing list if attached to a message.
> 
> Getting a copy of the solrconfig.xml in use on your collection can also be 
> helpful.
> 
>> My understanding is that Solr has a query cache, so running the same query 
>> again should be faster.
> 
> If the query is absolutely identical in *every* way, then yes, it can be 
> satisfied from Solr caches, if their size is sufficient.  If you change 
> ANYTHING, including things like rows or start, filters, sorting, facets, and 
> other parameters, then the query probably cannot be satisfied completely from 
> cache.  At that point, Solr is very reliant on how much memory has NOT been 
> allocated to programs -- it must be a sufficient quantity of memory that the 
> Solr index data can be effectively cached.
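The following toy model illustrates why changing the sort or paging past the first results can miss the cache. It is loosely modelled on Solr's queryResultCache, whose key covers the query, filters, and sort, and on the queryResultWindowSize setting in solrconfig.xml; it is a simplified sketch with hypothetical names, not Solr's implementation:

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable

# Hypothetical stand-in for the expensive index search:
# (q, filters, sort, depth) -> list of doc IDs in rank order.
SearchFn = Callable[[str, tuple, str, int], list]


@dataclass
class ToyQueryResultCache:
    """Simplified query-result cache; not Solr's code.

    Key: (query, filters, sort). Value: (depth, top doc IDs), where depth is
    the requested depth rounded up to a multiple of `window_size`.
    """
    window_size: int = 20  # similar in spirit to queryResultWindowSize
    entries: dict = field(default_factory=dict)
    hits: int = 0
    misses: int = 0

    def search(self, q: str, filters: tuple, sort: str,
               start: int, rows: int, run_query: SearchFn) -> list:
        key = (q, tuple(sorted(filters)), sort)
        needed = start + rows
        cached = self.entries.get(key)
        if cached is not None and cached[0] >= needed:
            self.hits += 1  # same key, and the page lies inside the cached window
            return cached[1][start:needed]
        self.misses += 1  # new key, or the page is beyond the cached window
        depth = -(-needed // self.window_size) * self.window_size  # ceil to window multiple
        docs = run_query(q, filters, sort, depth)
        self.entries[key] = (depth, docs)
        return docs[start:needed]


def fake_search(q, filters, sort, depth):
    # Pretend search that returns `depth` doc IDs in rank order.
    return list(range(depth))


cache = ToyQueryResultCache(window_size=20)
cache.search("title:foo*", (), "score desc", 0, 20, fake_search)   # miss: first run
cache.search("title:foo*", (), "score desc", 0, 20, fake_search)   # hit: identical request
cache.search("title:foo*", (), "id asc", 0, 20, fake_search)       # miss: sort changed
cache.search("title:foo*", (), "score desc", 20, 20, fake_search)  # miss: page 2 is past the window
print(cache.hits, cache.misses)  # 1 3
```

If the shared solrconfig.xml sets a small queryResultWindowSize, that could be one reason the second page was not served from cache.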
> 
>> What could be wrong here? How do I debug it? I checked solr.log on all nodes 
>> and didn't see anything unusual. The most frequent log entry looks like this:
>> 
>> INFO  - 2018-11-02 19:32:55.189; [   ] org.apache.solr.servlet.HttpSolrCall; 
>> [admin] webapp=null path=/admin/metrics 
>> params={wt=javabin&version=2&key=solr.core.patternmatch.shard3.replica_n8:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:QUERY./select.requests&key=solr.core.patternmatch.shard1.replica_n1:INDEX.sizeInBytes&key=solr.core.patternmatch.shard1.replica_n1:UPDATE./update.requests&key=solr.core.patternmatch.shard3.replica_n8:QUERY./select.requests}
>>  status=0 QTime=7
>> INFO  - 2018-11-02 19:32:55.192; [   ] org.apache.solr.servlet.HttpSolrCall; 
>> [admin] webapp=null path=/admin/metrics 
>> params={wt=javabin&version=2&key=solr.jvm:os.processCpuLoad&key=solr.node:CONTAINER.fs.coreRoot.usableSpace&key=solr.jvm:os.systemLoadAverage&key=solr.jvm:memory.heap.used}
>>  status=0 QTime=1
> 
> That is not a query.  It is a call to the Metrics API. When I've made this 
> call on a production Solr machine, it seems to be very resource-intensive, 
> taking a long time.  I don't think it should be made frequently.  Probably no 
> more than once a minute. If you are seeing that kind of entry in your logs a 
> lot, then that might be contributing to your performance issues.
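If the /admin/metrics calls come from an external monitoring client you control, one option is to throttle it. A minimal sketch, assuming Solr listens on http://localhost:8983 (hypothetical). `metrics_url` uses the Metrics API `key` parameter seen in the log entries above, and `ThrottledPoller` is a made-up helper that contacts Solr at most once per minute:

```python
from __future__ import annotations

import time
from typing import Callable, List, Optional
from urllib.parse import urlencode


def metrics_url(base: str, keys: List[str]) -> str:
    """Build a Metrics API request that asks only for the listed metric keys."""
    params = [("wt", "json")] + [("key", k) for k in keys]
    return f"{base}/solr/admin/metrics?{urlencode(params)}"


class ThrottledPoller:
    """Run `fetch` at most once per `min_interval` seconds; otherwise return the last result."""

    def __init__(self, fetch: Callable[[], object], min_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._fetch = fetch
        self._min_interval = min_interval
        self._clock = clock
        self._last_time: Optional[float] = None
        self._last_value: object = None

    def poll(self) -> object:
        now = self._clock()
        if self._last_time is None or now - self._last_time >= self._min_interval:
            self._last_value = self._fetch()  # only here would Solr actually be contacted
            self._last_time = now
        return self._last_value


# Demo with a fake clock and a fake fetch, so no Solr server is needed.
print(metrics_url("http://localhost:8983", ["solr.jvm:os.processCpuLoad"]))
# http://localhost:8983/solr/admin/metrics?wt=json&key=solr.jvm%3Aos.processCpuLoad

fake_now = [0.0]
fetch_times: List[float] = []


def fake_fetch() -> dict:
    fetch_times.append(fake_now[0])
    return {"fetched_at": fake_now[0]}


poller = ThrottledPoller(fake_fetch, min_interval=60.0, clock=lambda: fake_now[0])
for t in (0.0, 30.0, 59.9, 61.0):
    fake_now[0] = t
    poller.poll()
print(fetch_times)  # [0.0, 61.0]
```

In real use, `fetch` would perform an HTTP GET of the URL built by `metrics_url`; the fake clock only keeps the demo self-contained.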
> 
> Thanks,
> Shawn
> 
