On 8/23/2018 4:03 AM, Shawn Heisey wrote:
Configuring caches cannot speed up the first time a query runs. That
only speeds up later runs. Speeding up the first run will require two
things:
1) Ensuring that there is enough memory in the system for the
operating system to effectively cache the index. This is memory
*beyond* the Java heap that is not allocated to any program.
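To put a concrete (and entirely made-up) number on that: on a machine
with 64GB of RAM, something like the following in solr.in.sh leaves the
bulk of the memory for the OS disk cache. The heap size here is only an
illustration, not a recommendation:

    # solr.in.sh -- illustrative only, assuming a 64GB machine
    SOLR_HEAP="8g"
    # Whatever the heap and other programs don't claim is what the OS
    # can spend on caching index files.  On Linux, "free -g" shows how
    # much memory is currently being used for cache.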
Followup, after fully digesting the latest reply:
HDFS changes things a little bit. You would need to talk to somebody
who knows HDFS well about how to cache its data effectively. I think
that in that case you *do* need to use the heap to create a large HDFS
client cache, but I have no personal experience with HDFS, so I do not
know for sure. Note that a very large heap can make garbage collection
pauses extreme.
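I haven't tried it myself, but as I read the Solr reference guide, the
HDFS block cache is configured on the HdfsDirectoryFactory in
solrconfig.xml, something like the sketch below. The HDFS URL, confdir,
and sizes are placeholders, and if the direct memory option is left at
its default, the cache lives outside the heap and is sized with
-XX:MaxDirectMemorySize instead:

    <directoryFactory name="DirectoryFactory"
                      class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
      <bool name="solr.hdfs.blockcache.enabled">true</bool>
      <!-- true = off-heap (direct) memory, false = use the heap -->
      <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
      <!-- per the docs, each slab is blocksperbank * 8KB blocks, so
           these placeholder values give one 128MB slab -->
      <int name="solr.hdfs.blockcache.slab.count">1</int>
      <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
    </directoryFactory>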
With 2 billion docs, I'm assuming that you're running SolrCloud and that
the index is sharded. SolrCloud gives you query load balancing for
free. But I think you're probably going to need a lot more than 4
servers, and each server is probably going to need a lot of memory. You
haven't indicated how many shards or replicas are involved here. For
optimal performance, every shard needs to be on a separate server.
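For reference, the shard and replica layout is set when the collection
is created through the Collections API. The name and numbers below are
invented, just to show where they go; maxShardsPerNode=1 is one way to
force each shard replica onto its own server, which with 16 shards and
2 replicas apiece would call for 32 nodes:

    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=bigcollection&numShards=16&replicationFactor=2&maxShardsPerNode=1'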
Searching 2 billion docs, especially with wildcards, may not be
something you can get working REALLY fast. Without a LOT of hardware,
particularly memory, caching that much data is completely impractical.
Terabytes of memory are *very* expensive, especially if they're spread
across many servers.
Thanks,
Shawn