On 8/23/2018 4:03 AM, Shawn Heisey wrote:
Configuring caches cannot speed up the first time a query runs.  That only speeds up later runs.  Speeding up the first run will require two things:

1) Ensuring that there is enough memory in the system for the operating system to effectively cache the index.  This is memory *beyond* the Java heap that is not allocated to any program.

Followup, after fully digesting the latest reply:

HDFS changes things a little bit.  You would need to talk to somebody who knows HDFS about caching its data effectively.  I think that in that case you *do* need to use the heap to create a large HDFS client cache, but I have no personal experience with HDFS, so I do not know for sure.  Note that a very large heap can make garbage collection pauses extreme.
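
Also keep in mind that every gigabyte you give to the heap is a gigabyte the OS cannot use for caching the index (point 1 above).  A quick way to see that tradeoff is something like the little program below -- an untested sketch, run it with the same -Xmx you plan to give Solr, and note that it ignores anything else running on the box:

import java.lang.management.ManagementFactory;

public class CacheHeadroom {
    public static void main(String[] args) {
        // Heap ceiling for *this* JVM, so launch it with the -Xmx you intend to use for Solr.
        long heapMax = Runtime.getRuntime().maxMemory();
        // HotSpot-specific MXBean; getTotalPhysicalMemorySize() reports installed RAM.
        com.sun.management.OperatingSystemMXBean os =
            (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();
        long totalRam = os.getTotalPhysicalMemorySize();
        // Whatever the heap (and other programs) do not claim is what the OS can use to cache index files.
        System.out.printf("RAM: %d GB, heap max: %d GB, roughly %d GB left for the OS page cache%n",
            totalRam >> 30, heapMax >> 30, (totalRam - heapMax) >> 30);
    }
}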

With 2 billion docs, I'm assuming that you're running SolrCloud and that the index is sharded.  SolrCloud gives you query load balancing for free.  But I think you're probably going to need a lot more than 4 servers, and each server is probably going to need a lot of memory.  You haven't indicated how many shards or replicas are involved here.  For optimal performance, every shard needs to be on a separate server.
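
If the collection hasn't been built yet (or needs to be rebuilt), the shard and replica counts are set when it is created.  Here's a rough SolrJ sketch -- the collection name, configset name, host, and the 16x2 layout are all made up for illustration, not a recommendation:

import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateShardedCollection {
    public static void main(String[] args) throws Exception {
        // Point at any node in the SolrCloud cluster.
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://solr1:8983/solr").build()) {
            // 16 shards x 2 replicas = 32 cores to place; ideally each one
            // lands on its own server with plenty of memory left for the OS cache.
            CollectionAdminRequest.Create create =
                CollectionAdminRequest.createCollection("docs", "docs_conf", 16, 2);
            create.process(client);
        }
    }
}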

Searching 2 billion docs, especially with wildcards, may not be something you can make REALLY fast.  Without a LOT of hardware, particularly memory, it can be completely impractical to cache that much data.  Terabytes of memory are *very* expensive, especially when they have to be spread across many servers.
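
Before spending money on hardware, it would be worth measuring how a representative wildcard query actually behaves on the data you have now.  Something like this SolrJ sketch (the collection name, field, and host are placeholders) prints the server-side query time:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WildcardTiming {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient client =
                 new HttpSolrClient.Builder("http://solr1:8983/solr/docs").build()) {
            SolrQuery q = new SolrQuery("title:net*");  // trailing wildcard query
            q.setRows(10);
            QueryResponse rsp = client.query(q);
            // QTime is Solr's reported processing time in milliseconds for this request.
            System.out.println("QTime: " + rsp.getQTime() + " ms, hits: "
                + rsp.getResults().getNumFound());
        }
    }
}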

Thanks,
Shawn
