On 8/23/2018 4:03 AM, Shawn Heisey wrote:
Configuring caches cannot speed up the first time a query runs. That
only speeds up later runs. Speeding up the first run will require two
things:
1) Ensuring that there is enough memory in the system for the
operating system to effectively cache the index. This is memory
*beyond* the Java heap that is not allocated to any program.
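To put a concrete (and entirely made-up) number on that: on a machine
with 64GB of RAM, something like the following in solr.in.sh leaves the
bulk of the memory for the OS disk cache. The heap size here is only an
illustration, not a recommendation:

    # solr.in.sh -- illustrative only, assuming a 64GB machine
    SOLR_HEAP="8g"
    # Whatever the heap and other programs don't claim is what the OS
    # can spend on caching index files.  On Linux, "free -g" shows how
    # much memory is currently being used for cache.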
Followup, after fully digesting the latest reply:
HDFS changes things a little bit. You would need to talk to somebody
who knows HDFS well about how to cache its data effectively. I think
that in that case you *do* need to use the heap to create a large HDFS
client cache, but I have no personal experience with HDFS, so I do not
know for sure. Note that a very large heap can make garbage collection
pauses extreme.
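I haven't tried it myself, but as I read the Solr reference guide, the
HDFS block cache is configured on the HdfsDirectoryFactory in
solrconfig.xml, something like the sketch below. The HDFS URL, confdir,
and sizes are placeholders, and if the direct memory option is left at
its default, the cache lives outside the heap and is sized with
-XX:MaxDirectMemorySize instead:

    <directoryFactory name="DirectoryFactory"
                      class="solr.HdfsDirectoryFactory">
      <str name="solr.hdfs.home">hdfs://namenode:8020/solr</str>
      <str name="solr.hdfs.confdir">/etc/hadoop/conf</str>
      <bool name="solr.hdfs.blockcache.enabled">true</bool>
      <!-- true = off-heap (direct) memory, false = use the heap -->
      <bool name="solr.hdfs.blockcache.direct.memory.allocation">true</bool>
      <!-- per the docs, each slab is blocksperbank * 8KB blocks, so
           these placeholder values give one 128MB slab -->
      <int name="solr.hdfs.blockcache.slab.count">1</int>
      <int name="solr.hdfs.blockcache.blocksperbank">16384</int>
    </directoryFactory>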
With 2 billion docs, I'm assuming that you're running SolrCloud and that
the index is sharded. SolrCloud gives you query load balancing for
free. But I think you're probably going to need a lot more than 4
servers, and each server is probably going to need a lot of memory. You
haven't indicated how many shards or replicas are involved here. For
optimal performance, every shard needs to be on a separate server.
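For reference, the shard and replica layout is set when the collection
is created through the Collections API. The name and numbers below are
invented, just to show where they go; maxShardsPerNode=1 is one way to
force each shard replica onto its own server, which with 16 shards and
2 replicas apiece would call for 32 nodes:

    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=bigcollection&numShards=16&replicationFactor=2&maxShardsPerNode=1'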
Searching 2 billion docs, especially with wildcards, may not be
something you can get working REALLY fast. Without a LOT of hardware,
particularly memory, caching that much data is completely impractical.
Terabytes of memory are *very* expensive, especially if they're spread
across many servers.
Thanks,
Shawn