Given that these are log entries, you might find it works well to use a collection per day, and then use collection aliasing to query across them all. That way, you can have different aliases that cover specific ranges (e.g. "week" as an alias for the last 7 or 8 days' collections). A sketch of how the alias maintenance could look follows below.
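To illustrate, here is a minimal sketch of re-pointing such an alias with the Collections API's CREATEALIAS action. The daily naming scheme (logs_YYYYMMDD) and the localhost:8983 node are just assumptions for the example; CREATEALIAS replaces the alias atomically each time it is called, so you can re-run this from cron.

  # Minimal sketch: point a "week" alias at the last 8 daily collections.
  # Assumes collections named logs_YYYYMMDD and Solr on localhost:8983.
  import datetime
  import urllib.parse
  import urllib.request

  SOLR = "http://localhost:8983/solr/admin/collections"

  def set_alias(alias, collections):
      # CREATEALIAS (re)points an alias at a comma-separated collection list.
      params = urllib.parse.urlencode({
          "action": "CREATEALIAS",
          "name": alias,
          "collections": ",".join(collections),
      })
      urllib.request.urlopen(SOLR + "?" + params).read()

  today = datetime.date.today()
  last_week = ["logs_" + (today - datetime.timedelta(days=d)).strftime("%Y%m%d")
               for d in range(8)]
  set_alias("week", last_week)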
Upayavira

On Thu, Feb 5, 2015, at 09:11 AM, Toke Eskildsen wrote:
> On Wed, 2015-02-04 at 23:31 +0100, Arumugam, Suresh wrote:
> > We are trying to do a POC for searching our log files with a single
> > node Solr (396 GB RAM with 14 TB space).
>
> We're running 7 billion larger-than-typical-log-entry documents on a
> machine of similar size and it serves our needs well:
> https://sbdevel.wordpress.com/net-archive-search/
>
> With your (I assume) tiny documents, the number 14 billion does not seem
> too scary for your machine. Of course it depends on the types of queries
> you are issuing and your requirements for throughput & latency.
>
> Perhaps you could state your performance requirements as well as the
> types of queries you will be issuing?
>
> Besides the hard requirement of < 2 billion documents/shard, you are
> free to choose your shard size. While the general advice of 100M/shard
> is not bad, I would guess that 300-500M/shard could also work for you,
> as having fewer shards lowers the merging overhead. What works best
> also depends on the queries you make; especially faceting can be tricky
> with a high number of documents/shard.
>
> - Toke Eskildsen, State and University Library, Denmark
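To make Toke's shard-size numbers concrete, here is a quick back-of-the-envelope calculation (purely illustrative, using only the figures from his mail):

  # Shard counts for 14 billion documents at the suggested sizes.
  total_docs = 14_000_000_000
  for per_shard in (100_000_000, 500_000_000):
      print(per_shard, "docs/shard ->", total_docs // per_shard, "shards")
  # 100M/shard -> 140 shards; 500M/shard -> 28 shards.
  # Either way, each shard stays far below the ~2 billion docs/shard limit.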