Given that these are log entries, you might find it works to use a
collection per day, and then use collection aliasing to query across them
all. That way, you can have different aliases that cover specific
ranges (e.g. "week" is an alias for the last 7 or 8 days' collections).
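A minimal sketch of the daily roll-over, assuming a logs_YYYYMMDD
naming scheme and the Python requests library (the URL, alias name and
collection names here are made up for illustration):

    import datetime
    import requests  # assumed HTTP client, not part of Solr

    SOLR = "http://localhost:8983/solr"  # hypothetical Solr base URL

    # The last 7 daily collections, e.g. logs_20150130 .. logs_20150205.
    days = [datetime.date.today() - datetime.timedelta(days=n)
            for n in range(7)]
    collections = ["logs_%s" % d.strftime("%Y%m%d") for d in days]

    # CREATEALIAS replaces the alias atomically, so a nightly job can
    # roll "week" forward without interrupting queries against it.
    requests.get(SOLR + "/admin/collections", params={
        "action": "CREATEALIAS",
        "name": "week",
        "collections": ",".join(collections),
    })

Queries then just target the alias as if it were a collection, e.g.
http://localhost:8983/solr/week/select?q=error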

Upayavira

On Thu, Feb 5, 2015, at 09:11 AM, Toke Eskildsen wrote:
> On Wed, 2015-02-04 at 23:31 +0100, Arumugam, Suresh wrote:
> > We are trying to do a POC for searching our log files with a
> > single-node Solr (396 GB RAM and 14 TB of disk).
> 
> We're serving 7 billion documents (each larger than a typical log
> entry) from a machine of similar size, and it meets our needs well:
> https://sbdevel.wordpress.com/net-archive-search/
> 
> With your (I assume) tiny documents, the number 14 billion does not seem
> too scary for your machine. Of course it depends on the types of queries
> you are issuing and your requirements for throughput & latency.
> 
> Perhaps you could state your performance requirements as well as the
> types of queries you will be issuing?
> 
> 
> Besides the hard limit of < 2 billion documents per shard, you are
> free to choose your shard size. While the general advice of 100M/shard
> is not bad, I would guess that 300-500M/shard could also work for you,
> as having fewer shards lowers the merging overhead: at 400M/shard,
> 14 billion documents is roughly 35 shards rather than 140. What works
> best also depends on the queries you make; especially faceting can be
> tricky with a high number of documents per shard.
> 
> - Toke Eskildsen, State and University Library, Denmark