RE: Billion document index

2013-05-15 Thread Toke Eskildsen
Shawn Heisey [s...@elyograg.org]:
> Performance testing would be required in order to make a proper
> determination on whether SSD makes financial sense.
I fully agree.
[Lack of TRIM with RAID]
> then performance eventually suffers, and can become even worse than
> a spinning hard disk.
Do you

Re: Billion document index

2013-05-15 Thread Shawn Heisey
On 5/15/2013 3:56 AM, pankaj.pand...@wipro.com wrote:
> Thanks Shawn for explaining everything in such detail, it was really helpful.
>
> I have a few more queries on the same. Can you please explain the purpose of the
> 3rd box in the minimal configuration, with the standalone zookeeper?
A zookeeper e

Re: Billion document index

2013-05-15 Thread Shawn Heisey
On 5/15/2013 1:57 AM, Toke Eskildsen wrote:
> On Wed, 2013-05-15 at 08:31 +0200, Shawn Heisey wrote:
>> http://wiki.apache.org/solr/SolrPerformanceProblems
>>
>> I really was serious about reading that page, and not just because I
>> wrote it.
>
> That page makes a clear recommendation of RAM over

Re: Billion document index

2013-05-15 Thread Jack Krupansky
Although technically it may be possible to put 1 billion documents in a single Solr/Lucene index (2 billion hard limit), I would recommend simply: Don't do it! Don't try to put more than 250 million documents on a single Solr node. In fact, 100 million is a better, more realistic limit. To be
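As a rough illustration of the per-node limits above, here is a minimal Python sketch of the arithmetic for a 1 billion document index. The function name `nodes_needed` is hypothetical, and the sketch assumes documents are spread evenly and ignores replicas, heap, and disk size, all of which a real sizing exercise has to account for.

```python
def nodes_needed(total_docs: int, max_docs_per_node: int) -> int:
    """Smallest number of nodes such that no node holds more than max_docs_per_node.

    Assumes an even spread of documents; replicas are not counted.
    """
    if total_docs < 0 or max_docs_per_node <= 0:
        raise ValueError("total_docs must be >= 0 and max_docs_per_node must be > 0")
    # Integer ceiling division, avoiding floating-point rounding.
    return -(-total_docs // max_docs_per_node)


TOTAL_DOCS = 1_000_000_000  # the billion-document index discussed in this thread

# Upper bound suggested above: 250 million documents per node.
print(nodes_needed(TOTAL_DOCS, 250_000_000))
# More realistic limit suggested above: 100 million documents per node.
print(nodes_needed(TOTAL_DOCS, 100_000_000))
```

These counts are only a lower bound on hardware: each replica multiplies the number of cores to host, and the right per-node limit depends on document size and query load.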

RE: Billion document index

2013-05-15 Thread pankaj.pandey4
Thanks Shawn for explaining everything in such detail, it was really helpful. I have a few more queries on the same. Can you please explain the purpose of the 3rd box in the minimal configuration, with the standalone zookeeper? On a separate note, if we go ahead with 4 boxes (8 shards with replication f

Re: Billion document index

2013-05-15 Thread Daniel Collins
Speaking from our experience: we have a large collection (350M documents, but 1.2 TB in size, spread across 4 shards/machines and multiple replicas; we may well need more), and the first thing we needed to do for size estimation was to work out how big a set number of documents would be on disk. So we did

Re: Billion document index

2013-05-15 Thread Toke Eskildsen
On Wed, 2013-05-15 at 08:31 +0200, Shawn Heisey wrote:
> http://wiki.apache.org/solr/SolrPerformanceProblems
>
> I really was serious about reading that page, and not just because I
> wrote it.
That page makes a clear recommendation of RAM over SSDs. Have you done any performance testing on this?

Re: Billion document index

2013-05-14 Thread Shawn Heisey
On 5/15/2013 12:31 AM, Shawn Heisey wrote:
> If we assume that you've taken every possible step to reduce Solr's Java
> heap requirements, you might be able to do a heap of 8 to 16GB per
> server, but the actual heap requirement could be significantly higher.
> Adding this up, you get a bare minimu