On 2/4/2015 3:31 PM, Arumugam, Suresh wrote:
> We are trying to do a POC for searching our log files with a single-node
> Solr (396 GB RAM with 14 TB space). Since the server is powerful, we added
> 2 billion records successfully, and search is working fine without much issue.
>
> Due to the restriction on the maximum Lucene document count, we were not
> able to load any further.
>
> Is there a way to increase that limit from 2 billion to 4 or 5 billion
> in Lucene?
I thought I already sent this, but it has been sitting in my drafts folder
for several days.

That Lucene restriction cannot be changed at this time; it is the result of
using a 32-bit value for the Lucene document identifier. The amount of
program code that would be affected by a switch to a 64-bit value is HUGE,
and the ripple effect would be highly unpredictable. Developers who use the
Lucene API expect long-term stability ... that change has the potential for
a lot of volatility. Even if we figure out how to make the change, I
wouldn't expect it anytime soon. It won't be in the 5.0 release, and I
don't think anyone is brave enough to attempt it for the 6.0 release
either.

> If Lucene supports 2 billion per index, will it be the same issue
> with SolrCloud also?

SolrCloud lets you shard your index, so there are no limits other than
available system resources and the number of servers. There are users with
indexes as big as the one you are planning (and some even larger) who use
Solr successfully.

> If the recommended size for an index is 100 million, does that mean we
> need 20 indexes to support 2 billion documents? Is my understanding
> right?

The memory structures required within Java are much smaller and can be
manipulated more efficiently if the index has 100 million documents than if
it has 1 or 2 billion. Within the hard Lucene limitation, you can make your
indexes as big as you like ... but real-world experience has shown that
about 100 million documents on each server is a good balance between
resource requirements and performance. If you don't care how many seconds
your index takes to respond to a query, or you can afford enormous amounts
of memory and a commercial JVM with low-pause characteristics, you can push
the limits with your shard size.

I have compiled some performance information for "normal" sized indexes
with millions of documents.
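To make the numbers above concrete, here is a minimal Java sketch (the class name and the round-up arithmetic are mine, not from this thread) of why a single Lucene index tops out near 2^31 documents, and how the 20-shard figure from the question falls out of the 100-million-per-shard guideline:

```java
// Sketch: the per-index ceiling and the shard-count arithmetic.
public class ShardMath {
    public static void main(String[] args) {
        // Lucene document IDs are Java ints (32-bit signed), so a single
        // index can never address more than Integer.MAX_VALUE documents.
        long hardLimit = Integer.MAX_VALUE;      // 2,147,483,647

        long totalDocs = 2_000_000_000L;         // the corpus in question
        long docsPerShard = 100_000_000L;        // recommended shard size

        // Round up: 2,000,000,000 / 100,000,000 = 20 shards.
        long shards = (totalDocs + docsPerShard - 1) / docsPerShard;

        System.out.println("hard per-index limit: " + hardLimit);
        System.out.println("shards needed: " + shards);
    }
}
```

With SolrCloud, the shard count is fixed when the collection is created, via the Collections API's numShards parameter, so it is worth doing this arithmetic with some growth headroom before loading data.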
On the billions scale, some of this info is not very helpful:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn