On 2/4/2015 3:31 PM, Arumugam, Suresh wrote:
> We are trying to do a POC for searching our log files with a single node 
> Solr (396 GB RAM with 14 TB of disk). Since the server is powerful, we added 
> 2 billion records successfully & search is working fine without major 
> issues.
>
> Due to the restriction on the maximum number of documents in a Lucene 
> index, we were not able to load further.
>
>       Is there a way to increase that limit from 2 billion to 4 or 5 billion 
> in Lucene?

I thought I already sent this, but it has been sitting in my drafts
folder for several days.

That Lucene restriction cannot be changed at this time; it is the result
of using a 32-bit value (a Java int) for the Lucene document identifier.
The amount of program code that would be affected by a switch to a 64-bit
value is HUGE, and the ripple effect would be highly unpredictable.
Developers who use the Lucene API expect long-term stability ... that
change has the potential for a lot of volatility.  Even if we figure out
how to make the change, I wouldn't expect it anytime soon.  It won't be in
the 5.0 release, and I don't think anyone is brave enough to attempt it
for the 6.0 release either.
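
To see where the ceiling sits, here is a quick sanity check in plain
Java.  The class name is just for illustration, and the exact headroom
Lucene reserves below the int maximum is an internal detail:

    // Lucene document IDs are Java ints, so the hard ceiling is
    // Integer.MAX_VALUE (recent Lucene versions actually stop a
    // little below that, reserving some internal headroom).
    public class DocIdLimit {
        public static void main(String[] args) {
            System.out.println(Integer.MAX_VALUE);          // 2147483647
            long wanted = 5_000_000_000L;                    // 4-5 billion docs
            System.out.println(wanted > Integer.MAX_VALUE);  // true: needs 64 bits
        }
    }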

>       If Lucene supports 2 billion per index, then will it be the same issue 
> with SolrCloud also?

SolrCloud lets you shard your index across many cores and servers, so the
2 billion document limit applies to each individual shard, not to the
collection as a whole.  There are no limits other than available system
resources and the number of servers.  There are users with indexes as big
as the one you are planning (and some even larger) who use Solr
successfully.
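
As a concrete illustration, a sharded collection is created through the
Collections API.  The collection name and the shard/replica counts below
are placeholders; substitute your own sizing:

    http://localhost:8983/solr/admin/collections?action=CREATE
        &name=logs&numShards=20&replicationFactor=2

With numShards=20, each shard would hold roughly 100 million of your 2
billion documents, and SolrCloud routes both indexing and queries across
the shards for you.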

>       If the recommended size for an index is 100 million, do we need 20 
> indexes to support 2 billion documents? Is my understanding right?

The memory structures required within Java are much smaller and can be
manipulated more efficiently when an index has 100 million documents than
when it has 1 or 2 billion.  Within the hard Lucene limitation, you can
make your indexes as big as you like ... but real-world experience has
shown that about 100 million documents on each server is a good balance
between resource requirements and performance.  If you don't care how many
seconds your index takes to respond to a query, or you can afford enormous
amounts of memory and a commercial JVM with low-pause characteristics, you
can push the limits on your shard size.
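
To make the arithmetic concrete for your case (the replication factor
here is just an example, not a recommendation):

    2,000,000,000 docs / 100,000,000 docs per shard = 20 shards
    20 shards * 2 replicas (replicationFactor=2)    = 40 cores
    40 cores spread across however many servers your query load requires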

I have compiled some performance information for "normal"-sized indexes
with millions of documents.  On the billions scale, some of this info is
not very helpful:

http://wiki.apache.org/solr/SolrPerformanceProblems

Thanks,
Shawn
