On 8/7/2017 9:41 AM, Wael Kader wrote:
> I am facing an issue that is making me go crazy.
> I am running SOLR with its data stored on HDFS, on a single-node setup
> with an index that had been running fine until today.
> I know that 2 billion documents is too much for a single node, but it
> had been working fine for my requirements and it was pretty fast.
>
> I restarted SOLR today and I am getting an error stating "Too many
> documents, composite IndexReaders cannot exceed 2147483519".
> The last backup I have is two weeks old, and I really need this index
> to start so I can get the data out of it.

You have run into what I think might be the only *hard* limit in the
entire Lucene ecosystem.  Other limits can usually be worked around with
careful programming, but that one is set in stone.

A Lucene index uses a 32-bit Java integer to track internal document
IDs.  Java integer types are signed, so an int cannot exceed (2^31)-1,
which is 2147483647.  Lucene caps the document count 128 below that, at
2147483519, which is the number in your error message.  I'm not sure why
the extra margin exists, but it's probably there so that adding a small
offset to the value can't overflow.
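
If you want to see exactly where that number comes from, here's a tiny
Java sketch of the arithmetic (as far as I remember, the actual constant
is exposed as IndexWriter.MAX_DOCS in recent Lucene versions):

  // Illustration of the per-index document cap in Lucene.
  public class MaxDocsDemo {
    public static void main(String[] args) {
      int intMax = Integer.MAX_VALUE;      // (2^31)-1 = 2147483647
      int luceneMaxDocs = intMax - 128;    // 2147483519, the number in your error
      System.out.println("Integer.MAX_VALUE    = " + intMax);
      System.out.println("Lucene max documents = " + luceneMaxDocs);
    }
  }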

SolrCloud is perfectly capable of handling far more than two billion
documents, but as Yago mentioned, the collection must be sharded so that
no single core's Lucene index ever crosses that limit.
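
If you end up rebuilding under SolrCloud, sharding is decided when the
collection is created.  A rough SolrJ sketch (the collection and config
names below are just placeholders, and the exact CloudSolrClient builder
calls vary a bit between Solr versions):

  import org.apache.solr.client.solrj.impl.CloudSolrClient;
  import org.apache.solr.client.solrj.request.CollectionAdminRequest;

  public class CreateShardedCollection {
    public static void main(String[] args) throws Exception {
      // Point SolrJ at the ZooKeeper ensemble for the SolrCloud cluster.
      try (CloudSolrClient client = new CloudSolrClient.Builder()
          .withZkHost("zk1:2181,zk2:2181,zk3:2181").build()) {
        // Four shards, one replica each, so no single core's Lucene
        // index has to hold all of the documents.
        CollectionAdminRequest
            .createCollection("bigcollection", "myconfig", 4, 1)
            .process(client);
      }
    }
  }

With four shards, each core would stay well under the two-billion
document limit even at your current index size.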

I have no idea whether you can successfully recover anything from that
index now that it has broken the hard limit.

Thanks,
Shawn
