On 2/4/2015 2:54 PM, Arumugam, Suresh wrote:
> Hi All,
>
> We are trying to load 14+ Billion documents into Solr. But we are
> failing to load them into Solr.
>
> Solr version: *4.8.0*
> Analyzer used: *ClassicTokenizer for index as well as query.*
>
> Can someone help me in getting into the core of this issue?
>
> For 14+ Billion document load, we are loading 2Billion batches using
> the dataimport with single thread.
>
> First batch completed successfully & added 2 Billion documents
> Second batch, dataimport is showing as successful completion. But
> the no of documents is still 2 Billion with the following exception
> in the logs.
>
<snip>
> Caused by: java.lang.IllegalArgumentException: Too many documents,
> composite IndexReaders cannot exceed 2147483647

Solr is an application based on Lucene. Lucene has exactly one hard limitation -- a single index cannot contain more than 2147483647 (Java's Integer.MAX_VALUE) documents. There are some ideas being kicked around for removing this limitation, but it is not normally seen as a major stumbling block. You're likely to hit performance bottlenecks with indexes much smaller than 2 billion documents. The document count includes deleted documents that have not yet been merged away.

For a variety of reasons, we recommend not storing more than about 100 million documents in any single index, although going up to about 1 billion is feasible, if you have enough memory.

Solr, especially if you use SolrCloud, offers the ability to shard your index so it is being served from many smaller indexes, on many hosts. If you're going to have billions of documents, you have no choice but to shard your index. In order to get good performance out of an index that large, you'll need the memory and processing power of multiple physical machines working together.

https://wiki.apache.org/solr/DistributedSearch
https://cwiki.apache.org/confluence/display/solr/SolrCloud

You will need a lot of hardware, especially memory, to handle a 14 billion document index with any kind of speed.

Thanks,
Shawn
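
P.S. To put rough numbers on the sharding advice, here is a minimal Python sketch of the arithmetic. It is not a Solr or Lucene API: the function name min_shards and the constant name are made up for illustration, and it assumes documents are spread evenly across shards. Since you have "14+" billion documents and deleted documents count against the limit until they are merged away, treat the results as lower bounds and leave headroom.

```python
# Lucene's hard per-index limit: 2147483647, Java's Integer.MAX_VALUE.
LUCENE_MAX_DOCS = 2**31 - 1


def min_shards(total_docs: int, docs_per_shard: int) -> int:
    """Smallest shard count that keeps every shard at or below docs_per_shard.

    Hypothetical helper for illustration only; assumes an even
    distribution of documents across shards.
    """
    if total_docs < 0:
        raise ValueError("total_docs must not be negative")
    if not 0 < docs_per_shard <= LUCENE_MAX_DOCS:
        raise ValueError("docs_per_shard must be between 1 and the Lucene limit")
    # Ceiling division using integers only.
    return -(-total_docs // docs_per_shard)


total = 14_000_000_000  # lower bound taken from "14+ billion"

print(min_shards(total, LUCENE_MAX_DOCS))  # at the hard limit: 7
print(min_shards(total, 1_000_000_000))    # at about 1 billion per shard: 14
print(min_shards(total, 100_000_000))      # at about 100 million per shard: 140
```

Seven shards is only the theoretical floor, with every shard sitting right at the limit and no room for deleted documents. The 100 million figure is the one to plan your hardware around.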