On 8/26/2014 9:36 PM, lalitjangra wrote:
> I am using Solr 4.6.0 with a single collection/core and want to know
> details about the following.
>
> 1. What is the maximum number of documents which can be uploaded in a
> single collection/core?
> 2. What is the maximum size of a document I can upload in Solr without
> failing?
> 3. Is there any way to update these limits, if possible?
There is exactly one hard limit in Solr that cannot be changed with configuration. That limit exists because the Lucene on-disk index format uses a 32-bit signed integer value (the "int" or "Integer" data type in Java) for internal document identifiers. The largest number that type can hold is 2147483647, a little more than two billion, so a single Lucene index cannot contain more than 2147483647 documents. Because deleted documents are counted along with the rest, it is advisable not to exceed one billion live documents per Solr core (each core maintains one Lucene index). Staying at one billion live documents leaves room for roughly another billion deleted documents before you hit the hard ceiling.

You can create Solr indexes well beyond the Lucene limit by going distributed. The easiest way to do this is with SolrCloud: create a collection with multiple shards. Each shard is limited to 2147483647 documents, but the whole index can span as many shards as you require. It is definitely recommended that you spread such an index across many servers.

There are no document size limits at all, although you should be aware that most tokenizers and token filters have a hard-coded maximum token size, typically between 256 and 4096 characters. Keep in mind that a character may be more than one byte. For example, the ideographic space common in CJK text takes up three bytes in UTF-8 encoding:

http://www.fileformat.info/info/unicode/char/3000/index.htm

One final note: you'll almost always run into resource limits (RAM, IOPS, or CPU) before you actually hit Lucene's one undefeatable limitation.
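To put rough numbers on that headroom, here's a trivial Java sketch (plain arithmetic, no Solr API involved):

    public class DocIdHeadroom {
        public static void main(String[] args) {
            // Lucene's internal doc IDs are Java ints, so one Lucene
            // index can never exceed this many documents:
            int hardLimit = Integer.MAX_VALUE;      // 2147483647
            // With one billion live documents in a core ...
            long liveDocs = 1000000000L;
            // ... this much room is left for deleted documents before
            // the ceiling is reached:
            System.out.println(hardLimit - liveDocs);  // 1147483647
        }
    }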
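If you do go the SolrCloud route, a multi-shard collection is created with a single Collections API call. Something like this, where the host, collection name, shard count, and replica count are made-up example values, and depending on your setup you may also need to pass collection.configName:

    http://localhost:8983/solr/admin/collections?action=CREATE&name=bigindex&numShards=8&replicationFactor=2

Each of those eight shards is a separate core with its own Lucene index and its own two-billion ceiling, so the collection as a whole can grow roughly eight times larger.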
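And to illustrate the character-versus-byte point, here's a quick Java check of that ideographic space (U+3000):

    import java.nio.charset.StandardCharsets;

    public class CharBytes {
        public static void main(String[] args) {
            // U+3000 (IDEOGRAPHIC SPACE) is one character, but it
            // occupies three bytes when encoded as UTF-8:
            byte[] utf8 = "\u3000".getBytes(StandardCharsets.UTF_8);
            System.out.println(utf8.length);  // prints 3
        }
    }

Thanks,
Shawn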