You'll almost certainly have to shard then. First of all, Lucene has a hard limit of 2^31 docs in a single index, so there's a roughly 2B-document cap per shard. There's no such limit on the number of docs in the collection (e.g. 5 shards each holding 2B docs gives 10B docs total in the collection).
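The "just multiplying" arithmetic mentioned below can be sketched in a few lines: once a stress test tells you how many docs one shard handles comfortably, divide the expected corpus size by that number. The 250M and 10B figures here are hypothetical stand-ins; substitute the results of your own test.

```python
import math

# Hypothetical numbers -- substitute results from YOUR stress test.
LUCENE_MAX_DOCS_PER_INDEX = 2**31 - 1   # hard Lucene limit per index/shard
docs_per_shard = 250_000_000            # comfortable load found by your test
total_docs = 10_000_000_000             # expected corpus size

# Sanity check: the tested per-shard load must stay under the Lucene cap.
assert docs_per_shard < LUCENE_MAX_DOCS_PER_INDEX

# Shards needed so no shard exceeds the tested comfortable load.
shards = math.ceil(total_docs / docs_per_shard)
print(shards)  # 40
```

This only bounds shard count by document volume; query load may push you to more shards or more replicas than this estimate suggests.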
But nobody that I know of has that many documents on a shard. I've seen 200M-300M docs on a shard give good response time; I've also seen 20M docs strain a beefy server. Here's an outline of what it takes to find out:

https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The idea is to set up a test environment that you strain to breaking with _your_ data/queries/environment. You can do this with just two machines; from there it's just multiplying...

Best,
Erick

On Wed, Oct 4, 2017 at 6:07 AM, gatanathoa <msouthw...@microfocus.com> wrote:
> There is a very large amount of data and there will be a constant
> addition of more data. There will be hundreds of millions if not
> billions of items.
>
> We have to be able to be constantly indexing items but also allow for
> searching. Sadly there is no way to know the amount of searching that
> will be done, but I was told to expect a fair amount. (I have no idea
> what "a fair amount" means either)
>
> I am not sure that only one shard will be adequate in this setup. The
> speed of the search results is the key here. There is also no way to
> test this prior to implementation.
>
> Is this enough information to be able to provide some guidelines?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html