You'll almost certainly have to shard then. First of all, Lucene has a
hard limit of 2^31 - 1 docs in a single index, so each shard tops out at
roughly 2B docs. There's no such limit on the number of docs in the
collection as a whole (e.g. 5 shards with 2B docs each gives 10B docs
total in the collection).
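The per-shard ceiling and the "how many shards do I need" arithmetic can be sketched as follows; `min_shards` and the 250M docs/shard figure are hypothetical illustrations, not recommendations:

```python
import math

# Lucene's hard per-index (and therefore per-shard) document ceiling.
LUCENE_MAX_DOCS = 2**31 - 1

def min_shards(total_docs, docs_per_shard):
    """Minimum shard count to hold total_docs at a target docs-per-shard."""
    if docs_per_shard > LUCENE_MAX_DOCS:
        raise ValueError("a single Lucene index cannot exceed 2^31 - 1 docs")
    return math.ceil(total_docs / docs_per_shard)

# e.g. 10B docs at a (hypothetical) 250M docs/shard target:
print(min_shards(10_000_000_000, 250_000_000))  # -> 40
```

The real target docs-per-shard is exactly what the stress testing below is meant to discover.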

That said, nobody I know of puts anywhere near that many documents on a
single shard. I've seen 200M-300M docs on a shard give good response
times, and I've also seen 20M docs strain a beefy server.

Here's an outline of what it takes to find out:
https://lucidworks.com/2012/07/23/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/

The idea is to set up a test environment that you drive to the breaking
point with _your_ data/queries/environment. You can do this with just two
machines; from there it's just multiplication...
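The "multiplication" step above can be sketched like this; the `shards_for_corpus` helper, the 70% safety margin, and the breaking-point numbers are all hypothetical and would come from your own stress test:

```python
def shards_for_corpus(breaking_point_docs, target_docs, safety_margin=0.7):
    """Extrapolate cluster size from a small stress test.

    breaking_point_docs: docs/shard at which one test shard's latency
                         became unacceptable with your data and queries
    safety_margin:       fraction of the breaking point to actually run at
    """
    comfortable = int(breaking_point_docs * safety_margin)
    # Ceiling division without importing math.
    return -(-target_docs // comfortable)

# If a test shard degraded at 300M docs, plan ~210M/shard; 5B docs needs:
print(shards_for_corpus(300_000_000, 5_000_000_000))  # -> 24
```

The safety margin matters because you want headroom for the "constant addition of more data" rather than sizing to the edge of failure.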

Best,
Erick

On Wed, Oct 4, 2017 at 6:07 AM, gatanathoa <msouthw...@microfocus.com> wrote:
> There is a very large amount of data and there will be a constant addition of
> more data. There will be hundreds of millions if not billions of items.
>
> We have to be able to index items constantly while also allowing for
> searching. Sadly there is no way to know how much searching will be
> done, but I was told to expect a fair amount. (I have no idea what
> "a fair amount" means either.)
>
> I am not sure that only one shard will be adequate in this setup. The speed
> of the search results is the key here. There is also no way to test this
> prior to implementation.
>
> Is this enough information to be able to provide some guidelines?
>
>
>
> --
> Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html
