50 billion per day?  Wow!  How large are these documents?

We have a cluster with one large collection containing 2.4 billion documents spread across 40 machines, using HDFS for the index.  We store our data in HBase, and to re-index we pull records out of HBase and index them with SolrCloud; a rough sketch of that pull-and-index loop is below.  The most we can do is around 57 million documents per day, usually limited by pulling data out of HBase rather than by Solr.
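For anyone curious, the loop is roughly like the sketch below. The table, column family, field names, and ZooKeeper address are made up for illustration, and the real job runs many scanners in parallel rather than a single loop:

    // Sketch only: scan rows out of HBase and push them to SolrCloud in batches.
    // Table/field names are hypothetical.
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.List;
    import java.util.Optional;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.*;
    import org.apache.hadoop.hbase.util.Bytes;
    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class HBaseToSolr {
        public static void main(String[] args) throws Exception {
            Configuration conf = HBaseConfiguration.create();
            try (Connection hbase = ConnectionFactory.createConnection(conf);
                 Table table = hbase.getTable(TableName.valueOf("events"));
                 CloudSolrClient solr = new CloudSolrClient.Builder(
                         Collections.singletonList("zk1:2181"), Optional.empty()).build()) {

                solr.setDefaultCollection("events");
                Scan scan = new Scan();
                scan.setCaching(500);                        // pull 500 rows per RPC

                List<SolrInputDocument> batch = new ArrayList<>();
                try (ResultScanner scanner = table.getScanner(scan)) {
                    for (Result row : scanner) {
                        SolrInputDocument doc = new SolrInputDocument();
                        doc.addField("id", Bytes.toString(row.getRow()));
                        doc.addField("body_t", Bytes.toString(
                                row.getValue(Bytes.toBytes("d"), Bytes.toBytes("body"))));
                        batch.add(doc);
                        if (batch.size() == 1000) {          // index in 1,000-doc batches
                            solr.add(batch);
                            batch.clear();
                        }
                    }
                }
                if (!batch.isEmpty()) solr.add(batch);
                solr.commit();
            }
        }
    }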

-Joe


On 4/4/2018 10:57 PM, 苗海泉 wrote:
With 49 shards per collection and more than 600 collections, Solr runs into
serious performance problems that I don't know how to deal with. My advice
to you is to minimize the number of collections.
Our environment is 49 Solr server nodes, each with 32 CPUs / 128 GB RAM, and
the data volume is about 50 billion documents per day.
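For reference, the shard count is fixed when a collection is created; a minimal SolrJ sketch, assuming placeholder collection and configset names and ZooKeeper address, looks like this:

    // Sketch: create a collection with an explicit shard/replica layout via SolrJ.
    // Collection name, configset name, and ZK address are placeholders.
    import java.util.Collections;
    import java.util.Optional;

    import org.apache.solr.client.solrj.impl.CloudSolrClient;
    import org.apache.solr.client.solrj.request.CollectionAdminRequest;

    public class CreateCollection {
        public static void main(String[] args) throws Exception {
            try (CloudSolrClient solr = new CloudSolrClient.Builder(
                    Collections.singletonList("zk1:2181"), Optional.empty()).build()) {
                // 49 shards, matching the layout described above; the replica
                // count of 1 here is just an example.
                CollectionAdminRequest
                    .createCollection("logs_2018_04_05", "logs_config", 49, 1)
                    .process(solr);
            }
        }
    }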


2018-04-04 9:23 GMT+08:00 Yago Riveiro <yago.rive...@gmail.com>:

Hi,

In my company we are running a 12-node cluster with 10 (American) billion
documents in 12 shards / 2 replicas.

We run mainly faceting queries, with very reasonable performance.
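As an illustration only (collection and field names are placeholders, not our schema), a typical field-facet query in SolrJ looks roughly like:

    // Sketch: a simple field-facet query via SolrJ; "docs" and "category_s" are placeholders.
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.FacetField;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class FacetExample {
        public static void main(String[] args) throws Exception {
            try (HttpSolrClient solr = new HttpSolrClient.Builder(
                    "http://localhost:8983/solr/docs").build()) {
                SolrQuery q = new SolrQuery("*:*");
                q.setRows(0);                 // we only want the facet counts
                q.setFacet(true);
                q.addFacetField("category_s");
                q.setFacetMinCount(1);

                QueryResponse rsp = solr.query(q);
                for (FacetField.Count c : rsp.getFacetField("category_s").getValues()) {
                    System.out.println(c.getName() + ": " + c.getCount());
                }
            }
        }
    }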

36 million documents is not an issue; you can handle that volume of
documents with 2 nodes with SSDs and 32 GB of RAM.

Regards.

--

Yago Riveiro

On 4 Apr 2018 02:15 +0100, Abhi Basu <9000r...@gmail.com>, wrote:
We have tested Solr 4.10 with 200 million docs with an average doc size of
250 KB. No performance issues when using 3 shards / 2 replicas.



On Tue, Apr 3, 2018 at 8:12 PM, Steven White <swhite4...@gmail.com>
wrote:
Hi everyone,

I'm about to start a project that requires indexing 36 million records
using Solr 7.2.1. Each record ranges from 500 KB to 0.25 MB in size, with
an average of 0.1 MB.
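For concreteness, the kind of bulk-indexing loop I have in mind is roughly the following (the URL, field names, and record loader are placeholders):

    // Rough sketch of a bulk-indexing loop for Solr 7.2.x; URL and fields are placeholders.
    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class BulkIndex {
        public static void main(String[] args) throws Exception {
            // ConcurrentUpdateSolrClient buffers adds and streams them in the background.
            try (ConcurrentUpdateSolrClient solr = new ConcurrentUpdateSolrClient.Builder(
                    "http://localhost:8983/solr/records")
                    .withQueueSize(1000)
                    .withThreadCount(4)
                    .build()) {
                for (int i = 0; i < 36_000_000; i++) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", Integer.toString(i));
                    doc.addField("content_t", loadRecord(i));   // hypothetical loader
                    solr.add(doc);
                }
                solr.blockUntilFinished();
                solr.commit();
            }
        }

        // Placeholder for whatever actually produces the record bodies.
        private static String loadRecord(int i) {
            return "record " + i;
        }
    }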

Has anyone indexed this number of records? What are the things I should
worry about? And out of curiosity, what is the largest number of records
indexed by Solr that has been published anywhere?

Thanks

Steven



--
Abhi Basu


