50 billion per day? Wow! How large are these documents?
We have a cluster with one large collection that contains 2.4 billion
documents spread across 40 machines, using HDFS for the index. We store
our data in HBase, and to re-index we pull the data from HBase and index
it with Solr.
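
In case it helps, that re-index job is essentially just a scan over HBase
feeding batched updates to Solr through SolrJ. A rough sketch (the table,
column family, field, and ZooKeeper names are made up, and it assumes SolrJ 7
and the HBase client, not necessarily what we actually run):

import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class HBaseToSolrReindex {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    try (Connection hbase = ConnectionFactory.createConnection(conf);
         Table table = hbase.getTable(TableName.valueOf("docs"));
         CloudSolrClient solr = new CloudSolrClient.Builder(
             Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {

      solr.setDefaultCollection("main");

      Scan scan = new Scan();
      scan.setCaching(500);                   // pull rows from HBase in chunks

      List<SolrInputDocument> batch = new ArrayList<>();
      try (ResultScanner rows = table.getScanner(scan)) {
        for (Result row : rows) {
          SolrInputDocument doc = new SolrInputDocument();
          doc.addField("id", Bytes.toString(row.getRow()));
          doc.addField("body_txt", Bytes.toString(
              row.getValue(Bytes.toBytes("d"), Bytes.toBytes("body"))));
          batch.add(doc);
          if (batch.size() == 1000) {         // send to Solr in batches of 1000
            solr.add(batch);
            batch.clear();
          }
        }
      }
      if (!batch.isEmpty()) solr.add(batch);
      solr.commit();
    }
  }
}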
In our case we have ~350M documents stored on r3.xlarge nodes with an 8GB heap
and about 31GB of RAM.
We are using Solr 5.3.1 in a SolrCloud setup (3 collections, each with 3
shards and 3 replicas).
For us, lots of RAM is not as important as CPU (as the EBS disk we
run on top of is quite fast).
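
For reference, a collection laid out like one of ours (3 shards, 3 replicas per
shard) is a single CREATE call to the Collections API. A minimal sketch with a
recent SolrJ (the collection, configset, and ZooKeeper names are placeholders;
the SolrJ helper differs across versions, but the underlying Collections API
call is the same):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;

public class CreateCollectionExample {
  public static void main(String[] args) throws Exception {
    // Placeholder ZooKeeper ensemble, collection name, and configset name.
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      // 3 shards, 3 replicas per shard, as in the setup described above.
      CollectionAdminRequest.createCollection("events", "events_config", 3, 3)
          .process(solr);
    }
  }
}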
With 49 shards per collection and more than 600 collections, Solr has serious
performance problems, and I don't know how to deal with them. My advice to you
is to minimize the number of collections.
Our environment is 49 Solr server nodes, each with 32 CPUs / 128GB of RAM, and
the data volume
Hi,
In my company we are running a 12-node cluster with 10 (American) billion
documents across 12 shards / 2 replicas.
We run mainly faceting queries with very reasonable performance.
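
In case it is useful, a typical facet query of that kind through SolrJ looks
roughly like the sketch below (the collection, ZooKeeper, and field names are
invented, not our real schema):

import java.util.Arrays;
import java.util.Optional;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FacetQueryExample {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient solr = new CloudSolrClient.Builder(
        Arrays.asList("zk1:2181", "zk2:2181", "zk3:2181"), Optional.empty()).build()) {
      solr.setDefaultCollection("docs");

      // Match everything, skip the result rows, and facet on two fields.
      SolrQuery q = new SolrQuery("*:*");
      q.setRows(0);
      q.setFacet(true);
      q.addFacetField("country_s", "category_s");
      q.setFacetMinCount(1);

      QueryResponse rsp = solr.query(q);
      for (FacetField ff : rsp.getFacetFields()) {
        System.out.println(ff.getName() + ": " + ff.getValues().size() + " buckets");
      }
    }
  }
}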
36 million documents is not an issue; you can handle that volume of documents
with 2 nodes with SSDs and 32GB of RAM.
We have a 24 million document index. Our documents (homework problems) are a
bit smaller than yours.
The Hathi Trust probably has the record. They haven’t updated their blog for a
while, but they were at 11 million books and billions of pages in 2014.
https://www.hathitrust.org/blogs/large-scale-
We have tested Solr 4.10 with 200 million docs at an average doc size of 250 KB.
No issues with performance when using 3 shards / 2 replicas.
On Tue, Apr 3, 2018 at 8:12 PM, Steven White wrote:
> Hi everyone,
>
> I'm about to start a project that requires indexing 36 million records
> using Solr 7.