On 3/24/2016 11:57 AM, tedsolr wrote:
> My post was scant on details. The numbers I gave for collection sizes
> are projections for the future. I am in the midst of an upgrade that
> will be completed within a few weeks. My concern is that I may not be
> able to produce the throughput necessary to index an entire collection
> quickly enough (3 to 4 hours) for a large customer (100M docs).
I can fully rebuild one of my indexes, with 146 million docs, in 8-10
hours. This is fairly inefficient indexing -- six large shards (not
cloud), each one running the dataimport handler, importing from MySQL.

I suspect I could probably get two or three times this rate (and maybe
more) on the same hardware if I wrote a SolrJ application that uses
multiple threads for each Solr shard. I know from experiments that the
MySQL server can push over 100 million rows to a SolrJ program in less
than an hour, including constructing SolrInputDocument objects. That
experiment just left out the "client.add(docs);" line. The bottleneck is
definitely Solr.

Each machine holds three large shards (half the index), is running Solr
4.x (5.x upgrade is in the works), and has 64GB RAM with an 8GB heap.
Each shard is approximately 24.4 million docs and 28GB. These machines
also hold another sharded index in the same Solr install, but it's quite
a lot smaller.

Thanks,
Shawn
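
For illustration, here is a minimal sketch of the multi-threaded SolrJ
approach described above. It assumes SolrJ 5.x and uses
ConcurrentUpdateSolrClient, which buffers documents and sends them to a
single shard with several background threads; the shard URL, JDBC
connection details, query, and field names are placeholders, not taken
from the thread:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;
    import java.util.ArrayList;
    import java.util.List;

    import org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient;
    import org.apache.solr.common.SolrInputDocument;

    public class ShardIndexer {
        public static void main(String[] args) throws Exception {
            // Hypothetical shard URL; queue of 10000 docs, 4 sender threads.
            ConcurrentUpdateSolrClient client = new ConcurrentUpdateSolrClient(
                    "http://shard1:8983/solr/core1", 10000, 4);

            List<SolrInputDocument> batch = new ArrayList<>();
            try (Connection conn = DriverManager.getConnection(
                        "jdbc:mysql://dbhost/dbname", "user", "pass");
                 Statement stmt = conn.createStatement(
                        ResultSet.TYPE_FORWARD_ONLY, ResultSet.CONCUR_READ_ONLY)) {
                // Stream rows from MySQL instead of loading them all at once.
                stmt.setFetchSize(Integer.MIN_VALUE);
                ResultSet rs = stmt.executeQuery(
                        "SELECT id, title, body FROM docs");  // placeholder query

                while (rs.next()) {
                    SolrInputDocument doc = new SolrInputDocument();
                    doc.addField("id", rs.getString("id"));
                    doc.addField("title", rs.getString("title"));
                    doc.addField("body", rs.getString("body"));
                    batch.add(doc);

                    if (batch.size() >= 1000) {
                        client.add(batch);   // the line the experiment left out
                        batch.clear();
                    }
                }
                if (!batch.isEmpty()) {
                    client.add(batch);
                }
            }
            client.blockUntilFinished();  // drain the background send queue
            client.commit();
            client.close();
        }
    }

Running one such program per shard (or one per shard with a larger
thread count) is the kind of parallel indexing the paragraph above
estimates could double or triple the DIH-based throughput.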