Re: Solr cloud performance degradation with billions of documents

Erick Erickson Thu, 14 Aug 2014 10:10:31 -0700

You are absolutely on the bleeding edge.

I know of a couple of projects that are at that scale, but....

1> they aren't being done on just a few nodes. As Jack
says, this scale for SolrCloud is not common and there
are no out-of-the-box (OOB) templates to follow.

2> AFAIK, the projects I'm talking about aren't in production
yet. And they're significant R&D efforts on the part of the
companies involved.

3> You are _not_ going to do this on a shoestring
budget. Nor is it going to be something you have
up and running in 3 months. And you're talking about
a lot of machines here. Jack and I both come up with
estimates of thousands of Solr servers; _that's_
the scale we're talking about here! You're not going to
get around this by just adding more memory either.
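
The arithmetic behind estimates of this size can be sketched in a few lines of Python. Only the ingest rate (5 billion new documents per day) and the rough corpus size (hundreds of billions of documents) come from this thread. The documents per shard, replica count, and shard replicas per server below are hypothetical placeholders, not recommendations, and should be replaced with figures measured on your own hardware.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact even for very large ints."""
    return -(-a // b)


def sustained_ingest_rate(docs_per_day: int) -> float:
    """Average documents per second needed just to keep up with daily volume."""
    return docs_per_day / SECONDS_PER_DAY


def servers_needed(total_docs: int, docs_per_shard: int,
                   replicas: int, shards_per_server: int) -> int:
    """Rough server count: number of shards, times replicas, packed onto servers."""
    shards = ceil_div(total_docs, docs_per_shard)
    shard_replicas = shards * replicas
    return ceil_div(shard_replicas, shards_per_server)


# From the thread: 5 billion new documents per day.
print(f"{sustained_ingest_rate(5_000_000_000):,.0f} docs/sec")  # 57,870 docs/sec

# Hypothetical sizing: 500 billion documents, 100 million documents per shard,
# 2 replicas of each shard, 4 shard replicas per server.
print(servers_needed(500_000_000_000, 100_000_000, 2, 4))  # 2500
```

The ingest figure is a flat 24-hour average with no allowance for peaks, reindexing, or concurrent query load, so real requirements would be higher.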

Much as I love Solr, I have to ask whether it's the right
tool for your situation. Unlike some other technologies,
Solr requires holding large parts of the index in memory.
For the entire corpus. At once. At the scale you're talking
about, you need compelling reasons to invest in all that. So I'd
carefully look at what your problem is and whether Solr/search
is the right tool for the job or not.
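
To get a feel for the memory side, here is a rough Python sketch that counts servers by how much of the index must sit in RAM. Every input is a hypothetical assumption, not a figure from this thread: an average on-disk index footprint per document, the share of the index kept in RAM (for example, in the operating system's file cache), and the usable RAM per server.

```python
def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division, exact even for very large ints."""
    return -(-a // b)


def servers_for_ram(total_docs: int, bytes_per_doc: int,
                    cached_percent: int, ram_bytes_per_server: int) -> int:
    """Servers needed so that cached_percent of the whole index fits in RAM."""
    index_bytes = total_docs * bytes_per_doc
    bytes_in_ram = ceil_div(index_bytes * cached_percent, 100)
    return ceil_div(bytes_in_ram, ram_bytes_per_server)


TB = 10**12
GB = 10**9

# Hypothetical inputs: 500 billion documents at ~1 KB of index each,
# half of the index held in RAM, 128 GB of usable RAM per server.
docs = 500_000_000_000
print(docs * 1_000 // TB, "TB of index")           # 500 TB of index
print(servers_for_ram(docs, 1_000, 50, 128 * GB))  # 1954
```

Each input scales the result linearly, so it is worth measuring bytes per document on a representative sample of your own data before trusting any total.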

On Thu, Aug 14, 2014 at 9:51 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
> Wilburn, Scott [scott.wilb...@verizonwireless.com.INVALID] wrote:
>> Thanks for your suggestion to look into MapReduceIndexerTool; I'm looking
>> into that now.
>> I agree that what I am trying to do is a tall order, and the more I hear
>> from all of your comments, the more I am convinced that lack of memory is
>> my biggest problem. I'm going to work on increasing the memory now, but I
>> was wondering whether there are any configuration settings or other
>> techniques that could also increase ingest performance.
>
> More RAM basically compensates for slow storage, so the obvious trick is to
> increase your I/O performance. If your index is placed on network storage,
> then put it on local storage. If you are using spinning drives, then change
> to SSDs. If you are using SSDs, then RAID them. That's way cheaper than
> trying to match your RAM to your projected index size.
>
>> Does anyone know if a cloud of this size (hundreds of billions of
>> documents), with an ingest rate of 5 billion new documents each day, has
>> ever been attempted before?
>
> Sorry, my experience is primarily with maximizing search performance.
>
> - Toke Eskildsen
