Interesting. I'm trying to correlate this new understanding with what I see on
my servers. I've got one server with 5GB dedicated to Solr, and the Solr
dashboard actually reports a 167GB index.

When I run typical queries I see between 3MB and 9MB of disk reads
(watching iostat).

But Solr's dashboard only shows 710MB of memory in use (this box has had
many hundreds of queries put through it and has been up for 1 week). That
doesn't quite square with my understanding that Solr would cache the
index as much as possible.
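
A minimal sketch of one way to look at this, assuming a Linux host and that the
710MB figure is the dashboard's JVM heap number: the OS-level caching that keeps
index blocks in RAM doesn't appear in that figure at all, but it does show up as
"Cached" in /proc/meminfo.

    def meminfo_mb(key, path="/proc/meminfo"):
        """Return a /proc/meminfo entry in MB (the file reports values in kB)."""
        with open(path) as f:
            for line in f:
                if line.startswith(key + ":"):
                    return int(line.split()[1]) // 1024
        return 0

    if __name__ == "__main__":
        print("OS page cache (cached index data lives here):", meminfo_mb("Cached"), "MB")
        print("Truly unused memory:", meminfo_mb("MemFree"), "MB")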

Should I be thinking that things aren't configured correctly here?

Dave


-----Original Message-----
From: John Nielsen [mailto:j...@mcb.dk] 
Sent: Friday, April 19, 2013 2:35 PM
To: solr-user@lucene.apache.org
Subject: Re: SolrCloud loadbalancing, replication, and failover

Well, to consume 120GB of RAM with a 120GB index, you would have to query
over every single GB of data.

If you only actually query over, say, 500MB of the 120GB data in your dev
environment, you would only use 500MB worth of RAM for caching, not 120GB.
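
A rough sketch of that behaviour (the index path below is hypothetical; point it
at your core's data directory): only bytes that are actually read, whether by
queries or by a script like this, end up in the OS page cache.

    import os

    INDEX_DIR = "/var/solr/data/collection1/index"  # hypothetical path, adjust to your setup

    def read_through(path, chunk=1 << 20):
        """Read every index file once so the OS page cache holds it.
        Queries have the same effect, but only for the parts they touch."""
        total = 0
        for name in sorted(os.listdir(path)):
            full = os.path.join(path, name)
            if os.path.isfile(full):
                with open(full, "rb") as f:
                    while True:
                        buf = f.read(chunk)
                        if not buf:
                            break
                        total += len(buf)
        return total

    print("Pulled %.1f GB of index data through the page cache" % (read_through(INDEX_DIR) / 2**30))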


On Fri, Apr 19, 2013 at 7:55 AM, David Parks <davidpark...@yahoo.com> wrote:

> Wow! That was the most pointed, concise discussion of hardware
> requirements I've seen to date, and it's fabulously helpful. Thank you,
> Shawn!  We currently have 2 servers on which I can dedicate about 12GB of
> RAM to Solr (we're moving to these 2 servers now). I can upgrade
> further if it's needed and justified, and your discussion helps me
> justify that such an upgrade is the right thing to do.
>
> So... if I move to 3 servers with 50GB of RAM each, using 3 shards, I
> should be in the clear then, right?  This seems reasonable and
> doable.
>
> In this more extreme example the failover properties of SolrCloud
> become clearer. I couldn't possibly run a replica shard without
> doubling the memory, so replication really isn't reasonable until I
> have double the hardware; at that point the load balancing scheme makes
> perfect sense. With 3 servers, 50GB of RAM each, and a 120GB index, I
> think I should just back up the index directory.
>
> My previous thought to run replication just for failover would have
> actually resulted in LOWER performance, because I would have halved the
> memory available to the master & replica. So the previous question is
> answered as well now.
>
> Question: if I had 1 server with 60GB of memory and a 120GB index, would
> Solr make full use of the 60GB of memory, thus cutting disk access
> roughly in half? Or is it an all-or-nothing thing?  In a dev environment,
> I didn't notice Solr consuming the full 5GB of RAM assigned to it with a
> 120GB index.
>
> Dave
>
>
> -----Original Message-----
> From: Shawn Heisey [mailto:s...@elyograg.org]
> Sent: Friday, April 19, 2013 11:51 AM
> To: solr-user@lucene.apache.org
> Subject: Re: SolrCloud loadbalancing, replication, and failover
>
> On 4/18/2013 8:12 PM, David Parks wrote:
> > I think I still don't understand something here.
> >
> > My concern right now is that query times are very slow for a 120GB
> > index (14s on average), and I've seen a lot of disk activity when
> > running queries.
> >
> > I'm hoping that distributing the query across 2 servers is going to
> > improve the query time; specifically, I'm hoping that we can
> > distribute that disk activity, because we don't have great disks on
> > there (yet).
> >
> > So, with disk I/O being a factor in mind, running the query on one
> > box vs. across 2 *should* be a concern, right?
> >
> > Admittedly, this is the first step in what will probably be many to
> > try to work our query times down from 14s to what I want to be
> > around 1s.
>
> I went through my mailing list archive to see what all you've said 
> about your setup.  One thing that I can't seem to find is a mention of 
> how much total RAM is in each of your servers.  I apologize if it was 
> actually there and I overlooked it.
>
> In one email thread, you wanted to know whether Solr is CPU-bound or 
> IO-bound.  Solr is heavily reliant on the index on disk, and disk I/O 
> is the slowest piece of the puzzle. The way to get good performance 
> out of Solr is to have enough memory that you can take the disk mostly 
> out of the equation by having the operating system cache the index in 
> RAM.  If you don't have enough RAM for that, then Solr becomes 
> IO-bound, and your CPUs will be busy in iowait, unable to do much real 
> work.  If you DO have enough RAM to cache all (or most) of your index, 
> then Solr will be CPU-bound.
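
One quick way to see which side of that line a box is on (a minimal sketch,
assuming a Linux host): sample the aggregate counters in /proc/stat while
queries are running and look at the iowait share. A large percentage suggests
the index isn't fitting in the page cache; near zero with busy CPUs suggests
Solr is CPU-bound.

    import time

    def cpu_counters():
        # First line of /proc/stat: aggregate jiffies since boot, in the order
        # user, nice, system, idle, iowait, irq, softirq, ...
        with open("/proc/stat") as f:
            return [int(v) for v in f.readline().split()[1:]]

    def iowait_percent(seconds=10):
        before = cpu_counters()
        time.sleep(seconds)
        after = cpu_counters()
        delta = [b - a for b, a in zip(after, before)]
        return 100.0 * delta[4] / sum(delta)  # index 4 = iowait

    print("iowait over the sample: %.1f%%" % iowait_percent())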
>
> With 120GB of total index data on each server, you would want at least 
> 128GB of RAM per server, assuming you are only giving 8-16GB of RAM to 
> Solr, and that Solr is the only thing running on the machine.  If you 
> have more servers and shards, you can reduce the per-server memory 
> requirement because the amount of index data on each server would go 
> down.  I am aware of the cost associated with this kind of requirement 
> - each of my Solr servers has 64GB.
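
As a back-of-the-envelope version of that rule (the 8GB heap below is an
illustrative assumption, not a recommendation): per-server RAM is roughly the
index data held on that server plus the Solr heap, which is why adding shards
and servers lowers the per-server figure.

    def per_server_ram_gb(index_gb_on_server, solr_heap_gb=8):
        """Rule of thumb from the paragraph above: enough RAM for the OS to cache
        the server's whole slice of the index, plus the heap given to Solr itself.
        The 8GB default heap is an assumption for illustration only."""
        return index_gb_on_server + solr_heap_gb

    print(per_server_ram_gb(120))      # 128 -> the "at least 128GB" figure for one box
    print(per_server_ram_gb(120 / 3))  # 48.0 -> three ~40GB shards need far less per box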
>
> If you are sharing the server with another program, then you want to 
> have enough RAM available for Solr's heap, Solr's data, the other 
> program's heap, and the other program's data.  Some programs (like
> MySQL) completely skip the OS disk cache and instead do that caching 
> themselves with heap memory that's actually allocated to the program.
> If you're using a program like that, then you wouldn't need to count 
> its data.
>
> Using SSDs for storage can speed things up dramatically and may reduce 
> the total memory requirement to some degree, but even an SSD is slower 
> than RAM.
> The transfer speed of RAM is faster, and from what I understand, the 
> latency is at least an order of magnitude quicker - nanoseconds vs 
> microseconds.
>
> In another thread, you asked about how Google gets such good response 
> times.
> Although Google's software probably works differently than 
> Solr/Lucene, when it comes right down to it, all search engines do 
> similar jobs and have similar requirements.  I would imagine that 
> Google gets incredible response time because they have incredible 
> amounts of RAM at their disposal that keep the important bits of their 
> index instantly available.  They have thousands of servers in each 
> data center.  I once got a look at the extent of Google's hardware in 
> one data center - it was HUGE.  I couldn't get in to examine things 
> closely; they keep that stuff very locked down.
>
> Thanks,
> Shawn
>
>


--
Med venlig hilsen / Best regards

*John Nielsen*
Programmer



*MCB A/S*
Enghaven 15
DK-7500 Holstebro

Customer service: +45 9610 2824
p...@mcb.dk
www.mcb.dk
