Inline below.

On Sep 17, 2008, at 6:32 PM, Andrey Shulinskiy wrote:

Hello,




First, some numbers we're expecting.

- The average size of a doc: ~100K

- The number of indexes: 1

- The query response time we're looking for: < 200 - 300ms

- The number of stored docs:

1st year: 500K - 1M

2nd year: 2-3M

- The estimated number of concurrent users per second

1st year: 15 - 25

2nd year: 40 - 60

- The estimated number of queries

1st year: 15 - 25

2nd year: 40 - 60



Now the questions



1)  Should we do sharding or not?

If we start without sharding, how hard will it be to enable it?

Is it just some config changes + the index rebuild or is it more?

There will be some operations setup, etc., and you'll have to add the appropriate distributed-search parameters (i.e. the shards parameter) to your queries.

Your install and requirements aren't that large, so I doubt you'll need sharding, but it always depends on your exact configuration. I've seen indexes as big as 80 million docs on a single machine, but the docs were smaller in size.
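
If you do shard later, the query side mostly comes down to adding the shards parameter so that one node fans the query out and merges the results. A rough SolrJ sketch of what that looks like (the hostnames and the query are made up for illustration; the same parameter also works as a plain HTTP request parameter):

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;

    public class ShardedQueryExample {
        public static void main(String[] args) throws Exception {
            // Any one of the shards can act as the aggregator for a distributed query.
            SolrServer server = new CommonsHttpSolrServer("http://solr1.example.com:8983/solr");

            SolrQuery query = new SolrQuery("title:foo");
            // The shards parameter lists every shard to fan the query out to;
            // the aggregating node merges the partial results.
            query.set("shards",
                "solr1.example.com:8983/solr,solr2.example.com:8983/solr");

            QueryResponse response = server.query(query);
            System.out.println("Total hits: " + response.getResults().getNumFound());
        }
    }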



My personal opinion is to go without sharding at first and enable it
later if we do get a lot of documents.

Sounds reasonable.





2)  How should we organize our clusters to ensure redundancy?

Should we have 2 or more identical Masters (meaning that all the
updates/optimisations/etc. are done on every one of them)?

An alternative, afaik, is to reconfigure one slave to become the new
Master. How hard is that?

I don't have a good answer here; maybe someone else can chime in. I know master failover is a concern, but I'm not sure how others handle it right now. It would be good to have people share their approaches. That being said, it seems reasonable to me to have identical masters.






3) Basically, we can get servers of two kinds:



* Single Processor, Dual Core Opteron 2214HE

* 2 GB DDR2 SDRAM

* 1 x 250 GB (7200 RPM) SATA Drive(s)



* Dual Processor, Quad Core 5335

* 16 GB Memory (Fully Buffered)

* 2 x 73 GB (10k RPM) 2.5" SAS Drive(s), RAID 1



The second - more powerful - one is more expensive, of course.

Get as much RAM as you can afford. Surely there is an in-between machine as well that might balance cost and capabilities. The first machine seems a bit light, especially on memory.






How can we take advantage of the multiprocessor/multicore servers?

Is there some special setup required to make, say, 2 instances of SOLR
run on the same server using different processors/cores?

See the Core Admin wiki page: http://wiki.apache.org/solr/CoreAdmin. Also note that Solr is thread-safe by design (so it's a bug if you hit issues), and a single instance already uses multiple cores because the servlet container serves requests on multiple threads. You can send it documents on multiple threads and it will be fine.
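
For example, indexing from several threads against a single instance is fine and is usually enough to keep all the cores busy. A rough SolrJ sketch (the URL, field names, and thread count are just placeholders):

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class ParallelIndexer {
        public static void main(String[] args) throws Exception {
            // One shared client; Solr handles concurrent update requests.
            final SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr");

            ExecutorService pool = Executors.newFixedThreadPool(4); // roughly one per core
            for (int i = 0; i < 1000; i++) {
                final int id = i;
                pool.submit(new Runnable() {
                    public void run() {
                        try {
                            SolrInputDocument doc = new SolrInputDocument();
                            doc.addField("id", Integer.toString(id));
                            doc.addField("text", "document body " + id);
                            server.add(doc);
                        } catch (Exception e) {
                            e.printStackTrace();
                        }
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.MINUTES);
            server.commit(); // make the adds visible to searchers
        }
    }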





4)  Does it make much difference to get a more powerful Master?

Or, on the contrary, since the slaves will be queried more often, should
they be the better ones? Maybe just the HDDs for the slaves should be as fast
as possible?

Depends on where your bottlenecks are. Are you getting a lot of queries or a lot of updates?

As for HDDs, people have noted some nice speedups in Lucene using solid-state drives, if you can afford them. Fast I/O is good if you're retrieving whole documents, but once things are warmed up, more RAM is most important, I think, as many things can be cached.





5) How many slaves does it make sense to have per one Master?

What's (roughly) the performance gain from 1 -> 2, 2 -> 3, etc.?

When does it stop making sense to add more slaves?

I suppose it's when you can handle your peak load, but I don't have numbers. One of the keys is to incrementally test and see what makes sense for your scenario.




As far as I understand, it depends mainly on the size of the index.
However, I'd guess the time required to do a push for too many slaves
can be a problem too, correct?



The biggest problem for slaves is when the master does an optimize: the whole snapshot must be downloaded, whereas incremental additions can be handled by pulling just the deltas (the new segments).
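
Put differently, a commit after incremental adds only creates new segments for the slaves to pull, while an optimize rewrites the index into a single segment, so the next snapshot is effectively the whole index. A small SolrJ sketch of the two calls (the master URL is a placeholder; schedule the optimize sparingly, e.g. off-peak):

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class MasterMaintenance {
        public static void main(String[] args) throws Exception {
            SolrServer master = new CommonsHttpSolrServer("http://master.example.com:8983/solr");

            // After incremental adds: commit writes new segments only,
            // so slaves pull just the deltas with the next snapshot.
            master.commit();

            // optimize() merges everything into one segment; the next snapshot
            // is then roughly the full index, which every slave must download.
            master.optimize();
        }
    }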


HTH,
Grant



--------------------------
Grant Ingersoll
http://www.lucidimagination.com

Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ






