I have based my machines on bare-bones servers (I call them ghetto servers). I essentially have motherboards in a rack sitting on catering trays (heat resistance is key).
http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html

Motherboards: GIGABYTE GA-G33M-S2L (small mATX boards with 4 RAM slots, which allows as much cheap RAM as possible)
CPU: Intel Q6600 (quad-core 2.4GHz, but I might try AMD next to see if its different RAM approach works better, and they are greener)
Memory: 8GB (4 x 2GB DDR2, the best price per GB)
HDD: SATA disk (between 200 and 500GB; I had these from another project)

I have HAProxy between the app servers and Solr so that I get failover if one of these goes down (expect failure). Having only 1M documents but more data per document will mean your situation is different. I am having particular performance issues with facets and am trying to get my head around all the issues involved there.

I see Mike has only 2 shards per box, as he was "squeezing" performance. I didn't see any significant gain in performance, but that is not to say there isn't one. For me, I had a level of performance in mind and stopped when that was met. It took almost a month of testing to get to that point, so I was ready to move on to other problems; I might revisit it later.

Also, my ghetto servers are showing reliability similar to the Dell servers I have, but I have built the system with the expectation that they will fail often, although that has not happened yet.

On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:
> As long as Solr/Lucene makes smart use of memory (and they do, in my
> experience), it is really easy to estimate how long a huge query/update
> will take when you know how long the smaller ones take. Just keep in
> mind that memory and disk space consumption are almost always
> proportional.
>
> 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>
>>
>> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>>
>>>
>>> So your experience differs from Mike's. Obviously it's an important
>>> decision as to whether to buy more machines.
>>> Can you (or Mike) weigh in on what factors led to your different
>>> take on local shards vs. shards distributed across machines?
>>>
>>
>> I do both; the only reason I have two shards on each machine is to
>> squeeze maximum performance out of an equipment budget. Err on the
>> side of multiple machines.
>>
>>>> At least for building the index, the number of shards really does
>>>> help. Indexing Medline (1.6e7 docs, which is 60GB of XML text) on a
>>>> single machine starts at about 100 docs/s but slows down to 10 docs/s
>>>> as the index grows. It seems the limit is reached once you run out
>>>> of RAM, and it gets slower in a linear fashion the larger the index
>>>> gets. My sweet spot was 5 machines with 8GB RAM for indexing about
>>>> 60GB of data.
>>>>
>>>
>>> Can you say what the specs were for these machines? Given that I have
>>> more like 1TB of data over 1M docs, how do you think my machine
>>> requirements might be affected as compared to yours?
>>>
>>
>> You are in a much better position to determine this than we are. See
>> how big an index you can put on a single machine while maintaining
>> acceptable performance under a typical query load. It's relatively
>> safe to extrapolate linearly from that.
>>
>> -Mike
>>
>
> --
> Alexander Ramos Jardim

--
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 6333372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1 (770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor
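[Editor's note: the failover setup Ian mentions, HAProxy sitting between the app servers and the Solr shards, could look roughly like the sketch below. The listen address, server names, IPs, and ports are invented for illustration; the post does not give them.]

```
# Hedged sketch of HAProxy (1.3-era syntax) fronting two Solr boxes.
# All names and addresses here are assumptions, not from the post.
listen solr 0.0.0.0:8983
    mode http
    balance roundrobin
    # Mark a shard down if Solr's ping handler stops answering
    option httpchk GET /solr/admin/ping
    server shard1 192.168.1.10:8983 check
    server shard2 192.168.1.11:8983 check
```

With `check` enabled, HAProxy polls each box's ping handler and routes queries around a server that stops responding, which is what gives the "expect failure" behaviour described above.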
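[Editor's note: Mike's capacity advice — load-test one machine, then extrapolate linearly — amounts to simple arithmetic. A minimal sketch, with all numbers invented for illustration:]

```python
def machines_needed(total_index_gb, gb_per_machine_at_ok_latency):
    """Linear extrapolation from a single-machine load test:
    ceiling of total index size over what one box handled well."""
    # -(-a // b) is integer ceiling division
    return -(-total_index_gb // gb_per_machine_at_ok_latency)

# e.g. a 1TB corpus, where one box served 60GB at acceptable latency:
print(machines_needed(1000, 60))  # 17
```

The linear assumption is only as good as the single-machine test, so it is worth measuring under a realistic query load, as Mike says.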