So, because the OS is doing the caching in RAM, I could have 6 Jetty servers per machine all pointing to the same data. Once the index is built, I can load up some more servers on different ports and that should boost performance.
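Something like this is what I have in mind - just a sketch, with
illustrative ports and hypothetical paths for the Solr home and the
shared data directory:

import subprocess

SOLR_EXAMPLE = "/opt/solr/example"   # hypothetical install path
PORTS = range(8983, 8989)            # six instances on consecutive ports

for port in PORTS:
    # One Jetty per port; every instance points at the same index files,
    # so the OS page cache is shared between them once it is warm.
    subprocess.Popen(
        ["java",
         "-Djetty.port=%d" % port,
         "-Dsolr.solr.home=/var/solr/home",  # same Solr home for all six
         "-Dsolr.data.dir=/var/solr/data",   # same index for all six
         "-jar", "start.jar"],
        cwd=SOLR_EXAMPLE)

Only the indexer would write; these extra instances are read-only
searchers, which is also why dropping the lock (as you describe below)
matters.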
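And since HAProxy already sits between the app servers and Solr, adding
instances is mostly a matter of listing the new ports in its backend.
A client-side version of the same failover idea, just to illustrate
(treat the select URL and params as assumptions):

import itertools
from urllib.error import URLError
from urllib.parse import urlencode
from urllib.request import urlopen

PORTS = list(range(8983, 8989))   # the six instances started above
_rotation = itertools.cycle(PORTS)

def query(q):
    """Round-robin a query across the local instances, skipping any
    that are down - a stand-in for what HAProxy does for us."""
    params = urlencode({"q": q, "wt": "json"})
    for _ in PORTS:
        port = next(_rotation)
        url = "http://localhost:%d/solr/select?%s" % (port, params)
        try:
            return urlopen(url, timeout=5).read()
        except (URLError, OSError):
            continue  # dead instance; try the next port
    raise RuntimeError("no Solr instance answered")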
That does sound promising - thanks for the tip. What made you pick 6?

On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
<[EMAIL PROTECTED]> wrote:
> Another thing to consider in your sharding is the access rate you want
> to guarantee.
>
> In the project I am working on, I need to guarantee at least
> 200 hits/second with various facets in all queries.
>
> I am not using sharding, but I have 6 Solr instances per cluster node,
> and I have 3 nodes, for a total of 18 Solr instances. Each node has
> only one index, so I keep the 6 instances pointing to the same index on
> a given node. What made a huge difference in my performance was the
> removal of the lock.
>
> I expect that helps you out.
>
> 2008/8/20 Ian Connor <[EMAIL PROTECTED]>
>
>> I have based my machines on bare-bones servers (I call them ghetto
>> servers). I essentially have motherboards in a rack sitting on
>> catering trays (heat resistance is key).
>>
>> http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
>>
>> Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX boards with
>> 4 RAM slots - allows as much cheap RAM as possible)
>> CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
>> if the different RAM approach works better, and they are greener)
>> Memory: 8GB (4 x 2GB DDR2 - best price per GB)
>> HDD: SATA disk (between 200 and 500GB - I had these from another
>> project)
>>
>> I have HAProxy between the app servers and Solr so that I get
>> failover if one of these goes down (expect failure).
>>
>> Having only 1M documents but more data per document will mean your
>> situation is different. I am having particular performance issues
>> with facets and am trying to get my head around all the issues
>> involved there.
>>
>> I see Mike has only 2 shards per box, as he was "squeezing"
>> performance. I didn't see any significant gain in performance, but
>> that is not to say there isn't one. I just had a level of performance
>> in mind and stopped when that was met. It took almost a month of
>> testing to get to that point, so I was ready to move on to other
>> problems - I might revisit it later.
>>
>> Also, my ghetto servers are getting similar reliability to the Dell
>> servers I have - but I have built the system with the expectation
>> that they will fail often, although that has not happened yet.
>>
>> On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
>> <[EMAIL PROTECTED]> wrote:
>> > As long as Solr/Lucene make smart use of memory (and they do, in my
>> > experience), it is really easy to calculate how long a huge
>> > query/update will take when you know how long the smaller ones
>> > take. Just keep in mind that the resource consumption of memory and
>> > disk space is almost always proportional.
>> >
>> > 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>> >
>> >> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>> >>
>> >>> So your experience differs from Mike's. Obviously it's an
>> >>> important decision as to whether to buy more machines. Can you
>> >>> (or Mike) weigh in on what factors led to your different take on
>> >>> local shards vs. shards distributed across machines?
>> >>>
>> >>
>> >> I do both; the only reason I have two shards on each machine is to
>> >> squeeze maximum performance out of an equipment budget. Err on the
>> >> side of multiple machines.
>> >>
>> >>>> At least for building the index, the number of shards really
>> >>>> does help.
>> >>>> Indexing Medline (1.6e7 docs, which is 60GB of XML text) on a
>> >>>> single machine starts at about 100 doc/s but slows down to
>> >>>> 10 doc/s as the index grows. It seems as though the limit is
>> >>>> reached once you run out of RAM, and it gets slower and slower
>> >>>> in a linear fashion the larger the index gets.
>> >>>> My sweet spot was 5 machines with 8GB RAM for indexing about
>> >>>> 60GB of data.
>> >>>>
>> >>>
>> >>> Can you say what the specs were for these machines? Given that I
>> >>> have more like 1TB of data over 1M docs, how do you think my
>> >>> machine requirements might be affected as compared to yours?
>> >>>
>> >>
>> >> You are in a much better position to determine this than we are.
>> >> See how big an index you can put on a single machine while
>> >> maintaining acceptable performance using a typical query load.
>> >> It's relatively safe to extrapolate linearly from that.
>> >>
>> >> -Mike
>> >>
>> >
>> > --
>> > Alexander Ramos Jardim
>>
>> --
>> Regards,
>>
>> Ian Connor
>
> --
> Alexander Ramos Jardim

--
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 6333372
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1 (770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor