2008/8/20 Ian Connor <[EMAIL PROTECTED]>

> So, because the OS is doing the caching in RAM, it means I could have
> 6 Jetty servers per machine all pointing to the same data. Once the
> index is built, I can load up some more servers on different ports and
> it will boost performance.
>
> That does sound promising - thanks for the tip. What made you pick 6?
>

Each WebLogic instance sits on top of a JVM with a 2GB heap. Each cluster
node has 16GB of RAM.

> On Wed, Aug 20, 2008 at 9:49 AM, Alexander Ramos Jardim
> <[EMAIL PROTECTED]> wrote:
>> Another thing to consider for your sharding is the access rate you want
>> to guarantee.
>>
>> In the project I am working on, I need to guarantee at least
>> 200 hits/second with various facets in all queries.
>>
>> I am not using sharding, but I have 6 Solr instances per cluster node,
>> and I have 3 nodes, for a total of 18 Solr instances. Each node has only
>> one index, so I keep the 6 instances pointing to the same index on a
>> given node. What made a huge difference in my performance was the
>> removal of the lock.
>>
>> I expect that helps you out.
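
To make the shared-index idea above concrete, here is a minimal Lucene-level
sketch: several read-only readers opened over one index directory with the
lock factory disabled, so the OS page cache is shared between them. The index
path is a placeholder, the API is current Lucene rather than the 2008-era
calls, and this only illustrates the principle - in Solr itself the data
directory and lock type are configuration, not application code.

import java.nio.file.Paths;

import org.apache.lucene.index.DirectoryReader;
import org.apache.lucene.search.IndexSearcher;
import org.apache.lucene.store.Directory;
import org.apache.lucene.store.FSDirectory;
import org.apache.lucene.store.NoLockFactory;

public class SharedIndexSketch {
    public static void main(String[] args) throws Exception {
        // Placeholder path; in the setups above each node keeps one index
        // on local disk.
        String indexPath = "/var/data/solr/index";

        // Two directories over the same files, with locking disabled.
        // Read-only readers never take the write lock, so many of them can
        // coexist, and the OS page cache is shared between all of them.
        Directory dirA = FSDirectory.open(Paths.get(indexPath), NoLockFactory.INSTANCE);
        Directory dirB = FSDirectory.open(Paths.get(indexPath), NoLockFactory.INSTANCE);

        IndexSearcher searcherA = new IndexSearcher(DirectoryReader.open(dirA));
        IndexSearcher searcherB = new IndexSearcher(DirectoryReader.open(dirB));

        // Both searchers see the same committed index.
        System.out.println("A sees " + searcherA.getIndexReader().numDocs() + " docs");
        System.out.println("B sees " + searcherB.getIndexReader().numDocs() + " docs");
    }
}

Only searchers can safely share an index this way; anything that writes still
needs the lock, which is why it is the read-only query instances that get
multiplied per node.
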
>> 2008/8/20 Ian Connor <[EMAIL PROTECTED]>
>>
>>> I have based my machines on bare-bones servers (I call them ghetto
>>> servers). I essentially have motherboards in a rack sitting on
>>> catering trays (heat resistance is key).
>>>
>>> http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
>>>
>>> Motherboards: GIGABYTE GA-G33M-S2L (small mATX boards with 4 RAM
>>> slots - allows as much cheap RAM as possible)
>>> CPU: Intel Q6600 (quad-core 2.4GHz - but I might try AMD next to see
>>> if the different RAM approach works better, and they are greener)
>>> Memory: 8GB (4 x 2GB DDR2 - best price per GB)
>>> HDD: SATA disk (between 200 and 500GB - I had these from another project)
>>>
>>> I have HAProxy between the app servers and Solr so that I get failover
>>> if one of these goes down (expect failure).
>>>
>>> Having only 1M documents but more data per document will mean your
>>> situation is different. I am having particular performance issues with
>>> facets and am trying to get my head around all the issues involved there.
>>>
>>> I see Mike has only 2 shards per box as he was "squeezing" performance.
>>> I didn't see any significant gain in performance, but that is not to say
>>> there isn't one. Just for me, I had a level of performance in mind and
>>> stopped when that was met. It took almost a month of testing to get to
>>> that point, so I was ready to move on to other problems - I might
>>> revisit it later.
>>>
>>> Also, my ghetto servers are getting similar reliability to the Dell
>>> servers I have - but I have built the system with the expectation that
>>> they will fail often, although that has not happened yet.
>>>
>>> On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
>>> <[EMAIL PROTECTED]> wrote:
>>>> As long as Solr/Lucene make smart use of memory (and they do, in my
>>>> experience), it is really easy to calculate how long a huge
>>>> query/update will take when you know how long the smaller ones take.
>>>> Just keep in mind that the consumption of memory and disk space is
>>>> almost always proportional.
>>>>
>>>> 2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>>>>
>>>>> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>>>>>
>>>>>> So your experience differs from Mike's. Obviously it's an important
>>>>>> decision as to whether to buy more machines. Can you (or Mike) weigh
>>>>>> in on what factors led to your different take on local shards vs.
>>>>>> shards distributed across machines?
>>>>>
>>>>> I do both; the only reason I have two shards on each machine is to
>>>>> squeeze maximum performance out of an equipment budget. Err on the
>>>>> side of multiple machines.
>>>>>
>>>>>>> At least for building the index, the number of shards really does
>>>>>>> help. Indexing Medline (1.6e7 docs, about 60GB of XML text) on a
>>>>>>> single machine starts at about 100 docs/s but slows down to
>>>>>>> 10 docs/s as the index grows. It seems as though the limit is
>>>>>>> reached once you run out of RAM, and it gets slower and slower in a
>>>>>>> linear fashion the larger the index gets.
>>>>>>> My sweet spot was 5 machines with 8GB RAM for indexing about 60GB
>>>>>>> of data.
>>>>>>
>>>>>> Can you say what the specs were for these machines? Given that I
>>>>>> have more like 1TB of data over 1M docs, how do you think my machine
>>>>>> requirements might be affected as compared to yours?
>>>>>
>>>>> You are in a much better position to determine this than we are. See
>>>>> how big an index you can put on a single machine while maintaining
>>>>> acceptable performance under a typical query load. It's relatively
>>>>> safe to extrapolate linearly from that.
>>>>>
>>>>> -Mike
>>>>
>>>> --
>>>> Alexander Ramos Jardim
>>>
>>> --
>>> Regards,
>>> Ian Connor
>>
>> --
>> Alexander Ramos Jardim
>
> --
> Regards,
>
> Ian Connor
> 1 Leighton St #605
> Cambridge, MA 02141
> Direct Line: +1 (978) 6333372
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Mobile Phone: +1 (312) 218 3209
> Fax: +1 (770) 818 5697
> Suisse Phone: +41 (0) 22 548 1664
> Skype: ian.connor

--
Alexander Ramos Jardim
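
When the index is split across machines rather than duplicated on one, Solr's
distributed search is driven by the shards request parameter: any instance can
act as the coordinator, fanning the query out to the listed shards and merging
the results. Below is a minimal sketch of such a query over plain HTTP; the
hostnames, port, and /solr path (solr1-solr3, 8983) are placeholders, not the
actual machines described in the thread.

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class ShardedQuerySketch {
    public static void main(String[] args) throws Exception {
        // Placeholder shard addresses; substitute the real host:port/path list.
        String shards = "solr1:8983/solr,solr2:8983/solr,solr3:8983/solr";
        String enc = StandardCharsets.UTF_8.name();

        // Any one instance coordinates: it forwards the query to every shard
        // in the shards parameter and merges the partial results.
        URL url = new URL("http://solr1:8983/solr/select"
                + "?q=" + URLEncoder.encode("title:lucene", enc)
                + "&shards=" + URLEncoder.encode(shards, enc)
                + "&rows=10");

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(url.openStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                System.out.println(line);
            }
        }
    }
}

Whether those shards live on one box or several is transparent to the query,
which is why Mike's advice reduces to measuring what one machine can hold at
acceptable query latency and extrapolating linearly from there.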