2008/8/21 Otis Gospodnetic <[EMAIL PROTECTED]>

> Uh uh.  6 instances per node all pointing to the same index?
> Yes, this can increase performance, but only because it essentially gives
> you 6 separate searchers (SolrIndexSearchers).  This clearly uses more RAM,
> especially if you sort on fields and especially if you are not omitting norms
> where you can.


I know this is a memory-hog approach. Is there another way to keep several
independent searchers open?
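As a side note on the omitNorms tip: as I understand it, norms are only needed
for length normalization and index-time boosts, so a field that needs neither
can omit them and save heap. A minimal sketch of what that looks like in
schema.xml (the field name and type below are just examples, not our actual
schema):

  <field name="title" type="text" indexed="true" stored="true"
         omitNorms="true"/>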


> Is this a dual or quad-core box and how big is your index, Alexander?


Machine Specs:
processor: 2 x Quad Core
memory: 32GB RAM
disk: (the sysadmin didn't give me the specs)

Index Specs:
Each Solr instance has 6 indexes (multicore). The total size they occupy is
less than 3GB. The total number of docs is less than 100 million. Most of the
docs are really tiny (only 3 fields), and there is one index of big docs
(around 60 fields).

We do lots of faceting in all queries.
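For what it's worth, one way to apply the same facet parameters to every query
is to set them as request handler defaults. A rough sketch in solrconfig.xml
(the handler and field names here are made up for illustration):

  <requestHandler name="standard" class="solr.StandardRequestHandler"
                  default="true">
    <lst name="defaults">
      <str name="facet">true</str>
      <str name="facet.field">category</str>
      <str name="facet.mincount">1</str>
    </lst>
  </requestHandler>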

>
>
> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
>
>
> ----- Original Message ----
> > From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, August 20, 2008 9:49:04 AM
> > Subject: Re: shards and performance
> >
> > Another thing to consider when sharding is the request rate you want to
> > guarantee.
> >
> > In the project I am working on, I need to guarantee at least 200 hits/second
> > with several facets in every query.
> >
> > I am not using sharding, but I have 6 Solr instances per cluster node, and
> > I have 3 nodes, for a total of 18 Solr instances. Each node has only one
> > index, so I keep the 6 instances pointing to the same index on a given
> > node. What made a huge difference in my performance was the removal of the
> > lock.
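> > For reference, the relevant piece lives in solrconfig.xml. This is only a
> > sketch, assuming the extra searcher-only instances never write to the index
> > and that your Solr version lets you disable the lock factory this way:
> >
> >   <mainIndex>
> >     <!-- these instances only search, never write, so skip the write lock -->
> >     <lockType>none</lockType>
> >   </mainIndex>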
> >
> > I hope that helps you out.
> >
> > 2008/8/20 Ian Connor
> >
> > > I have based my machines on bare bones servers (I call them ghetto
> > > servers). I essentially have motherboards in a rack sitting on
> > > catering trays (heat resistance is key).
> > >
> > > http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
> > >
> > > Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4 RAM
> > > slots - allows as much cheap RAM as possible)
> > > CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to see
> > > if the different RAM approach works better and they are greener)
> > > Memory: 8GB (4 x 2GB DDR2 - best price per GB)
> > > HDD: SATA disk (between 200 and 500GB - I had these from another
> > > project)
> > >
> > > I have HAProxy between the App servers and Solr so that I get failover
> > > if one of these goes down (expect failure).
> > >
> > > Having only 1M documents but more data per document will mean your
> > > situation is different. I am having particular performance issues with
> > > facets and trying to get my head around all the issues involved there.
> > >
> > > I see Mike has only 2 shards per box as he was "squeezing"
> > > performance. I didn't see any significant gain in performance but that
> > > is not to say there isn't one. Just for me, I had a level of
> > > performance in mind and stopped when that was met. It took almost a
> > > month of testing to get to that point so I was ready to move on to
> > > other problems - I might revisit it later.
> > >
> > > Also, my ghetto servers are getting similar reliability to the Dell
> > > servers I have - but I have built the system with the expectation that
> > > they will fail often, although that has not happened yet.
> > >
> > > On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim
> > > wrote:
> > > > As long as Solr/Lucene makes smart use of memory (and in my experience
> > > > they do), it is really easy to calculate how long a huge query/update
> > > > will take when you know how long the smaller ones take. Just keep in
> > > > mind that the resource consumption of memory and disk space is almost
> > > > always proportional.
> > > >
> > > > 2008/8/19 Mike Klaas
> > > >
> > > >>
> > > >> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
> > > >>
> > > >>>
> > > >>> So your experience differs from Mike's.  Obviously it's an important
> > > >>> decision as to whether to buy more machines.  Can you (or Mike) weigh
> > > >>> in on what factors led to your different take on local shards vs.
> > > >>> shards distributed across machines?
> > > >>>
> > > >>
> > > >> I do both; the only reason I have two shards on each machine is to
> > > >> squeeze maximum performance out of an equipment budget.  Err on the
> > > >> side of multiple machines.
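> > > >> For what it's worth, distributing across machines just means the
> > > >> shards parameter lists the other hosts.  A rough sketch, with made-up
> > > >> host names, of baking that into a handler in solrconfig.xml:
> > > >>
> > > >>   <requestHandler name="distrib" class="solr.SearchHandler">
> > > >>     <lst name="defaults">
> > > >>       <str name="shards">solr1:8983/solr,solr2:8983/solr</str>
> > > >>     </lst>
> > > >>   </requestHandler>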
> > > >>
> > > >>>> At least for building the index, the number of shards really does
> > > >>>> help. Indexing Medline (1.6e7 docs, about 60GB of XML text) on a
> > > >>>> single machine starts at about 100 docs/s but slows down to 10 docs/s
> > > >>>> as the index grows. It seems as though the limit is reached once you
> > > >>>> run out of RAM, and it gets slower and slower in a linear fashion the
> > > >>>> larger the index gets.
> > > >>>> My sweet spot was 5 machines with 8GB RAM each for indexing about
> > > >>>> 60GB of data.
> > > >>>>
> > > >>>
> > > >>> Can you say what the specs were for these machines? Given that I have
> > > >>> more like 1TB of data over 1M docs, how do you think my machine
> > > >>> requirements might be affected as compared to yours?
> > > >>>
> > > >>
> > > >> You are in a much better position to determine this than we are.  See
> > > >> how big an index you can put on a single machine while maintaining
> > > >> acceptable performance using a typical query load.  It's relatively
> > > >> safe to extrapolate linearly from that.
> > > >>
> > > >> -Mike
> > > >>
> > > >
> > > >
> > > >
> > > > --
> > > > Alexander Ramos Jardim
> > > >
> > >
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Ian Connor
> > > 1 Leighton St #605
> > > Cambridge, MA 02141
> > > Direct Line: +1 (978) 6333372
> > > Call Center Phone: +1 (714) 239 3875 (24 hrs)
> > > Mobile Phone: +1 (312) 218 3209
> > > Fax: +1(770) 818 5697
> > > Suisse Phone: +41 (0) 22 548 1664
> > > Skype: ian.connor
> > >
> >
> >
> >
> > --
> > Alexander Ramos Jardim
>
>


-- 
Alexander Ramos Jardim
