We found that search by itself was faster with distributed multicore search over three cores in the same servlet engine than on just one core.
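The distributed search described above can be sketched roughly as follows: fan the query out to several cores in parallel, then merge the per-core top-k results by score. This is a hypothetical illustration of the idea, not Solr's actual internals; the `search_shard` stub and the shard data structures are invented for the sketch.

```python
# Sketch of distributed multicore search: query all cores concurrently,
# then merge the per-core hit lists into one overall top-k.
from concurrent.futures import ThreadPoolExecutor
import heapq

def search_shard(shard, query, k):
    # Stand-in for a real per-core search; returns (score, doc_id) pairs.
    return shard["hits"].get(query, [])[:k]

def distributed_search(shards, query, k=10):
    # Query every shard in parallel, as a multicore setup would.
    with ThreadPoolExecutor(max_workers=len(shards)) as pool:
        results = list(pool.map(lambda s: search_shard(s, query, k), shards))
    # Keep the k highest-scoring docs across all shards.
    return heapq.nlargest(k, (hit for r in results for hit in r))

shards = [
    {"hits": {"solr": [(0.9, "doc1"), (0.4, "doc4")]}},
    {"hits": {"solr": [(0.7, "doc2")]}},
    {"hits": {"solr": [(0.8, "doc3")]}},
]
print(distributed_search(shards, "solr", k=3))
```

Because each core searches a smaller index, the slowest per-core search bounds the total latency, which is one plausible reason three cores beat one here.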
Faceting and sorting use more memory than simple searches, and we could not do faceting on our one simple index. We needed this for data analysis. With three cores we could, and it was surprisingly fast. Each core would compute its facets, then forget its faceting data. The next one garbage-collects that space and can compute its own facets.

Lance

-----Original Message-----
From: Alexander Ramos Jardim [mailto:[EMAIL PROTECTED]]
Sent: Thursday, August 21, 2008 10:59 AM
To: solr-user@lucene.apache.org
Subject: Re: shards and performance

2008/8/21 Otis Gospodnetic <[EMAIL PROTECTED]>

> Uh uh. 6 instances per node all pointing to the same index?
> Yes, this can increase performance, but only because it essentially
> gives you 6 separate searchers (SolrIndexSearchers). This clearly
> uses more RAM, especially if you sort on fields and especially if you
> are not omitting norms where you can.

I know this is a memory-hog approach. Is there another way to keep
various independent searchers open?

> Is this a dual- or quad-core box, and how big is your index, Alexander?

Machine specs:
processor: 2 x Quad Core
memory: 32GB RAM
disk: (the f0cking sysadmin didn't give me the specs)

Index specs:
Each Solr instance has 6 indexes (multicore). The total size they
occupy is less than 3GB. The total number of docs is less than 100
million. We have really tiny docs (only 3 fields) and one big-doc
index (something like 60-ish fields). We do lots of faceting in all
queries.

> Otis
> --
> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
>
> ----- Original Message ----
> > From: Alexander Ramos Jardim <[EMAIL PROTECTED]>
> > To: solr-user@lucene.apache.org
> > Sent: Wednesday, August 20, 2008 9:49:04 AM
> > Subject: Re: shards and performance
> >
> > Another thing to consider in your sharding is the access rate you
> > want to guarantee.
> >
> > In the project I am working on, I need to guarantee at least
> > 200 hits/second with various facets in all queries.
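Lance's point about cores faceting one at a time, with each core's facet data becoming garbage before the next core runs, can be sketched like this. The sketch is purely illustrative (plain Python counters, not Lucene field caches); only one core's facet counts are live at any moment.

```python
# Sketch of per-core faceting: facet each core in turn, merge the
# counts, and let each core's working set be collected before the
# next core starts.
from collections import Counter

def facet_core(docs, field):
    # Count facet values for one core's documents.
    return Counter(d[field] for d in docs if field in d)

def facet_all_cores(cores, field):
    total = Counter()
    for docs in cores:
        counts = facet_core(docs, field)
        total.update(counts)
        del counts  # per-core facet data is garbage before the next core runs
    return total

cores = [
    [{"cat": "a"}, {"cat": "b"}],
    [{"cat": "a"}],
    [{"cat": "c"}, {"cat": "a"}],
]
print(facet_all_cores(cores, "cat"))  # Counter({'a': 3, 'b': 1, 'c': 1})
```

The peak memory is then one core's facet data plus the merged totals, rather than the whole index's facet data at once, which is consistent with why the three-core setup could facet where the single large index could not.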
> > I am not using sharding, but I have 6 Solr instances per cluster
> > node, and I have 3 nodes, for a total of 18 Solr instances. Each
> > node has only one index, so I keep the 6 instances pointing to the
> > same index on a given node.
> > What made a huge difference in my performance was the removal of
> > the lock.
> >
> > I expect that helps you out.
> >
> > 2008/8/20 Ian Connor
> >
> > > I have based my machines on bare-bones servers (I call them
> > > ghetto servers). I essentially have motherboards in a rack
> > > sitting on catering trays (heat resistance is key).
> > >
> > > http://web.mac.com/iconnor/iWeb/Site/ghetto-servers.html
> > >
> > > Motherboards: GIGABYTE GA-G33M-S2L (these are small mATX with 4
> > > RAM slots - allows as much cheap RAM as possible)
> > > CPU: Intel Q6600 (quad core 2.4GHz - but I might try AMD next to
> > > see if the different RAM approach works better and they are
> > > greener)
> > > Memory: 8GB (4 x 2GB DDR2 - best price per GB)
> > > HDD: SATA disk (between 200 and 500GB - I had these from another
> > > project)
> > >
> > > I have HAProxy between the app servers and Solr so that I get
> > > failover if one of these goes down (expect failure).
> > >
> > > Having only 1M documents but more data per document will mean
> > > your situation is different. I am having particular performance
> > > issues with facets and am trying to get my head around all the
> > > issues involved there.
> > >
> > > I see Mike has only 2 shards per box, as he was "squeezing"
> > > performance. I didn't see any significant gain in performance,
> > > but that is not to say there isn't one. Just for me, I had a
> > > level of performance in mind and stopped when that was met. It
> > > took almost a month of testing to get to that point, so I was
> > > ready to move on to other problems - I might revisit it later.
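Alexander's layout (3 nodes x 6 instances = 18 searchers over shared indexes) combined with Ian's HAProxy failover amounts to round-robin balancing over a pool of instances, skipping the ones that are down. A minimal sketch, with made-up node/instance names standing in for the real balancer config:

```python
# Sketch of round-robin load balancing with failover over a pool of
# searcher instances (3 nodes x 6 instances = 18, as in the thread).
import itertools

def make_pool(nodes, instances_per_node):
    # One entry per searcher instance.
    return [f"{node}:inst{i}" for node in nodes for i in range(instances_per_node)]

def round_robin(pool, down=frozenset()):
    # Cycle through the pool forever, skipping instances marked down --
    # the failover role HAProxy plays in Ian's setup.
    for inst in itertools.cycle(pool):
        if inst not in down:
            yield inst

pool = make_pool(["node1", "node2", "node3"], 6)
assert len(pool) == 18

picker = round_robin(pool, down={"node2:inst0"})
first_seven = [next(picker) for _ in range(7)]
print(first_seven)
```

Each instance keeps its own open searcher (and its own caches), which is the memory-for-throughput trade-off Otis points out above.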
> > > Also, my ghetto servers are getting similar reliability to the
> > > Dell servers I have - but I have built the system with the
> > > expectation that they will fail often, although that has not
> > > happened yet.
> > >
> > > On Tue, Aug 19, 2008 at 4:40 PM, Alexander Ramos Jardim wrote:
> > > > As long as Solr/Lucene makes smart use of memory (and they do,
> > > > from my experience), it is really easy to calculate how long a
> > > > huge query/update will take when you know how long the smaller
> > > > ones take. Just keep in mind that the resource consumption of
> > > > memory and disk space is almost always proportional.
> > > >
> > > > 2008/8/19 Mike Klaas
> > > >
> > > >> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
> > > >>
> > > >>> So your experience differs from Mike's. Obviously it's an
> > > >>> important decision as to whether to buy more machines. Can
> > > >>> you (or Mike) weigh in on what factors led to your different
> > > >>> take on local shards vs. shards distributed across machines?
> > > >>
> > > >> I do both; the only reason I have two shards on each machine
> > > >> is to squeeze maximum performance out of an equipment budget.
> > > >> Err on the side of multiple machines.
> > > >>
> > > >>>> At least for building the index, the number of shards really
> > > >>>> does help. To index Medline (1.6e7 docs, which is 60GB of
> > > >>>> XML text) on a single machine starts at about 100 docs/s but
> > > >>>> slows down to 10 docs/s when the index grows. It seems as
> > > >>>> though the limit is reached once you run out of RAM, and it
> > > >>>> gets slower and slower in a linear fashion the larger the
> > > >>>> index gets.
> > > >>>> My sweet spot was 5 machines with 8GB RAM for indexing about
> > > >>>> 60GB of data.
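Ian's observed indexing numbers (~100 docs/s on a fresh index, degrading roughly linearly to ~10 docs/s as the index outgrows RAM) suggest a back-of-the-envelope model for why sharding helps indexing. This is only a rough sketch built on his figures and on the assumption that small per-machine shards stay near the starting rate; it is not a measured result.

```python
# Back-of-the-envelope indexing-time model based on the rates quoted
# in the thread: ~100 docs/s fresh, decaying linearly to ~10 docs/s.

def indexing_time(total_docs, start_rate=100.0, end_rate=10.0):
    # If the rate falls linearly over the run, the average rate is the
    # mean of the start and end rates.
    avg_rate = (start_rate + end_rate) / 2.0
    return total_docs / avg_rate  # seconds

def sharded_indexing_time(total_docs, machines, start_rate=100.0):
    # Assumption: each shard stays small enough to fit in RAM, so each
    # machine holds close to the starting rate for its whole share.
    return (total_docs / machines) / start_rate

one_box = indexing_time(16_000_000)            # ~1.6e7 Medline docs
five_boxes = sharded_indexing_time(16_000_000, 5)
print(f"1 machine: {one_box / 3600:.0f}h, 5 machines: {five_boxes / 3600:.1f}h")
```

Under these assumptions the five-machine run is far more than 5x faster than the single box, because each shard also avoids the slowdown, which matches the "sweet spot" Ian describes.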
> > > >>>>
> > > >>>
> > > >>> Can you say what the specs were for these machines? Given
> > > >>> that I have more like 1TB of data over 1M docs, how do you
> > > >>> think my machine requirements might be affected compared to
> > > >>> yours?
> > > >>
> > > >> You are in a much better position to determine this than we
> > > >> are. See how big an index you can put on a single machine
> > > >> while maintaining acceptable performance under a typical query
> > > >> load. It's relatively safe to extrapolate linearly from that.
> > > >>
> > > >> -Mike
> > > >
> > > > --
> > > > Alexander Ramos Jardim
> > >
> > > --
> > > Regards,
> > >
> > > Ian Connor
> > > 1 Leighton St #605
> > > Cambridge, MA 02141
> > > Direct Line: +1 (978) 6333372
> > > Call Center Phone: +1 (714) 239 3875 (24 hrs)
> > > Mobile Phone: +1 (312) 218 3209
> > > Fax: +1 (770) 818 5697
> > > Suisse Phone: +41 (0) 22 548 1664
> > > Skype: ian.connor
> >
> > --
> > Alexander Ramos Jardim

--
Alexander Ramos Jardim
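Mike's sizing advice above (measure how big an index one machine can serve acceptably, then extrapolate linearly) reduces to simple arithmetic. The capacity figure and headroom factor below are hypothetical placeholders for whatever a real benchmark measures:

```python
# Sketch of linear capacity extrapolation: given a measured
# comfortable index size per machine, estimate shards for a corpus.
import math

def shards_needed(total_data_gb, per_machine_capacity_gb, headroom=1.25):
    # headroom > 1 leaves room for index growth and merges; 1.25 is an
    # arbitrary illustrative choice, not a recommendation from the thread.
    return math.ceil(total_data_gb * headroom / per_machine_capacity_gb)

# Hypothetical numbers: if one box comfortably serves a 60GB index,
# a 1TB corpus needs roughly this many shards.
print(shards_needed(1000, 60))  # -> 21
```

The extrapolation is only "relatively safe", as Mike says: per-document size, faceting, and sorting load can all bend the line, so the single-machine benchmark should use a realistic query mix.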