Dumb question time - you are using a 64-bit Java, and not a 32-bit Java?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com
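(A quick way to check, if in doubt: running "java -version" on the Solr box reports the JVM's bitness. A 64-bit HotSpot JVM identifies itself as a "64-Bit Server VM"; abbreviated, illustrative output:

    $ java -version
    ...
    Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)

A 32-bit JVM prints "Client VM" or "Server VM" without the "64-Bit".)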
> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Thursday, August 04, 2011 2:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: performance crossover between single index and sharding
>
> Hi Shawn,
>
> the 0.05 seconds for search time at peak times (3 qps) is my target for
> Solr. The numbers for Solr are from Solr's statistics report page. So
> 39.5 seconds average per request is definitely too long, and I have to
> change to sharding.
>
> For the FAST system the numbers for the search dispatcher are:
> 0.042 sec elapsed per normal search, on avg.
> 0.053 sec average uncached normal search time (last 100 queries).
> 99.898% of searches using < 1 sec
> 99.999% of searches using < 3 sec
> 0.000% of all requests timed out
> 22454567.577 sec time up (that is 259 days)
>
> Is there a report page for those numbers in Solr?
>
> About the RAM: the 32GB RAM are physical for each VM and the 20GB RAM
> are -Xmx for Java. Yesterday I noticed that we are running out of heap
> during replication, so I have to increase -Xmx to about 22g.
>
> The reported 0.6 average requests per second seems right to me, because
> the Solr system isn't under full load yet. The FAST system is still
> taking most of the load. I plan to switch completely to Solr after
> sharding is up and running stably. So there will be an additional 3 qps
> to Solr at peak times.
>
> I don't know if a controlling master like FAST makes any sense for
> Solr. The small VMs with heartbeat and haproxy sound great; that must
> go on my todo list.
>
> But the biggest problem currently is how to configure the DIH to split
> up the content across several indexers. Is there an indexing
> distributor?
>
> Regards,
> Bernd
>
>
> On 03.08.2011 16:33, Shawn Heisey wrote:
> > Replies inline.
> >
> > On 8/3/2011 2:24 AM, Bernd Fehling wrote:
> >> To show that I am comparing apples and oranges, here is my previous
> >> FAST Search setup:
> >> - one master server (controlling, logging, search dispatcher)
> >> - six index servers (4.25 million docs per server, 5 slices per index)
> >>   (searching and indexing at the same time, indexing once per week
> >>   during the weekend)
> >> - each server has 4GB RAM, all servers are physical, on separate
> >>   machines
> >> - RAM usage controlled by the processes
> >> - total of 25.5 million docs (mainly metadata) from 1500 databases
> >>   worldwide
> >> - index size is about 67GB per indexer --> about 402GB total
> >> - about 3 qps at peak times
> >> - with an average search time of 0.05 seconds at peak times
> >
> > An average query time of 50 milliseconds isn't too bad. If the number
> > from your Solr setup below (39.5) is the QTime, then Solr thinks it is
> > performing better, but Solr's QTime does not include absolutely
> > everything that has to happen. Do you by chance have 95th and 99th
> > percentile query times for either system?
> >
> >> And here is my current Solr setup:
> >> - one master server (indexing only)
> >> - two slave servers (search only), but only one is online; the second
> >>   is a fallback
> >> - each server has 32GB RAM, all servers are virtual
> >>   (master on a separate physical machine, both slaves together on one
> >>   physical machine)
> >> - RAM usage is currently 20GB for the Java heap
> >> - total of 31 million docs (all metadata) from 2000 databases
> >>   worldwide
> >> - index size is 156GB total
> >> - the search handler statistics report 0.6 average requests per second
> >> - average time per request 39.5 (is that seconds?)
> >> - building the index from scratch takes about 20 hours
> >
> > I can't tell whether you mean that each physical host has 32GB or
> > each VM has 32GB. You want to be sure that you are not oversubscribing
> > your memory. If you can get more memory into your machines, you really
> > should. Do you know whether that 0.6 seconds is most of the delay that
> > a user sees when making a search request, or are there other things
> > going on that contribute more delay? In our webapp, the Solr request
> > time is usually small compared with everything else the server and the
> > user's browser are doing to render the results page. As much as I hate
> > being the tall pole in the tent, I look forward to the day when the
> > developers can change that balance.
> >
> >> The good thing is that I have the ability to compare a commercial
> >> enterprise product against open source.
> >>
> >> I started with my simple Solr setup because of "KISS" (keep it simple
> >> and stupid). Actually it is doing excellently as a single index on a
> >> single virtual server. But the average time per request should be
> >> reduced now; that's why I started this discussion.
> >> While searches with a smaller Solr index (3 million docs) showed that
> >> it can stand up to FAST Search, it is now clear that it is time to go
> >> with sharding. I think we are already far beyond the point of the
> >> search performance crossover.
> >>
> >> What I hope to get with sharding:
> >> - reduce the time for building the index
> >> - reduce the average time per request
> >
> > You will probably achieve both of these things by sharding, especially
> > if you have a lot of CPU cores available. Like mine, your query volume
> > is very low, so the CPU cores are better utilized distributing the
> > search.
> >
> >> What I fear with sharding:
> >> - I currently have master/slave; do I then have e.g. 3 masters and
> >>   3 slaves?
> >> - the query changes because of sharding (is there a search
> >>   distributor?)
> >> - how to distribute the content to the indexers with DIH on 3 servers?
> >> - anything else to think about while changing to sharding?
> >
> > I think sharding is probably a good idea for you, as long as you don't
> > lose redundancy. You can duplicate the FAST concept of a master server
> > with a Solr core that has no index of its own. The solrconfig.xml for
> > that core needs to include the shards parameter. That core combined
> > with those shards will make up one complete index chain, and you need
> > at least two complete chains, running on separate physical hardware.
> > A load balancer will be critical. I use two small VMs on separate
> > hosts with heartbeat and haproxy for mine.
> >
> > Thanks,
> > Shawn
> >
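To illustrate Shawn's point about a core with no index of its own acting as the search dispatcher: a minimal sketch of the relevant part of that core's solrconfig.xml could look like the following, with hostnames and core names as placeholders:

    <!-- solrconfig.xml of the "broker" core; hosts and core names are placeholders -->
    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards">solr1:8983/solr/shard0,solr2:8983/solr/shard1,solr3:8983/solr/shard2</str>
      </lst>
    </requestHandler>

Queries sent to this handler are fanned out to the listed shards and the merged result is returned, so the client-side query itself does not have to change. The same shards value can also be passed as a per-request parameter instead of being baked into the defaults.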
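And regarding Bernd's question about splitting the content across several indexers with the DIH: as far as I know there is no built-in indexing distributor in this Solr version, but a common approach is to give every shard master the same data-config.xml and add a modulo filter on a numeric primary key, so that each master only pulls its own slice of the source data. A minimal sketch for shard 0 of 3 (driver, database, table and column names are only placeholders):

    <!-- data-config.xml on shard master 0 of 3 -->
    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost/sourcedb"
                  user="solr" password="***"/>
      <document>
        <!-- MOD(id, 3) = 0 selects only this shard's slice -->
        <entity name="records"
                query="SELECT id, title, author FROM records WHERE MOD(id, 3) = 0">
          <field column="id"     name="id"/>
          <field column="title"  name="title"/>
          <field column="author" name="author"/>
        </entity>
      </document>
    </dataConfig>

Shard master 1 would use MOD(id, 3) = 1 and shard master 2 MOD(id, 3) = 2; each master then replicates to its own slave as before. If the primary key is not numeric, a hash of the key (e.g. crc32) taken modulo the shard count works the same way.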