Dumb question time - you are using a 64-bit Java, and not a 32-bit Java?

Bob Sandiford | Lead Software Engineer | SirsiDynix
P: 800.288.8020 X6943 | bob.sandif...@sirsidynix.com
www.sirsidynix.com
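(A quick way to check, if in doubt: running "java -version" on the Solr box reports the JVM's bitness. A 64-bit HotSpot JVM identifies itself as a "64-Bit Server VM"; abbreviated, illustrative output:

    $ java -version
    ...
    Java HotSpot(TM) 64-Bit Server VM (build ..., mixed mode)

A 32-bit JVM prints "Client VM" or "Server VM" without the "64-Bit".)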
> -----Original Message-----
> From: Bernd Fehling [mailto:bernd.fehl...@uni-bielefeld.de]
> Sent: Thursday, August 04, 2011 2:39 AM
> To: solr-user@lucene.apache.org
> Subject: Re: performance crossover between single index and sharding
>
> Hi Shawn,
>
> the 0.05 seconds for search time at peak times (3 qps) is my target for
> Solr. The numbers for Solr are from Solr's statistics report page. So
> 39.5 seconds average per request is definitely too long, and I have to
> change to sharding.
>
> For the FAST system the numbers for the search dispatcher are:
> 0.042 sec elapsed per normal search, on avg.
> 0.053 sec average uncached normal search time (last 100 queries).
> 99.898% of searches using < 1 sec
> 99.999% of searches using < 3 sec
> 0.000% of all requests timed out
> 22454567.577 sec time up (that is 259 days)
>
> Is there a report page for those numbers in Solr?
>
> About the RAM: the 32GB RAM are physical for each VM and the 20GB RAM
> are -Xmx for Java. Yesterday I noticed that we are running out of heap
> during replication, so I have to increase -Xmx to about 22g.
>
> The reported 0.6 average requests per second seems right to me, because
> the Solr system isn't under full load yet. The FAST system is still
> taking most of the load. I plan to switch completely to Solr after
> sharding is up and running stably. So there will be an additional 3 qps
> to Solr at peak times.
>
> I don't know if a controlling master like FAST makes any sense for
> Solr. The small VMs with heartbeat and haproxy sound great; that must
> go on my todo list.
>
> But the biggest problem currently is how to configure the DIH to split
> up the content across several indexers. Is there an indexing
> distributor?
>
> Regards,
> Bernd
>
>
> On 03.08.2011 16:33, Shawn Heisey wrote:
> > Replies inline.
> >
> > On 8/3/2011 2:24 AM, Bernd Fehling wrote:
> >> To show that I am comparing apples and oranges, here is my previous
> >> FAST Search setup:
> >> - one master server (controlling, logging, search dispatcher)
> >> - six index servers (4.25 million docs per server, 5 slices per index)
> >>   (searching and indexing at the same time, indexing once per week
> >>   during the weekend)
> >> - each server has 4GB RAM, all servers are physical, on separate
> >>   machines
> >> - RAM usage controlled by the processes
> >> - total of 25.5 million docs (mainly metadata) from 1500 databases
> >>   worldwide
> >> - index size is about 67GB per indexer --> about 402GB total
> >> - about 3 qps at peak times
> >> - with an average search time of 0.05 seconds at peak times
> >
> > An average query time of 50 milliseconds isn't too bad. If the number
> > from your Solr setup below (39.5) is the QTime, then Solr thinks it is
> > performing better, but Solr's QTime does not include absolutely
> > everything that has to happen. Do you by chance have 95th and 99th
> > percentile query times for either system?
> >
> >> And here is my current Solr setup:
> >> - one master server (indexing only)
> >> - two slave servers (search only), but only one is online; the second
> >>   is a fallback
> >> - each server has 32GB RAM, all servers are virtual
> >>   (master on a separate physical machine, both slaves together on one
> >>   physical machine)
> >> - RAM usage is currently 20GB for the Java heap
> >> - total of 31 million docs (all metadata) from 2000 databases
> >>   worldwide
> >> - index size is 156GB total
> >> - the search handler statistics report 0.6 average requests per second
> >> - average time per request 39.5 (is that seconds?)
> >> - building the index from scratch takes about 20 hours
> >
> > I can't tell whether you mean that each physical host has 32GB or
> > each VM has 32GB. You want to be sure that you are not oversubscribing
> > your memory. If you can get more memory into your machines, you really
> > should. Do you know whether that 0.6 seconds is most of the delay that
> > a user sees when making a search request, or are there other things
> > going on that contribute more delay? In our webapp, the Solr request
> > time is usually small compared with everything else the server and the
> > user's browser are doing to render the results page. As much as I hate
> > being the tall pole in the tent, I look forward to the day when the
> > developers can change that balance.
> >
> >> The good thing is that I have the ability to compare a commercial
> >> enterprise product against open source.
> >>
> >> I started with my simple Solr setup because of "KISS" (keep it simple
> >> and stupid). Actually it is doing excellently as a single index on a
> >> single virtual server. But the average time per request should be
> >> reduced now; that's why I started this discussion.
> >> While searches with a smaller Solr index (3 million docs) showed that
> >> it can stand up to FAST Search, it is now clear that it is time to go
> >> with sharding. I think we are already far beyond the point of the
> >> search performance crossover.
> >>
> >> What I hope to get with sharding:
> >> - reduce the time for building the index
> >> - reduce the average time per request
> >
> > You will probably achieve both of these things by sharding, especially
> > if you have a lot of CPU cores available. Like mine, your query volume
> > is very low, so the CPU cores are better utilized distributing the
> > search.
> >
> >> What I fear with sharding:
> >> - I currently have master/slave; do I then have e.g. 3 masters and
> >>   3 slaves?
> >> - the query changes because of sharding (is there a search
> >>   distributor?)
> >> - how to distribute the content to the indexers with DIH on 3 servers?
> >> - anything else to think about while changing to sharding?
> >
> > I think sharding is probably a good idea for you, as long as you don't
> > lose redundancy. You can duplicate the FAST concept of a master server
> > with a Solr core that has no index of its own. The solrconfig.xml for
> > that core needs to include the shards parameter. That core combined
> > with those shards will make up one complete index chain, and you need
> > at least two complete chains, running on separate physical hardware.
> > A load balancer will be critical. I use two small VMs on separate
> > hosts with heartbeat and haproxy for mine.
> >
> > Thanks,
> > Shawn
> >
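To illustrate Shawn's point about a core with no index of its own acting as the search dispatcher: a minimal sketch of the relevant part of that core's solrconfig.xml could look like the following, with hostnames and core names as placeholders:

    <!-- solrconfig.xml of the "broker" core; hosts and core names are placeholders -->
    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="shards">solr1:8983/solr/shard0,solr2:8983/solr/shard1,solr3:8983/solr/shard2</str>
      </lst>
    </requestHandler>

Queries sent to this handler are fanned out to the listed shards and the merged result is returned, so the client-side query itself does not have to change. The same shards value can also be passed as a per-request parameter instead of being baked into the defaults.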
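And regarding Bernd's question about splitting the content across several indexers with the DIH: as far as I know there is no built-in indexing distributor in this Solr version, but a common approach is to give every shard master the same data-config.xml and add a modulo filter on a numeric primary key, so that each master only pulls its own slice of the source data. A minimal sketch for shard 0 of 3 (driver, database, table and column names are only placeholders):

    <!-- data-config.xml on shard master 0 of 3 -->
    <dataConfig>
      <dataSource driver="com.mysql.jdbc.Driver"
                  url="jdbc:mysql://dbhost/sourcedb"
                  user="solr" password="***"/>
      <document>
        <!-- MOD(id, 3) = 0 selects only this shard's slice -->
        <entity name="records"
                query="SELECT id, title, author FROM records WHERE MOD(id, 3) = 0">
          <field column="id"     name="id"/>
          <field column="title"  name="title"/>
          <field column="author" name="author"/>
        </entity>
      </document>
    </dataConfig>

Shard master 1 would use MOD(id, 3) = 1 and shard master 2 MOD(id, 3) = 2; each master then replicates to its own slave as before. If the primary key is not numeric, a hash of the key (e.g. crc32) taken modulo the shard count works the same way.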