Hi all,

I'd love to share the diagram, but I'm not sure how to do that on the list (it's a Word document that I tried to send as an attachment).
Jens, to answer your questions:

1. Correct, in our setup the source of the data is a DB from which we pull the data using DIH (search the list for my previous post "DIH - deleting documents, high performance (delta) imports, and passing parameters" if you want info about that). We were lucky enough to have the data sharded at the DB level before we started using Solr, so using the same shards was an easy extension. Note that we're not (yet...) using SolrCloud; it was just something I thought you should consider.

2. I got the idea for the "aggregator" from the Solr book (PACKT). I don't remember if that term was used in the book or if I made it up (if Google doesn't know it, I probably made it up...), but I think it conveys what this part of the puzzle does. As you said, this is simply a Solr instance which doesn't hold its own index, but shares the same schema as the slaves and masters. I actually defined the default query handler on this instance to include the shards parameter (see below), so the client doesn't have to know anything about the internal workings of the sharded setup; it just hits the aggregator load balancer with a regular query and everything is handled behind the scenes. This simplifies the client and allows me to change the architecture in the future (e.g. change the number of shards or their DNS names) without requiring a client change.

Sharded query handler:

  <requestHandler name="sharded" class="solr.SearchHandler" default="${aggregator:false}">
    <!-- default values for query parameters -->
    <lst name="defaults">
      <str name="echoParams">explicit</str>
      <str name="shards">${slaveUrls:null}</str>
    </lst>
  </requestHandler>

All of our Solr instances share the same configs (solrconfig.xml, schema.xml, etc.) and different instances take different roles according to properties defined in solr.xml, which is generated by a script specifically for each Solr instance (the script has a "map" of which instances should be on which host, and has to be run once on each host). In this case, this is how the generated solr.xml looks:

  <solr sharedLib="../lib" persistent="true">
    <!-- just a name that appears in Solr management, to make it easier
         to know which instance you're on -->
    <property name="name" value="aggregator" />
    <!-- this tells the instance it is an aggregator, so it should use the
         sharded request handler by default; masters and slaves have
         master/slave set accordingly to define replication, a regular
         default search handler for slaves, and DIH on masters -->
    <property name="aggregator" value="true" />
    <!-- used by instances which are shards in order to determine which DB
         they should import from (masters) and which master they should
         replicate from (slaves) -->
    <property name="shardID" value="" />
    <!-- used by the sharded request handler -->
    <property name="slaveUrls" value="long,list.of,shard.urls" />
    <!-- used by the load balancer to know if this instance is alive -->
    <property name="HealthCheckDir" value="/data/servers/xxxxx_solr/aggregator/core0/conf" />
    <cores adminPath="/admin/cores" defaultCoreName="prod">
      <!-- just one core for this instance; indexers have 2 cores, one prod
           and one for full reindex -->
      <core name="prod" instanceDir="core0/"/>
    </cores>
  </solr>

Let me know if I can assist any further.

Ephraim Ofir

-----Original Message-----
From: Jonathan DeMello [mailto:demello....@googlemail.com]
Sent: Wednesday, April 06, 2011 8:58 AM
To: solr-user@lucene.apache.org
Cc: Isan Fulia; Tirthankar Chatterjee
Subject: Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?

I third that request. Would greatly appreciate taking a look at that diagram!
Regards,
Jonathan

On Wed, Apr 6, 2011 at 9:12 AM, Isan Fulia <isan.fu...@germinait.com> wrote:

> Hi Ephraim/Jens,
>
> Can you share that diagram with all? It may really help all of us.
> Thanks,
> Isan Fulia.
>
> On 6 April 2011 10:15, Tirthankar Chatterjee <tchatter...@commvault.com> wrote:
>
> > Hi Jens,
> > Can you please forward the diagram attachment too that Ephraim sent. :-)
> > Thanks,
> > Tirthankar
> >
> > -----Original Message-----
> > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > Sent: Tuesday, April 05, 2011 10:30 PM
> > To: solr-user@lucene.apache.org
> > Subject: Re: FW: Very very large scale Solr Deployment = how to do (Expert Question)?
> >
> > Hello Ephraim,
> >
> > thank you so much for the great Document/Scaling-Concept!!
> >
> > First I think you really should publish this on the solr wiki. This
> > approach is nowhere documented there and not really obvious for newbies,
> > and your document is great and explains this very well!
> >
> > Please allow me two further questions regarding your document:
> >
> > 1.) Is it correct that by "DB" you mean the Origin-Data-Source of the
> > data that is fed into the Solr "Cloud" for searching?
> >
> > 2.) Solr Aggregator: This term did not yield any google results, but is a
> > very important aspect of your design (and this was the missing piece for
> > me when thinking about solr architectures): Is it correct that the
> > "aggregators" are simply tomcat instances with the solr webapp deployed?
> > These Aggregators do not have their own index but only run the solr
> > webapp, and I access them via the ?shards= parameter giving the shards I
> > want to query? (So in the end they aggregate the data of the shards but do
> > not have their own data.) This is really an important aspect that is not
> > documented well enough in the solr documentation.
> >
> > Thank you very much!
> > Jens
> >
> > 2011/4/5 Ephraim Ofir <ephra...@icq.com>
> >
> > > Of course the attachment didn't get to the list, so here it is if you
> > > want it...
> > >
> > > Ephraim Ofir
> > >
> > > -----Original Message-----
> > > From: Ephraim Ofir
> > > Sent: Tuesday, April 05, 2011 10:20 AM
> > > To: 'solr-user@lucene.apache.org'
> > > Subject: RE: Very very large scale Solr Deployment = how to do (Expert Question)?
> > >
> > > I'm not sure about the scale you're aiming for, but you probably want
> > > to do both sharding and replication. There's no central server which
> > > would be the bottleneck. The guidelines should probably be something
> > > like:
> > > 1. Split your index into enough shards so it can keep up with the
> > > update rate.
> > > 2. Have enough replicas of each shard master to keep up with the
> > > rate of queries.
> > > 3. Have enough aggregators in front of the shard replicas so the
> > > aggregation doesn't become a bottleneck.
> > > 4. Make sure you have good load balancing across your system.
> > >
> > > Attached is a diagram of the setup we have. You might want to look
> > > into SolrCloud as well.
> > >
> > > Ephraim Ofir
> > >
> > > -----Original Message-----
> > > From: Jens Mueller [mailto:supidupi...@googlemail.com]
> > > Sent: Tuesday, April 05, 2011 4:25 AM
> > > To: solr-user@lucene.apache.org
> > > Subject: Very very large scale Solr Deployment = how to do (Expert Question)?
> > >
> > > Hello Experts,
> > >
> > > I am a Solr newbie but read quite a lot of docs. I still do not
> > > understand what would be the best way to set up very large scale
> > > deployments:
> > >
> > > Goal (theoretical):
> > >
> > > A.) Index-Size: 1 Petabyte (1 Document is about 5 KB in Size)
> > >
> > > B.) Queries: 100000 Queries per Second
> > >
> > > C.) Updates: 100000 Updates per Second
> > >
> > > Solr offers:
> > >
> > > 1.) Replication => Scales well for B) BUT A) and C) are not satisfied
> > >
> > > 2.) Sharding => Scales well for A) BUT B) and C) are not satisfied
> > > (=> As I understand the Sharding approach, all goes through a central
> > > server that dispatches the updates and assembles the queries retrieved
> > > from the different shards. But this central server also has some
> > > capacity limits...)
> > >
> > > What is the right approach to handle such large deployments? I would
> > > be thankful for just a rough sketch of the concepts so I can
> > > experiment/search further...
> > >
> > > Maybe I am missing something very trivial, as I think some of the "Solr
> > > Users/Use Cases" on the homepage are that kind of large deployments.
> > > How are they implemented?
> > >
> > > Thank you very much!!!
> > >
> > > Jens
>
> --
> Thanks & Regards,
> Isan Fulia.
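
For anyone reading this thread later, here is a rough sketch of what the aggregator's default handler saves the client from doing. It only builds query URLs and sends nothing. The host names, port, and the /select path are hypothetical placeholders, and the format of the shards value (comma-separated host:port/path entries without a scheme) is my understanding of Solr's distributed search parameter rather than something stated in the thread.

```python
from urllib.parse import urlencode

# Hypothetical host names, used only for illustration.
AGGREGATOR = "http://aggregator-lb.example.com:8983/solr"
SHARDS = ["shard1.example.com:8983/solr", "shard2.example.com:8983/solr"]


def explicit_shards_url(base, shards, query):
    """URL a client would need if it had to list the shards itself."""
    params = {"q": query, "shards": ",".join(shards)}
    return f"{base}/select?{urlencode(params)}"


def aggregator_url(base, query):
    """URL for the setup described in the thread: a plain query, with the
    shards list supplied by the aggregator's default request handler."""
    return f"{base}/select?{urlencode({'q': query})}"


if __name__ == "__main__":
    print(explicit_shards_url(AGGREGATOR, SHARDS, "title:solr"))
    print(aggregator_url(AGGREGATOR, "title:solr"))
```

With the shards default living in the handler config, only the second form is needed, so the number of shards or their DNS names can change without touching the client.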