I think the key is this: think of a SolrCore on a single-node Solr installation as the equivalent of a collection on a multi-node SolrCloud installation.
So if you would use multiple SolrCores with a standard Solr setup, you should be using multiple collections in SolrCloud. If you were going to try to do everything in one SolrCore, that would be like putting everything in one collection in SolrCloud. I don't think it generally makes sense to try to work at the SolrCore level when working with SolrCloud. This will become clearer once we add a simple collections API.

So I think your choice should be similar to the one you would make on a single node: do you want to put everything in one 'collection' and use a filter to separate customers (with all its caveats and limitations), or do you want to use a collection per customer? You can always start up more clusters if you reach any limits.

On May 22, 2012, at 10:08 AM, Darren Govoni wrote:

> I'm curious what the solrcloud experts say, but my suggestion is to try not
> to over-engineer the search architecture on solrcloud. For example, what
> is the benefit of managing which cores are indexed and searched? Having to
> know those details, in my mind, works against the automation in solrcore, but
> maybe there's a good reason you want to do it this way.
>
> ------- Original Message -------
> On 5/22/2012 07:35 AM Yandong Yao wrote:
> Hi Darren,
>
> Thanks very much for your reply.
>
> The reason I want to control core indexing/searching is that I want to
> use one core to store one customer's data (all customers share the same
> config): for example, customer 1 uses coreForCustomer1 and customer 2
> uses coreForCustomer2.
>
> Is there any better way than using a different core for each customer?
>
> Another way may be to use a different collection for each customer, but I am
> not sure how many collections SolrCloud can support. Which way is better
> in terms of flexibility/scalability? (Suppose there are tens of thousands of
> customers.)
>
> Regards,
> Yandong
>
> 2012/5/22 Darren Govoni <dar...@ontrenet.com>
>
> > Why do you want to control what gets indexed into a core and then
> > knowing what core to search? That's the kind of "knowing" that SolrCloud
> > solves. In SolrCloud, it handles the distribution of documents across
> > shards and retrieves them regardless of which node is searched from.
> > That is the point of "cloud", you don't know the details of where
> > exactly documents are being managed (i.e. they are cloudy). It can
> > change and re-balance from time to time. SolrCloud performs the
> > distributed search for you, therefore when you try to search a node/core
> > with no documents, all the results from the "cloud" are retrieved
> > regardless. This is considered "A Good Thing".
> >
> > It requires a change in thinking about indexing and searching....
> >
> > On Tue, 2012-05-22 at 08:43 +0800, Yandong Yao wrote:
> > > Hi Guys,
> > >
> > > I use following command to start solr cloud according to solr cloud wiki.
> > >
> > > yydzero:example bjcoe$ java -Dbootstrap_confdir=./solr/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar
> > > yydzero:example2 bjcoe$ java -Djetty.port=7574 -DzkHost=localhost:9983 -jar start.jar
> > >
> > > Then I created several cores using the CoreAdmin API
> > > (http://localhost:8983/solr/admin/cores?action=CREATE&name=<coreName>&collection=collection1),
> > > and clusterstate.json shows the following topology:
> > >
> > >
> > > collection1:
> > >   -- shard1:
> > >     -- collection1
> > >     -- CoreForCustomer1
> > >     -- CoreForCustomer3
> > >     -- CoreForCustomer5
> > >   -- shard2:
> > >     -- collection1
> > >     -- CoreForCustomer2
> > >     -- CoreForCustomer4
> > >
> > >
> > > 1) Index:
> > >
> > > I used the following command to index the mem.xml file in the exampledocs directory.
> > >
> > > yydzero:exampledocs bjcoe$ java -Durl=http://localhost:8983/solr/coreForCustomer3/update -jar post.jar mem.xml
> > > SimplePostTool: version 1.4
> > > SimplePostTool: POSTing files to http://localhost:8983/solr/coreForCustomer3/update..
> > > SimplePostTool: POSTing file mem.xml
> > > SimplePostTool: COMMITting Solr index changes.
> > >
> > > Now the SolrAdmin UI shows that 'coreForCustomer1', 'coreForCustomer3', and
> > > 'coreForCustomer5' each have 3 documents (mem.xml has 3 documents) and the
> > > other 2 cores have 0 documents.
> > >
> > > *Question 1:* Is this expected behavior? How do I index documents into
> > > a specific core?
> > >
> > > *Question 2*: If SolrCloud doesn't support this yet, how could I extend it
> > > to support this feature (indexing a document into a particular core)? Where
> > > should I start, the hashing algorithm?
> > >
> > > *Question 3*: Why are the documents also indexed into 'coreForCustomer1'
> > > and 'coreForCustomer5'? The default number of replicas for documents is 1, right?
> > >
> > > Then I tried to index some documents into 'coreForCustomer2':
> > >
> > > $ java -Durl=http://localhost:8983/solr/coreForCustomer2/update -jar post.jar ipod_video.xml
> > >
> > > But 'coreForCustomer2' still has 0 documents, and the documents in ipod_video.xml
> > > are indexed into the cores for customers 1/3/5.
> > >
> > > *Question 4*: Why does this happen?
> > >
> > > 2) Search: I use
> > > "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml" to
> > > search against 'CoreForCustomer2', but it returns all documents in
> > > the whole collection even though this core has no documents at all.
> > >
> > > Then I use
> > > "http://localhost:8983/solr/coreForCustomer2/select?q=*%3A*&wt=xml&shards=localhost:8983/solr/coreForCustomer2",
> > > and it returns 0 documents.
> > >
> > > *Question 5*: So if I want to search against a particular core, I need to
> > > use the 'shards' parameter with the SolrCore name as the parameter value, right?
> > >
> > >
> > > Thanks very much in advance!
> > >
> > > Regards,
> > > Yandong
> >
>

- Mark Miller
lucidimagination.com
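
As a rough illustration of the two options in the reply at the top of this thread (one shared collection with a filter versus one collection per customer), here is a minimal Python sketch that only builds the query URLs. The host and port come from the thread; the `customer_id` field, the `customer<N>` collection naming scheme, and both helper function names are assumptions made for illustration, not anything from the thread.

```python
from urllib.parse import urlencode

# Host and port as used in the thread's examples.
SOLR_BASE = "http://localhost:8983/solr"


def shared_collection_query(collection, customer_field, customer_id, q="*:*"):
    """Option 1: everything in one collection; a filter query (fq) isolates a customer.

    `customer_field` is a hypothetical field that every document would need to carry.
    """
    params = urlencode({"q": q, "fq": f"{customer_field}:{customer_id}", "wt": "xml"})
    return f"{SOLR_BASE}/{collection}/select?{params}"


def per_customer_query(customer_id, q="*:*"):
    """Option 2: one collection per customer; the collection name selects the customer.

    The `customer<N>` naming scheme is hypothetical.
    """
    params = urlencode({"q": q, "wt": "xml"})
    return f"{SOLR_BASE}/customer{customer_id}/select?{params}"


if __name__ == "__main__":
    print(shared_collection_query("collection1", "customer_id", 3))
    print(per_customer_query(3))
```

Either URL could then be fetched with any HTTP client. In this sketch the filter approach keeps a single collection but depends on every query carrying the right `fq`, while the per-customer approach separates data by collection name at the cost of managing many collections.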