My own choices were driven mostly by how the data is used, that is, from an architectural perspective.

I have "appDocuments" and "appImages" for one of the applications I'm supporting. Because they are so closely connected (one appDocuments record can have any number of appImages, and an appImages record can belong to more than one appDocuments), I decided to keep them together in the same collection. This certainly made querying for a combination of the two (something I need to do sometimes) more straightforward for developers. I could have cross-referenced one type of ID into the other and kept them in separate collections, but that seemed unnecessarily complex for my purposes.

I also have a fairly small number of Solr documents: fewer than 200K appImages and appDocuments combined. The fields they use are mostly different, however. That's no problem, because when I'm building the Solr doc for either one in my SolrJ code, I only populate the fields that are appropriate for whichever "type" I'm indexing. (I also put a "meta" field on the collection that records the type, appImages or appDocuments, so that I have a way to NOT bother searching through images when all I care about is docs.)

&fq=meta_type:appDocuments

On the other hand, I have other applications I will be supporting, and even though their data is mostly similar, I will build different collections and run them on different Solr instances, simply because I need to keep each application (and its back end) separate and distinct for purposes of support, updates, and disaster recovery.

For what it's worth anyway... hope it helps...

On Tue, Apr 5, 2016 at 9:29 AM, Davis, Daniel (NIH/NLM) [C] <daniel.da...@nih.gov> wrote:

> You have choices:
> - Use a separate collection for each data import
> - Use the same collection for each data import, differentiating them
>   using a field you can query
>
> The choice depends on the objects and how they will be used, and I trust
> others on this list to have better advice on how to choose.
>
> -----Original Message-----
> From: Yangrui Guo [mailto:guoyang...@gmail.com]
> Sent: Tuesday, April 05, 2016 11:27 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple data-config.xml in one collection?
>
> Hi thanks for the answer. Yes I will be using DIH to import data from
> different database connections. Do I have to create a collection for each
> connection?
>
> On Tuesday, April 5, 2016, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/5/2016 8:12 AM, Yangrui Guo wrote:
> > > I'm using Solr Cloud to index a number of databases. The problem is
> > > there is unknown number of databases and each database has its own
> > > configuration.
> > > If I create a single collection for every database the query would
> > > eventually become insanely long. Is it possible to upload different
> > > config to zookeeper for each node in a single collection?
> >
> > Every shard replica (core) in a collection shares the same
> > configuration, which it gets from zookeeper. This is one of
> > SolrCloud's guarantees, to prevent problems found with old-style
> > sharding when the configuration is different on each machine.
> >
> > If you're using the dataimport handler, which you probably are since
> > you mentioned databases, you can parameterize pretty much everything
> > in the DIH config file so it comes from URL parameters on the
> > full-import or delta-import command.
> >
> > Below is a link to the DIH config that I'm using, redacted slightly.
> > I'm not running SolrCloud, but the same thing should work in cloud.
> > It should give you some idea of how to use variables in your config,
> > set by parameters on the URL.
> >
> > http://apaste.info/jtq
> >
> > Thanks,
> > Shawn
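
P.S. In case it makes the meta_type filter concrete: below is a minimal sketch of how a request restricted to one type could be put together. It is not the SolrJ code from my application, just a small Python illustration. The host, the collection name "appContent", the sample query, and the helper name build_select_url are all assumptions made up for the example; only the meta_type field and its values come from what I described in my reply.

```python
from urllib.parse import urlencode


def build_select_url(base_url, collection, query, doc_type=None):
    """Build a Solr /select URL, optionally restricted to one meta_type.

    base_url and collection are hypothetical placeholders; meta_type is
    the discriminator field stored on every document in the collection.
    """
    params = [("q", query)]
    if doc_type is not None:
        # fq is a filter query: it narrows the result set to one type
        # without changing how the main query is scored.
        params.append(("fq", f"meta_type:{doc_type}"))
    return f"{base_url}/{collection}/select?{urlencode(params)}"


# Search documents only, skipping images (all values are illustrative).
docs_only = build_select_url(
    "http://localhost:8983/solr", "appContent", "title:invoice", "appDocuments"
)
print(docs_only)
```

Leaving doc_type unset searches both types at once, which is the combined query that keeping everything in one collection was meant to make easy.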