My own choices were driven mostly by how the data is used, from a more
architectural perspective.

I have "appDocuments" and "appImages" for one of the applications I'm
supporting.  Because they are so closely connected (one appDocument can
have any number of appImages, and an appImage can belong to more than
one appDocument) I decided to keep them together in the same
collection.  This made querying for a combination of the two
(something I need to do sometimes) more straightforward for developers.

I could have cross-referenced one type's IDs in the other and
kept them in separate collections, but that seemed unnecessarily complex
for my purposes.  I also have a fairly small number of Solr documents:
fewer than 200K appImages and appDocuments combined.

The fields they use are mostly different, however.  That's no problem
because when I'm building the Solr doc for either in my SolrJ code, I only
populate the fields that are appropriate for whichever "type" I'm indexing.
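
To illustrate what I mean, here is a rough sketch rather than my actual
code.  The field names (id, title, body, file_name, width, height) are made
up for the example, and a plain Map stands in for SolrJ's SolrInputDocument
so the snippet runs without Solr on the classpath.  In real SolrJ code you
would call addField on a SolrInputDocument where this calls put.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class TypedDocBuilder {

    // Builds the field set for an appDocuments entry.
    // Only document-specific fields are set; image fields are never touched.
    // (All field names here are hypothetical.)
    static Map<String, Object> buildAppDocument(String id, String title, String body) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("title", title);
        doc.put("body", body);
        return doc;
    }

    // Builds the field set for an appImages entry.
    // Only image-specific fields are set; document fields are never touched.
    static Map<String, Object> buildAppImage(String id, String fileName, int width, int height) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("id", id);
        doc.put("file_name", fileName);
        doc.put("width", width);
        doc.put("height", height);
        return doc;
    }

    public static void main(String[] args) {
        System.out.println(buildAppDocument("doc-1", "Quarterly report", "Full text here"));
        System.out.println(buildAppImage("img-1", "chart.png", 800, 600));
    }
}
```

Both kinds of doc can then go into the same collection; fields that a given
type never sets are simply absent from that doc.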

I also put a "meta_type" field on every doc in the collection recording
which type it is (appImages or appDocuments) so that I have a way to NOT
bother searching through images when all I care about is docs.  For
example:  &fq=meta_type:appDocuments
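
As a rough sketch of how that filter could be put together in Java (the
class and method names here are just for illustration, and only the JDK is
used so it runs on its own):

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

public class TypeFilterQuery {

    // Filter clause restricting results to one type, e.g. meta_type:appDocuments.
    // Note there is no space after the colon.
    static String filterFor(String type) {
        return "meta_type:" + type;
    }

    // URL-encoded query string with the main query in q and the type filter in fq.
    static String queryString(String userQuery, String type) {
        return "q=" + URLEncoder.encode(userQuery, StandardCharsets.UTF_8)
             + "&fq=" + URLEncoder.encode(filterFor(type), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Match everything, but only among appDocuments.
        System.out.println(queryString("*:*", "appDocuments"));
    }
}
```

With SolrJ itself, the same clause can be passed to
SolrQuery.addFilterQuery instead of building the URL by hand.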

On the other hand, I have other applications I will be supporting and even
though their data is mostly similar, I will build different collections and
run them on different Solr instances simply because I need to keep each
application (and its back-end) separate and distinct for purposes of
support, updates, and disaster recovery.

For what it's worth anyway...  hope it helps...

On Tue, Apr 5, 2016 at 9:29 AM, Davis, Daniel (NIH/NLM) [C] <
daniel.da...@nih.gov> wrote:

> You have choices:
>  - Use a separate collection for each data import
>  - Use the same collection for each data import, differentiating them
> using a field you can query
>
> The choice depends on the objects and how they will be used, and I trust
> others on this list to have better advice on how to choose.
>
> -----Original Message-----
> From: Yangrui Guo [mailto:guoyang...@gmail.com]
> Sent: Tuesday, April 05, 2016 11:27 AM
> To: solr-user@lucene.apache.org
> Subject: Re: Multiple data-config.xml in one collection?
>
> Hi thanks for the answer. Yes I will be using DIH to import data from
> different database connections. Do I have to create a collection for each
> connection?
>
> On Tuesday, April 5, 2016, Shawn Heisey <apa...@elyograg.org> wrote:
>
> > On 4/5/2016 8:12 AM, Yangrui Guo wrote:
> > > I'm using Solr Cloud to index a number of databases. The problem is
> > > there is unknown number of databases and each database has its own
> > > configuration.
> > > If I create a single collection for every database the query would
> > > eventually become insanely long. Is it possible to upload different
> > > config to zookeeper for each node in a single collection?
> >
> > Every shard replica (core) in a collection shares the same
> > configuration, which it gets from zookeeper.  This is one of
> > SolrCloud's guarantees, to prevent problems found with old-style
> > sharding when the configuration is different on each machine.
> >
> > If you're using the dataimport handler, which you probably are since
> > you mentioned databases, you can parameterize pretty much everything
> > in the DIH config file so it comes from URL parameters on the
> > full-import or delta-import command.
> >
> > Below is a link to the DIH config that I'm using, redacted slightly.
> > I'm not running SolrCloud, but the same thing should work in cloud.
> > It should give you some idea of how to use variables in your config,
> > set by parameters on the URL.
> >
> > http://apaste.info/jtq
> >
> > Thanks,
> > Shawn
> >
> >
>
