Re: Solr Best Practice Configuration

Chantal Ackermann Fri, 09 Dec 2011 04:11:47 -0800

Hi Ben,

what I understand from your post is:

Advertiser (1) <-> (*) Advert
(one-to-many where there can be 50,000 per single Advertiser)

Your index entity is based on Advert which means that there can be
50,000 documents in the index that need to be changed if a field of an
Advertiser is updated in the database.

I am using multi-core setups with differently structured indexes for
these needs. This means that some more complex lookups require queries
on several cores. This has not been a problem so far. Our indexes,
however, hold rather little data (ranging from a few hundred thousand
entries to a few million, with quite a lot of short-text fields) and
are highly dynamic (rebuilt several times a day, always as full
rebuilds, never incrementally).

Moving the Advertiser data out of the Advert index means:
(1) on updates of the Advertiser fields you don't need to change the
Advert index
(2) the Advert index might be a bit smaller (if that matters)
(3) the statistics on the Advertiser data will be computed over the
Advertisers and not over the Adverts, while the statistics on the
Adverts won't contain any Advertiser data anymore.

(This list might not be complete.)

What does (3) imply?
You will not be able to facet or sort or group on Adverts using any of
the Advertiser fields (as they reside in a different index core).
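
To make the "queries on several cores" idea concrete, here is a minimal
Python sketch of a two-step lookup. It only builds the request URLs, so
it runs without a Solr server. The base URL, the core names "advertiser"
and "advert", and the field names field_a / field_c are assumptions for
illustration, not taken from a real setup.

```python
from urllib.parse import urlencode

# Assumed base URL and core names; adjust to your own installation.
SOLR_BASE = "http://localhost:8983/solr"


def advertiser_query_url(query, rows=100):
    """Step 1: find matching Advertisers in the (assumed) 'advertiser' core."""
    params = {"q": query, "fl": "id", "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/advertiser/select?{urlencode(params)}"


def advert_query_url(advert_query, advertiser_ids, rows=10):
    """Step 2: search Adverts, filtered to the Advertisers found in step 1."""
    ids = [str(i) for i in advertiser_ids]
    if not ids:
        raise ValueError("no advertiser ids to filter on")
    # Filter query on the foreign key stored in each Advert document.
    id_filter = "advertiser_id:(" + " OR ".join(ids) + ")"
    params = {"q": advert_query, "fq": id_filter, "rows": rows, "wt": "json"}
    return f"{SOLR_BASE}/advert/select?{urlencode(params)}"


# Hypothetical usage: ids 7 and 42 stand in for the result of step 1.
step1 = advertiser_query_url("field_c:berlin")
step2 = advert_query_url("field_a:bike", [7, 42])
```

The application has to run step 1, collect the ids, and then issue
step 2 itself. Very long id lists may run into the maxBooleanClauses
limit, so this suits lookups that match a moderate number of
Advertisers.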

If you need faceting or similar, then first test the performance of a
massive update or of a full index rebuild before starting to change to
multiple cores. Maybe the performance is better than you fear and no
change is required.
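
A rough way to run such a test is to time the re-sending of all Adverts
of one Advertiser in batches. The Python sketch below assumes a
send_batch callable that you supply yourself (for example a function
that posts the batch to your update handler); the batch size, the
document fields and the stand-in sender used here are hypothetical.

```python
import time


def batches(docs, size):
    """Yield consecutive slices of at most `size` documents."""
    for start in range(0, len(docs), size):
        yield docs[start:start + size]


def time_mass_update(docs, send_batch, batch_size=1000):
    """Send all docs in batches; return (batch_count, elapsed_seconds)."""
    started = time.perf_counter()
    count = 0
    for batch in batches(docs, batch_size):
        send_batch(batch)  # caller-supplied; assumed to index the batch
        count += 1
    return count, time.perf_counter() - started


# Hypothetical data: one Advertiser with 50,000 Adverts after field_c changed.
docs = [{"id": i, "advertiser_id": 1, "field_c": "new value"}
        for i in range(50_000)]

# Stand-in sender that only records the batches instead of calling Solr.
sent = []
n_batches, seconds = time_mass_update(docs, sent.append)
```

With a real sender in place of sent.append, the elapsed time (plus the
time of the final commit) tells you whether the single-index approach
is fast enough for your update rate.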

Cheers,
Chantal

On Fri, 2011-12-09 at 10:46 +0100, BenMccarthy wrote:
> Good Morning.
> 
> I have now been through the various Solr tutorials and read the SOLR 3
> Enterprise server book.  I'm now at the point of figuring out if Solr can
> help us with a scaling problem.  I'm looking for advice on the following
> scenario; any pointers or references will be great:
> 
> I have two sets of distinct data:
> 
> Advert
> Advertiser
> 
> An Advertiser has many Adverts in the db looking like
> 
> Advert {
>     id
>     field a
>     field b
>     advertiser_id
> }
> 
> Advertiser {
>     id
>     field c
>     field d
>     lat
>     long
> }
> 
> So I've followed some docs and I've created a DIH which pulls all this into
> one SOLR index.  Which is great.  The problem I'm looking at is that we have
> a massive churn on Advertiser updates and with the one index I don't think it
> will scale (correct me if I'm wrong).
> 
> Would it be possible to have two separate cores, each with its own index,
> and then when issuing queries have the results returned as they are in a
> single core setup?
> 
> I'm basically looking for some pointers telling me if I'm going in the right
> direction.  I don't want to have to update 50,000 adverts when an advertiser
> simply updated field c.  This is a problem we have with our current search.
> 
> Thanks
> Ben
> 
> 
> 
> --
> View this message in context: 
> http://lucene.472066.n3.nabble.com/Solr-Best-Practice-Configuration-tp3572492p3572492.html
> Sent from the Solr - User mailing list archive at Nabble.com.
