Re: Distributed Search in Solr with different queries per shard

Erick Erickson Thu, 22 May 2014 08:20:08 -0700

<rant>
Gotta reiterate: Do you _know_ making your cores "do extra work" is
really a problem?


I _strongly_ urge you to demonstrate (via hard measurements) that you
have a problem here before investing any time or effort in fixing it.
The development and maintenance issues will almost certainly
surprise you and consume valuable time that you could use adding value
someplace else.

This really smells like premature optimization to me. Let's claim you
put together a test harness (jMeter or similar) and discover that
making your cores "do extra work" costs 50ms per query. Nobody will
notice. They _will_ notice that the custom code put in place doesn't
work, guaranteed. And your project managers will notice the schedule
slip, also guaranteed. OTOH, if you demonstrate that you can increase
your throughput by 100x, the risk may be worth it. You simply cannot
make a rational decision without some measurements though.
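
For what it's worth, a throwaway timing loop is enough for a first
look before reaching for jMeter. This is only a sketch: run_query is
a hypothetical zero-argument callable that you would write to issue
one request against your cores, and the iteration count is arbitrary.

```python
import statistics
import time


def measure_latency(run_query, iterations=100):
    """Call run_query() repeatedly and return (median_ms, p95_ms).

    run_query is a hypothetical placeholder: any zero-argument callable
    that sends one query and waits for the response.
    """
    samples = []
    for _ in range(iterations):
        started = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - started) * 1000.0)

    samples.sort()
    # Nearest-rank style index for the 95th percentile, clamped to the list.
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return statistics.median(samples), samples[p95_index]
```

Run it once with the unified query and once with the per-core
queries, against warmed-up cores, and compare the two pairs of
numbers.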

That said, you know your problem space better than I do so feel free.
</rant>
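
If you do go the app-layer route suggested earlier in the thread, the
merge itself can start small. A minimal sketch, assuming each core
returns its docs as a list of dicts already sorted descending on the
same field, and that "id" is the unique key. The function names and
the sort field are made up for illustration; none of this is Solr
code.

```python
import heapq
from collections import Counter


def merge_results(active_docs, history_docs, sort_key, start=0, rows=10):
    """Merge two result lists that are each sorted descending on sort_key.

    Duplicate ids are dropped (first occurrence wins), then the requested
    page [start, start + rows) is returned.
    """
    merged = heapq.merge(
        active_docs, history_docs, key=lambda doc: doc[sort_key], reverse=True
    )
    seen = set()
    unique = []
    for doc in merged:
        if doc["id"] in seen:
            continue
        seen.add(doc["id"])
        unique.append(doc)
    return unique[start:start + rows]


def merge_facets(active_counts, history_counts):
    """Sum facet counts for one field across both cores."""
    return dict(Counter(active_counts) + Counter(history_counts))
```

Two caveats. For paging to come out right, each core has to be asked
for the first start + rows docs, not just one page. And relevance
scores from different cores running different queries are generally
not comparable, so sorting on a real field (a date, say) is safer
than sorting on score.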

Best,
Erick

On Wed, May 21, 2014 at 9:59 PM, Avner Levy <av...@checkpoint.com> wrote:
> I believe unifying multiple query results including facets, paging, sorts and 
> other extra features on my own in the application is complex as well.
> Is there some Solr code I can use at the application level to unify multiple 
> results? (This could actually be an interesting direction.)
> The queries were of course just an example. In real life I have 4 cores with 
> very complex queries for each so unifying all 4 may cause a significant 
> overhead on the system, especially if there are tens of such queries per 
> second.
> Thanks,
>   Avner
>
> -----Original Message-----
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Wednesday, May 21, 2014 6:13 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Distributed Search in Solr with different queries per shard
>
> I suppose you could, but I _really_ question whether it's a wise investment 
> in time. Personally I'd treat them as two different collections and have the 
> app layer fire off two queries and do the aggregation (this is a variant of 
> "federated search" I think). This removes your issue with having the cores 
> "do extra work"....
>
> Additionally, I'd really prove out that the "extra work" is actually a 
> measurable performance issue before worrying about this, it smells like 
> premature optimization.
>
> FWIW,
> Erick
>
> On Wed, May 21, 2014 at 6:56 AM, Avner Levy <av...@checkpoint.com> wrote:
>> I have 2 cores.
>> One with active data and one with historical data (for documents which were 
>> removed from the active one).
>> I want to run Distributed Search on both and get the unified result (as 
>> supported by Solr Distributed Search, I'm not using Solr Cloud).
>> My problem is that the query for each core is different.
>> Is there a way to specify a different query per core and still let Solr 
>> unify the query results?
>> For example:
>> Active data core query: select all green docs
>> History core query: select all green docs with year=2012
>> Is there a way to extend the distributed search handler to support such
>> a scenario?
>> Thanks in advance,
>>   Avner
>> * One option is to send a unified query to both, but then each core 
>> will work harder for no reason.
>>
