Erick:

> ...one for each cluster and just merged the docs when it got them back


This would be the logical way. I'm afraid that "just merged the docs" is the
crux here; that is what would make this an expensive task. You'd have to merge
docs, facets, highlights etc., and handle the different search phases
(ID fetch, doc fetch, potentially global IDF fetch?) and so on.
It may be that the code necessary to do the merge already exists in the
project; I haven't looked...
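To make the merge cost concrete, here is a minimal sketch of just the ranked-document part of that merge, assuming each cluster returns (docId, score) pairs already sorted by descending score. The names and sample data are hypothetical; facets, highlights and the multi-phase ID/doc fetch that Erick mentions would each need their own merge logic on top of this:

```python
import heapq

def merge_cluster_results(cluster_hits, rows=10):
    """Merge per-cluster hit lists (each sorted by descending score)
    into one globally ranked top-`rows` list.
    This is only the simplest piece of what a real shard-response
    merge does; no facet or highlight merging here."""
    # heapq.merge does a k-way merge of already-sorted inputs;
    # the key negates the score so higher scores come first.
    merged = heapq.merge(*cluster_hits, key=lambda hit: -hit[1])
    return [hit for _, hit in zip(range(rows), merged)]

# Hypothetical responses from two clusters, sorted by score desc.
cluster_a = [("a1", 9.2), ("a2", 3.1)]
cluster_b = [("b1", 7.5), ("b2", 6.0), ("b3", 1.0)]
```

Note that even this toy version assumes scores are comparable across clusters, which is exactly why the "potentially global IDF fetch" phase exists in distributed search.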

Charlie:

Yes, it should "just" work. Until someone upgrades the schema in one cloud and
not the others, of course :) And we still need to handle failure cases such as
high latency or one cluster being down…

Besides, we'll have SSL certs with client auth and probably some sort of
authn & authz in place in all clouds, and we'd of course need to make sure
that the user exists in all clusters and that cross-cluster traffic is
allowed everywhere. PKI auth is not really intended for accepting requests
from a foreign node that is not in its ZK, etc.
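For reference, the &shards construction from my original mail below is mechanically simple once you have each cluster's replica URLs (in real code you would read those from each environment's ZK cluster state): replicas of the same shard are joined with "|" as load-balanced alternatives, and shards are joined with ",". A sketch, with entirely hypothetical cluster names and URLs:

```python
def build_shards_param(clusters):
    """Build a Solr &shards= value from per-cluster shard/replica lists.
    `clusters` maps a cluster name to a list of shards, each shard
    being a list of replica core URLs. Replicas within one shard are
    joined with '|' (alternatives for load balancing), shards with ','."""
    shard_groups = []
    for shards in clusters.values():
        for replicas in shards:
            shard_groups.append("|".join(replicas))
    return ",".join(shard_groups)

# Hypothetical two clusters, one shard each, in the same collection.
clusters = {
    "dc-a": [["http://a1:8983/solr/coll_shard1_replica1",
              "http://a2:8983/solr/coll_shard1_replica2"]],
    "dc-b": [["http://b1:8983/solr/coll_shard1_replica1"]],
}
```

The resulting string grows with replica count, which is the "quite big" concern below, and it bakes in direct visibility to every Solr node in every cluster, which is exactly the isolation trade-off against the fan-out-and-merge approach.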

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com

> 31. jan. 2018 kl. 10:06 skrev Charlie Hull <char...@flax.co.uk>:
> 
> On 30/01/2018 16:09, Jan Høydahl wrote:
>> Hi,
>> A customer has 10 separate SolrCloud clusters, with same schema across all, 
>> but different content.
>> Now they want users in each location to be able to federate a search across 
>> all locations.
>> Each location is 100% independent, with separate ZK etc. Bandwidth and 
>> latency between the
>> clusters is not an issue, they are actually in the same physical datacenter.
>> Now my first thought was using a custom &shards parameter, and let the 
>> receiving node fan
>> out to all shards of all clusters. We’d need to contact the ZK for each 
>> environment and find
>> all shards and replicas participating in the collection and then construct 
>> the shards=A1|A2,B1|B2…
>> string, which would be quite big, but if we get it right, it should “just 
>> work".
>> Now, my question is whether there are other smarter ways that would leave it 
>> up to existing Solr
>> logic to select shards and load balance, that would also take into account 
>> any shard.keys/_route_
>> info etc. I thought of these
>>   * &collection=collA,collB  — but it only supports collections local to one 
>> cloud
>>   * Create a collection ALIAS to point to all 10 — but same here, only local 
>> to one cluster
>>   * Streaming expression top(merge(search(q=,zkHost=blabla))) — but we want 
>> it with pure search API
>>   * Write a custom ShardHandler plugin that knows about all clusters — but 
>> this is complex stuff :)
>>   * Write a custom SearchComponent plugin that knows about all clusters and 
>> adds the &shards= param
>> Another approach would be for the originating cluster to fan out just ONE 
>> request to each of the other
>> clusters and then write some SearchComponent to merge those responses. That 
>> would let us query
>> the other clusters using one LB IP address instead of requiring full 
>> visibility to all solr nodes
>> of all clusters, but if we don’t need that isolation, that extra merge code 
>> seems fairly complex.
>> So far I opt for the custom SearchComponent and &shards= param approach. Any 
>> useful input from
>> someone who tried a similar approach would be priceless!
> 
> Hi Jan,
> 
> We actually looked at this for the BioSolr project - a SolrCloud of 
> SolrClouds. Unfortunately the funding didn't appear for the project so we 
> didn't take it any further than some rough ideas - as you say, if you get it 
> right it should 'just work'. We had some extra complications in terms of 
> shared partial schemas...
> 
> Cheers
> 
> Charlie
>> --
>> Jan Høydahl, search solution architect
>> Cominvent AS - www.cominvent.com <http://www.cominvent.com/>
> 
> 
> -- 
> Charlie Hull
> Flax - Open Source Enterprise Search
> 
> tel/fax: +44 (0)8700 118334
> mobile:  +44 (0)7767 825828
> web: www.flax.co.uk <http://www.flax.co.uk/>
