[ 
https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258370#comment-17258370
 ] 

Andrzej Bialecki commented on SOLR-15055:
-----------------------------------------

Additional notes on how {{withCollection}} was implemented in 8x.

Let's first establish the naming:
 * collection A (primary) is the one that requires the other collection to 
always be co-located with it, e.g. to implement faster cross-collection joins.
 * collection B (secondary) is an auxiliary collection used by collection A 
(primary). In 8x this collection had to be single-sharded.

In 8x collection A could be marked (by setting a collection property) as 
{{withCollection: B}}. Collection B must already exist. This constraint causes 
every ADDREPLICA command for collection A (including those issued during its 
initial creation) to also invoke ADDREPLICA for collection B automatically, so 
that a replica of B's only shard is placed on the same node as the new A 
replica whenever that node does not already host a B replica.
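
To illustrate, the 8x behaviour described above can be sketched roughly as 
follows. This is only a toy model in Python, not the actual Solr code: the 
add_replica function, the dict-based cluster state and the order in which the 
two operations are issued are all assumptions made for illustration.

```python
from collections import defaultdict


def add_replica(cluster, with_collection, collection, node):
    """Toy model of ADDREPLICA with a withCollection constraint.

    cluster:         hypothetical state, node name -> set of collection names
                     that already have a replica on that node
    with_collection: hypothetical mapping, primary name -> secondary name
    Returns the (collection, node) operations that would be issued.
    """
    ops = []
    secondary = with_collection.get(collection)
    # Add a replica of the secondary only if the target node lacks one.
    if secondary is not None and secondary not in cluster[node]:
        ops.append((secondary, node))
        cluster[node].add(secondary)
    # Then place the requested replica of the primary (order is an assumption).
    ops.append((collection, node))
    cluster[node].add(collection)
    return ops


cluster = defaultdict(set)
cluster["node1"].add("B")          # node1 already hosts a replica of B
with_collection = {"A": "B"}       # A is marked as withCollection: B

ops_node1 = add_replica(cluster, with_collection, "A", "node1")
ops_node2 = add_replica(cluster, with_collection, "A", "node2")
```

In this sketch only the replica of A is added on node1, while node2 gets a 
replica of B in addition to the replica of A.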

This relationship in 8x was always supposed to be 1:1, i.e. a single primary 
collection could specify at most a single {{withCollection: B}}.

A reverse relationship was also created in collection B using the 
{{COLOCATED_WITH}} property. This property pointed to collection A and 
prevented collection B from being deleted while it was in use by collection A.

That implementation was not ideal, for several reasons:
* additional replicas of the secondary collection B were never removed when 
primary replicas were deleted or moved around.
* the code would always add an NRT replica for the B collection; there was no 
way to request a different replica type.
* AFAIK the placement could fail because the B replica placements bypassed 
the usual placement policy calculations (including free disk space checks).
* for the same reason the placement of the A replica could be sub-optimal 
because it didn't consider the combined metrics of A+B replicas (combined 
replica size, combined number of cores, etc).
* only a 1:1 relationship was officially supported - if multiple primary 
collections pointed to the same B collection, the {{COLOCATED_WITH}} property 
in B would point only to the latest primary collection. This means that users 
could accidentally bypass B's deletion prevention mechanism by deleting the 
latest primary collection while the other, previously defined primary 
collections were still in use.
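
The last point can be reproduced with a small toy model. Again this is a 
hedged Python sketch and not Solr code: only the property names 
withCollection and COLOCATED_WITH come from the description above, while the 
link and delete_collection helpers, the props dict and the exact deletion 
check (refuse only while the collection named in COLOCATED_WITH still exists) 
are assumptions.

```python
class DeletionBlocked(Exception):
    """Raised when a secondary collection is still marked as in use."""


def link(props, primary, secondary):
    # Forward link stored on the primary collection.
    props.setdefault(primary, {})["withCollection"] = secondary
    # Reverse link on the secondary holds a single value, so each new
    # primary silently overwrites the previous one.
    props.setdefault(secondary, {})["COLOCATED_WITH"] = primary


def delete_collection(props, name):
    # Assumption: deletion is refused only while the collection named in
    # COLOCATED_WITH still exists.
    owner = props.get(name, {}).get("COLOCATED_WITH")
    if owner is not None and owner in props:
        raise DeletionBlocked(f"{name} is in use by {owner}")
    props.pop(name, None)


props = {}
link(props, "A1", "B")
link(props, "A2", "B")           # B's COLOCATED_WITH now points only to A2
delete_collection(props, "A2")   # remove the latest primary
delete_collection(props, "B")    # not blocked, although A1 still uses B
```

After these calls B is gone while A1 still declares withCollection: B, which 
is the accidental bypass described in the last bullet.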

> Re-implement 'withCollection' and 'maxShardsPerNode'
> ----------------------------------------------------
>
>                 Key: SOLR-15055
>                 URL: https://issues.apache.org/jira/browse/SOLR-15055
>             Project: Solr
>          Issue Type: Improvement
>      Security Level: Public(Default Security Level. Issues are Public) 
>            Reporter: Andrzej Bialecki
>            Assignee: Andrzej Bialecki
>            Priority: Major
>
> Solr 8x replica placement provided two settings that are very useful in 
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on 
> the same nodes where replicas of another collection are located. In the 8x 
> implementation this was limited in practice to co-locating single-shard 
> secondary collections used for joins or other lookups from the main 
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of 
> replicas per shard that can be placed on the same node. In most scenarios 
> this was set to 1 in order to ensure fault-tolerance (i.e. at most 1 replica 
> of any given shard would be placed on any given node). Changing this 
> constraint to values > 1 would reduce fault-tolerance but may be desired in 
> test setups or as a temporary relief measure.
>  
> Both these constraints are collection-specific so they should be configured 
> e.g. as collection properties.


