[ https://issues.apache.org/jira/browse/SOLR-15055?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17258370#comment-17258370 ]
Andrzej Bialecki commented on SOLR-15055:
-----------------------------------------

Additional notes on how {{withCollection}} was implemented in 8x.

Let's first establish the naming:
* collection A (primary) is the one that wants the other collection to always be co-located with it, e.g. to implement faster cross-collection joins.
* collection B (secondary) is an auxiliary collection that is used by collection A (primary). In 8x this collection had to be single-sharded.

In 8x, collection A can be marked (by setting a collection property) as {{withCollection: B}}. Collection B must already exist. This constraint causes every ADDREPLICA command for collection A (including its initial creation) to also automatically invoke ADDREPLICA for collection B: if the target node for the A replica does not already host a B replica, a replica of B's only shard is placed on that same node.

This relationship in 8x was always supposed to be 1:1, i.e. a single primary collection could specify at most a single {{withCollection: B}}. A reverse relationship was also created in collection B using the {{COLOCATED_WITH}} property. This property would point to collection A and would prevent collection B from being deleted while in use by collection A.

That implementation was not ideal, for several reasons:
* additional replicas of the secondary collection B were never removed when primary replicas were deleted or moved around.
* the code would always add an NRT replica for the B collection; there was no way to request other replica types.
* AFAIK the placement could fail because the B replica placements would bypass the usual placement policy calculations (including free disk space checks).
* for the same reason the placement of the A replica could be sub-optimal, because it didn't consider the combined metrics of the A+B replicas (combined replica size, combined number of cores, etc.).
* only a 1:1 relationship was officially supported. If multiple primary collections pointed to the same B collection, the {{COLOCATED_WITH}} property in B would point only to the latest primary collection. This means that users could accidentally bypass B's deletion prevention mechanism if they deleted the latest primary collection while the other, previously defined primary collections were still in use.

> Re-implement 'withCollection' and 'maxShardsPerNode'
> ----------------------------------------------------
>
> Key: SOLR-15055
> URL: https://issues.apache.org/jira/browse/SOLR-15055
> Project: Solr
> Issue Type: Improvement
> Security Level: Public(Default Security Level. Issues are Public)
> Reporter: Andrzej Bialecki
> Assignee: Andrzej Bialecki
> Priority: Major
>
> Solr 8x replica placement provided two settings that are very useful in
> certain scenarios:
> * {{withCollection}} constraint specified that replicas should be placed on
> the same nodes where replicas of another collection are located. In the 8x
> implementation this was limited in practice to co-locating single-shard
> secondary collections used for joins or other lookups from the main
> collection (which could be multi-sharded).
> * {{maxShardsPerNode}} - this constraint specified the maximum number of
> replicas per shard that can be placed on the same node. In most scenarios
> this was set to 1 in order to ensure fault-tolerance (ie. at most 1 replica
> of any given shard would be placed on any given node). Changing this
> constraint to values > 1 would reduce fault-tolerance but may be desired in
> test setups or as a temporary relief measure.
>
> Both these constraints are collection-specific so they should be configured
> e.g. as collection properties.
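
To make the 8x {{withCollection}} behaviour from the comment above easier to follow, here is a minimal in-memory sketch in Python. It is a toy model, not Solr code: the {{ToyCluster}} class and all of its methods are hypothetical names invented for this illustration. It also assumes that deleting B is blocked only while the collection named by {{COLOCATED_WITH}} still exists, which is one plausible reading of the description rather than confirmed behaviour.

```python
from collections import defaultdict


class ToyCluster:
    """Toy model of the 8x withCollection semantics; NOT Solr's API."""

    def __init__(self):
        self.props = {}                    # collection -> property dict
        self.replicas = defaultdict(list)  # collection -> [(node, replica_type)]

    def create(self, name, with_collection=None):
        if with_collection is not None:
            if with_collection not in self.props:
                raise ValueError("secondary collection must already exist")
            # Reverse pointer holds a single value, so a later primary
            # silently overwrites an earlier one.
            self.props[with_collection]["COLOCATED_WITH"] = name
        self.props[name] = {"withCollection": with_collection}

    def nodes_of(self, name):
        return {node for node, _ in self.replicas[name]}

    def add_replica(self, name, node, replica_type="NRT"):
        secondary = self.props[name].get("withCollection")
        if secondary is not None and node not in self.nodes_of(secondary):
            # Co-location step: always NRT, no placement-policy checks.
            self.replicas[secondary].append((node, "NRT"))
        self.replicas[name].append((node, replica_type))

    def delete_replica(self, name, node):
        # Only this collection's replicas go; co-located B replicas stay.
        self.replicas[name] = [r for r in self.replicas[name] if r[0] != node]

    def delete_collection(self, name):
        owner = self.props[name].get("COLOCATED_WITH")
        if owner in self.props:  # assumption: guard checks only this pointer
            raise RuntimeError(f"{name} is in use by {owner}")
        del self.props[name]
        self.replicas.pop(name, None)


cluster = ToyCluster()
cluster.create("B")
cluster.create("A1", with_collection="B")
cluster.create("A2", with_collection="B")  # COLOCATED_WITH now names A2 only
cluster.add_replica("A1", "node1", "TLOG")  # also adds an NRT replica of B
cluster.delete_replica("A1", "node1")       # B's replica on node1 is left behind
cluster.delete_collection("A2")
cluster.delete_collection("B")              # not blocked, although A1 still names B
```

In this model the last call goes through because the guard consults only the single {{COLOCATED_WITH}} value, which mirrors the bypass described in the final bullet of the comment. Tracking a set of primaries in B instead of one name would be one way to close that gap, although the issue does not say which approach the re-implementation takes.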