[ https://issues.apache.org/jira/browse/SOLR-15146?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17281460#comment-17281460 ]

Ilan Ginzburg commented on SOLR-15146:
--------------------------------------

I've hacked together a PoC that distributes only the Collection creation API call 
(taking shortcuts, but implementing the happy path reasonably) to get an idea of 
the implementation effort and to collect a few numbers.


The branch is at 
[github.com/murblanc/lucene-solr/tree/Distributing_Collection_API_PoC|https://github.com/murblanc/lucene-solr/tree/Distributing_Collection_API_PoC] 
and is based on the code from [PR 2285|https://github.com/apache/lucene-solr/pull/2285] from SOLR-14928.

Here are a few timing values from runs on my laptop (3-node cluster). I ran each 
test twice and kept the set of values with the lowest average.
Don't take these numbers too literally when they're close, as they can go either 
way (the same tests gave slightly different values in my comment on SOLR-14928, 
for example), but major differences do show that certain strategies are a better 
fit for the use case. Times in ms.
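
The selection rule above (run each test twice, report the run with the lowest average, along with that run's min and max) can be sketched as follows. This is an illustration only: {{RunStats}} and {{pickLowestAverage}} are hypothetical names, not code from the PoC branch, and the latencies in {{main}} are made up.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of the "keep the run with the lowest average" rule.
public class RunSelection {

    /** Summary of one test run: average, min and max latency in ms. */
    record RunStats(double avg, long min, long max) {
        static RunStats of(long[] latenciesMs) {
            if (latenciesMs.length == 0) {
                throw new IllegalArgumentException("a run needs at least one sample");
            }
            long min = Arrays.stream(latenciesMs).min().getAsLong();
            long max = Arrays.stream(latenciesMs).max().getAsLong();
            double avg = Arrays.stream(latenciesMs).average().getAsDouble();
            return new RunStats(avg, min, max);
        }
    }

    /** Returns the run with the lowest average; min/max come from that same run. */
    static RunStats pickLowestAverage(List<long[]> runs) {
        RunStats best = null;
        for (long[] run : runs) {
            RunStats stats = RunStats.of(run);
            if (best == null || stats.avg() < best.avg()) {
                best = stats;
            }
        }
        return best; // null if no runs were given
    }

    public static void main(String[] args) {
        // Made-up latencies in ms, for illustration only.
        long[] firstRun = {120, 180, 150};
        long[] secondRun = {110, 130, 160};
        System.out.println(pickLowestAverage(List.of(firstRun, secondRun)));
    }
}
```

Note that min and max are not picked independently across runs: they stay tied to the run whose average was kept, which matches reporting "the set of values" from one run.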

*Create 100 collections (10 concurrent threads, 10 collections each), each with 
2 shards of 2 replicas:*

||Cluster state updates||Collection API||Replica state||Avg (ms)||Min (ms)||Max (ms)||
|Overseer|Overseer|json|*11728*|8307|15391|
|Overseer|Overseer|*PerReplicaState*|*11718*|5615|14565|
|*Distributed*|Overseer|json|*7880*|6298|10986|
|*Distributed*|Overseer|*PerReplicaState*|*7768*|6902|8939|
|*Distributed*|*Distributed*|json|*8322*|6443|12285|
|*Distributed*|*Distributed*|*PerReplicaState*|*8702*|6831|13803|
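
The load shape of this test (10 threads, each creating 10 collections of 2 shards x 2 replicas one after the other) can be sketched with a fixed thread pool. This assumes the reported avg/min/max are per-creation latencies, which the comment does not state explicitly, and the {{CollectionCreator}} interface is hypothetical: in a real run it would wrap the Collection API CREATE call, while {{main}} plugs in a stub that only sleeps.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Hypothetical harness: threads x perThread collection creations, timed per call.
public class ConcurrentCreateBench {

    /** Stand-in for a real "create collection" call; hypothetical. */
    interface CollectionCreator {
        void create(String name, int numShards, int replicationFactor) throws Exception;
    }

    /** Runs threads x perThread creations and returns each call's latency in ms. */
    static List<Long> run(CollectionCreator creator, int threads, int perThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                final int threadId = t;
                futures.add(pool.submit(() -> {
                    List<Long> latencies = new ArrayList<>();
                    // Each thread creates its collections sequentially.
                    for (int i = 0; i < perThread; i++) {
                        String name = "coll_" + threadId + "_" + i;
                        long start = System.nanoTime();
                        creator.create(name, 2, 2); // 2 shards, 2 replicas
                        latencies.add(TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
                    }
                    return latencies;
                }));
            }
            List<Long> all = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                all.addAll(f.get()); // rethrows any creation failure
            }
            return all;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Stub that just sleeps, standing in for a call to a real cluster.
        CollectionCreator stub = (name, shards, replicas) -> Thread.sleep(5);
        List<Long> latencies = run(stub, 10, 10);
        System.out.println(latencies.size() + " creations timed");
    }
}
```

The second test below uses the same shape with {{run(creator, 50, 1)}}: all 50 creations are issued at once, which puts more concurrent pressure on whichever component serializes the work.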

*Create 50 collections with 50 concurrent threads (1 collection each), each with 
2 shards of 2 replicas:*

||Cluster state updates||Collection API||Replica state||Avg (ms)||Min (ms)||Max (ms)||
|Overseer|Overseer|json|*45315*|40708|50431|
|Overseer|Overseer|*PerReplicaState*|*46174*|43431|50025|
|*Distributed*|Overseer|json|*22365*|20591|23708|
|*Distributed*|Overseer|*PerReplicaState*|*22525*|18067|24049|
|*Distributed*|*Distributed*|json|*18421*|16670|18968|
|*Distributed*|*Distributed*|*PerReplicaState*|*18342*|16137|18912|
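
To put the averages side by side, the ratio of the all-Overseer average to the fully distributed average can be computed from the json replica state rows of the two tables. This is plain arithmetic on the numbers above, not a new measurement; {{speedup}} is just an illustrative helper.

```java
// Ratio of a baseline average latency to a candidate average latency (both in ms),
// using the json replica state rows of the two tables.
public class SpeedupRatios {

    static double speedup(double baselineAvgMs, double candidateAvgMs) {
        return baselineAvgMs / candidateAvgMs;
    }

    public static void main(String[] args) {
        // All-Overseer vs distributed state + distributed Collection API.
        System.out.printf("10 threads: %.2fx%n", speedup(11728, 8322));
        System.out.printf("50 threads: %.2fx%n", speedup(45315, 18421));
    }
}
```

With these inputs it prints 1.41x for the 10-thread test and 2.46x for the 50-thread test.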

> Distribute Collection API command execution
> -------------------------------------------
>
>                 Key: SOLR-15146
>                 URL: https://issues.apache.org/jira/browse/SOLR-15146
>             Project: Solr
>          Issue Type: Sub-task
>      Security Level: Public(Default Security Level. Issues are Public) 
>          Components: SolrCloud
>    Affects Versions: master (9.0)
>            Reporter: Ilan Ginzburg
>            Assignee: Ilan Ginzburg
>            Priority: Major
>              Labels: collection-api, overseer
>
> Building on the distributed cluster state update changes (SOLR-14928), this 
> ticket will distribute the Collection API so that commands can execute on any 
> node (i.e. the node handling the request through {{CollectionsHandler}}) 
> without having to go through a ZooKeeper queue and the Overseer.
> This is the second step (the first was SOLR-14928) toward being able to remove 
> the Overseer. The code keeps the existing execution options, so completing this 
> ticket does not mean the Overseer is gone, but it could be removed in a future release.
> There is a dependency on the distributed cluster state changes because the 
> Overseer locking that prevents Collection API commands targeting the same 
> collection (or the same shard) from executing concurrently will be replaced by 
> optimistic locking of the collection {{state.json}} znodes (or of other znodes 
> that will eventually replace/augment {{state.json}}).
> The goal of this ticket is threefold:
> * Simplify the code (running synchronously without going through the 
> ZooKeeper queues and the Overseer dequeue logic is much simpler),
> * Lead to improved performance for most or all use cases (a secondary goal: 
> the requirement is only that performance does not degrade) and
> * Allow a future change (in a separate Jira) to the way cluster state is 
> cached on the nodes of the cluster (keep less information, depend less on 
> ZooKeeper watches, ignore collections not present on the node). 
> This future work will aim to significantly increase the scale (number of 
> collections) supported by SolrCloud.
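
The optimistic locking mentioned in the description follows a read / modify / conditional-write pattern. ZooKeeper supports this through the version argument of {{setData}}, which fails with {{BadVersionException}} if the znode changed since it was read. The sketch below mimics that with an in-memory versioned node so it runs without a ZooKeeper server; {{VersionedNode}} and {{updateWithRetry}} are hypothetical names, and whether a real Collection API command retries or fails on a conflict is not specified in the ticket (this sketch retries).

```java
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// In-memory stand-in for a versioned znode such as a collection's state.json.
public class OptimisticStateUpdate {

    record Versioned(String data, int version) {}

    static final class VersionedNode {
        private final AtomicReference<Versioned> current;

        VersionedNode(String initial) {
            current = new AtomicReference<>(new Versioned(initial, 0));
        }

        Versioned read() {
            return current.get();
        }

        /** Writes only if no one wrote since expectedVersion was read. */
        boolean compareAndSet(int expectedVersion, String newData) {
            Versioned seen = current.get();
            if (seen.version() != expectedVersion) {
                return false; // analogous to ZooKeeper's BadVersionException
            }
            // Fails too if another writer replaced the value after the get() above.
            return current.compareAndSet(seen, new Versioned(newData, expectedVersion + 1));
        }
    }

    /** Reads, applies the change, writes conditionally; re-reads and retries on conflict. */
    static Versioned updateWithRetry(VersionedNode node, UnaryOperator<String> change,
                                     int maxAttempts) {
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            Versioned snapshot = node.read();
            String updated = change.apply(snapshot.data());
            if (node.compareAndSet(snapshot.version(), updated)) {
                return new Versioned(updated, snapshot.version() + 1);
            }
        }
        throw new IllegalStateException("too many concurrent modifications");
    }

    public static void main(String[] args) throws Exception {
        VersionedNode stateJson = new VersionedNode("0");
        Thread[] workers = new Thread[4];
        for (int t = 0; t < workers.length; t++) {
            workers[t] = new Thread(() -> {
                for (int i = 0; i < 250; i++) {
                    updateWithRetry(stateJson,
                            s -> String.valueOf(Integer.parseInt(s) + 1), 10_000);
                }
            });
            workers[t].start();
        }
        for (Thread w : workers) {
            w.join();
        }
        // Conflicting writers retry instead of overwriting each other,
        // so all 4 x 250 increments are applied.
        System.out.println(stateJson.read());
    }
}
```

No lock is held while a command computes its change; a conflicting concurrent writer is detected only at write time, which is what lets same-collection commands run on different nodes without going through the Overseer.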



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
