[jira] [Comment Edited] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

Ilan Ginzburg (Jira) Mon, 03 Aug 2020 07:29:04 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170070#comment-17170070
 ]


Ilan Ginzburg edited comment on SOLR-14613 at 8/3/20, 2:28 PM:
---------------------------------------------------------------

This is feedback on [~noble.paul]'s [PR for alternate proposal using the 
cluster APIs|https://github.com/apache/lucene-solr/pull/1714/].
 I'm putting it here because I find it hard to follow global discussions on 
very fragmented comments on files in a PR.

As a general note, I believe sample code of how these interfaces are to be used 
would be very useful (more so than how the interfaces can be implemented). Not 
only it guides the reader of that code in understanding the structure, it also 
helps the designer of that code understand which abstractions work etc. Even if 
the sample can't be run at this early stage, it's still a useful tool.
 I didn't really review the implementation changes for existing classes as I 
think these should come once we agree on the interfaces (otherwise it's throw 
away work).

Would be happy to get a bit more context on the value this proposal brings 
compared to the ongoing effort in [PR 
1684|https://github.com/apache/lucene-solr/pull/1684] because I don't see it.

Here are my comments:

{{+*AssignContext*+}}:

Getting all the information needed for plugin operation is not going to be as 
efficient as it could because {{getCluster()}} with the hints is going to have 
to contact nodes to fetch relevant information, then {{fetchMetrics()}} might 
have to contact the same nodes again to fetch metrics. A system in which 
everything can be fetched by a minimal number of round trips to remote nodes 
(i.e. one per node) is preferable. We could mitigate by caching, but there 
would still be an underlying inefficiency.

Also are missing the fetching of system properties for example. If these have 
yet another method, that means a single node might be contacted three times to 
fetch all required information.

It seems unrealistic in {{getCluster()}} to provide very general hints such as 
{{PrefetchHint.COLLECTION_STATE}}. A placement implementation might care about 
the detailed collection state of one or two collections but not about the 
states of all collections in the cluster (might not care at all about other 
collections).

{{+*SimpleMap*+}}:

{{SimpleMap}} and {{LinkedSimpleHashMap}} use {{CharSequence}} as keys, this is 
inappropriate as +explicitly+ stated by the Javadoc for 
{{[CharSequence|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/CharSequence.html]}}:
{quote}This interface does not refine the general contracts of the equals and 
hashCode methods. The result of testing two objects that implement CharSequence 
for equality is therefore, in general, undefined. Each object may be 
implemented by a different class, and there is no guarantee that each class 
will be capable of testing its instances for equality with those of the other. 
*It is therefore inappropriate to use arbitrary CharSequence instances as 
elements in a set or as keys in a map.*
{quote}
Also, unclear how the statement in the {{SimpleMap}} Javadoc _"It is designed 
to support large datasets without consuming lot of memory"_ is backed by 
reality. {{SimpleMap}} is a {{LinkedHashMap}} (in {{LinkedSimpleHashMap}} but 
this class is never used) or (in anonymous subclasses in {{ClusterState}}, 
{{Slice}} and {{DocCollection}}) delegates to other types of maps defined 
elsewhere (mostly/only {{HashMap}} or {{LinkedHashMap}}). It is actually less 
efficient (by a tiny amount) than directly using these other types directly.

{{SimpleMap}} does not implement {{Map}} and this makes it harder for new 
developers to approach (and its name is misleading). {{SimpleMap}} offers 
methods for iterating over all elements but these can be implemented as 
separate utility methods applicable to any map or collection, and existing 
methods such as {{Map.forEach()}} can also be used directly where needed.

I therefore suggest to not use {{SimpleMap}}.

{{+*NodeMetrics*+}}

This interface actually exposes a large part of the implementation. I would 
prefer the interfaces to hide implementation details. Also, I'd rather we move 
away from non typed methods that need to be cast all over the place and lead to 
a frustrating dev experience. We can hide both implementation and weakly typed 
implementations (if that's needed at some point - usually it is for 
communicating over the wire) in the hidden implementation classes and provide 
plugins with simpler and easier to consume interfaces.

{{+*AssignStrategy*+}}

The plugin is asked to return a {{WorkOrderResult}} that contains the 
{{WorkDecision}}'s as well as the final {{SolrCluster}} state. I believe it's 
not the responsibility of the plugin to return the final state (and would argue 
there might not be real use for that state in an ambitious implementation where 
{{WorkDecision}}'s from multiple placement computations can be combined to 
create the latest current known/expected cluster state for subsequent placement 
computations).

{{+*SolrCluster*+}}

{{collections()}} would return a map (currently {{SimpleMap}}) containing 
potentially hundreds of thousands collections. This map of collections will 
have to be built and will likely not be needed because each plugin computation 
will need a limited number of (solr) collection detailed info.

{{overseerNode()}} should likely return a {{SolrNode}} instance, like 
{{nodes()}} do.

{{+*Config*+}}

{{resources()}} likely better named {{getFileNames()}}.

{{resource()}} signature is wrong if a stream is to be returned to caller.

{{+*SolrCollection*+}}

A collection has a name so should likely have a {{getName()}} method.
 A collection has properties and should likely provide a way to retrieve these 
properties (for example {{withCollection}}).

{{+*ShardReplica*+}}

{{Replica}} would likely be a better and simpler name (otherwise shall we calls 
shards {{CollectionShard}}?).
 Exposing the internal {{org.apache.solr.common.cloud.Replica.Type}} class is 
not a good idea IMO.

{{+*WorkOrder*+}}

Rather than having a notion of {{WorkOrder}} {{Type}}, the code might be easier 
to read and understand and use if subclasses (or subinterfaces) of 
{{WorkOrder}} are used instead.

{{+*SolrNode*+}}

Why do placement code need a URL to the node? Are we planning to allow plugin 
code to go do whatever they want to do or are we targeting a controlled (and 
simpler) environment? I prefer the second option, I'm afraid things will 
quickly become unmanageable with the first option.


was (Author: murblanc):
This is feedback on [~noble.paul]'s [PR for alternate proposal using the 
cluster APIs|https://github.com/apache/lucene-solr/pull/1714/].
 I'm putting it here because I find it hard to follow global discussions on 
very fragmented comments on files in a PR.

As a general note, I believe sample code of how these interfaces are to be used 
would be very useful (more so than how the interfaces can be implemented). Note 
only it guides the reader of that code in understanding the structure, it also 
helps the designer of that code understand which abstractions work etc. Even if 
the sample can't be run at this early stage, it's still a useful tool.
 I didn't really review the implementation changes for existing classes as I 
think these should come once we agree on the interfaces (otherwise it's throw 
away work).

Would be happy to get a bit more context on the value this proposal brings 
compared to the ongoing effort in [PR 
1684|https://github.com/apache/lucene-solr/pull/1684] because I don't see it.

Here are my comments:

{{+*AssignContext*+}}:

Getting all the information needed for plugin operation is not going to be as 
efficient as it could because {{getCluster()}} with the hints is going to have 
to contact nodes to fetch relevant information, then {{fetchMetrics()}} might 
have to contact the same nodes again to fetch metrics. A system in which 
everything can be fetched by a minimal number of round trips to remote nodes 
(i.e. one per node) is preferable. We could mitigate by caching, but there 
would still be an underlying inefficiency.

Also are missing the fetching of system properties for example. If these have 
yet another method, that means a single node might be contacted three times to 
fetch all required information.

It seems unrealistic in {{getCluster()}} to provide very general hints such as 
{{PrefetchHint.COLLECTION_STATE}}. A placement implementation might care about 
the detailed collection state of one or two collections but not about the 
states of all collections in the cluster (might not care at all about other 
collections).

{{+*SimpleMap*+}}:

{{SimpleMap}} and {{LinkedSimpleHashMap}} use {{CharSequence}} as keys, this is 
inappropriate as +explicitly+ stated by the Javadoc for 
{{[CharSequence|https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/CharSequence.html]}}:
{quote}This interface does not refine the general contracts of the equals and 
hashCode methods. The result of testing two objects that implement CharSequence 
for equality is therefore, in general, undefined. Each object may be 
implemented by a different class, and there is no guarantee that each class 
will be capable of testing its instances for equality with those of the other. 
*It is therefore inappropriate to use arbitrary CharSequence instances as 
elements in a set or as keys in a map.*
{quote}
Also, unclear how the statement in the {{SimpleMap}} Javadoc _"It is designed 
to support large datasets without consuming lot of memory"_ is backed by 
reality. {{SimpleMap}} is a {{LinkedHashMap}} (in {{LinkedSimpleHashMap}} but 
this class is never used) or (in anonymous subclasses in {{ClusterState}}, 
{{Slice}} and {{DocCollection}}) delegates to other types of maps defined 
elsewhere (mostly/only {{HashMap}} or {{LinkedHashMap}}). It is actually less 
efficient (by a tiny amount) than directly using these other types directly.

{{SimpleMap}} does not implement {{Map}} and this makes it harder for new 
developers to approach (and its name is misleading). {{SimpleMap}} offers 
methods for iterating over all elements but these can be implemented as 
separate utility methods applicable to any map or collection, and existing 
methods such as {{Map.forEach()}} can also be used directly where needed.

I therefore suggest to not use {{SimpleMap}}.

{{+*NodeMetrics*+}}

This interface actually exposes a large part of the implementation. I would 
prefer the interfaces to hide implementation details. Also, I'd rather we move 
away from non typed methods that need to be cast all over the place and lead to 
a frustrating dev experience. We can hide both implementation and weakly typed 
implementations (if that's needed at some point - usually it is for 
communicating over the wire) in the hidden implementation classes and provide 
plugins with simpler and easier to consume interfaces.

{{+*AssignStrategy*+}}

The plugin is asked to return a {{WorkOrderResult}} that contains the 
{{WorkDecision}}'s as well as the final {{SolrCluster}} state. I believe it's 
not the responsibility of the plugin to return the final state (and would argue 
there might not be real use for that state in an ambitious implementation where 
{{WorkDecision}}'s from multiple placement computations can be combined to 
create the latest current known/expected cluster state for subsequent placement 
computations).

{{+*SolrCluster*+}}

{{collections()}} would return a map (currently {{SimpleMap}}) containing 
potentially hundreds of thousands collections. This map of collections will 
have to be built and will likely not be needed because each plugin computation 
will need a limited number of (solr) collection detailed info.

{{overseerNode()}} should likely return a {{SolrNode}} instance, like 
{{nodes()}} do.

{{+*Config*+}}

{{resources()}} likely better named {{getFileNames()}}.

{{resource()}} signature is wrong if a stream is to be returned to caller.

{{+*SolrCollection*+}}

A collection has a name so should likely have a {{getName()}} method.
 A collection has properties and should likely provide a way to retrieve these 
properties (for example {{withCollection}}).

{{+*ShardReplica*+}}

{{Replica}} would likely be a better and simpler name (otherwise shall we calls 
shards {{CollectionShard}}?).
 Exposing the internal {{org.apache.solr.common.cloud.Replica.Type}} class is 
not a good idea IMO.

{{+*WorkOrder*+}}

Rather than having a notion of {{WorkOrder}} {{Type}}, the code might be easier 
to read and understand and use if subclasses (or subinterfaces) of 
{{WorkOrder}} are used instead.

{{+*SolrNode*+}}

Why do placement code need a URL to the node? Are we planning to allow plugin 
code to go do whatever they want to do or are we targeting a controlled (and 
simpler) environment? I prefer the second option, I'm afraid things will 
quickly become unmanageable with the first option.

> Provide a clean API for pluggable replica assignment implementations
> --------------------------------------------------------------------
>
>                 Key: SOLR-14613
>                 URL: https://issues.apache.org/jira/browse/SOLR-14613
>             Project: Solr
>          Issue Type: Improvement
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki
>            Assignee: Ilan Ginzburg
>            Priority: Major
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Comment Edited] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

Reply via email to