[jira] [Commented] (SOLR-14613) Provide a clean API for pluggable replica assignment implementations

Noble Paul (Jira) Mon, 03 Aug 2020 08:24:03 -0700


    [ 
https://issues.apache.org/jira/browse/SOLR-14613?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17170108#comment-17170108
 ]


Noble Paul commented on SOLR-14613:
-----------------------------------

{quote}Would be happy to get a bit more context on the value this proposal 
brings compared to the ongoing effort in PR 1684 because I don't see it.
{quote}
The proposal is to provide a consistent view of the Solr cluster across the 
codebase as plain, simple interfaces. It is not intended to serve only the 
assign framework. I have worked on the Solr codebase for a very long time, 
and this is one thing that is sorely missing.
{quote}Also, unclear how the statement in the SimpleMap Javadoc "It is designed 
to support large datasets without consuming lot of memory" is backed by 
reality. SimpleMap is a LinkedHashMap
{quote}
The purpose of SimpleMap is to replace all of NamedList, SimpleOrderedMap and 
Map used in Solr. This will also be implemented by POJOs in the future. The 
LinkedSimpleHashMap is provided as an example of how one can be 
implemented.
{quote}SimpleMap does not implement Map and this makes it harder for new 
developers to approach (and its name is misleading).
{quote}
Yes, it was done on purpose. The idea is to have an interface that is efficient 
in both memory and performance. Map is a very bad interface to use generically: 
it has too many methods and is hard to implement as a generic Map-like 
interface. New developers do not have to create implementations of SimpleMap. 
They'll use one of the readily provided implementations like 
LinkedSimpleHashMap (which is nothing but a LinkedHashMap).

You are seeing SimpleMap as a HashMap. I'm looking at a potential impl which 
may be working off of a byte[], a POJO, a file, etc. Trust me, I have tried 
implementing a Map from these and it was terrible.
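
To illustrate why a narrow interface is easier to back with storage that is 
not a hash table, here is a minimal sketch. {{ReadOnlyLookup}} and 
{{ArrayBackedLookup}} are hypothetical names invented for this example; they 
are not the SimpleMap interface from the patch, whose methods may differ.

```java
import java.util.function.BiConsumer;

/** Hypothetical narrow, read-only, map-like interface (NOT the actual SimpleMap API). */
interface ReadOnlyLookup<T> {
  /** Returns the value for the key, or null if the key is absent. */
  T get(String key);

  /** Visits every entry in storage order. */
  void forEachEntry(BiConsumer<String, ? super T> consumer);

  int size();
}

/** Backed by two parallel arrays instead of a hash table; only three methods to write. */
final class ArrayBackedLookup<T> implements ReadOnlyLookup<T> {
  private final String[] keys;
  private final T[] values;

  ArrayBackedLookup(String[] keys, T[] values) {
    if (keys.length != values.length) {
      throw new IllegalArgumentException("keys and values must have the same length");
    }
    // Defensive copies so later changes to the caller's arrays are not visible here.
    this.keys = keys.clone();
    this.values = values.clone();
  }

  @Override
  public T get(String key) {
    // Linear scan: acceptable for small entry counts, no hashing or boxing of entries.
    for (int i = 0; i < keys.length; i++) {
      if (keys[i].equals(key)) {
        return values[i];
      }
    }
    return null;
  }

  @Override
  public void forEachEntry(BiConsumer<String, ? super T> consumer) {
    for (int i = 0; i < keys.length; i++) {
      consumer.accept(keys[i], values[i]);
    }
  }

  @Override
  public int size() {
    return keys.length;
  }
}
```

By contrast, a {{java.util.Map}} implementation over the same arrays would also 
have to supply mutators and the {{entrySet()}}, {{keySet()}} and {{values()}} 
views, or explicitly reject them.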
{quote}This interface does not refine the general contracts of the equals and 
hashCode methods.
{quote}
This framework will never have to worry about CharSequence keys. It will 
always work with Strings. There is a small subset of cases where the keys may 
never be deserialized to String objects (that is for very efficient 
deserialization and serialization). We will never have to worry about it.
{quote}I believe it's not the responsibility of the plugin to return the final 
state
{quote}
It's possible to implement it in other ways. Any plugin that implements 
AssignStrategy will have a memory model of the SolrCluster after the placement 
is computed. What we need is for the next computation to start from the 
previous state. A very simple, naive impl can easily return the original 
SolrCluster object. From my past experience, it is pretty easy to give a 
synthetic SolrCluster that represents the new state. Maybe we can provide a 
utility method to recreate this SolrCluster from the decisions, but that is 
sub-optimal. Any serious implementation MUST provide a final state, because 
executing the decisions takes time and other computations usually need to 
start immediately.
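
A minimal sketch of the idea, under loud assumptions: {{ClusterSnapshot}}, 
{{PlacementResult}} and {{LeastLoadedPlacement}} are hypothetical names made up 
for this example, and the cluster is reduced to a replica count per node. It 
only shows a computation returning its decisions together with the projected 
state, so the next computation can start before the decisions are executed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical immutable cluster view: node name -> number of replicas hosted. */
final class ClusterSnapshot {
  private final Map<String, Integer> replicasPerNode;

  ClusterSnapshot(Map<String, Integer> replicasPerNode) {
    this.replicasPerNode = Collections.unmodifiableMap(new HashMap<>(replicasPerNode));
  }

  Map<String, Integer> replicasPerNode() {
    return replicasPerNode;
  }
}

/** Hypothetical result: the decisions plus the state the cluster will have once they run. */
final class PlacementResult {
  final List<String> targetNodes;
  final ClusterSnapshot projected;

  PlacementResult(List<String> targetNodes, ClusterSnapshot projected) {
    this.targetNodes = Collections.unmodifiableList(new ArrayList<>(targetNodes));
    this.projected = projected;
  }
}

/** Naive strategy: put each new replica on the node that currently hosts the fewest. */
final class LeastLoadedPlacement {
  static PlacementResult place(ClusterSnapshot current, int replicasToAdd) {
    // Work on a private copy; the input snapshot is never modified.
    Map<String, Integer> load = new HashMap<>(current.replicasPerNode());
    List<String> targets = new ArrayList<>();
    for (int i = 0; i < replicasToAdd; i++) {
      String best = null;
      for (Map.Entry<String, Integer> e : load.entrySet()) {
        if (best == null) {
          best = e.getKey();
          continue;
        }
        int cmp = Integer.compare(e.getValue(), load.get(best));
        // Ties are broken by node name so the result is deterministic.
        if (cmp < 0 || (cmp == 0 && e.getKey().compareTo(best) < 0)) {
          best = e.getKey();
        }
      }
      if (best == null) {
        throw new IllegalStateException("no nodes available");
      }
      targets.add(best);
      load.merge(best, 1, Integer::sum);
    }
    return new PlacementResult(targets, new ClusterSnapshot(load));
  }
}
```

A second call such as {{LeastLoadedPlacement.place(first.projected, 1)}} then 
sees the replicas placed by the first call even though nothing has been 
executed yet.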

bq. Why do placement code need a URL to the node? Are we planning to allow 
plugin code to go do whatever they want to do or are we targeting a controlled 
(and simpler) environment?

You are missing the larger picture. This API is to be used across the Solr 
codebase. The implementations of these interfaces will most likely live in the 
{{org.apache.solr.common.cloud}} packages and you will just consume them in the 
placement. The placement framework will never need the URL, nor should it use it.
{quote}A collection has a name so should likely have a getName() method.
{quote}
Thanks. I just added it. It was supposed to be there.
{quote}Replica would likely be a better and simpler name
{quote}
Yes, Replica was my preferred name. It was already taken, and reusing it would 
most likely be a problem: we would see both all over the codebase, and people 
reading the code would be confused as to which one it is.
{quote}NodeMetrics: This interface actually exposes a large part of the 
implementation
{quote}
Yes. There is nothing wrong with concrete impls; not everything has to be an 
interface. Let's keep in mind that there are 2 types of interfaces
 * The ones plugins implement. They should be purely interfaces
 * The ones implemented by Solr and handed over to the plugin. It does not 
matter if they are interfaces/classes/enums. What matters is that the API 
surface area is minimal and easily understood. We should not dogmatically go 
with "only interfaces"

{quote}WorkOrder:Rather than having a notion of WorkOrder Type,
{quote}
From implementing the previous framework, I expect there will only be a limited 
number of WorkOrder types. I see an enum as more suitable (I'm ambivalent on 
this, we can go either way)
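
For illustration only, an enum-based type could look like the sketch below. 
The constant names and the {{WorkOrderDescriber}} helper are hypothetical and 
not taken from the patch.

```java
/** Hypothetical work order types; the actual set in the patch may differ. */
enum WorkOrderType {
  ADD_REPLICA,
  MOVE_REPLICA,
  DELETE_REPLICA
}

/** Hypothetical consumer showing how a closed set of types is handled in one place. */
final class WorkOrderDescriber {
  static String describe(WorkOrderType type, String replica, String node) {
    switch (type) {
      case ADD_REPLICA:
        return "add " + replica + " on " + node;
      case MOVE_REPLICA:
        return "move " + replica + " to " + node;
      case DELETE_REPLICA:
        return "delete " + replica + " from " + node;
      default:
        // Reached only if a new constant is added without updating this switch.
        throw new IllegalArgumentException("unknown type: " + type);
    }
  }
}
```

With a small, closed set of types, one enum plus a switch keeps the API surface 
smaller than one class per work order type.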
{quote}AssignContext .A system in which everything can be fetched by a minimal 
number of round trips to remote nodes (i.e. one per node) is preferable.
{quote}
Exactly. This is designed with that in mind. You should not make multiple calls 
to the same node. All attributes of a node are fetched in one call (including 
system properties {{NodeMetrics.Property.SYSPROP.getMetrics("someproperty");}})
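
The one-call-per-node behaviour can be sketched as follows. 
{{NodeAttributeCache}}, its fetch function and the attribute keys are 
assumptions made for this example; they are not part of the proposed API.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical helper: fetches all attributes of a node once and serves later reads locally. */
final class NodeAttributeCache {
  // Assumed to perform a single remote call that returns every attribute of the node.
  private final Function<String, Map<String, Object>> fetchAll;
  private final Map<String, Map<String, Object>> cache = new HashMap<>();
  private int remoteCalls = 0;

  NodeAttributeCache(Function<String, Map<String, Object>> fetchAll) {
    this.fetchAll = fetchAll;
  }

  /** Returns one attribute of a node, contacting the node only on first access. */
  Object get(String node, String attribute) {
    Map<String, Object> attributes = cache.get(node);
    if (attributes == null) {
      remoteCalls++;
      attributes = fetchAll.apply(node);
      cache.put(node, attributes);
    }
    return attributes.get(attribute);
  }

  /** Number of times the fetch function was invoked, i.e. simulated round trips. */
  int remoteCalls() {
    return remoteCalls;
  }
}
```

Reading several attributes of the same node, system properties included, then 
costs a single round trip to that node.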
{quote}As a general note, I believe sample code of how these interfaces are to 
be used would be very useful
{quote}
Yes. They're coming. 

> Provide a clean API for pluggable replica assignment implementations
> --------------------------------------------------------------------
>
>                 Key: SOLR-14613
>                 URL: https://issues.apache.org/jira/browse/SOLR-14613
>             Project: Solr
>          Issue Type: Improvement
>          Components: AutoScaling
>            Reporter: Andrzej Bialecki
>            Assignee: Ilan Ginzburg
>            Priority: Major
>          Time Spent: 10h 10m
>  Remaining Estimate: 0h
>
> As described in SIP-8 the current autoscaling Policy implementation has 
> several limitations that make it difficult to use for very large clusters and 
> very large collections. SIP-8 also mentions the possible migration path by 
> providing alternative implementations of the placement strategies that are 
> less complex but more efficient in these very large environments.
> We should review the existing APIs that the current autoscaling engine uses 
> ({{SolrCloudManager}} , {{AssignStrategy}} , {{Suggester}} and related 
> interfaces) to see if they provide a sufficient and minimal API for plugging 
> in alternative autoscaling placement strategies, and if necessary refactor 
> the existing APIs.
> Since these APIs are internal it should be possible to do this without 
> breaking back-compat.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
