[jira] [Commented] (GEODE-7830) Management REST API rebalance endpoints return confusing operationResults

Aaron Lindsey (Jira) Thu, 05 Mar 2020 08:35:22 -0800


    [ 
https://issues.apache.org/jira/browse/GEODE-7830?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17052311#comment-17052311
 ]


Aaron Lindsey commented on GEODE-7830:
--------------------------------------

The "operator" in my case is actually a Kubernetes controller. It calls 
rebalance each time Kubernetes tries to stop a Geode server to ensure data is 
not lost. In this case it is very common to call rebalance when there are no 
regions, e.g. during a scaling operation before the user has created any 
regions. Right now we have to parse the status message to determine if the 
rebalance failed due to the no-op error, and then ignore it.
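
The workaround described above could be sketched roughly as follows. This is only an illustration: the function name and the returned labels are made up for this example, and the message text is copied from the issue description below, so the check breaks if that wording ever changes.

```python
# Message text taken from the issue description; matching on it is fragile.
NO_OP_MESSAGE = "Distributed system has no regions that can be rebalanced."


def classify_rebalance(operation_result):
    """Return 'no-op', 'ok', or 'failed' for one operationResult dict.

    Hypothetical helper: the labels are illustrative, not part of the API.
    """
    # Collapse whitespace so a line-wrapped message still compares equal.
    message = " ".join(operation_result.get("statusMessage", "").split())
    if message == NO_OP_MESSAGE:
        # Treated as a no-op whether success is true or false.
        return "no-op"
    return "ok" if operation_result.get("success") else "failed"
```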

Do you know if having no regions is the only reason the rebalance API will 
return the no-op error? If we were sure of that, then we could call list 
regions to make sure regions exist before calling rebalance.
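
A minimal sketch of that pre-check, assuming the controller already has some way to list regions and to start a rebalance. Both callables are hypothetical placeholders supplied by the caller, not real client functions. Note that the issue description also mentions the "no servers" case, and a region could be created or removed between the two calls, so this would not remove the need to handle the no-op result.

```python
def rebalance_if_regions_exist(list_regions, start_rebalance):
    """Start a rebalance only when at least one region is listed.

    Both arguments are hypothetical callables supplied by the caller:
    list_regions() returns a list of regions, and start_rebalance()
    issues the rebalance request and returns its response.
    """
    regions = list_regions()
    if not regions:
        # Nothing to rebalance, so skip the call entirely.
        return None
    return start_rebalance()
```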

FWIW, I think it would be best to assume that consumers of this REST API will 
be programs, not humans, and therefore we should design it in such a way that 
it would be easy to consume programmatically. It's much more reliable to 
programmatically check the size of an array rather than parse a status message 
to determine if the rebalance succeeded.
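
To illustrate what I mean, here is a sketch of the consumer-side check if the result carried a per-region array and reported success=true for the no-op case. The field name "rebalanceRegionResults" is an assumption for this example; it does not appear in the responses quoted below.

```python
def rebalance_outcome(operation_result):
    """Classify a result using structured fields only, no message parsing.

    Assumes a hypothetical "rebalanceRegionResults" array in the result
    and that a no-op rebalance is reported with success=true.
    """
    if not operation_result.get("success", False):
        return "failed"
    regions = operation_result.get("rebalanceRegionResults", [])
    # An empty array means there was nothing to rebalance.
    return "rebalanced" if len(regions) > 0 else "no-op"
```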

> Management REST API rebalance endpoints return confusing operationResults
> -------------------------------------------------------------------------
>
>                 Key: GEODE-7830
>                 URL: https://issues.apache.org/jira/browse/GEODE-7830
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>            Reporter: Aaron Lindsey
>            Assignee: Darrel Schneider
>            Priority: Major
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> We observed odd behavior regarding the operationResult object returned in the 
> rebalance API:
> # It contains success=false if the cluster has no regions or has no servers. 
> This is confusing because the rebalance didn't fail — it just didn't have 
> anything to rebalance so it was basically a no-op. As a consumer of this API, 
> I need to be able to distinguish between "real" failures and this "no-op" 
> failure, and I should not have to write code to parse the "statusMessage" to 
> do that.
> # Sometimes, success=true and other times success=false for the same 
> statusMessage: "Distributed system has no regions that can be rebalanced." 
> This is confusing because I don't know why it sometimes considers this a 
> failure and other times considers it a success. If #1 above is fixed, then 
> this would not be an issue because it would always return success=true for 
> this particular statusMessage.
> Here is an example of two confusing operationResults we observed:
> {code:json}
> {
>   "result": [
>     {
>       "statusCode": "OK",
>       "links": {
>         "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/15dfe6ef-acaf-4a45-9b55-1d855a977ba8",
>         "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>       },
>       "operationStart": "2020-02-25T18:53:34.058Z",
>       "operationEnd": "2020-02-25T18:53:34.063Z",
>       "operationId": "15dfe6ef-acaf-4a45-9b55-1d855a977ba8",
>       "operation": {
>         "simulate": false
>       },
>       "operationResult": {
>         "statusMessage": "Distributed system has no regions that can be rebalanced.",
>         "success": true
>       }
>     },
>     {
>       "statusCode": "OK",
>       "links": {
>         "self": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances/8218ce0d-e3b8-4c49-b925-665a28e821c3",
>         "list": "http://geodecluster-sample-locator.default/management/v1/operations/rebalances"
>       },
>       "operationStart": "2020-02-25T18:53:45.650Z",
>       "operationEnd": "2020-02-25T18:53:45.654Z",
>       "operationId": "8218ce0d-e3b8-4c49-b925-665a28e821c3",
>       "operation": {
>         "simulate": false
>       },
>       "operationResult": {
>         "statusMessage": "Distributed system has no regions that can be rebalanced.",
>         "success": false
>       }
>     }
>   ],
>   "statusCode": "OK"
> }
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
