Dennis-Mircea Ciupitu created FLINK-39370:
---------------------------------------------
Summary: In-place scaling check only inspects K8s resource spec
for adaptive scheduler, ignoring JM running configuration
Key: FLINK-39370
URL: https://issues.apache.org/jira/browse/FLINK-39370
Project: Flink
Issue Type: Bug
Components: Autoscaler, Kubernetes Operator
Affects Versions: 1.15.4
Reporter: Dennis-Mircea Ciupitu
Fix For: 1.15.5
h1. Overview
{{NativeFlinkService.supportsInPlaceScaling()}} determines whether a running
Flink job supports in-place rescaling by checking whether
{{jobmanager.scheduler}} is set to {{Adaptive}} in the _observe
configuration_, which is derived from the Kubernetes {{FlinkDeployment}}
resource spec.
However, the adaptive scheduler can be configured through several mechanisms
that are *not* reflected in the K8s resource spec:
* Flink's native {{flink-conf.yaml}} (baked into the Docker image or mounted
via ConfigMap)
* Environment variables
* Dynamic properties passed via command-line arguments
When the adaptive scheduler is configured through any of these alternative
mechanisms, {{observeConfig.get(SCHEDULER)}} returns the default value
({{Default}}), and {{supportsInPlaceScaling()}} incorrectly returns
{{false}}, even though the JobManager is actually running with the adaptive
scheduler and fully supports in-place rescaling.
This forces an unnecessary full restart/redeploy cycle instead of a lightweight
in-place scaling operation.
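The failure mode described above can be illustrated with a minimal sketch. It is not the operator's code: plain maps stand in for Flink's configuration objects, and the class {{SpecOnlyCheck}} and its method are hypothetical names that only mirror the spec-only lookup.

```java
import java.util.Map;

// Hypothetical stand-in for the spec-only check; a plain Map replaces
// the observe configuration derived from the FlinkDeployment spec.
class SpecOnlyCheck {
    static final String SCHEDULER_KEY = "jobmanager.scheduler";

    // Returns true only if the spec-derived config names the adaptive scheduler.
    static boolean supportsInPlaceScaling(Map<String, String> observeConfig) {
        // A missing entry resolves to the default value, as in the issue description.
        String scheduler = observeConfig.getOrDefault(SCHEDULER_KEY, "Default");
        return "Adaptive".equalsIgnoreCase(scheduler);
    }

    public static void main(String[] args) {
        // The spec carries no scheduler entry; assume the adaptive scheduler
        // was enabled in flink-conf.yaml inside the image instead.
        Map<String, String> observeConfig = Map.of("taskmanager.numberOfTaskSlots", "2");
        System.out.println(supportsInPlaceScaling(observeConfig)); // prints "false"
    }
}
```

Because the lookup never sees configuration that bypasses the resource spec, the result is {{false}} regardless of what the JobManager is actually running.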
h1. Expected Behavior
The operator should detect that the running JobManager is using the adaptive
scheduler (by querying its REST API) and proceed with in-place scaling.
h1. Actual Behavior
The operator only checks the {{FlinkDeployment}} spec configuration, finds no
{{jobmanager.scheduler: Adaptive}} entry, and falls back to a full restart.
h1. Proposed Fix
Change {{supportsInPlaceScaling()}} from a static check to an instance method
that, when the K8s resource spec does not indicate the adaptive scheduler,
falls back to querying the JobManager's running configuration via the
{{JobManagerJobConfigurationHeaders}} REST endpoint. If the JM's actual running
config confirms {{jobmanager.scheduler: Adaptive}}, in-place scaling
proceeds.
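A rough sketch of the proposed fallback, under stated assumptions: the REST call is abstracted behind a {{Supplier}} so the example runs without a cluster, and the class {{InPlaceScalingCheck}}, its methods, and the choice to treat a failed query as "not supported" are all illustrative, not taken from the operator code base.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of the spec-first, running-config-second check.
class InPlaceScalingCheck {
    static final String SCHEDULER_KEY = "jobmanager.scheduler";

    static boolean isAdaptive(Map<String, String> config) {
        return "Adaptive".equalsIgnoreCase(config.getOrDefault(SCHEDULER_KEY, "Default"));
    }

    // runningConfigFetcher stands in for a query against the JobManager's
    // REST API; it is invoked only when the spec does not settle the question.
    static boolean supportsInPlaceScaling(
            Map<String, String> observeConfig,
            Supplier<Map<String, String>> runningConfigFetcher) {
        if (isAdaptive(observeConfig)) {
            return true; // The spec already says Adaptive; no REST call needed.
        }
        try {
            return isAdaptive(runningConfigFetcher.get());
        } catch (RuntimeException e) {
            // Assumption: if the JobManager cannot be queried, keep the
            // existing behavior and fall back to a full restart.
            return false;
        }
    }

    public static void main(String[] args) {
        Map<String, String> spec = Map.of(); // no scheduler entry in the spec
        Map<String, String> running = Map.of(SCHEDULER_KEY, "Adaptive");
        System.out.println(supportsInPlaceScaling(spec, () -> running)); // prints "true"
    }
}
```

Checking the spec first keeps the common case free of an extra network round trip; the running configuration is consulted only when the spec is silent.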