Dennis-Mircea Ciupitu created FLINK-39370:
---------------------------------------------

             Summary: In-place scaling check only inspects K8s resource spec 
for adaptive scheduler, ignoring JM running configuration
                 Key: FLINK-39370
                 URL: https://issues.apache.org/jira/browse/FLINK-39370
             Project: Flink
          Issue Type: Bug
          Components: Autoscaler, Kubernetes Operator
    Affects Versions: 1.15.4
            Reporter: Dennis-Mircea Ciupitu
             Fix For: 1.15.5


h1. Overview

{{NativeFlinkService.supportsInPlaceScaling()}} determines whether a running 
Flink job supports in-place rescaling by checking whether 
{{jobmanager.scheduler}} is set to {{Adaptive}} in the _observe 
configuration_, which is derived from the Kubernetes {{FlinkDeployment}} 
resource spec.

However, the adaptive scheduler can be configured through several mechanisms 
that are *not* reflected in the K8s resource spec:
 * Flink's native {{flink-conf.yaml}} (baked into the Docker image or mounted 
via ConfigMap)
 * Environment variables
 * Dynamic properties passed via command-line arguments

When the adaptive scheduler is configured through any of these alternative 
mechanisms, {{observeConfig.get(SCHEDULER)}} returns the default value 
({{Default}}), and {{supportsInPlaceScaling()}} incorrectly returns 
{{false}}, even though the JobManager is actually running with the adaptive 
scheduler and fully supports in-place rescaling.

This forces an unnecessary full restart/redeploy cycle instead of a lightweight 
in-place scaling operation.
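
The false negative described above can be illustrated with a minimal, self-contained sketch. It does not use the operator's real classes: {{SpecOnlyCheckSketch}}, {{schedulerFromSpec}} and the plain {{Map}} standing in for the observe configuration are hypothetical names chosen for illustration, and the lookup only approximates what {{observeConfig.get(SCHEDULER)}} does when the key is absent.

```java
import java.util.Map;

// Hypothetical stand-in for the spec-only check; NOT the real NativeFlinkService code.
final class SpecOnlyCheckSketch {
    static final String SCHEDULER_KEY = "jobmanager.scheduler";
    static final String DEFAULT_SCHEDULER = "Default";

    // Approximates observeConfig.get(SCHEDULER): when the key is missing
    // from the spec-derived config, the option's default value is returned.
    static String schedulerFromSpec(Map<String, String> specConfig) {
        return specConfig.getOrDefault(SCHEDULER_KEY, DEFAULT_SCHEDULER);
    }

    // Spec-only decision: in-place scaling is reported as supported only
    // if the spec itself names the adaptive scheduler.
    static boolean supportsInPlaceScaling(Map<String, String> specConfig) {
        return "Adaptive".equalsIgnoreCase(schedulerFromSpec(specConfig));
    }
}
```

With an empty spec configuration (the adaptive scheduler being set only in {{flink-conf.yaml}} inside the image, for example), this check returns {{false}} regardless of what the JobManager is actually running.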
h1. Expected Behavior

The operator should detect that the running JobManager is using the adaptive 
scheduler (by querying its REST API) and proceed with in-place scaling.
h1. Actual Behavior

The operator only checks the {{FlinkDeployment}} spec configuration, finds no 
{{jobmanager.scheduler: Adaptive}} entry, and falls back to a full restart.
h1. Proposed Fix

Change {{supportsInPlaceScaling()}} from a static check to an instance method 
that, when the K8s resource spec does not indicate the adaptive scheduler, 
falls back to querying the JobManager's running configuration via the 
{{JobManagerJobConfigurationHeaders}} REST endpoint. If the JM's actual running 
config confirms {{jobmanager.scheduler: Adaptive}}, in-place scaling 
proceeds.
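
A rough sketch of the proposed fallback order, under stated assumptions: the REST call is abstracted behind a {{Supplier}} rather than the real {{JobManagerJobConfigurationHeaders}} client, the class and method names ({{InPlaceScalingFallbackSketch}}, {{isAdaptive}}) are hypothetical, and treating a failed query as "not supported" is an assumption of this sketch, not something the issue specifies.

```java
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical illustration of the proposed fallback; not the actual patch.
final class InPlaceScalingFallbackSketch {
    static final String SCHEDULER_KEY = "jobmanager.scheduler";

    // True if the given config explicitly names the adaptive scheduler.
    static boolean isAdaptive(Map<String, String> config) {
        return "Adaptive".equalsIgnoreCase(config.get(SCHEDULER_KEY));
    }

    // specConfig: configuration derived from the FlinkDeployment spec.
    // runningConfigQuery: stands in for the REST query against the JobManager;
    // it is only invoked when the spec does not already indicate Adaptive.
    static boolean supportsInPlaceScaling(
            Map<String, String> specConfig,
            Supplier<Map<String, String>> runningConfigQuery) {
        if (isAdaptive(specConfig)) {
            return true; // spec is sufficient, no REST call needed
        }
        try {
            Map<String, String> running = runningConfigQuery.get();
            return running != null && isAdaptive(running);
        } catch (RuntimeException e) {
            // Assumption: if the JobManager cannot be queried, fall back
            // to the existing behavior (full restart) by returning false.
            return false;
        }
    }
}
```

Checking the spec first keeps the common case free of an extra REST round trip; the running configuration is consulted only when the spec alone would have produced {{false}}.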



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
