Hi @Ryan/@Daniel
OK it’s up! I’ve updated the Draft PR and added how to configure the Gate
auto-injection to the description
(https://github.com/apache/flink-kubernetes-operator/pull/1043).
TL;DR
When transitionMode: ADVANCED is used and bluegreen.gate.strategy is set in
flinkConfiguration, the operator automatically:
• Injects the -javaagent flag into the JobManager JVM options
• Adds an init container to the JobManager pod that copies the agent JAR
from the operator image into a shared volume, making it available at
/opt/flink/lib/bluegreen-agent.jar
• Sets bluegreen.gate.injection.enabled: "true" to activate injection at
runtime
The only user-facing configuration needed is:
flinkConfiguration:
bluegreen.gate.strategy: "WATERMARK"
bluegreen.gate.watermark.extractor-class:
"com.example.MyWatermarkExtractor” ...
Give it a go and please let me know what you think.
In the meantime I’m working on the other comments you also left
Thanks!
Sergio
> On Mar 26, 2026, at 8:11 AM, Ryan van Huuksloot via dev
> <[email protected]> wrote:
>
> Awesome! Thanks for the update, Sergio. I'm excited to see the plan - it is
> a cool idea so I'm glad it is working.
>
> Ryan van Huuksloot
> Staff Engineer, Infrastructure | Streaming Platform
> [image: Shopify]
> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
>
>
> On Thu, Mar 26, 2026 at 1:46 AM Sergio Chong Loo <[email protected]>
> wrote:
>
>> @Ryan / @Daniel,
>>
>> Good news! The “GateInjectorPipelineExecutor” idea is successful!!
>>
>> While the original approach of simply activating it with
>> “execution.target” did not quite work, I was able to implement it via the
>> *Instrumentation
>> API with a Java Agent* that injects it… the user doesn’t have to touch
>> their pipelines and I added 2 options, at least for now, to place/inject
>> the Gate after the source or before the sink (complex DAG cases with
>> multiple sources or sinks for now are not supported).
>>
>> I’m documenting everything and prepping the Draft PR for your review,
>> probably a couple more days.
>>
>> Thanks, stay tuned.
>>
>> - Sergio
>>
>>
>> On Mar 16, 2026, at 3:30 PM, Sergio Chong Loo <[email protected]> wrote:
>>
>> Thanks for the ideas and the offer to help out Ryan! It’s invaluable to
>> learn about how other users/teams scenarios.
>>
>> Indeed I have to pursue and evaluate the GateInjectorExecutor nonetheless
>> for our internal development. Ideally it’d be great if the user can simply
>> “invoke” the functionality, even give the user an option to specify “where"
>> the gating mechanism to be placed (e.g. right after the source or before
>> sink), or for the most flexibility they can incorporate and place the Gate
>> manually just like it is now.
>>
>> I’ll share the progress asap and we can all take it from there (this
>> should not exceed a couple weeks). I’ll definitely need more of your
>> feedback to verify this with Flink SQL.
>>
>> Thanks again,
>> Sergio
>>
>>
>> On Mar 16, 2026, at 7:01 AM, Ryan van Huuksloot <
>> [email protected]> wrote:
>>
>> Hi Sergio,
>>
>> re: 1.1
>> My thought is that a BlueGreen Mixin isn't Kubernetes specific and could
>> be reused by other deployment control planes. However, I do agree that
>> attaching it to the sink has other implications so I am happy to pivot if
>> we can find an alternative solution.
>>
>> re: 2
>> I'm happy to leave it out of the Phase 2 implementation, but I think it
>> should be possible. For example we use Phase 1 with cross cluster
>> migrations today. Phase 2 within a single cluster isn't particularly useful
>> for us.
>>
>> re: GateInjectorExecutor
>> This sounds like a neat idea. I need to read more about how it would work
>> but from a high level, injecting an operator before your sinks sounds like
>> a good idea. Better isolation, possible with SQL, no mixins, etc.
>>
>> I will mention that part of the reason I want it before the sinks is
>> because nine out of ten people building pipelines struggle to understand
>> where their state is and how Phase 2 would affect the correctness of their
>> state depending on where they put the gate. I understand that if you have a
>> remote lookup and want to save bandwidth, you could optimize your pipeline
>> by moving the gate before the remote call; however, that seems like an
>> optimization that can be made later.
>>
>> Thanks for driving this! Let me know how we can help.
>>
>> Ryan van Huuksloot
>> Staff Engineer, Infrastructure | Streaming Platform
>> [image: Shopify]
>> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
>>
>>
>> On Mon, Mar 16, 2026 at 2:21 AM Sergio Chong Loo <[email protected]>
>> wrote:
>>
>>> Hi Ryan
>>>
>>> Thanks a lot for these details. For sure some of these observations
>>> popped up during our initial discussions, and that’s why our initial goal
>>> was to introduce this as simple as possible and gradually enhance it to
>>> cover gaps.
>>>
>>> Allow me to address your concerns:
>>>
>>> 1. I’m happy you stressed the point of “disruption to existing
>>> pipelines”. However, there’s a few points about attempting to build this
>>> functionality into the sinks (or sources) right off the bat (read further
>>> below for my alternative):
>>> 1. Kubernetes centric: as of now the Blue/Green Deployments
>>> support is a Kubernetes specific solution, adding a mixin directly
>>> available to sinks would “leak” this support outside of K8s
>>> 2. A sink being aware of these deployment phases violates single
>>> responsibility, but more importantly…
>>> 3. Flink currently has many connectors, with the majority being
>>> maintained outside of the Flink code base, by separate teams, separate
>>> repos, separate release cycles. This would complicate things
>>> significantly
>>> as to try and add support for this for every potential flink connector
>>> project out there would be a cumbersome. Blue/Green Phase 2 then only
>>> would
>>> works with "gate-aware" sinks.
>>> 2. I’d leave the conversation about migrating jobs between K8s
>>> clusters outside of this scope, even Phase 1 is meant to only work in a
>>> single cluster…
>>> 3. Watermarking, excellent point, it’s indeed a requirement so I’ll
>>> make sure this is validated where applicable (by the concrete
>>> implementation)
>>>
>>>
>>> Having said what I said about point 1.1 above, I’m currently working on
>>> an approach which uses a “GateInjectorPipelineExecutor” so to speak; in
>>> other words a custom PipelineExecutor that would be shipped with the K8s
>>> Operator, invoked by Flink Configuration (via “execution.target:”). This
>>> custom piece would instantiate and inject the Gate at a fixed point in the
>>> StreamGraph right before job submission. I still have to validate and
>>> ensure a few things are correctly taken care of (like Type Information,
>>> etc.) but the theory looks promising.
>>>
>>> For the most part this works well with Flink SQL (same configuration),
>>> here’s my estimation:
>>>
>>> tEnv.executeSql("INSERT INTO my_sink ...")
>>> └─> SQL planner → ExecNodeGraph → Transformation[]
>>> └─> StreamGraph
>>> └─> GateInjectorExecutor injects GateProcessFunction
>>> └─> StreamGraph' (mutated) → JobGraph
>>> └─> Submit Job
>>>
>>> I’m aiming to share some updates along these lines in the next few weeks
>>> but hopefully this falls inline with your objectives/thoughts overall.
>>>
>>> Sergio
>>>
>>>
>>> On Mar 6, 2026, at 3:36 PM, Ryan van Huuksloot via dev <
>>> [email protected]> wrote:
>>>
>>> Hi Sergio,
>>> Thanks for starting this conversation.
>>>
>>> A few thoughts regarding BlueGreen Phase 2:
>>> 1. The Gate Operator is interesting but I don't like that we would have to
>>> modify users' pipelines for them to use Phase 2. This gate function seems
>>> like it could be a Mixin that connectors would implement. If you want to
>>> use Phase 2, your sinks must implement this Mixin. I understand that a
>>> unique GateFunction has pros, but it works less well with FlinkSQL - and
>>> the trade-off doesn't seem worthwhile.
>>> 2. Regarding the ConfigMap. We should consider a solution that supports
>>> migrating Flink jobs between Kubernetes clusters. Otherwise Phase 2 is
>>> only
>>> useful for in cluster operations.
>>> 3. Watermarking is a requirement. Will the Flink Kubernetes Operator
>>> validate that the pipeline is using watermarks?
>>>
>>> What happens when idleness is configured? Watermarks will get ignored from
>>>
>>> these “slow” subtasks and advance, could records from the ignored subtasks
>>> eventually be lost?
>>> Yes they would be lost, but that would happen irrespective of Phase 2.
>>>
>>> I'll have more thoughts after we discuss the Gate Operator, as that is
>>> crucial to the FLIP right now.
>>>
>>> Ryan van Huuksloot
>>> Staff Engineer, Infrastructure | Streaming Platform
>>> [image: Shopify]
>>> <https://www.shopify.com/?utm_medium=salessignatures&utm_source=hs_email>
>>>
>>>
>>> On Mon, Mar 2, 2026 at 6:52 PM Sergio Chong Loo <[email protected]>
>>> wrote:
>>>
>>> Bumping this (Advanced Blue/Green deployments - FLIP-504) thread after
>>> making some code adjustments.
>>>
>>> FYI @drossos <https://github.com/drossos> @ryanvanhuuksloot <
>>> https://github.com/ryanvanhuuksloot> I’d like to get your feedback since
>>> I know you’re interested in this feature.
>>>
>>> Thanks,
>>> - Sergio
>>>
>>>
>>> On Dec 5, 2025, at 2:31 PM, Sergio Chong Loo <[email protected]>
>>>
>>> wrote:
>>>
>>>
>>> Hi folks,
>>>
>>> FLIP-503 (already merged) introduced the Basic Blue/Green Deployment
>>>
>>> functionality to the Flink K8s Operator. It was very straightforward,
>>> simply transitioning to the second deployment once it's considered stable.
>>>
>>>
>>> FLIP-504 is an Advanced version added on top of 503 and brings about the
>>>
>>> notion of "record-level" coordination between the 2 deployments to have no
>>> data duplication and exactly once semantics while preserving a smooth
>>> transition.
>>>
>>>
>>> The main goals are:
>>> • For the community to take a quick look at the current
>>>
>>> functionality (previously mentioned at the Flink Forward 2025 Conference)
>>>
>>> • To get feedback and improvement suggestions
>>>
>>> Flip 504 details:
>>>
>>> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=337677650
>>>
>>>
>>> Draft PR: https://github.com/apache/flink-kubernetes-operator/pull/1043
>>>
>>> Thank you!
>>> - Sergio