Just to give a bit of context on why I think this is important: it had never happened before that we had to yank 4 versions of a provider because of incompatibilities we learned about after the fact. It's not anyone's fault - it's just a lesson we should take into account. And cncf.kubernetes is a very special case that has bitten us several times in the past because of its tight coupling to the core. I think that if we make any breaking change, we should at least think about how to avoid similar problems in the future. These are not academic questions - it has already happened (we had similar problems around the Airflow 2 migration in the early Airflow 2 days), which indicates that this is currently a "property" of the relation between the provider and the core, so we need to rethink it.

This change introduces something that will become a breaking change in the future, and it might need another breaking change when (or if) we fix the "real" issue we experienced (the one that resulted in yanking 4 versions of the cncf.kubernetes provider).

My points are:

* we are introducing a change that will (eventually) turn into a breaking change (and, as you mentioned, a potentially disruptive one),
* which will likely need another breaking change in the way KPO is defined,
* which means that our users will likely have to go through incompatibility pains more than once.

I am not against this change - I am just asking for a bit more forward thinking (and I am happy to brainstorm in some doc? AIP? something more substantial and suitable than just an email thread).

I have some questions (my current answers are below each):

1) Are we sure we want to do it without even attempting to define (not implement) what the (maybe imagined) target setup will look like?

My answer: I think not, because it might lead to confusion among our users. K8S is important for many of our users, and any required change will be amplified by the number of people having to make it, so we should limit the number of times they have to do it.

2) Are we sure (or is it at least very likely) that the change we introduce now will still hold when we solve the real problem?

My answer: I am not sure. I do not know the answers to questions 3) and 4), which I would need in order to be sure this change is going to hold.

3) Are we going to have some fixed version relationship between Airflow and cncf.kubernetes? (Like we have now: Airflow 2.1 and 2.2 -> use cncf.kubernetes 3.*, Airflow 2.3 -> use cncf.kubernetes 4.*.)

My answer: That's one of the options, but it's a bit limiting in terms of releasing bug-fixes. It will likely lead to us having to maintain two branches of the cncf.kubernetes provider if we do (when we find a critical issue). And people who are using 2.1 will have to migrate to 2.3 in order to use any new K8S features we develop.
This is the situation we are in now, and it is in stark contrast with how it works for other providers. We might deliberately choose that path though - maybe it is better to keep it this way, with the potential price of maintaining critical fixes in two or more branches of the provider. But it should be a deliberate choice made knowing the consequences, not an accidental by-product of the versioning approach we choose. Maybe we make a pledge that there will be no incompatible changes and we will keep 4.0 for as long as we can (but due to changes in the kubernetes library it might not be possible - as we already experienced in the 3.0 -> 4.0 move).

4) Alternatively - do we know the changes needed to have "true decoupling" in place - so that "provider 4.0 and 5.0 will be able to use ? What needs to be done to get there?

My answer: I do not really know - I do not know the details well enough, nor why the changes we implemented last time were so disruptive, nor whether we could have kept backwards compatibility if we had really wanted to. Was it deliberately breaking compatibility because we had no other choice? Or was it accidental and we **could** have kept the compatibility if we had really wanted to? I am not sure. Of course we cannot anticipate what future kubernetes libraries will bring, but deciding what our "goal" is here, and whether it seems achievable or not, is something we should do.

5) Or MAYBE we should simply incorporate the cncf.kubernetes provider entirely into the core of Airflow? Maybe there should be NO "cncf.kubernetes" provider?

My answer: This point is the real reason for my reluctance here. I see it as a quite possible scenario that we will drop the provider and all kubernetes code will simply be embedded in Airflow Core. I think this is a very interesting and probably the least disruptive scenario.
Yes, it means that bugfixes will only be releasable together with the whole of Airflow, but K8S is so tightly connected with Airflow Core via the K8S executor that it might be the best choice. And if we do choose this path, it likely means that the core settings should simply ... stay in core rather than be moved out.

I am happy to collaborate on that - but I think we at least need to have a document and a discussion on where we are going with this before we decide on any breaking changes in Kubernetes settings.

J.

On Fri, Apr 15, 2022 at 6:44 PM Daniel Standish <[email protected]> wrote:
>
> Thanks Jarek
>
> I think the settings intermingling is independent from the problem you're
> concerned with. I can see how it would be desirable to define the executor
> interface more robustly, and to allow core to not care about k8s version (so
> that provider can use whatever k8s version it likes). But this is really a
> separate issue.
>
> The issue I'm concerned with is that we have a defined way to configure hooks
> and operators in Airflow: (1) the Airflow Connection or (2) direct config
> through operator or hook params. We do not do this via the `airflow.cfg`
> file. Resolving this inconsistency does not solve the problem you are
> concerned with; but it rectifies a user-facing inconsistency and a source of
> confusion.
>
> Whether the K8s executor is ever moved out of core or not, it will remain
> desirable that KPO only takes configuration from Airflow Connection or direct
> params, because that's how things are done in Airflow. The core
> `[kubernetes]` settings should apply to the executor but not the operator or
> the hook. And indeed, by and large, this is the case already; there are just
> a few `airflow.cfg` settings that affect KPO and the vast majority do not.
>
> WDYT?
