Thank you, everyone, for the discussions here. Let us take a step back and look at the non-technical parts of the conversation. A lot of our discussions come from two goals:
G0: Maintain a stable solution, and its stable evolution, for our common use cases.
G1: Welcome new improvements, land technical commitments in a timely fashion, continue to reinvent ourselves, and welcome new community members who bring new use cases.

Both goals are very important. G0 ties to our ability to continuously support our current use cases. G1 is essential to our viability as a solution, so we can grow as a community and stay competitive in a fast-evolving machine learning compilation landscape. Enabling both has always been an important theme of long-lived projects. Deep learning frameworks are a common point of reference. Such evolution usually happens in roughly three stages:

S0: Introduce a new feature/component as an optional module.
S1: Evolve the overall solution to make use of the new component.
S2: Consider deprecating some of the existing solutions, or evolve the solutions toward a consolidation point.

Each stage carries a different level of commitment and would normally entail different gating criteria. For example, PyTorch introduced TorchFX as an optional module that supports graph tracing and export. It had some overlapping capabilities with TorchScript. The PyTorch community is collectively evolving parts of its compilation stack (TorchDynamo) to make use of FX. As of now, there has been no announcement of S2 from that community.

Encouraging S0 and making it easy to do helps us enable G1. Too high a barrier here can discourage community contributions, leave mainline lacking the latest features, and hurt our long-term competitiveness. This is especially important given that the machine learning compilation landscape remains open, and the ability to support symbolic shapes and training in a timely manner helps bring in users and contributors who would otherwise turn to alternatives.

G0 is equally important here. In many cases it boils down to making careful and informed decisions about evolution (S1 and S2), and to ensuring that at the S0 stage there is limited disruptive change to the existing infrastructure. Importantly, not every module/feature has to go through all stages. And in common practice, the decisions for each stage are usually not made at the same time.

We can find examples of S0 cases in TVM as well. For example, USMP was initially designed for specific cases like AOT. We welcomed these improvements to unblock needs in embedded settings early. Through USMP we found the need for tir.alloc_const, which related to evolving existing infrastructure (S1). As a result, we had a more in-depth discussion. Additionally, we are bringing in the effort to further enable USMP in broader settings as part of S1. At some point, we might consider consolidating all memory allocation as S2 – note that many community members are collectively working toward that goal, but we are not yet at a point to make such a decision. As another example, we enabled cascaders that are specifically designed for micro-NPUs, which overlap in domain with the arithmetic affine module, but were nevertheless brought in without consolidation because we believed there was enough interest and maintenance support for the module. Finally, the unpacked_api was enabled specifically for extremely low-resource settings, and we allowed S0-level inclusion despite some inconsistency with the PackedFunc API. Of course, we do not want to enable random things in the codebase, which ties back to the maintenance-overhead concern.
One of the questions we want to ask here is whether the module has enough support from the community to allow continued maintenance. Additionally, we should consider the added engineering support we gain by welcoming community members who have these needs and would otherwise look elsewhere.

Our overall thought process and the decision point for each stage can be different – indeed they should be, so that we can enable both G0 and G1. Nor do all modules have to go through all the stages. For S0, we would ask whether there are enough champions in the community with a self-contained plan. For important features, we would expect, say, more than three committers who can champion the module, plus significant community support to maintain it. Additionally, S0 should be made as minimally disruptive (with respect to the current infrastructure) as possible. To encourage G1, we can tolerate some level of duplication (just as in the TorchFX and TorchScript case, or USMP and the other allocators when they landed as S0), considering the additional community support we gain to maintain them. S1 and S2 would involve more careful discussion and coordination, with greater detail on some of the key points. They will also likely happen at a different point in time, so we can make informed decisions.

This particular RFC is at the S0 stage and is intentionally scoped that way. As the RFC states, there is no proposal to make S1/S2 decisions in this RFC. Many of our current discussions are around S1/S2 – the future evolution of the system. They are extremely helpful discussions to have, setting up context and helping us improve the design, but they are not necessarily decisions we have to make immediately. Let us think about the broader set of community members we can empower and bring in by enabling the S0 improvement.

Thank you, everyone, for the discussions so far, and let us work together to enable our community.