Re: [apache/tvm-rfcs] [Process RFC] Empowering New Scoped Module to the Project (PR #95)

Andrew Reusch Fri, 21 Oct 2022 04:20:32 -0700

@Hzfengsy 

> Note that the reviews of each PR are brought to their own context, and we 
> anticipate grounded constructive actionable comments, such as adding test 
> coverage here/there. The module establishment part can be into subjective 
> discussions that are less grounded, and community empowerment becomes more 
> critical in such cases.


Taking a step back here: why do S0 modules need to live in `main`? All of our 
development actually happens on forks (i.e. you fork the repo on GitHub)--the 
question is: how much development goes on outside of `main` before we merge? 
Forks could technically live for years outside of `main` (as Relax has). So 
what are the benefits of merging:

B0. Joins development of the S0 module to TVM's S1 and S2 modules, distributing 
the need to rebase and handle merge conflicts throughout the TVM community.
B1. Solicits a wider audience to interact with the module by publicizing it as 
part of TVM's tree.
B2. Fosters further development of the module inside of TVM's core forums. This 
also helps some community members to better engage by potentially removing 
hurdles (i.e. engineers at industry companies may be restricted by company 
policy to engaging with only specified OSS communities).
B3. Grants access to TVM's CI resources and runs tests in the mainline 
regression suite.
B4. Permits development of tighter integrations with the remainder of the 
project by removing merge conflicts.

There are also some drawbacks:

D0. Increases the load on TVM's CI.
D1. Increases the scope of TVM's codebase, potentially making it harder to 
maintain. For example, changes to core components such as the test 
infrastructure, the runtime, the object system may require sweeping knowledge 
of the project. As the number of components in the project increases, the 
number of engineers with that knowledge decreases relative to the size of the 
codebase. If that number drops to 0, the codebase will rot.
D2. Could contribute to effectively splitting the community by creating 
multiple arenas for development. While there are benefits to this in terms of 
exploring new techniques, allowing this as a permanent state means that 
effectively TVM is many projects with sub-communities rather than one. This may 
be a reasonable thing to strive for, but for D1 reason it is not clear that 
those projects should be in a monorepo.

A theme across these is that some aspects are community-related (i.e. 
publicizing a flow, consolidating discussions, allowing us to explore multiple 
avenues to solving a problem) and some of these are essentially 
technical/logistical problems (i.e. maintaining a CI, maintaining the core 
infrastructure, etc). Weighing the community aspects of these benefits and 
drawbacks, I believe that the benefits are worth the drawbacks and I am in 
favor of allowing new contributions and approaches in TVM.

However, I feel we are still not really addressing the technical side of 
things: there is no such thing as a truly S0 module, because we state it is
> No disruptive change to the rest of the codebase

The problem is that because we are in a monorepo, there are 
technical/logistical aspects to integrating a module where "a failure in one 
module is a failure for all of TVM," the most obvious example being CI:
- A flaky test is checked in; someone is trying to make a change to an S1 
module, but the CI fails sporadically due to the flaky test to impede progress.
- Someone is trying to improve the way we shard tests with pytest; in the S1 
modules, we have e.g. properly parameterized the tests, but in some S0 module, 
the parameterization was not properly followed. Now that engineer needs to go 
and learn about how the S0 module works, fix its tests, and find a reviewer for 
those improvements before they can proceed.
- Someone adds a new CI image to test the S1 modules on a new hardware backend. 
The S0 module is excluded from that backend. Now a user comes along and tries 
to use the S0 module with the backend, and files a bug. Can the owner of the 
hardware backend truly claim they "support TVM" if they don't fix the S0 module 
bug? What mechanism have we put in place to ensure that users approaching TVM 
understand that they are using an S0 module?
- Someone is modifying a core TVM API; an S0 module uses it in a novel way. Now 
the modifier has to go and understand how the S0 module works so that they can 
modify the API in a way that doesn't break it.

While i agree with you that
> The module establishment part can be into subjective discussions that are 
> less grounded, and community empowerment becomes more critical in such cases.

I am concerned that the situations I elaborated above could similarly lead to 
people leaving the community if it becomes too fragmented. I also feel a bit 
like the perspective of maintaining the core APIs and test infrastructure is 
not adequately represented in the discourse so far, and may not be represented 
in the proposed voting mechanism.

There is a lot of detail here about the establishment of an S0 module, down to 
even introducing a new voting process. When it comes to transitions between 
those states, there is really not much detail:

> - S0 -> deprecation: When a S0 module no longer has an active set of 
> maintainers, the module will be deprecated. The removal of the module is easy 
> as they are contained in the respective folders, with no modules that come 
> and depend on them.
> - S0 -> S1: When developers propose to incorporate a S0 module broadly into 
> existing flows/components. Each such composition would require its own RFC 
> and following the existing RFC process.

For instance, suppose that yes there is a maintainer for an S0 module, but 
tests keep breaking or they are relatively unresponsive, dragging down any of 
the core maintenance tasks above? What is the bar after which it's reasonable 
for a core maintainer to propose deprecation? Proposing deprecation is a lot 
more unwelcoming than requesting clarifications to new code or saying "no for 
now," and is a lot more challenging because it might involve finding a new home 
for code that is already being used by some members of the TVM community, but 
for which we are finding it hard to keep maintained.

I am not trying to be unwelcoming to new code here. But these considerations 
can often be caught at the RFC level, when questions about test strategy, 
modularization, and documentation can be discussed at a level that might be 
accessible to a broader section of the community. Developing the same level of 
understanding through reading PRs both takes considerably more time and energy 
than reading RFCs, and may come at a time when a module is half-landed. For 
this reason, it is not clear to me why we should relax the voting consensus 
requirements on S0 proposals only to land in a half-baked state that may 
deadlock on a committer.

I would summarize my concern here in two areas:
1. There isn't a clear boundary in TVM between what is "S1" and what is "S0" 
right now. It may not be clear to core maintainers where their responsibilities 
end and the S0 maintainers' responsibilities begin. Modularizing TVM could help 
here, but we haven't done that yet.
2. Already today, "supported by TVM" is quite ambiguous--which flow, on which 
hardware, with which model? S1 modules here are simply those that are "not S0." 
How should a user know what these are? Allowing S0 modules is good, but there 
is currently no distinction in how they're represented to users.

I'm concerned that this proposal could use more detail given that we are a 
monorepo with the problems I mentioned above. I don't know these concerns 
necessarily need to get addressed in this specific proposal (e.g. more clearly 
delineating the S1 modules), but I also don't feel that this proposal has 
adequate detail right now given the state of the project.

-- 
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/95#issuecomment-1286821762
You are receiving this because you are subscribed to this thread.

Message ID: <apache/tvm-rfcs/pull/95/c1286821...@github.com>

Re: [apache/tvm-rfcs] [Process RFC] Empowering New Scoped Module to the Project (PR #95)

Reply via email to