> For example, we may introduce an explicit cache stage to add the padding and
> mark this block for later processing.
Wouldn't that require a "remove entirely" annotation that was suggested against
[here](https://github.com/apache/tvm-rfcs/pull/77#issuecomment-1163019805)? I
could see how we co
Writing out some of my thoughts to see if there's a way to express the
constraints using only existing TIR features. The main goals would be as
follows.
1. Allow simplification of expressions based on the values present in the
padding.
2. Allow local simplifications to take advantage of
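Goal 1 can be sketched in plain Python (the helper names here are illustrative, not an existing TVM API): if the padding region of a buffer is known to hold the identity value, a guarded read over the padded iteration space can be rewritten into an unguarded one.

```python
# Sketch of goal 1: simplification based on known padding values.
# Hypothetical helpers, not TVM API.

def guarded_read(a, i, n):
    # Original access pattern: T.if_then_else(i < n, A[i], 0)
    return a[i] if i < n else 0

def simplified_read(a_padded, i):
    # After simplification: the padding already holds the identity
    # value 0, so the branch can be dropped entirely.
    return a_padded[i]

n, padded_n = 6, 8
a = list(range(n))
a_padded = a + [0] * (padded_n - n)  # explicit zero padding

# Both forms agree on every index of the padded iteration space.
assert all(guarded_read(a_padded, i, n) == simplified_read(a_padded, i)
           for i in range(padded_n))
```

The simplification is only valid because the pad value (0 here) is statically known, which is exactly the constraint that needs to be expressible.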
Indeed, if a buffer is used in an annotation value, that will change the
semantics of a node; however, there are different ways to represent this, as
long as it can be reconstructed later. For example, we may introduce an
explicit cache stage to add the padding and mark this block for later
processing.
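A rough Python model of the explicit-cache-stage idea: rather than annotating the producer in place, a separate stage materializes the padded buffer, and that stage alone carries a marker for later processing. The stage structure and annotation key below are illustrative assumptions, not an existing TVM API.

```python
# Model of an explicit cache stage that adds padding and carries a
# marker annotation. Names are hypothetical, not TVM API.

def make_cache_stage(buf, padded_len, pad_value):
    return {
        # The cache stage's body: the original data plus explicit padding.
        "body": buf + [pad_value] * (padded_len - len(buf)),
        # Hypothetical marker for later passes to find this block.
        "annotations": {"remove_padding_later": True},
    }

stage = make_cache_stage([1, 2, 3], 5, 0)
assert stage["body"] == [1, 2, 3, 0, 0]
assert stage["annotations"]["remove_padding_later"]
```

A later pass can search for the annotation and either exploit the padding or strip the stage, reconstructing the original computation, without the annotation changing the semantics of any other node.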
--
> It doesn't add additional semantics; the computation semantics stay the same,
> it is a hint to the graph compiler.
My apologies, I had meant the semantics of a node from the perspective of a TIR
transformation, not the semantics from the perspective of the computation being
described. For a
> So long as the constraints can be statically searched for, this approach
> makes sense to me. I would be more concerned about adding additional
> semantics to existing nodes, such as an AttrStmt node
It doesn't add additional semantics; the computation semantics stay the same,
it is a hint to the graph compiler.
> Indeed, it is important to avoid having a separate compute definition for
> each workload on a new target. In this particular case, all computation
> definitions would start with the original layout. Then there is a "schedule
> transformation" like transform layout which will generate the new st
> I'm still a bit confused with this approach, specifically how one would avoid
> having a separate compute definition for each workload on a new target
Indeed, it is important to avoid having a separate compute definition for each
workload on a new target. In this particular case, all computation definitions
would start with the original layout. Then there is a "schedule
transformation", like transform layout, which will generate the new st
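The separation can be sketched in plain Python: a single, layout-agnostic compute definition, plus a schedule-time layout transform driven by an index map. The index-map convention loosely mirrors `tir.Schedule.transform_layout`, but this is a standalone illustration, not TVM code.

```python
# One compute definition, many layouts: the layout transform is applied
# at "schedule time" via an index map, so no per-target compute
# definition is needed. Plain-Python sketch, not TVM API.

def compute(n):
    # Single, layout-agnostic compute definition.
    return [i * i for i in range(n)]

def transform_layout(flat, index_map, shape):
    # Rewrite the physical layout without touching the compute definition.
    out = [[None] * shape[1] for _ in range(shape[0])]
    for i, v in enumerate(flat):
        r, c = index_map(i)
        out[r][c] = v
    return out

data = compute(8)
# Target-specific choice: tile the flat buffer into a 2x4 layout.
tiled = transform_layout(data, lambda i: (i // 4, i % 4), (2, 4))
assert tiled == [[0, 1, 4, 9], [16, 25, 36, 49]]
```

Different targets would supply different index maps (and padded shapes) while reusing the same `compute` definition.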