For my current use case (RoCC accelerators), yes. Actually, I do not even need
a full-scale barrier between the cores: just an `__asm__ volatile("fence");`
would be sufficient. As I said in my previous reply, I'm wondering whether
there are actually use cases for the `prologue` part.
---
If I understand correctly, you suggested we might require a `fence` instruction
to block the execution flow until data has been fully flushed back to DRAM.
Therefore, I'm not quite sure whether we really need `prologue`.
If what we really need is just the `epilogue` pragma, I think `barrier` might be
a better name for it.
---
I think this problem would be trivial if it were possible to inline a reduction.
For example, when computing GEMM `C += A * B`, we could simply inline the
multiplication into the addition, so the two stages would become one.
However, inlining a reduction is currently not supported. See issue #90
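To make the two stages concrete, here is a minimal TE sketch of that situation (shapes and names are illustrative): `P` is the multiplication stage containing the reduction, and `C` is the addition stage that consumes it.

```python
import tvm
from tvm import te

M = N = K = 128
A = te.placeholder((M, K), name="A")
B = te.placeholder((K, N), name="B")
Cin = te.placeholder((M, N), name="Cin")  # the existing accumulator
k = te.reduce_axis((0, K), name="k")

# Stage 1: the multiplication, which contains the reduction over k.
P = te.compute((M, N), lambda i, j: te.sum(A[i, k] * B[k, j], axis=k), name="P")
# Stage 2: the addition C += A*B, written functionally.
C = te.compute((M, N), lambda i, j: Cin[i, j] + P[i, j], name="C")

s = te.create_schedule(C.op)
# Fusing the two stages would need the line below, but inlining a stage
# that contains a reduction is rejected by the schedule:
# s[P].compute_inline()
```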
---
I agree with @ajtulloch that perhaps it would be helpful to explore whether we
can do that automatically.
For example, we could write a custom pass that inserts the necessary memory
fences when it detects RW dependencies between the scratchpad and the data
(when they correspond to the same underlying buffer).
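As a toy illustration of what such a pass could look like (the `scratch` buffer-name convention and the extern `fence` symbol are assumptions for the sketch, and real RW dependence analysis is elided):

```python
import tvm
from tvm import tir

def insert_fences(stmt):
    """Toy sketch: emit an extern fence() call after each store into a
    buffer we assume to be scratchpad memory. The buffer is identified
    by name here; a real pass would run actual RW dependence analysis."""
    def postorder(op):
        if isinstance(op, tir.BufferStore) and "scratch" in op.buffer.name:
            fence = tir.Evaluate(tir.call_extern("int32", "fence"))
            return tir.SeqStmt([op, fence])
        return None  # leave other statements unchanged
    return tir.stmt_functor.ir_transform(stmt, None, postorder, ["tir.BufferStore"])
```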
---
That should be doable as well. I think that should be a `Block(prologue, For,
epilogue)` though, as we still want the loop, not just the body.
However, I'm wondering whether this would become a common pattern used across
many targets. So far, non-trivial accelerators with the [RoCC
interface](https
---
Couldn't this be implemented as a custom IR pass (in Python or C++) instead of
as a new scheduling primitive? This is essentially taking the body `b` of a
`For` and replacing it with `Block(prologue, b, epilogue)`, right?
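A minimal sketch of that pass in Python, assuming the proposed pragmas would surface in TIR as `AttrStmt` nodes with keys `pragma_prologue` / `pragma_epilogue` carrying the extern function name (pragmas do lower to `pragma_*` attributes today, but this particular pragma is still only a proposal), and using `SeqStmt` (the successor of the old `Block` node) for the sequencing:

```python
import tvm
from tvm import tir

@tvm.tir.transform.prim_func_pass(opt_level=0)
def wrap_prologue_epilogue(f, mod, ctx):
    """Rewrite a loop tagged with the (hypothetical) prologue/epilogue
    pragmas into SeqStmt([prologue_call, loop, epilogue_call])."""
    def postorder(op):
        if isinstance(op, tir.AttrStmt) and op.attr_key == "pragma_prologue":
            call = tir.Evaluate(tir.call_extern("int32", op.value.value))
            return tir.SeqStmt([call, op.body])
        if isinstance(op, tir.AttrStmt) and op.attr_key == "pragma_epilogue":
            call = tir.Evaluate(tir.call_extern("int32", op.value.value))
            return tir.SeqStmt([op.body, call])
        return None
    return f.with_body(
        tir.stmt_functor.ir_transform(f.body, None, postorder, ["tir.AttrStmt"]))
```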
---
For alternative names, I've been thinking about some, but chose
prologue/epilogue because of the consistency between the two (`pro`-/`epi`-
plus `-logue`). Some alternatives I've considered:
- preamble/conclusion
- before_body/after_body
- pre/post

Regarding the implementation, I'm not familiar with
---
cc @liangfu @yzhliu @ajtulloch @vinx13 @thierry, I would love to know your
thoughts.
---
Thanks for the RFC, I think the proposed pragma is reasonable. However, in
terms of implementation, it would be great if we did a rewriting pass (like in
lower_tvm_intrin) to lower it to a call node before codegen, so we do not need
to handle these pragmas in the codegen phase.
It would also be
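For illustration, a sketch of how such a rewriting pass could be spliced into lowering through the `tir.add_lower_pass` config, so codegen never sees the pragma (the pass body is left as a stub, and the schedule is a placeholder just to show the hookup):

```python
import tvm
from tvm import te

@tvm.tir.transform.prim_func_pass(opt_level=0)
def lower_extern_pragmas(f, mod, ctx):
    # ... rewrite the pragma AttrStmts into tir.call_extern Evaluate
    # nodes here (see the ir_transform sketch earlier in this thread) ...
    return f

# Minimal schedule purely to demonstrate the hookup.
A = te.placeholder((16,), name="A")
B = te.compute((16,), lambda i: A[i] + 1.0, name="B")
sched = te.create_schedule(B.op)

# Phase 2 is an arbitrary insertion point; the pragma is rewritten
# during lowering, before any codegen runs.
with tvm.transform.PassContext(config={"tir.add_lower_pass": [(2, lower_extern_pragmas)]}):
    lowered = tvm.lower(sched, [A, B])
```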
---
Glad to see this work continued!
Attaching a link to the comment where we reported on the original Autodiff vs
Relay compatibility:
https://github.com/apache/incubator-tvm/issues/2562#issuecomment-461814676
---
By default, the log goes to stderr and not to a file. The Python side of the
log can be configured using `logging.basicConfig`. For the C++ side, there is
DLOG, which can be turned off via an environment variable or by disabling the
debug options.
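For example, the Python side can be redirected to a file and restricted to warnings like this (the filename is just an example):

```python
import logging

# Route TVM's Python-side logging to a file and keep only WARNING and above.
logging.basicConfig(filename="tvm_compile.log", level=logging.WARNING)
```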
---
Is it possible to change the log level for the TVM compiler? Also, where can
the logs be found?
---
@jjohnson-arm Do you want to send a PR about it? Otherwise I will, no problem.