Yes, I'd very much support a patch to get you going again. I'm confused as to
why just setting the virtual_device_ in the visitor directly does not work, so
option (a) is you send me a unit test and I dig into that. Option (b) is your
patch; however, since you've needed to bounce back to C++, perhaps
Hi Rafael, virtual device handling is unfortunately in a halfway-implemented
state, and it's been on my backlog for a while to wrap that up. Sorry about
that! I'm hoping I can work on it in a few weeks as a break between other tasks.
There are a few things to be done:
- Populate the virtual_dev
Name supplies usually have both a cache lookup:
```
name_supply.UniqueGlobalFor("__main__")
```
and a hinted fresh name generator:
```
name_supply.FreshGlobalWithPrefix("my_module", "my_var", "any_other_prefix")
```
---
Other than the plumbing, is there an issue with threading a name supply so that
globals have a unique and appropriately hinted name at birth? It's not too hard
to support name supply splitting such that names can be drawn from independent
supplies without collision. It is also possible to refi
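Since splitting keeps coming up, here's a minimal Python sketch of the idea; none of these names are an existing TVM API, they just illustrate how sub-supplies can hand out names without colliding:
```
# Illustrative only -- not an existing TVM API. A name supply that can be
# split into independent sub-supplies; each split bakes its tag into the
# prefix of every name it issues, so (given distinct, non-overlapping tags)
# names drawn from different splits cannot collide.
class NameSupply:
    def __init__(self, prefix=""):
        self._prefix = prefix
        self._counts = {}

    def fresh(self, hint):
        """Return a unique name within this supply, derived from the hint."""
        n = self._counts.get(hint, 0)
        self._counts[hint] = n + 1
        name = f"{self._prefix}{hint}"
        return name if n == 0 else f"{name}_{n}"

    def split(self, tag):
        """Return an independent sub-supply for names tagged with `tag`."""
        return NameSupply(prefix=f"{self._prefix}{tag}_")


supply = NameSupply()
fuse_names = supply.split("fuse")
lower_names = supply.split("lower")
assert fuse_names.fresh("conv2d") != lower_names.fresh("conv2d")
```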
```
backendA = MyUMABackendA()
backendB = MyUMABackendB()
backendA.register()
backendB.register()
mod = backendA.partition(mod)
mod = backendB.partition(mod)
```
Ah, that's the example I was missing (sorry!). After registration I think
calling backend.partition or letting CollagePartition 'do it fo
One more Collage/UMA overlap aspect: Collage distinguishes 'registered'
backends (i.e. just TargetKinds) from 'activated' backends (i.e. Target objects in
the provided build targets). I think, though, the proposal here is that the act of
registration is also activation? I need help understanding how this w
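To check my understanding of that distinction, here's a hedged Python sketch; the "my_accel" kind is made up, and it assumes UMA registration has already created a TargetKind of that name:
```
import tvm

# 'Registered': a TargetKind named "my_accel" exists, so a Target of that
# kind can be constructed at all (this line would raise if nothing had
# registered the kind).
accel = tvm.target.Target("my_accel")

# 'Activated' in Collage's sense: a Target *instance* of that kind is
# included among the build targets handed to the compilation flow, so the
# partitioner is allowed to offload to it.
build_targets = [tvm.target.Target("llvm"), accel]
```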
Apologies for not following the conversation in detail in real time. Here are
some thoughts on how we can make sure an UMA-integrated accelerator is also a
Collage-supported 'backend'.
- The registration of patterns will need to support the existing triple of
(pattern name, pattern, predicate)
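For reference, here's a sketch of what one such (pattern name, pattern, predicate) entry looks like using the existing dataflow-pattern machinery; the "my_accel" name and the predicate's dtype check are illustrative only:
```
from tvm.relay.dataflow_pattern import is_op, wildcard

def make_conv2d_bias_pattern():
    # Match a conv2d followed by a bias add.
    conv = is_op("nn.conv2d")(wildcard(), wildcard())
    return is_op("nn.bias_add")(conv, wildcard())

def conv2d_bias_predicate(expr):
    # Only offload float32 results (illustrative check only).
    return expr.checked_type.dtype == "float32"

# The triple Collage would also need to consume: (name, pattern, predicate).
my_accel_patterns = [
    ("my_accel.conv2d_bias", make_conv2d_bias_pattern(), conv2d_bias_predicate),
]
```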
Thanks @areusch.
> (tvm's present FuseOps pass is greedy and always combines kernels, but
> MetaScheduler does not necessarily
Just a small clarification: TVM's FuseOps is greedy, but CollagePartitioner may
require measuring (and thus tuning) many more TVM partitions/kernels. For
autotvm this is
PTAL. Barring more GitHub incompetence on my part, I think I covered your
comments @manupa-arm. Thanks again.
--
> I think you may have missed the comments that are hidden by GitHub
D'oh!
Hidden resolved conversations != hidden because, you know, the page looks ugly
if it's too long, or something.
--
PTAL, bumped to 0.81. Thank you @manupa-arm, @mbaret and @cgerum, you've been
a great help.
--
Thanks @manupa-arm for slogging through.
> This might help bring the cost estimation down, but it will still explore
> the combinatorial partitioning possibilities of ops x backends, right?
> (lmk if I am wrong here).
Any sensible implementation of CostEstimator should cache by
> Is there a model being developed to estimate latency of the chosen
> partitioning?
No. As far as Collage is concerned it just calls the abstract
CostEstimator::Estimate interface for each candidate partition, and can remain
ignorant as to where those costs come from. In the prototype it is h
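To make the caching point above concrete, here's a rough Python sketch of an estimator that memoizes by a (structural hash of the candidate, target) key; measure_latency is a hypothetical stand-in for building and benchmarking the candidate partition:
```
import tvm

def measure_latency(candidate_mod, target):
    """Hypothetical: build the candidate partition and benchmark it."""
    raise NotImplementedError

class CachingCostEstimator:
    """Sketch of a CostEstimator-like class that never measures the same
    (candidate, target) pair twice."""

    def __init__(self):
        self._cache = {}

    def estimate(self, candidate_mod, target):
        key = (tvm.ir.structural_hash(candidate_mod), str(target))
        if key not in self._cache:
            self._cache[key] = measure_latency(candidate_mod, target)
        return self._cache[key]
```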
Hi all, the next revision of the RFC is up, incorporating comments from the
first round of reviews.
---
PTAL all, thanks.
--
Bumped version to 0.8 based on extensive reworking due to the excellent
comments above -- thanks!
I still need to expand the PartitionRule section, should be ready tomorrow.
--
Hey all, heads up: I'm taking a close look at the 'pass ordering' problem hiding
in this RFC. That is, as written and prototyped, CollageFuseOps runs just
before the current FuseOps so that it can 'see' rewrites which guide TVM's
native fusion rules. However, some of those rewrites are target spec
Hi folks, I've just put up a PR describing our plans for 'Collage' here at
OctoML:
https://github.com/apache/tvm-rfcs/pull/62
This work is derived from the preprint:
> *Collage: Automated Integration of Deep Learning Backends*
> Byungsoo Jeon, Sunghyun Park, Peiyuan Liao, Sheng Xu, Tianqi Che
Closing as obsolete, since most of this is either already done or has been
subsumed by the Collage proposal.
--
Closed #38.
--
Agree with your last sentence -- FoldConstants should be CPU only and not carry
forward any target-specific flags. (Ideally we'd do all that more directly instead
of piggy-backing on the interpreter, but that's a bigger issue.)
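For context, a small runnable sketch of the behaviour in question: FoldConstant evaluates constant subexpressions at compile time by piggy-backing on the interpreter, which is why it should not pick up target-specific flags:
```
import tvm
from tvm import relay

# A module whose function body is a purely constant expression.
body = relay.const(2.0) + relay.const(3.0)
mod = tvm.IRModule.from_expr(relay.Function([], body))

# FoldConstant evaluates the constant subexpression (via the interpreter),
# leaving just the constant 5.0 in the function body.
mod = relay.transform.FoldConstant()(mod)
print(mod)
```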
---
Hi Michael, thanks for the proposal! Like others I'm very supportive of
tightening up the BYOC interfaces.
My group here at OctoML has been looking at bringing a backend placement
search capability to TVM, a la the 'Collage' paper
(https://arxiv.org/pdf/2111.00655.pdf). Under that approach t
Hi @wrongtest, thanks for the nice write up.
> Currently, we have to hack the compile engine to find the pre-scheduled
> PrimFunc from a standalone cache, we are glad to know what is the best way to
> achieve this goal.
Here are some thoughts; please correct any misunderstandings I might have.
Here's where we've gotten to:
1. TECompiler has fully replaced CompileEngine :-)
2. te_compiler_cache.cc is still in a pretty messy state; the cache data structures
themselves can be simplified, however we still have customers of
TECompiler::Lower etc. via te_compiler.py.
3. After #9483 all backends w
Thumbs up from me, I'd like to see this proceed to PRs.
- Agree with adding config as a first-class field to IRModule.
- Backwards compatibility for the build API should be straightforward via an
extension of the existing `if isinstance` checks. We can emit deprecation
warnings for a release or two.
- I thin
Thanks for the comments @areusch . Now that I've started working on this (with
an emphasis on handling memory scopes) I've decided to shift focus a bit. PTAL.
--
More notes to self:
- You can get the device_type for a Target from target_kind->device_type.
Somehow I missed that very obvious fact.
- Eric has a very nice write-up explaining devices vs targets vs the device API at
docs/dev/device_target_interactions.rst
--
Note to self: The `With` convention should probably also be removed by
this work, but I've not audited the code to see how pervasive it is.
--
Thanks @manupa-arm for the reminder that there were some good comments on #8892. I
see a work stream:
1. get the multi-target handling under control, and stop relying on the
limiting device type -> target mapping
2. allow device planning to be re-run with additional memory scope constraints
to furthe
No objections from me.
--
While working on unifying some device planning passes and cleaning up the
TECompiler we noticed we have a lot of issues around how we transition from
devices to Targets in the heterogeneous compilation flow. This pre-RFC is a stab
at fixing those. There's a somewhat larger issue around bringing BY
LGTM again (CI will go faster that way, right?)
--
This tutorial registers a global layout transformation for conv2d for all
targets, which is not well-formed. Later uses of conv2d in the tutorials
pick that layout up and then fail an assert in the conv2d type relation.
Better would be to register a transform for an entirely fake target, but
that is beyon
> (IRModule, Function) -> (IRModule, GlobalVar)
I'm still in favor of this signature since it's a cheap and cheerful way to
ensure we don't end up with N ways to implement the lower-and-rewrite-calls
machinery embedded in te_compiler. I think my earlier question still stands:
> Have you tried out
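To spell out the signature being argued for, here's a hedged, typed Python sketch; the hook and variable names are illustrative only:
```
from typing import Tuple
import tvm
from tvm import relay

def relay_to_tir_hook(mod: tvm.IRModule,
                      func: relay.Function) -> Tuple[tvm.IRModule, tvm.ir.GlobalVar]:
    # Illustrative only: lower `func`, install the result in `mod` under a
    # fresh GlobalVar, and return both so the caller can rewrite call sites.
    gvar = tvm.ir.GlobalVar("my_lowered_func")  # hypothetical name
    # ... lowering of `func` to a PrimFunc bound to `gvar` would go here ...
    return mod, gvar
```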
This LGTM, but I'm thinking that going back to relay_to_tir being a
Function->PrimFunc packed func, as you implemented in the companion PR, is better
since it makes it easier for all the caching, call rewriting, dynamic shape
handling and other bookkeeping to be shared. Have you tried out the pass
appr
And I think the just-another-pass approach implies some of the private
machinery in te_compiler needs to be hoisted so it's reusable for all
lowering-like passes, e.g. LowerTensorExprMutator. So instead of
monolithic lowering + target-specific callbacks
we have
target-specific lowering passes +
> One core distinction here is that the tir_to_runtime hook isn't a pass, it's
> an override of code generation.
Ah, excellent, that's where we've been trying to get to, but when we started
this conversation the refactoring to support it seemed too much to foist upon
you. But we've now (thanks Li
Can I try an example? Again, this is all motivated by working in the device
planning code so my pov may be skewed.
Let's say we've implemented the above and we have three target labels and their
associated definitions:
```
cpu: llvm ...
gpu1: ... device_id=0
gpu2: ... device_id=1
```
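For example, a hedged sketch of how those labels might be written down in Python; keying by label rather than by device type is exactly what's under discussion here, so treat this as illustrative rather than an existing API:
```
import tvm

# Illustrative only: three target labels and their associated definitions,
# mirroring the example above. Two distinct labels are needed for the two
# CUDA devices precisely because the current device-type -> target mapping
# cannot tell them apart.
targets = {
    "cpu": tvm.target.Target("llvm"),
    "gpu1": tvm.target.Target("cuda"),  # intended for device_id=0
    "gpu2": tvm.target.Target("cuda"),  # intended for device_id=1
}
```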
Coming late to the party here. As it happens I'm working on trying to separate
device planning from memory planning as part of the 'unified lowering' effort.
I've noticed 'device' in this setting means 'DLDeviceType' or the
default/invalid '0' type. A few parts of the code use DLDevice, thus