I am not sure if the clarification of the packaging part is clear enough, but there
is actually a potential problem. The goal is to be able to conveniently
assemble code and metadata separately from the frontend in a modular way. The
generated artifact is intended to be usable by the AOT and graph runtimes.
Per-channel weight quantization is fully supported. I don't know much
about the TFLite frontend, but our PyTorch frontend fully supports per-channel
quantization.
This tutorial demonstrates importing a per-channel quantized PyTorch model:
https://docs.tvm.ai/tutorials/frontend/deploy_prequa
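For reference, a condensed sketch along the lines of that tutorial (the model choice and input name are just examples):
```
# Sketch: import a per-channel quantized torchvision model into Relay.
import torch
import torchvision
from tvm import relay

# fbgemm-quantized ResNet-18 uses per-channel weight quantization.
model = torchvision.models.quantization.resnet18(
    pretrained=True, quantize=True).eval()

input_shape = (1, 3, 224, 224)
scripted = torch.jit.trace(model, torch.randn(input_shape)).eval()

# The QNN ops in the resulting module carry per-channel scales/zero points.
mod, params = relay.frontend.from_pytorch(scripted, [("input", input_shape)])
```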
Hello there,
Welcome to the community! AFAIK, there is nothing in place for signed int8
symmetric quantization support in the TFLite frontend yet, even in master;
however, I believe the underlying code generation framework can support it with
the QNN dialect of Relay, based on this:
https://di
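For what it's worth, here is a minimal sketch of expressing signed-int8 symmetric quantization directly in the QNN dialect (symmetric simply means the zero point is fixed at 0); the shape and scale below are made up:
```
import tvm
from tvm import relay

data = relay.var("data", shape=(1, 16), dtype="float32")
scale = relay.const(0.05, "float32")
zero_point = relay.const(0, "int32")  # symmetric => zero point is 0

q = relay.qnn.op.quantize(data, scale, zero_point, out_dtype="int8")
dq = relay.qnn.op.dequantize(q, scale, zero_point)

mod = tvm.IRModule.from_expr(relay.Function([data], dq))
print(mod)
```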
Some thoughts:
1. I think they should be based on keys. Ideally, we should not think about
generic dispatching but about a collection of strategies that can be applied. For
example, if the keys include `[gpu, cuda, tensorcore]`, then it means we can
apply all the strategies registered for these three keys (see the sketch below).
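To illustrate what I mean by key-based registration, here is a minimal sketch using the existing `generic_func` mechanism (the function itself is a placeholder, not a proposed API, and the syntax assumes a TVM version where `tvm.target.Target("cuda")` can be used as a context):
```
import tvm

@tvm.target.generic_func
def pick_impl(attrs):
    # fallback implementation for targets with no matching key
    return "generic"

@pick_impl.register(["cuda", "gpu"])
def pick_impl_cuda(attrs):
    return "cuda-or-gpu"

# The current target's keys (e.g. [cuda, gpu]) decide which registration applies.
with tvm.target.Target("cuda"):
    print(pick_impl(None))  # -> "cuda-or-gpu"
```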
I stand with Tianqi on the `target_host` attribute, as it encapsulates the
information required to compile for a device and can simplify the
transformation passes in the TVM stack. I have a few questions about the new
target specification.
1. How will the generic function and dispatching work wi
I like the modularized setup that decouples the metadata from the code. It
would be great to brainstorm and discuss the naming candidates
for the `PackingModule`.
Also cc @junrushao1994 @FrozenGene
Per offline discussions, here is a summary of the updated proposal:
* The original proposal uses a runtime module to maintain both the JSON and the
metadata (e.g., constant weights) together. As @tqchen pointed out, although
this is simple to implement, it is hard to debug and cannot be shared ov
Thanks for the example. One of our goals is to consolidate the settings into a
single target so that the configuration becomes simple. In this case it should
be the system target.
I still think it is useful to allow an optional `target_host` (we can also
change the name if we find a better alternative).
@cloudhan Thanks for your info. @icemelon9 Do we have any work related to
dynamic axis range?
In terms of codegen, efficiency (and also how to limit the number of
buckets without losing too much performance) is indeed one of the difficult parts. We are working
on improving some fundamental infra to see ho
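To make the bucketing trade-off concrete, here is a toy illustration (plain Python, not TVM code): rounding a dynamic length up to the next power of two keeps the number of compiled kernels small at the cost of some padding.
```
def pick_bucket(length, max_len=1024):
    # Round the dynamic length up to the next power of two, capped at max_len.
    bucket = 1
    while bucket < length:
        bucket *= 2
    return min(bucket, max_len)

assert [pick_bucket(n) for n in (1, 3, 100, 700)] == [1, 4, 128, 1024]
```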
In such a case we can have
```
GPU_target = [ id: "cuda" ]                // non-composite, no target_host
System_target = [ id: "cuda", id: "cpu" ]  // composite
optimize(E1, GPU_target)                   // func, target
set<pair<func, target>> S = legalize(E1, System_target)
for s in S:
    low_level_optimize(s.first
```
That is why we might want to have a target host field in the device target, as
in the above examples. The split-host-device pass can pick up the target host
field and split out the host-driving part into a program whose target is set to
the `target_host`.
Due to the restrictions of the target device (e.g.
If it's possible that the entire E1 can be compiled for a single device, then it
makes sense to treat it as device code. In such a case, moving `alloc` to the
host could be treated as an "optimization" that is specific to this target.
However, if E1 has a non-composite target, how would that op
I do not disagree. The sticky point is how we categorize the "host-driving"
part (memory allocation, kernel launch parameter computation) of the
target program.
We do not intend to categorize an arbitrary CPU + GPU program as a "GPU program".
Under V0, a device-target (with target host) program can
I'm not opposed to composite targets; I'm arguing that the way we handle
composite targets should not depend on which targets are members of the
composite target. Whether it's "CPU+GPU" or "CPU+GPU+DSP", the logic of the
analysis should be the same. The decisions it makes can be different,
o
If a program contains both a GPU and a DSP, then the target is `composite` (which
is supported), with both device targets' `target_host` pointing to
the same host. Given that the target host is optional, we could also not
specify the target host in this case, assuming the host is clear.
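For concreteness, a purely hypothetical composite-target configuration along these lines (field names are illustrative, not the proposed schema) could look like:
```
# Hypothetical sketch only: the field names are not part of the actual spec.
composite_target = {
    "kind": "composite",
    "devices": [
        {"kind": "cuda",    "target_host": {"kind": "llvm"}},
        {"kind": "hexagon", "target_host": {"kind": "llvm"}},  # the DSP
    ],
}
```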
@kevinthesun Any timeframe?
Off topic, but I want to mention that TensorRT supports dynamic shapes from 7.0 onward. To
provide better performance, it supports multiple optimization profiles for
different shape ranges. Say your input is 1-D and ranges from 1 to 1024. You can
create profiles for whatever shapes you specify.
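A rough sketch of those profiles using the TensorRT >= 7 Python API (the input name and the "opt" shape are just examples):
```
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# ... define the network with a dynamic 1-D input named "input" ...

# One profile covering lengths 1..1024, tuned around 512.
profile = builder.create_optimization_profile()
profile.set_shape("input", min=(1,), opt=(512,), max=(1024,))
config.add_optimization_profile(profile)
# More profiles can be added for other shape ranges.
```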
V0 is not really well defined. Consider some hardware that has both a GPU and
a DSP (as well as a host CPU). If you write a program for this system, is
it a GPU program or a DSP program? What target will TOPI assume for the
operators in this program?
When you consider an example for