[TVM Discuss] [Development/RFC] [BYOC][runtime] JSON runtime for BYOC

2020-06-03 Thread Zhi via TVM Discuss
I am not sure if the clarification of the packaging part is clear enough, but there is actually a potential problem. The goal is to be able to conveniently assemble code and metadata separately from the frontend in a modular way. The generated artifact is intended to be usable by AOT, graph runtime
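A minimal sketch of the separation being discussed, assuming a hypothetical external codegen that emits a JSON subgraph plus a dict of constant weights (the helper layout and file names are illustrative, not the actual BYOC API):

```python
import json
import numpy as np
import tvm

# Hypothetical artifacts from an external codegen: the subgraph structure
# as JSON, and the constant weights kept separately.
subgraph_json = json.dumps({
    "nodes": [
        {"op": "input", "name": "data"},
        {"op": "dnnl.conv2d", "name": "conv", "inputs": ["data", "weight"]},
    ],
})
weights = {"weight": tvm.nd.array(np.random.rand(8, 3, 3, 3).astype("float32"))}

# Code (JSON) and metadata (weights) are stored as separate artifacts, so
# the same weights blob can be consumed by AOT, graph runtime, or VM.
with open("subgraph.json", "w") as f:
    f.write(subgraph_json)
with open("subgraph.params", "wb") as f:
    f.write(tvm.runtime.save_param_dict(weights))
```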

[TVM Discuss] [Development] Per-axis quantization support for TFLite

2020-06-03 Thread masahi via TVM Discuss
Per-channel weight quantization is fully supported. I don't know much about the TFLite frontend, but our PyTorch frontend fully supports per-channel quantization. This tutorial demonstrates importing a per-channel quantized PyTorch model. https://docs.tvm.ai/tutorials/frontend/deploy_prequa
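A condensed sketch of that flow, following the linked tutorial's outline (the choice of model and qconfig here are assumptions, not taken from the tutorial):

```python
import torch
import torchvision
from tvm import relay

# Post-training static quantization; the "fbgemm" qconfig uses
# per-channel weight quantization on x86.
model = torchvision.models.resnet18(pretrained=True).eval()
model.qconfig = torch.quantization.get_default_qconfig("fbgemm")
prepared = torch.quantization.prepare(model)
prepared(torch.rand(1, 3, 224, 224))        # calibration pass
quantized = torch.quantization.convert(prepared)

# Trace and import into Relay; per-channel scales become QNN attributes.
script = torch.jit.trace(quantized, torch.rand(1, 3, 224, 224))
mod, params = relay.frontend.from_pytorch(script, [("input", (1, 3, 224, 224))])
```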

[TVM Discuss] [Development] Per-axis quantization support for TFLite

2020-06-03 Thread Ramana Radhakrishnan via TVM Discuss
Hello there, welcome to the community! AFAIK, there is nothing in place for signed int8 symmetric quantization support in the TFLite frontend yet, even in master; however, I believe the underlying code generation framework can support it with the qnn dialect of Relay, based on this https://di
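For illustration, a symmetric signed-int8 convolution expressed directly in the QNN dialect: symmetric quantization means both zero points are fixed at 0 (the shapes and scale values below are made up):

```python
from tvm import relay

data = relay.var("data", shape=(1, 3, 224, 224), dtype="int8")
weight = relay.var("weight", shape=(16, 3, 3, 3), dtype="int8")

# Symmetric int8: both zero points are 0, so only scales carry the
# quantization parameters.
conv = relay.qnn.op.conv2d(
    data, weight,
    input_zero_point=relay.const(0, "int32"),
    kernel_zero_point=relay.const(0, "int32"),
    input_scale=relay.const(0.5, "float32"),
    kernel_scale=relay.const(0.5, "float32"),
    kernel_size=(3, 3),
    channels=16,
)
```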

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread tqchen via TVM Discuss
Some thoughts: 1. I think they should be based on keys. Ideally, we should not think about generic dispatching but about a collection of strategies that can be applied. For example, if the keys include `[gpu, cuda, tensorcore]`, then it means we can apply all the strategies registered for these three
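A toy illustration of key-based strategy collection (plain Python, not the actual TVM registry): every strategy registered under any of the target's keys is applicable, rather than dispatching to a single generic implementation.

```python
# Strategy registry keyed by target key; a target's applicable strategies
# are the union over all of its keys.
STRATEGIES = {
    "gpu":        ["injective.cuda", "reduction.cuda"],
    "cuda":       ["conv2d.cuda.direct", "conv2d.cuda.winograd"],
    "tensorcore": ["conv2d.cuda.tensorcore"],
}

def applicable_strategies(target_keys):
    found = []
    for key in target_keys:
        found.extend(STRATEGIES.get(key, []))
    return found

print(applicable_strategies(["gpu", "cuda", "tensorcore"]))
```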

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread Haichen Shen via TVM Discuss
I stand with Tianqi on the `target_host` attribute, as it encapsulates the information required to compile for a device and can simplify the transformation passes in the TVM stack. I have a few questions about the new target specification. 1. How will the generic function and dispatching work wi

[TVM Discuss] [Development/RFC] [BYOC][runtime] JSON runtime for BYOC

2020-06-03 Thread tqchen via TVM Discuss
I like the modularized setup that decouples the meta-data from the code. It would be great to have a brainstorm and discussion about the naming candidates for the `PackingModule`. Also cc @junrushao1994 @FrozenGene

[TVM Discuss] [Development/RFC] [BYOC][runtime] JSON runtime for BYOC

2020-06-03 Thread Cody H. Yu via TVM Discuss
Per offline discussions, here is a summary of the updated proposal: * The original proposal uses a runtime module to maintain both the JSON and the metadata (e.g., constant weights) together. As @tqchen pointed out, although this is simple to implement, it is hard to debug and cannot be shared ov
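A sketch of what the split might look like at runtime, with hypothetical class names: one module holds only the JSON, while a shared metadata module owns the constants, so several backend modules can reference the same weights.

```python
import json

class MetadataModule:
    """Owns the constants once; shared by all backend runtime modules."""
    def __init__(self, consts):
        self.consts = consts            # name -> ndarray

    def lookup(self, name):
        return self.consts[name]

class JSONRuntimeModule:
    """Interprets a JSON subgraph; fetches weights from the metadata module."""
    def __init__(self, graph_json, metadata):
        self.graph = json.loads(graph_json)
        self.metadata = metadata

    def run(self, inputs):
        for node in self.graph["nodes"]:
            if node["op"] == "const":
                inputs[node["name"]] = self.metadata.lookup(node["name"])
        # ... dispatch each op to the external library here ...
```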

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread tqchen via TVM Discuss
Thanks for the example. One of our goals is to consolidate the settings into a single target so that the configuration becomes simple. In this case it should be the system target. I still think it is useful to allow an optional `target_host` (we can also change the name if we find a better alt
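For illustration, the shape this could take: a single device target that carries its own host target, instead of passing `target` and `target_host` separately through every API. This mirrors the `host` argument that later appeared on `tvm.target.Target`; treat the exact API as an assumption in the context of this thread.

```python
import tvm

# One consolidated target: the device target owns its host target.
dev_target = tvm.target.Target("cuda", host="llvm")

print(dev_target.kind.name)        # "cuda"
print(dev_target.host.kind.name)   # "llvm"
```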

Re: [apache/incubator-tvm] [RFC] Dynamic Shape Support - Graph Dispatching (#4118)

2020-06-03 Thread Yao Wang
@cloudhan Thanks for your info. @icemelon9 Do we have any work related to dynamic axis ranges? In terms of codegen, indeed efficiency (and also how to limit the number of buckets while losing as little performance as possible) is one of the difficult parts. We are working on improving some fundamental infra to see ho
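A toy sketch of the bucketing idea under discussion: compile one kernel per bucket boundary and dispatch by rounding the runtime shape up, trading kernel count against padding overhead (bucket sizes and names are hypothetical).

```python
import bisect

BUCKETS = [32, 64, 128, 256, 512, 1024]   # precompiled kernel sizes

def pick_bucket(runtime_len):
    """Round the dynamic dimension up to the nearest compiled bucket."""
    i = bisect.bisect_left(BUCKETS, runtime_len)
    if i == len(BUCKETS):
        raise ValueError("shape exceeds largest bucket")
    return BUCKETS[i]

# A length-100 input runs the 128 kernel; fewer buckets means fewer
# kernels to compile but more wasted compute from padding.
print(pick_bucket(100))   # 128
```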

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread Krzysztof Parzyszek via TVM Discuss
In such case we can have
```
GPU_target    = [ id: "cuda" ]               // non-composite, no target_host
System_target = [ id: "cuda", id: "cpu" ]    // composite

optimize(E1, GPU_target)
// (func, target) set
S = legalize(E1, System_target)
for s in S:
    low_level_optimize(s.first
```

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread tqchen via TVM Discuss
That is why we might want to have a target host field in the device target, as in the above examples. The split-host-device pass can pick up the target host field and split out the host-driving part into a program set to have the `target_host`. Due to the restrictions of the target device (e.g.

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread Krzysztof Parzyszek via TVM Discuss
If it's possible that the entire E1 can be compiled for a single device, then it makes sense to treat it as device code. In that case, moving `alloc` to the host could be treated as an "optimization" that is specific to this target. However, if E1 has a non-composite target, how would that op

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread tqchen via TVM Discuss
I do not disagree. The sticky point is how we categorize the "host driving" (memory allocation, kernel launch parameter computation) part of the target program. We do not intend to categorize an arbitrary CPU + GPU program as a "gpu program". Under V0, a device target (with target host) program can
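To make the "host driving" distinction concrete, a schematic sketch with hypothetical function names: everything outside the kernel body runs on the host even though the program as a whole is categorized as a device program.

```python
# Host-driving part: allocation and launch-parameter computation. This is
# what a split-host-device pass would assign to the target_host.
def host_side(n, device_alloc, launch):
    out = device_alloc(n * 4)           # memory allocation via host API
    grid = (n + 255) // 256             # kernel launch parameter math
    launch("vector_add_kernel", grid=grid, block=256, args=(out, n))
    return out

# Device part: only the kernel body itself is compiled for the device
# target (in real TVM this would be generated CUDA/LLVM, not Python).
```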

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread Krzysztof Parzyszek via TVM Discuss
I'm not opposed to composite targets; I'm arguing that the way we handle composite targets should not depend on which targets are members of the composite target. Whether it's "CPU+GPU" or "CPU+GPU+DSP", the logic of the analysis should be the same. The decisions it makes can be different, o

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread tqchen via TVM Discuss
If a program contains both a GPU and a DSP, then the target is `composite` (which is supported), with both device targets' `target_host` pointing to the same host. Given that the target host is optional, we could also not specify the target host in this case, assuming the host is clear
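A hypothetical configuration along these lines (the exact schema was still being designed in this RFC; the field names here are illustrative only):

```python
# Illustrative composite-target description: two device targets, each
# carrying the same host, so the host field could also be left implicit.
system_target = {
    "kind": "composite",
    "devices": [
        {"kind": "cuda", "host": {"kind": "llvm"}},
        {"kind": "dsp",  "host": {"kind": "llvm"}},
    ],
}
```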

Re: [apache/incubator-tvm] [RFC] Dynamic Shape Support - Graph Dispatching (#4118)

2020-06-03 Thread Cloud Han
@kevinthesun Any timeframe? Off topic, but I want to mention that TensorRT supports dynamic shapes as of 7.0. To provide better performance, it supports multiple optimization profiles for different shape ranges. Say your input is 1-D, ranging from 1 to 1024. You can create profiles for whatever shapes you specifie
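For reference, roughly how that looks with the TensorRT 7 Python API (the min/opt/max values are made up for a 1-D input ranging from 1 to 1024, and "input" is assumed to be the network's input name):

```python
import tensorrt as trt

builder = trt.Builder(trt.Logger(trt.Logger.WARNING))
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
config = builder.create_builder_config()

# One profile per shape range; TensorRT selects a matching profile at
# runtime based on the actual input shape.
profile = builder.create_optimization_profile()
profile.set_shape("input", min=(1,), opt=(256,), max=(1024,))
config.add_optimization_profile(profile)
```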

[TVM Discuss] [Development/RFC] [RFC] TVM Target Specification

2020-06-03 Thread Krzysztof Parzyszek via TVM Discuss
V0 is not really well defined. Consider some hardware that has both a GPU and a DSP (as well as some host CPU). If you write a program for this system, is it a GPU program or a DSP program? What target will TOPI assume for the operators in this program? When you consider an example for