We are proposing a new dialect named `QNN` that introduces quantized versions 
of existing Relay operators. The goal is to support models that have been 
pre-quantized in their source frameworks. 

Some important notes about the QNN dialect:
* QNN operators are lowered to existing Relay operators to ensure that we can 
reuse Relay infrastructure.
* Code resides in new directories. Python files are in `python/relay/qnn` and 
C++ files are in `src/relay/qnn`.
* QNN, like any other dialect, introduces new Relay passes. These passes only 
deal with QNN ops (like lowering of QNN ops to existing Relay ops). For any 
generic optimization, we rely on existing Relay passes.
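To make the lowering idea concrete, here is a NumPy sketch of the affine quantization scheme that QNN ops are built around. The helper names `quantize`/`dequantize` and the default `uint8` range are illustrative assumptions for this sketch, not the exact QNN operator API; the real ops express the same arithmetic as compositions of existing Relay operators.

```python
import numpy as np

def quantize(x, scale, zero_point, qmin=0, qmax=255):
    # Affine quantization: q = clip(round(x / scale) + zero_point).
    # Hypothetical helper mirroring what a qnn.quantize op computes.
    q = np.round(x / scale) + zero_point
    return np.clip(q, qmin, qmax).astype(np.uint8)

def dequantize(q, scale, zero_point):
    # Inverse map back to approximate float values:
    # x_hat = (q - zero_point) * scale.
    return (q.astype(np.float32) - zero_point) * scale

x = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q = quantize(x, scale=0.0078125, zero_point=128)      # -> [0, 128, 192, 255]
x_hat = dequantize(q, scale=0.0078125, zero_point=128)
```

Note that `1.0` saturates to `255` here, so its round trip recovers `0.9921875` rather than `1.0`; clipping to the integer range is part of the operator semantics, not an implementation detail.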

We can use this thread to discuss various open questions, such as:
1) Code organization, namespaces, API discussion.
2) QNN operator lowering - Infrastructure, correct sequence of Relay operations 
etc.
3) Ways to efficiently add new operators with minimal engineering efforts.
4) Requirements (if any) of new generic Relay passes to achieve good 
performance.
5) Any new bugs that arise as we start testing integer computations more 
thoroughly.

The idea of QNN dialect was a result of discussion at Issue 
https://github.com/dmlc/tvm/issues/2351. Thanks @tqchen @FrozenGene @jackwish  
@jnorwood @shoubhik for the discussions.

First few PRs for the QNN dialect:
* Requantize operator - https://github.com/dmlc/tvm/pull/3531
* Quantize and Dequantize operator - https://github.com/dmlc/tvm/pull/3512
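As context for the requantize PR: requantization re-expresses an integer tensor in a new scale/zero-point pair, which is needed, for example, between a quantized convolution's int32 accumulator and its uint8 output. The float-arithmetic sketch below is illustrative only; the actual operator avoids floating point by using a fixed-point multiplier, and the helper name and signature here are assumptions, not the QNN API.

```python
import numpy as np

def requantize(q_in, in_scale, in_zp, out_scale, out_zp, qmin=0, qmax=255):
    # Recover the real value, then re-quantize it in the output scheme.
    # real  = (q_in - in_zp) * in_scale
    # q_out = clip(round(real / out_scale) + out_zp)
    real = (q_in.astype(np.int32) - in_zp) * in_scale
    q_out = np.round(real / out_scale) + out_zp
    return np.clip(q_out, qmin, qmax).astype(np.uint8)

q_in = np.array([0, 128, 255], dtype=np.uint8)
q_out = requantize(q_in, in_scale=0.1, in_zp=128, out_scale=0.2, out_zp=128)
# Halving the resolution (scale 0.1 -> 0.2) halves the distance from the
# zero point: [0, 128, 255] -> [64, 128, 192].
```

Getting this rounding and clipping sequence right in pure integer arithmetic is exactly the kind of detail the requantize PR has to pin down.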
