We are proposing a new dialect named `QNN` that introduces quantized versions of existing Relay operators. The goal is to support models that have been pre-quantized in a framework.
Some important notes about the QNN dialect:

* QNN operators are lowered to existing Relay operators, so we can reuse the Relay infrastructure.
* The code resides in a new directory: Python files in `python/relay/qnn` and C++ files in `src/relay/qnn`.
* QNN, like any other dialect, introduces new Relay passes. These passes deal only with QNN ops (e.g., lowering QNN ops to existing Relay ops). For generic optimizations, we rely on existing Relay passes.

We can use this thread to discuss various open questions, such as:

1) Code organization, namespaces, and API design.
2) QNN operator lowering: infrastructure, the correct sequence of Relay operations, etc.
3) Ways to add new operators efficiently with minimal engineering effort.
4) Requirements (if any) for new generic Relay passes to achieve good performance.
5) Any new bugs that arise as we start testing integer computations more thoroughly.

The idea of a QNN dialect grew out of the discussion at https://github.com/dmlc/tvm/issues/2351. Thanks @tqchen @FrozenGene @jackwish @jnorwood @shoubhik for the discussions.

First few PRs for the QNN dialect:

* Requantize operator: https://github.com/dmlc/tvm/pull/3531
* Quantize and Dequantize operators: https://github.com/dmlc/tvm/pull/3512