## Motivation and Scope

The Arm Ethos-N series is a high-throughput, low-area neural network processor 
for ML inference from cloud to edge to endpoint. This processor and software 
driver stack supports a variety of popular neural networks, including CNNs and 
RNNs, for classification, object detection, image enhancements, speech 
recognition and natural language understanding. Arm has recently open-sourced 
the 
[ethos-n-driver-stack](https://github.com/ARM-software/ethos-n-driver-stack).  
The intention of this RFC is to integrate the driver stack into TVM so 
operations supported by the stack can be offloaded to the Ethos-N neural 
network processors.

## Proposal

Over the past several months, work has been ongoing in the area of graph 
partitioning. We propose to build on top of this work by defining 
merge-composite patterns that partition the Relay graph into sections that can 
be offloaded by the bring-your-own compiler (BYOC) infrastructure. The Ethos-N 
driver stack provides a compiler front-end, the Support Library (SL), that 
accepts a graph structure similar to the Relay graph structure. The "compile" 
phase of the BYOC passes the Relay operators to the SL which builds an internal 
graph. This graph is then compiled into a command stream: a description of the 
processing steps required to execute the inference on the Ethos-N processors. 
The command stream is included in the generated module as a blob.
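
As a rough illustration of the partitioning step, the sketch below greedily matches composite patterns against a flat sequence of operator names and marks each match for offload. Everything in it is an assumption made for illustration: the pattern names, the operator sequences and the `partition` helper are hypothetical, a real Relay graph is a dataflow graph rather than a flat list, and this is not the merge-composite pass itself.

```python
from typing import List, Tuple

# Hypothetical composite patterns: (pattern name, operator sequence).
# Longer patterns are listed first so they win over shorter ones.
PATTERNS: List[Tuple[str, List[str]]] = [
    ("ethos-n.qnn_conv2d", ["conv2d", "bias_add", "requantize"]),
    ("ethos-n.max_pool2d", ["max_pool2d"]),
]


def partition(ops: List[str]) -> List[Tuple[str, List[str]]]:
    """Split a flat operator list into offloadable and CPU regions."""
    regions = []
    i = 0
    while i < len(ops):
        for name, seq in PATTERNS:
            if ops[i:i + len(seq)] == seq:
                # The whole matched sequence becomes one composite function.
                regions.append((name, list(seq)))
                i += len(seq)
                break
        else:
            # No pattern matched at this position: the operator stays on the CPU.
            regions.append(("cpu", [ops[i]]))
            i += 1
    return regions


if __name__ == "__main__":
    graph = ["conv2d", "bias_add", "requantize", "softmax", "max_pool2d"]
    for target, region in partition(graph):
        print(target, region)
```

Operators that match no pattern simply fall through to the default TVM code generation, which is the behaviour the BYOC flow relies on.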

The packed function that is also generated by the BYOC infrastructure calls 
into a runtime inference function, passing in the command stream. This 
function sets up the necessary buffers if required. The command stream is then 
executed by a driver library included in the Ethos-N driver stack.

A conversion needs to take place between the Relay operators and the SL 
operators, e.g. for tensor descriptors and attributes, and some operations in 
Relay are combined in the SL. This conversion takes place when the composite 
functions are processed and handed over to the SL.

TVM supports a larger range of operators than the Ethos-N processor. In order 
to determine what is supported on Ethos-N, the SL supports an IsSupported() 
query mechanism. This will be used in the existing "check functions" as 
implemented in PR 5261.
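
A minimal sketch of how a check function could defer to such a query is shown below. It is plain Python with no TVM or driver stack dependency: `OpCall`, `sl_is_supported`, `register_check` and `can_offload` are hypothetical names, and the support rules inside `sl_is_supported` are arbitrary placeholders, not the actual capabilities of the Ethos-N processor.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple


@dataclass(frozen=True)
class OpCall:
    """Hypothetical stand-in for a Relay call node."""
    name: str
    dtype: str
    shape: Tuple[int, ...]


def sl_is_supported(call: OpCall) -> bool:
    """Stand-in for the SL IsSupported() query (placeholder rules only)."""
    placeholder_ops = {"conv2d", "max_pool2d"}
    return call.name in placeholder_ops and call.dtype == "uint8"


# Registry of check functions, keyed by operator name.
CHECK_FUNCTIONS: Dict[str, Callable[[OpCall], bool]] = {}


def register_check(op_name: str):
    """Decorator that registers a check function for one operator."""
    def decorator(fn: Callable[[OpCall], bool]):
        CHECK_FUNCTIONS[op_name] = fn
        return fn
    return decorator


@register_check("conv2d")
def check_conv2d(call: OpCall) -> bool:
    # The check function simply forwards the question to the SL query.
    return sl_is_supported(call)


def can_offload(call: OpCall) -> bool:
    """An operator is offloaded only if a check exists and it passes."""
    check = CHECK_FUNCTIONS.get(call.name)
    return check is not None and check(call)
```

Keeping the decision inside the SL query means the partitioning code does not need its own copy of the hardware constraints.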

The integration requires changes in several areas.

#### Build system

The driver stack code can be cloned from the GitHub repository. A build script, 
similar to, for example, the existing Vulkan support, builds the driver stack 
libraries for use in TVM. The Ethos-N support in TVM can be enabled by adding a 
path to the USE_ETHOSN configuration variable. This causes the build process to 
pick up the required header files and libraries, compile in the support for 
Ethos-N, enable the relevant tests, and enable the graph partitioning code to 
detect Ethos-N compatible operations.

#### Operator support

Partitioning pattern definitions are created for the operators that are 
supported in Ethos-N to cause them to be picked up by the graph partitioning 
code. A layer between the graph partitioning code and the Support Library 
translates the graph partitions (the composite functions) from Relay to the 
Ethos-N compatible formats and adds the converted operators to the Support 
Library. The partition is then compiled, resulting in a command stream. The 
command stream and the constant data (weights), if any, are added to the 
generated module for this partition.
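
The translation layer described here can be sketched as follows. This is an illustrative model only: `CompositeFunction`, `FakeSupportLibrary`, the pattern names and the operation kinds are all hypothetical, and the "command stream" is a readable placeholder rather than the binary blob the real Support Library produces.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class CompositeFunction:
    """Hypothetical composite function produced by graph partitioning."""
    pattern: str            # name of the matched pattern
    relay_ops: List[str]    # Relay operators merged into this composite
    weights: bytes = b""    # constant data, if any


class FakeSupportLibrary:
    """Stand-in for the SL: collects operations, then 'compiles' them."""

    def __init__(self) -> None:
        self.operations: List[str] = []
        self.constants: List[bytes] = []

    def add_operation(self, kind: str, constants: bytes = b"") -> None:
        self.operations.append(kind)
        if constants:
            self.constants.append(constants)

    def compile(self) -> bytes:
        # Placeholder encoding; a real command stream is an opaque binary.
        return ";".join(self.operations).encode("ascii")


# Several Relay operators may map onto a single SL operation.
SL_OPERATION_FOR_PATTERN: Dict[str, str] = {
    "ethos-n.qnn_conv2d": "Convolution",
    "ethos-n.max_pool2d": "Pooling",
}


def compile_partition(functions: List[CompositeFunction]) -> Dict[str, object]:
    """Convert composite functions to SL operations and build a module."""
    sl = FakeSupportLibrary()
    for func in functions:
        kind = SL_OPERATION_FOR_PATTERN.get(func.pattern)
        if kind is None:
            raise ValueError(f"no conversion for pattern {func.pattern!r}")
        sl.add_operation(kind, func.weights)
    return {"command_stream": sl.compile(), "constants": list(sl.constants)}
```

In this model a three-operator composite such as convolution, bias add and requantize collapses into one SL operation, mirroring the point that some Relay operations are combined in the SL.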

#### Runtime support

The packed function that is compiled for each graph partition calls into a 
packed function in the TVM runtime to do the heavy lifting. It passes in the 
command stream for the section of the graph it is concerned with, and the input 
and output tensors. The runtime function sets up buffers using information 
stored in the command stream and calls into the Ethos-N driver library to 
execute the inference. The result of the inference is passed back as usual.
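
The runtime flow can be pictured with the sketch below. The command stream layout used here (two little-endian 32-bit sizes followed by a payload) is invented purely for illustration and is not the Ethos-N format; `make_command_stream`, `FakeDriver` and `run_inference` are hypothetical names, and the fake driver just copies input to output.

```python
import struct

HEADER = "<II"                      # assumed header: input size, output size
HEADER_SIZE = struct.calcsize(HEADER)


def make_command_stream(input_size: int, output_size: int, payload: bytes = b"") -> bytes:
    """Build a toy command stream with buffer sizes in its header."""
    return struct.pack(HEADER, input_size, output_size) + payload


class FakeDriver:
    """Stand-in for the driver library; copies input bytes to the output."""

    def execute(self, payload: bytes, in_buf: bytearray, out_buf: bytearray) -> None:
        n = min(len(in_buf), len(out_buf))
        out_buf[:n] = in_buf[:n]


def run_inference(command_stream: bytes, input_tensor: bytes, driver: FakeDriver) -> bytes:
    """Set up buffers from the command stream, then run it on the driver."""
    in_size, out_size = struct.unpack_from(HEADER, command_stream, 0)
    if len(input_tensor) != in_size:
        raise ValueError("input tensor does not match the command stream")
    in_buf = bytearray(input_tensor)
    out_buf = bytearray(out_size)   # output buffer sized from the stream
    driver.execute(command_stream[HEADER_SIZE:], in_buf, out_buf)
    return bytes(out_buf)
```

The point of the sketch is the division of labour: the generated packed function only forwards the command stream and tensors, while buffer management and execution live in one shared runtime function.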

#### Testing

There are two sets of tests. The network tests test a network end to end and 
assume the hardware is available. These tests push a network through and 
compare the output against known good results. The Ethos-N driver stack is 
required; the tests will be disabled if this is not available.

The unit tests test the individual operator sequences that can be offloaded to 
the Ethos-N processor. These tests do not need hardware to run and are enabled 
when the driver stack is available. They use a small Relay graph as a model, 
partition this and run an inference with random data. They do this once for the 
CPU and once for the Ethos-N processor. The results for the CPU are passed into 
the runtime inference code via a backdoor mechanism. When the actual inference 
is run through the Ethos-N flow, these results are passed back, simulating a 
hardware inference. This allows end-to-end testing of the TVM integration for 
each of the supported operators.
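
The backdoor idea can be sketched in a few lines. This is a toy model under stated assumptions: `cpu_reference` stands in for running the small Relay graph on the CPU, `set_backdoor_results` and `ethosn_inference` are hypothetical names, and the arithmetic in the reference model is arbitrary.

```python
import random
from typing import List, Optional

# Results injected by the test, returned in place of a hardware inference.
_BACKDOOR_RESULTS: Optional[List[int]] = None


def set_backdoor_results(results: List[int]) -> None:
    """Store CPU results so the simulated inference can return them."""
    global _BACKDOOR_RESULTS
    _BACKDOOR_RESULTS = list(results)


def cpu_reference(inputs: List[int]) -> List[int]:
    """Stand-in for running the model on the CPU (arbitrary arithmetic)."""
    return [2 * x + 1 for x in inputs]


def ethosn_inference(inputs: List[int]) -> List[int]:
    """Simulated Ethos-N inference: hands back the injected results."""
    if _BACKDOOR_RESULTS is None:
        raise RuntimeError("no hardware and no injected results")
    return list(_BACKDOOR_RESULTS)


def run_unit_test(seed: int = 0) -> List[int]:
    """Run once on the CPU, once through the simulated flow, and compare."""
    rng = random.Random(seed)
    inputs = [rng.randint(0, 255) for _ in range(16)]
    expected = cpu_reference(inputs)
    set_backdoor_results(expected)
    actual = ethosn_inference(inputs)
    assert actual == expected
    return actual
```

Such a test cannot catch numerical errors in the hardware path, since the results are the CPU's own; what it exercises is the partitioning, compilation and runtime plumbing around it.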

#### Code locations

Build system: cmake/modules/contrib/ethosn.cmake, cmake/util/FindEthosN.cmake

Compiler code: src/relay/backend/contrib/ethosn. Parsing of graph partition, 
conversion into SL data structures, compile into module.

Runtime code: src/runtime/contrib/ethosn directory. Run an inference given a 
command stream and input/output tensors.

Unit test code: tests/python/contrib/test_ethosn directory, conforming to pytest.

Network test code: tests/python/contrib/test_ethos_compiler.py, also in pytest 
format.

#### Phasing

In order to facilitate code review the code changes are split into a number of 
PRs.

1. Unit test support for the conv2d operator. This is the minimal amount of code 
that can work end to end.
  1. Build support. This includes CMake support and updates to scripts in 
tests/scripts/task_config_build_cpus.sh, driver stack build, minor changes in 
docker scripts.
  2. Runtime support. This is the inference code in the runtime.
  3. Unit test support. This is the directory that contains the common test 
code and a test for conv2d.
2. Full operator support for Mobilenet. Complete the unit tests with all 
necessary operators, with a separate PR issued for most operators.
3. End to end test for Mobilenet. This cannot be fully tested without hardware 
support but we will add a round-trip test that re-uses results from a CPU 
execution so the flow can be tested end-to-end, as described above.
4. IsSupported() support, based on BYOC PR #5261.

The following steps add support for more operators and networks. The required 
changes follow the same pattern: add compiler code, add unit tests for 
operators, and add a network test once a network is supported. Most if not all 
of the changes are in the area of front-end compiler support and appropriate 
tests.

We intend to track the BYOC infrastructure development in TVM as it happens as 
this work is heavily reliant on it.

As always, comments and suggestions are more than welcome.




