See [µTVM RPC Server Draft 
PR](https://github.com/apache/incubator-tvm/pull/6334).

## Motivation

The µTVM project can be thought of as two logical components that work together
to execute models on device:

1. A compiler that transforms Relay functions into a set of fused Relay 
operators, and then generates portable C functions to implement each group of 
fused operators. This is largely just the TVM compiler with a few modifications 
to target a minimal runtime.
2. A minimal runtime compatible with bare-metal/RTOS environments.

To achieve its end goals, µTVM needs to be able to execute compiled Relay 
operators under two different workflows:

1. Production workflow. The driver is compiled into the device firmware; it must allocate tensor memory and invoke operator implementations in graph order. This workflow is not yet supported at `HEAD`, and a variety of implementation strategies will be explored in the coming weeks.
2. AutoTVM/evaluation workflow. An attached host machine can drive overall model execution for evaluation without complete firmware being written, or invoke one operator at a time for AutoTVM. It must also be able to time operator execution for AutoTVM.

This RFC is concerned primarily with the AutoTVM/evaluation workflow, which is supported at `HEAD` today with substantial limitations. At present,
µTVM loads a small 
[runtime](https://github.com/apache/incubator-tvm/blob/99745a44407f2d1bd06b8c6a47e6c6c5239ec665/src/runtime/micro/host_driven/utvm_runtime.c)
 into RAM, writes `TVMArgs` using GDB, populates a task list, and sets the 
device PC to the runtime entry point. This process can be invoked remotely on a 
TVM RPC Server by using the TVM Device API with a `micro_dev` context.

This strategy uses a very minimal on-device runtime; however, it has some 
drawbacks:

- ISRs raised by the SoC aren't handled and appear as timeouts. If the SoC enters an exception handler, it must be reset (sometimes a software reset is sufficient; in other cases a hard reset or board power cycle is necessary).
- The SoC needs to be configured by a program loaded in flash. There are a 
bunch of features that typically affect CPU performance: oscillator 
configuration, caches, and power modes, among others. Currently, the µTVM 
blogpost eval repo expects this [mBED-based 
program](https://github.com/areusch/utvm-mbed-runtime/tree/utvm-blogpost-1) to 
live in flash and execute on device startup to configure the SoC. However, this 
isn't enforced or checked by TVM.
- For higher-bandwidth communication, device peripherals need to be configured. 
Drivers for these peripherals are typically written in C (rather than something 
usable from GDB) and expect to be able to use ISRs.

This RFC proposes to move the TVM RPC server onto the bare metal target, taking 
advantage of the [RPC modularization 
PR](https://github.com/apache/incubator-tvm/pull/5484) and the tendency for 
embedded devices to contain stream-oriented peripherals. Because an embedded device is generally resource-constrained, some limitations will exist in the µTVM RPC Server:

- Only the C++ [RPC Endpoint API](https://github.com/apache/incubator-tvm/blob/master/src/runtime/rpc/rpc_endpoint.cc) will be exposed. Features that live behind PackedFuncs, such as RPC proxying, won't necessarily be included.
- Dynamic code loading won't be supported initially (but may be possible in a limited fashion in a future RFC).
- Some message length and tensor rank limits will be stricter than those on the full Python-hosted runtime.

The goals of the µTVM On-Device RPC server are to allow users to evaluate 
models and to run AutoTVM. A non-goal of the µTVM On-Device RPC server is to 
handle model deployment.

## Approach

Breaking from the previous µTVM strategy, this RFC proposes that µTVM build
binary images meant to be placed in device flash like any other long-lived 
firmware. This means that the µTVM RPC server binary is responsible for the 
following (in a typical AutoTVM session):

- SoC initialization (i.e. oscillator configuration, cache setup, etc)
- Handling interrupts
- Transmitting and receiving RPC protocol data over some peripheral
- Running the RPC server and resulting remote-triggered code
- Timing execution of TVM functions

### Code Organization

A µTVM RPC Server binary can be thought of as having three parts:

1. **SoC Initialization, ISR Handlers, and Device Drivers.**
In order to achieve reproducible results, the SoC needs to be configured from a known good state, e.g. from device reset. In some cases, a known good state is
power-on, so this code needs to live in the SoC flash and be invoked directly 
from reset. This code is expected to live in repos outside the TVM repo, and 
should be configured per-device or per-project. The `main()` function exists 
here.
2. **TVM MinRPC Server and C Runtime**
Supplied from the TVM repo and invoked by the code in part #1. Implements the 
TVM RPC server using the [C 
Runtime](https://github.com/apache/incubator-tvm/tree/master/src/runtime/crt).
3. **Compiled TVM model functions**
Built per target and integrated as the System library.

Each piece is discussed in detail below.

### SoC Initialization, ISR Handlers, and Device Drivers

This code is intended to be specific to the targeted development board. It can 
be based on anything from a `printf("Hello, world!\n")` demo to a fully-fledged 
RTOS; the requirements are:

1. It needs to deterministically configure the SoC in terms of CPU performance.
2. It needs to facilitate UART-like communication over any peripheral the host can access (e.g. USB, Ethernet, semihosting).
3. It needs to handle device ISRs and understand when the device has entered a bad state.
4. It needs to provide memory for the µTVM RPC server to allocate function arguments and intermediate tensors.

This code does not live in the TVM repo, and is intended to be referenced from autotuning scripts. Examples exist using [mBED](https://github.com/areusch/utvm-mbed-runtime) and the [Zephyr](https://github.com/areusch/utvm-zephyr-runtime) RTOS; a sketch of such an entry point follows.
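
For example, a per-board entry point might look like the following sketch. Every name here (`board_init_clocks`, `board_uart_init`, `utvm_rpc_server_*`) is a placeholder rather than an API from the proposed PR; the comments map each piece back to the requirements above:

```cpp
#include <stddef.h>
#include <stdint.h>

// Placeholder board-support functions covering requirements 1-3 above.
extern "C" void board_init_clocks();                       // oscillators, caches, power modes
extern "C" void board_uart_init(uint32_t baud);            // host-accessible transport
extern "C" int board_uart_read(uint8_t* buf, size_t len);  // non-blocking; returns bytes read

// Placeholder server API; the real entry points live in the TVM C runtime.
typedef void* utvm_rpc_server_t;
utvm_rpc_server_t utvm_rpc_server_init(uint8_t* memory, size_t memory_bytes);
void utvm_rpc_server_receive_byte(utvm_rpc_server_t server, uint8_t byte);

// Requirement 4: memory the server can use for function arguments and
// intermediate tensors.
static uint8_t g_tvm_memory[64 * 1024];

int main() {
  board_init_clocks();      // requirement 1: deterministic CPU configuration
  board_uart_init(115200);  // requirement 2: UART-like channel to the host

  utvm_rpc_server_t server =
      utvm_rpc_server_init(g_tvm_memory, sizeof(g_tvm_memory));
  for (;;) {  // requirement 3 (ISR handling) lives in the BSP/RTOS, not here
    uint8_t byte;
    if (board_uart_read(&byte, 1) == 1) {
      utvm_rpc_server_receive_byte(server, byte);  // feed the framing layer
    }
  }
}
```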

As a secondary design goal, this code should be able to make third-party libraries available to the µTVM RPC Server as PackedFuncs. These may be used to validate preprocessing steps or to capture data from an onboard sensor.
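
For instance, assuming the C runtime provides `TVMFuncRegisterGlobal` and `TVMCFuncSetReturn` from `c_runtime_api.h`, a hypothetical temperature-sensor driver could be exposed like this (`board_read_temperature` and the registered name are illustrative only):

```cpp
#include <tvm/runtime/c_runtime_api.h>

extern "C" float board_read_temperature();  // hypothetical third-party driver

// Wrapper following the TVMPackedCFunc signature from c_runtime_api.h.
static int SensorReadTemp(TVMValue* args, int* type_codes, int num_args,
                          TVMRetValueHandle ret, void* resource_handle) {
  (void)args; (void)type_codes; (void)num_args; (void)resource_handle;
  TVMValue ret_value;
  ret_value.v_float64 = board_read_temperature();
  int ret_type_code = kTVMArgFloat;
  return TVMCFuncSetReturn(ret, &ret_value, &ret_type_code, 1);
}

// Called once at startup, before entering the RPC server loop.
void RegisterBoardFunctions() {
  TVMFuncRegisterGlobal("board.read_temperature",
                        reinterpret_cast<TVMFunctionHandle>(SensorReadTemp), 0);
}
```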

### TVM MinRPC Server and C Runtime

The basic approach is to instantiate the [MinRPC 
server](https://github.com/apache/incubator-tvm/blob/master/src/runtime/rpc/minrpc/minrpc_server.h),
 drive it using a message buffer, and use the MISRA-C runtime to handle the 
lower-level details of RPC calls. To facilitate this, some changes were 
necessary in the MISRA-C runtime (See "Changes to the MISRA-C Runtime").
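
A minimal sketch of the shape of this, not the PR's exact code: `MinRPCServer` is templated over an IO handler, whose precise required interface is documented in `minrpc_server.h`. Here, `transport_write` and `soc_reset` are placeholder board functions:

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

extern "C" ssize_t transport_write(const uint8_t* data, size_t size);  // placeholder
extern "C" void soc_reset();                                           // placeholder

// An IO handler whose read side is fed from an in-RAM message buffer and
// whose write side goes straight out over the UART-like transport.
class BufferedIOHandler {
 public:
  // Called by the framing layer as decoded message bytes arrive.
  void Push(uint8_t byte) {
    if (write_ptr_ < sizeof(buf_)) buf_[write_ptr_++] = byte;
  }

  // MinRPCServer's Read() calls land here; overrunning the buffered message
  // is a CHECK failure, which on this device means resetting the SoC.
  ssize_t PosixRead(uint8_t* data, size_t size) {
    if (read_ptr_ + size > write_ptr_) soc_reset();
    memcpy(data, buf_ + read_ptr_, size);
    read_ptr_ += size;
    return static_cast<ssize_t>(size);
  }

  ssize_t PosixWrite(const uint8_t* data, size_t size) {
    return transport_write(data, size);
  }

  void Close() {}
  void Exit(int code) { (void)code; soc_reset(); }

 private:
  uint8_t buf_[1024];  // sized per device; see constraint C3 below
  size_t read_ptr_ = 0;
  size_t write_ptr_ = 0;
};
```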

### Compiled TVM Functions

This portion contains the `SystemLib` TVMModule instance, plus functions to 
register it as such with the runtime.

## MinRPC Server Design

The MinRPC server uses a blocking strategy, which isn't particularly friendly to microcontrollers without an RTOS, especially those with watchdog timers or other peripherals that need periodic servicing. However, the TVM RPC protocol is message-oriented, and each message begins with a length:

```bash
+---------------------------+
| Message Length (uint64_t) |
+---------------------------+
|       Message Body        |
+---------------------------+
```

This means that each message boundary is well-defined, so for the µTVM RPC server, an event-driven approach can safely be used as follows:

1. A message buffer accumulates data until a full message has been received. This part is non-blocking, as it doesn't involve the MinRPC Server.
2. `MinRPCServer::ProcessOnePacket` is invoked. `Read()` calls consume data from the message buffer. If `Read()` calls overrun the message buffer, it is a `CHECK` failure.
3. The process repeats until the MinRPC Server indicates it has shut down (see the sketch below).
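
A sketch of that loop, reusing the hypothetical `BufferedIOHandler` from the earlier sketch; `transport_read_nonblocking` and `frame_decoder_feed` are placeholders for the transport driver and the framing layer described in the next section:

```cpp
#include <stdint.h>

#include "minrpc_server.h"  // lives under src/runtime/rpc/minrpc/ in the TVM repo

extern "C" int transport_read_nonblocking(uint8_t* byte);  // 1 if a byte was read
// Placeholder framing hook: pushes decoded bytes into `io`, returning true
// once a complete message has been accumulated.
bool frame_decoder_feed(BufferedIOHandler* io, uint8_t byte);

void ServerLoop(BufferedIOHandler* io) {
  tvm::runtime::MinRPCServer<BufferedIOHandler> rpc(io);
  for (;;) {
    // Step 1: accumulate transport bytes; MinRPCServer is not yet involved.
    bool have_message = false;
    while (!have_message) {
      uint8_t byte;
      if (transport_read_nonblocking(&byte) == 1) {
        have_message = frame_decoder_feed(io, byte);
      }
      // Other device work (e.g. petting a watchdog) can interleave here.
    }
    // Step 2: Read() calls inside ProcessOnePacket() consume the buffered
    // message. Step 3: repeat until the server signals shutdown.
    // (Resetting the message buffer between packets is omitted for brevity.)
    if (!rpc.ProcessOnePacket()) break;
  }
}
```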

### Framing and Session

MinRPC Server assumes that the underlying transport provides the same guarantees as UNIX pipes or TCP. Some additional components are needed to provide these guarantees over a UART. Specifically, the following challenges arise:

- C1. The microcontroller's `CHECK` failure strategy is to reset. This means 
that some wire protocol is needed for the µC to indicate that it has reset, 
even if only half of the previous message had been transmitted. This can be 
roughly thought of as a way to signal **Connection Reset** or **Broken Pipe** 
in a UNIX socket. However, details of CHECK failures can only be read after the 
microcontroller has rebooted, so there are some additional points to consider 
here.

- C2. As a protocol agnostic to the underlying transport, some level of error 
detection needs to be provided.

- C3. A design constraint of the transport is that it should use very little memory and code space, yet be able to receive buffers that are large relative to on-device RAM (i.e. >50%). This means that implementations which expect to buffer messages while performing error detection will limit the RPC protocol on device. By contrast, µTVM doesn't care if the payload is written to a large DLTensor before a CRC error is detected. While the blocking nature of the MinRPC server currently limits this, any error detection should pass the payload through even if it may contain invalid data.

A **Framing** layer addresses parts of C1 and all of C2. The wire format of one 
message is as follows:

```bash
+----------------------------------+
|  Packet Start Escape (0xff 0xfd) |
+----------------------------------+
|  Packet Length Bytes (uint32_t)  |
+----------------------------------+
|               Payload            |
+----------------------------------+
|   CRC-16 (CCITT, little-endian)  |
+----------------------------------+
```

An **escape character** (`0xff`) is used to start a framing-layer control sequence. All fields (except the packet start field) need to be escaped on the wire. Control sequences are at most two bytes long, with the second byte indicating the sequence type. Possible values of the second byte are:

- `0xff` - Escaped `0xff` (so, translate `ff ff` on the wire to a single `ff` of payload/length/CRC data)
- `0xfe` - Nop. Used to signal device reset.
- `0xfd` - Packet Start. Signals the beginning of a new packet. If a framing 
layer receives Packet Start while already decoding a packet, the packet being 
decoded is dropped.

While the RPC server is implemented using blocking `Read()` calls, a maximum packet length is also enforced.

The exact values used here might be adjusted, since `0xff` is likely a fairly 
common byte in `DLTensor`s.
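
As a sketch of the transmit side of this framing: `transport_write_byte` is a placeholder driver call, the CRC variant (CCITT-FALSE, polynomial 0x1021, initial value 0xFFFF) is an assumption, and whether the CRC covers the length field as well as the payload is likewise assumed here:

```cpp
#include <stddef.h>
#include <stdint.h>

extern "C" void transport_write_byte(uint8_t byte);  // placeholder driver call

// Bitwise CRC-16/CCITT-FALSE: polynomial 0x1021, initial value 0xFFFF (assumed).
static uint16_t Crc16Ccitt(uint16_t crc, const uint8_t* data, size_t size) {
  for (size_t i = 0; i < size; ++i) {
    crc ^= static_cast<uint16_t>(data[i]) << 8;
    for (int bit = 0; bit < 8; ++bit) {
      crc = (crc & 0x8000) ? static_cast<uint16_t>((crc << 1) ^ 0x1021)
                           : static_cast<uint16_t>(crc << 1);
    }
  }
  return crc;
}

// Write bytes, escaping each 0xff as the two-byte control sequence ff ff.
static void WriteEscaped(const uint8_t* data, size_t size) {
  for (size_t i = 0; i < size; ++i) {
    transport_write_byte(data[i]);
    if (data[i] == 0xff) transport_write_byte(0xff);
  }
}

// Emit one frame: packet start, then escaped length, payload, and CRC.
void SendFrame(const uint8_t* payload, uint32_t length) {
  transport_write_byte(0xff);  // Packet Start is itself a control sequence,
  transport_write_byte(0xfd);  // so it is written unescaped.
  uint8_t len_le[4] = {static_cast<uint8_t>(length),
                       static_cast<uint8_t>(length >> 8),
                       static_cast<uint8_t>(length >> 16),
                       static_cast<uint8_t>(length >> 24)};
  uint16_t crc = Crc16Ccitt(0xFFFF, len_le, sizeof(len_le));
  crc = Crc16Ccitt(crc, payload, length);
  WriteEscaped(len_le, sizeof(len_le));
  WriteEscaped(payload, length);
  uint8_t crc_le[2] = {static_cast<uint8_t>(crc),
                       static_cast<uint8_t>(crc >> 8)};  // little-endian
  WriteEscaped(crc_le, sizeof(crc_le));
}
```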

A **Session** layer handles out-of-band signaling and addresses the remainder 
of C1 and C3. Session Messages have the following structure:

```bash
+----------------------------+
| Message Type Code (1 byte) |
+----------------------------+
|    Session ID (2 bytes)    |
+----------------------------+
|       Message Payload      |
+----------------------------+
```

The following message types are supported:

1. **Session Start Init.** Starts a new session. Either party to the link can send this message; the sending side is termed the *initiator*. This message contains the initiator's nonce, which forms half of the session id. Should two Session Start Init messages be sent simultaneously, the message containing the numerically-lower nonce wins (the other message is ignored).
2. **Session Start Reply.** Confirms the new session as started. The party 
sending this message is termed the *responder*. Contains the full session id to 
be used in subsequent traffic.
3. **Terminate Session.** Contains no session id; invalidates any 
previously-established session. Devices should send this message after 
resetting, in case the other party is awaiting a reply.
4. **Log Message**. Allows the device, which typically has no connected 
display, to asynchronously print diagnostic log messages on the host. Mostly 
helpful for debugging. Log messages are always sent with session id 0 and are 
valid regardless of whether a session is established.
5. **Normal Traffic**. Standard µTVM RPC traffic. Each Session message contains 
exactly one TVM RPC message. The session id must match the session id 
established during the **Session Start** handshake.
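
For illustration only, the session-layer header might be modeled as below; the concrete numeric codes are placeholders, since the RFC does not specify on-the-wire values:

```cpp
#include <stdint.h>

// Placeholder codes: the RFC does not pin down on-the-wire values.
enum class MessageType : uint8_t {
  kSessionStartInit = 0x00,
  kSessionStartReply = 0x01,
  kTerminateSession = 0x02,
  kLogMessage = 0x03,
  kNormalTraffic = 0x10,
};

// The 3-byte header preceding each session-layer payload.
struct SessionHeader {
  MessageType message_type;
  uint16_t session_id;  // {initiator nonce, responder nonce}; always 0 for
                        // Log Messages, ignored for Terminate Session
};
```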

**Session Handshake**

Before normal traffic can be exchanged, a *session ID* is established using a 
two-way handshake. *Session IDs* are 2 bytes: 1 byte populated by the initiator 
and 1 by the responder. The handshake is as follows:

```bash
Initiator:                                  Responder:
+--------------------------+
| Type: Session Start Init |
+--------------------------+ --->
|    I_Nonce     0x00      |
+--------------------------+

                                            +---------------------------+
                                            | Type: Session Start Reply |
                                       <--- +---------------------------+
                                            |    I_Nonce     R_Nonce    |
                                            +---------------------------+
                      (session established, ID is {I_Nonce, R_Nonce})
```

**Session Termination**

When a *Terminate Session* message is received, the receiving party should 
assume that the sender has lost all state. The proposed PR raises an exception 
back to Python in this case.

**Long Messages**

µTVM RPC server faces a somewhat unique challenge in that some messages (e.g. 
CopyToRemote) may have very large payloads relative to the amount of available 
memory. At present, the proposed implementation can't receive messages like 
this; however, a future PR could rewrite MinRPCServer to handle the message 
header and payload separately. Then, CopyToRemote could progressively write the 
payload directly to the allocated tensor space in a zero-copy fashion.

**Testing**

Initially, testing will be done by compiling a µTVM RPC server targeted to the host machine, invoking it as a subprocess, and using stdin/stdout as the transport pipes. Most black-box testing should be achievable this way. To catch cross-compilation errors, a QEMU-based Cortex-M3 target could be used.
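
A sketch of the subprocess plumbing (POSIX-only; `utvm_rpc_server_host` is a placeholder binary name, not one defined by the PR):

```cpp
#include <sys/types.h>
#include <unistd.h>

// Launch the host-targeted server binary with its stdin/stdout connected to
// pipes, which stand in for the device's UART transport during testing.
pid_t LaunchHostServer(int* to_server_fd, int* from_server_fd) {
  int to_server[2], from_server[2];
  if (pipe(to_server) != 0 || pipe(from_server) != 0) return -1;
  pid_t pid = fork();
  if (pid == 0) {
    dup2(to_server[0], STDIN_FILENO);     // server reads RPC bytes from stdin
    dup2(from_server[1], STDOUT_FILENO);  // server writes RPC bytes to stdout
    close(to_server[1]);
    close(from_server[0]);
    execl("./utvm_rpc_server_host", "utvm_rpc_server_host", (char*)nullptr);
    _exit(1);  // only reached if exec fails
  }
  close(to_server[0]);    // parent keeps the write end toward the server...
  close(from_server[1]);  // ...and the read end coming back from it
  *to_server_fd = to_server[1];
  *from_server_fd = from_server[0];
  return pid;
}
```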

Some additional unit testing is done using googletest; this could also be ported to a device target for validation. However, that is somewhat more involved, so it isn't done in the PR yet.

**Points for Discussion**

1. Is the CRC layer adequate given packet sizes?
    1. Use a 16-bit CRC as done here, and add an explicit packet length limit of around 16K. Tensors longer than 16K, and modules (if loadable modules are implemented in the future to alleviate flash stress), will need to be split into multiple messages.
    2. Use a 32-bit CRC, which will take more flash space and/or longer to execute, but allow longer packets.




