The following RFC proposes a new simulation environment called *TSIM* that
improves software and hardware integration, as well as simulation accuracy,
compared to functional simulation. One of the goals of this RFC is to integrate
the hardware development process into the software stack from the beginning,
allowing features to be implemented and evaluated incrementally as workloads
evolve over time.
Under this environment, the hardware description is the actual specification.
This reduces the burden of maintaining consistency between the specification,
usually written in a higher-level language such as C/C++, and the actual
hardware design, described in a language such as Verilog. Moving to TSIM will
allow us to have a more fluid hardware-software specification and will invite
more contributions that modify different layers of the stack.

Moreover, this integration provides more accurate performance feedback, e.g.
clock cycles, than the traditional functional model of a hardware accelerator.
This is because TSIM is based on an open-source hardware simulator called 
[Verilator](https://www.veripool.org/wiki/verilator), which compiles Verilog 
designs down to C++ classes for cycle-accurate simulation. 
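To make the difference in feedback concrete, the toy sketch below contrasts a functional add-by-one model, which returns only the result, with a cycle-level model that also reports how many clock cycles the work took. It is plain Python, not Verilator output, and the timing it assumes (a fixed start-up latency followed by one element per cycle) is hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def add_by_one_functional(data: List[int]) -> List[int]:
    """Functional model: computes the result but carries no notion of time."""
    return [x + 1 for x in data]


def add_by_one_cycle_level(
    data: List[int], startup_latency: int = 3
) -> Tuple[List[int], int]:
    """Toy cycle-level model: returns the result and the elapsed clock cycles.

    The start-up latency and the one-element-per-cycle throughput are
    hypothetical parameters, not properties of the real VTA add-by-one design.
    """
    out: List[int] = []
    # Pipeline fill: cycles that elapse before the first result is produced.
    cycles = startup_latency
    # Steady state: one input element is consumed per clock cycle.
    for x in data:
        out.append(x + 1)
        cycles += 1
    return out, cycles


if __name__ == "__main__":
    values = [1, 2, 3, 4]
    result, cycles = add_by_one_cycle_level(values)
    assert result == add_by_one_functional(values)  # both models agree on data
    print(result, cycles)  # [2, 3, 4, 5] 7
```

Both models produce the same data; only the cycle-level one can say that, under these assumptions, four elements took seven cycles.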

Lastly, Verilator is already available in many Linux distributions, e.g.
Ubuntu, and on OSX via Homebrew.

## Proposed design

TSIM uses Verilator to integrate VTA designs into TVM and provides flexibility
in the hardware language used to implement these designs.
For example, one could use OpenCL, C/C++ or Chisel3 to describe a VTA design
that is eventually compiled down to Verilog, since Verilog is the standard
input language for FPGA/ASIC tools.
Additionally, Verilator supports the Direct Programming Interface (DPI), a
mechanism defined in the SystemVerilog standard (IEEE 1800) that lets hardware
code call functions written in foreign programming languages such as C, and
vice versa.

We leverage these Verilator features to interface hardware designs with upper
layers of the TVM stack, such as drivers and the runtime. In fact, we have
developed all the glue layers needed to make this happen, including:

* **DPI module.** Based on the DSO module located at
`tvm/src/runtime/dso_module.cc`, `dpi_module.cc` is in charge of loading the
shared library `libtsim.so`, which contains the hardware accelerator and the
Verilator execution function.
As stated earlier, Verilator is used to compile the hardware accelerator from
Verilog to C++.
Additionally, the DPI module provides an API that drivers can use to manage
the accelerator by writing and reading registers, and to terminate (exit) the
simulation.

* **Verilator execution function.** This function, implemented in `tsim.cc`,
is used with Verilator to instantiate the accelerator, generate the clock and
reset signals, and dump simulation waveforms when waveform dumping is enabled.
`tsim.cc` also contains function pointers to DPI functions that are
implemented in the DPI module `dpi_module.cc`. This adds flexibility, because
upper layers in the stack can modify the behavior of the DPI functions.

* **Hardware DPI modules.** A hardware accelerator interface can usually be
split into two main components, one for control and another for data. The
control interface is driven by a host CPU, whereas the data interface is
connected to either external memories (DRAM) or internal memories in the form
of scratchpads or caches.
Two hardware modules written in Verilog, `VTAHostDPI.v` and `VTAMemDPI.v`,
implement these two interfaces.
Accelerators implemented in Verilog can use these modules directly, and we
also provide Chisel3 `BlackBox` wrappers for accelerators described in that
language.

* **Add-by-one accelerator example.** To showcase the interaction between all
of these components, we implemented an add-by-one accelerator, in both Chisel3
and Verilog, together with a software driver called `test_driver.cc`.
We also provide CMake scripts for building everything automatically and a
`config.json` file for managing accelerator and simulation options.
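The role described for the Verilator execution function can be sketched in plain Python: hold reset for a few cycles, then toggle the clock and re-evaluate the model until a DPI-style hook asks to stop. Everything here is hypothetical: `ToyCounter` stands in for a Verilator-generated model, `DPIHooks` stands in for the DPI function pointers that upper layers can replace, and the optional `trace` list is a toy stand-in for waveform dumping. None of these names come from the TSIM sources.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


class ToyCounter:
    """Stand-in for a Verilator-generated model: a counter with clock and reset."""

    def __init__(self) -> None:
        self.clock = 0
        self.reset = 0
        self.count = 0
        self._prev_clock = 0

    def eval(self) -> None:
        # Registers update only on a rising clock edge, as in synchronous logic.
        if self.clock == 1 and self._prev_clock == 0:
            self.count = 0 if self.reset else self.count + 1
        self._prev_clock = self.clock


@dataclass
class DPIHooks:
    """Hypothetical stand-in for the DPI function pointers held by tsim.cc.

    Upper layers can swap this callable without modifying the loop below.
    """

    should_finish: Callable[[int, ToyCounter], bool]


def run_simulation(
    model: ToyCounter,
    hooks: DPIHooks,
    reset_cycles: int = 2,
    max_cycles: int = 1000,
    trace: Optional[List[int]] = None,
) -> int:
    """Apply reset, then clock the model until a hook finishes it.

    Returns the number of post-reset cycles that were simulated.
    """

    def tick() -> None:
        # One full clock period: low phase, then rising edge.
        model.clock = 0
        model.eval()
        model.clock = 1
        model.eval()
        if trace is not None:  # toy stand-in for dumping a waveform sample
            trace.append(model.count)

    # Hold reset high for a fixed number of cycles, then release it.
    model.reset = 1
    for _ in range(reset_cycles):
        tick()
    model.reset = 0

    cycles = 0
    while cycles < max_cycles:
        tick()
        cycles += 1
        if hooks.should_finish(cycles, model):
            break
    return cycles


if __name__ == "__main__":
    samples: List[int] = []
    hooks = DPIHooks(should_finish=lambda cycle, m: m.count >= 5)
    print(run_simulation(ToyCounter(), hooks, trace=samples))  # 5
    print(samples)  # [0, 0, 1, 2, 3, 4, 5]
```

Keeping the stop condition behind a replaceable callable mirrors the flexibility attributed to the function pointers in `tsim.cc`: the clock/reset loop stays fixed while upper layers decide when the simulation ends.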

Finally, the following snippet shows how a VTA design simulation, based on the
add-by-one example, is invoked from TVM:

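```python
import tvm

ctx = tvm.cpu(0)
a = tvm.nd.array(...)  # input
b = tvm.nd.array(...)  # output
# Load libtsim.so, which contains the Verilator-compiled accelerator, as a "vta-tsim" module.
tsim = tvm.module.load("libtsim.so", "vta-tsim")
# Look up the driver function registered under "tvm.vta.driver".
f = tvm.get_global_func("tvm.vta.driver")
# Run the simulated add-by-one accelerator on input a, writing the result into b.
f(tsim, a, b)
```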
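Behind a call like `f(tsim, a, b)`, a driver such as `test_driver.cc` manages the accelerator through the register read/write API of the DPI module. The self-contained sketch below imitates that flow in plain Python against a toy cycle-level add-by-one model: copy the input into simulated memory, program the length and buffer addresses through control registers, launch, clock the model until it signals done, then read back the output and the cycle count. The register map (`REG_CTRL`, `REG_CYCLES`, `REG_LEN`, `REG_SRC`, `REG_DST`), the launch/done bit encoding and all function names are hypothetical and do not describe the actual VTA driver or hardware.

```python
from typing import List

# Hypothetical register map for a toy add-by-one accelerator.
REG_CTRL = 0    # bit 0: launch (set by host), bit 1: done (set by accelerator)
REG_CYCLES = 1  # cycles counted by the accelerator while running
REG_LEN = 2     # number of 32-bit elements to process
REG_SRC = 3     # word address of the input buffer in simulated memory
REG_DST = 4     # word address of the output buffer in simulated memory
LAUNCH, DONE = 0x1, 0x2


class AddByOneModel:
    """Toy cycle-level accelerator: a control register file plus a flat memory."""

    def __init__(self, mem_words: int = 64) -> None:
        self.regs = [0] * 5         # control interface (host-visible registers)
        self.mem = [0] * mem_words  # data interface (stands in for DRAM)
        self._idx = 0

    def step(self) -> None:
        """Advance one clock cycle; process one element per cycle while launched."""
        ctrl = self.regs[REG_CTRL]
        if not (ctrl & LAUNCH) or (ctrl & DONE):
            return  # idle
        self.regs[REG_CYCLES] += 1
        if self._idx < self.regs[REG_LEN]:
            src = self.regs[REG_SRC] + self._idx
            dst = self.regs[REG_DST] + self._idx
            self.mem[dst] = (self.mem[src] + 1) & 0xFFFFFFFF
            self._idx += 1
        else:
            self.regs[REG_CTRL] = DONE  # clear launch, raise done
            self._idx = 0               # ready for the next launch


def run_add_by_one(
    model: AddByOneModel, data: List[int], max_cycles: int = 10_000
) -> List[int]:
    """Hypothetical driver flow: program registers, launch, poll done, read back."""
    n = len(data)
    src, dst = 0, n  # place the output buffer right after the input buffer
    if 2 * n > len(model.mem):
        raise ValueError("buffers do not fit in simulated memory")
    model.mem[src:src + n] = data  # host copies the input into device memory
    model.regs[REG_LEN] = n
    model.regs[REG_SRC] = src
    model.regs[REG_DST] = dst
    model.regs[REG_CYCLES] = 0
    model.regs[REG_CTRL] = LAUNCH
    # The sketch clocks the model itself; in TSIM the clock comes from the
    # Verilator execution function instead.
    for _ in range(max_cycles):
        model.step()
        if model.regs[REG_CTRL] & DONE:
            break
    else:
        raise TimeoutError("accelerator did not signal done")
    return model.mem[dst:dst + n]


if __name__ == "__main__":
    acc = AddByOneModel()
    print(run_add_by_one(acc, [1, 2, 3]), acc.regs[REG_CYCLES])  # [2, 3, 4] 4
```

Under these assumptions, three elements take four cycles: one per element plus one to raise done. With TSIM, the cycle count would instead be whatever the simulated Verilog design reports.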

https://github.com/dmlc/tvm/issues/3009