tqchen commented on code in PR #169:
URL: https://github.com/apache/tvm-ffi/pull/169#discussion_r2535212678


##########
docs/guides/kernel_library_guide.rst:
##########
@@ -0,0 +1,192 @@
+.. Licensed to the Apache Software Foundation (ASF) under one
+.. or more contributor license agreements.  See the NOTICE file
+.. distributed with this work for additional information
+.. regarding copyright ownership.  The ASF licenses this file
+.. to you under the Apache License, Version 2.0 (the
+.. "License"); you may not use this file except in compliance
+.. with the License.  You may obtain a copy of the License at
+..
+..   http://www.apache.org/licenses/LICENSE-2.0
+..
+.. Unless required by applicable law or agreed to in writing,
+.. software distributed under the License is distributed on an
+.. "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+.. KIND, either express or implied.  See the License for the
+.. specific language governing permissions and limitations
+.. under the License.
+
+====================
+Kernel Library Guide
+====================
+
+This guide is a quick start for shipping kernel libraries that are agnostic to 
the Python version and the machine learning (ML) framework, using TVM FFI. With 
TVM FFI, a kernel library can be connected to multiple ML frameworks, such as 
PyTorch, XLA, and JAX, with minimal effort.
+
+Tensor
+======
+
+Almost all kernel libraries are about tensor computation and manipulation. To 
adapt to different ML frameworks, TVM FFI provides a minimal set of data 
structures that represent framework tensors, including the basic tensor 
attributes and the storage pointer. Specifically, TVM FFI offers two tensor 
constructs, ``ffi::Tensor`` and ``ffi::TensorView``, to represent a tensor 
from an ML framework.
+
+Tensor and TensorView
+---------------------
+
+Both ``ffi::Tensor`` and ``ffi::TensorView`` represent tensors from ML 
frameworks that interact with the TVM FFI ABI. The main difference between 
them is whether the structure owns the tensor.
+
+ffi::Tensor
+ ``ffi::Tensor`` is a fully owning tensor pointer that points to a TVM FFI 
tensor object. TVM FFI manages the lifetime of the tensor by retaining a 
strong reference.
+
+ffi::TensorView
+ ``ffi::TensorView`` is a non-owning view that points to an existing ML 
framework tensor. In practice, it is backed by the ``DLTensor`` structure from 
DLPack. TVM FFI does not guarantee the lifetime of the viewed tensor.
+
+It is **recommended** to use ``ffi::TensorView`` when possible. It supports 
more cases, including those where only a view, not a strong reference, is 
passed, such as an XLA buffer, and it is also more lightweight. However, since 
``ffi::TensorView`` is a non-owning view, it is the user's responsibility to 
ensure that the underlying tensor data and the attributes of the viewed tensor 
object outlive the view.
+
+Tensor Attributes
+-----------------
+
+For convenience, ``ffi::TensorView`` and ``ffi::Tensor`` align the following 
attribute retrieval methods with the ``at::Tensor`` interface, to obtain the 
basic tensor attributes and the storage pointer:
+
+``dim``, ``sizes``, ``size``, ``strides``, ``stride``, ``numel``, 
``data_ptr``, ``device``, ``is_contiguous``
+
+DLDataType
+ In TVM FFI, tensor data types are stored as ``DLDataType``, which is defined 
by the DLPack protocol.
+
+DLDevice
+ In TVM FFI, tensor device information is stored as ``DLDevice``, which is 
defined by the DLPack protocol.
+
+ShapeView
+ In TVM FFI, the shape and strides retrieval methods return a ``ShapeView``. 
It is an iterable data structure that stores the shape or strides data as an 
``int64_t`` array.
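
As a rough illustration of how attributes such as ``numel`` and ``is_contiguous`` relate to shape and strides stored as ``int64_t`` arrays, here is a minimal standalone sketch. It does not use TVM FFI: ``Int64View``, ``Numel`` and ``IsContiguous`` are hypothetical stand-ins, and the row-major contiguity rule is an assumption about the usual convention, not a statement of what the library checks.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a non-owning, iterable view over int64_t values.
// It only models the idea of a shape/strides view; it is not ffi::ShapeView.
struct Int64View {
  const int64_t* data;
  std::size_t size;
  const int64_t* begin() const { return data; }
  const int64_t* end() const { return data + size; }
  int64_t operator[](std::size_t i) const { return data[i]; }
};

// Number of elements is the product of all extents (1 for a 0-dim tensor).
int64_t Numel(Int64View shape) {
  int64_t n = 1;
  for (int64_t extent : shape) n *= extent;
  return n;
}

// Row-major (C-order) contiguity: walking from the last dimension backwards,
// each stride must equal the product of the extents to its right.
// Dimensions of extent 1 are skipped because their stride is never used.
bool IsContiguous(Int64View shape, Int64View strides) {
  int64_t expected = 1;
  for (std::size_t i = shape.size; i-- > 0;) {
    if (shape[i] != 1 && strides[i] != expected) return false;
    expected *= shape[i];
  }
  return true;
}
```

Because the view only borrows the array it points at, the caller has to keep that array alive while the view is in use, which mirrors the lifetime caveat for non-owning views described earlier.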
+
+Tensor Allocation
+-----------------
+
+TVM FFI provides several methods to allocate tensors in the C++ runtime. 
Generally, there are two types of tensor allocation:
+
+* Allocate a tensor with new storage from scratch, namely ``FromEnvAlloc`` and 
``FromNDAlloc``. These methods require the shape, strides, data type, device 
and other attributes to be supplied for the allocation.
+* Allocate a tensor from existing storage following the DLPack protocol, 
namely ``FromDLPack`` and ``FromDLPackVersioned``. With these methods, the 
shape, data type, device and other attributes are inferred from the DLPack 
attributes.
+
+FromEnvAlloc
+^^^^^^^^^^^^
+
+To better adapt to the ML framework, it is **recommended** to reuse the 
framework tensor allocator whenever possible, instead of allocating tensors 
directly via the CUDA runtime API, such as ``cudaMalloc``. Reusing the 
framework tensor allocator:
+
+* Benefits from the framework's native caching allocator or related allocation 
mechanism.
+* Helps the framework track memory usage and plan globally.
+
+For this case, TVM FFI provides ``FromEnvAlloc``, which internally calls the 
framework tensor allocator. TVM FFI infers which framework allocator to use 
from the framework tensors passed in. For example, when the kernel library is 
called from Python and an input tensor is of type ``torch.Tensor``, TVM FFI 
automatically binds ``at::empty`` as the current framework tensor allocator 
through ``TVMFFIEnvTensorAlloc``. ``FromEnvAlloc`` then actually calls 
``at::empty``:
+
+.. code-block:: c++
+
+ ffi::Tensor tensor = ffi::Tensor::FromEnvAlloc(TVMFFIEnvTensorAlloc, ...);
+
+which is equivalent to:
+
+.. code-block:: c++
+
+ at::Tensor tensor = at::empty(...);
+
+FromNDAlloc
+^^^^^^^^^^^
+
+``FromNDAlloc`` is the most basic tensor allocator. It is designed for simple 
cases where the framework tensor allocator is not needed. ``FromNDAlloc`` only 
requires a custom allocator struct that handles tensor allocation and 
deallocation through a fixed interface: the methods ``void 
AllocData(DLTensor*)`` and ``void FreeData(DLTensor*)``. Here are examples of 
CPU, CUDA and NVSHMEM allocation:

Review Comment:
   Add a note stating that if arrays allocated by `FromNDAlloc` are returned 
to the caller, we need to make sure the array does not outlive the 
runtime.Module, because its deleter points to a function in the DLL. This can 
typically be done by retaining the runtime.Module globally, or for as long as 
the array is in use. 
   We always recommend using `ffi::Tensor::FromEnvAlloc(TVMFFIEnvTensorAlloc, 
...)` when possible.
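
   To make the hazard concrete, here is a minimal standalone sketch. It is not TVM FFI code: `FakeModule`, `FakeTensor`, `RetainedModules` and the other names are hypothetical stand-ins that only model a deleter living inside a loaded library, and the `weak_ptr` check exists purely so the sketch can detect "unloading" instead of crashing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical stand-in for the loaded library (the runtime.Module).
// Destroying the last shared_ptr to it models unloading the DLL.
struct FakeModule {};

// Stand-in for a deleter whose code would live inside the DLL.
void FreeBytes(void* p) { std::free(p); }

// Hypothetical stand-in for a tensor allocated by a custom allocator.
struct FakeTensor {
  void* data = nullptr;
  void (*deleter)(void*) = nullptr;  // would point into the DLL's code
  std::weak_ptr<FakeModule> home;    // sketch-only: lets us detect unloading
};

FakeTensor AllocFromModule(const std::shared_ptr<FakeModule>& mod,
                           std::size_t nbytes) {
  FakeTensor t;
  t.data = std::malloc(nbytes);
  t.deleter = &FreeBytes;
  t.home = mod;
  return t;
}

// Returns true if the deleter could be called safely. In real code, calling
// a deleter from an unloaded library would crash rather than return false.
bool Release(FakeTensor& t) {
  if (t.home.expired()) return false;
  t.deleter(t.data);
  t.data = nullptr;
  return true;
}

// Process-wide registry that keeps modules alive until program exit.
std::vector<std::shared_ptr<FakeModule>>& RetainedModules() {
  static std::vector<std::shared_ptr<FakeModule>> retained;
  return retained;
}

// The caller drops its module handle while the tensor is still alive.
bool ReleaseAfterDroppingHandle(bool retain_globally) {
  auto mod = std::make_shared<FakeModule>();
  FakeTensor t = AllocFromModule(mod, 64);
  if (retain_globally) RetainedModules().push_back(mod);
  mod.reset();
  bool ok = Release(t);
  if (!ok) FreeBytes(t.data);  // sketch-only cleanup so the demo does not leak
  return ok;
}
```

   With the module retained globally, the tensor can still be released after the caller drops its own handle. Without it, the release fails in this model, which corresponds to calling into unloaded code in practice.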



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

