Thanks @giuseros. To discuss a bit further the difference between the following two type-erased interfaces:
## X0: Function with typeid

```c
typedef int (*TVMBackendPackedCFunc)(TVMValue* args, int* type_codes, int num_args,
                                     TVMValue* out_ret_value, int* out_ret_tcode,
                                     void* resource_handle);
```

## X1: Function without typeid

```c
typedef int (*TVMBackendCFunc)(void** inputs, void** outputs, void* resource_handle);
```

## Discussions

The main reason we chose X0 over X1 is that X0 gives a safe interface for both static and dynamic languages. Imagine a case where the caller passes in an integer but the callee expects a float. X1 provides no mechanism to detect such a mismatch at runtime, while X0 allows us to add type checking (for example, when debug is enabled).

A function call in the X1 convention also requires stack allocations (for the arrays of inputs and outputs). Without considering any compiler optimization, let us get down to the number of bytes on a 32-bit system for a function call with `n` arguments and one output. A call in the form of X0 costs `8 * n + 4 * n + 4 + 8 + 4 + 4 = 12 * n + 20` bytes of space, while a call in the form of X1 costs `4 * n + 4 * n + 4 = 8 * n + 4` bytes of space. Say n = 3 (a typical number): then X0 costs `56 bytes`, while X1 costs `28 bytes`.

The memory overhead of the function call is negligible compared to the follow-up memory operations on NDArrays (which normally hold KBs of memory or more).

Additionally, all of this is without considering compiler optimization. Let us think about what happens when the compiler inlines the call. In that case the function call becomes loads and stores into stack memory. With a typical `mem2reg` pass, that space can be promoted to registers. If the callee code (operator) is compiled not to read the typeid in release mode, then the assignment to the typeid becomes dead code and is eliminated by the compiler. Similarly, the argument passing could become direct argument passing in this case.
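
To make the type-checking and byte-count points above concrete, here is a minimal sketch in C. It uses simplified stand-ins rather than the real TVM definitions: `DemoValue`, `kDemoInt`, `kDemoFloat`, and the `scale_*`, `call_*`, and `*_call_bytes` helpers are all hypothetical names introduced only for illustration. The byte-count helpers simply evaluate the two per-call expressions for a 32-bit target.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for TVMValue (assumption: the real union has more members). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Hypothetical type codes, for this sketch only. */
enum { kDemoInt = 0, kDemoFloat = 1 };

/* X0-style callee: expects exactly one float and checks the type code first. */
int scale_x0(DemoValue* args, int* type_codes, int num_args,
             DemoValue* out_ret_value, int* out_ret_tcode,
             void* resource_handle) {
  (void)resource_handle;
  if (num_args != 1 || type_codes[0] != kDemoFloat) {
    return -1; /* mismatch detected at runtime */
  }
  out_ret_value->v_float64 = args[0].v_float64 * 2.0;
  *out_ret_tcode = kDemoFloat;
  return 0;
}

/* X1-style callee: no type information is passed, so it must trust the caller. */
int scale_x1(void** inputs, void** outputs, void* resource_handle) {
  (void)resource_handle;
  *(double*)outputs[0] = *(const double*)inputs[0] * 2.0;
  return 0;
}

/* Caller that passes a float, matching what the callee expects. */
int call_x0_with_float(double x, double* out) {
  DemoValue arg, ret;
  int tcode = kDemoFloat, ret_tcode = -1;
  arg.v_float64 = x;
  int rc = scale_x0(&arg, &tcode, 1, &ret, &ret_tcode, NULL);
  if (rc == 0) *out = ret.v_float64;
  return rc;
}

/* Caller that wrongly passes an integer; the callee rejects it. */
int call_x0_with_int(int64_t x, double* out) {
  DemoValue arg, ret;
  int tcode = kDemoInt, ret_tcode = -1;
  arg.v_int64 = x;
  int rc = scale_x0(&arg, &tcode, 1, &ret, &ret_tcode, NULL);
  if (rc == 0) *out = ret.v_float64;
  return rc;
}

/* Per-call bytes on a 32-bit target, n arguments and one output (X0 form). */
int x0_call_bytes(int n) { return 8 * n + 4 * n + 4 + 8 + 4 + 4; }

/* Per-call bytes on a 32-bit target (X1 form). */
int x1_call_bytes(int n) { return 4 * n + 4 * n + 4; }
```

With X0, `call_x0_with_int` returns an error code instead of silently reinterpreting the bits. The X1 callee has nothing to check against, so the same mistake would go unnoticed.
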
Considering these compiler optimizations, both X0 and X1 allow optimizations that lead to code performing similarly to the final direct-call form.

Back to the topic of int64: note that most of our operator calls only use `void*` as arguments, not int64. The cost of `int64` is mainly the memory overhead of passing arguments rather than an ALU concern. Both the caller and the callee can feel free to convert to int32 after the passing and to assign int32 fields during passing. Again, considering the possible compiler optimizations above, this could turn out to be a no-op.

Even in the absence of compiler optimizations, the general overhead incurred by X0 is not much larger than that of X1, and it would be great to do some measurements on workloads of interest to see the difference.
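
As a rough illustration of the int64 point, the sketch below widens a 32-bit value into a 64-bit argument slot on the caller side and narrows it once on the callee side, so the actual arithmetic stays in 32 bits. `DemoValue`, `pack_int32`, `unpack_int32`, and `add_one_i32` are hypothetical names for this example, not TVM APIs.

```c
#include <stdint.h>

/* Simplified stand-in for a 64-bit argument slot (assumption, not the TVM type). */
typedef union {
  int64_t v_int64;
  double v_float64;
  void* v_handle;
} DemoValue;

/* Caller side: assign a 32-bit value into the 64-bit slot. */
void pack_int32(DemoValue* slot, int32_t x) { slot->v_int64 = (int64_t)x; }

/* Callee side: narrow once, right after the argument is received. */
int32_t unpack_int32(const DemoValue* slot) { return (int32_t)slot->v_int64; }

/* Example callee body: all arithmetic after unpacking is 32-bit. */
int32_t add_one_i32(const DemoValue* args) {
  int32_t x = unpack_int32(&args[0]);
  return x + 1;
}
```

Only the slot is 64 bits wide. Once the call is inlined, a compiler is free to drop the widen/narrow pair.
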