Thanks @nihui, @tqchen, @jwfromm.
I generalized this approach to handle devices that don't support the push
descriptor (see the Stream::LaunchDeferred APIs), and devices that don't
support dedicated allocation APIs, and removed the previous runtime
implementation as @tqchen suggested. I also a
Right now there are already a few analyses in Relay.
For example, quantization analyzes for the best range, a WIP bitpack pass analyzes for
the correct layout, partial evaluation does a trivial analysis of function ids, and ANF
does scope analysis...
One can even say that type inference is an analysis.
And annotations like