Re: [apache/tvm] [Release] v0.8 Release Planning (#8976)

2021-09-13 Thread tristanqiu8
Expecting the new release!

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8976#issuecomment-917922534

Re: [apache/tvm] [Release] v0.8 Release Planning (#8976)

2021-09-13 Thread Zhao Wu
> Agree with @leandron that we could first refer to the items there. Many 
> "initial" features in v0.7 are now stable. For example:
> 
> * Initial automatic scheduling support -> stable.
> 
> * Initial command line driver interface -> stable.
> 
> * Initial Hexagon support -> stable.
> 
> * Bring your own codegen (BYOC) support -> now we have several backends.
>   
>   * [stable] NVIDIA TensorRT, Xilinx Vitis-AI, ARM compute library, ARM 
> Ethos-N, etc.
>   * [experimental] TBA.

Is our Hexagon support stable now? I am not sure about it. As far as I can
see, we still have pull requests landing actively (like
https://github.com/apache/tvm/pull/8986 to support model launch), so I think
the status may still not be stable. @kparzysz-quic should give a more
definitive answer.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8976#issuecomment-917932363

Re: [apache/tvm-rfcs] [RFC]PyTorchTVM (#25)

2021-09-13 Thread Meteorix
@hogepodge @tqchen thanks for your advice. I have updated the PR.

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/25#issuecomment-917983427

Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers

2021-09-13 Thread Daniel Vetter
On Mon, Sep 13, 2021 at 3:20 PM Arnd Bergmann  wrote:
>
> On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij  
> wrote:
> > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie  wrote:
> >
> > > For userspace components as well these communities of experts need to
> > > exist for each domain, and we need to encourage upstream first
> > > processes across the board for these split kernel/userspace stacks.
> > >
> > > The habanalabs compiler backend is an LLVM fork, I'd like to see the
> > > effort to upstream that LLVM backend into LLVM proper.
> >
> > I couldn't agree more.
> >
> > A big part of the problem with inference engines / NPU:s is that of no
> > standardized userspace. Several of the machine learning initiatives
> > from some years back now have stale git repositories and are
> > visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe
> > last commit 2 years ago.
>
> Caffe as a standalone project was abandoned and merged into
> PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation
> of those projects that you are looking for.
>
> > Habanalabs propose an LLVM fork as compiler, yet the Intel
> > logo is on the Apache TVM website, and no sign of integrating with
> > that project. They claim to support also TensorFlow.
> >
> > The way I perceive it is that there simply isn't any GCC/LLVM or
> > Gallium 3D of NPU:s, these people haven't yet decided that "here
> > is that userspace we are all going to use". Or have they?
> >
> > LLVM? TVM? TensorFlow? PyTorch? Some other one?
> >
> > What worries me is that I don't see one single developer being
> > able to say "this one definitely, and they will work with the kernel
> > community", and that is what we need to hear.
>
> I don't actually think this is a decision we can possibly wait for.
> The ones you listed all work on different levels, some build on top
> of others, and some may get replaced by new ones over time.
>
> For a generic kernel interface, we need something that can be
> supported as a back-end for multiple such libraries, and that
> works on more than just one hardware. Most likely we will need
> both higher-level and lower-level interfaces, so that a
> framework (or an application directly) may target one interface,
> but some hardware may not be able to implement this.
>
> One straightforward hardware independent low-level API would
> be the traditional BLAS GEMM call[1] for matrix multiplication
> and its variants (integer, float, bfloat16, ...).  Most of the frameworks
> are able to use SGEMM to do the actual calculation since that
> has optimized versions for most CPUs and GPUs, and most
> hardware accelerators should be able to provide an
> implementation of this that doesn't completely suck. This
> can be used for both inferencing and training.

I think BLAS are too high-level for these. Sure, for perfect speed the
vendor probably wants to have their own BLAS thing, their own NN
optimizer and a heap of other things, but for the low-level userspace
we're talking about here that pretty much doesn't matter. I think a
really good example of this is the compute stack Intel is building:
- level0 is the absolute bare-bones low level driver. For this
discussion here that's enough of a userspace to make at least Dave&me
happy. In 3d this would be vulkan. In AI/NN space, there's nothing
here, at least nothing cross-vendor.
- Then there's the entire OneApi ecosystem on top. Lots of this is
open, some of it is closed, but from the pov of an accel stack it's
all looking like applications, not like driver code. BLAS is sitting
here. For AI/NN this is pytorch, tensorflow and all these higher-level
frameworks (which often have quite sophisticated optimizers of their
own)
- then there's funny intermediate apis like opencl, where the state of
the art is still to implement them directly as userspace drivers on
top of the kernel. Although on the 3d side at least we're getting to a
point where opengl on top of vulkan is impressively close to an
optimized driver. But for now it's still mostly custom. This is what
AI/NN drivers generally look like, with the high-level library fused
together with the backend. Or the backend being an out-of-tree fork
(which is pretty much always an llvm fork for the compiler side).

Especially BLAS isn't the most impressive, since largely it's a fused
multiply-add benchmark and not much else. Ok, enormous amounts of
tuning to perfectly exploit the execution bw and interconnect/cache
hierarchy of your chip, whatever it is. That's often something vendors
don't like sharing (intel's math kernels are still closed afaik)
because it leaks a bit much about actual implementation details of the
chip as opposed to how it's programmed. Also not something I really
care about with my maintainer hat on.

> On the kernel side, this could probably be done inside the
> existing crypto (async), media (mem2mem), or gpu/drm
> interfaces that all provide ways to offload computational
> functions on blocks of memory potentially backed by a dmabuf,
> but having a new top-level chardev interface may be a better
> fit.

Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers

2021-09-13 Thread Arnd Bergmann
On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij  wrote:
> On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie  wrote:
>
> > For userspace components as well these communities of experts need to
> > exist for each domain, and we need to encourage upstream first
> > processes across the board for these split kernel/userspace stacks.
> >
> > The habanalabs compiler backend is an LLVM fork, I'd like to see the
> > effort to upstream that LLVM backend into LLVM proper.
>
> I couldn't agree more.
>
> A big part of the problem with inference engines / NPU:s is that of no
> standardized userspace. Several of the machine learning initiatives
> from some years back now have stale git repositories and are
> visibly unmaintained, c.f. Caffe https://github.com/BVLC/caffe
> last commit 2 years ago.

Caffe as a standalone project was abandoned and merged into
PyTorch, see https://caffe2.ai/. I think this is the kind of consolidation
of those projects that you are looking for.

> Habanalabs propose an LLVM fork as compiler, yet the Intel
> logo is on the Apache TVM website, and no sign of integrating with
> that project. They claim to support also TensorFlow.
>
> The way I perceive it is that there simply isn't any GCC/LLVM or
> Gallium 3D of NPU:s, these people haven't yet decided that "here
> is that userspace we are all going to use". Or have they?
>
> LLVM? TVM? TensorFlow? PyTorch? Some other one?
>
> What worries me is that I don't see one single developer being
> able to say "this one definitely, and they will work with the kernel
> community", and that is what we need to hear.

I don't actually think this is a decision we can possibly wait for.
The ones you listed all work on different levels, some build on top
of others, and some may get replaced by new ones over time.

For a generic kernel interface, we need something that can be
supported as a back-end for multiple such libraries, and that
works on more than just one hardware. Most likely we will need
both higher-level and lower-level interfaces, so that a
framework (or an application directly) may target one interface,
but some hardware may not be able to implement this.

One straightforward hardware independent low-level API would
be the traditional BLAS GEMM call[1] for matrix multiplication
and its variants (integer, float, bfloat16, ...).  Most of the frameworks
are able to use SGEMM to do the actual calculation since that
has optimized versions for most CPUs and GPUs, and most
hardware accelerators should be able to provide an
implementation of this that doesn't completely suck. This
can be used for both inferencing and training.
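
For concreteness, a minimal sketch of what that looks like from the
userspace side today, assuming a CBLAS implementation such as OpenBLAS or
netlib providing <cblas.h> (this is only to illustrate the level of the
interface, not a proposal for a kernel API):

#include <cblas.h>

int main(void)
{
    enum { M = 2, K = 3, N = 2 };
    float A[M*K] = { 1, 2, 3,
                     4, 5, 6 };          /* M x K, row-major */
    float B[K*N] = { 7,  8,
                     9, 10,
                    11, 12 };            /* K x N, row-major */
    float C[M*N] = { 0 };                /* M x N result */

    /* C = alpha * A * B + beta * C */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                M, N, K,
                1.0f, A, K,              /* alpha, A, lda */
                      B, N,              /* B, ldb */
                0.0f, C, N);             /* beta, C, ldc */
    return 0;
}

Every framework that can express its heavy lifting in terms of calls like
this can, in principle, be pointed at whatever provides the fastest
implementation on a given machine.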

On the kernel side, this could probably be done inside the
existing crypto (async), media (mem2mem), or gpu/drm
interfaces that all provide ways to offload computational
functions on blocks of memory potentially backed by a dmabuf,
but having a new top-level chardev interface may be a better
fit.
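
Purely as an illustration of the shape such a chardev interface could
take (every name below is invented for this sketch; nothing like it
exists in the kernel today), a fixed-function GEMM offload on
dmabuf-backed buffers might look roughly like:

/* hypothetical uapi sketch, names made up for illustration only */
#include <linux/types.h>
#include <linux/ioctl.h>

struct accel_gemm_args {
    __u32 dtype;        /* e.g. 0 = f32, 1 = bf16, 2 = s8 */
    __u32 m, n, k;      /* matrix dimensions */
    __s32 a_fd;         /* dmabuf holding A (m x k) */
    __s32 b_fd;         /* dmabuf holding B (k x n) */
    __s32 c_fd;         /* dmabuf receiving C (m x n) */
    __u32 flags;
};

#define ACCEL_IOC_GEMM  _IOW('A', 0x00, struct accel_gemm_args)

/* userspace would open the (hypothetical) /dev/accel0, import its
 * buffers as dmabufs and issue one ioctl per operation */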

A completely different interface would be something that lets you
compile a model into a hardware-specific blob in user space
and then submit that blob into the kernel, using further commands
to send and receive model specific data. As I understand it,
this method is roughly what habanalabs and some of the
other ones do for inferencing. The performance is almost
certainly better here, but it requires a high degree of integration
between model, framework, user space driver, compiler and
kernel driver.
We already do similar things in the gpu, fpga and remoteproc
frameworks, all of which could be used here, or we add a more
specialized interface.
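
Again purely hypothetically (invented names, not an existing interface),
the compile-a-blob model could reduce to a pair of commands: one to load
the hardware-specific blob produced by the userspace compiler, and one to
run it against dmabuf-backed input/output buffers:

/* hypothetical uapi sketch only */
#include <linux/types.h>
#include <linux/ioctl.h>

struct accel_load_blob {
    __u64 blob_ptr;     /* userspace pointer to the compiled, device-specific blob */
    __u32 blob_size;
    __u32 handle;       /* returned: kernel handle for the loaded model */
};

struct accel_exec {
    __u32 handle;       /* model previously loaded with LOAD_BLOB */
    __s32 in_fd;        /* dmabuf with input tensors */
    __s32 out_fd;       /* dmabuf receiving output tensors */
    __u32 flags;
};

#define ACCEL_IOC_LOAD_BLOB  _IOWR('A', 0x10, struct accel_load_blob)
#define ACCEL_IOC_EXEC       _IOW('A', 0x11, struct accel_exec)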

What the actual interfaces should be I have no clue, those
two are just examples of what it could be, being completely
ignorant of what drivers do today. As Dave said, this really
needs a maintainer that understands both the kernel side
and what kind of hardware and frameworks exist and
what interfaces both sides actually require.

   Arnd

[1] 
http://www.netlib.org/lapack/explore-html/db/dc9/group__single__blas__level3_gafe51bacb54592ff5de056acabd83c260.html


Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers

2021-09-13 Thread James Bottomley
On Mon, 2021-09-13 at 15:20 +0200, Arnd Bergmann wrote:
> On Mon, Sep 13, 2021 at 12:51 AM Linus Walleij <
> linus.wall...@linaro.org> wrote:
> > On Sun, Sep 12, 2021 at 11:13 PM Dave Airlie 
> > wrote:
> > 
> > > For userspace components as well these communities of experts
> > > need to exist for each domain, and we need to encourage upstream
> > > first processes across the board for these split kernel/userspace
> > > stacks.
> > > 
> > > The habanalabs compiler backend is an LLVM fork, I'd like to see
> > > the effort to upstream that LLVM backend into LLVM proper.
> > 
> > I couldn't agree more.
> > 
> > A big part of the problem with inference engines / NPU:s is that of
> > no standardized userspace. Several of the machine learning
> > initiatives from some years back now have stale git repositories
> > and are visibly unmaintained, c.f. Caffe 
> > https://github.com/BVLC/caffe last commit 2 years ago.
> 
> Caffe as a standalone project was abandoned and merged into
> PyTorch, see https://caffe2.ai/. I think this is the kind of
> consolidation of those projects that you are looking for.
> 
> > Habanalabs propose an LLVM fork as compiler, yet the Intel
> > logo is on the Apache TVM website, and no sign of integrating with
> > that project. They claim to support also TensorFlow.
> > 
> > The way I perceive it is that there simply isn't any GCC/LLVM or
> > Gallium 3D of NPU:s, these people haven't yet decided that "here
> > is that userspace we are all going to use". Or have they?
> > 
> > LLVM? TVM? TensorFlow? PyTorch? Some other one?
> > 
> > What worries me is that I don't see one single developer being
> > able to say "this one definitely, and they will work with the
> > kernel community", and that is what we need to hear.
> 
> I don't actually think this is a decision we can possibly wait for.
> The ones you listed all work on different levels, some build on top
> of others, and some may get replaced by new ones over time.

I cut all the interesting design stuff because there's a meta problem
here: we seem to be charting a course based on the idea we have to get
the userspace API right first time.  We really don't, we have to make a
reasonable effort to get it right, but we can go around for a v2 if we
fail ... that's the whole point about open source: fail fast and redo. 
No-one can really design an API without seeing how the users actually
use it.  When we do get it right first time, it's more by luck than
judgment, so we should expect failure more often than not.  The trick
to a successful API is usually finding what the minimal set of
operations is and implementing that.  If you think about bells and
whistles first (as 95% of API design documents do tend to) you usually
fail.

Completely new APIs with producer consumer interlock always have this
failure problem, because in a blue sky environment, neither the
producer nor consumer knows exactly what they want the first time
around ... they usually have to try a couple of times to figure out
what works and what doesn't.  What we have to enable is this fast
iteration while they work it out.  API versioning is usually a good
beginning to this ...
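
As an illustration of that last point (invented names, just a sketch of
one common versioning pattern): make every submission struct
self-describing, so a v2 can extend it without breaking v1 userspace.

#include <linux/types.h>

#define ACCEL_ABI_VERSION 1

struct accel_submit {
    __u32 struct_size;   /* sizeof(struct accel_submit) as built by userspace */
    __u32 abi_version;   /* ACCEL_ABI_VERSION the caller was compiled against */
    __u64 commands_ptr;  /* pointer to the command stream */
    __u32 commands_len;
    __u32 flags;
    /* new fields are appended in later versions; the kernel checks
     * struct_size/abi_version and zero-fills whatever the caller
     * didn't supply */
};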

There's also nothing wrong with recommending existing interfaces and
seeing how that works because existing patterns are there for a reason.

James






[apache/tvm] [RESULT][VOTE] Adopt New Code Review Guideline (#8997)

2021-09-13 Thread Tianqi Chen
Thanks to everyone who voted.

We have 21 +1s, no 0s, and no -1s.

The vote passes.

Voting thread:
https://lists.apache.org/thread.html/ra8edb5d192bed8bec49ae93a4a7f441b5ccc87e75144ceb6ded3ea97%40%3Cdev.tvm.apache.org%3E



-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8997

Re: [apache/tvm] [VOTE] Adopt New Code Review Guideline (#8928)

2021-09-13 Thread Tianqi Chen
Closed #8928.

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8928#event-5295424240

Re: [apache/tvm] [VOTE] Adopt New Code Review Guideline (#8928)

2021-09-13 Thread Tianqi Chen
https://github.com/apache/tvm/issues/8997

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm/issues/8928#issuecomment-918561643

[apache/tvm-rfcs] [RFC][Project API] Extend metadata in ProjectOption (#33)

2021-09-13 Thread Gustavo Romero
Hi, could the following RFC be reviewed please?

It is about extending the current metadata associated with project options 
returned by the Project API.

Thank you.

Cheers,
Gustavo
You can view, comment on, or merge this pull request online at:

  https://github.com/apache/tvm-rfcs/pull/33

-- Commit Summary --

  * [RFC][Project API] Extend metadata in ProjectOption

-- File Changes --

A rfcs/0020-project_api_extend_metadata.md (252)

-- Patch Links --

https://github.com/apache/tvm-rfcs/pull/33.patch
https://github.com/apache/tvm-rfcs/pull/33.diff

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/33


Re: [apache/tvm-rfcs] [RFC][Project API] Extend metadata in ProjectOption (#33)

2021-09-13 Thread Gustavo Romero
cc @areusch @manupa-arm 

-- 
You are receiving this because you are subscribed to this thread.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/33#issuecomment-918581638

Re: [MAINTAINER SUMMIT] User-space requirements for accelerator drivers

2021-09-13 Thread Arnd Bergmann
On Mon, Sep 13, 2021 at 3:54 PM Daniel Vetter  wrote:

> > One straightforward hardware independent low-level API would
> > be the traditional BLAS GEMM call[1] for matrix multiplication
> > and its variants (integer, float, bfloat16, ...).  Most of the frameworks
> > are able to use SGEMM to do the actual calculation since that
> > has optimized versions for most CPUs and GPUs, and most
> > hardware accelerators should be able to provide an
> > implementation of this that doesn't completely suck. This
> > can be used for both inferencing and training.
>
> I think BLAS are too high-level for these. Sure, for perfect speed the
> vendor probably wants to have their own BLAS thing, their own NN
> optimizer and a heap of other things, but for the low-level userspace
> we're talking about here that pretty much doesn't matter.

I suppose high-level vs low-level is not the correct distinction here,
it's more like fixed-function vs programmable.

As a fixed-function interface, something like GEMM is probably as
low-level as you would want to get, as it's big enough to make sense
as a single atomic command, but small enough to be able to build on
top of it.

> I think a really good example of this is the compute stack Intel is building:
> - level0 is the absolute bare-bones low level driver. For this
> discussion here that's enough of a userspace to make at least Dave&me
> happy. In 3d this would be vulkan. In AI/NN space, there's nothing
> here, at least nothing cross-vendor.
> - Then there's the entire OneApi ecosystem on top. Lots of this is
> open, some of it is closed, but from the pov of an accel stack it's
> all looking like applications, not like driver code. BLAS is sitting
> here. For AI/NN this is pytorch, tensorflow and all these higher-level
> frameworks (which often have quite sophisticated optimizers of their
> own)

Looking at OneAPI, I see a BLAS implementation (oneMKL) next to a
somewhat higher-level abstraction (oneDNN). Which of the two are
the generic frameworks (pytorch/tensorflow/...) built on top of?

The oneDNN interface looks like it could be implemented not only on
top of level0 but also layered above some BLAS library or as a thin
wrapper above a fixed-function kernel interface that provides similar
high-level abstractions. Is that a correct understanding? It also seems
like this is similar in purpose to Apple's BNNS library.
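
To make the layering question concrete, the kind of lowering I have in
mind would be something like the following (not the real oneDNN API;
dnn_matmul_f32 is an invented name, and <cblas.h> stands in for whatever
BLAS is available): a higher-level "matmul primitive" that, on hardware
without its own kernels, simply turns into a single GEMM call.

#include <cblas.h>

/* sketch: a oneDNN-style matmul primitive lowered to one BLAS call */
static void dnn_matmul_f32(int m, int n, int k,
                           const float *a, const float *b, float *c)
{
    /* C = A * B, row-major, no accumulation into C */
    cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                m, n, k, 1.0f, a, k, b, n, 0.0f, c, n);
}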

> Especially BLAS isn't the most impressive, since largely it's a fused
> multiply-add benchmark and not much else. Ok, enormous amounts of
> tuning to perfectly exploit the execution bw and interconnect/cache
> hierarchy of your chip, whatever it is. That's often something vendors
> don't like sharing (intel's math kernels are still closed afaik)
> because it leaks a bit much about actual implementation details of the
> chip as opposed to how it's programmed. Also not something I really
> care about with my maintainer hat on.

It's not /just/ benchmarks, it's actually being used directly underneath
the high-level frameworks precisely because it is simple, portable and
well optimized. If there is a higher-level interface like oneDNN that
is usable by the common frameworks, using a subset of that as a
fixed-function interface for the kernel may be a good alternative
(or at least complementary) to a fully programmable interface.

I realize that fixed-function interfaces are not fashionable on GPUs, but
they are widely used in other areas (video codecs, crypto, ...) even when
you are running precompiled code on the accelerator hardware.
This would of course replace the question of open-source user space
with the question of open-source firmware, as the user side would
become mostly trivial while the accelerator code goes from being
dynamically created to a firmware blob.

   Arnd


Re: [apache/tvm-rfcs] [RFC]PyTorchTVM (#25)

2021-09-13 Thread Junru Shao
It seems that we have all reached consensus and no more comments have been
raised for several days, so let's get this RFC finally merged! 🔥 🎉

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/25#issuecomment-918661725

Re: [apache/tvm-rfcs] [RFC]PyTorchTVM (#25)

2021-09-13 Thread Junru Shao
Merged #25 into main.

-- 
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/apache/tvm-rfcs/pull/25#event-5296055520