On Tue, Sep 1, 2015 at 8:16 AM, Nathaniel Smith <n...@pobox.com> wrote:

> On Sun, Aug 30, 2015 at 2:44 PM, David Cournapeau <courn...@gmail.com>
> wrote:
> > Hi there,
> >
> > Reading Nathaniel's summary from the numpy dev meeting, it looks like
> > there is a consensus on using cython in numpy for the Python-C
> > interfaces.
> >
> > This has been on my radar for a long time: it was part of my rationale
> > for splitting multiarray into multiple "independent" .c files half a
> > decade ago.
> > I took the opportunity of EuroScipy sprints to look back into this, but
> > before looking more into it, I'd like to make sure I am not going astray:
> >
> > 1. The transition has to be gradual
>
> Yes, definitely.
>
> > 2. The obvious way I can think of to allow cython in multiarray is to
> > modify multiarray such that cython "owns" the PyMODINIT_FUNC and the
> > module PyModuleDef table.
>
> That seems like a plausible place to start.
>
> In the longer run, I think we'll need to figure out a strategy to have
> source code divided over multiple .pyx files (for the same reason we
> want multiple .c files -- it'll just be impossible to work with
> otherwise). And this will be difficult for annoying technical reasons,
> since we definitely do *not* want to increase the API surface exposed
> by multiarray.so, so we will need to compile these multiple .pyx and
> .c files into a single module, and have them talk to each other via
> internal interfaces. But Cython is currently very insistent that every
> .pyx file should be its own extension module, and the interface
> between different files should be via public APIs.
>
> I spent some time poking at this, and I think it's possible but will
> take a few kluges at least initially. IIRC the tricky points I noticed
> are:
>
> - For everything except the top-level .pyx file, we'd need to call the
> generated module initialization functions "by hand", and have a bit of
> utility code to let us access the symbol tables for the resulting
> modules
>
> - We'd need some preprocessor hack (or something?) to prevent the
> non-main module initialization functions from being exposed at the .so
> level (e.g. via 'cdef extern from "foo.h"', where 'foo.h' re-#defines
> PyMODINIT_FUNC to remove the visibility declaration)
>
> - By default 'cdef' functions are name-mangled, which is annoying if
> you want to be able to do direct C calls between different .pyx and .c
> files. You can fix this by adding a 'public' declaration to your cdef
> function. But 'public' also adds dllexport stuff which would need to
> be hacked out as per above.
>
> I think the best strategy for this is to do whatever horrible things
> are necessary to get an initial version working (on a branch, of
> course), and then once that's done assess what changes we want to ask
> the cython folks for to let us eliminate the gross parts.
>

Agreed.
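
For the record, the PyMODINIT_FUNC trick could look roughly like this (a
sketch only; 'hidden_init.h' is a hypothetical name, and the attribute
syntax assumes GCC/clang):

```c
/* hidden_init.h -- hypothetical header, pulled in by the non-main .pyx
 * files via 'cdef extern from "hidden_init.h"'.  It re-#defines
 * PyMODINIT_FUNC so the generated init functions get hidden visibility
 * and are not exported from the final .so. */
#include <Python.h>

#undef PyMODINIT_FUNC
#if defined(__GNUC__)
  #if PY_MAJOR_VERSION >= 3
    #define PyMODINIT_FUNC __attribute__((visibility("hidden"))) PyObject*
  #else
    #define PyMODINIT_FUNC __attribute__((visibility("hidden"))) void
  #endif
#else
  /* non-GCC compilers: fall back to the plain declaration */
  #if PY_MAJOR_VERSION >= 3
    #define PyMODINIT_FUNC PyObject*
  #else
    #define PyMODINIT_FUNC void
  #endif
#endif
```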

Regarding multiple cython .pyx files and symbol pollution, I think it would
be fine to have an internal API with the required prefix (say `_npy_cpy_`)
in a core library, and to control the exported symbols at the .so level.
This is how many large libraries work in practice (e.g. MKL), and it is a
model well understood by library users.
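
Controlling the exports at the .so level could be done with something like a
linker version script (a sketch; the file name is made up and the syntax is
GNU ld's, passed via `-Wl,--version-script=multiarray.map`):

```
/* multiarray.map -- hypothetical GNU ld version script: export only the
   module entry points, keep every internal _npy_cpy_* symbol local */
{
  global:
    PyInit_multiarray;   /* Python 3 entry point */
    initmultiarray;      /* Python 2 entry point */
  local:
    *;
};
```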

I will start the cythonize process without caring about any of that though:
one large .pyx file, with everything built together into one .so. That will
avoid having to fight both cython and distutils at the same time :)
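
Concretely, the single-.so build could start from something like this (a
sketch, assuming a flat source layout; the file names are illustrative):

```python
# sketch: one extension module built from one .pyx plus existing .c files
from distutils.core import setup
from distutils.extension import Extension
from Cython.Build import cythonize

ext = Extension(
    "numpy.core.multiarray",
    sources=["multiarray.pyx",    # the new cython entry point
             "calculation.c",     # existing C sources, unchanged
             "ctors.c"],
)

setup(ext_modules=cythonize([ext]))
```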

David

>
> (Insisting on compiling everything into the same .so will probably
> also help at some point in avoiding Cython-Related Binary Size Blowup
> Syndrome (CRBSBS), because the masses of boilerplate could in
> principle be shared between the different files. I think some modern
> linkers are even clever enough to eliminate this kind of duplicate
> code automatically, since C++ suffers from a similar problem.)
>
> > 3. We start using cython for the parts that are mostly menial refcount
> > work. Things like functions in calculation.c are obvious candidates.
> >
> > Step 2 should not be disruptive, and does not look like a lot of work:
> > there are < 60 methods in the table, and most of them should be fairly
> > straightforward to cythonize. At worst, we could just keep them as is
> > outside cython and just "export" them in cython.
> >
> > Does that sound like an acceptable plan?
> >
> > If so, I will start working on a PR for step 2.
>
> Makes sense to me!
>
> -n
>
> --
> Nathaniel J. Smith -- http://vorpus.org
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>