Hi Timo,

On Sun, Sep 15, 2024 at 04:30:44PM +0200, Timo Röhling wrote:
> * Helmut Grohne <hel...@subdivi.de> [2024-09-09 18:38]:
> > > And it will not solve cross-building for packages which interrogate
> > > numpy.get_include() directly, as SciPy does in its Meson build.
> > 
> > But this is not a matter of numpy-config then. Would you mind detailing
> > how numpy.get_include() works? Generally speaking, this sounds like a
> > bad idea in any case.
> 
> The get_include() upstream implementation essentially boils down to
> 
>     import numpy._core as _core
>     return os.path.join(os.path.dirname(_core.__file__), "include")

On my system, _core.__file__ becomes
"/usr/lib/python3/dist-packages/numpy/_core/__init__.py". Do you mean
numpy.core rather than numpy._core? (Not sure what the difference is.)

> This is actually the One True Source from which everything else we discussed
> derives, i.e., numpy-config is a wrapper script around numpy._configtool,
> which computes --pkgconfigdir as
> 
>     Path(numpy.get_include()) / ".." / "lib" / "pkgconfig"

This looks next to unfixable to me. If we want to support cross building
here, we need to communicate the host architecture in some way. The
current upstream way of communicating it is "The architecture of the
Python interpreter" and that's bad for cross compiling on Debian. The
other way that's really bad is DEB_HOST_* environment variables as that
only works in Debian package builds and not when using Debian as a base
to cross build software manually.

> Therefore, in order to keep everything consistent, my patch [1] primarily
> changes the numpy.get_include() behavior and resorts to the aforementioned
> $PKG_CONFIG hack in order to determine which path should be returned.

Effectively, you propose that the architecture is implicitly
communicated via the PKG_CONFIG environment variable. As mentioned
elsewhere, this is a very unusual way, but given the options available I
more and more see merit here.

> This works because I *also* let python3-numpy-dev install a sane numpy.pc at
> /usr/lib/${DEB_HOST_MULTIARCH}/pkgconfig (which in turn is the reason why I
> need to patch test_configtool in [1]: the assumption that pkgconfigdir
> points to the NumPy module tree no longer holds).
> 
> [1] 
> https://sources.debian.org/src/numpy/1:2.1.1+ds-1/debian/patches/0008-Fix-path-configuration-to-work-with-Debian-specific-.patch/

Let me suggest yet another way!

Since numpy is a Python extension, I suspect that the majority of its
users are building some dependent Python extension. Is there actually
another way of using it? (If yes, the rest may be a bad idea.)

We already communicate the host architecture to extension builds via
the _PYTHON_SYSCONFIGDATA_NAME environment variable. Once it is set,
the sysconfig module behaves differently. For instance, the MULTIARCH
variable changes. How about changing numpy.get_include() to look into
sysconfig for producing an architecture-dependent path and then
symlinking the .pc file into place such that the standard upstream way
via PKG_CONFIG_LIBDIR works again?

I am proposing this because _PYTHON_SYSCONFIGDATA_NAME already is a
widely established way of doing this.
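
To make this a bit more concrete, here is a rough sketch of what I have
in mind. It is untested against numpy itself. The function name, the
"multiarch" parameter and in particular the install location
/usr/lib/<multiarch>/numpy/include are assumptions made up for
illustration; the only real interface used is
sysconfig.get_config_var("MULTIARCH").

```python
import os
import sysconfig


def debian_get_include(multiarch=None):
    """Hypothetical stand-in for a patched numpy.get_include().

    Not numpy's actual code. The directory layout returned here is an
    assumption chosen for illustration only.
    """
    if multiarch is None:
        # sysconfig picks its data module based on
        # _PYTHON_SYSCONFIGDATA_NAME, so in a cross build this yields
        # the host triplet rather than the build one.
        multiarch = sysconfig.get_config_var("MULTIARCH")
    if not multiarch:
        raise RuntimeError("sysconfig does not define MULTIARCH")
    # Assumed architecture-dependent header location.
    return os.path.join("/usr/lib", multiarch, "numpy", "include")
```

With the environment variable pointing at the host's sysconfigdata, a
call without arguments would return the host path. Passing the triplet
explicitly, e.g. debian_get_include("aarch64-linux-gnu"), is only there
to make the sketch easy to try out.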

> > [ Yet another solution snipped ]
> Okay, I admit this is actually not such a bad idea if we are looking to keep
> numpy-config around. It does not solve the get_include() problem, but it
> makes numpy-config more robust.

Indeed.

> But now that I have trapped you sufficiently deep in the rabbit hole of
> NumPy build tooling, do you have another good idea to make get_include()
> work?

Yes, above. In any case, the central question to ask here is:

How should a builder communicate the host architecture to numpy?

Once it is answered, the rest falls into place naturally (or in case it
doesn't, the answer to the previous question is not a good one).

Regardless of what the answer is, the question and answer should be
documented in README.multiarch or something similar.

Helmut
