date:20220609

[Numpy-discussion] Re: Importing Numpy when using libpython

2022-06-09 Thread Matthew Brett

Hi,

On Wed, Jun 8, 2022 at 7:34 PM  wrote:
>
> Hi All,
>
> Hope this is the right forum.
>
> I am working on using Numpy from the programming language Racket.
>
> My plan of attack is to use Python via `libpython`.
> That is, from Racket I use `ffilib` to load `libpython` and from there I use
> the C API to control Python.
>
> Here is what works at the moment:
>
> 1. From Racket I can load `libpython` via `ffilib`.
> 2. It is possible to initialize a Python process and run Python programs in it.
> 3. It is possible to import modules written in Python.
>
> What doesn't work is importing `numpy`.
>
> The error I get when I run `import numpy` is:
>
> ImportError: dlopen(/usr/local/lib/python3.10/site-packages/numpy/core/_multiarray_umath.cpython-310-darwin.so, 0x0002):
>   symbol not found in flat namespace '_PyBaseObject_Type'

There is some discussion of libpython, embedded interpreters and the
Python namespace symbols here:

https://mail.python.org/pipermail/distutils-sig/2016-February/028275.html

and the following discussion, especially:

https://mail.python.org/pipermail/distutils-sig/2016-February/028286.html

Cheers,

Matthew
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com

[Numpy-discussion] Re: Importing Numpy when using libpython

2022-06-09 Thread Petr Viktorin


On 09. 06. 22 10:35, Matthew Brett wrote:

> There is some discussion of libpython, embedded interpreters and the
> Python namespace symbols here:
>
> https://mail.python.org/pipermail/distutils-sig/2016-February/028275.html


Note that since that mail was written in 2016, Fedora changed to be 
closer to Debian:


- libpython3.X.so still contains the actual python runtime
- /usr/bin/python3.X still links to libpython to do the actual work
- python extension module packages still depend on the libpython 
package, but by default now contain extension modules that *ARE NOT* 
linked against libpython3.X.so
- python extension modules compiled locally now *DO NOT* get linked 
against libpython3.X.so by default (AFAIK)


This means that extension modules get the symbols from the Python 
runtime that imported them, which means it's possible to use a different 
runtime (like a debug build of Python -- in fact, this change was made 
after debug builds of Python were made ABI-compatible with regular builds).


Software that embeds Python (i.e. calls Py_Initialize itself, rather than 
having a module's PyInit_modulename called by Python) still links to libpython.
It does need to load libpython with RTLD_GLOBAL, so that extension modules 
loaded later can resolve their Python symbols against it, and it can't have 
multiple Python runtimes in a single process (which does suck for “plugins” 
that embed Python, like mod_wsgi for the Apache web server).


Does Racket's ffilib support RTLD_GLOBAL?
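For readers coming from other FFIs: Python's own `ctypes` module exposes the same dlopen flag, so a rough analogue of Racket's `(ffi-lib path-to-libpython #:global? #t)` might look like the sketch below. It is an illustration under assumptions, not Racket code: `load_global` is a hypothetical helper, and the C math library found via `ctypes.util.find_library("m")` stands in for the real path to libpython (loading a second libpython into an already running Python would be a different and riskier experiment).

```python
import ctypes
import ctypes.util


def load_global(path):
    """dlopen() a shared library with RTLD_GLOBAL (hypothetical helper).

    With RTLD_GLOBAL the library's exported symbols join the process-wide
    namespace, so shared objects loaded afterwards (for example extension
    modules that are not linked against libpython) can resolve against them.
    With RTLD_LOCAL they stay private to this handle.
    """
    return ctypes.CDLL(path, mode=ctypes.RTLD_GLOBAL)


# Stand-in for path-to-libpython: the C math library, which is commonly
# available; an embedding host would pass the full path to libpython3.X.
libm_path = ctypes.util.find_library("m")
if libm_path is not None:
    libm = load_global(libm_path)
    libm.cos.restype = ctypes.c_double
    libm.cos.argtypes = [ctypes.c_double]
    print(libm.cos(0.0))
```

Passing `mode=ctypes.RTLD_LOCAL` instead roughly corresponds to what `ffi-lib` did before `#:global? #t` was added in the fix reported below.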

[Numpy-discussion] Re: Importing Numpy when using libpython

2022-06-09 Thread jensaxel

Hi,

Thank you Matthew and Petr - you pointed in the right direction.
The 2016 discussion and Petr's explanation made it clear what the problem was.

Petr Viktorin:
> Does Racket's ffilib support RTLD_GLOBAL?

Yes. All I had to do was to change  (ffi-lib path-to-libpython) 
into (ffi-lib path-to-libpython #:global? #t).

Thanks again - it was not at all obvious to me what was going wrong.

/Jens Axel

[Numpy-discussion] numpy-stl

2022-06-09 Thread frank . underdown

Hello,

I recently discovered numpy-stl. I am looking for documentation and tutorials 
to learn how to use this library.
Also, if you know of any books where this library is discussed, I would 
appreciate a recommendation.

Thank you in advance for your help.

Cheers!

Frank

[Numpy-discussion] Re: numpy-stl

2022-06-09 Thread Matti Picus

Hmm. Do you mean this project [0] on PyPI? It is developed by a team 
entirely separate from NumPy. The README should have an appropriate 
disclaimer. Frank, you can find more information about the library in 
their documentation [1].


Matti

[0] https://pypi.org/project/numpy-stl/

[1]  https://numpy-stl.readthedocs.io/en/latest/


[Numpy-discussion] Re: Fuzzing integration of Numpy into OSS-Fuzz

2022-06-09 Thread david korczynski

Coverage-guided fuzzing is fundamentally just a technique that iteratively
generates input that explores more code relative to the possible execution
space of the code targeted. What the fuzzer gives you to play with is a
byte-array that you can massage in any way possible and pass it into the code
under analysis. The fuzz engine will then observe whether the code under
analysis executed in a way that was not seen before, and save the given byte
array.
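To make that loop concrete, here is a toy coverage-guided fuzzer in pure Python. It is only a sketch of the technique, not how libFuzzer or Atheris work internally: line coverage comes from `sys.settrace` rather than compiled-in instrumentation, mutations are single random byte flips, inserts and deletes, and an input is saved to the corpus only when it reaches lines not seen before. All names (`run_with_coverage`, `mutate`, `fuzz`, `magic_target`) are hypothetical.

```python
import random
import sys


def run_with_coverage(func, data):
    """Call func(data); return (line numbers executed inside func, whether it raised)."""
    lines = set()
    target_code = func.__code__

    def tracer(frame, event, _arg):
        if frame.f_code is not target_code:
            return None  # don't trace any other function
        if event == "line":
            lines.add(frame.f_lineno)
        return tracer  # keep receiving line events for this frame

    crashed = False
    sys.settrace(tracer)
    try:
        func(data)
    except Exception:
        crashed = True
    finally:
        sys.settrace(None)
    return frozenset(lines), crashed


def mutate(data, rng):
    """Return a copy of data with one random byte flipped, inserted or deleted."""
    buf = bytearray(data)
    op = rng.randrange(3)
    if op == 0 and buf:
        buf[rng.randrange(len(buf))] = rng.randrange(256)
    elif op == 1 or not buf:
        buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    elif len(buf) > 1:
        del buf[rng.randrange(len(buf))]
    return bytes(buf)


def fuzz(target, iterations, seed=0):
    """Coverage-guided loop: keep any mutated input that reaches new lines."""
    rng = random.Random(seed)
    corpus = [b""]
    seen = set()
    crashes = []
    for _ in range(iterations):
        candidate = mutate(rng.choice(corpus), rng)
        lines, crashed = run_with_coverage(target, candidate)
        if crashed:
            crashes.append(candidate)
        if not lines <= seen:  # new coverage: save this input for further mutation
            seen |= lines
            corpus.append(candidate)
    return corpus, crashes


def magic_target(data):
    # Each nested check executes a new line, so coverage rewards partial progress.
    if data[:1] == b"F":
        if data[1:2] == b"U":
            if data[2:3] == b"Z":
                raise RuntimeError("deep bug reached")
```

Because each nested `if` adds a line to the coverage set, inputs that get one byte further are kept and mutated again, so the search can climb toward `b"FUZ"` one byte at a time; blind generation of random 3-byte inputs would face a 1-in-256³ chance per try.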

Using this you can test for so many things. The way you describe using
Hypothesis, testing a given input and checking whether some post-condition is
satisfied, can also be done with fuzzing: convert the byte array from the
fuzzer into higher-level data structures, pass these data structures into the
target code, and then use the same asserts to check that all post-conditions
are satisfied.
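A minimal sketch of that idea, assuming the fuzzer hands over raw bytes: the bytes are decoded into a list of float64 values, passed to a sort function, and checked with the same kind of post-conditions a Hypothesis test would assert. `floats_from_bytes` and `check_sort` are hypothetical names; with NumPy one would presumably pass `np.sort` as `sort_fn`, while the built-in `sorted` stands in here so the sketch runs without NumPy.

```python
import math
import struct
from collections import Counter


def floats_from_bytes(data: bytes) -> list:
    """Interpret fuzzer bytes as little-endian float64 values; trailing bytes are ignored."""
    count = len(data) // 8
    return list(struct.unpack(f"<{count}d", data[: 8 * count]))


def check_sort(data: bytes, sort_fn=sorted) -> None:
    """Fuzz-target body: build a higher-level input, call the code, assert post-conditions."""
    # NaN has no total order under <=, so this sketch drops it before checking.
    values = [v for v in floats_from_bytes(data) if not math.isnan(v)]
    result = [float(v) for v in sort_fn(values)]
    # Post-condition 1: the output is non-decreasing.
    assert all(a <= b for a, b in zip(result, result[1:]))
    # Post-condition 2: the output is a permutation of the input.
    assert Counter(result) == Counter(values)
```

Any assertion failure escapes the target, which is exactly what a fuzz engine treats as a finding.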

In the context of Numpy, we can test for:
1) Memory corruption issues in the native code (OSS-Fuzz will compile it with
sanitizers).
2) Unexpected exceptions, i.e. call functions in Numpy with data that is
seeded with fuzz input and ensure no exceptions are raised besides the
documented ones.
3) Behavioural testing similar to how you describe using Hypothesis.

In the OSS-Fuzz PR I added a fuzzer that tests option (2) listed above:
https://github.com/google/oss-fuzz/pull/7681
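The PR's harness is not reproduced here. What follows is only a sketch of the option (2) pattern under stated assumptions: decode the fuzzer bytes into text, call a target such as `np.loadtxt` on it (the thread later describes the initial proposal as doing roughly that), swallow an allow-list of documented exceptions, and let anything else escape as a finding. `make_test_one_input` and `main` are hypothetical names, and the allow-list `(ValueError,)` is a placeholder, not NumPy's documented behaviour; `atheris.Setup`, `atheris.Fuzz` and `atheris.instrument_imports` are Atheris' entry points, assumed installed via `pip install atheris`.

```python
import io


def make_test_one_input(target, documented_exceptions):
    """Return a fuzz entry point (bytes -> None) that calls `target` on decoded input.

    Exceptions in `documented_exceptions` count as expected behaviour; any
    other exception escapes, which the fuzz engine records as a finding.
    """

    def test_one_input(data: bytes) -> None:
        # Turn the raw bytes into the text file-like object the target expects.
        text = data.decode("utf-8", errors="replace")
        try:
            target(io.StringIO(text))
        except documented_exceptions:
            pass  # a documented failure mode, not a bug

    return test_one_input


def main(argv):
    """Run the harness under Atheris (assumes Atheris and NumPy are installed)."""
    import atheris

    with atheris.instrument_imports():
        import numpy as np

    # Placeholder allow-list: which exceptions np.loadtxt documents is for the
    # NumPy maintainers to decide, not this sketch.
    harness = make_test_one_input(np.loadtxt, (ValueError,))
    atheris.Setup(argv, harness)
    atheris.Fuzz()
```

A fuzzing script would call `main(sys.argv)`. Keeping the harness factory separate from the Atheris setup also lets the same entry point be exercised with ordinary unit tests.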

You're right in that the fuzzing will continue to explore the search space
whenever it runs into an issue. OSS-Fuzz, however, comes with a large backend
that manages all the running of the fuzzers and will do de-duplication such
that a bug is only reported once even if the fuzzer hits it N times.

Kind regards,
David

On 08/06/2022 21:46, Aaron Meurer wrote:
I know the hypothesis developers consider Hypothesis to be different from fuzzing. But I've never
been exactly clear just what is meant by "fuzzing" in the context you are suggesting.
When you say you want to "fuzz NumPy" what sorts of things would the fuzzer be doing?
Would you need to tell it what various NumPy functions and operations are and how to generate
inputs for them? Or does it do that automatically somehow? And how would you tell it what sorts of
things to check for a given set of inputs?

For a Hypothesis test, you would tell it explicitly what the input is, like "a is an array with
some given properties (e.g., >1 dim, has a numerical dtype, has positive values, etc.)". Then
you explicitly write a bunch of assertions that such arrays should satisfy (like some f(a).all()). It
then generates examples from the given set of inputs in an attempt to falsify the given assertions.
The whole process requires a considerable amount of human work because you have to figure out a bunch
of properties that various operations should satisfy on certain sets of inputs and write tests for
them. I'm still unclear on just what "fuzzing" is, but my impression has always been that
it's not this.

One difference I do know between Hypothesis and a fuzzer is that Hypothesis is
more geared toward finding test failures and getting you to fix them. So for
example, Hypothesis only runs 100 examples by default each run. You have to
manually increase that number to run more. Another difference is that if Hypothesis
finds a failure, it will fixate on that failure and always return it, even to
the detriment of finding other possible failures, until you either fix it or
modify the strategies to ignore it. My understanding is that a fuzzer is more
geared toward exploring a wide search space and finding as many issues
as possible, even if there isn't the immediate possibility of them being
fixed.

I've used Hypothesis on several projects that depend on NumPy and incidentally
found several bugs in NumPy with it (for example,
https://github.com/numpy/numpy/issues/15753).

Aaron Meurer

On Wed, Jun 8, 2022 at 8:44 AM david korczynski <da...@adalogics.com> wrote:
I'm not 100% sure about the important differences, so this is a bit of an
intuitive analysis from my side (I know little about Hypothesis and more
about fuzzing).

Hypothesis has support for traditional fuzzing [sic]:
https://hypothesis.readthedocs.io/en/latest/details.html?highlight=fuzz#use-with-external-fuzzers
and OSS-Fuzz supports using Python fuzzing by way of Hypothesis
https://google.github.io/oss-fuzz/getting-started/new-project-guide/python-lang/#hypothesis
although it will be seeded with the Atheris fuzzer and based on this
issue https://github.com/google/atheris/issues/20 it seems Atheris +
Hypothesis might not be working particularly well together.

Based on the above and skimming through the Hypothesis docs, I think
there are many similarities between fuzzing (Atheris specifically) and
Hypothesis, but the underlying engine that explores the input space is different.
Fuzzing is coverage-guided (which I don't think Hypothesis is, but I
could be wrong), meaning the target program is instrumented to identify
if a newly generated input explores new code. I

[Numpy-discussion] Re: Fuzzing integration of Numpy into OSS-Fuzz

2022-06-09 Thread Zac Hatfield-Dodds

As a maintainer of Hypothesis and sometime-fuzzing-researcher, hopefully 
sharing my perspective might help.

Firstly, fuzzing and property-based testing are clearly related fields!  
Personally I tend to divide them more by the UX than underlying tool: PBT tends 
to be quick (seconds), done by developers, look like unit tests, check 
semantics.  Fuzzing tends to run for much longer (hours to weeks), done by 
security specialists, look like custom binaries/scripts, and check for crashes 
and memory errors.  
https://hypothesis.works/articles/what-is-property-based-testing/ digs into 
this in some more detail, though I don't personally find the definitions very 
useful - mostly because everyone has their own so they're not much use for 
communication!

I also really like these three essays from my now-colleague Nelson: 
https://blog.nelhage.com/post/property-testing-is-fuzzing/ 
https://blog.nelhage.com/post/property-testing-like-afl/ and 
https://blog.nelhage.com/post/two-kinds-of-testing/

I think Matti's underlying question is really "what would Numpy get out of 
OSS-Fuzz, and is it worth it?".

- OSS-Fuzz is designed around AFL-style coverage-guided fuzzing of compiled 
languages, with additional use of sanitizers to detect memory errors and 
undefined behaviour.  This makes it highly effective at catching certain C 
programming bugs, including security classics like buffer overflows, but a 
relatively poor choice for high-level semantic tests (where Hypothesis shines).

- The most effective harnesses tend to have a minimum of logic between the 
bytes produced by the fuzzer and the library's internal logic - for example, David's initial 
proposal just calls `np.loadtxt()` on a fuzzer-generated string.  While Atheris 
has a pretty nice Python interface, it's still designed around very simple 
types for simplicity and speed.  The coverage feedback for an evolutionary 
search also gives asymptotically better performance, which is often a really 
big deal in practice (in my experiments, it usually overtakes heuristic-random 
generation after a few hundred or thousand seconds).

- There's a pretty serious impedance mismatch between Atheris and the more 
complicated parsers inside Hypothesis strategies.  They're much slower than 
Atheris' native code, but also much more expressive and better at finding weird 
edge cases like subnormals, signalling NaNs, etc.; equally important 
IMO is that they make it easy to express _all_ possible values instead of just 
the simple ones.  However, that comes at the cost of fewer cases-per-second and 
more rejection sampling; conversely, Hypothesis gives you free replay and 
shrinking of any test discovered via Atheris simply by running the test 
normally.

- I designed https://hypofuzz.com/ with an eye to this and to making the UX as 
simple as possible; if you're interested, I can't provide server(s) to run it on, 
but of course it's free for community OSS projects.  There's also 
https://github.com/HypothesisWorks/hypothesis/issues/3086 to provide 
lower-overhead hooks for symbolic execution and Atheris, though it's slow going 
as I don't have enough free time to push that forward at the moment.

I haven't gotten OSS-Fuzz emails myself, but I know they've put a lot of work 
into making the reporting reasonably compact and actionable.

So... if you want to find low-level problems with the C parts of Numpy, I'd 
suggest trying out OSS-Fuzz.  If you want to test the high-level semantics, I'd 
stick with Hypothesis; and if you want to fuzz property-based tests I'd 
recommend HypoFuzz over Atheris unless the latter is much easier to set up 
(plausible, if OSS-Fuzz handles all the infra for you!).

If Numpy maintainers - or anyone else - would like to discuss this in more 
detail, I'll also be at SciPy US in a few weeks and happy to talk it over or 
spend some sprint time then.
