[Numpy-discussion] Re: Need help extending a NumPy array in C

2022-08-19 Thread lpsmith
Thanks for the information!  I've had to work on other project in the meantime, 
but was able to get back to this again.

In an effort to wrap my head around the project's code, I realized that I did 
not have a line like:

#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION

in it.  So, I added the line, fixed the errors the resulted, and recompiled.  
And immediately got segmentation faults.

Investigating, I discovered that the problem was that the pointers to my new 
member variables change mysteriously to invalid values.  The basic flow is:

1) I call PyArray_New with a pointer to my new NamedArrayObject subclass.  
2)  It does stuff
3) It calls the NamedArrayObject_alloc function we wrote, which creates the new 
'rowNames' PyList.
4) It does more stuff
5) It calls the NamedArrayObject_Finalize function we wrote.  The 'rowNames' 
object now has a different, and incorrect, pointer, and at this point if I ever 
call PyList_Size(self->rowNames), I get a segmentation fault.

If, however, I re-comment out the '#define NPY_NO_DEPRECATED_API 
NPY_1_7_API_VERSION" line, everything works fine!  Part 4 above doesn't change 
the 'rowNames' pointer to anything, and it remains valid for the rest of the 
program.

(If I change 1_7_ to 1_23_, I get identical behavior.)

Normally, I'd be happy to just say 'well, I guess I have to use the deprecated 
API', but this issue combined with the fact that I'm getting the "C subclassed 
NumPy array, void scalar, or allocated in a non-standard way" error message, 
makes me think that something has changed in a fundamental way since the code 
was first written in 2014, and that I should do things the new way.  I just... 
don't know what that new way might be ;-)

The upshot is that I have two questions:
1) Does this look like a bug I should file?
2) What areas should I start looking into to change this old code to work with 
modern NumPy?

and possibly:

3) Is there a worked example of a C-extended NumPy array somewhere I could 
steal?

The full code is at

https://github.com/sys-bio/roadrunner/blob/develop/wrappers/Python/roadrunner/PyUtils.cpp

if that helps anyone.  The branch where I'm trying to change things is at

https://github.com/sys-bio/roadrunner/blob/update-numpy/wrappers/Python/roadrunner/PyUtils.cpp

which has a bunch of print statements added, since I was working without a 
debugger on Windows.  The relevant output for a simple Python script that calls 
this was (for the broken #define version):
```
NA_N 0
Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*, Py_ssize_t)
rownames new ref 0x77362180
rownames added to object 0x7747e750
rownames size 0
rownames ref 0x77362180
rownames size 0
Debug: namedArrayObject allocated:  0x7747e750
Debug: namedArrayObject returned obj:  0x7747e750
Debug: Done

Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*, PyObject*)
rownames ref 0x55f4f830
Debug: finalizing object self: 0x7747e750; args 0x55aca3e0
rownames ref 0x55f4f830
Debug: NamedArrayObject initialized from constructor. 'None' path taken
Debug: PyObject* 
rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
rownames ref 0x55f4f830
rownames ref 0x55f4f830
Debug: Done
```

and for the working non-#define version:

```
NA_N 0
Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*, Py_ssize_t)
rownames new ref 0x77362140
rownames added to object 0x7752e1e0
rownames size 0
rownames ref 0x77362140
rownames size 0
Debug: namedArrayObject allocated:  0x7752e1e0
Debug: namedArrayObject returned obj:  0x7752e1e0
Debug: Done

Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*, PyObject*)
rownames ref 0x77362140
Debug: finalizing object self: 0x7752e1e0; args 0x55aca3e0
rownames ref 0x77362140
Debug: NamedArrayObject initialized from constructor. 'None' path taken
Debug: PyObject* 
rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
rownames ref 0x77362140
rownames ref 0x77362140
Debug: Done
```

Thanks for bearing with me though this long message!  And particular thanks to 
Sebastien for answering my initial question, which I hope to be able to 
actually address again soon ;-)

-Lucian
___
NumPy-Discussion mailing list -- numpy-discussion@python.org
To unsubscribe send an email to numpy-discussion-le...@python.org
https://mail.python.org/mailman3/lists/numpy-discussion.python.org/
Member address: arch...@mail-archive.com


[Numpy-discussion] Re: Need help extending a NumPy array in C

2022-08-19 Thread Sebastian Berg
On Fri, 2022-08-19 at 23:56 +, lpsm...@uw.edu wrote:
> Thanks for the information!  I've had to work on other project in the
> meantime, but was able to get back to this again.
> 
> In an effort to wrap my head around the project's code, I realized
> that I did not have a line like:
> 
> #define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION
> 
> in it.  So, I added the line, fixed the errors the resulted, and
> recompiled.  And immediately got segmentation faults.
> 

With it, the fields are hidden completely which is the intention.  But
that also means the size is wrong for subclassing.  You would have to
use `PyArrayObject_fields` although that basically circumvents the
deprecation, it somehwat makes sense, you should just only use it in
that one place I guess (not for actual access to strides).

Overall, I am not sure if this will ever help us much, but the solution
seems simple here.  There should be no fundamental changes with the
exception of the size of `PyArrayObject_fields`.

- Sebastian


> Investigating, I discovered that the problem was that the pointers to
> my new member variables change mysteriously to invalid values.  The
> basic flow is:
> 
> 1) I call PyArray_New with a pointer to my new NamedArrayObject
> subclass.  
> 2)  It does stuff
> 3) It calls the NamedArrayObject_alloc function we wrote, which
> creates the new 'rowNames' PyList.
> 4) It does more stuff
> 5) It calls the NamedArrayObject_Finalize function we wrote.  The
> 'rowNames' object now has a different, and incorrect, pointer, and at
> this point if I ever call PyList_Size(self->rowNames), I get a
> segmentation fault.
> 
> If, however, I re-comment out the '#define NPY_NO_DEPRECATED_API
> NPY_1_7_API_VERSION" line, everything works fine!  Part 4 above
> doesn't change the 'rowNames' pointer to anything, and it remains
> valid for the rest of the program.
> 
> (If I change 1_7_ to 1_23_, I get identical behavior.)
> 
> Normally, I'd be happy to just say 'well, I guess I have to use the
> deprecated API', but this issue combined with the fact that I'm
> getting the "C subclassed NumPy array, void scalar, or allocated in a
> non-standard way" error message, makes me think that something has
> changed in a fundamental way since the code was first written in
> 2014, and that I should do things the new way.  I just... don't know
> what that new way might be ;-)
> 
> The upshot is that I have two questions:
> 1) Does this look like a bug I should file?
> 2) What areas should I start looking into to change this old code to
> work with modern NumPy?
> 
> and possibly:
> 
> 3) Is there a worked example of a C-extended NumPy array somewhere I
> could steal?
> 
> The full code is at
> 
> https://github.com/sys-bio/roadrunner/blob/develop/wrappers/Python/roadrunner/PyUtils.cpp
> 
> if that helps anyone.  The branch where I'm trying to change things
> is at
> 
> https://github.com/sys-bio/roadrunner/blob/update-numpy/wrappers/Python/roadrunner/PyUtils.cpp
> 
> which has a bunch of print statements added, since I was working
> without a debugger on Windows.  The relevant output for a simple
> Python script that calls this was (for the broken #define version):
> ```
> NA_N 0
> Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*,
> Py_ssize_t)
> rownames new ref 0x77362180
> rownames added to object 0x7747e750
> rownames size 0
> rownames ref 0x77362180
> rownames size 0
> Debug: namedArrayObject allocated:  0x7747e750
> Debug: namedArrayObject returned obj:  0x7747e750
> Debug: Done
> 
> Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*,
> PyObject*)
> rownames ref 0x55f4f830
> Debug: finalizing object self: 0x7747e750; args 0x55aca3e0
> rownames ref 0x55f4f830
> Debug: NamedArrayObject initialized from constructor. 'None' path
> taken
> Debug: PyObject*
> rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
> rownames ref 0x55f4f830
> rownames ref 0x55f4f830
> Debug: Done
> ```
> 
> and for the working non-#define version:
> 
> ```
> NA_N 0
> Debug: PyObject* rr::NamedArrayObject_alloc(PyTypeObject*,
> Py_ssize_t)
> rownames new ref 0x77362140
> rownames added to object 0x7752e1e0
> rownames size 0
> rownames ref 0x77362140
> rownames size 0
> Debug: namedArrayObject allocated:  0x7752e1e0
> Debug: namedArrayObject returned obj:  0x7752e1e0
> Debug: Done
> 
> Debug: PyObject* rr::NamedArrayObject_Finalize(rr::NamedArrayObject*,
> PyObject*)
> rownames ref 0x77362140
> Debug: finalizing object self: 0x7752e1e0; args 0x55aca3e0
> rownames ref 0x77362140
> Debug: NamedArrayObject initialized from constructor. 'None' path
> taken
> Debug: PyObject*
> rr::NamedArrayObject_Finalize_FromConstructor(rr::NamedArrayObject*)
> rownames ref 0x77362140
> rownames ref 0x77362140
> Debug: Done
> ```
> 
> Thanks for bearing with me though this long message!  And particular
> thanks to Sebastien for answering my initial questi