Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-26 Thread Carl Banks
(cc'ing back to Python-dev; the original reply was intended for it but I 
had an email malfunction.)

Travis Oliphant wrote:
> Carl Banks wrote:
>> 3. Allow getbuffer to return an array of "dereference offsets", one for 
>> each dimension.  For a given dimension i, if derefoff[i] is 
>> nonnegative, it's assumed that the current position (base pointer + 
>> indexing so far) is a pointer to a subarray, and derefoff[i] is the 
>> offset in that array where the current position goes for the next 
>> dimension.  If derefoff[i] is negative, there is no dereferencing.  
>> Here is an example of how it'd work:
> 
> 
> This sounds interesting, but I'm not sure I totally see it.  I probably 
> need a picture to figure out what you are proposing. 

I'll get on it sometime.  For now I hope an example will do.


> The derefoff 
> sounds like some-kind of offset.   Is that enough?  Why not just make 
> derefoff[i] == 0 instead of negative?

I may have misunderstood something.  I had thought the values exported 
by getbuffer could change as the view narrowed, but I'm not sure if it's 
the case now.  I'll assume it isn't for now, because it simplifies 
things and demonstrates the concept better.

Let's start from the beginning.  First, change the prototype to this:

typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                   Py_ssize_t *len, int *writeable,
                                   char **format, int *ndims,
                                   Py_ssize_t **shape,
                                   Py_ssize_t **strides,
                                   int **isptr)

"isptr" is a flag indicating whether, for a certain dimension, the 
position we've strided to so far is a pointer that should be followed 
before proceeding with the rest of the strides.

Now here's what a general "get_item_pointer" function would look like, 
given a set of indices:

void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
                       int* isptr, Py_ssize_t *indices) {
    char* pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i]*indices[i];
        if (isptr[i]) {
            /* this dimension holds pointers to subarrays: follow one */
            pointer = *(char**)pointer;
        }
    }
    return (void*)pointer;
}


> I don't fully understand the PIL example you gave.

Yeah.  How about more details.  Here is a hypothetical image data object 
structure:

struct rgba {
    unsigned char r, g, b, a;
};

struct ImageObject {
    PyObject_HEAD
    ...
    struct rgba** lines;
    Py_ssize_t height;
    Py_ssize_t width;
    Py_ssize_t shape_array[2];
    Py_ssize_t stride_array[2];
    Py_ssize_t view_count;
};

"lines" points to malloced 1-D array of (struct rgba*).  Each pointer in 
THAT block points to a seperately malloced array of (struct rgba).  Got 
that?

In order to access, say, the red value of the pixel at x=30, y=50, you'd 
use "lines[50][30].r".

So what does ImageObject's getbuffer do?  Leaving error checking out:

PyObject* getbuffer(PyObject *obj, void **buf, Py_ssize_t *len,
                    int *writeable, char **format, int *ndims,
                    Py_ssize_t **shape, Py_ssize_t **strides,
                    int **isptr) {

    struct ImageObject *self = (struct ImageObject*)obj;
    static int _isptr[2] = { 1, 0 };

    *buf = self->lines;
    *len = self->height*self->width;
    *writeable = 1;
    *ndims = 2;
    self->shape_array[0] = self->height;
    self->shape_array[1] = self->width;
    *shape = self->shape_array;
    self->stride_array[0] = sizeof(struct rgba*);  /* yep */
    self->stride_array[1] = sizeof(struct rgba);
    *strides = self->stride_array;
    *isptr = _isptr;

    self->view_count++;
    /* create and return view object here, but for what? */
}


There are three essential differences from a regular, contiguous array.

1. buf is set to point at the array of pointers, not directly to the data.

2. The isptr thing.  isptr[0] is true to indicate that the first 
dimension is an array of pointers, not the actual data.

3. stride[0] is sizeof(struct rgba*), not self->width*sizeof(struct 
rgba) like it would be for a contiguous array.  This is because your 
first stride is through an array of pointers, not the data itself.
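
(For contrast, a purely hypothetical contiguous image object -- one that 
keeps all of its pixels in a single malloced block, say "self->pixels" -- 
would fill in roughly the following.  None of this is part of any 
proposal; it just highlights the three differences above:

    static int _notptr[2] = { 0, 0 };   /* no dereferencing anywhere */

    *buf = self->pixels;                /* points straight at the data */
    self->stride_array[0] = self->width * sizeof(struct rgba);
    self->stride_array[1] = sizeof(struct rgba);
    *strides = self->stride_array;
    *isptr = _notptr;

Everything else would be filled in the same way as above.)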


So let's examine what "get_item_pointer" above will do given these 
values.  Once again, we're looking for the pixel at x=30, y=50.

First, we set pointer to buf, that is, self->lines.

Then we take the first stride: we add indices[0]*strides[0], that is, 
50*4=200, to pointer.  pointer now equals &self->lines[50].

Now, we check isptr[0].  We see that it is true.  Thus, the position 
we've strided to is, in fact, a pointer to a subarray where the actual 
data is, so we dereference it.  pointer now equals self->lines[50], the 
start of row 50's own buffer.

Then we take the second stride: we add indices[1]*strides[1], that is, 
30*4=120, to pointer.  pointer now equals &self->lines[50][30].  isptr[1] 
is false, so there is nothing more to dereference.

We're done.  Return pointer.
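
As a quick check that the code and the walkthrough agree, here is how a 
consumer might call get_item_pointer with the values exported by the 
ImageObject above (the variable names are simply what getbuffer filled in):

    /* Fetch the red value of the pixel at x=30, y=50. */
    Py_ssize_t indices[2] = { 50, 30 };     /* [y, x], matching shape */
    struct rgba *pixel = (struct rgba*)
        get_item_pointer(2, buf, strides, isptr, indices);
    unsigned char red = pixel->r;           /* same as lines[50][30].r */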

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-26 Thread Carl Banks
Travis Oliphant wrote:
> Carl Banks wrote:
>> We're done.  Return pointer.
> 
> Thank you for this detailed example.  I will have to parse it in more 
> depth but I think I can see what you are suggesting.
> 
>> First, I'm not sure why getbuffer needs to return a view object. 
> 
> The view object in your case would just be the ImageObject.  

ISTM that we are using the word "view" very differently.  Consider this 
example:

A = zeros((100,100))
B = A.transpose()

In this scenario, A would be the exporter object, I think we both would 
call it that.  When I use the word "view", I'm referring to B.  However, 
you seem to be referring to the object returned by A.getbuffer, right? 
What term have you been using to refer to B?  Obviously, it would help 
the discussion if we could get our terminology straight.

(Frankly, I don't agree with your usage; it doesn't agree with other 
uses of the word "view".  For example, consider the proposed Python 3000 
dictionary views:

D = dict()
V = D.items()

Here, V is the view, and it's analogous to B in the above example.)

I'd suggest the object returned by A.getbuffer should be called the 
"buffer provider" or something like that.

For the sake of discussion, I'm going to avoid the word "view" 
altogether.  I'll call A the exporter, as before.  B I'll refer to as 
the requestor.  The object returned by A.getbuffer is the provider.


 > The reason
 > I was thinking the function should return "something" is to provide more
 > flexibility in what a view object actually is.
 >
> I've also been going back and forth between explicitly passing all this 
> information around or placing it in an actual view-object.  In some 
> sense, a view object is a NumPy array in my mind.  But, with the 
> addition of isptr we are actually expanding the memory abstraction of 
> the view object beyond an explicit NumPy array.
>
> In the most common case, I envisioned the view object would just be the 
> object itself in which case it doesn't actually have to be returned. 
> While returning the view object would allow unspecified flexibilty in 
> the future, it really adds nothing to the current vision.
 >
> We could add a view object separately as an abstract API on top of the 
> buffer interface.

Having thought quite a bit about it, and having written several abortive 
replies, I now understand it and see the importance of it.  getbuffer 
returns the object that you are to call releasebuffer on.  It may or may 
not be the same object as exporter.  Makes sense, is easy to explain.

It's easy to see possible use cases for returning a different object.  A 
hypothetical future incarnation of NumPy might shift the responsibility 
of managing buffers from the NumPy array object to a hidden raw buffer 
object.  In this scenario, the NumPy object is the exporter, but the raw 
buffer object is the provider.

Considering this use case, it's clear that getbuffer should return the 
shape and stride data independently of the provider.  The raw buffer 
object wouldn't have that information; all it does is store a pointer 
and keep a reference count.  Shape and stride is defined by the exporter.
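
To make that concrete, such a hypothetical raw buffer object could be as 
small as this (the names are made up; nothing like it exists today):

    /* Hypothetical "provider": it owns the memory and a reference count,
       and nothing else.  Shape, strides, and format stay with the
       exporter. */
    typedef struct {
        PyObject_HEAD
        void *mem;          /* the raw memory block */
        Py_ssize_t size;    /* its size in bytes */
    } RawBufferObject;      /* releasebuffer would be called on this */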


>> Second question: what happens if a view wants to re-export the buffer? 
>> Do the views of the buffer ever change?  Example, say you create a 
>> transposed view of a Numpy array.  Now you want a slice of the 
>> transposed array.  What does the transposed view's getbuffer export?
> 
> Basically, you could not alter the internal representation of the object 
> while views which depended on those values were around.
>
> In NumPy, a transposed array actually creates a new NumPy object that 
> refers to the same data but has its own shape and strides arrays.
> 
> With the new buffer protocol, the NumPy array would not be able to alter 
> its shape/strides or reallocate its data areas while views were being 
> held by other objects.

But requestors could alter their own copies of the data, no?  Back to 
the transpose example: B itself obviously can't use the same "strides" 
array as A uses.  It would have to create its own strides, right?

So, what if someone takes a slice out of B?  When calling B.getbuffer,
does it get B's strides, or A's?

I think it should get B's.  After all, if you're taking a slice of B, 
don't you care about the slicing relative to B's axes?  I can't really 
think of a use case for exporting A's stride data when you take a slice 
of B, and it doesn't seem to simplify memory management, because B has 
to make its own copies anyway.
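
A sketch of what I mean by "its own copies", with made-up names: when B 
is created as the transpose of A, it would copy and reverse A's exported 
arrays rather than point at them:

    /* Hypothetical: the requestor B derives its own shape/strides from
       the arrays the exporter A handed out. */
    static void fill_transposed_view(int ndims,
                                     const Py_ssize_t *a_shape,
                                     const Py_ssize_t *a_strides,
                                     Py_ssize_t *b_shape,
                                     Py_ssize_t *b_strides)
    {
        int i;
        for (i = 0; i < ndims; i++) {
            b_shape[i]   = a_shape[ndims - 1 - i];
            b_strides[i] = a_strides[ndims - 1 - i];
        }
    }

When someone later slices B, it is these arrays that matter, which is why 
I think B.getbuffer should export B's strides.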


> With the shape and strides information, the format information, and the 
> data buffer itself, there are 

Re: [Python-Dev] Extended Buffer Interface/Protocol

2007-03-26 Thread Carl Banks
Travis Oliphant wrote:
> Carl Banks wrote:
> 
>> ISTM that we are using the word "view" very differently.  Consider 
>> this example:
>>
>> A = zeros((100,100))
>> B = A.transpose()
> 
> 
> You are thinking of NumPy's particular use case.  I'm thinking of a 
> generic use case.  So, yes I'm using the word view in two different 
> contexts.
> 
> In this scenario, NumPy does not even use the buffer interface.  It 
> knows how to transpose its own objects and does so by creating a new 
> NumPy object (with its own shape and strides space) with a data buffer 
> pointed to by "A".

I realized that as soon as I tried a simple Python demonstration of it. 
So it's a poor example.  But I hope it's obvious how it would generalize 
to a different type.


>>> Having such a thing as a view object would actually be nice because 
>>> it could hold on to a particular view of data with a given set of 
>>> shape and strides (whose memory it owns itself) and then the 
>>> exporting object would be free to change its shape/strides 
>>> information as long as the data did not change.
>>
>>
>> What I don't understand is why it's important for the provider to 
>> retain this data.  The requestor only needs the information when it's 
>> created; it will calculate its own versions of the data, and will not 
>> need the originals again, so no need for the provider to keep them around.
> 
> That is certainly a policy we could enforce (and pretty much what I've 
> been thinking).  We just need to make it explicit that the shape and 
> strides provided is only guaranteed up until a GIL release (i.e. 
> arbitrary Python code could change these memory areas both their content 
> and location) and so if you need them later, make your own copies.
> 
> If this were the policy, then NumPy could simply pass pointers to its 
> stored shape and strides arrays when the buffer interface is called but 
> then not worry about re-allocating these arrays before the "buffer" lock 
> is released.   Another object could hold on to the memory area of the 
> NumPy array but would have to store shape and strides information if it 
> wanted to keep it.
> NumPy could also just pass a pointer to the char * representation of the 
> format (which in NumPy would be stored in the dtype object) and would 
> not have to worry about the dtype being re-assigned later.

Bingo!  This is my preference.
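
From the consumer's side, that policy amounts to something like this 
(the names are illustrative; MAXDIMS is whatever limit the consumer picks):

    /* Copy shape/strides immediately after getbuffer, before anything
       can release the GIL; only the copies are used afterwards. */
    Py_ssize_t my_shape[MAXDIMS], my_strides[MAXDIMS];
    int i;
    for (i = 0; i < ndims; i++) {
        my_shape[i]   = shape[i];
        my_strides[i] = strides[i];
    }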


>>>> The reason I ask is: if things like "buf" and "strides" and "shape" 
>>>> could change when a buffer is re-exported, then it can complicate 
>>>> things for PIL-like buffers.  (How would you account for offsets in 
>>>> a dimension that's in a subarray?)
>>>
>>>
>>> I'm not sure what you mean, offsets are handled by changing the 
>>> starting location of the pointer to the buffer.
>>
>>
>>
>> But to answer your question: you can't just change the starting 
>> location because there's hundreds of buffers.  You'd either have to 
>> change the starting location of each one of them, which is highly 
>> undesirable, or to factor in an offset somehow.  (This was, in fact, 
>> the point of the "derefoff" term in my original suggestion.)
> 
> 
> I get better what your derefoff was doing now.  I was missing the 
> de-referencing that was going on.   Couldn't you still just store a 
> pointer to the start of the array.  In other words, isn't your **isptr  
> suggestion sufficient?   It seems to be.

No.  The problem arises when slicing.  In a single buffer, you would 
adjust the base pointer to point at the element [0,0] of the slice.  But 
you can't do that with multiple buffers.  Instead, you have to add an 
offset after dereferencing the pointer to the subarray.

Hence my derefoff proposal.  It dereferenced the pointer, then added an 
offset to get you to the 0 position in that dimension.
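
To make that concrete with the earlier image example: a slice like 
image[5:, 10:] can't be expressed by bumping the base pointer, because 
each row is its own buffer.  With derefoff it would look roughly like 
this (reusing the rgba layout from before):

    *buf = self->lines + 5;                 /* skipping whole rows is easy:
                                               they're just pointers */
    derefoff[0] = 10 * sizeof(struct rgba); /* after following a row
                                               pointer, skip 10 pixels */
    derefoff[1] = -1;                       /* innermost dim: no deref */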


>> Anyways, despite the miscommunications so far, I now have a very good 
>> idea of what's going on.  We definitely need to get terms straight.  I 
>> agree that getbuffer should return an object.  I think we need to 
>> think harder about the case when requestors re-export the buffer.  
>> (Maybe it's time to whip up some experimental objects?)
> 
> I'm still not clear what you are concerned about.   If an object 
> consumes the buffer interface and then wants to be able to later export 
> it to another, then from our discussion about the shape/strides and 
> format information, it would have to mainta

Re: [Python-Dev] An updated extended buffer PEP

2007-03-27 Thread Carl Banks
Travis Oliphant wrote:
> Travis Oliphant wrote:
>> Hi Carl and Greg,
>>
>> Here is my updated PEP which incorporates several parts of the 
>> discussions we have been having.
> 
> And here is the actual link:
> 
> http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/pep_buffer.txt 


What's the purpose of void** segments in PyObject_GetBuffer?  It seems 
like it's leftover from an older incarnation?

I'd hope after more recent discussion, we'll end up simplifying 
releasebuffer.  It seems like it'd be a nightmare to keep track of what 
you've released.


Finally, the isptr thing.  It's just not sufficient.  Frankly, I'm 
having doubts whether it's a good idea to support multibuffer at all. 
Sure, it brings generality, but I'm thinking it's too hard to explain and 
too hard to get one's head around, and will lead to lots of 
misunderstanding and bugginess.  OTOH, it really doesn't burden anyone 
except those who want to export multi-buffered arrays, and we only have 
one shot to do it.  I just hope it doesn't confuse everyone so much that 
no one bothers.

Here's how I would update the isptr thing.  I've changed "derefoff" to 
"subbufferoffsets" to describe it better.


typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                   Py_ssize_t *len, int *writeable,
                                   char **format, int *ndims,
                                   Py_ssize_t **shape,
                                   Py_ssize_t **strides,
                                   Py_ssize_t **subbufferoffsets);


subbufferoffsets

   Used to export information about multibuffer arrays.  It is an
   address of a ``Py_ssize_t *`` variable that will be set to point at
   an array of ``Py_ssize_t`` of length ``*ndims``.

   [I don't even want to try a verbal description.]

   To demonstrate how subbufferoffsets works, here is an example of a
   function that returns a pointer to an element of ANY N-dimensional
   array, single- or multi-buffered.

void* get_item_pointer(int ndim, void* buf, Py_ssize_t* strides,
                       Py_ssize_t* subbufferoffsets, Py_ssize_t *indices) {
    char* pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i]*indices[i];
        if (subbufferoffsets[i] >= 0) {
            /* dereference into the sub-buffer, then apply its offset */
            pointer = *(char**)pointer + subbufferoffsets[i];
        }
    }
    return (void*)pointer;
}

   For single buffers, subbufferoffsets is negative in every dimension
   and it reduces to normal single-buffer indexing.  For multi-buffers,
   subbufferoffsets indicates when to dereference the pointer and switch
   to the new buffer, and gives the offset into the buffer to start at.
   In most cases, the subbufferoffset would be zero (indicating it should
   start at the beginning of the new buffer), but can be a positive
   number if the following dimension has been sliced, and thus the 0th
   entry in that dimension would not be at the beginning of the new
   buffer.
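
To connect this to the ImageObject example from my earlier message: its 
getbuffer would now fill in subbufferoffsets instead of isptr, along the 
lines of:

    static Py_ssize_t _suboffs[2] = { 0, -1 };  /* deref dim 0 at offset 0;
                                                   dim 1: no dereference */
    *subbufferoffsets = _suboffs;

and a view sliced to start at column 10 would export 
{ 10*sizeof(struct rgba), -1 } instead.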



Other than that, looks good. :)


Carl Banks


Re: [Python-Dev] An updated extended buffer PEP

2007-03-28 Thread Carl Banks


Travis Oliphant wrote:
> Carl Banks wrote:
>> Travis E. Oliphant wrote:
>>> I think we are getting closer.   What do you think about Greg's idea 
>>> of basically making the provider the bufferinfo structure and having 
>>> the exporter handle copying memory over for shape and strides if it 
>>> wants to be able to change those before the lock is released.
>> It seems like it's just a different way to return the data.  You could 
>> do it by setting values through pointers, or do it by returning a 
>> structure.  Which way you choose is a minor detail in my opinion.  I'd 
>> probably favor returning the information in a structure.
>>
>> I would consider adding two fields to the structure:
>>
>> size_t structsize; /* size of the structure */
> Why is this necessary?  Can't you get that by sizeof(bufferinfo)?

In case you want to add something later.  Though if you did that, it 
would be a different major release, meaning you'd have to rebuild 
anyway.  They rashly add fields to the PyTypeObject in the same way. :) 
So never mind.


>> PyObject* releaser; /* the object you need to call releasebuffer on */ 
> Is this so that another object could be used to manage releases if desired?

Yes, that was a use case I saw for a different "view" object.  I don't 
think it's crucially important to have it, but for exporting objects 
that delegate management of the buffer to another object, it would be 
very helpful if the exporter could tell consumers that the other object 
is managing the buffer.

Suppose A is an exporting object, but it uses a hidden object R to 
manage the buffer memory.  Thus you have A referring to R, like this:

A -> R

Now object B takes a view of A.  If we don't have this field, then B 
will have to hold a reference to A, like this:

B -> A -> R

A would be responsible for keeping track of views, and A could not be 
garbage collected until B disappears.  If we do have this field, then A 
could tell B to hold a reference to R instead:

B -> R
A -> R

A is no longer obliged to keep track of views, and it can be garbage 
collected even if B still exists.


Here's a concrete example of where it would be useful: consider a 
ByteBufferSlice object.  Basically, the object represents a 
shared-memory slice of a 1-D array of bytes (for example, Python 3000 
bytes object, or an mmap object).

Now, if the ByteBufferSlice object could tell the consumer that someone 
else is managing the buffer, then it wouldn't have to keep track of 
views, thus simplifying things.

P.S. In thinking about this, it occurred to me that there should be a 
way to lock the buffer without requesting details.  ByteBufferSlice 
would already know the details of the buffer, but it would need to 
increment the original buffer's lock count.  Thus, I propose a new function:

typedef int (*lockbufferproc)(PyObject* self);


Carl Banks


Re: [Python-Dev] An updated extended buffer PEP

2007-03-28 Thread Carl Banks


Carl Banks wrote:
> Here's a concrete example of where it would be useful: consider a 
> ByteBufferSlice object.  Basically, the object represents a 
> shared-memory slice of a 1-D array of bytes (for example, Python 3000 
> bytes object, or an mmap object).
> 
> Now, if the ByteBufferSlice object could tell the consumer that someone 
> else is managing the buffer, then it wouldn't have to keep track of 
> views, thus simplifying things.
> 
> P.S. In thinking about this, it occurred to me that there should be a 
> way to lock the buffer without requesting details.  ByteBufferSlice 
> would already know the details of the buffer, but it would need to 
> increment the original buffer's lock count.  Thus, I propose a new function:
> 
> typedef int (*lockbufferproc)(PyObject* self);


And, because real examples are better than philosophical speculations, 
here's a skeleton implementation of the ByteBufferSlice object, sans 
boilerplate and error checking, and with some educated guessing about 
future details:


typedef struct {
   PyObject_HEAD
   PyObject* releaser;
   unsigned char* buf;
   Py_ssize_t length;
} ByteBufferSliceObject;


PyObject* ByteBufferSlice_new(PyObject* bufobj, Py_ssize_t start, 
                              Py_ssize_t end) {
   ByteBufferSliceObject* self;
   BufferInfoObject* bufinfo;

   /* "type" is the ByteBufferSlice type object; its boilerplate is omitted */
   self = (ByteBufferSliceObject*)type->tp_alloc(type, 0);
   bufinfo = PyObject_GetBuffer(bufobj);

   self->releaser = bufinfo->releaser;
   self->buf = (unsigned char*)bufinfo->buf + start;
   self->length = end - start;

   /* look how soon we're done with this information */
   Py_DECREF(bufinfo);

   return (PyObject*)self;
}


void ByteBufferSlice_dealloc(PyObject* self) {
   PyObject_ReleaseBuffer(((ByteBufferSliceObject*)self)->releaser);
   self->ob_type->tp_free((PyObject*)self);
}


PyObject* ByteBufferSlice_getbuffer(PyObject* obj, int flags) {
   ByteBufferSliceObject* self = (ByteBufferSliceObject*)obj;
   BufferInfoObject* bufinfo;
   static Py_ssize_t stridesarray[] = { 1 };

   bufinfo = BufferInfo_New();
   bufinfo->releaser = self->releaser;
   bufinfo->writable = 1;
   bufinfo->buf = self->buf;
   bufinfo->length = self->length;
   bufinfo->ndims = 1;
   bufinfo->strides = stridesarray;
   bufinfo->size = &self->length;
   bufinfo->subbufoffsets = NULL;

   /* Before we go, increase the original buffer's lock count */
   PyObject_LockBuffer(self->releaser);

   return (PyObject*)bufinfo;
}


/* don't define releasebuffer or lockbuffer */
/* only objects that manage buffers themselves would define these */


/* Now look how easy this is */
/* Everything works out if ByteBufferSlice reexports the buffer */

PyObject* ByteBufferSlice_getslice(PyObject* self, Py_ssize_t start, 
Py_ssize_t end) {
   return ByteBufferSlice_new(self,start,end);
}


The implementation of this is very straightforward, and it's easy to see 
why and how "bufinfo->releaser" works, and why it'd be useful.

It's almost like there's two protocols here: a buffer exporter protocol 
(getbuffer) and a buffer manager protocol (lockbuffer and 
releasebuffer).  Some objects would support only exporter protocol; 
others both.
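
If it helps to see the split spelled out: in terms of the skeleton above, 
the slots would be something like this (the typedefs are mine, just 
restating the signatures the skeleton assumes):

    /* exporter protocol: anything that can hand out a buffer */
    typedef PyObject *(*getbufferproc)(PyObject *self, int flags);

    /* manager protocol: only objects that own the memory */
    typedef int (*lockbufferproc)(PyObject *self);
    typedef int (*releasebufferproc)(PyObject *self);

ByteBufferSlice defines only the first; an mmap or bytes object would 
define all three.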


Carl Banks


Re: [Python-Dev] Extended buffer PEP

2007-04-07 Thread Carl Banks
Only one concern:

 > typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo *view)


I'd like to see it accept a flags argument specifying what kind of buffer 
it's allowed to return.  I'd rather not burden the user with checking all 
the entries in bufferinfo to make sure it doesn't get something unexpected.

I imagine most uses of buffer protocol would be for direct, 
one-dimensional arrays of bytes with no striding.  It's not clear 
whether read-only or read-write should be the least common denominator, 
so require at least one of these flags:

Py_BUF_READONLY
Py_BUF_READWRITE

Then allow any of these flags to allow more complex access:

Py_BUF_MULTIDIM - allows strided and multidimensional arrays
Py_BUF_INDIRECT - allows indirect buffers (implies Py_BUF_MULTIDIM)

An object is allowed to return a simpler array than requested, but not 
more complex.  If you allow indirect buffers, you might still get a 
one-dimensional array of bytes.
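
For example, a consumer that can handle strided, multidimensional data 
but not indirect buffers would ask for something like this (assuming the 
flags become an extra argument to PyObject_GetBuffer and that it returns 
-1 on error):

    struct bufferinfo view;
    if (PyObject_GetBuffer(obj, &view,
                           Py_BUF_READWRITE | Py_BUF_MULTIDIM) < 0)
        return NULL;    /* exporter couldn't satisfy the request */
    /* view is at most strided/multidimensional here; it may still be a
       plain 1-D byte array, since simpler is always allowed */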


Other than that, I would add a note about the other things considered 
and rejected (the old prototype for getbufferproc, the delegated buffer 
object).  List whether to backport the buffer protocol to 2.6 as an open 
question.

Then submit it as a real PEP.  I believe this idea has run its course as 
PEP XXX and needs a real number.  (I was intending to start making 
patches for the Py3K library modules as soon as that happened.)

Carl Banks


Travis Oliphant wrote:
> 
> Here is my "final" draft of the extended buffer interface PEP.
> For those who have been following the discussion, I eliminated the 
> releaser object and the lock-buffer function.   I decided that there is 
> enough to explain with the new striding and sub-offsets without the 
> added confusion of releasing buffers, especially when it is not clear 
> what is to be gained by such complexity except a few saved lines of code.
> 
> The striding and sub-offsets, however, allow extension module writers to 
> write code (say video and image processing code or scientific computing 
> code or data-base processing code) that works on any object exposing the 
> buffer interface.  I think this will be of great benefit and so is worth 
> the complexity.
> 
> This will take some work to get implemented for Python 3k.  I could use 
> some help with this in order to speed up the process.  I'm working right 
> now on the extensions to the struct module until the rest is approved.
> 
> Thank you for any and all comments:
> 
> -Travis


Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-09 Thread Carl Banks


Travis Oliphant wrote:
> Py_BUF_READONLY
>    The returned buffer must be readonly and the underlying object should 
>    make its memory readonly if that is possible.

I don't like the "if possible" thing.  If it makes no guarantees, it 
pretty much useless over Py_BUF_SIMPLE.


> Py_BUF_FORMAT
>    The consumer will be using the format string information so make sure 
>    that member is filled correctly. 

Is the idea to throw an exception if there's some other data format 
besides "b", and this flag isn't set?  It seems superfluous otherwise.


> Py_BUF_SHAPE
>    The consumer can (and might) make use of the ndims and shape members 
>    of the structure so make sure they are filled in correctly. 
>
> Py_BUF_STRIDES (implies SHAPE)
>    The consumer can (and might) make use of the strides member of the 
>    structure (as well as ndims and shape)

Is there any reasonable benefit for allowing Py_BUF_SHAPE without 
Py_BUF_STRIDES?  Would the array be C- or Fortran-like?


Another little mistake I made: looking at the Python source, it seems 
that most C defines do not use the Py_ prefix, so probably we shouldn't 
here.  Sorry.


Carl


Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-11 Thread Carl Banks
Travis Oliphant wrote:
> Carl Banks wrote:
>>
>>
>> Travis Oliphant wrote:
>> > Py_BUF_READONLY
>> >    The returned buffer must be readonly and the underlying object 
>> >    should make its memory readonly if that is possible.
>>
>> I don't like the "if possible" thing.  If it makes no guarantees, it's 
>> pretty much useless over Py_BUF_SIMPLE.
> O.K.  Let's make it raise an error if it can't set it read-only.

The thing that bothers me about this whole flags setup is that different 
flags can do opposite things.

Some of the flags RESTRICT the kind of buffers that can be
exported (Py_BUF_WRITABLE); other flags EXPAND the kind of buffers that
can be exported (Py_BUF_INDIRECT).  That is highly confusing and I'm -1
on any proposal that includes both behaviors.  (Mutually exclusive sets
of flags are a minor exception: they can be thought of as either
RESTRICTING or EXPANDING, so they could be mixed with either.)

I originally suggested a small set of flags that expand the set of 
allowed buffers.  Here's a little Venn diagram of buffers to illustrate 
what I was thinking:

http://www.aerojockey.com/temp/venn.png

With no flags, the only buffers allowed to be returned are in the "All"
circle but no others.  Add Py_BUF_WRITABLE and now you can export
writable buffers as well.  Add Py_BUF_STRIDED and the strided circle is
opened to you, and so on.

My recommendation is, any flag should turn on some circle in the Venn
diagram (it could be a circle I didn't draw--shaped arrays, for
example--but it should be *some* circle).


>>> Py_BUF_FORMAT
>>>The consumer will be using the format string information so make 
>>> sure that member is filled correctly. 
>>
>> Is the idea to throw an exception if there's some other data format 
>> besides "b", and this flag isn't set?  It seems superfluous otherwise.
> 
> The idea is that a consumer may not care about the format and the 
> exporter may want to know that to simplify the interface.In other 
> words the flag is a way for the consumer to communicate that it wants 
> format information (or not).

I'm -1 on using the flags for this.  It's completely out of character
compared to the rest of the flags.  All other flags are there for the
benefit of the consumer; this flag is useless to the consumer.

More concretely, all the rest of the flags are there to tell the 
exporter what kind of buffer they're prepared to accept.  This flag, 
alone, does not do that.

Even the benefits to the exporter are dubious.  This flag can't reduce 
code complexity, since all buffer objects have to be prepared to furnish 
type information.  At best, this flag is a rare optimization.  In fact, 
most buffers are going to point format to a constant string, regardless 
of whether this flag was passed or not:

bufinfo->format = "b";


> If the exporter wants to raise an exception if the format is not
> requested is up to the exporter.

That seems like a bad idea.  Suppose I have a contiguous numpy array of
floats and I want to view it as a sequence of bytes.  If the exporter's
allowed to raise an exception for this, any consumer that wanted a
data-neutral view of the data would still have to pass Py_BUF_FORMAT to
guard against this.  Wouldn't that be ironic?


>>> Py_BUF_SHAPE
>>>    The consumer can (and might) make use of the ndims and shape 
>>>    members of the structure so make sure they are filled in correctly.
>>>
>>> Py_BUF_STRIDES (implies SHAPE)
>>>    The consumer can (and might) make use of the strides member of the 
>>>    structure (as well as ndims and shape)
>>
>> Is there any reasonable benefit for allowing Py_BUF_SHAPE without 
>> Py_BUF_STRIDES?  Would the array be C- or Fortran-like?
> 
> Yes,  I could see a consumer not being able to handle simple striding 
> but could handle shape information.  Many users of NumPy arrays like to 
> think of the array as an N-d array but want to ignore striding.

Ok, but is the indexing row-major or column-major?  That has to be decided.


Carl Banks


Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-16 Thread Carl Banks
Travis Oliphant wrote:
> Carl Banks wrote:
>> My recommendation is, any flag should turn on some circle in the Venn
>> diagram (it could be a circle I didn't draw--shaped arrays, for
>> example--but it should be *some* circle).
> I don't think your Venn diagram is broad enough and it unnecessarily 
> limits the use of flags to communicate between consumer and exporter.   
> We don't have to ram these flags down that point-of-view for them to be 
> productive.If you have a specific alternative proposal, or specific 
> criticisms, then I'm very willing to hear them.


Ok, I've thought quite a bit about this, and I have an idea that I think 
will be ok with you, and I'll be able to drop my main objection.  It's 
not a big change, either.  The key is to explicitly say whether the flag 
allows or requires.  But I made a few other changes as well.

First of all, let me define how I'm using the word "contiguous": it's a 
single buffer with no gaps.  So, if you were to do this: 
"memset(bufinfo->buf,0,bufinfo->len)", you would not touch any data that 
isn't being exported.

Without further ado, here is my proposal:


--

With no flags, the PyObject_GetBuffer will raise an exception if the 
buffer is not direct, contiguous, and one-dimensional.  Here are the 
flags and how they affect that:

Py_BUF_REQUIRE_WRITABLE - Raise exception if the buffer isn't writable.

Py_BUF_REQUIRE_READONLY - Raise exception if the buffer is writable.

Py_BUF_ALLOW_NONCONTIGUOUS - Allow noncontiguous buffers.  (This turns 
on "shape" and "strides".)

Py_BUF_ALLOW_MULTIDIMENSIONAL - Allow multidimensional buffers.  (Also 
turns on "shape" and "strides".)

(Neither of the above two flags implies the other.)

Py_BUF_ALLOW_INDIRECT - Allow indirect buffers.  Implies 
Py_BUF_ALLOW_NONCONTIGUOUS and Py_BUF_ALLOW_MULTIDIMENSIONAL. (Turns on 
"shape", "strides", and "suboffsets".)

Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY or Py_BUF_REQUIRE_ROW_MAJOR - Raise an 
exception if the array isn't a contiguous array in C (row-major) 
format.

Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY or Py_BUF_REQUIRE_COLUMN_MAJOR - 
Raise an exception if the array isn't a contiguous array in Fortran 
(column-major) format.

Py_BUF_ALLOW_NONCONTIGUOUS, Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY, and 
Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY all conflict with each other, 
and an exception should be raised if more than one are set.

(I would go with ROW_MAJOR and COLUMN_MAJOR: even though the terms only 
make sense for 2D arrays, I believe the terms are commonly generalized 
to other dimensions.)

Possible pseudo-flags:

Py_BUF_SIMPLE = 0;
Py_BUF_ALLOW_STRIDED = Py_BUF_ALLOW_NONCONTIGUOUS
| Py_BUF_ALLOW_MULTIDIMENSIONAL;

--
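
For example, a consumer that writes into an N-dimensional strided array 
but can't follow sub-buffer pointers would request something like this 
(assuming the flags are passed straight to PyObject_GetBuffer and that it 
returns 0 on failure with an exception set, as in the excerpt further down):

    struct bufferinfo bufinfo;
    int flags = Py_BUF_REQUIRE_WRITABLE
                | Py_BUF_ALLOW_MULTIDIMENSIONAL
                | Py_BUF_ALLOW_NONCONTIGUOUS;
    if (!PyObject_GetBuffer(obj, &bufinfo, flags))
        return NULL;    /* read-only or indirect: exception already set */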

Now, for each flag, there should be an associated function to test the 
condition, given a bufferinfo struct.  (Though I suppose they don't 
necessarily have to map one-to-one, I'll do that here.)

int PyBufferInfo_IsReadonly(struct bufferinfo*);
int PyBufferInfo_IsWritable(struct bufferinfo*);
int PyBufferInfo_IsContiguous(struct bufferinfo*);
int PyBufferInfo_IsMultidimensional(struct bufferinfo*);
int PyBufferInfo_IsIndirect(struct bufferinfo*);
int PyBufferInfo_IsRowMajor(struct bufferinfo*);
int PyBufferInfo_IsColumnMajor(struct bufferinfo*);

The function PyObject_GetBuffer then has a pretty obvious 
implementation.  Here is an excerpt:

    if ((flags & Py_BUF_REQUIRE_READONLY) &&
            !PyBufferInfo_IsReadonly(&bufinfo)) {
        PyErr_SetString(PyExc_BufferError, "buffer not read-only");
        return 0;
    }

Pretty straightforward, no?
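
The other checks all follow the same pattern; for instance (still just 
sketching the obvious implementation):

    if (!(flags & Py_BUF_ALLOW_MULTIDIMENSIONAL) &&
            PyBufferInfo_IsMultidimensional(&bufinfo)) {
        PyErr_SetString(PyExc_BufferError, "buffer is multidimensional");
        return 0;
    }
    if (!(flags & Py_BUF_ALLOW_NONCONTIGUOUS) &&
            !PyBufferInfo_IsContiguous(&bufinfo)) {
        PyErr_SetString(PyExc_BufferError, "buffer is not contiguous");
        return 0;
    }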

Now, here is a key point: for these functions to work (indeed, for 
PyObject_GetBuffer to work at all), you need enough information in 
bufinfo to figure it out.  The bufferinfo struct should be 
self-contained; you should not need to know what flags were passed to 
PyObject_GetBuffer in order to know exactly what data you're looking at.

Therefore, format must always be supplied by getbuffer.  You cannot tell 
if an array is contiguous without the format string.  (But see below.)

And even if the consumer isn't asking for a contiguous buffer, it has to 
know the item size so it knows what data not to step on.

(This is true even in your own proposal, BTW.  If a consumer asks for a 
non-strided array in your proposal, PyObject_GetBuffer would have to 
know the item size to determine if the array is contiguous.)


--

FAQ:

Q. Why ALLOW_NONCONTIGUOUS and ALLOW_MULTIDIMENSIONAL instead of 
ALLOW_STRIDED and ALLOW_SHAPED?

A. It's more useful to the consumer that way.  With ALLOW_STRIDED and 
ALLOW_SHAPED, there's no way for a consumer to request a general 
one-dimensional array (it can only request a non-strided o

[Python-Dev] PEP 3118: Extended buffer protocol (new version)

2007-04-18 Thread Carl Banks
Travis Oliphant wrote:
> Carl Banks wrote:
>> Ok, I've thought quite a bit about this, and I have an idea that I 
>> think will be ok with you, and I'll be able to drop my main 
>> objection.  It's not a big change, either.  The key is to explicitly 
>> say whether the flag allows or requires.  But I made a few other 
>> changes as well.
> I'm good with using an identifier to differentiate between an "allowed" 
> flag and a "require" flag.   I'm not a big fan of 
> VERY_LONG_IDENTIFIER_NAMES though.  Just enough to understand what it 
> means but not so much that it takes forever to type and uses up 
> horizontal real-estate.

That's fine with me.  I'm not very particular about spellings, as long
as they're not misleading.

>> Now, here is a key point: for these functions to work (indeed, for 
>> PyObject_GetBuffer to work at all), you need enough information in 
>> bufinfo to figure it out.  The bufferinfo struct should be 
>> self-contained; you should not need to know what flags were passed to 
>> PyObject_GetBuffer in order to know exactly what data you're looking at.
> Naturally.
> 
>> Therefore, format must always be supplied by getbuffer.  You cannot 
>> tell if an array is contiguous without the format string.  (But see 
>> below.)
> 
> No, I don't think this is quite true.   You don't need to know what 
> "kind" of data you are looking at if you don't get strides.  If you use 
> the SIMPLE interface, then both consumer and exporter know the object is 
> looking at "bytes" which always has an itemsize of 1.

But doesn't this violate the above maxim?  Suppose these are the
contents of bufinfo:

ndim = 1
len = 20
shape = (10,)
strides = (2,)
format = NULL

How does it know whether it's looking at contiguous array of 10 two-byte
objects, or a discontiguous array of 10 one-byte objects, without having
at least an item size?  Since item size is now in the mix, it's moot, of
course.

The idea that Py_BUF_SIMPLE implies bytes is news to me.  What if you
want a contiguous, one-dimensional array of an arbitrary type?  I was
thinking this would be acceptable with Py_BUF_SIMPLE.  It seems you want
to require Py_BUF_FORMAT for that, which would suggest to me that
Py_BUF_ALLOW_ND and Py_BUF_ALLOW_NONCONTIGUOUS, etc., would imply
Py_BUF_FORMAT?  IOW, pretty much anything that's not SIMPLE implies FORMAT?

If that's the case, then most of the issues I brought up about item size
don't apply.  Also, if that's the case, you're right that Py_BUF_FORMAT
makes more sense than Py_BUF_DONT_NEED_FORMAT.

But now it seems even more unnecessary than it did before.  Wouldn't
any consumer that just wants to look at a chunk of bytes always use
Py_BUF_FORMAT, especially if there's danger of a presumptuous exporter
raising an exception?


> I'll update the PEP with my adaptation of your suggestions in a little 
> while.

Ok.  Thanks for taking the lead, and for putting up with my verbose
nitpicking. :)


Carl Banks
