Re: [Python-Dev] Extended Buffer Interface/Protocol
(cc'ing back to Python-dev; the original reply was intended for it but I
had an email malfunction.)

Travis Oliphant wrote:
> Carl Banks wrote:
>> 3. Allow getbuffer to return an array of "dereference offsets", one
>> for each dimension. For a given dimension i, if derefoff[i] is
>> nonnegative, it's assumed that the current position (base pointer +
>> indexing so far) is a pointer to a subarray, and derefoff[i] is the
>> offset in that array where the current position goes for the next
>> dimension. If derefoff[i] is negative, there is no dereferencing.
>> Here is an example of how it'd work:
>
> This sounds interesting, but I'm not sure I totally see it. I probably
> need a picture to figure out what you are proposing.

I'll get on it sometime. For now I hope an example will do.

> The derefoff sounds like some kind of offset. Is that enough? Why not
> just make derefoff[i] == 0 instead of negative?

I may have misunderstood something. I had thought the values exported by
getbuffer could change as the view narrowed, but I'm not sure if that's
the case now. I'll assume it isn't for now, because it simplifies things
and demonstrates the concept better.

Let's start from the beginning. First, change the prototype to this:

typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                   Py_ssize_t *len, int *writable,
                                   char **format, int *ndims,
                                   Py_ssize_t **shape,
                                   Py_ssize_t **strides,
                                   int **isptr)

"isptr" is a flag indicating whether, for a certain dimension, the
position we've strided to so far is a pointer that should be followed
before proceeding with the rest of the strides.
Now here's what a general "get_item_pointer" function would look like,
given a set of indices:

void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       int *isptr, Py_ssize_t *indices)
{
    char *pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (isptr[i]) {
            pointer = *(char**)pointer;
        }
    }
    return (void*)pointer;
}

> I don't fully understand the PIL example you gave.

Yeah. How about more details. Here is a hypothetical image data object
structure:

struct rgba {
    unsigned char r, g, b, a;
};

struct ImageObject {
    PyObject_HEAD
    ...
    struct rgba **lines;
    Py_ssize_t height;
    Py_ssize_t width;
    Py_ssize_t shape_array[2];
    Py_ssize_t stride_array[2];
    Py_ssize_t view_count;
};

"lines" points to a malloced 1-D array of (struct rgba *). Each pointer
in THAT block points to a separately malloced array of (struct rgba).
Got that? In order to access, say, the red value of the pixel at x=30,
y=50, you'd use "lines[50][30].r".

So what does ImageObject's getbuffer do? Leaving error checking out:

PyObject *getbuffer(struct ImageObject *self, void **buf,
                    Py_ssize_t *len, int *writable, char **format,
                    int *ndims, Py_ssize_t **shape,
                    Py_ssize_t **strides, int **isptr)
{
    static int _isptr[2] = { 1, 0 };
    *buf = self->lines;
    *len = self->height * self->width;
    *writable = 1;
    *ndims = 2;
    self->shape_array[0] = self->height;
    self->shape_array[1] = self->width;
    *shape = self->shape_array;
    self->stride_array[0] = sizeof(struct rgba *);  /* yep */
    self->stride_array[1] = sizeof(struct rgba);
    *strides = self->stride_array;
    *isptr = _isptr;
    self->view_count++;
    /* create and return view object here, but for what? */
}

There are three essential differences from a regular, contiguous array.

1. buf is set to point at the array of pointers, not directly to the
data.

2. The isptr thing. isptr[0] is true to indicate that the first
dimension is an array of pointers, not the actual data.

3.
stride[0] is sizeof(struct rgba *), not self->width*sizeof(struct rgba)
like it would be for a contiguous array. This is because your first
stride is through an array of pointers, not the data itself.

So let's examine what "get_item_pointer" above will do given these
values. Once again, we're looking for the pixel at x=30, y=50.

First, we set pointer to buf, that is, self->lines. Then we take the
first stride: we add strides[0]*indices[0], that is, 4*50=200, to
pointer. pointer now equals &self->lines[50]. Now, we check isptr[0]. We
see that it is true. Thus, the position we've strided to is, in fact, a
pointer to a subarray where the actual data is.
Re: [Python-Dev] Extended Buffer Interface/Protocol
Travis Oliphant wrote:
> Carl Banks wrote:
>> We're done. Return pointer.
>
> Thank you for this detailed example. I will have to parse it in more
> depth but I think I can see what you are suggesting.
>
>> First, I'm not sure why getbuffer needs to return a view object.
>
> The view object in your case would just be the ImageObject.

ISTM that we are using the word "view" very differently. Consider this
example:

A = zeros((100,100))
B = A.transpose()

In this scenario, A would be the exporter object; I think we both would
call it that. When I use the word "view", I'm referring to B. However,
you seem to be referring to the object returned by A.getbuffer, right?
What term have you been using to refer to B?

Obviously, it would help the discussion if we could get our terminology
straight. (Frankly, I don't agree with your usage; it doesn't agree with
other uses of the word "view". For example, consider the proposed
Python 3000 dictionary views:

D = dict()
V = D.items()

Here, V is the view, and it's analogous to B in the above example.)

I'd suggest the object returned by A.getbuffer should be called the
"buffer provider" or something like that. For the sake of discussion,
I'm going to avoid the word "view" altogether. I'll call A the exporter,
as before. B I'll refer to as the requestor. The object returned by
A.getbuffer is the provider.

> The reason I was thinking the function should return "something" is to
> provide more flexibility in what a view object actually is.
>
> I've also been going back and forth between explicitly passing all
> this information around or placing it in an actual view-object. In
> some sense, a view object is a NumPy array in my mind. But, with the
> addition of isptr we are actually expanding the memory abstraction of
> the view object beyond an explicit NumPy array.
>
> In the most common case, I envisioned the view object would just be
> the object itself in which case it doesn't actually have to be
> returned.
> While returning the view object would allow unspecified flexibility in
> the future, it really adds nothing to the current vision.
>
> We could add a view object separately as an abstract API on top of the
> buffer interface.

Having thought quite a bit about it, and having written several abortive
replies, I now understand it and see the importance of it. getbuffer
returns the object that you are to call releasebuffer on. It may or may
not be the same object as the exporter. Makes sense, is easy to explain.

It's easy to see possible use cases for returning a different object. A
hypothetical future incarnation of NumPy might shift the responsibility
of managing buffers from the NumPy array object to a hidden raw buffer
object. In this scenario, the NumPy object is the exporter, but the raw
buffer object is the provider.

Considering this use case, it's clear that getbuffer should return the
shape and stride data independently of the provider. The raw buffer
object wouldn't have that information; all it does is store a pointer
and keep a reference count. Shape and stride is defined by the exporter.

>> Second question: what happens if a view wants to re-export the
>> buffer? Do the views of the buffer ever change? Example, say you
>> create a transposed view of a NumPy array. Now you want a slice of
>> the transposed array. What does the transposed view's getbuffer
>> export?
>
> Basically, you could not alter the internal representation of the
> object while views which depended on those values were around.
>
> In NumPy, a transposed array actually creates a new NumPy object that
> refers to the same data but has its own shape and strides arrays.
>
> With the new buffer protocol, the NumPy array would not be able to
> alter its shape/strides or reallocate its data areas while views were
> being held by other objects.

But requestors could alter their own copies of the data, no? Back to the
transpose example: B itself obviously can't use the same "strides" array
as A uses.
It would have to create its own strides, right?

So, what if someone takes a slice out of B? When calling B.getbuffer,
does it get B's strides, or A's? I think it should get B's. After all,
if you're taking a slice of B, don't you care about the slicing relative
to B's axes? I can't really think of a use case for exporting A's stride
data when you take a slice of B, and it doesn't seem to simplify memory
management, because B has to make its own copies anyway.

> With the shape and strides information, the format information, and
> the data buffer itself, there are
Re: [Python-Dev] Extended Buffer Interface/Protocol
Travis Oliphant wrote:
> Carl Banks wrote:
>> ISTM that we are using the word "view" very differently. Consider
>> this example:
>>
>> A = zeros((100,100))
>> B = A.transpose()
>
> You are thinking of NumPy's particular use case. I'm thinking of a
> generic use case. So, yes, I'm using the word view in two different
> contexts.
>
> In this scenario, NumPy does not even use the buffer interface. It
> knows how to transpose its own objects and does so by creating a new
> NumPy object (with its own shape and strides space) with a data buffer
> pointed to by "A".

I realized that as soon as I tried a simple Python demonstration of it.
So it's a poor example. But I hope it's obvious how it would generalize
to a different type.

>>> Having such a thing as a view object would actually be nice because
>>> it could hold on to a particular view of data with a given set of
>>> shape and strides (whose memory it owns itself) and then the
>>> exporting object would be free to change its shape/strides
>>> information as long as the data did not change.
>>
>> What I don't understand is why it's important for the provider to
>> retain this data. The requestor only needs the information when it's
>> created; it will calculate its own versions of the data, and will not
>> need the originals again, so there's no need for the provider to keep
>> them around.
>
> That is certainly a policy we could enforce (and pretty much what I've
> been thinking). We just need to make it explicit that the shape and
> strides provided are only guaranteed up until a GIL release (i.e.,
> arbitrary Python code could change these memory areas, both their
> content and location) and so if you need them later, make your own
> copies.
>
> If this were the policy, then NumPy could simply pass pointers to its
> stored shape and strides arrays when the buffer interface is called
> but then not worry about re-allocating these arrays before the
> "buffer" lock is released.
> Another object could hold on to the memory area of the NumPy array but
> would have to store shape and strides information if it wanted to keep
> it.
>
> NumPy could also just pass a pointer to the char * representation of
> the format (which in NumPy would be stored in the dtype object) and
> would not have to worry about the dtype being re-assigned later.

Bingo! This is my preference.

>>>> The reason I ask is: if things like "buf" and "strides" and "shape"
>>>> could change when a buffer is re-exported, then it can complicate
>>>> things for PIL-like buffers. (How would you account for offsets in
>>>> a dimension that's in a subarray?)
>>>
>>> I'm not sure what you mean; offsets are handled by changing the
>>> starting location of the pointer to the buffer.
>>
>> But to answer your question: you can't just change the starting
>> location because there's hundreds of buffers. You'd either have to
>> change the starting location of each one of them, which is highly
>> undesirable, or factor in an offset somehow. (This was, in fact, the
>> point of the "derefoff" term in my original suggestion.)
>
> I get better what your derefoff was doing now. I was missing the
> de-referencing that was going on. Couldn't you still just store a
> pointer to the start of the array? In other words, isn't your **isptr
> suggestion sufficient? It seems to be.

No. The problem arises when slicing. In a single buffer, you would
adjust the base pointer to point at element [0,0] of the slice. But you
can't do that with multiple buffers. Instead, you have to add an offset
after dereferencing the pointer to the subarray. Hence my derefoff
proposal: it dereferenced the pointer, then added an offset to get you
to the 0 position in that dimension.

>> Anyways, despite the miscommunications so far, I now have a very good
>> idea of what's going on. We definitely need to get terms straight. I
>> agree that getbuffer should return an object.
>> I think we need to think harder about the case when requestors
>> re-export the buffer. (Maybe it's time to whip up some experimental
>> objects?)
>
> I'm still not clear what you are concerned about. If an object
> consumes the buffer interface and then wants to be able to later
> export it to another, then from our discussion about the shape/strides
> and format information, it would have to mainta
Re: [Python-Dev] An updated extended buffer PEP
Travis Oliphant wrote:
> Travis Oliphant wrote:
>> Hi Carl and Greg,
>>
>> Here is my updated PEP which incorporates several parts of the
>> discussions we have been having.
>
> And here is the actual link:
>
> http://projects.scipy.org/scipy/numpy/browser/trunk/numpy/doc/pep_buffer.txt

What's the purpose of void** segments in PyObject_GetBuffer? It seems
like it's leftover from an older incarnation?

I'd hope after more recent discussion, we'll end up simplifying
releasebuffer. It seems like it'd be a nightmare to keep track of what
you've released.

Finally, the isptr thing. It's just not sufficient. Frankly, I'm having
doubts whether it's a good idea to support multibuffer at all. Sure, it
brings generality, but I'm thinking it's too hard to explain and too
hard to get one's head around, and will lead to lots of misunderstanding
and bugginess. OTOH, it really doesn't burden anyone except those who
want to export multi-buffered arrays, and we only have one shot to do
it. I just hope it doesn't confuse everyone so much that no one bothers.

Here's how I would update the isptr thing. I've changed "derefoff" to
"subbufferoffsets" to describe it better.

typedef PyObject *(*getbufferproc)(PyObject *obj, void **buf,
                                   Py_ssize_t *len, int *writable,
                                   char **format, int *ndims,
                                   Py_ssize_t **shape,
                                   Py_ssize_t **strides,
                                   Py_ssize_t **subbufferoffsets);

subbufferoffsets
    Used to export information about multibuffer arrays. It is an
    address of a ``Py_ssize_t *`` variable that will be set to point at
    an array of ``Py_ssize_t`` of length ``*ndims``. [I don't even want
    to try a verbal description.]

To demonstrate how subbufferoffsets works, here is an example of a
function that returns a pointer to an element of ANY N-dimensional
array, single- or multi-buffered.
void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides,
                       Py_ssize_t *subbufferoffsets, Py_ssize_t *indices)
{
    char *pointer = (char*)buf;
    int i;
    for (i = 0; i < ndim; i++) {
        pointer += strides[i] * indices[i];
        if (subbufferoffsets[i] >= 0) {
            pointer = *(char**)pointer + subbufferoffsets[i];
        }
    }
    return (void*)pointer;
}

For single buffers, subbufferoffsets is negative in every dimension and
it reduces to normal single-buffer indexing. For multi-buffers,
subbufferoffsets indicates when to dereference the pointer and switch to
the new buffer, and gives the offset into the buffer to start at. In
most cases, the subbufferoffset would be zero (indicating it should
start at the beginning of the new buffer), but it can be a positive
number if the following dimension has been sliced, and thus the 0th
entry in that dimension would not be at the beginning of the new buffer.

Other than that, looks good. :)

Carl Banks

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] An updated extended buffer PEP
Travis Oliphant wrote:
> Carl Banks wrote:
>> Travis E. Oliphant wrote:
>>> I think we are getting closer. What do you think about Greg's idea
>>> of basically making the provider the bufferinfo structure and having
>>> the exporter handle copying memory over for shape and strides if it
>>> wants to be able to change those before the lock is released.
>>
>> It seems like it's just a different way to return the data. You could
>> do it by setting values through pointers, or do it by returning a
>> structure. Which way you choose is a minor detail in my opinion. I'd
>> probably favor returning the information in a structure.
>>
>> I would consider adding two fields to the structure:
>>
>> size_t structsize; /* size of the structure */
>
> Why is this necessary? Can't you get that by sizeof(bufferinfo)?

In case you want to add something later. Though if you did that, it
would be a different major release, meaning you'd have to rebuild
anyway. They rashly add fields to the PyTypeObject in the same way. :)
So never mind.

>> PyObject* releaser; /* the object you need to call releasebuffer on */
>
> Is this so that another object could be used to manage releases if
> desired?

Yes, that was a use case I saw for a different "view" object. I don't
think it's crucially important to have it, but for exporting objects
that delegate management of the buffer to another object, it would be
very helpful if the exporter could tell consumers that the other object
is managing the buffer.

Suppose A is an exporting object, but it uses a hidden object R to
manage the buffer memory. Thus you have A referring to R, like this:

A -> R

Now object B takes a view of A. If we don't have this field, then B
will have to hold a reference to A, like this:

B -> A -> R

A would be responsible for keeping track of views, and A could not be
garbage collected until B disappears.
If we do have this field, then A could tell B to hold a reference to R
instead:

B -> R
A -> R

A is no longer obliged to keep track of views, and it can be garbage
collected even if B still exists.

Here's a concrete example of where it would be useful: consider a
ByteBufferSlice object. Basically, the object represents a
shared-memory slice of a 1-D array of bytes (for example, a Python 3000
bytes object, or an mmap object).

Now, if the ByteBufferSlice object could tell the consumer that someone
else is managing the buffer, then it wouldn't have to keep track of
views, thus simplifying things.

P.S. In thinking about this, it occurred to me that there should be a
way to lock the buffer without requesting details. ByteBufferSlice
would already know the details of the buffer, but it would need to
increment the original buffer's lock count. Thus, I propose a new
function:

typedef int (*lockbufferproc)(PyObject* self);

Carl Banks
Re: [Python-Dev] An updated extended buffer PEP
Carl Banks wrote:
> Here's a concrete example of where it would be useful: consider a
> ByteBufferSlice object. Basically, the object represents a
> shared-memory slice of a 1-D array of bytes (for example, a Python
> 3000 bytes object, or an mmap object).
>
> Now, if the ByteBufferSlice object could tell the consumer that
> someone else is managing the buffer, then it wouldn't have to keep
> track of views, thus simplifying things.
>
> P.S. In thinking about this, it occurred to me that there should be a
> way to lock the buffer without requesting details. ByteBufferSlice
> would already know the details of the buffer, but it would need to
> increment the original buffer's lock count. Thus, I propose a new
> function:
>
> typedef int (*lockbufferproc)(PyObject* self);

And, because real examples are better than philosophical speculations,
here's a skeleton implementation of the ByteBufferSlice array, sans
boilerplate and error checking, and with some educated guessing about
future details:

typedef struct {
    PyObject_HEAD
    PyObject* releaser;
    unsigned char* buf;
    Py_ssize_t length;
} ByteBufferSliceObject;

PyObject* ByteBufferSlice_new(PyObject* bufobj, Py_ssize_t start,
                              Py_ssize_t end)
{
    ByteBufferSliceObject* self;
    BufferInfoObject* bufinfo;

    self = (ByteBufferSliceObject*)type->tp_alloc(type, 0);
    bufinfo = PyObject_GetBuffer(bufobj);
    self->releaser = bufinfo->releaser;
    self->buf = bufinfo->buf + start;
    self->length = end - start;
    /* look how soon we're done with this information */
    Py_DECREF(bufinfo);
    return self;
}

void ByteBufferSlice_dealloc(PyObject* self)
{
    PyObject_ReleaseBuffer(self->releaser);
    self->ob_type->tp_free((PyObject*)self);
}

PyObject* ByteBufferSlice_getbuffer(PyObject* self, int flags)
{
    BufferInfoObject* bufinfo;
    static Py_ssize_t stridesarray[] = { 1 };

    bufinfo = BufferInfo_New();
    bufinfo->releaser = self->releaser;
    bufinfo->writable = 1;
    bufinfo->buf = self->buf;
    bufinfo->length = self->length;
    bufinfo->ndims = 1;
    bufinfo->strides = stridesarray;
    bufinfo->size = &self->length;
    bufinfo->subbufoffsets = NULL;
    /* Before we go, increase the original buffer's lock count */
    PyObject_LockBuffer(self->releaser);
    return bufinfo;
}

/* don't define releasebuffer or lockbuffer */
/* only objects that manage buffers themselves would define these */

/* Now look how easy this is */
/* Everything works out if ByteBufferSlice reexports the buffer */
PyObject* ByteBufferSlice_getslice(PyObject* self, Py_ssize_t start,
                                   Py_ssize_t end)
{
    return ByteBufferSlice_new(self, start, end);
}

The implementation of this is very straightforward, and it's easy to
see why and how "bufinfo->releaser" works, and why it'd be useful.

It's almost like there's two protocols here: a buffer exporter protocol
(getbuffer) and a buffer manager protocol (lockbuffer and
releasebuffer). Some objects would support only the exporter protocol;
others both.

Carl Banks
Re: [Python-Dev] Extended buffer PEP
Only one concern:

> typedef int (*getbufferproc)(PyObject *obj, struct bufferinfo *view)

I'd like to see it accept a flags argument over what kind of buffer it's
allowed to return. I'd rather not burden the user to check all the
entries in bufferinfo to make sure it doesn't get something unexpected.

I imagine most uses of the buffer protocol would be for direct,
one-dimensional arrays of bytes with no striding. It's not clear whether
read-only or read-write should be the least common denominator, so
require at least one of these flags:

Py_BUF_READONLY
Py_BUF_READWRITE

Then allow any of these flags to allow more complex access:

Py_BUF_MULTIDIM - allows strided and multidimensional arrays
Py_BUF_INDIRECT - allows indirect buffers (implies Py_BUF_MULTIDIM)

An object is allowed to return a simpler array than requested, but not
a more complex one. If you allow indirect buffers, you might still get
a one-dimensional array of bytes.

Other than that, I would add a note about the other things considered
and rejected (the old prototype for getbufferproc, the delegated buffer
object). List whether to backport the buffer protocol to 2.6 as an open
question. Then submit it as a real PEP. I believe this idea has run its
course as PEP XXX and needs a real number. (I was intending to start
making patches for the Py3K library modules as soon as that happened.)

Carl Banks

Travis Oliphant wrote:
>
> Here is my "final" draft of the extended buffer interface PEP.
> For those who have been following the discussion, I eliminated the
> releaser object and the lock-buffer function. I decided that there is
> enough to explain with the new striding and sub-offsets without the
> added confusion of releasing buffers, especially when it is not clear
> what is to be gained by such complexity except a few saved lines of
> code.
> The striding and sub-offsets, however, allow extension module writers
> to write code (say video and image processing code or scientific
> computing code or data-base processing code) that works on any object
> exposing the buffer interface. I think this will be of great benefit
> and so is worth the complexity.
>
> This will take some work to get implemented for Python 3k. I could
> use some help with this in order to speed up the process. I'm working
> right now on the extensions to the struct module until the rest is
> approved.
>
> Thank you for any and all comments:
>
> -Travis
Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant wrote:
> Py_BUF_READONLY
>    The returned buffer must be readonly and the underlying object
>    should make its memory readonly if that is possible.

I don't like the "if possible" thing. If it makes no guarantees, it's
pretty much useless over Py_BUF_SIMPLE.

> Py_BUF_FORMAT
>    The consumer will be using the format string information so make
>    sure that member is filled correctly.

Is the idea to throw an exception if there's some other data format
besides "b", and this flag isn't set? It seems superfluous otherwise.

> Py_BUF_SHAPE
>    The consumer can (and might) make use of the ndims and shape
>    members of the structure so make sure they are filled in correctly.
>
> Py_BUF_STRIDES (implies SHAPE)
>    The consumer can (and might) make use of the strides member of the
>    structure (as well as ndims and shape)

Is there any reasonable benefit for allowing Py_BUF_SHAPE without
Py_BUF_STRIDES? Would the array be C- or Fortran-like?

Another little mistake I made: looking at the Python source, it seems
that most C defines do not use the Py_ prefix, so probably we shouldn't
here. Sorry.

Carl
Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant wrote:
> Carl Banks wrote:
>> Travis Oliphant wrote:
>>> Py_BUF_READONLY
>>>    The returned buffer must be readonly and the underlying object
>>>    should make its memory readonly if that is possible.
>>
>> I don't like the "if possible" thing. If it makes no guarantees, it's
>> pretty much useless over Py_BUF_SIMPLE.
>
> O.K. Let's make it raise an error if it can't set it read-only.

The thing that bothers me about this whole flags setup is that
different flags can do opposite things. Some of the flags RESTRICT the
kind of buffers that can be exported (Py_BUF_WRITABLE); other flags
EXPAND the kind of buffers that can be exported (Py_BUF_INDIRECT). That
is highly confusing and I'm -1 on any proposal that includes both
behaviors. (Mutually exclusive sets of flags are a minor exception:
they can be thought of as either RESTRICTING or EXPANDING, so they
could be mixed with either.)

I originally suggested a small set of flags that expand the set of
allowed buffers. Here's a little Venn diagram of buffers to illustrate
what I was thinking:

http://www.aerojockey.com/temp/venn.png

With no flags, the only buffers allowed to be returned are in the "All"
circle but no others. Add Py_BUF_WRITABLE and now you can export
writable buffers as well. Add Py_BUF_STRIDED and the strided circle is
opened to you, and so on.

My recommendation is, any flag should turn on some circle in the Venn
diagram (it could be a circle I didn't draw--shaped arrays, for
example--but it should be *some* circle).

>>> Py_BUF_FORMAT
>>>    The consumer will be using the format string information so make
>>>    sure that member is filled correctly.
>>
>> Is the idea to throw an exception if there's some other data format
>> besides "b", and this flag isn't set? It seems superfluous otherwise.
> The idea is that a consumer may not care about the format and the
> exporter may want to know that to simplify the interface. In other
> words, the flag is a way for the consumer to communicate that it
> wants format information (or not).

I'm -1 on using the flags for this. It's completely out of character
compared to the rest of the flags. All other flags are there for the
benefit of the consumer; this flag is useless to the consumer. More
concretely, all the rest of the flags are there to tell the exporter
what kind of buffer they're prepared to accept. This flag, alone, does
not do that.

Even the benefits to the exporter are dubious. This flag can't reduce
code complexity, since all buffer objects have to be prepared to
furnish type information. At best, this flag is a rare optimization. In
fact, most buffers are going to point format to a constant string,
regardless of whether this flag was passed or not:

bufinfo->format = "b";

> If the exporter wants to raise an exception if the format is not
> requested is up to the exporter.

That seems like a bad idea. Suppose I have a contiguous NumPy array of
floats and I want to view it as a sequence of bytes. If the exporter's
allowed to raise an exception for this, any consumer that wanted a
data-neutral view of the data would still have to pass Py_BUF_FORMAT to
guard against this. Wouldn't that be ironic?

>>> Py_BUF_SHAPE
>>>    The consumer can (and might) make use of the ndims and shape
>>>    members of the structure so make sure they are filled in
>>>    correctly.
>>>
>>> Py_BUF_STRIDES (implies SHAPE)
>>>    The consumer can (and might) make use of the strides member of
>>>    the structure (as well as ndims and shape)
>>
>> Is there any reasonable benefit for allowing Py_BUF_SHAPE without
>> Py_BUF_STRIDES? Would the array be C- or Fortran-like?
>
> Yes, I could see a consumer not being able to handle simple striding
> but could handle shape information.
> Many users of NumPy arrays like to think of the array as an N-d array
> but want to ignore striding.

Ok, but is the indexing row-major or column-major? That has to be
decided.

Carl Banks
Re: [Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant wrote:
> Carl Banks wrote:
>> My recommendation is, any flag should turn on some circle in the Venn
>> diagram (it could be a circle I didn't draw--shaped arrays, for
>> example--but it should be *some* circle).
>
> I don't think your Venn diagram is broad enough and it un-necessarily
> limits the use of flags to communicate between consumer and exporter.
> We don't have to ram these flags down that point-of-view for them to
> be productive. If you have a specific alternative proposal, or
> specific criticisms, then I'm very willing to hear them.

Ok, I've thought quite a bit about this, and I have an idea that I
think will be ok with you, and I'll be able to drop my main objection.
It's not a big change, either. The key is to explicitly say whether the
flag allows or requires. But I made a few other changes as well.

First of all, let me define how I'm using the word "contiguous": it's a
single buffer with no gaps. So, if you were to do this:
"memset(bufinfo->buf, 0, bufinfo->len)", you would not touch any data
that isn't being exported.

Without further ado, here is my proposal:

--

With no flags, PyObject_GetBuffer will raise an exception if the buffer
is not direct, contiguous, and one-dimensional. Here are the flags and
how they affect that:

Py_BUF_REQUIRE_WRITABLE
- Raise exception if the buffer isn't writable.

Py_BUF_REQUIRE_READONLY
- Raise exception if the buffer is writable.

Py_BUF_ALLOW_NONCONTIGUOUS
- Allow noncontiguous buffers. (This turns on "shape" and "strides".)

Py_BUF_ALLOW_MULTIDIMENSIONAL
- Allow multidimensional buffers. (Also turns on "shape" and
"strides".)

(Neither of the above two flags implies the other.)

Py_BUF_ALLOW_INDIRECT
- Allow indirect buffers. Implies Py_BUF_ALLOW_NONCONTIGUOUS and
Py_BUF_ALLOW_MULTIDIMENSIONAL. (Turns on "shape", "strides", and
"suboffsets".)

Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY or Py_BUF_REQUIRE_ROW_MAJOR
- Raise an exception if the array isn't a contiguous array in C
(row-major) format.
Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY or Py_BUF_REQUIRE_COLUMN_MAJOR - Raise an exception if the array isn't a contiguous array in Fortran (column-major) format.

Py_BUF_ALLOW_NONCONTIGUOUS, Py_BUF_REQUIRE_CONTIGUOUS_C_ARRAY, and Py_BUF_REQUIRE_CONTIGUOUS_FORTRAN_ARRAY all conflict with each other, and an exception should be raised if more than one is set. (I would go with ROW_MAJOR and COLUMN_MAJOR: even though the terms only make sense for 2-D arrays, I believe the terms are commonly generalized to other dimensions.)

Possible pseudo-flags:

Py_BUF_SIMPLE = 0;
Py_BUF_ALLOW_STRIDED = Py_BUF_ALLOW_NONCONTIGUOUS
                     | Py_BUF_ALLOW_MULTIDIMENSIONAL;

--

Now, for each flag, there should be an associated function to test the condition, given a bufferinfo struct. (Though I suppose they don't necessarily have to map one-to-one, I'll do that here.)

int PyBufferInfo_IsReadonly(struct bufferinfo*);
int PyBufferInfo_IsWritable(struct bufferinfo*);
int PyBufferInfo_IsContiguous(struct bufferinfo*);
int PyBufferInfo_IsMultidimensional(struct bufferinfo*);
int PyBufferInfo_IsIndirect(struct bufferinfo*);
int PyBufferInfo_IsRowMajor(struct bufferinfo*);
int PyBufferInfo_IsColumnMajor(struct bufferinfo*);

The function PyObject_GetBuffer then has a pretty obvious implementation. Here is an excerpt:

if ((flags & Py_BUF_REQUIRE_READONLY)
        && !PyBufferInfo_IsReadonly(&bufinfo)) {
    PyErr_SetString(PyExc_BufferError, "buffer not read-only");
    return 0;
}

Pretty straightforward, no?

Now, here is a key point: for these functions to work (indeed, for PyObject_GetBuffer to work at all), you need enough information in bufinfo to figure it out. The bufferinfo struct should be self-contained; you should not need to know what flags were passed to PyObject_GetBuffer in order to know exactly what data you're looking at.

Therefore, format must always be supplied by getbuffer. You cannot tell if an array is contiguous without the format string. (But see below.)
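The validation loop that excerpt comes from can be sketched outside the Python C API as well. The sketch below uses plain C stand-ins (hypothetical PY_BUF_* flag values and a stripped-down bufferinfo; real code would call PyErr_SetString and the PyBufferInfo_Is* predicates), just to show the shape of the REQUIRE-flag checks:

```c
#include <stddef.h>

/* Hypothetical flag values mirroring the names proposed above. */
enum {
    PY_BUF_REQUIRE_WRITABLE = 1 << 0,
    PY_BUF_REQUIRE_READONLY = 1 << 1
};

/* Simplified stand-in for the exporter's bufferinfo struct. */
struct bufferinfo { int writable; };

static int is_writable(const struct bufferinfo *bi) { return bi->writable; }
static int is_readonly(const struct bufferinfo *bi) { return !bi->writable; }

/* Each REQUIRE flag pairs with a predicate over the self-contained
 * bufferinfo; the first unmet requirement is reported as an error
 * string (NULL means the request can be satisfied). */
const char *check_request(int flags, const struct bufferinfo *bi)
{
    if ((flags & PY_BUF_REQUIRE_WRITABLE) && !is_writable(bi))
        return "buffer not writable";
    if ((flags & PY_BUF_REQUIRE_READONLY) && !is_readonly(bi))
        return "buffer not read-only";
    return NULL;
}
```

The point of the table-of-predicates style is exactly the "key point" above: every check is answered from bufferinfo alone, never from the flags the consumer happened to pass.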
And even if the consumer isn't asking for a contiguous buffer, it has to know the item size so it knows what data not to step on. (This is true even in your own proposal, BTW: if a consumer asks for a non-strided array in your proposal, PyObject_GetBuffer would have to know the item size to determine whether the array is contiguous.)

--

FAQ:

Q. Why ALLOW_NONCONTIGUOUS and ALLOW_MULTIDIMENSIONAL instead of ALLOW_STRIDED and ALLOW_SHAPED?

A. It's more useful to the consumer that way. With ALLOW_STRIDED and ALLOW_SHAPED, there's no way for a consumer to request a general one-dimensional array (it can only request a non-strided one-dimensional array).
[Python-Dev] PEP 3118: Extended buffer protocol (new version)
Travis Oliphant wrote:
> Carl Banks wrote:
>> Ok, I've thought quite a bit about this, and I have an idea that I
>> think will be ok with you, and I'll be able to drop my main
>> objection. It's not a big change, either. The key is to explicitly
>> say whether the flag allows or requires. But I made a few other
>> changes as well.
> I'm good with using an identifier to differentiate between an "allowed"
> flag and a "require" flag. I'm not a big fan of
> VERY_LONG_IDENTIFIER_NAMES though. Just enough to understand what it
> means but not so much that it takes forever to type and uses up
> horizontal real-estate.

That's fine with me. I'm not very particular about spellings, as long as they're not misleading.

>> Now, here is a key point: for these functions to work (indeed, for
>> PyObject_GetBuffer to work at all), you need enough information in
>> bufinfo to figure it out. The bufferinfo struct should be
>> self-contained; you should not need to know what flags were passed to
>> PyObject_GetBuffer in order to know exactly what data you're looking at.

> Naturally.

>> Therefore, format must always be supplied by getbuffer. You cannot
>> tell if an array is contiguous without the format string. (But see
>> below.)

> No, I don't think this is quite true. You don't need to know what
> "kind" of data you are looking at if you don't get strides. If you use
> the SIMPLE interface, then both consumer and exporter know the object
> is looking at "bytes", which always has an itemsize of 1.

But doesn't this violate the above maxim? Suppose these are the contents of bufinfo:

ndim = 1
len = 20
shape = (10,)
strides = (2,)
format = NULL

How does it know whether it's looking at a contiguous array of 10 two-byte objects, or a discontiguous array of 10 one-byte objects, without having at least an item size?

Since item size is now in the mix, it's moot, of course. The idea that Py_BUF_SIMPLE implies bytes is news to me.
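That bufinfo is a worked instance of the ambiguity: len = 20 fits both 10 two-byte items laid end to end (stride 2 equals itemsize 2, contiguous) and 10 one-byte items with a one-byte gap after each (stride 2 exceeds itemsize 1, discontiguous). In one dimension the whole question reduces to comparing stride with item size, which only an explicit item size can settle. A tiny sketch, with a hypothetical helper name:

```c
#include <stddef.h>

/* A 1-D buffer is contiguous exactly when the distance between
 * consecutive items (the stride) equals the item size; with no
 * item size available, both readings of the example fit. */
int contiguous_1d(size_t itemsize, size_t stride)
{
    return stride == itemsize;
}
```

With the example's stride of 2, itemsize 2 yields "contiguous" and itemsize 1 yields "discontiguous", so format (or an explicit itemsize field) really is load-bearing here.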
What if you want a contiguous, one-dimensional array of an arbitrary type? I was thinking this would be acceptable with Py_BUF_SIMPLE. It seems you want to require Py_BUF_FORMAT for that, which would suggest to me that Py_BUF_ALLOW_ND and Py_BUF_ALLOW_NONCONTIGUOUS, etc., would imply Py_BUF_FORMAT? IOW, pretty much anything that's not SIMPLE implies FORMAT? If that's the case, then most of the issues I brought up about item size don't apply.

Also, if that's the case, you're right that Py_BUF_FORMAT makes more sense than Py_BUF_DONT_NEED_FORMAT. But now it seems even more unnecessary than it did before. Wouldn't any consumer that just wants to look at a chunk of bytes always use Py_BUF_FORMAT, especially if there's a danger of a presumptuous exporter raising an exception?

> I'll update the PEP with my adaptation of your suggestions in a little
> while.

Ok. Thanks for taking the lead, and for putting up with my verbose nitpicking. :)

Carl Banks

_______________________________________________
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com