Attached is my PEP for extending the buffer protocol to allow array data to be shared.


PEP: <unassigned>
Title: Extending the buffer protocol to include the array interface
Version: $Revision: $
Last-Modified: $Date:  $
Author: Travis Oliphant <[EMAIL PROTECTED]>
Status: Draft
Type: Standards Track
Created: 28-Aug-2006
Python-Version: 2.6

Abstract

    This PEP proposes extending the tp_as_buffer structure to include 
    function pointers that incorporate information about the intended
    shape and data-format of the provided buffer.  In essence this will
    place something akin to the array interface directly into Python. 

Rationale

    Several extensions to Python utilize the buffer protocol to share
    the location of a data-buffer that is really an N-dimensional
    array.  However, there is no standard way to exchange the
    additional N-dimensional array information so that the data-buffer
    is interpreted correctly.  The NumPy project introduced an array
    interface (http://numpy.scipy.org/array_interface.shtml) through a
    set of attributes on the object itself.  While this approach
    works, it requires attribute lookups which can be expensive when
    sharing many small arrays.  

    A key reason users often ask for something like NumPy to be placed
    in the standard library is so that it can serve as a standard for
    other packages that deal with arrays.  This PEP
    provides a mechanism for extending the buffer protocol (which
    already allows data sharing) to add the additional information
    needed to understand the data.  This should benefit all
    third-party modules that want to share memory through the buffer
    protocol, such as GUI toolkits, PIL, PyGame, CVXOPT, PyVoxel,
    PyMedia, audio libraries, and video libraries.


Proposal
 
    Add a bf_getarrayinfo function pointer to the buffer protocol to
    allow objects to share additional information about the returned
    memory pointer.  Add the TP_HAS_EXT_BUFFER flag to types that
    define the extended buffer protocol. 

Specification
    
    static int
    bf_getarrayinfo (PyObject *obj, Py_intptr_t **shape,
                     Py_intptr_t **strides, PyObject **dataformat)
       
    Inputs:  
             obj -- The Python object being queried.
 
    Outputs: 
 
             [function result] -- the number of dimensions (n)

             *shape -- A C-array of 'n' integers indicating the
                      shape of the array. Can be NULL if n==0.
        
             *strides -- A C-array of 'n' integers indicating
                        the number of bytes to jump to get to the next
                        element in each dimension. Can be NULL if the 
                        array is C-contiguous (or n==0).

             *dataformat -- A Python object describing the data-format
                            each element of the array should be
                            interpreted as.
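
    As an illustration of how a consumer might interpret the shape and
    strides outputs, the following is a minimal sketch, not part of the
    proposal.  The helper names c_contiguous_strides and element_offset
    are hypothetical, intptr_t stands in for Py_intptr_t so that the
    code compiles without Python.h, and the item size is assumed to be
    derived from the data-format object by some means this PEP does not
    yet specify.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: fill 'strides' with the byte strides that a
   C-contiguous array of the given shape and item size would have.
   The last dimension varies fastest. */
static void
c_contiguous_strides(int n, const intptr_t *shape, intptr_t itemsize,
                     intptr_t *strides)
{
    intptr_t step = itemsize;
    int i;

    for (i = n - 1; i >= 0; i--) {
        strides[i] = step;
        step *= shape[i];
    }
}

/* Hypothetical helper: byte offset of the element at 'index'.
   A NULL 'strides' is treated as C-contiguous, which the
   specification permits.  'shape' and 'index' may be NULL if n == 0. */
static intptr_t
element_offset(int n, const intptr_t *shape, const intptr_t *strides,
               intptr_t itemsize, const intptr_t *index)
{
    intptr_t offset = 0;
    intptr_t step = itemsize;
    int i;

    assert(n >= 0);
    for (i = n - 1; i >= 0; i--) {
        if (strides != NULL) {
            offset += index[i] * strides[i];
        } else {
            /* Derive the C-contiguous stride on the fly. */
            offset += index[i] * step;
            step *= shape[i];
        }
    }
    return offset;
}
```

    A consumer would add the returned offset to the memory pointer
    obtained through the existing buffer protocol to locate an element.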
    
       
Discussion Questions

    1) How is data-format information supposed to be shared?  A companion
    proposal suggests returning a data-format object which carries the
    information about the buffer area. 

    2) Should the single function pointer call be extended into
    multiple calls or should its arguments be compressed into a structure
    that is filled?

    3) Should one or more C-API functions be created to wrap calls to this
    function pointer, much as is done now with the buffer protocol?  What
    should the interface of this function (or these functions) be?

    4) Should a mask (for missing values) be shared as well? 
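
    To make the structure alternative raised in question 2 concrete,
    here is one possible shape it could take.  This is only a sketch
    for discussion: the names arrayinfo_sketch and
    fill_arrayinfo_sketch are hypothetical, intptr_t stands in for
    Py_intptr_t, and a void pointer stands in for the PyObject
    data-format so that the code compiles without Python.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical structure bundling the outputs of bf_getarrayinfo.
   None of these names come from the PEP. */
typedef struct {
    int ndim;                 /* number of dimensions (n) */
    const intptr_t *shape;    /* n entries, may be NULL if n == 0 */
    const intptr_t *strides;  /* n entries, NULL if C-contiguous */
    void *dataformat;         /* stand-in for PyObject * */
} arrayinfo_sketch;

/* Hypothetical provider for a fixed 2x3 C-contiguous array.  It fills
   the caller's structure and, like the proposed function pointer,
   returns the number of dimensions. */
static int
fill_arrayinfo_sketch(arrayinfo_sketch *info)
{
    static const intptr_t shape[2] = {2, 3};

    assert(info != NULL);
    info->ndim = 2;
    info->shape = shape;
    info->strides = NULL;     /* C-contiguous, so strides are omitted */
    info->dataformat = NULL;  /* data-format left unspecified here */
    return info->ndim;
}
```

    With a structure, new fields (such as a mask, as in question 4)
    could later be added without changing the function signature, at
    the cost of having to version or size-check the structure.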

Reference Implementation

    Supplied when the PEP is accepted. 

Copyright

    This document is placed in the public domain.