Re: [Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

David Cournapeau Thu, 11 Jan 2007 07:52:22 -0800

Francesc Altet wrote:

A Dimecres 10 Gener 2007 22:49, Stefan van der Walt escrigué:
On Wed, Jan 10, 2007 at 08:28:14PM +0100, Francesc Altet wrote:
El dt 09 de 01 del 2007 a les 23:19 +0900, en/na David Cournapeau va
escriure:
time (putmask)--> 1.38
time (where)--> 2.713
time (numexpr where)--> 1.291
time (fancy+assign)--> 0.967
time (numexpr clip)--> 0.596

It is interesting to see there how fancy-indexing + assignation is quite
more efficient than putmask.
Not on my machine:

time (putmask)--> 0.181
time (where)--> 0.783
time (numexpr where)--> 0.26
time (fancy+assign)--> 0.202
Yeah, a lot of difference indeed. Just for reference, my results above weredone using a Duron (an Athlon but with only 128 KB of secondary cache) at 0.9GHz. Now, using my laptop (Intel 4 @ 2 GHz, 512 KB of secondary cache), Iget:
time (putmask)--> 0.244
time (where)--> 2.111
time (numexpr where)--> 0.427
time (fancy+assign)--> 0.316
time (numexpr clip)--> 0.184
so, on my laptop fancy+assign is way slower than putmask. It should be notedalso that the implementation of clip in numexpr (i.e. in pure C) is not thatmuch faster than putmask (just a 30%); so perhaps it is not so necessary tocome up with a pure C implementation for clip (or at least, on Intel P4machines!).
In any case, it is really shocking seeing how differently can perform theseveral CPU architectures on this apparently simple problem.

I am not sure it is such a simple problem: it involves massive branching.

I have never taken a look a numexpr, but the idea seems reallyinteresting, I will take a look at it when I will have some time.

Anyway, I've just finished and tested a pure C implementation of theclip function. As it is, it should be able to replace PyArray_Clip callsby PyArray_FastClip in numpy/core/multiarray.c. The idea is that for'easy' cases, it uses a trivial but fast implementation; for all othercases, it uses the old implementation for now. By easy cases, I meanscalar min and max, for non-complex number with native endianness (fromnpy_bool to npy_longdouble), which should cover most usages.


There are still some things I am unsure:

- the original clip is supposed to work with complex numbers, but Iam not sure about the semantics in this case.- If you have a float32 input, but float64 min/max values, theoriginal clip does not upcast the input. If you have integer input butfloating point min/max, the original clip fails. Is this the wantedbehaviour ? My implementation upcasts whenever possible; but then, I amnot sure how to handle the cases where no copy is asked (which I am nothandling myself for now).

As for now, when PyArray_FastClip uses a fast implementation, it isroughly 5x faster for float32 and float64 input. I expect a double speedonce the no copy option is implemented (again, for easy cases).

I attached blop.c which implements the fast clip in a module, theclip_imp.c which implements the clipping for all native types (it isgenerated by autogen because I wanted to avoid depending onnumpy.distutils for development), a Makefile and a test file which alsoprofile the clip function with float32 inputs.

Does it look Ok to other so that it can be commited to numpy (once thetwo above problems are solved, of course, to keep the same behaviourthan PyArray_Clip) ?


cheers,

David

#include "Python.h"
#include "structmember.h"

#include "numpy/arrayobject.h"

#include "clip_imp.c"

#define _ARET(x) PyArray_Return((PyArrayObject *)(x))
#define GET_REF_COUNT(x) ( ((PyObject*)(x))->ob_refcnt )

/* function callable from python */
static PyObject* PyArray_MyClip_GlobalMeth(PyObject *dummy, PyObject *args);
static PyObject *array_my_clip(PyArrayObject *self, PyObject *args, PyObject *kwds);

/* function NOT supposed to be callable from python */
static PyObject* PyArray_FastClip(PyArrayObject *input, PyObject *min, 
        PyObject *max, PyArrayObject *out);
static int numeric_scalar_clip(PyArrayObject *input, PyArrayObject *min, 
        PyArrayObject *max);

/* Not python specific functions */

static PyMethodDef mymethods[] = {
    {"myclip", PyArray_MyClip_GlobalMeth, METH_VARARGS, NULL}, 
    {"mykwclip", (PyCFunction)array_my_clip, METH_VARARGS | METH_KEYWORDS, NULL}, 
    {NULL, NULL, 0, NULL} /* Sentinel */
};

PyMODINIT_FUNC init_blop(void);

PyMODINIT_FUNC
init_blop(void)
{
    (void)Py_InitModule("_blop", mymethods);
    import_array();
}

static PyObject *
array_my_clip(PyArrayObject *self, PyObject *args, PyObject *kwds)
{
    PyObject *min, *max;
    PyArrayObject *out=NULL;
    static char *kwlist[] = {"in", "min", "max", "out", NULL};

    if (!PyArg_ParseTupleAndKeywords(args, kwds, "O&OOO&", kwlist,
                        PyArray_OutputConverter, &self, 
                        &min, &max, PyArray_OutputConverter, &out)) {
        return NULL;
    }

    return _ARET(PyArray_FastClip(self, min, max, out));
}


/*
 * expect 3 inputs in args: the input array, minvalue, maxvalue. input array
 * expected to be con vertible to array, minval and maxval convertibale to 
 * scalars.
 *
 * Only handles native double for now. For all other cases, pass down to
 * numpy clip func.
 */
static PyObject* PyArray_MyClip_GlobalMeth(PyObject *dummy, PyObject *args)
{
    PyObject        *in_o, *min_o, *max_o;
    (void)dummy;

    if (!PyArg_ParseTuple(args, "O!OO", &PyArray_Type, &in_o, &min_o, &max_o)) {
        return NULL;
    }

    return PyArray_FastClip((PyArrayObject*)in_o, min_o, max_o, (PyArrayObject*)NULL);
}

/*
 * Behaviour of clip:
 *  - Endianness wise:
 *      - returns an array which has the same endianness than input
 *      - if min and/or max has non native endianness, still returns input's endianness
 *  - If out is NULL, does a copy  of input. Otherwise, use the given out.
 *  - If min and/or max is not scalar, it has to be the same shape than input.
 *  - Contiguous-wise: ?
 *
 *  Problems :
 *  - Complex case ?
 *  - if input type is "smaller" than min or max ? current clip does not
 *  upcast, we do.
 *
 */
static PyObject* PyArray_FastClip(PyArrayObject *input, PyObject *min, 
        PyObject *max, PyArrayObject *out)
{
    PyArray_Descr   *ndescr, *tmpdescr;
    PyArrayObject   *in_a, *min_a, *max_a;
    
    int flags, typenum, st;
    int is_num, is_scalar, is_in_native, is_win_native;
 
    (void)out;

    /* checking input is a numpy array */
    if (!PyArray_Check(input)) {
        PyErr_SetString(PyExc_TypeError,
                        "clip: first argument must "\
                        "be an array");
        goto fail;
    }
    
    if (out != NULL) {
        /*
         * If out is not NULL, uses the old implementation for now
         */

        /*
         * TODO: what to do if a given out does not have required typenum ? Check
         * the behaviour of old clip
         *
         * Also, beware that out == input is possible. Check this possibility
         */
        PyObject*   tmp;
        tmp = PyArray_Clip(input, min, max, out);
        if (tmp == NULL) {
            goto fail;
        }

        return (PyObject*)out;
    }

    /* Get common type */
    typenum = 0;
    typenum = PyArray_ObjectType((PyObject*)input, typenum);
    typenum = PyArray_ObjectType((PyObject*)min, typenum);
    typenum = PyArray_ObjectType((PyObject*)max, typenum);

    is_num          = PyTypeNum_ISNUMBER(typenum);
    is_in_native    = PyArray_ISNOTSWAPPED(input);

    /* 
     * do a contiguous and aligned copy of input array, with casting to common
     * type. Take care of keeping endian order of input
     */

    /* We need to create a new type descriptor with same endianness than input,
     * but with common type typenum */
    ndescr  = PyArray_DescrFromType(typenum);
    if (ndescr == NULL) {
        goto fail;
    }

    /* If necessary, converts the new type descr to the needed byte order */
    if (!is_in_native) {
        tmpdescr = PyArray_DescrNewByteorder(ndescr, 
                ((PyArrayObject*)input)->descr->byteorder);
        if (tmpdescr == NULL) {
            goto clean_ndescr;
        }
        Py_XDECREF(ndescr);
        ndescr = tmpdescr;
    }

    /*
     *  If input is copied, we can make the copy native endian and convert
     *  to original endianness in place later (this should be faster); the copy 
     *  is called working input. If input is not copied, then working input 
     *  is equal to input. working input endianess matters 
     *  for the actual computation, 
     *  and input endianness matters for output endianness.
     */

    /* Creating the new copy of input array now that we have the right type descr */
    flags   = NPY_ENSURECOPY | NPY_IN_ARRAY;
    //flags   = NPY_IN_ARRAY;
    //Py_INCREF(ndescr);
    in_a    = (PyArrayObject*)PyArray_FromArray((PyArrayObject*)input, 
                    ndescr, flags);
    if (in_a  == NULL) {
        //Py_XDECREF(ndescr);
        goto clean_ndescr;
    } 
    is_win_native   = PyArray_ISNOTSWAPPED(in_a);

    /* 
     * Get min and max as numpy arrays. If both are scalars, then we use scalar
     * code; if they are not scalar, check that they have the same shape than input.
     */
    min_a   = (PyArrayObject *)
        PyArray_FromObject(min, typenum, 0, 0);
    if (min_a == NULL) {
        PyErr_SetString(PyExc_TypeError,
                        "Error while converting min to value,"\
                        " sorry");
        goto clean_in_a;
    }
    is_scalar   = PyArray_CheckScalar(min_a);

    max_a   = (PyArrayObject *)
        PyArray_ContiguousFromAny(max, typenum, 0, 0);
    if (max_a == NULL) {
        PyErr_SetString(PyExc_TypeError,
                        "Error while converting max to value,"\
                        " sorry");
        goto clean_min_a;
    }
    is_scalar   = PyArray_CheckScalar(max_a) && is_scalar;
    
    /*
     * Now, we have all necessary information to call the correct implementation 
     * depending on arguments type, shape, etc...
     *
     * Case to differentiate:
     *  - if typenum is not numeric, than call the old implemention
     *  - if typenum is numeric: 
     *      - scalar min/max:
     *          - working input is native -> type specific contiguous native
     *          - working input is not native -> contiguous non native
     *      - non scalar min/max: converts scalars if any to native endianness
     *          - working input AND min AND max are contiguous
     *              - working input AND min AND max native -> type speficic 
     *              contiguous native
     *              - otherwise -> contiguous non native
     *          - min/max non contiguous
     *              - working input AND min AND max native -> non contiguous native
     *              - otherwise -> non contiguous non native
     */

    if (is_num) {
        if (is_scalar && is_win_native) {
            /*
             * Numeric, scalar case: this is fast
             */
            st  = numeric_scalar_clip(in_a, min_a, max_a);
            if (st != 0 ) {
                goto clean_max_a;
            }
        } else {
            /*
             * Numeric, non scalar case 
             * Numeric, non native scalar case
             */

            /* TODO: 
             *  - check that I got the refcount right here 
             *  - custom implementation 
             */
            out     = (PyArrayObject*)PyArray_Clip(in_a, min, max, NULL);
            if (out == NULL) {
                goto clean_max_a;
            }
            //fprintf(stderr, "%s:%s, line %d: in (%d), min (%d), max (%d), out (%d)\n",
            //         __FILE__, __func__, __LINE__, GET_REF_COUNT(in_a), 
            //         GET_REF_COUNT(min_a), GET_REF_COUNT(max_a), GET_REF_COUNT(out));
            in_a    = out;
        }
    } else {
        /*
         * Non Numeric case (can current clip really handle those ?)
         */

        /* TODO: 
         *  - check that I got the refcount right here 
         *  - custom implementation 
         */
        out     = (PyArrayObject*)PyArray_Clip(in_a, min, max, NULL);
        if (out == NULL) {
            goto clean_max_a;
        }
        in_a    = out;
    }
    
    /*
     * We're done, clean everything
     */
    Py_XDECREF(max_a);
    Py_XDECREF(min_a);
    Py_XDECREF(ndescr);

    return (PyObject*)in_a;

clean_max_a:
    Py_XDECREF(max_a);
clean_min_a:
    Py_XDECREF(min_a);
clean_in_a:
    Py_XDECREF(in_a);
clean_ndescr:
    Py_XDECREF(ndescr);
fail:
    return NULL;
}

/*
 * Expect input, min and max to be same typenum, and input to be native endianness
 */
static int numeric_scalar_clip(PyArrayObject *input, PyArrayObject *min, 
        PyArrayObject *max)
{
    int st;
    
    /*
     * Call the implementation
     */
    st  = native_scalar_clip(input, min, max);
    if (st) {
        return -1;
    }
    return 0;

}

/*
 * Last Change: Fri Jan 12 12:00 AM 2007 J
 * 
 * vim:syntax=c
 */


/* 
 * (npy_bool version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_bool_clip(npy_bool* in, 
        npy_intp size, npy_bool min, npy_bool max) 
{
    //npy_bool* kcoeff, npy_bool* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_byte version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_byte_clip(npy_byte* in, 
        npy_intp size, npy_byte min, npy_byte max) 
{
    //npy_byte* kcoeff, npy_byte* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_ubyte version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_ubyte_clip(npy_ubyte* in, 
        npy_intp size, npy_ubyte min, npy_ubyte max) 
{
    //npy_ubyte* kcoeff, npy_ubyte* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_short version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_sht_clip(npy_short* in, 
        npy_intp size, npy_short min, npy_short max) 
{
    //npy_short* kcoeff, npy_short* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_ushort version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_usht_clip(npy_ushort* in, 
        npy_intp size, npy_ushort min, npy_ushort max) 
{
    //npy_ushort* kcoeff, npy_ushort* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_int version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_int_clip(npy_int* in, 
        npy_intp size, npy_int min, npy_int max) 
{
    //npy_int* kcoeff, npy_int* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_uint version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_uint_clip(npy_uint* in, 
        npy_intp size, npy_uint min, npy_uint max) 
{
    //npy_uint* kcoeff, npy_uint* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_long version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_long_clip(npy_long* in, 
        npy_intp size, npy_long min, npy_long max) 
{
    //npy_long* kcoeff, npy_long* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_ulong version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_ulong_clip(npy_ulong* in, 
        npy_intp size, npy_ulong min, npy_ulong max) 
{
    //npy_ulong* kcoeff, npy_ulong* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_longlong version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_llong_clip(npy_longlong* in, 
        npy_intp size, npy_longlong min, npy_longlong max) 
{
    //npy_longlong* kcoeff, npy_longlong* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_ulonglong version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_ullong_clip(npy_ulonglong* in, 
        npy_intp size, npy_ulonglong min, npy_ulonglong max) 
{
    //npy_ulonglong* kcoeff, npy_ulonglong* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_float version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_fbl_clip(npy_float* in, 
        npy_intp size, npy_float min, npy_float max) 
{
    //npy_float* kcoeff, npy_float* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_double version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_dbl_clip(npy_double* in, 
        npy_intp size, npy_double min, npy_double max) 
{
    //npy_double* kcoeff, npy_double* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}


/* 
 * (npy_longdouble version) Clip in-place an array with scalars min and max
 * (native endianness), contiguous array 
 *
 */
static int native_scalar_ldbl_clip(npy_longdouble* in, 
        npy_intp size, npy_longdouble min, npy_longdouble max) 
{
    //npy_longdouble* kcoeff, npy_longdouble* err)
    npy_intp i;

    for (i = 0; i < size; ++i) {
        if (in[i] <= min) {
            in[i]   = min;
        }
        if (in[i] >= max) {
            in[i]   = max;
        }
    }
    return 0;
}



/* This function does the dispatching based on input type */
static int native_scalar_clip(PyArrayObject* input, 
    const PyArrayObject* min, const PyArrayObject *max)
{
    npy_intp    ni;
    int         typenum, st;

    ni      = PyArray_SIZE(input);

    /* 
     * Check we have what we expect for type
     */
    typenum = input->descr->type_num;
    if(typenum != min->descr->type_num || typenum != max->descr->type_num) {
        PyErr_SetString(PyExc_TypeError,
                        "BUG: expected all objs to be same type");
        goto fail;
    }

    switch (typenum) {
        /* 
         * Take care to convert data pointer to correct pointer type before
         * indexing ! 
         */
        
        case NPY_BOOL:
            st  = native_scalar_bool_clip(
                    (npy_bool*)(input->data), ni,
                    ((npy_bool*)min->data)[0], 
                    ((npy_bool*)max->data)[0]); 
            break;
        
        case NPY_BYTE:
            st  = native_scalar_byte_clip(
                    (npy_byte*)(input->data), ni,
                    ((npy_byte*)min->data)[0], 
                    ((npy_byte*)max->data)[0]); 
            break;
        
        case NPY_UBYTE:
            st  = native_scalar_ubyte_clip(
                    (npy_ubyte*)(input->data), ni,
                    ((npy_ubyte*)min->data)[0], 
                    ((npy_ubyte*)max->data)[0]); 
            break;
        
        case NPY_SHORT:
            st  = native_scalar_sht_clip(
                    (npy_short*)(input->data), ni,
                    ((npy_short*)min->data)[0], 
                    ((npy_short*)max->data)[0]); 
            break;
        
        case NPY_USHORT:
            st  = native_scalar_usht_clip(
                    (npy_ushort*)(input->data), ni,
                    ((npy_ushort*)min->data)[0], 
                    ((npy_ushort*)max->data)[0]); 
            break;
        
        case NPY_INT:
            st  = native_scalar_int_clip(
                    (npy_int*)(input->data), ni,
                    ((npy_int*)min->data)[0], 
                    ((npy_int*)max->data)[0]); 
            break;
        
        case NPY_UINT:
            st  = native_scalar_uint_clip(
                    (npy_uint*)(input->data), ni,
                    ((npy_uint*)min->data)[0], 
                    ((npy_uint*)max->data)[0]); 
            break;
        
        case NPY_LONG:
            st  = native_scalar_long_clip(
                    (npy_long*)(input->data), ni,
                    ((npy_long*)min->data)[0], 
                    ((npy_long*)max->data)[0]); 
            break;
        
        case NPY_ULONG:
            st  = native_scalar_ulong_clip(
                    (npy_ulong*)(input->data), ni,
                    ((npy_ulong*)min->data)[0], 
                    ((npy_ulong*)max->data)[0]); 
            break;
        
        case NPY_LONGLONG:
            st  = native_scalar_llong_clip(
                    (npy_longlong*)(input->data), ni,
                    ((npy_longlong*)min->data)[0], 
                    ((npy_longlong*)max->data)[0]); 
            break;
        
        case NPY_ULONGLONG:
            st  = native_scalar_ullong_clip(
                    (npy_ulonglong*)(input->data), ni,
                    ((npy_ulonglong*)min->data)[0], 
                    ((npy_ulonglong*)max->data)[0]); 
            break;
        
        case NPY_FLOAT:
            st  = native_scalar_fbl_clip(
                    (npy_float*)(input->data), ni,
                    ((npy_float*)min->data)[0], 
                    ((npy_float*)max->data)[0]); 
            break;
        
        case NPY_DOUBLE:
            st  = native_scalar_dbl_clip(
                    (npy_double*)(input->data), ni,
                    ((npy_double*)min->data)[0], 
                    ((npy_double*)max->data)[0]); 
            break;
        
        case NPY_LONGDOUBLE:
            st  = native_scalar_ldbl_clip(
                    (npy_longdouble*)(input->data), ni,
                    ((npy_longdouble*)min->data)[0], 
                    ((npy_longdouble*)max->data)[0]); 
            break;
        
        default:
            PyErr_SetString(PyExc_TypeError,
                            "type is not supported yet,"\
                            " sorry");
            goto fail;
            break;
    }
    return st;

fail:
    return -1;
}

CC      = colorgcc
LD      = gcc

PYTHONINC       = -I/usr/include/python2.4
NUMPYINC        = 
-I/home/david/local/lib/python2.4/site-packages/numpy/core/include
OPTIMS          = -O2 -funroll-all-loops 
#OPTIMS         = -g -DDEBUG
WARN            = -W -Wall -Wstrict-prototypes -Winline -Wstrict-prototypes \
                        -Wmissing-prototypes -Wcast-align \
                        -Wcast-qual -Wnested-externs -Wshadow 
-Wbad-function-cast #-Wwrite-strings

CFLAGS  = $(PYTHONINC) $(NUMPYINC) $(OPTIMS) $(WARN)

AUTOGEN         = autogen

_blop.so: blop.o 
        $(LD) -o $@ -shared -Wl,-soname,$@ blop.o

blop.o: blop.c clip_imp.c
        $(CC) -c $(CFLAGS) -fPIC $<

#=========================
# Autogenerated c sources
#=========================
clip_imp.c: clip_imp.def clip_imp.tpl
        $(AUTOGEN) clip_imp.def

clean:
        rm -f *.o
        rm -f *.so
        rm -f clip_imp.c

# Last Change: Fri Jan 12 12:00 AM 2007 J

import numpy as N
from _blop import myclip as _myclip
from _blop import mykwclip

from numpy.testing import assert_array_equal

def myclip(ar, min, max):
    return _myclip(ar, min, max)

def mypyclip(ar, m, M):
    a   = ar.copy()
    N.putmask(a, a<=m, m)
    N.putmask(a, a>=M, M)
    return a

def mypyclip2(ar, m, M):
    a   = ar.copy()
    a[a<m]  = m
    a[a>M]  = M
    return a

def generate_data(n, m):
    return N.random.randn(n, m)

def generate_flt_data(n, m):
    return (N.random.randn(n, m)).astype(N.float32)

def generate_non_native_data(n, m):
    data    = N.random.randn(n, m)
    nd      = data.dtype
    nd      = nd.newbyteorder('>')
    data    = data.astype(nd)
    assert not data.dtype.isnative
    return data

def generate_int_data(n, m):
    return (10 * N.random.randn(n, m)).astype(int)

def test_clip_speed(nr, nc, niter):
    a   = generate_flt_data(nr, nc)

    def bmyclip(a, b, c):
        return myclip(a, b, c)

    def bclip(a, b, c):
        return a.clip(b, c)
        #a.clip(b, c, out = a)
        #return a

    m   = N.float32(-0.3)
    M   = N.float32(-0.1)
    for i in range(niter):
        ac  = myclip(a, m, M)
    for i in range(niter):
        act = bclip(a, m, M)
    
    assert_array_strict_equal(ac, act)

def test_clip_wo_out(nr, nc):
    print "==================="
    print "     NO OUT arg  "
    print "==================="

    
    print "=== testing native double input with scalar min/max ==="
    a   = generate_data(nr, nc)
    m   = -0.5
    M   = 0.6
    ac  = myclip(a, m, M)
    act = N.clip(a, m, M)
    assert_array_strict_equal(ac, act)

    print "=== testing native int input with scalar min/max ==="
    a   = generate_int_data(nr, nc)
    a   = a.astype(int) 
    m   = -2
    M   = 4
    ac  = myclip(a, m, M)
    act = N.clip(a, m, M)
    assert_array_strict_equal(ac, act)

    print "=== testing native double input with array min/max ==="
    a   = generate_data(nr, nc)
    m   = N.min(a) - a.copy()
    M   = m + 0.5
    ac  = myclip(a, m, M)
    act = N.clip(a, m, M)
    assert_array_strict_equal(ac, act)

    print "=== testing non native double input with scalar min/max ==="
    a   = generate_non_native_data(nr, nc)
    m   = -0.5
    M   = 0.6
    ac  = myclip(a, m, M)
    act = N.clip(a, m, M)
    assert_array_strict_equal(ac, act)

    print "=== testing native double input with non native double scalar min/max ==="
    a   = generate_data(nr, nc)
    m   = -0.5
    M   = N.array(0.6, dtype = a.dtype.newbyteorder('>'))
    assert not M.dtype.isnative
    ac  = myclip(a, m, M)
    act = N.clip(a, m, M)
    assert_array_strict_equal(ac, act)

def test_clip_with_out(nr, nc):
    print "==================="
    print "   WITH OUT arg  "
    print "==================="

    
    print "=== testing native double input with scalar min/max ==="
    a   = generate_data(nr, nc)
    m   = -0.5
    M   = 0.6
    ac  = N.zeros(a.shape)
    act = N.zeros(a.shape)
    mykwclip(a, m, M, ac)
    a.clip(m, M, act)
    assert_array_strict_equal(ac, act)

    #print "=== testing native int input with scalar min/max ==="
    #a   = generate_int_data(nr, nc)
    #a   = a.astype(int) 
    #m   = -2
    #M   = 4
    #ac  = myclip(a, m, M)
    #act = N.clip(a, m, M)
    #assert_array_strict_equal(ac, act)

    #print "=== testing native double input with array min/max ==="
    #a   = generate_data(nr, nc)
    #m   = N.min(a) - a.copy()
    #M   = m + 0.5
    #ac  = myclip(a, m, M)
    #act = N.clip(a, m, M)
    #assert_array_strict_equal(ac, act)

    #print "=== testing non native double input with scalar min/max ==="
    #a   = generate_non_native_data(nr, nc)
    #m   = -0.5
    #M   = 0.6
    #ac  = myclip(a, m, M)
    #act = N.clip(a, m, M)
    #assert_array_strict_equal(ac, act)

    #print "=== testing native double input with non native double scalar min/max ==="
    #a   = generate_data(nr, nc)
    #m   = -0.5
    #M   = N.array(0.6, dtype = a.dtype.newbyteorder('>'))
    #assert not M.dtype.isnative
    #ac  = myclip(a, m, M)
    #act = N.clip(a, m, M)
    #assert_array_strict_equal(ac, act)

def assert_array_strict_equal(x, y):
    assert_array_equal(x, y)
    # Check flags
    assert x.flags.c_contiguous == y.flags.c_contiguous
    assert x.flags.f_contiguous == y.flags.f_contiguous
    assert x.flags.owndata == y.flags.owndata
    assert x.flags.writeable == y.flags.writeable
    assert x.flags.aligned == y.flags.aligned
    assert x.flags.updateifcopy == y.flags.updateifcopy
    # check endianness
    assert x.dtype.isnative == y.dtype.isnative

def test():
    n   = 2
    m   = 3

    test_clip_wo_out(n, m)
    test_clip_with_out(n, m)

def bench_clip():
    n   = 8000
    m   = 256

    print "==================="
    print "   PROFILING       "
    print "==================="
    test_clip_speed(n, m, 10)

if __name__ == '__main__':
    test()
    import hotshot, hotshot.stats
    profile_file    = 'clip.prof'
    prof    = hotshot.Profile(profile_file, lineevents=1)
    prof.runcall(bench_clip)
    p = hotshot.stats.load(profile_file)
    print p.sort_stats('cumulative').print_stats(20)
    prof.close()

#def test_parsing():
#    print "=== testing myclip for native double input and native double " + \
#        "scalar min/max ==="
#    a   = generate_data(2, 3)
#    ac  = myclip(a, 0, 1)
#    act = N.clip(a, 0, 1)
#    print ac
#    print act
#
#    print "=== testing myclip for native double input and native double " + \
#        "array min/max ==="
#    a       = generate_data(2, 3)
#    clipm   = (N.min(a) + 0.1) * N.ones(a.shape)
#    clipM   = (N.max(a) - 0.1) * N.ones(a.shape)
#    ac      = myclip(a, clipm, clipM)
#    act     = N.clip(a, clipm, clipM)
#    print a
#    print ac
#    print act
#
#    print "=== testing myclip for native double input and native double " + \
#        "array min/max ==="
#
#def test_output_type():
#    print "========================"
#    print "     CONTIGUOUS      "
#    print "========================"
#
#    print "=== testing clip for native input and native scalar min/max ==="
#    a   = generate_data(2, 3)
#    ac  = N.clip(a, 0., 1.)
#    print ac.dtype.isnative
#    print ac.flags
#
#    print "=== testing clip for non native input and native scalar min/max ==="
#    nda = a.dtype
#    nda = nda.newbyteorder('>')
#    bea = a.astype(nda)
#    assert not bea.dtype.isnative
#    ac  = N.clip(bea, 0., 1.)
#    print ac.dtype.isnative
#    print ac.flags
#
#    print "=== testing clip for native input and non native scalar min/max ==="
#    nda = a.dtype
#    nda = nda.newbyteorder('>')
#    clipm   = N.array(0).astype(nda)
#    assert not clipm.dtype.isnative
#    clipM   = N.array(1).astype(nda)
#    assert not clipM.dtype.isnative
#    ac  = N.clip(a, clipm, clipM)
#    print ac.dtype.isnative
#    print ac.flags
#
#    print "=== testing clip for native input and non native array min/max ==="
#    a       = generate_data(2, 3)
#    clipm   = generate_data(2, 3)
#    clipM   = clipm + 2
#    ndclipm = clipm.dtype
#    ndclipm = ndclipm.newbyteorder('>')
#    clipm   = clipm.astype(ndclipm)
#    assert not clipm.dtype.isnative
#    ndclipM = clipM.dtype
#    ndclipM = ndclipM.newbyteorder('>')
#    clipM   = clipM.astype(ndclipM)
#    assert not clipM.dtype.isnative
#    ac  = N.clip(a, clipm, clipM)
#    print ac.dtype.isnative
#    print ac.flags
#
#    print "========================"
#    print "     NON CONTIGUOUS     "
#    print "========================"
#
#    print "=== testing clip for native input and native scalar min/max ==="
#    a   = generate_data(2, 3)
#    a   = N.require(a, requirements = 'F_CONTIGUOUS')
#    ac  = N.clip(a, 0., 1.)
#    print ac.dtype.isnative
#    print ac.flags
#
#    a   = generate_data(2, 3)
#    b   = generate_data(2, 3)
#    nda = a.dtype
#    nda = nda.newbyteorder('>')
#    a   = a.astype(nda)
#    assert not a.dtype.isnative
#    assert b.dtype.isnative
#
#    c   = a + b
#    print c.dtype.isnative
#
#    b   = b.astype(nda)
#    assert not b.dtype.isnative
#    assert not a.dtype.isnative
#    c   = a + b
#    print c.dtype.isnative
#
#    print "========================="
#    print "     with out arg        "
#    print "========================="
#    a   = generate_data(2, 3)
#
#def test_clip(a, min, max):
#    pass
#
#def test_clip_with_out():
#    print "==================="
#    print "   WITH OUT arg    "
#    print "==================="
#    nda = N.dtype(int)
#    nda = nda.newbyteorder('>')
#    bea = a.astype(nda)
#    b   = N.clip(a, min, max)
#    c   = myclip(a, min, max)
#    d   = myclip(a.astype(N.float32), min, max)
#    e   = myclip(a.astype(N.float32), N.float32(0), N.float32(1))
#    e   = myclip(a.astype(N.int32), N.int32(0), N.int32(1))
#    # Convert data to big endian
#    print "=== non native input array === "
#    f   = myclip(bea, N.int32(0), N.int32(1))
#    print f.dtype.isnative
#    f   = N.clip(bea, N.int32(0), N.int32(1))
#    print f.dtype.isnative
#    f   = mypyclip(bea, N.int32(0), N.int32(1))
#    print f.dtype.isnative
#    print "=== native integer input === "
#    f   = N.clip(a.astype(N.int32), N.int32(0), N.int32(1))
#    print f.dtype.isnative
#    print "=== native integer input, non native clipping values  === "
#    nda = N.dtype(int)
#    nda = nda.newbyteorder('>')
#    clipm   = N.array(0).astype(nda)
#    assert not clipm.dtype.isnative
#    f   = N.clip(a.astype(N.int32), clipm, N.int32(1).astype(nda))
#    print f.dtype.isnative
#    f   = mypyclip(a.astype(N.int32), clipm, N.int32(1).astype(nda))
#    print f.dtype.isnative
#    f   = myclip(a.astype(N.int32), clipm, N.int32(1).astype(nda))
#    print f.dtype.isnative
#    print "=== using non scalar values for clipping values === "
#    g   = myclip(a, 1, ["blop"])
#
#    N.testing.assert_array_equal(b, c)
#

_______________________________________________
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://projects.scipy.org/mailman/listinfo/numpy-discussion

Re: [Numpy-discussion] Need help for implementing a fast clip in numpy (was slow clip)

Reply via email to