I'm trying to subclass ndarray so that I can add some additional fields. When I do this, however, I get odd new behavior when my object is passed to a variety of numpy functions. For example, nanmin now returns an object of my new array class, whereas previously I'd get a float64. Why? Is this a bug in nanmin or in my class?
import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = obj.view(cls)
        obj.columns = tuple(columns)
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, 'columns', None)

NAN = float("nan")
r = np.array([1., 0., 1., 0., 1., 0., 1., 0., NAN, 1., 1.])
print "MIN", np.nanmin(r), type(np.nanmin(r))

gives:

MIN 0.0 <type 'numpy.float64'>

but:

>>> r = NDArrayWithColumns(r, ["a"])
>>> print "MIN", np.nanmin(r), type(np.nanmin(r))
MIN 0.0 <class '__main__.NDArrayWithColumns'>
>>> print r.shape  # ?!
(11,)

Note the change in type, and also that str(np.nanmin(r)) shows a single value, not the 11 suggested by the shape. This seems wrong. Is there a way to get my subclass to behave more like an ndarray? I realize from the docs that I can override __array_wrap__, but it's not clear to me how to use it to solve this issue, or whether it's the right tool.

In case you're interested, I'm subclassing because I'd like to track column names in matrices of a single type, which is a pretty common wish in scikit-learn pipelines. Structured arrays and record arrays allow for varying types. Pandas provides this functionality, but dealing with plain numpy arrays is easier (and more efficient) when writing Cython extensions. Also, I think structured arrays and record arrays are unlikely to play nice with Cython because they're more freely typed -- I want to deal exclusively with arrays of doubles.

Any thoughts on how to subclass ndarray and keep the original behavior in ufuncs?
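For what it's worth, the direction I've been poking at, based on the subclassing docs, is to unwrap 0-d results inside __array_wrap__ so that reductions come back as plain scalars. This is a minimal sketch, not something I've verified across numpy versions -- the __array_wrap__ override is my addition, the rest is the class from above:

import numpy as np

class NDArrayWithColumns(np.ndarray):
    def __new__(cls, obj, columns=None):
        obj = np.asarray(obj).view(cls)
        # guard against the default columns=None, which tuple() would reject
        obj.columns = tuple(columns) if columns is not None else None
        return obj

    def __array_finalize__(self, obj):
        if obj is None:
            return
        self.columns = getattr(obj, 'columns', None)

    def __array_wrap__(self, out_arr, context=None):
        # Reductions such as min/nanmin/sum hand the subclass a 0-d
        # result array. Indexing a 0-d array with an empty tuple
        # returns a plain numpy scalar, matching base-ndarray behavior.
        if out_arr.ndim == 0:
            return out_arr[()]
        # everything else keeps the subclass (and its columns)
        return np.ndarray.__array_wrap__(self, out_arr, context)

With that in place I'd expect something like:

>>> r = NDArrayWithColumns(np.array([1., 0., NAN]), ["a"])
>>> type(np.nanmin(r))
<type 'numpy.float64'>

though I don't know whether every code path in nanmin goes through __array_wrap__, which is partly what I'm asking.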