On Wed, Aug 24, 2011 at 6:09 PM, Wes McKinney <wesmck...@gmail.com> wrote:
> On Wed, Aug 24, 2011 at 8:19 PM, Mark Wiebe <mwwi...@gmail.com> wrote:
> > On Fri, Aug 19, 2011 at 11:37 AM, Bruce Southey <bsout...@gmail.com> wrote:
> >>
> >> Hi,
> >> <snip>
> >>
> >> 2) Can the 'skipna' flag be added to the methods?
> >> >>> a.sum(skipna=True)
> >> Traceback (most recent call last):
> >> File "<stdin>", line 1, in <module>
> >> TypeError: 'skipna' is an invalid keyword argument for this function
> >> >>> np.sum(a,skipna=True)
> >> nan
> >
> > I've added this now, as well. I think that finishes up the changes you
> > suggested in this email which felt right to me.
> > Cheers,
> > Mark
> >
> >>
> >> <snip>
> >
> > _______________________________________________
> > NumPy-Discussion mailing list
> > NumPy-Discussion@scipy.org
> > http://mail.scipy.org/mailman/listinfo/numpy-discussion
> >
> >
>
> Sorry I haven't had a chance to have a tinker yet. My initial observations:
>
> - I haven't decided whether this is a problem:
>
> In [50]: arr = np.arange(100)
>
> In [51]: arr[5:10] = np.NA
> ---------------------------------------------------------------------------
> ValueError Traceback (most recent call last)
> /home/wesm/<ipython-input-51-7e07a94409e9> in <module>()
> ----> 1 arr[5:10] = np.NA
>
> ValueError: Cannot set NumPy array values to NA values without first
> enabling NA support in the array
>
> I assume when you flip the maskna switch that a mask is created?
>

That's correct; it creates a fully exposed mask when you set the flag. The thought was that having an assignment automatically add a mask to an array would be a bad idea ("explicit vs implicit").

>
> - Performance with skipna is a bit disappointing:
>
> In [52]: arr = np.random.randn(1e6)
> In [54]: arr.flags.maskna = True
> In [56]: arr[::2] = np.NA
> In [58]: timeit arr.sum(skipna=True)
> 100 loops, best of 3: 7.31 ms per loop
>
> this goes down to 2.12 ms if there are no NAs present.
>

The alternating case currently gets the worst possible performance. The masked loop has no specialization for the operation or data type yet; it simply calls the regular inner loop on each run of exposed (non-NA) data.

> but:
>
> In [59]: import bottleneck as bn
> In [60]: arr = np.random.randn(1e6)
> In [61]: arr[::2] = np.nan
> In [62]: timeit bn.nansum(arr)
> 1000 loops, best of 3: 1.17 ms per loop
>
> do you have a sense if this gap can be closed? I assume you've been,
> as you should, focused on a correct implementation as opposed with
> squeezing out performance.
>

I've been focusing on a correct implementation while installing hooks in the right places so that the performance can be improved later. For the straightforward masked copying code, I previously created a ticket describing what needs to be done:

http://projects.scipy.org/numpy/ticket/1901

For element-wise ufuncs, the changes needed are similar: creating inner loops specialized for masks. While making these changes, I also figured out a way to specialize the inner loops more thoroughly, along the lines of einsum, without breaking ABI compatibility, so I set up the API to allow for this.

Thanks for taking a look,
Mark

>
> best,
> Wes
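The unspecialized masked-loop strategy described in the reply (call the regular inner loop once per contiguous run of exposed data) explains why `arr[::2] = np.NA` is the worst case: every run has length one. A minimal sketch of that strategy in plain NumPy follows. It does not use the `np.NA` / `maskna` API from the thread; it assumes a separate boolean array where True marks an exposed (non-NA) element, and `run_boundaries` and `masked_sum_by_runs` are hypothetical helpers for illustration, not NumPy functions.

```python
import numpy as np


def run_boundaries(exposed):
    """Return (start, stop) pairs for each contiguous run of True in `exposed`.

    Assumption: `exposed` is a 1-D boolean array, True = element is not NA.
    """
    # Pad with False on both ends so every run has a rising and a falling edge.
    padded = np.concatenate(([False], exposed, [False])).astype(np.int8)
    # Nonzero differences mark edges; they alternate run start, run stop (exclusive).
    edges = np.flatnonzero(np.diff(padded))
    return list(zip(edges[0::2].tolist(), edges[1::2].tolist()))


def masked_sum_by_runs(values, exposed):
    """Sum the exposed elements by running the plain (unmasked) sum once per run.

    Hypothetical illustration: there is no mask-specialized kernel here, only
    the regular reduction applied to each run. Returns (total, number_of_calls).
    """
    runs = run_boundaries(exposed)
    total = 0.0
    for start, stop in runs:
        total += float(values[start:stop].sum())  # one "inner loop" call per run
    return total, len(runs)


values = np.arange(10, dtype=float)
no_na = np.ones(10, dtype=bool)       # every element exposed
alternating = np.arange(10) % 2 == 1  # like arr[::2] = NA: only odd indices exposed

print(masked_sum_by_runs(values, no_na))        # (45.0, 1)
print(masked_sum_by_runs(values, alternating))  # (25.0, 5)
```

In this sketch, a 1e6-element array with every other element masked costs 500,000 separate calls, while the same array with no NAs costs one. A loop specialized for masks, which tests the mask element by element inside a single pass, avoids that per-run overhead.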