I'm trying to file a set of data points, defined by genome coordinates, into
bins, also based on genome coordinates. Each data point is (chromosome, start,
end, point) and each bin is (chromosome, start, end). I have about 140 million
points to file into around 100,000 bins. Both are (roughly)
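The filing job described here is essentially an interval lookup, and one common way to make it fast is a per-chromosome `np.searchsorted` against sorted, non-overlapping bins. A minimal sketch under those assumptions (the function and variable names are illustrative, not from the thread):

```python
import numpy as np

def assign_to_bins(bin_starts, bin_ends, point_starts):
    """For each point start, return the index of the containing bin,
    or -1 if the point falls in no bin. Assumes bins are sorted and
    non-overlapping, and that all arrays belong to one chromosome."""
    # Index of the last bin whose start is <= the point start.
    idx = np.searchsorted(bin_starts, point_starts, side="right") - 1
    # A point is inside that bin only if it also precedes the bin end.
    inside = (idx >= 0) & (point_starts < bin_ends[idx.clip(0)])
    return np.where(inside, idx, -1)

bin_starts = np.array([0, 100, 200])
bin_ends   = np.array([100, 200, 300])
points     = np.array([5, 150, 250, 350])
print(assign_to_bins(bin_starts, bin_ends, points))  # last point hits no bin
```

With 140 million points this runs one vectorized binary search per chromosome instead of a Python-level loop per point.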
I've bodged my way through my median problems (see previous postings). Now I
need to take a z-score of an array that might contain nans. At the moment, if
the array, which is 7000 elements, contains 1 nan or more, all the results come
out as nan.
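A NaN-tolerant z-score along the lines being asked for might look like the sketch below. It uses `np.nanmean` and `np.nanstd`, which exist in modern numpy; at the time of this thread the equivalents lived in `scipy.stats`. NaN entries stay NaN in the output instead of poisoning every element:

```python
import numpy as np

def nan_zscore(a):
    """Z-score of a 1-D array, computing mean and std over the
    non-NaN entries only. NaN inputs remain NaN in the output."""
    a = np.asarray(a, dtype=float)
    mu = np.nanmean(a)
    sigma = np.nanstd(a)
    return (a - mu) / sigma

x = np.array([1.0, 2.0, 3.0, np.nan])
print(nan_zscore(x))
```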
My other problem is that my array is indexed from
David Cournapeau ar.media.kyoto-u.ac.jp> writes:
> Unfortunately, we can't, because we would lose generality: we need to
> compute median on any axis, not only the last one. The proper solution
> would be to have a sort/max/min/etc... which knows about nan in numpy,
> which is what Chuck and I a
David Cournapeau ar.media.kyoto-u.ac.jp> writes:
> Still, it is indeed really slow for your case; when I fixed nanmean and
> co, I did not know much about numpy, I just wanted them to give the
> right answer :) I think this can be made faster, especially for your case
> (where the axis along which
Alan G Isaac american.edu> writes:
> Recently I needed to fill a 2d array with values
> from computations that could "go wrong".
> I created an array of NaN and then replaced
> the elements where the computation produced
> a useful value. I then applied ``nanmax``,
> to get the maximum of the us
Pierre GM gmail.com> writes:
> I think there were some changes on the C side of numpy between 1.0 and 1.1,
> you may have to recompile scipy and matplotlib from sources. What versions
> are you using for those 2 packages ?
>
$ dpkg -l | grep scipy
ii python-scipy
David Cournapeau ar.media.kyoto-u.ac.jp> writes:
> It may be that nanmedian is slow. But I would sincerely be surprised if
> it were slower than python list, except for some pathological cases, or
> maybe a bug in nanmedian. What do your data look like ? (size, number of
> nan, etc...)
>
I've po
David Cournapeau ar.media.kyoto-u.ac.jp> writes:
> You can use nanmean (from scipy.stats):
>
I rejoiced when I saw this answer, because it looks like a function I can just
drop in and it works. Unfortunately, nanmedian seems to be quite a bit slower
than just using lists (ignoring nan values fr
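The list-style workaround being compared against here, dropping the NaNs first and then taking an ordinary median, can be sketched with a boolean mask (a minimal sketch; the function name is illustrative):

```python
import numpy as np

def median_ignoring_nan(a):
    """Median over the non-NaN entries of a 1-D array."""
    a = np.asarray(a, dtype=float)
    return np.median(a[~np.isnan(a)])

print(median_ignoring_nan([1.0, np.nan, 3.0, 2.0]))  # 2.0
```

Filtering once and calling plain `np.median` on the compressed array avoids whatever per-element overhead is making `nanmedian` slow here.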
Pierre GM gmail.com> writes:
> Mmh, typo?
>
Yes, apologies. I was aiming for thorough, but ended up just careless. It's been
a long day.
> Ohoh. What version of numpy are you using ?
The version in the Ubuntu package repository. It says 1:1.0.4-6ubuntu3.
> if you don't give an axis
> param
physics.ucf.edu> writes:
> Currently the only way you can handle NaNs is by using masked arrays.
> Create a mask by doing isfinite(a), then call the masked array
> median(). There's an example here:
>
> http://sd-2116.dedibox.fr/pydocweb/doc/numpy.ma/
>
I had looked at masked arrays, bu
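The masked-array route suggested above can be sketched in a few lines: build a mask from `isfinite` and use the median from `numpy.ma` (values shown are illustrative):

```python
import numpy as np

a = np.array([1.0, np.nan, 5.0, 3.0])
# Mask everything that is not finite (NaN or inf), then take the
# median over the remaining valid entries.
masked = np.ma.masked_array(a, mask=~np.isfinite(a))
print(np.ma.median(masked))  # 3.0
```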
I have data from biological experiments that is represented as a list of
about 5000 triples. I would like to convert this to a list of the median
of each triple. I did some profiling and found that numpy was about
12 times faster for this application than using regular Python lists and
a l
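The per-triple median reduces to a single axis argument, which is where the vectorized speedup over looping in Python comes from (a sketch with illustrative data):

```python
import numpy as np

# One row per triple; the median of each row is taken along axis 1
# in a single vectorized call.
triples = np.array([[1.0, 9.0, 4.0],
                    [2.0, 2.0, 8.0]])
medians = np.median(triples, axis=1)
print(medians)  # [4. 2.]
```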