On Sa, 2015-12-19 at 06:55 -0600, Andy Ray Terrel wrote: > A simple fix would certainly by pass the check in random.choice, but I > don't know how to get that. So let's focus on the summation. > > > I believe you are hitting an instability in summing small numbers as a > power to 10th order would produce. Here is an example: > > > mymatrix = np.random.rand(1024,1024).astype('float16')*1e-7 > row_normalized = mymatrix / np.sum(mymatrix, axis=1, keepdims=True) > > sums = row_normalized.sum(axis=1) > len(sums[sums != 1]) # -> 108 > > > One can use things like Kahan summation and you will need to collect > the size of the error and truncate all numbers in mymatrix under that > error. I'm not quite sure how to quickly implement such a thing in > numpy without a loop.
In fact, the code even seems to do kahan summation, however, I think it always assumes double precision for the p keyword argument, so as a work around at least, you have to sum to convert to and normalize it as double. - Sebastian > > On Fri, Dec 18, 2015 at 7:00 PM, Nathaniel Smith <n...@pobox.com> > wrote: > On Fri, Dec 18, 2015 at 1:25 PM, Ryan R. Rosario > <r...@bytemining.com> wrote: > > Hi, > > > > I have a matrix whose entries I must raise to a certain > power and then normalize by row. After I do that, when I pass > some rows to numpy.random.choice, I get a ValueError: > probabilities do not sum to 1. > > > > I understand that floating point is not perfect, and my > matrix is so large that I cannot use np.longdouble because I > will run out of RAM. > > > > As an example on a smaller matrix: > > > > np.power(mymatrix, 10, out=mymatrix) > > row_normalized = np.apply_along_axis(lambda x: x / > np.sum(x), 1, mymatrix) > > I'm sorry I don't have a solution to your actual problem off > the top > of my head, but it's probably helpful in general to know that > a better > way to write this would be just > > row_normalized = mymatrix / np.sum(mymatrix, axis=1, > keepdims=True) > > apply_along_axis is slow and can almost always be replaced by > a > broadcasting expression like this. > > > sums = row_normalized.sum(axis=1) > > sums[np.where(sums != 1)] > > And here you can just write > > sums[sums != 1] > > i.e. the call to where() isn't doing anything useful. > > -n > > -- > Nathaniel J. Smith -- http://vorpus.org > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion > > > > _______________________________________________ > NumPy-Discussion mailing list > NumPy-Discussion@scipy.org > https://mail.scipy.org/mailman/listinfo/numpy-discussion
signature.asc
Description: This is a digitally signed message part
_______________________________________________ NumPy-Discussion mailing list NumPy-Discussion@scipy.org https://mail.scipy.org/mailman/listinfo/numpy-discussion