[Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread Nadav Har'El
Hi, I'm looking for a way to find a random sample of C different items out
of N items, with some desired probability Pi for each item i.

I saw that numpy has a function that supposedly does this,
numpy.random.choice (with replace=False and a probabilities array), but
looking at the algorithm actually implemented, I am wondering in what sense
the probabilities Pi are actually obeyed...

To me, the code doesn't seem to be doing the right thing... Let me explain:

Consider a simple numerical example: We have 3 items, and need to pick 2
different ones randomly. Let's assume the desired probabilities for item 1,
2 and 3 are: 0.2, 0.4 and 0.4.

Working out the equations, there is exactly one solution here: the random
outcome of numpy.random.choice in this case should be [1,2] with probability
0.2, [1,3] with probability 0.2, and [2,3] with probability 0.6. That is
indeed a solution for the desired probabilities, because it yields item 1 in
[1,2]+[1,3] = 0.2 + 0.2 = 0.4 = 2*P1 of the trials, item 2 in [1,2]+[2,3] =
0.2+0.6 = 0.8 = 2*P2, etc.

However, the algorithm behind numpy.random.choice's replace=False generates,
if I understand correctly, different probabilities for the outcomes: I
believe in this case it generates [1,2] at probability 0.2333, [1,3] also at
0.2333, and [2,3] at probability 0.5333.

My question is how does this result fit the desired probabilities?

If we get [1,2] at probability 0.2333 and [1,3] at probability 0.2333,
then the expected number of "1" results per drawing is 0.2333 + 0.2333 =
0.4667; similarly, the expected number for "2" is 0.7667, and for "3" it is
0.7667. As you can see, the proportions are off: item 2 is NOT twice as
common as item 1, as we originally desired (we asked for probabilities
0.2, 0.4, 0.4 for the individual items!).
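
A quick empirical check of the per-item inclusion frequencies (a sketch;
these are sample estimates, so the third digit will vary from run to run):

```python
import numpy as np

np.random.seed(0)
trials = 100_000
counts = np.zeros(3)
for _ in range(trials):
    # Legacy np.random.choice draws sequentially without replacement,
    # renormalizing the remaining probabilities after each pick.
    pair = np.random.choice(3, size=2, replace=False, p=[0.2, 0.4, 0.4])
    counts[pair] += 1

# Desired per-item inclusion frequencies would be 2*Pi = [0.4, 0.8, 0.8];
# the sequential algorithm instead gives roughly [0.467, 0.767, 0.767].
print(counts / trials)
```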


--
Nadav Har'El
n...@scylladb.com
___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-17 Thread Neal Becker
Charles R Harris wrote:

> Hi All,
> 
> I'm pleased to announce the NumPy 1.12.0 release. This release supports
> Python 2.7 and 3.4-3.6. Wheels for all supported Python versions may be
> downloaded from PyPI, the tarball and zip files may be downloaded from
> Github. The release notes and file hashes may also be found at Github.
> 
> NumPy 1.12.0rc2 is the result of 418 pull requests submitted by 139
> contributors and comprises a large number of fixes and improvements. Among
> the many improvements it is difficult to pick out just a few as standing
> above the others, but the following may be of particular interest or
> indicate areas likely to have future consequences.
> 
> * Order of operations in ``np.einsum`` can now be optimized for large
> speed improvements.
> * New ``signature`` argument to ``np.vectorize`` for vectorizing with core
> dimensions.
> * The ``keepdims`` argument was added to many functions.
> * New context manager for testing warnings
> * Support for BLIS in numpy.distutils
> * Much improved support for PyPy (not yet finished)
> 
> Enjoy,
> 
> Chuck

I've installed via pip3 on linux x86_64, which gives me a wheel.  My 
question is, am I losing significant performance choosing this pre-built 
binary vs. compiling myself?  For example, my processor might have some more 
features than the base version used to build wheels.
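
For reference, you can at least inspect which BLAS/LAPACK a given install was
built against (a minimal sketch; `np.show_config()` prints the build-time
configuration, and the PyPI wheels are expected to report OpenBLAS there):

```python
import numpy as np

print(np.__version__)
# Print the BLAS/LAPACK libraries this binary was linked against.
np.show_config()
```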



[Numpy-discussion] [REL] matplotlib v2.0.0

2017-01-17 Thread Thomas Caswell
Folks,

We are happy to announce the release of (long delayed) matplotlib 2.0!
This release completely overhauls the default style of the plots.

The source tarball and wheels for Mac, Win, and manylinux for python 2.7,
3.4-3.6 are available on pypi

   pip install --upgrade matplotlib

and conda packages for Mac, Win, linux for python 2.7, 3.4-3.6 are
available from conda-forge

   conda install matplotlib -c conda-forge


Highlights include:

 - 'viridis' is the default color map instead of jet.
 - Modernized the default color cycle.
 - Many more functions respect the color cycle.
 - Line dash patterns scale with linewidth.
 - Changed the default font to DejaVu, which supports most Western alphabets
(including Greek, Cyrillic and Latin with diacritics), math symbols and
emoji out of the box.
 - Faster text rendering.
 - Improved auto-limits.
 - Ticks out and only on the right and bottom spines by default.
 - Improved auto-ticking, particularly for log scales and dates.
 - Improved image support (imshow respects scales and eliminated a class of
artifacts).

For a full list of the default changes (along with how to revert them)
please see http://matplotlib.org/users/dflt_style_changes.html and
http://matplotlib.org/users/whats_new.html#new-in-matplotlib-2-0.
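
For anyone who prefers the old look, the pages above describe how to revert;
in short (a sketch, assuming matplotlib >= 2.0 is installed):

```python
import matplotlib.style
import matplotlib as mpl

# Opt back into the 1.x appearance wholesale...
mpl.style.use('classic')

# ...or put back individual defaults, e.g. the old colormap:
mpl.rcParams['image.cmap'] = 'jet'
```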

There were a number of small API changes documented at
http://matplotlib.org/api/api_changes.html#api-changes-in-2-0-0

I would like to thank everyone who helped on this release in any way: the
people at the 2015 SciPy BOF where this got started, users who provided
feedback and suggestions along the way, the beta-testers, Nathaniel, Stefan
and Eric for the new color maps, and all of the documentation and code
contributors.

Please report any issues to matplotlib-us...@python.org (will have to join
to post un-moderated) or https://github.com/matplotlib/matplotlib/issues .

Tom


[Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread aleba...@gmail.com
Hi Nadav,

I may be wrong, but I think that the result of the current implementation
is actually the expected one.
Using your example: probabilities for items 1, 2 and 3 are 0.2, 0.4 and 0.4.

P([1,2]) = P([2] | 1st=[1]) P([1]) + P([1] | 1st=[2]) P([2])

Now, P([1]) = 0.2 and P([2]) = 0.4. However:
P([2] | 1st=[1]) = 0.5 (2 and 3 have the same sampling probability)
P([1] | 1st=[2]) = 1/3 (1 and 3 have probabilities 0.2 and 0.4 which, once
normalised, become 1/3 and 2/3 respectively)
Therefore P([1,2]) = 0.7/3 = 0.2333
Similarly, P([1,3]) = 0.2333 and P([2,3]) = 1.6/3 = 0.5333
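
In code, the same sequential computation (plain Python, just to spell out
the arithmetic):

```python
p = {1: 0.2, 2: 0.4, 3: 0.4}

def pair_prob(a, b):
    # P(a first, then b) + P(b first, then a); after the first draw the
    # remaining probabilities are renormalised by 1 - P(first item).
    return p[a] * p[b] / (1 - p[a]) + p[b] * p[a] / (1 - p[b])

print(pair_prob(1, 2))  # 0.2333...
print(pair_prob(1, 3))  # 0.2333...
print(pair_prob(2, 3))  # 0.5333...
```

The three pair probabilities sum to 1, as they must.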

What am I missing?

Alessandro


2017-01-17 13:00 GMT+01:00 :

> Hi, I'm looking for a way to find a random sample of C different items out
> of N items, with some desired probability Pi for each item i.
>
> [...]



-- 
--
NOTICE: Dlgs 196/2003 this e-mail and any attachments thereto may contain
confidential information and are intended for the sole use of the
recipient(s) named above. If you are not the intended recipient of this
message you are hereby notified that any dissemination or copying of this
message is strictly prohibited. If you have received this e-mail in error,
please notify the sender either by telephone or by e-mail and delete the
material from any computer. Thank you.
--


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-17 Thread Matthew Brett
Hi,

On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker  wrote:
> Charles R Harris wrote:
>
>> [...]
>
> I've installed via pip3 on linux x86_64, which gives me a wheel.  My
> question is, am I losing significant performance choosing this pre-built
> binary vs. compiling myself?  For example, my processor might have some more
> features than the base version used to build wheels.

I guess you are thinking about using this built wheel on some other
machine?   You'd have to be lucky for that to work; the wheel depends
on the symbols it found at build time, which may not exist in the same
places on your other machine.

If it does work, the speed will primarily depend on your BLAS library.

The pypi wheels should be pretty fast; they are built with OpenBLAS,
which is at or near top of range for speed, across a range of
platforms.
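
A crude way to compare BLAS speed between two installs (a sketch; absolute
numbers depend entirely on the machine, so only the relative timing between
installs is meaningful):

```python
import time
import numpy as np

# A large matmul is dominated by the BLAS backend, so timing it gives a
# rough comparison between, say, a PyPI wheel and a self-built numpy.
a = np.random.rand(1500, 1500)
t0 = time.perf_counter()
a @ a
print(f"1500x1500 matmul: {time.perf_counter() - t0:.3f} s")
```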

Cheers,

Matthew


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread Nadav Har'El
>> [...]


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
> [...]


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread aleba...@gmail.com
> [...]


Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release

2017-01-17 Thread Matthew Brett
On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker  wrote:
> Matthew Brett wrote:
>
>> Hi,
>>
>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker  wrote:
>>> Charles R Harris wrote:
>>>
>>>> [...]
>>>
>>> I've installed via pip3 on linux x86_64, which gives me a wheel.  My
>>> question is, am I losing significant performance choosing this pre-built
>>> binary vs. compiling myself?  For example, my processor might have some
>>> more features than the base version used to build wheels.
>>
>> I guess you are thinking about using this built wheel on some other
>> machine?   You'd have to be lucky for that to work; the wheel depends
>> on the symbols it found at build time, which may not exist in the same
>> places on your other machine.
>>
>> If it does work, the speed will primarily depend on your BLAS library.
>>
>> The pypi wheels should be pretty fast; they are built with OpenBLAS,
>> which is at or near top of range for speed, across a range of
>> platforms.
>>
>> Cheers,
>>
>> Matthew
>
> I installed using pip3 install, and it installed a wheel package.  I did not
> build it - aren't wheels already compiled packages?  So isn't it built for
> the common denominator architecture, not necessarily as fast as one I built
> myself on my own machine?  My question is, on x86_64, is this potential
> difference large enough to bother with not using precompiled wheel packages?

Ah - my guess is that you'd be hard pressed to make a numpy that is as
fast as the precompiled wheel.  The OpenBLAS library included in the
numpy wheels selects the routines for your CPU at run-time, so they will
generally be fast on your CPU.  You might be able to get equivalent
or even better performance with an ATLAS BLAS library recompiled on
your exact machine, but that's quite a serious investment of time to
get working, and you'd have to benchmark to find out whether you were
really doing any better.

Cheers,

Matthew


Re: [Numpy-discussion] [SciPy-Dev] NumPy 1.12.0 release

2017-01-17 Thread Nathaniel Smith
On Tue, Jan 17, 2017 at 3:47 PM, Neal Becker  wrote:
> Matthew Brett wrote:
>
>> Hi,
>>
>> On Tue, Jan 17, 2017 at 5:56 AM, Neal Becker  wrote:
>>> Charles R Harris wrote:
>>>
>>>> [...]
>>>
>>> I've installed via pip3 on linux x86_64, which gives me a wheel.  My
>>> question is, am I losing significant performance choosing this pre-built
>>> binary vs. compiling myself?  For example, my processor might have some
>>> more features than the base version used to build wheels.
>>
>> I guess you are thinking about using this built wheel on some other
>> machine?   You'd have to be lucky for that to work; the wheel depends
>> on the symbols it found at build time, which may not exist in the same
>> places on your other machine.
>>
>> If it does work, the speed will primarily depend on your BLAS library.
>>
>> The pypi wheels should be pretty fast; they are built with OpenBLAS,
>> which is at or near top of range for speed, across a range of
>> platforms.
>>
>> Cheers,
>>
>> Matthew
>
> I installed using pip3 install, and it installed a wheel package.  I did not
> build it - aren't wheels already compiled packages?  So isn't it built for
> the common denominator architecture, not necessarily as fast as one I built
> myself on my own machine?  My question is, on x86_64, is this potential
> difference large enough to bother with not using precompiled wheel packages?

Ultimately, it's going to depend on all sorts of things, including
most importantly your actual code. Like most speed questions, the only
real way to know is to try it and measure the difference.

The wheels do ship with a fast BLAS (OpenBLAS configured to
automatically adapt to your CPU at runtime), so the performance will
at least be reasonable. Possible improvements would include using a
different and somehow better BLAS (MKL might be faster in some cases),
tweaking your compiler options to take advantage of whatever SIMD ISAs
your particular CPU supports (numpy's build system doesn't do this
automatically but in principle you could do it by hand -- were you
bothering before? does it even make a difference in practice? I
dunno), and using a new compiler (the linux wheels use a somewhat
ancient version of gcc for Reasons; newer compilers are better at
optimizing -- how much does it matter? again I dunno).

Basically: if you want to experiment and report back then I think we'd
all be interested to hear; OTOH if you aren't feeling particularly
curious/ambitious then I wouldn't worry about it :-).

-n

-- 
Nathaniel J. Smith -- https://vorpus.org


Re: [Numpy-discussion] Question about numpy.random.choice with probabilties

2017-01-17 Thread josef . pktd
>> P([1,2]) = 0.2 (not 0.2333 as above)
>> P([1,3]) = 0.2
>> P([2,3]) = 0.6 (not 0.5333 as above)
>>
>> Then, we get exactly the right P(1), P(2), P(3): 0.2, 0.4, 0.4
>>
>> Interestingly, fixing things like I suggest is not always possible.
>> Consider a different probability-vector example for three items - 0.99,
>> 0.005, 0.005. Now, no matter which algorithm we use for randomly picking
>> pairs from these three items, *each* returned pair will inevitably contain
>> one of the two very-low-probability items, so each of those items will
>> appear in roughly half the pairs, instead of in a vanishingly small
>> percentage as we hoped.
>>
>> But in other choices of probabilities (like the one in my original
>> example), there is a solution. For 2-out-of-3 sampling we can actually
>> write a system of three linear equations in three variables, so there is
>> always a unique candidate solution; but if that solution has components
>> that are not valid probabilities (not in [0,1]), we end up with no
>> solution - as happens in the 0.99, 0.005, 0.005 example.
>>
>>
>>
>>> [...]
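
The system of three linear equations mentioned in the quoted follow-up can
be written down and solved directly; a sketch with numpy (the pair ordering
is my choice):

```python
import numpy as np

# Unknowns in lexicographic order: x = [P({1,2}), P({1,3}), P({2,3})].
# Requiring item i to appear in a fraction 2*P_i of the drawn pairs
# gives one linear equation per item.
A = np.array([[1.0, 1.0, 0.0],   # pairs containing item 1
              [1.0, 0.0, 1.0],   # pairs containing item 2
              [0.0, 1.0, 1.0]])  # pairs containing item 3

for probs in ([0.2, 0.4, 0.4], [0.99, 0.005, 0.005]):
    x = np.linalg.solve(A, 2 * np.array(probs))
    print(probs, '->', x)
# First case solves to [0.2, 0.2, 0.6]: a valid distribution over pairs.
# Second case solves to [0.99, 0.99, -0.98]: a negative entry, so no
# valid pair distribution exists, exactly as described above.
```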

Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-17 Thread Jerome Kieffer
On Tue, 17 Jan 2017 08:56:42 -0500
Neal Becker  wrote:

> I've installed via pip3 on linux x86_64, which gives me a wheel.  My 
> question is, am I losing significant performance choosing this pre-built 
> binary vs. compiling myself?  For example, my processor might have some more 
> features than the base version used to build wheels.

Hi,

I have done some benchmarking (%timeit) for my code running in a
jupyter-notebook within a venv installed with pip+manylinux wheels
versus ipython and debian packages (on the same computer).
I noticed the debian installation was ~20% faster.

I did not investigate further if those 20% came from the manylinux (I
suspect) or from the notebook infrastructure.
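
The same kind of measurement can be scripted outside the notebook (a sketch
using the stdlib `timeit` module; the workload here is arbitrary):

```python
import timeit
import numpy as np

a = np.random.rand(1_000_000)

# Best-of-5 timing of a ufunc call, similar to %timeit in a notebook.
per_call = min(timeit.repeat(lambda: np.sin(a), number=10, repeat=5)) / 10
print(f"np.sin over 1e6 elements: {per_call * 1e3:.2f} ms")
```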

HTH,
-- 
Jérôme Kieffer

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
https://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] NumPy 1.12.0 release

2017-01-17 Thread Nathan Goldbaum
I've seen reports on the anaconda mailing list of people seeing similar
speed-ups when they compile e.g. numpy with a recent gcc. Anaconda has the
same issue as manylinux in that they need to use versions of gcc available
on CentOS 5.

Given the upcoming official EOL for CentOS 5, it might make sense to think
about writing a PEP for a CentOS 6-based manylinux2 docker image, which
would allow compiling with a newer gcc.

On Tue, Jan 17, 2017 at 9:15 PM Jerome Kieffer 
wrote:

> On Tue, 17 Jan 2017 08:56:42 -0500
> Neal Becker  wrote:
>
> > I've installed via pip3 on linux x86_64, which gives me a wheel.  My
> > question is, am I losing significant performance choosing this pre-built
> > binary vs. compiling myself?  For example, my processor might have some
> > more features than the base version used to build wheels.
>
> Hi,
>
> I have done some benchmarking (%timeit) for my code running in a
> jupyter-notebook within a venv installed with pip+manylinux wheels
> versus ipython and debian packages (on the same computer).
> I noticed the debian installation was ~20% faster.
>
> I did not investigate further if those 20% came from the manylinux (I
> suspect) or from the notebook infrastructure.
>
> HTH,
> --
> Jérôme Kieffer