[Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Gunter Meissner
Dear Programmers, 

 

This is Gunter Meissner. I am currently writing a book on Forecasting and 
derived the regression coefficient with NumPy:

 

import numpy as np
X=[1,2,3,4]
Y=[1,8000,5000,1000]
print(np.cov(X,Y))
print(np.var(X))
Beta1 = np.cov(X,Y)/np.var(X)
print(Beta1)



However, NumPy is using the SAMPLE covariance (which divides by n-1) and the 
POPULATION variance VarX (which divides by n). Therefore the regression 
coefficient Beta1 is not correct.

The solution is easy: please use the population approach (dividing by n) for 
BOTH covariance and variance, or use the sample approach (dividing by n-1) 
for BOTH covariance and variance. You may also allow the user to choose, as in 
EXCEL, where the user can choose between VAR.S and VAR.P, and between 
COVARIANCE.S and COVARIANCE.P.
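For a concrete illustration of the mismatch described above, here is a minimal sketch (using only NumPy's documented `ddof` parameter) showing that the two functions normalize differently by default, and agree once the same `ddof` is passed to both:

```python
import numpy as np

X = [1, 2, 3, 4]

# np.cov defaults to the sample normalization (divide by n-1) ...
sample_cov_xx = np.cov(X, X)[0, 1]   # Cov(X, X) = sample variance = 5/3

# ... while np.var defaults to the population normalization (divide by n).
pop_var_x = np.var(X)                # population variance = 1.25

# The two agree once the same ddof is used in both calls:
assert np.isclose(np.cov(X, X, ddof=0)[0, 1], np.var(X, ddof=0))
assert np.isclose(np.cov(X, X, ddof=1)[0, 1], np.var(X, ddof=1))
```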

 

Thanks!!!

Gunter 

 

 

Gunter Meissner, PhD

University of Hawaii

Adjunct Professor of MathFinance at Columbia University and NYU

President of Derivatives Software www.dersoft.com

CEO Cassandra Capital Management www.cassandracm.com

CV: www.dersoft.com/cv.pdf

Email: meiss...@hawaii.edu

Tel: USA (808) 779 3660

 

 

 

 

From: NumPy-Discussion On Behalf Of Ralf Gommers
Sent: Wednesday, March 18, 2020 5:16 AM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] Proposal: NEP 41 -- First step towards a new 
Datatype System

 

 

 

On Tue, Mar 17, 2020 at 9:03 PM Sebastian Berg <sebast...@sipsolutions.net> wrote:

Hi all,

in the spirit of trying to keep this moving, can I assume that the main
reason for little discussion is that the actual changes proposed are
not very far reaching as of now?  Or is the reason that this is a
fairly complex topic that you need more time to think about it?

 

Probably (a) it's a long NEP on a complex topic, (b) the past week has been a 
very weird week for everyone (in the extra-news-reading-time I could easily 
have re-reviewed the NEP), and (c) the amount of feedback one expects to get on 
a NEP is roughly inversely proportional to the scope and complexity of the NEP 
contents.

 

Today I re-read the parts I commented on before. This version is a big 
improvement over the previous ones. Thanks in particular for adding clear 
examples and the diagram, it helps a lot.

 

If it is the latter, is there some way I can help with it?  I tried to
minimize how much is part of this initial NEP.

If there is not much need for discussion, I would like to officially
accept the NEP very soon, sending out an official one week notice in
the next days.

 

I agree. I think I would like to keep the option open though to come back to 
the NEP later to improve the clarity of the text about 
motivation/plan/examples/scope, given that this will be the reference for a 
major amount of work for a long time to come.

 

To summarize one more time, the main point is that:

 

This point seems fine, and I'm +1 for going ahead with the described parts of 
the technical design.

 

Cheers,

Ralf

 


type(np.dtype(np.float64))

will be `np.dtype[float64]`, a subclass of dtype, so that:

issubclass(np.dtype[float64], np.dtype)

is true. This means that we will have one class for every current type
number: `dtype.num`. The implementation of these subclasses will be a
C-written (extension) MetaClass; all details of this class are supposed
to remain experimental and in flux at this time.
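The behaviour sketched above can be checked directly in sufficiently recent NumPy versions (a minimal sketch; the exact class repr may vary between versions, and on older NumPy `type()` simply returns `np.dtype` itself):

```python
import numpy as np

# The type() of a dtype instance is a dtype subclass in recent NumPy
# (older versions return np.dtype directly).
cls = type(np.dtype(np.float64))
print(cls)  # e.g. <class 'numpy.dtype[float64]'>

# Either way, the subclass relationship described above holds:
assert issubclass(cls, np.dtype)
assert isinstance(np.dtype("float64"), cls)
```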

Cheers

Sebastian


On Wed, 2020-03-11 at 17:02 -0700, Sebastian Berg wrote:
> Hi all,
> 
> I am pleased to propose NEP 41: First step towards a new Datatype
> System https://numpy.org/neps/nep-0041-improved-dtype-support.html
> 
> This NEP motivates the larger restructure of the datatype machinery
> in
> NumPy and defines a few fundamental design aspects. The long term
> user
> impact will be allowing easier and more rich featured user defined
> datatypes.
> 
> As this is a large restructure, the NEP represents only the first
> steps
> with some additional information in further NEPs being drafted [1]
> (this may be helpful to look at depending on the level of detail you
> are interested in).
> The NEP itself does not propose to add significant new public API.
> Instead it proposes to move forward with an incremental internal
> refactor and lays the foundation for this process.
> 
> The main user facing change at this time is that datatypes will
> become classes (e.g. ``type(np.dtype("float64"))`` will be a float64
> specific class).
> For most users, the main impact should be many new datatypes in the
> long run (see the user impact section). However, for those interested
> in API design within NumPy or with respect to implementing new
> datatypes, this and the following NEPs are important decisions in the
> future.

Re: [Numpy-discussion] Error in Covariance and Variance calculation

2020-03-20 Thread Gunter Meissner
Thanks Warren! Worked like a charm 😊 Will mention you in the book...


Gunter Meissner, PhD
University of Hawaii
Adjunct Professor of MathFinance at Columbia University and NYU
President of Derivatives Software www.dersoft.com  
CEO Cassandra Capital Management www.cassandracm.com 
CV: www.dersoft.com/cv.pdf 
Email: meiss...@hawaii.edu
Tel: USA (808) 779 3660



-----Original Message-----
From: NumPy-Discussion 
 On Behalf Of Warren 
Weckesser
Sent: Friday, March 20, 2020 8:45 AM
To: Discussion of Numerical Python 
Subject: Re: [Numpy-discussion] Error in Covariance and Variance calculation

On 3/20/20, Gunter Meissner  wrote:
> Dear Programmers,
>
>
>
> This is Gunter Meissner. I am currently writing a book on Forecasting 
> and derived the regression coefficient with Numpy:
>
>
>
> import numpy as np
> X=[1,2,3,4]
> Y=[1,8000,5000,1000]
> print(np.cov(X,Y))
> print(np.var(X))
> Beta1 = np.cov(X,Y)/np.var(X)
> print(Beta1)
>
>
>
> However, NumPy is using the SAMPLE covariance (which divides by n-1)
> and the POPULATION variance VarX (which divides by n). Therefore the
> regression coefficient Beta1 is not correct.
>
> The solution is easy: Please use the population approach (dividing by 
> n) for BOTH covariance and variance or use the sample approach 
> (dividing by n-1)
>
> for BOTH covariance and variance. You may also allow the user to use 
> both as in EXCEL, where the user can choose between Var.S and Var.P
>
> and Cov.P and Var.P.
>
>
>
> Thanks!!!
>
> Gunter
>


Gunter,

This is an unfortunate discrepancy in the API:  `var` uses the default 
`ddof=0`, while `cov` uses, in effect, `ddof=1` by default.

You can get the consistent behavior you want by using `ddof=1` in both 
functions.  E.g.

Beta1 = np.cov(X,Y, ddof=1) / np.var(X, ddof=1)

Using `ddof=1` in `np.cov` is redundant, but in this context, it is probably 
useful to make explicit to the reader of the code that both functions are using 
the same convention.

Changing the default in either function breaks backwards
compatibility.   That would require a long and potentially painful
deprecation process.
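Putting the suggestion above together with the original snippet, note also that `np.cov` returns the full 2x2 covariance matrix, so the slope should be taken from the off-diagonal element rather than dividing the whole matrix. A minimal sketch:

```python
import numpy as np

X = [1, 2, 3, 4]
Y = [1, 8000, 5000, 1000]

# np.cov returns a 2x2 matrix; element [0, 1] is Cov(X, Y).
cov_xy = np.cov(X, Y, ddof=1)[0, 1]
var_x = np.var(X, ddof=1)

# Slope of the regression of Y on X. As long as the SAME ddof is used
# in both calls, the n/(n-1) factors cancel in the ratio.
beta1 = cov_xy / var_x
print(beta1)  # approx -0.3
```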

Warren



Re: [Numpy-discussion] Unreliable crash when converting using numpy.asarray via C buffer interface

2021-03-29 Thread Gunter Meissner
Aloha Numpy Community,
I am just writing a book on "How to Cheat in Statistics - And Get Away with
It". I noticed there is no built-in function for the adjusted R-squared in any
library (do correct me if I am wrong). I think it would be a good idea to
implement it. The math is straightforward; I can provide it if desired.
Thank you,
Gunter
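For reference, the standard formula alluded to above is adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - p - 1), where n is the number of observations and p the number of predictors. A minimal sketch (the function name `adjusted_r2` is my own illustration, not an existing NumPy API):

```python
def adjusted_r2(r2, n, p):
    """Adjusted R-squared for n observations and p predictors."""
    if n - p - 1 <= 0:
        raise ValueError("need n > p + 1 observations")
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Example: R^2 = 0.9 with 10 observations and 2 predictors.
print(adjusted_r2(0.9, 10, 2))  # approx 0.8714
```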


On Mon, Feb 15, 2021 at 5:56 AM Sebastian Berg wrote:

> On Mon, 2021-02-15 at 10:12 +0100, Friedrich Romstedt wrote:
> > Hi,
> >
> > Am Do., 4. Feb. 2021 um 09:07 Uhr schrieb Friedrich Romstedt
> > :
> > > Am Mo., 1. Feb. 2021 um 09:46 Uhr schrieb Matti Picus <
> > > matti.pi...@gmail.com>:
> > > > Typically, one would create a complete example and then pointing
> > > > to the
> > > > code (as repo or pastebin, not as an attachment to a mail here).
> > >
> > > https://github.com/friedrichromstedt/bughunting-01
> >
> > Last week I updated my example code to be more slim.  There now
> > exists
> > a single-file extension module:
> >
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/lib/bughuntingfrmod/bughuntingfrmod.cpp
> > The corresponding test program
> >
> > https://github.com/friedrichromstedt/bughunting-01/blob/master/test/2021-02-11_0909.py
> > crashes "properly" both on Windows 10 (Python 3.8.2, numpy 1.19.2) as
> > well as on Arch Linux (Python 3.9.1, numpy 1.20.0), when the
> > ``print``
> > statement contained in the test file is commented out.
> >
> > My hope to be able to fix my error myself by reducing the code to
> > reproduce the problem has not been fulfillled.  I feel that the
> > abovementioned test code is short enough to ask for help with it
> > here.
> > Any hint on how I could solve my problem would be appreciated very
> > much.
>
> I have tried it out, and can confirm that using debugging tools (namely
> valgrind) will allow you to track down the issue (valgrind reports it
> from within python; running a python without debug symbols may
> obfuscate the actual problem, and if that is limiting you, I can post
> my valgrind output).
> Since you are running a linux system, I am confident that you can run
> it in valgrind to find it yourself.  (There may be other ways.)
>
> Just remember to run valgrind with `PYTHONMALLOC=malloc valgrind` and
> ignore some errors e.g. when importing NumPy.
>
> Cheers,
>
> Sebastian
>
>
> >
> > There are some points which were not clarified yet; I am citing them
> > below.
> >
> > So far,
> > Friedrich
> >
> > > > - There are tools out there to analyze refcount problems. Python
> > > > has
> > > > some built-in tools for switching allocation strategies.
> > >
> > > Can you give me some pointer about this?
> > >
> > > > - numpy.asarray has a number of strategies to convert instances,
> > > > which
> > > > one is it using?
> > >
> > > I've tried to read about this, but couldn't find anything.  What
> > > are
> > > these different strategies?
> > ___
> > NumPy-Discussion mailing list
> > NumPy-Discussion@python.org
> > https://mail.python.org/mailman/listinfo/numpy-discussion
> >
>
>


-- 

Gunter Meissner, PhD

University of Hawaii

Adjunct Professor of MathFinance at Columbia University and NYU

President of Derivatives Software www.dersoft.com

CEO Cassandra Capital Management www.cassandracm.com

CV: www.dersoft.com/cv.pdf

Email: meiss...@hawaii.edu

Tel: USA (808) 779 3660