Re: [Numpy-discussion] Weighted Covariance/correlation

Tom Poole Sun, 24 Aug 2014 13:06:05 -0700

Hi all,

Any input to this? Last time it generated a fair bit of discussion, which I’ll 
summarise here.

It’s currently possible to calculate a weighted average using np.average, but 
the corresponding functionality does not exist for (co)variance or corrcoef 
calculations. These cases are less straightforward, because we need to worry 
about what type of information the weights contain.

Repeat type weights are the easiest to explain. Here the variances of

[x1, x2, x3] with weights [2, 1, 3]

and

[x1, x1, x2, x3, x3, x3]

are identical. For Bessel correction the total number of samples is obtained by 
summing the weights. These weights do not have to be integer, and in this case 
the only important assumption is that their sum represents the total sample 
size.
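
To make the repeat-weight equivalence concrete, here is a minimal sketch. The 
values chosen for x1, x2, x3 are arbitrary stand-ins, and the calculation is 
done by hand rather than through any existing or proposed np.cov keyword.

```python
import numpy as np

# Arbitrary stand-in values for x1, x2, x3 (an assumption for illustration).
x = np.array([1.0, 4.0, 6.0])
w = np.array([2, 1, 3])  # repeat-type weights

# Expand the repeats explicitly: [x1, x1, x2, x3, x3, x3]
expanded = np.repeat(x, w)

# Weighted mean, then Bessel-corrected variance with n = sum(weights)
n = w.sum()
mean_w = np.average(x, weights=w)
var_weighted = np.sum(w * (x - mean_w) ** 2) / (n - 1)

# Ordinary sample variance of the expanded data
var_expanded = np.var(expanded, ddof=1)

print(var_weighted, var_expanded)
```

Both numbers agree because the weighted sum of squared deviations is the same 
as the sum over the expanded data, and both are divided by sum(weights) - 1.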

The second type of weights are importances or accuracies. Here the weights 
represent the relative strength of contributions from each of the associated 
samples. Because this is a purely relative relation, there’s no concrete 
information about the total number of samples. This has to be obtained from the 
effective sample size, given by (sum(weights)^2)/sum(weights^2).
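
As a rough sketch of how the effective sample size could enter the Bessel 
correction for accuracy-type weights: the helper names effective_sample_size 
and accuracy_weighted_var below are hypothetical, and the correction factor 
n_eff / (n_eff - 1) is my reading of the description above, not code taken 
from the pull request.

```python
import numpy as np

def effective_sample_size(weights):
    """(sum(weights)^2) / sum(weights^2), as given in the text."""
    w = np.asarray(weights, dtype=float)
    return w.sum() ** 2 / np.sum(w ** 2)

def accuracy_weighted_var(x, weights):
    """Variance with accuracy-type weights (hypothetical helper).

    The biased weighted variance is rescaled by n_eff / (n_eff - 1),
    where n_eff is the effective sample size.
    """
    x = np.asarray(x, dtype=float)
    w = np.asarray(weights, dtype=float)
    mean_w = np.average(x, weights=w)
    biased = np.average((x - mean_w) ** 2, weights=w)
    n_eff = effective_sample_size(w)
    return biased * n_eff / (n_eff - 1.0)
```

Two properties are worth noting. Rescaling all weights by a constant leaves 
both n_eff and the variance unchanged, which matches the "purely relative" 
interpretation, and with equal weights n_eff equals the number of samples, so 
the result reduces to the usual np.var(x, ddof=1).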

I think the clearest way of providing both options is to have a boolean 
switch indicating whether the weights represent repeat (frequency) type or 
accuracy type information. I can’t immediately see a good motivation for 
allowing both concurrently, and think this could cause confusion.
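
For discussion, a minimal sketch of what a single boolean switch might look 
like. The function name weighted_cov is hypothetical, the keyword name 
repeat_weights follows the one quoted from the pull request description below, 
and the body is an illustration under those assumptions rather than the code 
in the pull request.

```python
import numpy as np

def weighted_cov(m, weights, repeat_weights=False):
    """Weighted covariance sketch; rows of m are variables, columns samples.

    repeat_weights=True  : sample size is sum(weights)
    repeat_weights=False : sample size is the effective sample size,
                           sum(weights)^2 / sum(weights^2)
    """
    m = np.atleast_2d(np.asarray(m, dtype=float))
    w = np.asarray(weights, dtype=float)
    w_sum = w.sum()

    # Weighted mean of each variable, kept as a column for broadcasting
    mean = (m * w).sum(axis=1, keepdims=True) / w_sum
    d = m - mean

    # Biased weighted covariance
    biased = (d * w) @ d.T / w_sum

    # Choose the sample size used for the Bessel correction
    if repeat_weights:
        n = w_sum
    else:
        n = w_sum ** 2 / np.sum(w ** 2)
    return biased * n / (n - 1.0)
```

With repeat_weights=True and integer weights this should agree with np.cov 
applied to the explicitly repeated data; with repeat_weights=False the result 
does not change if all weights are multiplied by a constant.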

Tom 

On 15 Aug 2014, at 14:46, Sebastian Berg <[email protected]> wrote:

> Hi all,
> 
> Tom Poole has opened pull request
> https://github.com/numpy/numpy/pull/4960 to implement weights into
> np.cov (correlation can be added), somewhat picking up the effort
> started by Noel Dawe in https://github.com/numpy/numpy/pull/3864.
> 
> The pull request would currently implement an accuracy type `weights`
> keyword argument as default, but have a switch `repeat_weights` to use
> repeat type weights instead (frequency type are a special case of this I
> think).
> 
> As far as I can see, the code is in a state that it can be tested. But
> since it is a new feature, the names/defaults are up for discussion, so
> maybe someone who might use such a feature has a preference. I know we
> had a short discussion about this before, but it was a while ago. For
> example another option would be to have the two weights as two keyword
> arguments, instead of a boolean switch.
> 
> Regards,
> 
> Sebastian
> 
> _______________________________________________
> NumPy-Discussion mailing list
> [email protected]
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
