Re: [R] Does the correlations of component makes the correlation of one phenomena ?

David L Carlson Sun, 02 Dec 2018 16:44:10 -0800

This is really a statistics question rather than an R question, but you did 
provide reproducible data. You have some moderate correlations for some of the 
tests, but they are all different relationships. You used a combination of base 
R and dplyr code, but I'll just stick with base R:


> Mesures.split <- split(Mesures, Mesures$test)
> Corrs <- sapply(Mesures.split, function(x) cor(x[, 3], x[, 4]))
> options(digits=3)
> Corrs
     1      2      3      4      5      6      7      8      9     10 
 0.551  0.437  0.905 -0.106  0.841  0.556  0.809  0.772  0.709  0.512 

> sapply(Mesures.split, function(x) coef(lm(x[, 3]~x[, 4])))
                 1      2       3        4      5      6      7
(Intercept) 0.6875 0.6530 -0.2597  2.24313 0.3498 1.4436 0.4103
x[, 4]      0.0309 0.0034  0.0353 -0.00668 0.0171 0.0168 0.0137
                  8      9      10
(Intercept) -0.7379 0.2929 0.48115
x[, 4]       0.0255 0.0129 0.00891

This gives you the intercept and slope of the regression line for each test. 
Notice that they vary considerably. The slope for predicting empirical behavior 
from simulated behavior ranges from -0.007 to 0.035. When you sum over the tests 
within each Space, you effectively eliminate the correlations at the test level:

> Mesures_aggregated <- aggregate(Mesures[, 3:4], by=list(Mesures$Space), sum)
> cor(Mesures_aggregated[, 2:3])[1, 2]
[1] 0.0771
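
To illustrate why strong within-test correlations need not survive the summing, here is a minimal sketch on toy data. The data are hypothetical and are NOT the Mesures data; the names sim, emp, within, summed and cross are made up for this example. Each toy test follows its own exact line, but the slopes differ and the simulated values run in opposite directions across Space.

```r
# Toy data (hypothetical, NOT the Mesures data): 2 tests x 6 Space values.
# One column per test, one row per Space.
sim <- cbind(t1 = 1:6, t2 = 100 * (6:1))
emp <- cbind(t1 = 1 * sim[, "t1"],        # test 1: empirical = 1 * simulated
             t2 = 0.005 * sim[, "t2"])    # test 2: empirical = 0.005 * simulated

# Within-test correlations: each test lies exactly on its own line.
within <- sapply(colnames(sim), function(k) cor(emp[, k], sim[, k]))

# Correlation of the sums over tests (one summed value per Space).
summed <- cor(rowSums(emp), rowSums(sim))

# The covariance of the sums is the sum of ALL pairwise covariances:
# diagonal = same test, off-diagonal = empirical of one test vs simulated of another.
cross <- cov(emp, sim)
decomp_ok <- isTRUE(all.equal(cov(rowSums(emp), rowSums(sim)), sum(cross)))

print(within)
print(summed)
print(cross)
```

In this toy case both within-test correlations are 1 while the correlation of the sums is -1, because the cross-test (off-diagonal) covariances are negative and outweigh the within-test (diagonal) ones. The same decomposition offers one way to look at what each test contributes to the summed relationship, assuming the data are arranged one column per test.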

If you sum predicted values for empirical behavior using the 10 regression 
equations and compare that to the summed empirical value, things work out 
better.

> pred <- rowSums(sapply(Mesures.split, function(x) predict(lm(x[, 3]~x[, 4]))))
> cor(Mesures_aggregated[, 2], pred)
[1] 0.776
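
The same idea can be sketched on toy data. Again, the data are hypothetical and NOT the Mesures data, and the names toy, wiggle, cor_raw and cor_pred are made up for this example. The sketch assumes the rows are ordered by Space within each test, which the rowSums step relies on.

```r
# Toy data (hypothetical, NOT the Mesures data): 2 tests x 6 Space values,
# rows ordered by Space within each test.
toy <- data.frame(
  test  = rep(1:2, each = 6),
  Space = rep(1:6, times = 2),
  sim   = c(1:6, 100 * (6:1))
)
# Empirical values: a different line per test plus a small fixed wiggle.
wiggle  <- c(rep(c(0.1, -0.1), 3), rep(c(-0.05, 0.05), 3))
toy$emp <- ifelse(toy$test == 1, 1, 0.005) * toy$sim + wiggle

# Raw sums over tests, one row per Space.
agg <- aggregate(toy[, c("emp", "sim")], by = list(Space = toy$Space), FUN = sum)
cor_raw <- cor(agg$emp, agg$sim)

# Fit one regression per test, then sum the fitted values across tests.
fits <- sapply(split(toy, toy$test), function(d) fitted(lm(emp ~ sim, data = d)))
pred <- rowSums(fits)
cor_pred <- cor(agg$emp, pred)

print(c(raw = cor_raw, predicted = cor_pred))
```

Here the raw summed correlation is strongly negative while the summed fitted values track the summed empirical values closely, because each regression first rescales the simulated values into empirical units. Note that the fitted values are computed from the empirical data themselves, so a correlation obtained this way is an in-sample figure.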

Without knowing where the simulated values come from, especially if they are 
completely independent of the empirical values, I can't say if this approach is 
wise.

---------------------------------------
David L. Carlson
Department of Anthropology
Texas A&M University


-----Original Message-----
From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Fatma Ell
Sent: Sunday, December 2, 2018 4:50 AM
To: r-help@r-project.org
Subject: [R] Does the correlations of component makes the correlation of one 
phenomena ?

Hi,

I have the following dataset, Mesures. It contains test, which is a given
context, and Space, which is a portion of that context. For each test we
have twelve Space values, an empirical measure of a behavior
(Behavior_empirical), and a measure of simulated behavior (Behavior_simulated).

Mesures=structure(list(test = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L,
3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L,
4L, 4L, 4L, 4L, 4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L,
6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L, 8L,
8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L,
10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L, 10L), Space = c(1L, 2L, 3L,
4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L,
9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L,
1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L,
6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L,
10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L,
2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L,
7L, 8L, 9L, 10L, 11L, 12L, 1L, 2L, 3L, 4L, 5L, 6L, 7L, 8L, 9L, 10L,
11L, 12L), Behavior_empirical = c(3.02040816326531, 7.95918367346939,
10.6162790697674, 4.64150943396226, 1.86538461538462, 1.125,
1.01020408163265, 1.2093023255814, 0.292452830188679, 0, 0, 0, 0,
1.3265306122449, 0, 3.09433962264151, 0, 1.6875, 2.02040816326531,
1.2093023255814, 1.75471698113208, 1.79347826086957,
0.243589743589744, 0, 0.377551020408163, 1.98979591836735,
6.75581395348837, 6.18867924528302, 7.46153846153846, 0.75, 0, 0,
0.292452830188679, 0, 0, 0, 0, 1.3265306122449, 1.93023255813953,
10.8301886792453, 3.73076923076923, 0, 2.69387755102041,
0.604651162790698, 1.75471698113208, 0, 0, 0, 1.51020408163265,
2.6530612244898, 3.86046511627907, 1.54716981132075, 1.86538461538462,
1.875, 2.35714285714286, 1.2093023255814, 0.292452830188679, 0, 0,
0.823529411764706, 6.79591836734694, 15.2551020408163,
5.7906976744186, 1.54716981132075, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0.773584905660377, 0, 0, 0.673469387755102, 1.81395348837209,
1.75471698113208, 2.51086956521739, 3.10576923076923,
3.70588235294118, 3.77551020408163, 9.28571428571428,
3.86046511627907, 1.54716981132075, 0, 0, 0, 0, 1.4622641509434, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0.673469387755102, 0, 0.292452830188679,
4.30434782608696, 1.09615384615385, 5.76470588235294, 0, 0,
1.93023255813953, 4.64150943396226, 3.73076923076923, 2.625,
0.673469387755102, 0.604651162790698, 0, 0, 0, 0), Behavior_simulated
= c(18, 61, 129, 198, 128, 57, 44, 80, 36, 8, 0, 0, 0, 0, 0, 49, 50,
194, 211, 353, 352, 214, 120, 15, 10, 74, 145, 224, 158, 99, 26, 19,
7, 2, 0, 0, 180, 89, 47, 36, 34, 56, 51, 65, 44, 4, 0, 0, 116, 133,
131, 103, 74, 132, 75, 44, 0, 0, 0, 0, 532, 165, 18, 5, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 1, 0, 0, 6, 47, 164, 193, 185, 91, 239, 219, 168,
83, 1, 14, 45, 136, 129, 89, 5, 0, 0, 0, 0, 0, 0, 0, 0, 6, 17, 92,
280, 273, 0, 6, 25, 108, 129, 285, 171, 181, 39, 2, 0, 0)), .Names =
c("test", "Space", "Behavior_empirical", "Behavior_simulated"),
row.names = c(NA, 120L), class = "data.frame")

For each test we study the correlation between Behavior_empirical and
Behavior_simulated:

# One row per test: the correlation between empirical and simulated behavior
Correlation <- data.frame()
for (i in 1:10) {
  Mes <- Mesures[Mesures$test == i, ]
  co <- data.frame(test = i, value = cor(Mes$Behavior_empirical, Mes$Behavior_simulated))
  Correlation <- rbind(Correlation, co)
}

which gives us fairly good correlation values for most of the tests:

   test      value
1     1  0.5508683
2     2  0.4369091
3     3  0.9049806
4     4 -0.1062714
5     5  0.8410166
6     6  0.5560826
7     7  0.8088035
8     8  0.7721233
9     9  0.7088624
10   10  0.5116938

Now, we want to conclude that if we have good values of Behavior_simulated
for each test, we could build the final distribution, which is the sum of
Behavior_simulated, and then compare it with the sum of
Behavior_empirical.

library(dplyr)
Mesures_aggregated <- Mesures %>% group_by(Space) %>%
  summarize(Sum_Behavior_empirical = sum(Behavior_empirical),
            Sum_Behavior_simulated = sum(Behavior_simulated))

I would expect the final correlation to be good as well, but that is not
the case:

> cor(Mesures_aggregated$Sum_Behavior_empirical, Mesures_aggregated$Sum_Behavior_simulated)
[1] 0.07710804

Can the overall correlation be derived from the correlations of the components
of one phenomenon? And how can I evaluate the contribution of each component
test in building the 'Sum'?


Thanks  a lot for your help.


Lenny


______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

