Re: [R] Pearson correlation with randomization

David Winsemius Fri, 21 Jan 2011 13:08:09 -0800


On Jan 21, 2011, at 3:29 PM, Brahmachary, Manisha wrote:

Hi David,

Thanks a lot for you inputs. I have modified my code accordingly.There

is one more place that I need some help.
This is my code:

========================================================================

======

X<- read.table("X.txt",as.is=T,header=T,row.names=1)
Y<- read.table("Y.txt",as.is=T,header=T,row.names=1)

X.mat<- as.matrix(X)
Y.mat<- as.matrix(Y)

# calculating the true correlation values from my original dataset
True.Corrs<- matrix()
for (k in 1:nrow(SNP.mat)){
True.Corrs[k]<- cor.test(X.mat[k,],Y.mat[k,],alternative
=c("greater"),method= c("pearson"))$p.value
}

# Creating the random distribution of Correlation p-values
X.rand <- list()
Y.rand<- list()

X.rand<-replicate(1000,(X[sample(1:ncol(Y))]),simplify=FALSE)  #
Randomizing the column values for each row
Y.rand<-replicate(1000,Y,simplify=FALSE) # Creating an equivalent list

of the Y matrix (non-randomised), to be able to do a pair-wisecor.test


Corrs.rand<- list()
tmp<- list()
for (i in 1:2){
for (j in 1:3){
# How to store a multiple values per element of list?
tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
=c("greater"),method= c("pearson"))$p.value
Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
}
}

========================================================================


At this step:

for (i in 1:length(X.rand)){
for (j in 1:nrow(X.rand[[1]]){
# How to store a multiple values per element of list?
tmp[[j]] <- cor.test(t(X.rand[[i]][j,]),t(Y.rand[[i]][j,]),alternative
=c("greater"),method= c("pearson"))$p.value
Corrs.rand[[i]] <- rbind(Corrs.rand[[j]],tmp[[j]])
}
}

I am not sure how I can store multiple values per element.

The usual way would be to pre-allocate a matrix or a data.frame andthen use "[<-" to assign either a whole row at a time or assignindividual elements one by one. rbind in a loop is definitely going toslow you down.

I haven't followed through the individual steps of all the for-loops.I guess I have lost the ability to think that way after learning touse the matrix and indexing features of R, alas. If Corrs.rand is a1000 x 12 data.frame (which is just a special square list) then youcan assign the i-th row with:


Corrs.rand[i, ] <- <some-12-element-object>

Or you can assign the i,j-th element with:

Corrs.rand[i,j ] <- <vector-of-length-1>

The same syntax works for matrices.
--
David.

For eg. I

want a list of length 1000 (which is the number of randompermutations Ihave generated for my dataset) and in each element of the list Ineed to

store 12 p.values where 12 corresponds to the number of rows I have in
my randomized dataset. Eg.

[[1]]
0.23
0.05
0.78
0.78
0.87
0.11
0.003
0.9
0.76
0.11
0.23
0.56
[[2]]
0.08
0.67
0.45
0.23
0.35
0.85
0.99
0.78
0.66
0.45
0.06
0.1
[[3]]
So on...

I maybe going about this in a complicated way and there may be other
ways of storing the p.values for each of my randomized dataset. So if
anybody has ideas please oblige me.
======================================================
X dataset:(sample)
#Probes X10851  X12144  X12155  X11882  X10860  X12762  X12239  X12154
1       1       1       0       0       1       0       2       0
2       0       0       0       0       0       0       0       0
3       2       2       2       2       1       2       1       2
4       0       0       0       0       0       0       0       0
5       2       2       2       2       2       2       2       2
6       0       1       0       0       1       1       1       1
7       2       2       NaN     2       2       2       2       2
8       2       2       2       2       2       2       2       2
9       0       1       0       1       1       NaN     1       2
10      2       2       2       2       2       2       2       2
11      2       0       0       0       0       0       0       0
12      0       1       0       1       1       0       1       1


Y dataset:(sample)

Probes  X10851  X12144  X12155  X11882  X10860  X12762  X12239  X12154
1       793.0830793     788.1813828     867.8504057     729.8321265
816.8519963     805.2113707     774.5990003     854.6384306
2       12.8695023      4.312894024     10.69769375     5.872212512
13.79299806     9.394132659     6.297552848     9.307943304
3       699.7791876     826.997429      795.6409729     770.9376141
806.1241089     782.3970486     817.107482      859.7154906
4       892.8217221     869.0481458     806.3386667     812.0431017
873.5565439     794.4752191     813.9587056     814.8681274
5       892.8217221     869.0481458     806.3386667     812.0431017
873.5565439     794.4752191     813.9587056     814.8681274
6       839.7350251     943.4455677     950.7575323     859.0208018
894.246041      853.524053      941.4841508     913.0246205
7       653.1272418     751.5217836     750.1757745     737.382114
757.8486157     758.2407075     724.2185775     770.8669409
8       12.8695023      4.312894024     10.69769375     5.872212512
13.79299806     9.394132659     6.297552848     9.307943304
9       839.7350251     943.4455677     950.7575323     859.0208018
894.246041      853.524053      941.4841508     913.0246205
10      653.1272418     751.5217836     750.1757745     737.382114
757.8486157     758.2407075     724.2185775     770.8669409
11      653.1272418     751.5217836     750.1757745     737.382114
757.8486157     758.2407075     724.2185775     770.8669409
12      839.7350251     943.4455677     950.7575323     859.0208018
894.246041      853.524053      941.4841508     913.0246205

Thanks again

Manisha




-----Original Message-----
From: David Winsemius [mailto:[email protected]]
Sent: Tuesday, January 18, 2011 11:56 PM
To: Brahmachary, Manisha
Cc: [email protected]
Subject: Re: [R] Pearson correlation with randomization


On Jan 18, 2011, at 11:23 PM, Brahmachary, Manisha wrote:

Hello,



I will be very obliged if someone can help me with this statistical R
problem:

I am trying to do a Pearson correlation on my datasets X, Y with
randomization test. My X and Y datasets are pairs.

1.     I want to randomize (rearrange) only my X dataset per
row ,while
keeping the my Y dataset as it is.


X <- X[sample(1:nrow(Y)), ]


2.     Then Calculate the correlation  for this pair, and compare it
to
your true value of correlation.

3.     Repeat 2 and 3 maybe a 100 times


You may want to look at the replicate function.

4. If your true p-value is greater than 95% of the randomvalues,
then you can reject the null hypothesis at   p<0.05.


You won't have a very stable estimate of the 95th order statistics
with "maybe" 100 replications.

--
David.

I am stuck at the randomization step. I need some help inimplementing

it the appropriate randomization step in my correlation.

Below is my incomplete code. I will be very obliged if someone could
help:



X <- read.table("X.txt",as.is=T,header=T,row.names=1)

Y <- read.table("Y.txt",as.is=T,header=T,row.names=1)



X.mat<- as.matrix(X)

Y.mat<- as.matrix(Y)

Corrs<- cor.test(X.mat[1,],Y.mat[1,],alternative=c("greater"),method=

c("pearson"))



Corrs.rand <- list()



for (i in 1:length(X.mat)){

for (j in 1:100){



# This doesnot seem to wrok correctly. How do I run sample function
100
times for the same row?



SNP.rand<- sample(SNP.mat[i,],56, replace = FALSE, prob = NULL)

Corrs.rand[[j]]<- cor.test(SNP.rand,CNV.mat[j,],alternative
=c("greater"),method= c("pearson"))

# need to calculate how many times my pvalue from true p-value>random

pvalue

}

}



X dataset:



#Probes

X10851

X12144

X12155

X11882

X10860

X12762

X12239

X12154

1

1

1

0

0

1

0

2

0

2

0

0

0

0

0

0

0

0

3

2

2

2

2

1

2

1

2

4

0

0

0

0

0

0

0

0

5

2

2

2

2

2

2

2

2

6

0

1

0

0

1

1

1

1

7

2

2

NaN

2

2

2

2

2

8

2

2

2

2

2

2

2

2

9

0

1

0

1

1

NaN

1

2

10

2

2

2

2

2

2

2

2

11

2

0

0

0

0

0

0

0

12

0

1

0

1

1

0

1

1



Y dataset:

Probes

X10851

X12144

X12155

X11882

X10860

X12762

X12239

X12154

1

793.0831

788.1814

867.8504

729.8321

816.852

805.2114

774.599

854.6384

2

12.8695

4.312894

10.69769

5.872213

13.793

9.394133

6.297553

9.307943

3

699.7792

826.9974

795.641

770.9376

806.1241

782.397

817.1075

859.7155

4

892.8217

869.0481

806.3387

812.0431

873.5565

794.4752

813.9587

814.8681

5

892.8217

869.0481

806.3387

812.0431

873.5565

794.4752

813.9587

814.8681

6

839.735

943.4456

950.7575

859.0208

894.246

853.5241

941.4842

913.0246

7

653.1272

751.5218

750.1758

737.3821

757.8486

758.2407

724.2186

770.8669

8

12.8695

4.312894

10.69769

5.872213

13.793

9.394133

6.297553

9.307943

9

839.735

943.4456

950.7575

859.0208

894.246

853.5241

941.4842

913.0246

10

653.1272

751.5218

750.1758

737.3821

757.8486

758.2407

724.2186

770.8669

11

653.1272

751.5218

750.1758

737.3821

757.8486

758.2407

724.2186

770.8669

12

839.735

943.4456

950.7575

859.0208

894.246

853.5241

941.4842

913.0246







Thanks in advance



Manisha



Mount Sinai School of Medicine

Icahn Medical Institute,

1425 Madison Avenue, Box 1498

NY-10029, NEW-YORK, USA




        [[alternative HTML version deleted]]

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide

http://www.R-project.org/posting-guide.html

and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
West Hartford, CT


David Winsemius, MD
West Hartford, CT

______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Pearson correlation with randomization

Reply via email to