Re: [R] Chi square test on data frame

Bansal, Vikas Wed, 17 Aug 2011 12:08:58 -0700

Dear Michael,

Thanks a lot for your reply and for your help. I was struggling so much, but
your suggestion showed me a path to the solution of my problem. I have tried
your code on my data frame stepwise and it looks fine to me. But when I tried
the chi square test-


res=chisq.test(y1[id],p=y2[id],rescale.p=T)

        Chi-squared test for given probabilities

data:  y1[id] 
X-squared = NaN, df = 19997, p-value = NA

Warning message:
In chisq.test(y1[id], p = y2[id], rescale.p = T) :
  Chi-squared approximation may be incorrect

It is not giving a p value. Then I checked the observed and expected values; it
is taking all the numbers into consideration at once. But as I mentioned
earlier, I want a p value for each row, and therefore the degrees of freedom
will be 1. Example-

I have a data frame with 8 columns-
   V1  V2  V3  V4  W1  W2  W3  W4
1   0  84  22  10   0  84   0   0
2  35  84   0   0  22  84   0   0
3   0   0   0  48   0   0   0  48
4   0  48   0   0   0  48   0   0
5   0  84   0   0   0  84   0   0
6   0   0   0  48   0   0   0  48

Example for the first row-

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Since the largest values are in V2 and V3, we have to pick
the expected values from W2 and W3, which are 84 and 0. I know that for a chi
square test values should not be 0, but we will ignore the warning.

Next it should generate a p value for the second row, taking 35 and 84 (V1 and
V2) as observed and 22 and 84 (W1 and W2) as expected. So here it will do a chi
square test for each of the 6 rows and generate 6 p values. My data frame has a
lot of rows (approx. 9999).
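
A minimal sketch of the per-row test described above, assuming the first four columns hold the observed counts and the last four the expected ones. The helper name row_pvalue and the NA guard for all-zero rows are illustrative assumptions, not code from this thread:

```r
# Example data from the post: V1..V4 observed, W1..W4 expected.
Y <- structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L),
.Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

# Hypothetical helper: one goodness-of-fit test (df = 1) for a single row.
row_pvalue <- function(r) {
  obs_all <- r[1:4]
  exp_all <- r[5:8]
  top2 <- order(obs_all, decreasing = TRUE)[1:2]  # positions of the two largest
  obs  <- obs_all[top2]
  expd <- exp_all[top2]
  # Assumption: rows with no counts at all cannot be tested, so return NA.
  if (sum(obs) == 0 || sum(expd) == 0) return(NA_real_)
  # Warnings about small expected counts are silenced, as the post asks.
  suppressWarnings(chisq.test(obs, p = expd, rescale.p = TRUE)$p.value)
}

pvals <- apply(Y, 1, row_pvalue)  # one p value per row
print(pvals)
```

Rows whose second expected value is 0 produce an infinite or NaN statistic, so their p values should not be read as meaningful results.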

Can you please help me with this?



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
________________________________________
From: R. Michael Weylandt [michael.weyla...@gmail.com]
Sent: Wednesday, August 17, 2011 7:11 PM
To: Bansal, Vikas
Cc: r-help@r-project.org
Subject: Re: [R] Chi square test on data frame

I think everything below is right, but it's all a little helter-skelter so take 
it with a grain of salt:

First things first, make your data with dput() for the list.

Y = structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L
), .Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Now,

Y1 = Y[,1:4]
Y2 = Y[,-(1:4)]

id = apply(Y1,1,order,decreasing=TRUE)[1:2,]
# This has the columns you want in each row, but it's not directly appropriate
# for subsetting.
# Specifically, the problem is that the row information is implicit in where
# the col index is in id.
# We directly extract and force into a 2-col matrix that gives rows and columns
# for each data point.
id = cbind(as.vector(col(id)),as.vector(id))

Now you can take

Y1[id] as the observed values and Y2[id] as the expected.
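
As a sketch of how those two vectors could be regrouped by row, assuming the pairs come out in row order (two values per row), they can be reshaped into 6 x 2 matrices. The names obs and expd are illustrative and not part of the code above:

```r
Y <- structure(c(0, 35, 0, 0, 0, 0, 84, 84, 0, 48, 84, 0, 22, 0, 0,
0, 0, 0, 10, 0, 48, 0, 0, 48, 0, 22, 0, 0, 0, 0, 84, 84, 0, 48,
84, 0, 0, 0, 0, 0, 0, 0, 0, 0, 48, 0, 0, 48), .Dim = c(6L, 8L),
.Dimnames = list(c("1", "2", "3", "4", "5", "6"), c("V1",
"V2", "V3", "V4", "W1", "W2", "W3", "W4")))

Y1 <- Y[, 1:4]
Y2 <- Y[, -(1:4)]

# Column positions of the two largest values in each row of Y1 (2 x 6 matrix).
id <- apply(Y1, 1, order, decreasing = TRUE)[1:2, ]
# Turn that into (row, column) index pairs for matrix subsetting.
id <- cbind(as.vector(col(id)), as.vector(id))

# Y1[id] and Y2[id] are flat vectors of length 12; regroup them two per row.
obs  <- matrix(Y1[id], ncol = 2, byrow = TRUE)
expd <- matrix(Y2[id], ncol = 2, byrow = TRUE)
print(obs)
print(expd)
```

Each row of obs and expd then holds the pair for one row of the original data, which is the shape a row-by-row test needs.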

But, to be honest, it sounds like you have more problems in using a chi-sq test 
than anything else. Beyond all the zeros, you should note that you always have 
#obs >= #expected because Y1>= Y2. I'll leave that up to you though.
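
To illustrate why the zeros matter, here is a small sketch that computes the Pearson statistic by hand, sum((O - E)^2 / E), with the expected values rescaled to the observed total as rescale.p = TRUE does. The function name chisq_stat is hypothetical:

```r
# Hypothetical helper: Pearson chi-square statistic with rescaled expectations.
chisq_stat <- function(obs, expd) {
  E <- sum(obs) * expd / sum(expd)  # expected counts on the observed total
  sum((obs - E)^2 / E)
}

s_row1 <- chisq_stat(c(84, 22), c(84, 0))   # positive count over expected 0
s_row3 <- chisq_stat(c(48, 0),  c(48, 0))   # contains a 0/0 term
s_row2 <- chisq_stat(c(84, 35), c(84, 22))  # no zero expected values
print(c(s_row1, s_row3, s_row2))
```

A positive observed count with an expected value of 0 makes the statistic infinite, and a 0/0 term makes it NaN. A single such term is enough to turn a sum over all rows into NaN.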

Hope this helps and please make sure you can take my code apart piece by piece 
to understand it: there's some odd data manipulation that takes advantage of 
R's way of coercing matrices to vectors and if your actual data isn't like the 
provided example, you may have to modify.

Michael Weylandt

On Wed, Aug 17, 2011 at 10:26 AM, Bansal, Vikas
<vikas.ban...@kcl.ac.uk> wrote:
Is there anyone who can help me with a chi square test on a data frame? I have
been struggling for the last 2 days. I will be very thankful to you.

Dear all,

I have been working on this problem for many hours but have not found any
solution.
I have a data frame with 8 columns-
   V1  V2  V3  V4  W1  W2  W3  W4
1   0  84  22  10   0  84   0   0
2  35  84   0   0  22  84   0   0
3   0   0   0  48   0   0   0  48
4   0  48   0   0   0  48   0   0
5   0  84   0   0   0  84   0   0
6   0   0   0  48   0   0   0  48

From the first four columns, for each row I have to take the two largest
values, and these two values will be considered the observed values. From the
last four columns we get the expected values. So I have to perform a chi square
test for each row to get p values.

Example for the first row-

The two largest values are 84 (in V2) and 22 (in V3), so these are considered
the observed values. Since the largest values are in V2 and V3, we have to pick
the expected values from W2 and W3, which are 84 and 0. I know that for a chi
square test values should not be 0, but we will ignore the warning.
Now that we have the observed as well as the expected values, we have to
perform a chi square test to get a p value for each row in a new column.


So far I was working on returning the index of the two largest values with-
sort.int(df,index.return=TRUE)$ix[c(4,3)]
but it does not accept a data frame.
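
One possible way around this, sketched under the assumption that the four observed columns are numeric: convert them to a matrix and apply order() to each row. The small data frame df and the name top2 below are made up for illustration:

```r
# Illustrative data frame holding only the first three example rows of V1..V4.
df <- data.frame(V1 = c(0, 35, 0), V2 = c(84, 84, 0),
                 V3 = c(22, 0, 0), V4 = c(10, 0, 48))

# order() works on one numeric vector at a time, so apply it row by row.
# apply() returns a 2 x nrow matrix here; t() gives one row per data row.
top2 <- t(apply(as.matrix(df), 1,
                function(r) order(r, decreasing = TRUE)[1:2]))
print(top2)
```

Each row of top2 holds the column positions of the largest and second largest value. When values tie, order() keeps the earlier column first.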

Can you please give me some idea how to do this? It is very tricky, and after
studying a lot I am still not able to do it. Please help.



Thanking you,
Warm Regards
Vikas Bansal
Msc Bioinformatics
Kings College London
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

