Hi,

I am not sure your use of subset is what you want here.  The help page for
subset() (http://stat.ethz.ch/R-manual/R-devel/library/base/html/subset.html)
shows it called as a function, subset(data, condition), rather than passed as
an argument, subset=data==condition.  Also, the approach I am describing uses a
different data layout.  For example, in your data, Group1 and Group2 are
separate columns, each with the same values for the independent variables.
Normally, different groups (or a factor with multiple levels) are stored in a
single column, like this:
> dat2
   ID Group Mem   Gen Chance MSELGM MSELVR MSELFM MSELRL MSELEL ADOS Age
1   1     1  75  50.0     50     53     52     62     57     56    3  25
2   2     1  75  12.5     50     46     48     47     52     55    2  30
3   3     1  25  37.5     50     48     43     52     63     63    3  24
4   4     1  25  37.5     50     51     62     52     59     54    0  31
5   5     1  50  87.5     50     45     58     42     46     43    6  31
6   6     1 100 100.0     50     45     80     49     69     63    1  31
7   7     2  75  50.0     50     53     52     62     57     56    3  25
8   8     2  75  12.5     50     46     48     47     52     55    2  30
9   9     2  25  37.5     50     48     43     52     63     63    3  24
10 10     2  25  37.5     50     51     62     52     59     54    0  31
11 11     2  50  87.5     50     45     58     42     46     43    6  31
12 12     2 100 100.0     50     45     80     49     69     63    1  31


> dat3<-subset(dat2,Group==1)
> dat4<-subset(dat2,Group==2)
> dat4
   ID Group Mem   Gen Chance MSELGM MSELVR MSELFM MSELRL MSELEL ADOS Age
7   7     2  75  50.0     50     53     52     62     57     56    3  25
8   8     2  75  12.5     50     46     48     47     52     55    2  30
9   9     2  25  37.5     50     48     43     52     63     63    3  24
10 10     2  25  37.5     50     51     62     52     59     54    0  31
11 11     2  50  87.5     50     45     58     42     46     43    6  31
12 12     2 100 100.0     50     45     80     49     69     63    1  31


> fit1<-lm(Gen~MSELEL,data=dat3)
> fit2<-lm(Gen~MSELEL,data=dat4)

> cor.test(dat3$Gen, dat3$MSELEL, method="pearson")
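
With the groups in a single column, the per-group regression and correlation
can also be done in one pass instead of creating dat3 and dat4 by hand.  This
is only a sketch: it rebuilds just the ID, Group, Gen and MSELEL columns of
dat2 from the listing above, and the names by_group, fits, cors, coefs and
r_by_group are made up for illustration.

```r
# Rebuild the relevant columns of dat2 (values copied from the listing above).
dat2 <- data.frame(
  ID     = 1:12,
  Group  = rep(1:2, each = 6),
  Gen    = rep(c(50, 12.5, 37.5, 37.5, 87.5, 100), times = 2),
  MSELEL = rep(c(56, 55, 63, 54, 43, 63), times = 2)
)

# Split once by Group, then run both analyses on each piece.
by_group <- split(dat2, dat2$Group)
fits <- lapply(by_group, function(d) lm(Gen ~ MSELEL, data = d))
cors <- lapply(by_group, function(d) {
  cor.test(d$Gen, d$MSELEL, method = "pearson")
})

# One row of regression coefficients and one correlation estimate per group.
coefs <- t(sapply(fits, coef))
r_by_group <- sapply(cors, function(ct) unname(ct$estimate))
```

Printing coefs and r_by_group gives one line per level of Group, which makes
it easy to compare the groups side by side.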

In the sample dataset shown here, you will get the same correlation and
regression results for both groups, because the values of the dependent and
independent variables are identical in the two groups.
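
On the original question of why subset= appeared to work in lm() but not in
cor.test(): as far as I can tell, lm() has a subset argument, while the default
cor.test(x, y, ...) method does not, so subset= is absorbed by "..." and never
used.  The formula method of cor.test() does accept data= and subset=.  The
sketch below uses a small stand-in for mydata; the Group1 == 2 values are
invented purely so that the two groups differ.

```r
# Hypothetical stand-in for mydata; the second group's values are invented.
mydata <- data.frame(
  Group1 = rep(1:2, each = 6),
  Gen    = c(50, 12.5, 37.5, 37.5, 87.5, 100, 20, 40, 60, 80, 90, 95),
  MSELEL = c(56, 55, 63, 54, 43, 63, 50, 52, 58, 61, 66, 70)
)

# Default method: subset= is not a named argument, so it falls into `...`
# and is not used; the result matches the all-rows correlation.
all_rows <- cor.test(mydata$Gen, mydata$MSELEL, method = "pearson")
ignored  <- cor.test(mydata$Gen, mydata$MSELEL, method = "pearson",
                     subset = mydata$Group1 == 1)

# Formula method: data= and subset= are honoured.
g1_formula <- cor.test(~ Gen + MSELEL, data = mydata, subset = Group1 == 1,
                       method = "pearson")

# Equivalent explicit subsetting before the call.
g1 <- subset(mydata, Group1 == 1)
g1_manual <- cor.test(g1$Gen, g1$MSELEL, method = "pearson")
```

Here ignored should reproduce all_rows, while g1_formula and g1_manual should
agree with each other and use only the Group1 == 1 rows.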

I hope this helps.



A.K.

  



----- Original Message -----
From: jacaranda tree <myjacara...@yahoo.com>
To: "R-help@r-project.org" <R-help@r-project.org>
Cc: 
Sent: Sunday, June 3, 2012 11:51 AM
Subject: [R] a question about subsetting

Hi all,
I started using R about 3 weeks ago, and I have now pretty much figured out how
to do the types of statistical modeling, graphs, tables etc. that I frequently
use (with zero background in computer languages or in other statistical
packages similar to R, like S or SAS!). So it has been quite a rewarding
process so far, and I thank all you R gurus for your generous help!
That being said, my question is about applying a model or an analysis to
different groups based on a grouping variable. Below are the first six rows of
my data:

  ID Group1 Group2 Mem   Gen Chance MSELGM MSELVR MSELFM MSELRL MSELEL ADOS Age
1  1      1      1  75  50.0     50     53     52     62     57     56    3  25
2  2      1      1  75  12.5     50     46     48     47     52     55    2  30
3  3      1      1  25  37.5     50     48     43     52     63     63    3  24
4  4      1      1  25  37.5     50     51     62     52     59     54    0  31
5  5      1      1  50  87.5     50     45     58     42     46     43    6  31
6  6      1      1 100 100.0     50     45     80     49     69     63    1  31

Group1: First grouping variable
Group2: Second grouping variable
Mem: Memory trial
Gen: Generalization trial
MSEL: Mullen Scales of Early Learning (a scale measuring various skills in
young children). GM: Gross Motor, VR: Visual Reception, FM: Fine Motor,
RL: Receptive Language, EL: Expressive Language.
ADOS: An autism-specific measure.

First I wanted to compute correlations between Generalization (variable Gen)
and expressive language (MSELEL) for each level of Group1. For this, I used the
lapply or by functions, which work just fine. Here is the code with lapply:
lapply(split(mydata, mydata$Group1), function(x) {
  cor.test(x[,5], x[,11], method = "pearson")
})

Then I did regression. My DV is the variable Gen, and the IV is MSELEL. And 
again I wanted to do this for each group. Here is the code I came up with for 
each group:
fit1<-lm(Gen~ MSELEL, data=mydata, subset=mydata$Group1==1)

fit2<-lm(Gen~MSELEL, data=mydata, subset=mydata$Group1==2)

This works fine for regression, but when I used the "subset" argument with the
correlation, e.g.
cor.test(mydata$Gen, mydata$MSELEL, method="pearson", subset=mydata$Group1==1)
it did not work. It just computed the correlation for the entire sample and
gave that same result for both groups. I was just curious as to why the subset
argument works with regression, but not with correlation. Any thoughts?
Thanks,


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

