Re: [R] Complicated For Loop (to me)

David Winsemius Mon, 09 Nov 2009 20:01:34 -0800


On Nov 9, 2009, at 9:51 PM, agm. wrote:

Sorry, I've been trying to work around this and just got back to check my email.

dput wasn't working too well for me because the data set also has 450 variables, and I needed more time to figure out how to properly show you all what you needed to know.

That's not a very convincing story. Nabble lets you put files where they can be accessed. There are examples of Nabble users doing that today.

But to show you the idea, a very simple data set would be:

NWEIGHT  ETHNIC  RACE  SLUNCH  DIVISION  ...
   1234       0     1       1         1
   2345       1     1       0         5
   3243       0     3       1         3
    ...     ...   ...     ...       ...

So basically, I already have the data subset by division and race. (I did that the inefficient way, by coding it by hand.)


Probably did not need to do that:

?split
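A minimal sketch of what split() can do here, assuming all rows sit in one data frame (called `dat` below, a hypothetical name) with RACE stored as text labels rather than the numeric codes shown above. Only the column names come from the post; the values are simulated.

```r
# Simulated stand-in for the poster's data (values are made up)
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  RACE     = sample(c("white", "black", "hispanic", "asian"), n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  DIVISION = sample(1:9, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# One call replaces the hand-coded subsets: a list holding one data frame
# per RACE x DIVISION combination, with names like "white.1", "black.1", ...
subsets <- split(dat, list(dat$RACE, dat$DIVISION))
length(subsets)  # 4 races x 9 divisions = 36

# The counterpart of a hand-made subset such as white.cd1
white.cd1 <- subsets[["white.1"]]
```

Keeping the pieces in one list, instead of 36 separately named objects, is what makes it easy to loop over them with lapply() or sapply().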

But now I need to calculate the percentage of each division (by race) that participates in SLUNCH (a 0/1 variable).


?tapply
?by
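For the weighted percentage itself, tapply() can build the whole DIVISION by RACE table in one step. This is a sketch assuming a data frame `dat` (a hypothetical name) with the four columns shown above and simulated values. Because SLUNCH is 0/1, multiplying it by NWEIGHT keeps only the weight of participants, which is what the sum(ifelse(...)) line in the post computes.

```r
# Simulated stand-in for the poster's data (values are made up)
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  RACE     = sample(c("white", "black", "hispanic", "asian"), n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  DIVISION = sample(1:9, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Cross-classify by division (rows) and race (columns)
grp <- list(DIVISION = dat$DIVISION, RACE = dat$RACE)
# Total weight of SLUNCH participants in each cell
num <- tapply(dat$NWEIGHT * dat$SLUNCH, grp, sum)
# Total weight in each cell
den <- tapply(dat$NWEIGHT, grp, sum)

pct <- 100 * num / den
round(pct, 1)  # 9 x 4 matrix: one row per division, one column per race
```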

So I am trying to avoid writing out code such as:

w.cd1.s <- sum(ifelse(white.cd1$SLUNCH==1, white.cd1$NWEIGHT,
0))/sum(white.cd1$NWEIGHT)


Perhaps:

by(white.cd1$NWEIGHT, white.cd1$SLUNCH, sum, na.rm=TRUE)/sum(white.cd1$NWEIGHT, na.rm=TRUE)

w.cd2.s <- sum(ifelse(white.cd2$SLUNCH==1, white.cd2$NWEIGHT,
0))/sum(white.cd2$NWEIGHT)

.... for all the variables.
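Since SLUNCH is 0/1, each of those repeated lines is a weighted mean of SLUNCH with NWEIGHT as the weights, so base R's weighted.mean() should give the same number. A small check, using the three NWEIGHT/SLUNCH pairs from the example rows above as a made-up stand-in for white.cd1:

```r
# Made-up stand-in for white.cd1 (NWEIGHT/SLUNCH pairs from the example rows)
white.cd1 <- data.frame(NWEIGHT = c(1234, 2345, 3243), SLUNCH = c(1, 0, 1))

# The hand-written version from the post
w.cd1.s <- sum(ifelse(white.cd1$SLUNCH == 1, white.cd1$NWEIGHT, 0)) /
  sum(white.cd1$NWEIGHT)

# Same quantity via weighted.mean(); na.rm = TRUE drops rows where SLUNCH is NA
w.cd1.s2 <- weighted.mean(white.cd1$SLUNCH, white.cd1$NWEIGHT, na.rm = TRUE)

all.equal(w.cd1.s, w.cd1.s2)  # TRUE
```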


?apply
?lapply
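A sketch of the sapply() route applied to several variables at once. The columns VAR.l, VAR.p and VAR.r are hypothetical placeholders for the "l", "p" and "r" variables the poster mentions, assumed here to be 0/1 like SLUNCH; the data frame name `dat` is also hypothetical and the values are simulated.

```r
# Simulated stand-in for the poster's data (values are made up)
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  RACE     = sample(c("white", "black", "hispanic", "asian"), n, replace = TRUE),
  DIVISION = sample(1:9, n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  VAR.l    = rbinom(n, 1, 0.3),  # hypothetical placeholder
  VAR.p    = rbinom(n, 1, 0.5),  # hypothetical placeholder
  VAR.r    = rbinom(n, 1, 0.2),  # hypothetical placeholder
  stringsAsFactors = FALSE
)
vars <- c("SLUNCH", "VAR.l", "VAR.p", "VAR.r")

subsets <- split(dat, list(dat$RACE, dat$DIVISION))

# Inner sapply: weighted share of each variable within one subset.
# Outer sapply: repeat for every subset; each subset becomes a column.
res <- sapply(subsets, function(d)
  sapply(vars, function(v) weighted.mean(d[[v]], d$NWEIGHT)))

# Transpose so each row is a race.division subset and each column a variable
res <- t(res)
dim(res)  # 36 4
```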

One other method that I tried, which gets me the "names" I need but doesn't put them into a data frame (which I am currently trying to fix), is this code:


names <- c("white","black","hispanic","asian")
regions <- c("cd1","cd2","cd3","cd4","cd5","cd6","cd7","cd8","cd9")
type <- c("l", "p", "r")
name.region <- c()
for (j in 1:length(names)){
        for(i in 1:length(regions)){
                for(k in 1:length(type)){

                name.holder <- paste(names[j], ".", paste(regions[i], ".", type[k], sep=""), sep="")
                name.region <- c(name.region, name.holder)
                }
        }
}
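The triple loop can be replaced by expand.grid() plus one vectorised paste(). A sketch; the race vector is called `races` here (a renamed stand-in for `names`) only so it does not read like base R's names() function.

```r
races   <- c("white", "black", "hispanic", "asian")
regions <- paste0("cd", 1:9)  # "cd1" ... "cd9"
type    <- c("l", "p", "r")

# expand.grid() varies its first column fastest, so listing type, then
# regions, then races reproduces the loop order (type is the innermost loop).
g <- expand.grid(type = type, region = regions, race = races,
                 stringsAsFactors = FALSE)
name.region <- paste(g$race, g$region, g$type, sep = ".")

length(name.region)   # 108
head(name.region, 4)  # "white.cd1.l" "white.cd1.p" "white.cd1.r" "white.cd2.l"
```

The same grid `g` can also serve as the first columns of a results data frame, which avoids creating 108 separately named variables.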

(The "l", "p", and "r" represent other variables for which I am trying to do the same thing as for SLUNCH.)

From here I've been troubleshooting how to get these named variables back into a data.frame.
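One way to end up directly in a data.frame, without creating named variables at all, is aggregate(). A sketch assuming a data frame `dat` (a hypothetical name) with simulated values, plus a helper column W.SLUNCH introduced here for the weighted numerator:

```r
# Simulated stand-in for the poster's data (values are made up)
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  RACE     = sample(c("white", "black", "hispanic", "asian"), n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  DIVISION = sample(1:9, n, replace = TRUE),
  stringsAsFactors = FALSE
)

# Helper column: weight counted only when SLUNCH == 1
dat$W.SLUNCH <- dat$NWEIGHT * dat$SLUNCH

# Sum numerator and denominator within every RACE x DIVISION cell;
# the result is already a data frame with one row per cell.
agg <- aggregate(cbind(W.SLUNCH, NWEIGHT) ~ RACE + DIVISION, data = dat, FUN = sum)
agg$SLUNCH.pct <- 100 * agg$W.SLUNCH / agg$NWEIGHT

head(agg)
```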

Everyone's help has been really appreciated! I've learned a lot today that will hopefully move me, slowly, from using for loops to more efficient functions. Unfortunately I am still learning those: I have some knowledge of how to use loops but almost no knowledge of the more powerful functions like sapply, lapply, etc. (I'm waiting for MASS4 to be returned to the library so I can read it.)


Thanks!


John Kane-2 wrote:

I think that we probably need a sample of your original data.

A few lines of the dataset would probably be enough, as long as they are fairly representative of the overall data set. See ?dput for a way of conveniently supplying a sample data set.

Otherwise, off the top of my head, I would think that you could just put all your subsets into a list and use lapply, but I'm simply guessing without seeing the data.
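Even with 450 variables, dput() can be pointed at just the relevant columns and a few rows. A sketch with simulated data; the data frame name `dat` is hypothetical and only the column names come from the thread.

```r
# Simulated stand-in for the poster's data (values are made up)
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  ETHNIC   = rbinom(n, 1, 0.2),
  RACE     = sample(1:4, n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  DIVISION = sample(1:9, n, replace = TRUE)
)

# Keep only the columns the question involves, and only the first 5 rows
small <- head(dat[, c("NWEIGHT", "RACE", "SLUNCH", "DIVISION")], 5)

# Prints R code that recreates `small`; that text can be pasted into an email
dput(small)
```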

--- On Mon, 11/9/09, agm. <amur...@vt.edu> wrote:

From: agm. <amur...@vt.edu>
Subject: Re: [R] Complicated For Loop (to me)
To: r-help@r-project.org
Received: Monday, November 9, 2009, 3:18 PM

I've looked through ?split and run all of the code, but I am not sure that I can use it in such a way as to make it do what I need. Another suggestion was using "lists", but again, although I am sure that the process can do what I need, I am not sure it would work with so many observations.

I might have been too simple in my code. Let me try to explain it more clearly:

I've got a data set of 4500 observations. I have already subset it into race/ethnicity (which I did by simple code). Now I needed to subset each race/ethnicity again into 9 separate regions. I again did this by simple code.

The problem now is that I need to calculate a percentage for three different variables for all 9 regions for each race. I was trying to do this with a loop.

So a snippet of my code is:

names <- c("white", "black", "asian", "hispanic")
for(j in 1:length(names)){
    for(i in 1:9){
        names[j].cd[i].es.wash <- subset(names[j].cd[i], SLUNCH==1)
        es.cd[i].names.w <- sum(names.cd[i].es.wash$NWEIGHT)/sum(names.cd[i]$NWEIGHT)
    }
}
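For comparison, a runnable version of the loop above. It assumes the hand-made subsets already exist as objects named like white.cd1 (the naming used earlier in the thread); here they are faked from simulated data, and the race vector is renamed `races` (a hypothetical name). The key changes are building each object name as a string with paste0() and fetching it with get(), and storing the results in one matrix instead of in separately named variables.

```r
# --- Setup only: simulate data and fake the hand-made subsets ---
set.seed(1)
n <- 1000
dat <- data.frame(
  NWEIGHT  = round(runif(n, 1000, 4000)),
  RACE     = sample(c("white", "black", "asian", "hispanic"), n, replace = TRUE),
  SLUNCH   = rbinom(n, 1, 0.4),
  DIVISION = sample(1:9, n, replace = TRUE),
  stringsAsFactors = FALSE
)
races <- c("white", "black", "asian", "hispanic")
for (r in races) {
  for (i in 1:9) {
    # creates objects white.cd1, ..., hispanic.cd9
    assign(paste0(r, ".cd", i), dat[dat$RACE == r & dat$DIVISION == i, ])
  }
}

# --- The loop from the post, made runnable ---
es <- matrix(NA_real_, nrow = 9, ncol = length(races),
             dimnames = list(paste0("cd", 1:9), races))
for (j in seq_along(races)) {
  for (i in 1:9) {
    d <- get(paste0(races[j], ".cd", i))  # e.g. white.cd1
    es.wash <- subset(d, SLUNCH == 1)     # participants only
    es[i, j] <- sum(es.wash$NWEIGHT) / sum(d$NWEIGHT)
  }
}
round(es, 3)
```

If a long layout is preferred, as.data.frame(as.table(es)) turns the matrix into a data frame with one row per region and race.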

Maybe that makes it clearer. If not, I apologize. Thanks for the help that I have already received. It is greatly appreciated.

Tony


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
