On Nov 9, 2009, at 9:51 PM, agm. wrote:


Sorry, I've been trying to work around this and just got back to check my
email.

dput wasn't working too well for me because the data set also has 450
variables and I needed more time to figure out how to properly show you all
what you needed to know.

That's not a very convincing story. Nabble lets you put files where they can be accessed. There are examples of nabble users doing that today.

But to show you the idea, a very simple data set would be:

NWEIGHT  ETHNIC   RACE   SLUNCH   DIVISION .......
1234            0           1         1               1
2345            1           1         0               5
3243            0           3         1               3
  .                .           .          .                .
  .                .           .          .                .
  .                .           .          .                .
  .                .           .          .                .

So basically, I already have the data subset by division and race. (I did
that the inefficient way by coding it by hand)

You probably did not need to do that:

?split
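
As a rough sketch of what ?split offers here (the tiny data frame `dat` is made up for illustration and only mimics the column names you posted), a single call produces every race-by-division subset as a named list:

```r
# Made-up toy data that only mimics the posted column names
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243, 1500, 2000, 1000),
  RACE     = c(1, 1, 3, 1, 3, 3),
  SLUNCH   = c(1, 0, 1, 1, 0, 1),
  DIVISION = c(1, 5, 3, 1, 3, 3)
)

# One data frame per RACE x DIVISION combination;
# drop = TRUE leaves out combinations that have no rows
pieces <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)
names(pieces)
```

Each element of `pieces` is an ordinary data frame, so nothing has to be subset by hand.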


But now I need to calculate the percentage of each division (by race) that
participates in SLUNCH (a 0 1 variable)

?tapply
?by
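
A minimal sketch of the tapply route, assuming the data are still in one data frame (the toy `dat` below is invented, not your data): divide the weighted count of SLUNCH == 1 by the total weight in each division-by-race cell.

```r
# Made-up toy data that only mimics the posted column names
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243, 1500, 2000, 1000),
  RACE     = c(1, 1, 3, 1, 3, 3),
  SLUNCH   = c(1, 0, 1, 1, 0, 1),
  DIVISION = c(1, 5, 3, 1, 3, 3)
)

cells <- list(DIVISION = dat$DIVISION, RACE = dat$RACE)

# Weight of participants and total weight in each DIVISION x RACE cell
num <- tapply(dat$NWEIGHT * (dat$SLUNCH == 1), cells, sum)
den <- tapply(dat$NWEIGHT, cells, sum)

# Weighted percentage participating; cells with no rows come out as NA
pct <- 100 * num / den
pct
```

The result is a matrix with divisions as rows and races as columns.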

So I am trying to avoid writing out code such as:

w.cd1.s <- sum(ifelse(white.cd1$SLUNCH==1, white.cd1$NWEIGHT,
0))/sum(white.cd1$NWEIGHT)

Perhaps:

by(white.cd1$NWEIGHT, white.cd1$SLUNCH, sum, na.rm=TRUE)/ sum(white.cd1$NWEIGHT, na.rm=TRUE)


w.cd2.s <- sum(ifelse(white.cd2$SLUNCH==1, white.cd2$NWEIGHT,
0))/sum(white.cd2$NWEIGHT)

.... for all the variables.

?apply
?lapply
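
To do the same thing for several 0/1 variables at once, something along these lines might work. It is only a sketch: the data are invented, `OTHER` is a hypothetical stand-in for one of your other variables, and `wshare` is a helper name I made up.

```r
# Made-up toy data; OTHER is a hypothetical second 0/1 variable
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243, 1500),
  RACE     = c(1, 1, 3, 3),
  DIVISION = c(1, 1, 3, 3),
  SLUNCH   = c(1, 0, 1, 1),
  OTHER    = c(0, 0, 1, 0)
)

vars <- c("SLUNCH", "OTHER")

# Weighted share of 1s for one data frame d and one 0/1 column named v
wshare <- function(d, v) sum(d$NWEIGHT * d[[v]]) / sum(d$NWEIGHT)

groups <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)

# Outer sapply loops over variables, inner sapply over groups:
# rows of res are groups, columns are variables
res <- sapply(vars, function(v) sapply(groups, wshare, v = v))
res
```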

One other method that I tried, which gets me the "names" I need but doesn't
put them into a data frame (which I am currently trying to fix), uses this
code:


names <- c("white","black","hispanic","asian")
regions <- c("cd1","cd2","cd3","cd4","cd5","cd6","cd7","cd8","cd9")
type <- c("l", "p", "r")
name.region <- c()
for (j in 1:length(names)) {
    for (i in 1:length(regions)) {
        for (k in 1:length(type)) {
            name.holder <- paste(names[j], ".", paste(regions[i], ".", type[k], sep=""), sep="")
            name.region <- c(name.region, name.holder)
        }
    }
}

(The "l", "p", "r" represent other variables for which I am trying to do the
same thing as for SLUNCH)
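
For what it is worth, the same vector of names can be built without any loops. A sketch using expand.grid and paste, with the vectors copied from the code above:

```r
names   <- c("white", "black", "hispanic", "asian")
regions <- c("cd1", "cd2", "cd3", "cd4", "cd5", "cd6", "cd7", "cd8", "cd9")
type    <- c("l", "p", "r")

# expand.grid varies its first argument fastest, so the innermost loop
# variable (type) goes first to reproduce the ordering of the nested loops
grid <- expand.grid(type = type, region = regions, name = names,
                    stringsAsFactors = FALSE)

name.region <- paste(grid$name, grid$region, grid$type, sep = ".")
head(name.region)
```

The grid itself is already a data frame with one row per combination, which may be a more convenient thing to carry around than the pasted names.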

From here I've been trouble-shooting how to switch these named variables
back into a data.frame context.
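
One possible way to end up with a data frame rather than a pile of named objects is to build one row per group and bind the rows together. This is a sketch on invented data, and the column name `slunch.pct` is my own choice:

```r
# Made-up toy data that only mimics the posted column names
dat <- data.frame(
  NWEIGHT  = c(1234, 2345, 3243, 1500),
  RACE     = c(1, 1, 3, 3),
  DIVISION = c(1, 1, 3, 3),
  SLUNCH   = c(1, 0, 1, 1)
)

groups <- split(dat, list(dat$RACE, dat$DIVISION), drop = TRUE)

# One single-row data frame per group, holding its keys and weighted percentage
rows <- lapply(groups, function(d) {
  data.frame(RACE       = d$RACE[1],
             DIVISION   = d$DIVISION[1],
             slunch.pct = 100 * weighted.mean(d$SLUNCH == 1, d$NWEIGHT))
})

# Stack the rows into a single data frame
result <- do.call(rbind, rows)
result
```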

Everyone's help has been really appreciated! I've learned a lot today that
will hopefully move me slowly from using for loops to more efficient
functions. Unfortunately, I am still learning those: I know something about
loops but almost nothing about the more powerful functions like sapply,
lapply, etc. (I'm waiting on MASS4 to be returned to the library to read it.)


Thanks!


John Kane-2 wrote:

I think that we probably need a sample database of your original data.
A few lines of the dataset would probably be enough as long as it was
fairly representative of the overall data set. See ?dput for a way of
conveniently supplying a sample data set.

Otherwise, off the top of my head, I would think that you could just put
all your subsets into a list and use lapply, but I'm simply guessing
without seeing the data.
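
A sketch of that list idea, assuming the hand-built subsets already exist as separate data frames (the two below are invented stand-ins that reuse the object names from the post):

```r
# Made-up stand-ins for two of the hand-built subsets
white.cd1 <- data.frame(NWEIGHT = c(1234, 2345), SLUNCH = c(1, 0))
white.cd2 <- data.frame(NWEIGHT = c(3243, 1500), SLUNCH = c(1, 1))

# Collect the existing objects into one named list
subsets <- list(white.cd1 = white.cd1, white.cd2 = white.cd2)

# Same calculation as the hand-written w.cd1.s, applied to every element
shares <- sapply(subsets, function(d) {
  sum(d$NWEIGHT[d$SLUNCH == 1]) / sum(d$NWEIGHT)
})
shares
```

sapply returns a named numeric vector here, so each share stays labelled with the subset it came from.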

--- On Mon, 11/9/09, agm. <amur...@vt.edu> wrote:

From: agm. <amur...@vt.edu>
Subject: Re: [R] Complicated For Loop (to me)
To: r-help@r-project.org
Received: Monday, November 9, 2009, 3:18 PM

I've looked through ?split and run all of the code, but I am not sure that I
can use it in such a way to make it do what I need. Another suggestion was
using "lists", but again, I am sure that the process can do what I need, but
I am not sure it would work with so many observations.

I might have been too simple in my code. Let me try to explain it more
clearly:

I've got a data set of 4500 observations. I have already subset it into
race/ethnicity (which I did by simple code). Now I needed to subset each
race/ethnicity again into 9 separate regions. I again did this by simple
code.

The problem is now, I need to calculate a percentage for three different
variables for all 9 regions for each race. I was trying to do this through
a loop command.

So a snippet of my code is:

names <- c("white", "black", "asian", "hispanic")
for(j in 1:length(names)){
    for(i in 1:9){
        names[j].cd[i].es.wash <- subset(names[j].cd[i], SLUNCH==1)
        es.cd[i].names.w <- sum(names.cd[i].es.wash$NWEIGHT)/sum(names.cd[i]$NWEIGHT)
    }
}


Maybe that makes it clearer. If not, I apologize. Thanks for the help that
I have already received. It is greatly appreciated.

Tony

David Winsemius, MD
Heritage Laboratories
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.