Reduce the non-unique case to the unique case in the first line, which pastes together the FEATURE strings of all rows sharing a CATEGORY, and then form ids, making sure that ids has names. Finally, sapply over ids, summing across rows with drop = FALSE so that the two cases in your code (a single feature and several features) are both handled at once. Adding 0 converts the logical result to numeric.
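A minimal sketch of the pieces described above, assuming a small made-up 0/1 matrix m in place of data.table.a and a made-up lookup table tab in place of data.table.b (m, tab, agg, f and result are hypothetical names chosen for illustration):

```r
# Hypothetical 0/1 matrix standing in for data.table.a (3 patients, 3 features)
m <- matrix(c(1, 0, 0,
              0, 1, 1,
              0, 0, 0),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste("Patient", 1:3), paste("Feature", 1:3)))

# A single column selected without drop = FALSE collapses to a plain vector,
# and rowSums() rejects it because it needs at least two dimensions
no_drop_fails <- inherits(try(rowSums(m[, 2]), silent = TRUE), "try-error")

# With drop = FALSE the single column stays a 3 x 1 matrix
one_col <- m[, 2, drop = FALSE]
dim(one_col)

# rowSums(...) > 0 acts as an OR across the selected features;
# adding 0 turns the logical result into numeric 0/1
or_numeric <- 0 + (rowSums(m[, c(1, 3), drop = FALSE]) > 0)

# Hypothetical lookup table in which CATEGORY is not unique
tab <- data.frame(CATEGORY = c("Cardiac", "Cardiac", "Gastro"),
                  FEATURE  = c("1;3", "2", "2"),
                  stringsAsFactors = FALSE)

# Paste the FEATURE strings of each CATEGORY together so that every
# category appears once (the non-unique case becomes the unique case)
agg <- aggregate(tab["FEATURE"], tab["CATEGORY"], paste, collapse = ";")

# Named list of feature indices per category
ids <- strsplit(as.character(agg$FEATURE), ";")
names(ids) <- agg$CATEGORY

# One function covers both the single-feature and multi-feature cases
f <- function(fs) 0 + (rowSums(m[, as.numeric(fs), drop = FALSE]) > 0)
result <- sapply(ids, f)
print(result)
```

Because drop = FALSE keeps even a one-column selection as a matrix, the if/else around is.vector() in the quoted code is no longer needed.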
ag <- aggregate(data.table.b[3], data.table.b[2], paste, collapse = ";")
ids <- strsplit(as.character(ag$FEATURE), ";")
names(ids) <- ag$CATEGORY
f <- function(fs) 0 + (rowSums(data.table.a[, as.numeric(fs), drop = FALSE]) > 0)
sapply(ids, f)

Here is the output:

           Cardiac CNS Gastro Respiratory
Patient 1        1   1      0           0
Patient 2        1   0      1           0
Patient 3        1   0      1           0
Patient 4        0   0      0           1
Patient 5        0   0      1           0
Patient 6        1   1      1           1
Patient 7        0   0      1           0
Patient 8        1   0      1           0
Patient 9        1   0      0           0
Patient 10       1   0      1           1
Patient 11       1   0      1           1
Patient 12       1   1      1           1
Patient 13       1   0      1           0
Patient 14       1   1      1           0
Patient 15       1   0      1           1
Patient 16       1   0      1           1
Patient 17       1   1      1           0
Patient 18       0   0      1           0
Patient 19       1   0      1           1
Patient 20       1   0      1           0

On Tue, May 11, 2010 at 2:05 AM, Greg Orm <splice...@gmail.com> wrote:
> Apologies.
>
> Let me clarify. I have included my code below:
>
> data.table.b represents the medical nomenclature, whereas data.table.a is a
> patient derived database.
> data.table.b$CATEGORY categorizes features (e.g. 'cardiac', 'respiratory'),
> whereas data.table.b$FEATURE is a corresponding disease (e.g. under CATEGORY
> 'cardiac', there could be heart attack, heart failure as FEATURES)
>
> When data.table.b$CATEGORY is unique, for example below, where there are 20
> patients in data.table.a, and data.table.b contains 6 categories (Cardiac …
> Endocrine), with a total of 8 features (1:8), it is not a problem for me to
> extract the data (e.g.
> for features 3 and 5, so long as one is positive, the
> final category under Cardiac is positive)
>
> data.table.a <-
> matrix(data=round(runif(160)),nrow=20,ncol=8,dimnames=list(paste("Patient",1:20),paste("Feature",1:8)))
> data.table.b <- data.frame(ID=c(1,2,3,4,5,6),CATEGORY=c("Cardiac","Respiratory","Gastro","Renal","CNS","Endocrine"),FEATURE=c("3;5","7","4","6","1;2","8"))
> ids <- strsplit(as.character(data.table.b$FEATURE),";")
>
> i=vector()
> outcome=matrix(data=NA,nrow=20,ncol=6)
>
> for (i in 1: 6){
>   if (is.vector ( data.table.a[,as.integer(ids [[i]]) ])) {
>     outcome [,i] <- data.table.a[,as.integer(ids [[i]]) ]
>   }
>   else {
>     outcome [,i] <- rowSums(data.table.a[,as.integer(ids [[i]])])>0
>   }
> }
> #the if else is needed because I can't figure out what command can work both
> #on a vector(single feature) or an array(multiple features in the same cell,
> #such as 3;5)
> #RowSums is used here kind of like a Boolean OR for the categories
>
> colnames(outcome) <- data.table.b$CATEGORY
> rownames(outcome) <- rownames(data.table.a)
>
> #outcome is what I need.
>
>
> The problem I am having, and because I am not very good at manipulating
> tables, is just how to manage a situation where CATEGORY is non-unique, such
> as in the example below:
>
>
> data.table.a <-
> matrix(data=round(runif(160)),nrow=20,ncol=8,dimnames=list(paste("Patient",1:20),paste("Feature",1:8)))
> data.table.b <- data.frame(ID=c(1,2,3,4,5,6),CATEGORY=c("Cardiac","Cardiac","Respiratory","Gastro","Gastro","CNS"),FEATURE=c("3;5","7","4","6","1;2","8"))
>
> Thanks.
>
> I hope this is a bit clearer.
>
> Regards,
> Greg
>
> On Tue, May 11, 2010 at 1:34 AM, Daniel Malter <dan...@umd.edu> wrote:
>
>>
>> Hi, even after rereading, I have little of a clue what it is exactly that
>> you are trying to do. It'd help if you provided a more concise, step-by-step
>> description and/or the smallest unambiguous example of the two tables AND
>> of what should come out at the end.
>> Also, unless for relatively trivial
>> problems, the list typically likes to see some own effort and where you are
>> stuck, rather than to solve the whole problem.
>>
>> Best,
>> Daniel
>>
>> --
>> View this message in context:
>> http://r.789695.n4.nabble.com/Advice-needed-on-awkward-tables-tp2173289p2173341.html
>> Sent from the R help mailing list archive at Nabble.com.
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.