On Oct 23, 2013, at 4:36 PM, Jon BR wrote:

> Hello,
>    I've been running several programs in the unix shell, and it's time to
> combine results from several different pipelines.  I've been writing shell
> scripts with heavy use of awk and grep to make big text files, but I'm
> thinking it would be better to have all my data in one big structure in R
> so that I can query whatever attributes I like, and print several
> corresponding tables to separate files.
>
> I haven't used R in years, so I was hoping somebody might be able to
> suggest a solution or combination of functions that could help me get
> oriented.
>
> Right now, I can import my data into a data frame that looks like this:
>
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>> df
>     case  gene issue
> 1 case_1 gene1  nsyn
> 2 case_1 gene1   amp
> 3 case_2 gene1   del
> 4 case_3 gene2   UTR
>
>
> I'd like to cook up some combination of functions/scripting that can
> convert a table like df to produce a list or a data frame/ matrix that
> looks like df2:
>
>> df2
>         case_1 case_2 case_3
> gene1 nsyn,amp    del      0
> gene2        0      0    UTR
>
> I can build df2 manually, like this:
> df2
> <-data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
> rownames(df2)<-c("gene1","gene2")
Factors will be a hassle, so build the data frame with stringsAsFactors=FALSE:

df <- data.frame(case=c("case_1","case_1","case_2","case_3"),
                 gene=c("gene1","gene1","gene1","gene2"),
                 issue=c("nsyn","amp","del","UTR"),
                 stringsAsFactors=FALSE)
df
# collect the issues for each gene/case cell into a list-matrix
dmat <- with( df, matrix( tapply(issue, list(gene, case), list) ,
                          nrow=length(unique(gene)), ncol=length(unique(case)) ) )
dmat
     [,1]        [,2]  [,3]
[1,] Character,2 "del" NA
[2,] NA          NA    "UTR"

> dmat[1,1]
[[1]]
[1] "nsyn" "amp"

> as.data.frame(dmat)
         V1  V2  V3
1 nsyn, amp del  NA
2        NA  NA UTR

>
> but obviously do not want to do this by hand; I want R to generate df2 from
> df.
>
> Any pointers/ideas would be most welcome!
>
> Thanks,
> Jonathan
>
> 	[[alternative HTML version deleted]]

R is a plain text mailing list. Old school, admittedly, but much better for coding questions. Surely an awk user can appreciate the wisdom of that request?

-- 
David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
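
For anyone who wants the exact df2 layout from the question (comma-joined issues, "0" in empty cells, gene and case names kept as row and column labels), here is a minimal sketch of a variation on the tapply() approach above. It swaps list for paste(..., collapse=","), which is not what was posted in the reply; the intermediate name tab is only illustrative.

```r
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# One cell per gene/case pair; multiple issues are pasted into one string.
# Cells with no data come back as NA.
tab <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# The question used "0" for empty cells, so replace the NAs.
tab[is.na(tab)] <- "0"

# tapply() keeps the gene and case labels as dimnames, so they carry over.
df2 <- as.data.frame(tab, stringsAsFactors = FALSE)
df2
```

Because each cell is then a plain character string rather than a list element, the result should write out cleanly with write.table().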