On Oct 23, 2013, at 5:24 PM, David Winsemius wrote:

>
> On Oct 23, 2013, at 4:36 PM, Jon BR wrote:
>
>> Hello,
>> I've been running several programs in the unix shell, and it's time to
>> combine results from several different pipelines. I've been writing shell
>> scripts with heavy use of awk and grep to make big text files, but I'm
>> thinking it would be better to have all my data in one big structure in R
>> so that I can query whatever attributes I like, and print several
>> corresponding tables to separate files.
>>
>> I haven't used R in years, so I was hoping somebody might be able to
>> suggest a solution or combination of functions that could help me get
>> oriented.
>>
>> Right now, I can import my data into a data frame that looks like this:
>>
>> df <-
>> data.frame(case=c("case_1","case_1","case_2","case_3"),gene=c("gene1","gene1","gene1","gene2"),issue=c("nsyn","amp","del","UTR"))
>>> df
>>     case  gene issue
>> 1 case_1 gene1  nsyn
>> 2 case_1 gene1   amp
>> 3 case_2 gene1   del
>> 4 case_3 gene2   UTR
>>
>>
>> I'd like to cook up some combination of functions/scripting that can
>> convert a table like df to produce a list or a data frame/ matrix that
>> looks like df2:
>>
>>> df2
>>         case_1 case_2 case_3
>> gene1 nsyn,amp    del      0
>> gene2        0      0    UTR
>>
>> I can build df2 manually, like this:
>> df2
>> <-data.frame(case_1=c("nsyn,amp","0"),case_2=c("del","0"),case_3=c("0","UTR"))
>> rownames(df2)<-c("gene1","gene2")
>
> Factors will be a hassle:
>
> df <-
> data.frame(case=c("case_1","case_1","case_2","case_3"),
> gene=c("gene1","gene1","gene1","gene2"), issue=c("nsyn","amp","del","UTR"),
> stringsAsFactors=FALSE)
Note also that stringsAsFactors can be set globally with options(), as well as in input functions such as read.table and any of its cousins.

> df
>
> dmat <- with( df, matrix( tapply(issue, list(gene, case), list) ,
>               nrow=length(unique(gene)),ncol=length(unique(case)) )
>             )
> dmat
>      [,1]        [,2]  [,3]
> [1,] Character,2 "del" NA
> [2,] NA          NA    "UTR"
>
>> dmat[1,1]
> [[1]]
> [1] "nsyn" "amp"
>
>> as.data.frame(dmat)
>          V1  V2  V3
> 1 nsyn, amp del  NA
> 2        NA  NA UTR
>

It's possible that, coming back to R after many years, you are not familiar with data.table. It's particularly well suited for large text files. Its syntax, with arguments to "[", is quite different.

> library(data.table)
> dt <- data.table(df)
# To make a list in each category you would need to supply a "doubly `list`-ed" argument to "j".

> dt[ , list(list(issue)), by=c("gene", 'case') ]
    gene   case       V1
1: gene1 case_1 nsyn,amp
2: gene1 case_2      del
3: gene2 case_3      UTR
> dt[ , list(issue), by=c("gene", 'case') ]
    gene   case issue
1: gene1 case_1  nsyn
2: gene1 case_1   amp
3: gene1 case_2   del
4: gene2 case_3   UTR

>
>>
>> but obviously do not want to do this by hand; I want R to generate df2 from
>> df.
>>
>> Any pointers/ideas would be most welcome!
>>
>> Thanks,
>> Jonathan
>>
>> [[alternative HTML version deleted]]
>
> R is a plain text mailing list. Old school, admittedly, but much better for
> coding questions. Surely an awk user can appreciate the wisdom of that
> request?
>
> --
> David Winsemius
> Alameda, CA, USA
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
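
If the goal is the exact df2 layout from the original post (comma-joined strings, with "0" in the empty cells) rather than a list in each cell, here is a minimal base-R sketch. It swaps the `list` function used above for `paste(..., collapse = ",")`, which is my substitution, and it assumes that "0" is only wanted as a placeholder string. The name `wide` is just an illustrative intermediate.

```r
# Data from the original post, kept as character columns.
df <- data.frame(case  = c("case_1", "case_1", "case_2", "case_3"),
                 gene  = c("gene1", "gene1", "gene1", "gene2"),
                 issue = c("nsyn", "amp", "del", "UTR"),
                 stringsAsFactors = FALSE)

# One cell per gene/case pair; multiple issues are joined with commas.
# Each cell is a single string, so tapply returns a character matrix
# with genes as row names and cases as column names.
wide <- with(df, tapply(issue, list(gene, case), paste, collapse = ","))

# tapply leaves NA where a gene/case pair never occurs;
# the original post used "0" for those cells (assumed to be a placeholder).
wide[is.na(wide)] <- "0"

df2 <- as.data.frame(wide, stringsAsFactors = FALSE)
df2
```

Because every cell is a plain string, the result can be written out directly, for example with write.table(df2, sep = "\t", quote = FALSE). The list-valued versions above are the better choice if you still want to query the individual issues afterwards.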
David Winsemius
Alameda, CA, USA