Yes, that works beautifully on both the test dataset and my real dataset. This was exactly what I was looking for. Thank you!
/ Mia

On May 21, 2010, at 6:10 PM, William Dunlap wrote:

>
>> -----Original Message-----
>> From: r-help-boun...@r-project.org
>> [mailto:r-help-boun...@r-project.org] On Behalf Of Mia Bengtsson
>> Sent: Friday, May 21, 2010 3:39 AM
>> To: Dennis Murphy; Henrique Dallazuanna
>> Cc: r-help@r-project.org
>> Subject: Re: [R] reshaping data
>>
>> Thank you Dennis and Henrique for your help!
>>
>> Both solutions work! I just need to find a way of removing
>> the empty "cells" from the final "long" dataframe since they
>> are not NAs.
>>
>> Maybe there is an easier way of doing this if the data is not
>> treated as a dataframe? The original data file that is
>> derived from another program (mothur) is a textfile with the
>> following format:
>>
>> red \t A,B,C
>> green \t D
>> blue \t E,F
>>
>> The first column "species" is separated from the
>> "sequences" (A, B, C...) with a tab, and then the "sequences"
>> are separated from each other with commas.
>>
>> I imported it into R as what I thought was a dataframe using:
>>
>> test1<-readLines("path/test")
>> test2<-gsub(pattern= "\t", test1, replacement=",")
>> test3<-textConnection(test2)
>> test.df<-read.csv(test3, header=F)
>>
>> Should I rather have imported it as something else if I want
>> to reshape it into a list as described previously?
>
> Does the following do what you want, where my "txt" should
> resemble the output of your test1, the output of
> readLines("path/test")?
>
> > txt <- c("red \t A,B,C", "green \t D", "blue \t E,F")
> > f <- function (textLines) {
>     tmp <- strsplit(textLines, " *\t *")
>     letters <- strsplit(vapply(tmp, FUN = `[`, 2, FUN.VALUE = ""),
>         ",")
>     numLetters <- vapply(letters, FUN = length, FUN.VALUE = 0L)
>     data.frame(Species = rep(vapply(tmp, FUN = `[`, 1, FUN.VALUE = ""),
>         numLetters), Letter = unlist(letters))
> }
> > f(txt)
>   Species Letter
> 1     red      A
> 2     red      B
> 3     red      C
> 4   green      D
> 5    blue      E
> 6    blue      F
>
> vapply() is new in R 2.11.?
> and is like sapply but lets
> you specify what the return value of FUN is expected to
> be. Thus it gives you some error checking, saves some
> time over sapply, and works nicely when the length of the
> input is 0. If you don't have 2.11, replace vapply with sapply
> and remove the FUN.VALUE argument.
>
> Bill Dunlap
> Spotfire, TIBCO Software
> wdunlap tibco.com
>>
>> Thanks a million!
>>
>> / Mia Bengtsson
>>
>>
>> On May 21, 2010, at 2:15 AM, Dennis Murphy wrote:
>>
>>> Hi:
>>>
>>>
>>> On Thu, May 20, 2010 at 10:13 AM, Mia Bengtsson
>>> <mia.bengts...@bio.uib.no> wrote:
>>> Hello,
>>>
>>> I am a relatively new R-user who has a lot to learn. I have
>>> a large dataset that is in the following dataframe format:
>>>
>>> red A B C
>>> green D
>>> blue E F
>>>
>>> This isn't a data frame in R - if it were, it would have NA
>>> (or at least "" or " ") padding at the end of each row.
>>> Data frames are not ragged arrays. To have this type of
>>> structure in R, the data would have to be in a list.
>>>
>>> This matters because Henrique's solution with reshape()
>>> assumes a data frame as input. A similar solution
>>> would be to use melt() in the reshape package, something like
>>>
>>> library(reshape)
>>> longdf <- melt(yourdf, id.var = 'species')
>>> longdf
>>>
>>> If you have NA padding, the way to get rid of it in the
>>> reshaped data frame (with the above approach) is to drop the
>>> NA rows and the "variable" column:
>>>
>>> longdf[!is.na(longdf$value), names(longdf) != "variable"]
>>>
>>> If the padding is with blanks, then Henrique's solution
>>> works here, too.
>>>
>>> HTH,
>>> Dennis
>>>
>>>
>>> Where red, green and blue are "species" names and A, B and
>>> C are observations (corresponding to DNA sequences). Each
>>> observation can only belong to one species. I would like to
>>> list the observations in one column, with the species they
>>> belong to in the next. Like this:
>>>
>>> A red
>>> B red
>>> C red
>>> D green
>>> E blue
>>> F blue
>>>
>>> I have tried using reshape() and stack() but I cannot get
>>> my head around it.
>>> Any help is highly appreciated!
>>>
>>> Thanks in advance,
>>> __________________________________
>>>
>>> Mia Bengtsson, PhD-student
>>> Department of Biology
>>> University of Bergen
>>> +47 55584715
>>> +47 97413634
>>> mia.bengts...@bio.uib.no
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
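For comparison, here is a minimal base-R sketch of the same reshaping using stack(), which Mia mentioned trying in her first message. It assumes the input lines look like Bill's `txt` example (species name, a tab, then comma-separated sequence IDs). The column names Species and Letter follow Bill's output; the variable names parts, seqs and long are illustrative only. Like Bill's function, it uses vapply(), so it needs R 2.11 or later (use sapply() on older versions).

```r
# Assumed input, mirroring Bill's `txt`; with the real mothur file this
# would instead be txt <- readLines("path/test").
txt <- c("red \t A,B,C", "green \t D", "blue \t E,F")

# Split each line at the tab (and any surrounding spaces) into
# c(species, "comma,separated,IDs").
parts <- strsplit(txt, " *\t *")

# One character vector of sequence IDs per line, named by species.
seqs <- lapply(parts, function(p) strsplit(p[2], ",", fixed = TRUE)[[1]])
names(seqs) <- vapply(parts, function(p) p[1], character(1))

# stack() turns a named list into a long data frame with columns
# `values` (the IDs) and `ind` (a factor built from the list names).
long <- stack(seqs)
names(long) <- c("Letter", "Species")
long <- long[, c("Species", "Letter")]
print(long)
```

Because each species' IDs are split into a vector of exactly the right length, no empty padding cells are ever created, so unlike the read.csv() route there is nothing to filter out afterwards.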