Hi Allaisone, I took a slightly different approach but you might find this either as or more useful than your approach, or at least a start on the path to a solution you need.
df1 <- data.frame(CustId=c(1,1,1,2,3,3,4,4,4),DietType=c("a","c","b","f","a","j","c","c","f"), stringsAsFactors=FALSE) custs <- unique(df1$CustId) dtype <- unique(df1$DietType) nc <- length(custs) nd <- length(dtype) df2 <- as.data.frame( matrix(rep(0,nc*(nd+1)),nrow=nc), stringsAsFactors=FALSE) colnames(df2) <- c("CustId",dtype[order(dtype)]) df2$CustId <- custs[ order(custs) ] for ( i in 1:nrow(df1) ) { iRow <- match(df1$CustId[i],df2$CustId) iCol <- match(df1$DietType[i],colnames(df2)) df2[ iRow, iCol ] <- df2[ iRow, iCol] + 1 } > df2 # CustId a b c f j # 1 1 1 1 1 0 0 # 2 2 0 0 0 0 0 # 3 3 1 0 0 0 1 # 4 4 0 0 2 1 0 The dataframe df2 will have a column for the CustId and one column for each unique diet type. Each row is a unique customerId, and each entry contains the number of times the given diet type occurred for that customer. I hope that helps, Eric On Sun, Feb 25, 2018 at 7:08 PM, Bert Gunter <bgunter.4...@gmail.com> wrote: > I believe you need to spend time with an R tutorial or two: a data frame > (presumably the "table" data structure you describe) can *not* contain > "blanks" -- all columns must be the same length, which means NA's are > filled in as needed. > > Also, 8e^5 * 7e^4 = 5.6e^10, which almost certainly will not fit into any > local version of R (maybe it would in some server version -- others more > knowledgeable should comment on this). > > Cheers, > Bert > > > > Bert Gunter > > "The trouble with having an open mind is that people keep coming along and > sticking things into it." > -- Opus (aka Berkeley Breathed in his "Bloom County" comic strip ) > > On Sun, Feb 25, 2018 at 4:59 AM, Allaisone 1 <allaiso...@hotmail.com> > wrote: > > > Hi All > > > > I have a datafram which looks like this : > > > > CustomerID DietType > > 1 a > > 1 c > > 1 b > > 2 f > > 2 a > > 3 j > > 4 c > > 4 c > > 4 f > > > > And I would like to reshape this so I can see the list of DietTypes per > > customer in rows instead of columns like this : > > > > > MyDf > > CustomerID DietType DietType DietType > > 1 a c b > > 2 f a > > 3 j > > 4 c c f > > > > I tried many times using melt(),spread (),and dcast () functions but was > > not able to produce the desired table. The best attempt was by typing : > > > > # 1) Adding new column with unique values: > > MyDf $newcol <- c (1:9) > > #2) then : > > NewDf <- dcast (MyDf,CustomerID~newcol,value.var=DietType) > > > > This produces the desired table but with many NA values like this : > > > > CustomerID 1 2 3 4 5 6 7 8 9 > > 1 a c b NA NA NA NA NA NA > > 2 NA NA NA f a NA NA NA NA > > 3 NA NA NA NA NA j NA NA NA > > 4 NA NA NA NA NA NA c c f > > > > As you see, the lette/s indicating DietType move to the right side each > > time we move down leaving many NA values and as my original files is very > > large, I expect that the final output would contain around 800,000 > columns > > and 70,000 rows. This is why my code works with small data but does not > > work with my large file because of memory issue even though I'm using > large > > PC. > > > > What changes I need to do with my code to produce the desired table where > > the list of DietTypes are grouped in rows exactly like the second table > > shown abover? > > > > Regards > > Allaisnoe > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/ > > posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. > > > > [[alternative HTML version deleted]] > > ______________________________________________ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/ > posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.