Something like the following? AllMAFs <- list()
for (i in length(SeparatedGroupsofmealsCombs) { AllMAFs[[i]] <- apply(SeparatedGroupsofmealsCombs[[i]], 2, function(x)maf(tabulate(x+1))) } (untested, of course) Also the solution is a bit generic since I don't know what the output of maf() looks like in your case, and I don't understand why you use tabulate because I would have assumed that's what maf() does - but that's not for me to worry about :-) B. > On Nov 17, 2017, at 7:15 PM, Allaisone 1 <allaiso...@hotmail.com> wrote: > > > Thanks Boris , this was very helpful but I'm struggling with the last part. > > 1) I combined the first 2 columns :- > > > library(tidyr) > SingleMealsCode <-unite(MyData, MealsCombinations, c(MealA, MealB), > remove=FALSE) > SingleMealsCode <- SingleMealsCode[,-2] > > 2) I separated this dataframe into different dataframes based on > "MealsCombination" > column so R will recognize each meal combination separately : > > SeparatedGroupsofmealsCombs <- > split(SingleMealCode,SingleMealCode$MealsCombinations) > > after investigating the structure of "SeparatedGroupsofmealsCombs" , I can > see > a list of different databases, each of which represents a different Meal > combinations which is great. > > No, I'm struggling with the last part, how can I run the maf code for all > dataframes? > > when I run this code as before :- > > maf <- apply(SeparatedGroupsofmealsCombs, 2, function(x)maf(tabulate(x+1))) > > an error message says : dim(X) must have a positive length . I'm not sure > which length > I need to specify.. any suggestions to correct this syntax ? > > Regards > Allaisone > From: Boris Steipe <boris.ste...@utoronto.ca> > Sent: 17 November 2017 21:12:06 > To: Allaisone 1 > Cc: R-help > Subject: Re: [R] Complicated analysis for huge databases > > Combine columns 1 and 2 into a column with a single ID like "33.55", "44.66" > and use split() on these IDs to break up your dataset. Iterate over the list > of data frames split() returns. > > > B. > > > On Nov 17, 2017, at 12:59 PM, Allaisone 1 <allaiso...@hotmail.com> wrote: > > > > > > Hi all .., > > > > > > I have a large dataset of around 600,000 rows and 600 columns. The first > > col is codes for Meal A, the second columns is codes for Meal B. The third > > column is customers IDs where each customer had a combination of meals. > > Each column of the rest columns contains values 0,1,or 2. The dataset is > > organised in a way so that the first group of customers had similar meals > > combinations, this is followed by another group of customers with similar > > meals combinations but different from the first group and so on. The > > dataset looks like this :- > > > > > >> MyData > > > > Meal A Meal B Cust.ID I II III IV > > ...... 600 > > > > 1 33 55 1 0 1 2 > > 0 > > > > 2 33 55 3 1 0 2 > > 2 > > > > 3 33 55 5 2 1 1 > > 2 > > > > 4 44 66 7 0 2 > > 2 2 > > > > 5 44 66 4 1 1 > > 0 1 > > > > 6 44 66 9 2 0 > > 1 2 > > > > . > > > > . > > > > 600,000 > > > > > > > > I wanted to find maf() for each column(from 4 to 600) after calculating the > > frequency of the 3 values (0,1,2) but this should be done group by group > > (i.e. group(33-55) : rows 1:3 then group(44-66) :rows 4:6 and so on). > > > > > > I can do the analysis for the entire column but not group by group like > > this : > > > > > > MAF <- apply(MyData[,4:600], 2, function(x)maf(tabulate(x+1))) > > > > How can I modify this code to tell R to do the analysis group by group for > > each column so I get maf value for 33-55 group of clolumn I, then maf value > > for group 44-66 in the same column I,then the rest of groups in this column > > and do the same for the remaining columns. > > > > In fact, I'm interested in doing this analysis for only 300 columns but all > > of the 600 columns. > > I have another sheet contains names of columns of interest like this : > > > >> ColOfinterest > > > > Col > > I > > IV > > V > > . > > . > > 300 > > > > Any one would help with the best combination of syntax to perform this > > complex analysis? > > > > Regards > > Allaisone > > > > > > > > > > > > > > > > [[alternative HTML version deleted]] > > > > ______________________________________________ > > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, reproducible code. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.