Thanks Jim, I acted on your suggestion and found the result unchanged. :-( Then I noticed that fitdist doesn't like a sample size of 1 either.
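To illustrate why 'drop=TRUE' on its own leaves the error in place, here is a minimal sketch with made-up data. The toy values and the names all_groups, used_groups, sizes_all and sizes_used are only for illustration; nothing here comes from the real moreinfo. It counts the records in each group produced by split, with and without drop = TRUE:

```r
# Toy stand-in for 'moreinfo'; the values are invented for illustration only.
moreinfo <- data.frame(
  m_id         = c(1, 1, 1, 2, 2, 3),
  sale_year    = c(2010, 2010, 2010, 2010, 2010, 2009),
  sale_week    = c(27, 27, 28, 27, 27, 52),
  elapsed_time = c(1.2, 0.7, 3.1, 0.4, 2.2, 5.0)
)

keys <- list(moreinfo$m_id, moreinfo$sale_year, moreinfo$sale_week)

# Without drop = TRUE, split() makes a group for every combination of the
# levels of the grouping variables, including combinations with no records.
all_groups <- split(moreinfo, keys)

# With drop = TRUE, only combinations that actually occur are kept.
used_groups <- split(moreinfo, keys, drop = TRUE)

# Number of records in each group.
sizes_all  <- vapply(all_groups, nrow, integer(1))
sizes_used <- vapply(used_groups, nrow, integer(1))

sum(sizes_all == 0)   # empty combinations exist without drop = TRUE
sum(sizes_used == 0)  # no empty groups remain with drop = TRUE
sum(sizes_used == 1)  # but single-record groups are still there
```

Running table(sizes_used) on the real data would show how many of the weekly subsamples are too small to fit.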
If "drop = TRUE" excludes all empty combinations of m_id, year and week, then (given that the requirement is actually that the sample size be greater than 1) I can only conclude that at least one of the samples has only 1 record. I hadn't realized that some of the subsamples were that small. In my reply to Erik, I wrote:

> But that is too small. Is there a way to allow the above code to apply
> fitdist only if the sample size of a given subsample is greater than, say,
> 100? Even better, is there a way to make the split more dynamic, so that it
> groups a given m_id's data by month if the average weekly subsample size is
> less than 100, or by day if the average weekly subsample is greater than
> 1000?

Thanks

Ted

On Mon, Jul 12, 2010 at 4:02 PM, jim holtman <jholt...@gmail.com> wrote:
> try 'drop=TRUE' on the split function call. This will prevent the
> NULL set from being sent to the function.
>
> On Mon, Jul 12, 2010 at 3:10 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
> > From the documentation I have found, it seems that one of the functions from
> > package plyr, or a combination of functions like split and lapply would
> > allow me to have a really short R script to analyze all my data (I have
> > reduced it to a couple hundred thousand records with about half a dozen
> > records.
> >
> > I get the same result from ddply and split/lapply:
> >
> >> ddply(moreinfo,c("m_id","sale_year","sale_week"),
> >> + function(df) data.frame(res = fitdist(df$elapsed_time,"exp"),est = res$estimate,sd = res$sd))
> >> Error in fitdist(df$elapsed_time, "exp") :
> >> data must be a numeric vector of length greater than 1
> >
> > and
> >
> >> lapply(split(moreinfo,list(moreinfo$m_id,moreinfo$sale_year,moreinfo$sale_week)),
> >> + function(df) fitdist(df$elapsed_time,"exp"))
> >> Error in fitdist(df$elapsed_time, "exp") :
> >> data must be a numeric vector of length greater than 1
> >
> > Now, in retrospect, unless I misunderstood the properties of a data.frame, I
> > suppose a data.frame might not have been entirely appropriate as the m_id
> > samples start and end on very different dates, but I would have thought a
> > list data structure should have been able to handle that. It would seem
> > that split is making groups that have the same start and end dates (or that
> > if, for example, I have sale data for precisely the last year, split would
> > insist on both 2009 and 2010 having weeks from 0 through 52 instead of just
> > the weeks in each year that actually have data: 26 through 52 for last year
> > and 1 through 25 for this year). I don't see how else the data passed to
> > fitdist could have a sample size of 0.
> >
> > I'd appreciate understanding how to resolve this. However, it isn't a show
> > stopper as it now seems trivial to just break it out into a loop (followed
> > by a lapply/split combo using only sale year and sale month).
> >
> > While I am asking, is there a better way to split such temporally ordered
> > data into weekly samples that respect the year in which the sample is
> > taken as well as the week in which it is taken?
> >
> > Thanks
> >
> > Ted
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?

--
R.E.(Ted) Byers, Ph.D.,Ed.D.
t...@merchantservicecorp.com
CTO
Merchant Services Corp.
350 Harry Walker Parkway North, Suite 8
Newmarket, Ontario
L3Y 8L3
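
On the question of fitting only subsamples above a size threshold, and of choosing the grouping period from the average weekly sample size, one possible approach is sketched below. It rests on several assumptions: fit_exp, fit_groups and choose_period are hypothetical helper names, sale_date (used in the usage note after the code) is an assumed Date column that may not exist in moreinfo, and fit_exp uses the closed-form exponential estimate (rate = 1/mean) as a base-R stand-in so the sketch runs without extra packages. In the real script, a call such as fitdist(g$elapsed_time, "exp") could take its place.

```r
# Stand-in for fitdist(x, "exp"): closed-form maximum likelihood estimate of
# the exponential rate, with its asymptotic standard error rate / sqrt(n).
fit_exp <- function(x) {
  rate <- 1 / mean(x)
  c(rate = rate, se = rate / sqrt(length(x)))
}

# Split 'df' by the columns named in 'keys', drop groups with fewer than
# 'min_n' records, and fit the remaining groups.
fit_groups <- function(df, keys, min_n = 100, value = "elapsed_time") {
  groups <- split(df, df[keys], drop = TRUE)
  big    <- Filter(function(g) nrow(g) >= min_n, groups)
  lapply(big, function(g) fit_exp(g[[value]]))
}

# Pick a date format to group by, based on the average number of records per
# (year, week) that actually has data:
#   below 'low'  -> group by year and month
#   above 'high' -> group by day
#   otherwise    -> group by year and week
# "%U" is the week of the year; pairing it with "%Y" keeps week 27 of 2009
# separate from week 27 of 2010.
choose_period <- function(dates, low = 100, high = 1000) {
  weekly_n <- as.integer(table(format(dates, "%Y-%U")))
  avg <- mean(weekly_n)
  if (avg < low) {
    "%Y-%m"
  } else if (avg > high) {
    "%Y-%m-%d"
  } else {
    "%Y-%U"
  }
}
```

With these helpers, fit_groups(moreinfo, c("m_id", "sale_year", "sale_week"), min_n = 100) would fit only the weekly subsamples with at least 100 records. For the dynamic version, one could split moreinfo by m_id, add a column such as format(sale_date, choose_period(sale_date)) within each piece, and pass that column name to fit_groups.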