Thanks Jim,

I acted on your suggestion and found the result unchanged.  :-(  Then I
noticed that fitdist doesn't like a sample size of 1 either.
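
A small sketch of what seems to be happening, using a made-up data frame (the values are invented; only the column names follow moreinfo). With drop = TRUE the empty combinations disappear, but a combination holding a single record still survives and would still trip fitdist:

```r
# Toy data: column names follow moreinfo, the values are invented for illustration.
toy <- data.frame(
  m_id         = c(1, 1, 1, 2),
  sale_year    = c(2010, 2010, 2010, 2010),
  sale_week    = c(26, 26, 27, 26),
  elapsed_time = c(1.5, 2.0, 0.7, 3.1)
)

# Without drop, split() builds every m_id x year x week combination,
# including combinations that contain no rows at all.
all_groups <- split(toy, list(toy$m_id, toy$sale_year, toy$sale_week))

# With drop = TRUE, only combinations that actually occur are kept.
kept_groups <- split(toy, list(toy$m_id, toy$sale_year, toy$sale_week),
                     drop = TRUE)

# Number of records in each group, before and after dropping empty ones.
all_sizes  <- sapply(all_groups, nrow)
kept_sizes <- sapply(kept_groups, nrow)
```

Printing kept_sizes for the real data is a quick way to see how many subsamples are too small to fit.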

If "drop = TRUE" excludes all empty combinations of m_id, year and week, and
the actual requirement is that the sample size be greater than 1, then I can
only conclude that at least one of the samples has only 1 record.  I hadn't
realized that some of the subsamples were that small.  In my reply to Erik, I
wrote:

> But that is too small.  Is there a way to allow the above code to apply
> fitdist only if the sample size of a given subsample is greater than, say,
> 100?  Even better, is there a way to make the split more dynamic, so that it
> groups a given m_id's data by month if the average weekly subsample size is
> less than 100, or by day if the average weekly subsample is greater than
> 1000?
>
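
A minimal sketch of how the size threshold and the choice of granularity could look. The helper names (fit_large_groups, choose_granularity, granularity_by_m_id) are hypothetical, and the default fit_fun is only a stand-in that returns the maximum likelihood estimate of the exponential rate (1 / mean) so that the sketch runs with base R; something like function(x) fitdist(x, "exp") could presumably be passed instead. Actually regrouping by month or by day would need a date column, which is not shown in the data above, so the sketch stops at picking the granularity.

```r
# Hypothetical helper: fit only the subsamples with more than min_n records.
# fit_fun does the fitting; the default is a stand-in (MLE of the exponential
# rate, 1 / mean) so that no extra package is required.
fit_large_groups <- function(df, by, min_n = 100,
                             fit_fun = function(x) 1 / mean(x)) {
  groups <- split(df, df[by], drop = TRUE)            # drop empty combinations
  big <- Filter(function(g) nrow(g) > min_n, groups)  # keep large groups only
  lapply(big, function(g) fit_fun(g$elapsed_time))
}

# Hypothetical helper: choose a grouping level from weekly subsample sizes.
choose_granularity <- function(weekly_sizes, low = 100, high = 1000) {
  avg <- mean(weekly_sizes)
  if (avg < low) "month" else if (avg > high) "day" else "week"
}

# Hypothetical helper: one granularity per m_id, based on its average
# weekly subsample size.
granularity_by_m_id <- function(df, low = 100, high = 1000) {
  counts <- aggregate(elapsed_time ~ m_id + sale_year + sale_week,
                      data = df, FUN = length)
  vapply(split(counts$elapsed_time, counts$m_id),
         choose_granularity, character(1), low = low, high = high)
}

# Invented demo data: m_id 1 has 150 records in one week, m_id 2 has 5.
set.seed(1)
demo <- data.frame(
  m_id         = rep(c(1, 2), times = c(150, 5)),
  sale_year    = 2010,
  sale_week    = 26,
  elapsed_time = rexp(155, rate = 2)
)

fits <- fit_large_groups(demo, c("m_id", "sale_year", "sale_week"))
gran <- granularity_by_m_id(demo)
```

Here fits holds a result only for the m_id with more than 100 records, and gran suggests a coarser grouping for the sparse m_id.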

Thanks

Ted

On Mon, Jul 12, 2010 at 4:02 PM, jim holtman <jholt...@gmail.com> wrote:

> try 'drop=TRUE' on the split function call.  This will prevent the
> NULL set from being sent to the function.
>
> On Mon, Jul 12, 2010 at 3:10 PM, Ted Byers <r.ted.by...@gmail.com> wrote:
> > From the documentation I have found, it seems that one of the functions from
> > package plyr, or a combination of functions like split and lapply, would
> > allow me to have a really short R script to analyze all my data (I have
> > reduced it to a couple hundred thousand records with about half a dozen
> > fields).
> >
> > I get the same result from ddply and split/lapply:
> >
> >> ddply(moreinfo,c("m_id","sale_year","sale_week"),
> >> +       function(df) data.frame(res = fitdist(df$elapsed_time,"exp"),est = res$estimate,sd = res$sd))
> >> Error in fitdist(df$elapsed_time, "exp") :
> >>   data must be a numeric vector of length greater than 1
> >>
> >
> > and
> >
> >> lapply(split(moreinfo,list(moreinfo$m_id,moreinfo$sale_year,moreinfo$sale_week)),
> >> +       function(df) fitdist(df$elapsed_time,"exp"))
> >> Error in fitdist(df$elapsed_time, "exp") :
> >>   data must be a numeric vector of length greater than 1
> >>
> >
> > Now, in retrospect, unless I misunderstood the properties of a data.frame, I
> > suppose a data.frame might not have been entirely appropriate as the m_id
> > samples start and end on very different dates, but I would have thought a
> > list data structure should have been able to handle that.  It would seem
> > that split is making groups that have the same start and end dates (or that
> > if, for example, I have sale data for precisely the last year, split would
> > insist on both 2009 and 2010 having weeks from 0 through 52 instead of just
> > the weeks in each year that actually have data: 26 through 52 for last year
> > and 1 through 25 for this year).  I don't see how else the data passed to
> > fitdist could have a sample size of 0.
> >
> > I'd appreciate understanding how to resolve this.  However, it isn't a show
> > stopper as it now seems trivial to just break it out into a loop (followed
> > by a lapply/split combo using only sale year and sale month).
> >
> > While I am asking, is there a better way to split such temporally ordered
> > data into weekly samples that respect the year in which the sample is
> > taken as well as the week in which it is taken?
> >
> > Thanks
> >
> > Ted
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
>
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
>



-- 
R.E.(Ted) Byers, Ph.D.,Ed.D.
t...@merchantservicecorp.com
CTO
Merchant Services Corp.
350 Harry Walker Parkway North, Suite 8
Newmarket, Ontario
L3Y 8L3
