Re: [R] question about using _apply and/or aggregate functions

David Winsemius Mon, 22 Jun 2009 16:58:10 -0700


On Jun 22, 2009, at 6:16 PM, Clifford Long wrote:

Hi David,

I appreciate the advice. I had coerced 'list4' with as.list(), but forgot
to wrap it in list() in the call to aggregate. I made the correction,
and now get the following:

slope.mult = simarray[,1]
adj.slope.value = simarray[,2]
adj.slope.level = simarray[,2]
qc.run.violation = simarray[,5]

simarray.part = cbind(slope.mult, adj.slope.value,qc.run.violation, adj.slope.level)

list4 = as.list(simarray.part[,4])
agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)

Error in sort.list(unique.default(x), na.last = TRUE) :
 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?

... I'm not sure what this tells me I've done wrong. I did check
'list4' using is.list() and got TRUE back, so I suspect that my
mistake lies in some other fundamental aspect of R that I'm failing
to grasp.
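A minimal sketch of why that call fails, using a small made-up matrix `sim` as a hypothetical stand-in for simarray.part: as.list() turns the grouping column into a list of one-element vectors, so list(list4) is a list containing a list rather than a list containing an atomic vector. aggregate() expects each element of `by` to be an atomic vector (or factor) as long as the data, so passing the column itself inside list() should work.

```r
# Hypothetical stand-in for simarray.part: 3 slope levels x 4 runs each
sim <- cbind(slope.mult       = rep(1:3, each = 4),
             adj.slope.value  = rep(c(0.1, 0.2, 0.3), each = 4),
             qc.run.violation = c(0, 1, 0, 0,  1, 1, 0, 1,  1, 1, 1, 1),
             adj.slope.level  = rep(c(0.1, 0.2, 0.3), each = 4))

list4 <- as.list(sim[, 4])
is.atomic(sim[, 4])   # TRUE: a plain numeric vector
is.atomic(list4)      # FALSE: a list of 12 length-one vectors

# Pass the atomic column itself, wrapped once in list()
agg.result <- aggregate(sim[, 3], by = list(level = sim[, 4]), FUN = mean)
agg.result
```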

To your note on 'tapply' ... I did try this as well (actually, tried
it first) with no initial success.  On your recommendation, I gave
tapply another go, and get something recognizable:

vtt = tapply(simarray.part[,3], simarray.part[,2], mean)

dim(vtt)

[1] 50

length(vtt)

[1] 50

vtt[1:5]

0.003132 0.006264 0.009396 0.012528  0.01566
   0.29     0.24     0.23     0.16     0.22

vtt[1]

0.003132
   0.29

vtt[[1]][1]

[1] 0.29


I see that the output stored in "vtt" has one dimension with
length=50.  But each place in vtt appears to hold two values.

Nope, that's just the output from an implicit print(vtt). vtt is a one-dimensional array with an associated set of labels (its dimnames). If you doubt me (and I had some trouble with this myself) at that point, try is.array(vtt), and I predict you will get TRUE.
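To check the structure directly, here is a small sketch on made-up data (the slope values and 0/1 outcomes are hypothetical). Note that a one-dimensional array is not a matrix, so is.array() returns TRUE while is.matrix() returns FALSE; the row of numbers printed above the means is the dimnames, not a second row of data.

```r
set.seed(42)
# Hypothetical data: 5 slope values, 100 simulated 0/1 outcomes each
slope <- rep(c(0.003132, 0.006264, 0.009396, 0.012528, 0.01566), each = 100)
hit   <- rbinom(length(slope), size = 1, prob = 0.25)

vtt <- tapply(hit, slope, mean)

is.array(vtt)        # TRUE: a one-dimensional array
is.matrix(vtt)       # FALSE: a matrix needs two dimensions
dim(vtt)             # a single dimension of length 5
str(dimnames(vtt))   # the printed "labels" live here
```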

I'll admit that I'm not sure how to access/reference the entirety of all 50
values (0.29, 0.24, 0.23, 0.16, 0.22, and so on). Nor can I seem to
access/reference only what appears to be an embedded index
(0.003132, 0.006264, 0.009396, etc.). What am I missing?


names(vtt)
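A short sketch of pulling the two pieces apart, again on hypothetical data: names() returns the labels as character strings, as.vector() strips the dimnames to leave a plain numeric vector of means, and as.numeric() turns the labels back into numbers.

```r
set.seed(1)
slope <- rep(c(0.003132, 0.006264, 0.009396), each = 10)  # hypothetical
hit   <- rbinom(length(slope), size = 1, prob = 0.3)
vtt   <- tapply(hit, slope, mean)

labels <- names(vtt)               # character labels, e.g. "0.003132"
values <- as.vector(vtt)           # plain numeric vector of the 3 means
slopes <- as.numeric(names(vtt))   # labels converted back to numbers
vtt["0.006264"]                    # one mean, looked up by its label
```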

Is there a reference that I need to re-read?


?tapply

"Value

When FUN is present, tapply calls FUN for each cell that has any data in it. If FUN returns a single atomic value for each such cell (e.g., functions mean or var) and when simplify is TRUE, tapply returns a multi-way array containing the values, and NA for the empty cells. "
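The "NA for the empty cells" part of that passage is easy to see with a factor that has a level with no data (the values here are made up):

```r
x <- c(2, 4, 6, 8)
g <- factor(c("a", "a", "b", "b"), levels = c("a", "b", "c"))  # level "c" has no data
res <- tapply(x, g, mean)
res   # a: 3, b: 7, c: NA
```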

I would like to be able to plot one against the other.


plot(names(vtt), vtt)
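A slightly more explicit variant, sketched on hypothetical data: converting the labels with as.numeric() first guarantees a numeric x axis rather than relying on plot() to coerce character labels, and type = "b" joins the points so the trend across slopes is easier to read.

```r
set.seed(7)
slope <- rep(seq(0.003132, by = 0.003132, length.out = 10), each = 50)  # hypothetical
hit   <- rbinom(length(slope), size = 1, prob = 0.2)
vtt   <- tapply(hit, slope, mean)

# Converting the labels explicitly keeps the x axis numeric
slope.x <- as.numeric(names(vtt))
plot(slope.x, as.vector(vtt), type = "b",
     xlab = "slope", ylab = "mean runs-rule violations")
```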


Thanks again for taking the time outside of your "day job" for your
earlier reply!

Cliff

On Mon, Jun 22, 2009 at 11:28 AM, David Winsemius <dwinsem...@comcast.net> wrote:


On Jun 22, 2009, at 12:04 PM, Clifford Long wrote:

Hi R-list,

I'll apologize in advance for (1) the wordiness of my note (not sure
how to avoid it) and (2) any deficiencies on my part that lead to my
difficulties.

I have an application with several stages that is meant to simulate
and explore different scenarios with respect to product sales (in
units sold per month). My session info is at the bottom of this note.
The steps include (1) an initial linear regression model, (2) building
an ARIMA model, (3) creating 12 months of simulated 'real' data (for
various values of projected changes in slope from the linear
regression) from the ARIMA model, using arima.sim with subsequent
undifferencing and appending to historical data, (4) one-step-ahead
forecasting for 12 months using the 'fixed' ARIMA model and simulated
data, (5) taking the residuals from the forecasting (simulated minus
forecast for each of the 12 months) and applying QC charting, and (6)
counting the number (if any) of runs-rules violations detected.
The simulation step runs 100 simulations for each of 50 values of
slope.
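Step (3) might be sketched in base R as follows. This assumes, hypothetically, that the model works on first differences of log sales, so the simulated differences are shifted by the slope being explored, undifferenced with diffinv() starting from the last observed value, and appended to the history. The AR coefficient, noise sd, slope, and history are made-up placeholders, not values from the attached files.

```r
set.seed(123)
hist.log <- log(c(120, 125, 131, 128, 135, 140, 138, 146))  # hypothetical log unit sales
slope    <- 0.01                                            # hypothetical monthly drift

# Simulate 12 months of differenced log sales from an AR(1) model, plus the drift
d.sim <- arima.sim(model = list(ar = 0.3), n = 12, sd = 0.02) + slope

# Undifference from the last observed value, then drop that starting value
new.log  <- diffinv(as.vector(d.sim), xi = tail(hist.log, 1))[-1]
combined <- c(hist.log, new.log)   # history followed by 12 simulated months
exp(new.log)                       # back on the unit-sales scale
```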

All of this seems to work fine (although I'm sure that the coding
could be improved through functions, vectorization, etc.).
However, the problem I'm having is at the end, where I had hoped that
things would be easier. I want to summarize and graph the probability
of detecting a runs-rule violation vs. the amount of the shift in
slope (of log unit sales).
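If "probability of detecting a violation" is taken to mean the share of simulation runs with at least one runs-rule violation (an assumption; averaging the raw counts gives the mean count instead), then averaging the logical comparison count > 0 within each slope gives that proportion directly. A sketch on hypothetical data:

```r
set.seed(99)
slope  <- rep(c(0.01, 0.02, 0.03), each = 100)                          # hypothetical slopes
n.viol <- rpois(length(slope), lambda = rep(c(0.1, 0.5, 1.5), each = 100))  # hypothetical counts

# TRUE/FALSE per run; the mean of a logical vector is the proportion of TRUEs
p.detect <- tapply(n.viol > 0, slope, mean)
p.detect
```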
The output data array passed from the qcc section at the end includes:
 - the adjustment made to the slope (a multiplier)
 - the actual value of the slope
 - the iteration number of the simulation loop (within each value of slope)
 - the count of QC charting limits violations
 - the count of QC charting runs rules violations
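Naming these five columns when the array is built makes the summary step less error-prone than indexing by position. A sketch assuming the column order listed above; the column names, sizes, and random counts are hypothetical. The formula method of aggregate() used at the end is available in current R, though older releases such as 2.9.0 may lack it.

```r
set.seed(2009)
n.slope <- 5; n.iter <- 20   # hypothetical sizes (the thread uses 50 x 100)
simarray <- cbind(
  slope.mult  = rep(seq_len(n.slope), each = n.iter),
  slope.value = rep(seq(0.003132, by = 0.003132, length.out = n.slope), each = n.iter),
  iteration   = rep(seq_len(n.iter), times = n.slope),
  limit.viol  = rpois(n.slope * n.iter, 0.2),
  runs.viol   = rpois(n.slope * n.iter, 0.5))
sim.df <- as.data.frame(simarray)

# Refer to columns by name rather than by position
agg <- aggregate(runs.viol ~ slope.value, data = sim.df, FUN = mean)
agg
```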
My code is in the attached file ("generic_code.R") and my initial sales
data needed to "prime" the simulation is in the other attached file
("generic_data.csv"). The relevant section of my code is at the
bottom of the .R file, after the end of the outer loop. I've tried to
provide meaningful comments.
I've read the help files for _apply, aggregate, arrays and data types,
and have also consulted several texts (Maindonald and Braun;
Spector; Venables and Ripley for S-PLUS). Somehow I still don't get
it. My attempts usually result in a message like the following:
agg.result = aggregate(simarray.part[,3], by=list4, FUN = mean)
Error in FUN(X[[1L]], ...) : arguments must have same length

I cannot comment on the overall strategy, but wondered if this minor mod to the code might succeed:

agg.result = aggregate(simarray.part[,3], by=list(list4), FUN = mean)

My personal experience with aggregate() is not positive. I generally end up turning to tapply() (which is at the heart of aggregate() anyway), probably because I forget to wrap the second argument in a list. Slow learner, I guess.
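The relationship between the two functions can be checked on a toy example (values hypothetical): with the grouping vector wrapped once in list(), aggregate() returns a data frame whose x column holds the same group means that tapply() returns as a labelled array.

```r
x <- c(0, 1, 0, 1, 1, 1)
g <- c(0.1, 0.1, 0.2, 0.2, 0.3, 0.3)   # hypothetical grouping values

a  <- aggregate(x, by = list(g), FUN = mean)  # data frame with columns Group.1 and x
tp <- tapply(x, g, mean)                      # one-dimensional array labelled by group
```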

But when I check the length of the arguments, they appear to match. (??)

length(simarray.part[,3])
[1] 5000

length(simarray.part[,4])
[1] 5000

length(list4)
[1] 5000
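A sketch of why matching top-level lengths is not enough (the data are hypothetical): `by` is itself a list of grouping variables, so by = list4 hands aggregate() one grouping variable per element of list4, each of length 1, and the length check against the data fails.

```r
x     <- c(5, 7, 9, 11)
level <- c(1, 1, 2, 2)          # hypothetical grouping column
list4 <- as.list(level)

length(list4)     # same as length(x) ...
lengths(list4)    # ... but each element has length 1

# by = list4 means four grouping variables of length 1 each, so aggregate() stops
err <- tryCatch(aggregate(x, by = list4, FUN = mean), error = function(e) e)
conditionMessage(err)
```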


I would rather have passed along a subset of the simulation/loop
output dataset, but was unsure how to save it so as to preserve whatever
properties I may have missed that are causing my difficulties.
If anyone still wants to help at this point, I believe that you'll
need to load the .csv data and run the simulation (maybe reducing the
number of iterations).
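For the record, a small subset can usually be shared with its structure intact. A sketch with a hypothetical two-column subset `sim.sub`: dput() prints R code that rebuilds the object (suitable for pasting into a post), and saveRDS()/readRDS() round-trip it through a file, keeping dimensions, dimnames, and type.

```r
# Hypothetical small subset of the simulation output
sim.sub <- cbind(slope.value = c(0.003132, 0.003132, 0.006264),
                 runs.viol   = c(0, 1, 2))

dput(sim.sub)   # prints R code that rebuilds the object; paste this into a post

# Or save to a file and read it back
f <- tempfile(fileext = ".rds")
saveRDS(sim.sub, f)
restored <- readRDS(f)
```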

Many thanks to anyone who can shed some light on my difficulties
(whether self-induced or otherwise).

Cliff



I'm using a pre-compiled binary version of R for Windows.

Session info:

sessionInfo()


R version 2.9.0 (2009-04-17)
i386-pc-mingw32

locale:
LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] qcc_1.3         forecast_1.24   tseries_0.10-18 zoo_1.5-5
[5] quadprog_1.4-11

loaded via a namespace (and not attached):
[1] grid_2.9.0      lattice_0.17-22

Sys.getlocale()


[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United
States.1252;LC_MONETARY=English_United
States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


David Winsemius, MD
Heritage Laboratories
West Hartford, CT

