Re: [R] Another quantmod question

Jeff Ryan Sun, 08 May 2011 20:25:16 -0700

Hi Russ,

We're of course getting into some incredibly fine-level detail on how all of
this works.  I'll try and explain issues as I recall them over the
development of xts and cbind.xts.

xts started as an extension of zoo.  zoo is an extension of 'ts' (greatly
simplified comparison of course, but stay with me).

Achim and Gabor have put tremendous effort into the design of zoo - with a
primary focus on keeping it consistent with base R behavior.  That is, try
not to introduce unnecessary changes to the interface an R user is
accustomed to.  The logic being that this makes for a more consistent
interface as well as an easier learning curve and hence greater/faster
adoption rate.

'xts' extends this, though with a bit more flexibility in terms of
consistency.  Why? Simply put - some things about R annoyed me coming from a
time-series background.  Number one was the fact that lag() is backwards.
 Backwards from expectation, nearly all literature, and all standard
definitions.  So xts breaks with lag(x, k=1) behavior.  This is obviously
confusing to some - but was the gamble I was willing to take - consistency
(with R) be damned! ;-)
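
To make the lag() point concrete, here is a minimal sketch using only base
R's ts class (the series and variable names are made up for illustration).
With a positive k, stats::lag() moves the time base earlier, so at time t the
"lagged" series holds the observation from t + 1. The textbook lag would hold
the one from t - 1, which is the convention xts follows for k = 1.

```r
# Base R: stats::lag() with k = 1 shifts the time base back by one step.
x <- ts(c(10, 20, 30, 40, 50), start = 1)   # observed at times 1..5
lagged <- stats::lag(x, k = 1)               # same values, now at times 0..4

tsp(lagged)                                  # start, end, frequency

# Put both on a common time axis (times 0..5, padded with NA).
# At time t the column `lagged` holds x[t + 1], i.e. a lead, not a lag.
both <- ts.union(x, lagged)
both
```

Reading across the row for time 1 shows the original value next to the value
that was originally observed at time 2.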

So, now back to cbind.  cbind and merge in zoo-land (and xts by extension)
are the same. This isn't the case for other classes that use these - but
that is 'allowable' and 'expected' under a class dispatch system.  The docs
for ?cbind state:

For cbind (rbind) the column (row) names are taken from the
     colnames (rownames) of the arguments if these are matrix-like.
     Otherwise from the names of the arguments or where those are not
     supplied and deparse.level > 0, by deparsing the expressions
     given, for deparse.level = 1 only if that gives a sensible name
     (a symbol, see is.symbol).

Based on that, I'd argue that xts does it "right". Of course I'll also point
out that this is incorrect thinking as well - since this is a description
for the generic - and not for xts.  But again in a highly configurable
object/class system, where you start to make a distinction of right and
wrong is itself up for debate.
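
The quoted rule can be checked in base R without xts at all. The sketch
below uses a hypothetical one-column matrix named `close` as a stand-in for
the poster's object (it is not an xts object). For a matrix-like argument
that already has a colname, base cbind keeps that colname and ignores the
argument tag; only for a plain vector is the tag used.

```r
# A one-column matrix that already carries a column name (stand-in data).
close <- matrix(c(1416.60, 1418.34, 1409.71), ncol = 1,
                dimnames = list(NULL, "close"))

# Matrix-like argument: its existing colname is kept, the tag is ignored.
out_matrix <- cbind(changed.close = close + 1, zero = 0)
colnames(out_matrix)

# Plain vector argument: there is no colname to inherit, so the tag is used.
out_vector <- cbind(changed.close = as.vector(close) + 1, zero = 0)
colnames(out_vector)
```

So the first column being called "close" rather than "changed.close" is what
base R itself does for matrix-like inputs.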

At the other end of the argument spectrum is _why not_.  That is, why can't
cbind.xts handle the names to replace the colnames of objects passed in.
 Here is where I'll point out that I am really just going by memory.

Three major items are involved in cbind.  One is that dispatch is quite
unlike nearly every other dispatch in R.  This is a fact - nothing to do
with xts.

*  cbind isn't a generic (it's an .Internal call)
*  it uses ...
*  cbind can be called in numerous ways (I'll list only the common ones -
but with R you can do even crazier things)

   do.call(cbind,
   do.call(cbind.xts,
   cbind,
   cbind.xts,
   merge,
   merge.xts,
   do.call(merge,
   do.call(merge.xts

The rules of dispatch on cbind are really at a level that R-help has no
business discussing.  The second part is where things actually get tricky
though.  They all behave differently with respect to how args are handled -
when eval'd, etc.
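
A small base R sketch of two of the points above (the vectors `a` and `b` are
made up for the example). It first shows that the R-level body of cbind is
only a hand-off to internal code, with no UseMethod() call, and then shows
one visible difference between a direct call and do.call: how the arguments
arrive changes what names can be recovered.

```r
# cbind's body just calls .Internal(); dispatch is decided in C code.
body_txt <- paste(deparse(body(cbind)), collapse = " ")
grepl(".Internal", body_txt, fixed = TRUE)
grepl("UseMethod", body_txt, fixed = TRUE)

a <- 1:3
b <- 4:6

# Direct call: the arguments are symbols, so deparse.level = 1 names them.
direct <- cbind(a, b)
colnames(direct)

# do.call with an unnamed list: the arguments arrive as values, not symbols,
# so there is nothing sensible to deparse and no colnames are produced.
via_list <- do.call(cbind, list(a, b))
colnames(via_list)

# do.call with a named list: the list names act as argument tags.
via_named <- do.call(cbind, list(a = a, b = b))
colnames(via_named)
```

This only illustrates the base behavior for plain vectors; it says nothing
about how cbind.xts handles each call path internally.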

I'm sure you have read how R strains itself on 'big data'.  This is true and
false.  Improper use (or just naive use) can cause object copies in places
you really don't want.  Much of xts at this point is implemented in custom C
code. The gain here is that you can make it eas(ier) to avoid copies until
you need them by writing in C.  Obvious, but needs to be said.

To figure out what the columns have - and if names are attached to the
objects in the pairlist (the "..." in this context) - you have to be very
careful.  Touch anything in the wrong place or wrong time and you lose a
figurative arm and leg to memory copies.  So, in 99.9999% of cases - where
you aren't naming (which would be an extra feature above and beyond c(olumn)
binding [the reason for cbind]) - you run a very real risk of getting nailed
for copies you don't want.  On 10MM obs that is almost manageable. On
hundreds of millions or billions - it is kill -9 time.

To compound the issue - recall all of those different dispatch methods.  Yep
- they all behave just a bit differently.  How?  Honestly - I don't know or
care.  I simply know you can't easily make the behavior consistent amongst
those calls.  I have tried. And tried.
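
Since existing colnames are left alone, the practical route is to bind first
and rename afterwards. Below is a minimal sketch of that workaround. The
helper `cbind_named` is hypothetical (it is not part of xts or base R), and
the `close` matrix is stand-in data rather than a real xts object. It handles
only the simple case where every argument contributes exactly one column, and
it is meant for small objects, not the billion-row case discussed above.

```r
# Hypothetical helper: bind, then overwrite colnames with any tags supplied.
cbind_named <- function(...) {
  args <- list(...)
  out <- do.call(cbind, args)
  tags <- names(args)
  # Only rename when each argument maps to exactly one output column.
  if (!is.null(tags) && ncol(out) == length(args)) {
    cn <- colnames(out)
    if (is.null(cn)) cn <- character(ncol(out))
    keep <- nzchar(tags)          # untagged arguments keep their own name
    cn[keep] <- tags[keep]
    colnames(out) <- cn
  }
  out
}

# Stand-in data: a one-column matrix that already has a colname.
close <- matrix(c(1416.60, 1418.34, 1409.71), ncol = 1,
                dimnames = list(NULL, "close"))

res <- cbind_named(changed.close = close + 1, zero = 0, close)
colnames(res)
```

The same effect can be had by hand with `colnames(obj) <- c(...)` after the
bind, which is the kind of after-the-fact renaming the original poster
describes.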

End of day, and a very long R-help email, xts is different than base R.  It
is even different than its 'parent' zoo behavior.  But in exchange for this
difference (and bit of learning/adjustment) you get a class that is faster
than anything else.

Period.

> x <- .xts(1:1e7, 1:1e7)  # our time series object
> m <- coredata(x)  # a matrix

> str(x)
An xts object from 1969-12-31 18:00:01 to 1970-04-26 12:46:40 containing:
  Data: int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...
  Indexed by objects of class: [POSIXt,POSIXct] TZ: America/Chicago
  xts Attributes:
 NULL

> str(m)
 int [1:10000000, 1] 1 2 3 4 5 6 7 8 9 10 ...

> system.time(x[,1])  # get the first column
   user  system elapsed
  0.017   0.000   0.017
> system.time(m[,1])  # ditto
   user  system elapsed
  0.152   0.000   0.153

Yep, nearly 10x faster than a matrix op - AND you still have the time index.
To get there you need to sometimes make sacrifices.  xts does, though I like
to think they are well thought out and consistent*

*enough ;-)

Best,
Jeff

On Sun, May 8, 2011 at 8:57 PM, Joshua Ulrich <josh.m.ulr...@gmail.com> wrote:

> Russ,
>
> On May 8, 2011 6:29 PM, "Russ Abbott" <russ.abb...@gmail.com> wrote:
> >
> > Hi Jeff,
> >
> > The xts class has some very nice features, and you have done a valuable
> > service in developing it.
> >
> > My primary frustration is how difficult it seems to be to find out what
> > went wrong when my code doesn't work.  I've been writing quite
> > sophisticated code for a fairly long time. It's not that I'm new to
> > software development.
> >
> > The column name rule is a good example.  I'm willing to live with the
> > rule that column names are not changed for efficiency's sake.  What's
> > difficult for me is that I never saw that rule anywhere before.  Of
> > course, I'm not an R expert. I've been using it for only a couple of
> > months. But still, I would have expected to run into a rule like that.
> >
> > Worse, since the rule is in conflict with the explicit intent of
> > cbind--one can name columns when using cbind; in fact the examples
> > illustrate how to do it--it would really be nice if cbind would issue a
> > warning when one attempts to rename a column in violation of that rule.
> > Instead, cbind is silent, giving no hint about what went wrong.
> >
> Naming columns is not the explicit intent of cbind.  The explicit
> intent is to combine objects by columns.  Please don't overstate the
> case.
>
> While the examples for the generic show naming columns, neither
> ?cbind.zoo nor ?cbind.xts has such examples.  That's a hint.
>
> > It's those sorts of things that have caused me much frustration. And it's
> > these sorts of things that seem pervasive in R.  One never knows what one
> > is dealing with. Did something not work because there is a special case
> > rule that I haven't heard of? Did it not work because a special
> > convenience was programmed into a function in a way that conflicted with
> > normal use?  Since these sorts of things seem to come up so often, I find
> > myself feeling that there is no good way to track down problems, which
> > leads to a sense of helplessness and confusion. That's not what one wants
> > in a programming language.
> >
> If that's not what one wants, one can always write their own
> programming language.
>
> Seriously, it seems like you want to rant more than understand what's
> going on.  You have the R and xts help pages and the source code.  The
> "Note" section of help(cbind) tells you that the method dispatch is
> different.  It even tells you what R source file to look at to see how
> dispatching is done.  Compare the relevant source files from
> base::cbind and xts::cbind.xts, look at the "R Language Definition"
> manual to see how method dispatch is normally done.
>
> But you've been writing quite sophisticated code for a fairly long
> time, so I'm not telling you anything you don't know... you just don't
> think you should have to do the legwork.
>
> > -- Russ
> >
> >
>
> --
> Joshua Ulrich  |  FOSS Trading: www.fosstrading.com
>
>
>
> > On Sun, May 8, 2011 at 2:42 PM, Jeff Ryan <jeff.a.r...@gmail.com> wrote:
> >
> > > Hi Russ,
> > >
> > > Colnames don't get rewritten if they already exist. The reason is
> > > performance and how cbind is written at the R level.
> > >
> > > It isn't perfect per se, but the complexity and variety of dispatch that
> > > can take place for cbind in R, as it isn't a generic, is quite
> > > challenging to get to behave as one may hope.  After years of trying I'd
> > > say it is nearly impossible to do what you want without causing horrible
> > > memory issues on non-trivial objects.  There are users in production
> > > systems **using** xts on objects with billions of rows.  Your simple
> > > case, which has a simple workaround, would force everyone in the other
> > > 99.999% of cases to pay a recurring cost that isn't tolerable.
> > >
> > > If this is frustrating to you, you should stop using the class.
> > >
> > > Jeff
> > >
> > > Jeffrey Ryan    |    Founder    |     <jeffrey.r...@lemnica.com>
> > > jeffrey.r...@lemnica.com
> > >
> > > www.lemnica.com
> > >
> > > On May 8, 2011, at 2:07 PM, Russ Abbott <russ.abb...@gmail.com> wrote:
> > >
> > > I'm having troubles with the names of columns.
> > >
> > > quantmod deals with stock quotes.  I've created an array of the first 5
> > > closing prices from Jan 2007. (Is there a problem that the name is the
> > > same as the variable name? There shouldn't be.)
> > >
> > > > close
> > >              close
> > > 2007-01-03 1416.60
> > > 2007-01-04 1418.34
> > > 2007-01-05 1409.71
> > > 2007-01-08 1412.84
> > > 2007-01-09 1412.11
> > >
> > >
> > > When I try to create a more complex array by adding columns, the names
> > > get fouled up.  Here's a simple example.
> > >
> > > > cbind(changed.close = close+1, zero = 0, close)
> > >              close zero close.1
> > > 2007-01-03 1417.60    0 1416.60
> > > 2007-01-04 1419.34    0 1418.34
> > > 2007-01-05 1410.71    0 1409.71
> > > 2007-01-08 1413.84    0 1412.84
> > > 2007-01-09 1413.11    0 1412.11
> > >
> > >
> > > The first column should be called "changed.close", but it's called
> > > "close". The second column has the right name. The third column should
> > > be called "close" but it's called "close.1". Why is that? Am I missing
> > > something?
> > >
> > > If I change the order of the columns and let close have its original
> > > name, there is still a problem.
> > >
> > > > cbind(close, zero = 0, changed.close = close+1)
> > >              close zero close.1
> > > 2007-01-03 1416.60    0 1417.60
> > > 2007-01-04 1418.34    0 1419.34
> > > 2007-01-05 1409.71    0 1410.71
> > > 2007-01-08 1412.84    0 1413.84
> > > 2007-01-09 1412.11    0 1413.11
> > >
> > >
> > > Now the names on the first two columns are ok, but the third column is
> > > still wrong. Again, why is that?  Apparently it's not letting me assign
> > > a name to a column that comes from something that already has a name.
> > > Is that the way it should be?
> > >
> > > I don't get that same problem on a simpler example.
> > >
> > > > IX <- cbind(I=0, X=(1:3))
> > > > IX
> > >      I X
> > > [1,] 0 1
> > > [2,] 0 2
> > > [3,] 0 3
> > > > cbind(Y = 1, Z = IX[, "I"], W = IX[, "X"])
> > >      Y Z W
> > > [1,] 1 0 1
> > > [2,] 1 0 2
> > > [3,] 1 0 3
> > >
> > >
> > > Is this a peculiarity to xts objects?
> > >
> > > Thanks.
> > >
> > > -- Russ
> > >
> > > P.S. Once again I feel frustrated because it's taken me far more time
> > > than it deserves to track down and characterize this problem. I can fix
> > > it by using the names function. But I shouldn't have to do that.
> > >
> > >
> >
> >        [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jeffrey Ryan
jeffrey.r...@lemnica.com

www.lemnica.com

