[Rd] key-value mapping in C inside R?
I'm testing some modifications in arima.c. I've noticed that a big internal array of double (rbar) is usually sparse and I'd like to add an option to store it as key-value mapping. Is there a library function or some other approach already used inside the R core for key-value mappings? Thanks, Matteo Bertini __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] key-value mapping in C inside R?
Il giorno 30/dic/2010, alle ore 16.03, Simon Urbanek ha scritto: > On Dec 30, 2010, at 7:50 AM, Matteo Bertini wrote: > >> I'm testing some modifications in arima.c. >> I've noticed that a big internal array of double (rbar) is usually sparse >> and I'd like to add an option to store it as key-value mapping. >> >> Is there a library function or some other approach already used inside the R >> core for key-value mappings? >> > > environment created with hash=TRUE, e.g.: > e = new.env(TRUE, emptyenv()) > e[["foo"]] = "bar" > > You can even set "size" with the expected size if you know it in advance. Interesting, but is it possible to use this from the C code? Thanks, Matteo __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] key-value mapping in C inside R?
On Thu, Dec 30, 2010 at 9:34 PM, Hadley Wickham wrote: > Why not use a sparse Matrix package? > Hadley In the original arima.c code the matrix is represented as an array of double. I'd like to add a minimal overhead/changes on the code. I think I'll use a simple sorted linked list to store the key-value map. It will be enough for testing and I'll use something better if needed. Thanks, Matteo Bertini __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] stats/arima.c memory allocation
Looking at the arima.c code related to arima fitting I noticed that the code is mainly a merge of: - Gardner, G, Harvey, A. C. and Phillips, G. D. A. (1980) Algorithm AS154. An algorithm for exact maximum likelihood estimation of autoregressive-moving average models by means of Kalman filtering. Applied Statistics 29, 311322. - Jones, R. H. (1980) Maximum likelihood fitting of ARMA models to time series with missing observations. Technometrics 20 389395. The first is used to fit the initial P0 matrix, and the second to do the forecasts. The AS154 implementation of P0 computation is O(r^4/8) in memory requirements, where r is roughly the period length. This is the origin of the ugly: src/library/stats/src/arima.c:838:if(r > 350) error(_("maximum supported lag is 350")); I noted on the same AS154 paper that the initial P0 verify this equation: P0 = T P0 T' + R R' So I modified the arima.c code to find iteratively the solution of this equation (starting from P0 = I) The resulting code finds a solution very similar to the one of the original code in a fraction of the occupied memory and in a time that is similar for small lags and faster for bigger lags (without the r<350 limit). Here the modified code: https://gist.github.com/911292 The question is, there are theoretical guarantees that the iterative solution is the right solution? Some hints/directions/books? Matteo Bertini [[alternative HTML version deleted]] __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] maximum supported lag is 350
I have found a strange error: > fit = Arima(flow, c(1,0,1), list(order=c(0,1,1), period=96*7)) Error in makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa) : maximum supported lag is 350 Is in fact quite common to have a lag > 350 using a weekly period in 15min steps = 672 (standard in traffic flow prediction litterature for example). Is there any reasons against doing what is suggested in the comment and avoid the hardcoded limit? Thanks, Matteo Bertini # https://svn.r-project.org/R/trunk/src/library/stats/src/arima.c SEXP getQ0(SEXP sPhi, SEXP sTheta) { [8<] /* This is the limit using an int index. We could use size_t and get more on a 64-bit system, but there seems no practical need. */ if(r > 350) error(_("maximum supported lag is 350")); __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] maximum supported lag is 350
Il 22-07-2009 16:26, Matteo Bertini ha scritto: I have found a strange error: > fit = Arima(flow, c(1,0,1), list(order=c(0,1,1), period=96*7)) Error in makeARIMA(trarma[[1L]], trarma[[2L]], Delta, kappa) : maximum supported lag is 350 Is in fact quite common to have a lag > 350 using a weekly period in 15min steps = 672 (standard in traffic flow prediction litterature for example). Is there any reasons against doing what is suggested in the comment and avoid the hardcoded limit? Thanks, Matteo Bertini # https://svn.r-project.org/R/trunk/src/library/stats/src/arima.c SEXP getQ0(SEXP sPhi, SEXP sTheta) { [8<] /* This is the limit using an int index. We could use size_t and get more on a 64-bit system, but there seems no practical need. */ if(r > 350) error(_("maximum supported lag is 350")); Can someone review this analysis? I can propose a patch to fix the problem if it is ok. As I can see from the code: /* NB: nrbar could overflow */ int r = max(p, q + 1); size_t np = r * (r + 1) / 2, nrbar = np * (np - 1) / 2; where we want nrbar <= SIZE_MAX/2 considering that: np = r * (r + 1) / 2 < (r + 1)^2 / 2 = r2 and nrbar = np * (np - 1) / 2 < np^2 / 2 < r2^2 / 2 we have a protective: nrbar < (r + 1)^4 / 8 and we can check for: (r + 1)^4 / 8 <= SIZE_MAX/2 mat...@zarathustra:~/Documents/src$ cat size_max.c #include #include #include #define MAX_LAG (int)(sqrt(sqrt(8)) * sqrt(sqrt(SIZE_MAX/2)) - 1) int main(void) { int r; size_t np, nrbar; printf("MAX_LAG = %d\n", MAX_LAG); for (r=MAX_LAG-5; rbut perhaps I was missing some detail given the overflow occurs at SIZE_MAX/2 __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel