[Rd] getParseData() for imaginary numbers
Hi,

The imaginary unit is dropped from the 'text' column of the data frame returned by getParseData(). In the example below, the text should presumably be 1i instead of 1:

> p=parse(text='1i')
> getParseData(p)
  line1 col1 line2 col2 id parent     token terminal text
1     1    1     1    2  1      2 NUM_CONST     TRUE    1
2     1    1     1    2  2      0      expr    FALSE
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

Regards,
Yihui
--
Yihui Xie
Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
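Until the 'text' column is corrected, the token text can be recovered from the source string itself, because the column positions in the output above (col1 = 1, col2 = 2) do span the full token 1i. This is a minimal sketch, not an official fix: tokenText() is a hypothetical helper, and it assumes ASCII source without tabs and tokens that fit on one line.

```r
## Hypothetical helper: rebuild each terminal token's text from its
## reported column positions instead of trusting the 'text' column.
## Assumes ASCII source without tabs and single-line tokens.
tokenText <- function(src) {
  p  <- parse(text = src, keep.source = TRUE)  # parse data needs srcrefs
  pd <- utils::getParseData(p)
  pd <- pd[pd$terminal, ]                      # keep tokens, drop 'expr' rows
  pd <- pd[order(pd$line1, pd$col1), ]         # left-to-right source order
  lines <- strsplit(src, "\n", fixed = TRUE)[[1]]
  substr(lines[pd$line1], pd$col1, pd$col2)
}

tokenText("1i")       # returns "1i"
tokenText("x <- 2i")  # returns c("x", "<-", "2i")
```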
[Rd] Design for classes with database connection
Dear R-Devels,

I am currently designing a package intended to simplify the handling of market microstructure data (tick data, order data, etc.). As these data are usually very large and need to be reordered quite often (e.g. if data for several securities are batched together, or if only a certain time range should be considered), the package needs to handle this.

Before I start, I would like to mention some facts that made me decide to write my own package instead of using, e.g., the packages bigmemory, highfrequency, zoo or xts: AFAIK, bigmemory cannot handle data of different types (timestamp, string and numeric) or sort them appropriately; databases offer better tools for this task. Package highfrequency is designed to work with a specific data structure, whereas market microstructure data are far more varied. Packages zoo and xts offer a lot of versatility but do not offer the sorting ability needed for data this large.

I would like to get some feedback on this decision and on the short design overview that follows. My design idea is now:

1. Base the package on S4 classes, with one class that handles reading data from external sources, structuring and reordering. Structuring is done with respect to specific data variables, i.e. security ID, company ID, timestamp, price, volume (not all have to be provided, but some surely exist in market microstructure data). The less important variables are kept in a slot @other and are only ordered with respect to the other variables. Something like this:

.mmstruct <- setClass('mmstruct', representation(
    name = "character", index = "array", N = "integer", K = "integer",
    compID = "array", secID = "array", tradetime = "POSIXlt",
    flag = "array", price = "array", vol = "array",
    other = "data.frame"))

2. To enable a lightweight ordering function, the class should create an SQLite database on construction and delete it when 'rm()' is called. Throughout its life an object holds the database path and can execute queries on the database tables. This way I can use the table sorting of SQLite (e.g. by creating an index for each important variable). I assume this is faster and more efficient than programming something of my own; why reinvent the wheel? For this I would use VIRTUAL classes like:

library(RSQLite)  # defines the SQLiteConnection class
.mmstructBASE <- setClass('mmstructBASE', representation("VIRTUAL",
    dbName = "character", dbTable = "character"))
.mmstructDB <- setClass('mmstructDB', representation("VIRTUAL",
    conn = "SQLiteConnection"), contains = c("mmstructBASE"))
.mmstruct <- setClass('mmstruct', representation(
    name = "character", index = "array", N = "integer", K = "integer",
    compID = "array", secID = "array", tradetime = "POSIXlt",
    price = "array", vol = "array", other = "data.frame"),
    contains = c("mmstructDB"))

The slots in the mmstruct class then hold a view (e.g. only the head()) of the data, or can be used to hold data retrieved from the underlying database.

3. The workflow would then be something like:
a) The user reads in the data from an external source and gets a data.frame.
b) This data.frame can then be used to construct an mmstruct object by formatting the variables and reading them into the newly created SQLite database.
c) Given the data structure in the database, the user can sort the data by secID, timestamp, etc. and can apply several algorithms for cleaning the data (package-specific, not in the database).
d) Example: the user queries only the price of entries with compID = "AA" and tradetime < "2012-03-09", or only those on the first trading day of each month. The result can then be coerced, e.g., to a 'ts' object in R.
e) In addition the user can p
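Two details of item 2 may need care. First, a class with slots is only virtual if "VIRTUAL" is included in its representation. Second, rm() merely removes a binding; R has no destructors, so deleting the database file has to be tied either to an explicit close function or to garbage collection via reg.finalizer(), which works on environments. The sketch below combines both ideas under simplifying assumptions: the class mmstructFile and the helpers newMmstructFile(), closeMmstruct() and removeDbFile() are hypothetical, and an empty temporary file stands in for the SQLite database so the example runs without RSQLite.

```r
library(methods)

## Virtual base class: cannot be instantiated, only extended.
setClass("mmstructBASE",
         representation("VIRTUAL",
                        dbName  = "character",    # path of the database file
                        dbTable = "character",    # table holding the data
                        handle  = "environment")) # carries the finalizer

## Hypothetical concrete class; its own slot keeps it non-virtual.
setClass("mmstructFile",
         representation(name = "character"),
         contains = "mmstructBASE")

## Deletes the file recorded in the handle; safe to call more than once.
removeDbFile <- function(handle) {
  if (!is.null(handle$path) && file.exists(handle$path)) unlink(handle$path)
  handle$path <- NULL
  invisible(NULL)
}

## Hypothetical constructor: creates the (stand-in) database file and
## registers removeDbFile() to run when the handle is garbage-collected
## or when R exits.
newMmstructFile <- function(name, table = "ticks") {
  path <- tempfile(fileext = ".sqlite")
  file.create(path)            # stand-in for creating the SQLite database
  handle <- new.env()
  handle$path <- path
  reg.finalizer(handle, removeDbFile, onexit = TRUE)
  new("mmstructFile", name = name, dbName = path, dbTable = table,
      handle = handle)
}

## Explicit clean-up, for users who do not want to wait for gc().
closeMmstruct <- function(object) removeDbFile(object@handle)

x <- newMmstructFile("AA")
file.exists(x@dbName)  # TRUE
closeMmstruct(x)
file.exists(x@dbName)  # FALSE
```

Keeping the file path in an environment rather than only in a character slot matters here: S4 objects are copied on modification, whereas all copies share the one environment, so the finalizer fires once, after the last copy is gone.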
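For step d), the filtering logic can be prototyped on an in-memory data.frame before it is pushed down into SQL. This is only a base-R stand-in for the proposed SQLite query, not the package's implementation; the column names follow the design above, while the toy data and the helpers pricesBefore() and firstTradingDays() are hypothetical. "First trading day of the month" is assumed to mean the earliest calendar date (in UTC) with a trade for that company in each month.

```r
## Toy tick data, made up for illustration, using the slot names above.
ticks <- data.frame(
  compID    = c("AA", "AA", "BB", "AA", "AA", "AA"),
  tradetime = as.POSIXct(c("2012-01-03 09:30:00", "2012-01-03 10:00:00",
                           "2012-01-03 09:31:00", "2012-01-04 09:30:00",
                           "2012-02-01 09:30:00", "2012-03-12 09:30:00"),
                         tz = "UTC"),
  price     = c(10, 11, 50, 12, 13, 14),
  stringsAsFactors = FALSE
)

## Prices of one company strictly before a cut-off date, in time order.
pricesBefore <- function(d, id, cutoff) {
  keep <- d$compID == id & d$tradetime < as.POSIXct(cutoff, tz = "UTC")
  sel  <- d[keep, ]
  sel$price[order(sel$tradetime)]
}

## All trades of one company on its first trading day of each month.
firstTradingDays <- function(d, id) {
  sel   <- d[d$compID == id, ]
  sel   <- sel[order(sel$tradetime), ]
  day   <- as.Date(format(sel$tradetime, "%Y-%m-%d", tz = "UTC"))
  month <- format(day, "%Y-%m")
  first <- ave(as.numeric(day), month, FUN = min)  # earliest day per month
  sel[as.numeric(day) == first, ]
}

p <- ts(pricesBefore(ticks, "AA", "2012-03-09"))  # coerce to a 'ts' object
firstTradingDays(ticks, "AA")$price
```

Note that a plain 'ts' object treats observations as equally spaced and drops the timestamps, which tick data rarely are; zoo or xts keep the time index if it is needed later.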
[Rd] dbeta may hang R session for very large values of the shape parameters
Dear all,

we received a bug report for betareg saying that in some cases the optim call in betareg.fit would hang the R session, and the command could not be interrupted by Ctrl-C.

We narrowed the problem down to the dbeta function, which is used for the log-likelihood evaluation in betareg.fit. In particular, the following command hangs the R session at 100% CPU usage on all systems we tried (OS X 10.8.4, Debian GNU/Linux, Ubuntu 12.04), with both R 3.0.1 and R-devel (on each system I waited 3 minutes before killing R):

## Warning: this will hang the R session
dbeta(0.9, 1e+308, 10)

Furthermore, through trial-and-error investigation using the following code

## Warning: this will hang the R session
x <- 0.9
for (i in 0:100) {
  a <- 1e+280*2^i
  b <- 10
  cat("shape1 =", a, "\n")
  cat("shape2 =", b, "\n")
  cat("Beta density", dbeta(x, shape1 = a, shape2 = b), "\n")
  cat("===\n")
}

I noticed that:

* this seems to happen when shape1 is about 1e+308, seemingly irrespective of the value of shape2 (rerun the above with another value of b), and apparently only when 0.9 <= x < 1 (rerun the above with x <- 0.8, for example, and everything works as expected);
* similar problems occur for small values of x when shape2 is huge.

I am not sure why this happens, but it looks deep to me. The temporary fix for betareg was a hack: a simple if statement that returns NA for the log-likelihood if any shape parameter exceeds, say, 1e+300. Nevertheless, I thought this issue was worth reporting to R-devel (rather than R-help), especially since dbeta may be used within generic optimisers, and identifying dbeta as the culprit can be hard; it took us some time before we started suspecting it.

Interestingly, this happens close to what R considers infinity: typing 1.799e+308 into R returns Inf.

I hope the above analysis, limited in scope as it is, is informative.
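The workaround described above can be written as a guard around the log-likelihood. This is a minimal sketch, not the actual betareg code: guardedBetaLoglik() is a hypothetical name, and the 1e+300 cut-off is the ad hoc value mentioned in the report. The last two lines check the remark about infinity: .Machine$double.xmax, the largest finite double, is about 1.797693e+308, so the literal 1.799e+308 overflows to Inf.

```r
## Hypothetical guard: return NA instead of calling dbeta() when a shape
## parameter is non-finite (or NA) or larger than 'limit', so an optimiser
## never evaluates dbeta() in the region where it was reported to hang.
guardedBetaLoglik <- function(y, shape1, shape2, limit = 1e+300) {
  if (any(!is.finite(shape1)) || any(!is.finite(shape2)) ||
      any(shape1 > limit) || any(shape2 > limit)) {
    return(NA_real_)
  }
  sum(dbeta(y, shape1, shape2, log = TRUE))
}

guardedBetaLoglik(0.9, 1e+308, 10)    # NA; dbeta() is never called
guardedBetaLoglik(c(0.2, 0.5), 2, 3)  # ordinary log-likelihood

.Machine$double.xmax     # largest finite double, about 1.797693e+308
is.infinite(1.799e+308)  # TRUE: the literal overflows to Inf
```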
Best regards,
Ioannis
--
Dr Ioannis Kosmidis
Department of Statistical Science, University College, London, WC1E 6BT, UK
Webpage: http://www.ucl.ac.uk/~ucakiko
[Rd] Vignette problem and CRAN policies
Hello, All:

The vignette in the sos package uses "upquote.sty", which the R Journal required when the article was published in 2009. Current CRAN policy disallows "upquote.sty", and so far I have not found a way to pass "R CMD check" with sos without upquote.sty. I changed sos.Rnw following an email exchange with Prof. Ripley, without solving the problem; see below.

The key error messages (see the results of "R CMD build" below) appear to be "sos.tex:16: LaTeX Error: Environment article undefined" and "sos.tex:558: LaTeX Error: \begin{document} ended by \end{article}." When the article worked, it had both \begin{document} and \begin{article}, with matching \end statements for both. I've tried commenting out either one, without success.

The current nonworking code is available on R-Forge via anonymous SVN checkout using "svn checkout svn://svn.r-forge.r-project.org/svnroot/rsitesearch/".

Any suggestions on how to fix this would be greatly appreciated.

Thanks,
Spencer

## COMPLETE RESULTS FROM R CMD check
Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation. All rights reserved.

C:\Users\sgraves>cd 2013
C:\Users\sgraves\2013>cd R_pkgs
C:\Users\sgraves\2013\R_pkgs>cd sos
C:\Users\sgraves\2013\R_pkgs\sos>cd pkg
C:\Users\sgraves\2013\R_pkgs\sos\pkg>R CMD build sos
* checking for file 'sos/DESCRIPTION' ... OK
* preparing 'sos':
* checking DESCRIPTION meta-information ... OK
* installing the package to re-build vignettes
* creating vignettes ... ERROR
Loading required package: brew

Attaching package: 'sos'

The following object is masked from 'package:utils':

    ?

Loading required package: WriteXLS
Perl found.

The following Perl modules were not found on this system:

Text::CSV_XS

If you have more than one Perl installation, be sure the correct one was used here.

Otherwise, please install the missing modules. See the package INSTALL file for more information.
Loading required package: RODBC
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
  character data 'Adrian Baddeley and Rolf Turner with substantial contributions of code by Kasper Klitgaard Berthelsen;Abdollah Jalilian; Marie-Colette van Lieshout; Ege Rubak; Dominic Schuhmacher;and Rasmus Waagepetersen. Additional contributionsby Q.W. Ang;S. Azaele; C. Beale; R. Bernhardt; T. Bendtsen;A. Bevan; B. Biggerstaff; R. Bivand; F. Bonneu; J. Burgos; S. Byers; Y.M. Chang; J.B. Chen; I. Chernayavsky;Y.C. Chin; B. Christensen; J.-F. Coeurjolly; R. Corria Ainslie; M. de la Cruz; P. Dalgaard; P.J. Diggle;P. Donnelly;I. Dryden; S. Eglen; O. Flores;N. Funwi-Gabga; A. Gault; M. Genton; J. Gilbey; J. Goldstick; P. Grabarnik; C. Graf;J. Franklin;U. Hahn;A. Hardegen; M. Hering; M.B. Hansen;M. Hazelton;J. Heikkinen; K. Hornik; R. Ihaka; A. Jammalamadaka; R. John-Chandran; D. Johnson; M. Kuhn; J. Laake; F. Lavancier; T. Lawrence;R.A. Lamb; J. Lee; G.P. Leser; [... truncated]
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
  character data 'John Fox [aut, cre], Sanford Weisberg [aut], Douglas Bates [ctb], Steve Ellison [ctb], David Firth [ctb], Michael Friendly [ctb], Gregor Gorjanc [ctb], Spencer Graves [ctb], Richard Heiberger [ctb], Rafael Laboissiere [ctb], Georges Monette [ctb], Henric Nilsson [ctb], Derek Ogle [ctb], Brian Ripley [ctb], Achim Zeileis [ctb], R-Core [ctb]' truncated to 255 bytes in column 'Author'
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
  character data 'John Fox [aut, cre], Liviu Andronic [ctb], Michael Ash [ctb], Milan Bouchet-Valat [ctb], Theophilius Boye [ctb], Stefano Calza [ctb], Andy Chang [ctb], Philippe Grosjean [ctb], Richard Heiberger [ctb], Kosar Karimi Pour [ctb], G. Jay Kerns [ctb], Renaud Lancelot [ctb], Matthieu Lesnoff [ctb], Uwe Ligges [ctb], Samir Messad [ctb], Martin Maechler [ctb], Robert Muenchen [ctb], Duncan Murdoch [ctb], Erich Neuwirth [ctb], Dan Putler [ctb], Brian Ripley [ctb], Miroslav Ristic [ctb], Peter Wolf [ctb]' truncated to 255 bytes in column 'Author'
Perl found.

The following Perl modules were not found on this system:

Text::CSV_XS

If you have more than one Perl installation, be sure the correct one was used here.

Otherwise, please install the missing modules. See the package INSTALL file for more information.
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test, :
  character data 'Adrian Baddeley and Rolf Turner with substantial contributions of code by Kasper Klitgaard Berthelsen;Abdollah Jalilian; Marie-Colette van Lieshout; Ege Rubak; Dominic Schuhmacher;and Rasmus Waagepetersen. Additional contr