[Rd] getParseData() for imaginary numbers

2013-09-18 Thread Yihui Xie
Hi,

The imaginary unit is missing from the 'text' column of the data frame
returned by getParseData(). In the example below, the text should perhaps
be 1i instead of 1:

> p=parse(text='1i')
> getParseData(p)
  line1 col1 line2 col2 id parent     token terminal text
1     1    1     1    2  1      2 NUM_CONST     TRUE    1
2     1    1     1    2  2      0      expr    FALSE
> sessionInfo()
R version 3.0.1 (2013-05-16)
Platform: x86_64-pc-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=C                 LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base
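
A minimal sketch for checking whether a given R build shows this behaviour: it compares the 'text' value recorded for the NUM_CONST token with the characters that the token's column span covers in the source. The variable names are illustrative, and keep.source = TRUE is passed explicitly on the assumption that the code may be run non-interactively.

```r
# Parse with keep.source = TRUE so that parse data is recorded even when
# options("keep.source") is FALSE (the default for non-interactive sessions).
src <- "1i"
p   <- parse(text = src, keep.source = TRUE)
pd  <- getParseData(p)

# Row describing the numeric constant token
num <- pd[pd$token == "NUM_CONST", ]

# Text stored in the parse data for that token
num_text <- num$text

# Characters the token actually spans in the source (col1 to col2)
src_text <- substring(src, num$col1, num$col2)

# TRUE when the parse data keeps the imaginary unit, FALSE when it is dropped
text_matches_source <- identical(num_text, src_text)
print(text_matches_source)
```

If the two strings differ, the 'text' column has lost part of the token, as in the transcript above.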

Regards,
Yihui
--
Yihui Xie 
Web: http://yihui.name
Department of Statistics, Iowa State University
2215 Snedecor Hall, Ames, IA

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Design for classes with database connection

2013-09-18 Thread Simon Zehnder
Dear R-Devels,

I am currently designing a package intended to simplify the handling of market 
microstructure data (tick data, order data, etc.). As these data are usually 
very large and need to be reordered quite often (e.g. if data for several 
securities are batched together or if only a certain time range should be 
considered), the package needs to handle this. 

Before I start, I would like to mention some facts which made me decide to 
write my own package instead of using e.g. the packages bigmemory, 
highfrequency, zoo or xts: AFAIK bigmemory does not provide the opportunity to 
handle data of different types (timestamps, strings and numerics) and to sort 
them appropriately; for this task databases offer better tools. Package 
highfrequency is designed to work specifically with a certain data structure, 
and the data in market microstructure is much more varied. Packages 
zoo and xts offer a lot of versatility but do not offer the data sorting 
ability needed for such big data. 

I would like to get some feedback on this decision and on the short design 
overview that follows.
  
My design idea is now:

1. Base the package on S4 classes, with one class that handles reading data 
from external sources, structuring and reordering. Structuring is done with 
regard to specific data variables, i.e. security ID, company ID, timestamp, 
price, volume (not all have to be provided, but some surely exist in market 
microstructure data). The less important variables are kept in a slot 
@other and are only ordered with regard to the other variables. Something like 
this:

.mmstruct <- setClass('mmstruct', representation(
    name      = "character",
    index     = "array",
    N         = "integer",
    K         = "integer",
    compiD    = "array",
    secID     = "array",
    tradetime = "POSIXlt",
    flag      = "array",
    price     = "array",
    vol       = "array",
    other     = "data.frame"))

2. To enable a lightweight ordering function, the class should basically create 
an SQLite database on construction and delete it if 'rm()' is called. 
Throughout its life an object holds the database path and can execute queries 
on the database tables. This way I can use the table sorting of SQLite (e.g. by 
constructing an index for each important variable). I assume this is faster and 
more efficient than programming something on my own - why reinvent the 
wheel? For this I would use VIRTUAL classes like:

.mmstructBASE <- setClass('mmstructBASE', representation(
    dbName  = "character",
    dbTable = "character"))

.mmstructDB <- setClass('mmstructDB', representation(
    conn = "SQLiteConnection"),
    contains = c("mmstructBASE"))

.mmstruct <- setClass('mmstruct', representation(
    name      = "character",
    index     = "array",
    N         = "integer",
    K         = "integer",
    compiD    = "array",
    secID     = "array",
    tradetime = "POSIXlt",
    price     = "array",
    vol       = "array",
    other     = "data.frame"),
    contains = c("mmstructDB"))

The slots in the mmstruct class then hold a view (e.g. only the head()) of the 
data, or can be used to hold data retrieved from the underlying database. 
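
One caveat on deleting the database when 'rm()' is called: rm() only removes a binding, and R runs no destructor at that moment. A possible approach is to register a finalizer, which runs when the object is garbage collected. reg.finalizer() accepts only environments and external pointers, so an S4 object would need to carry one of these in a slot. A minimal base-R sketch, in which new_handle() and the temporary file standing in for the SQLite database file are hypothetical:

```r
# Hypothetical constructor: returns an environment that owns a backing file.
# The file is only a stand-in for the SQLite database file of the design.
new_handle <- function() {
  h <- new.env()
  h$path <- tempfile(fileext = ".sqlite")
  file.create(h$path)
  # Remove the backing file when the environment is garbage collected,
  # or at the end of the R session (onexit = TRUE).
  reg.finalizer(h, function(e) unlink(e$path), onexit = TRUE)
  h
}

h <- new_handle()
path <- h$path
exists_before <- file.exists(path)   # file is present while the handle lives

rm(h)                                # drops the binding only
invisible(gc())                      # collection triggers the finalizer
exists_after <- file.exists(path)    # has the backing file been removed?
```

The finalizer runs at some garbage collection after the last reference has gone, so the deletion is not immediate unless gc() is called explicitly. With a real database, the same finalizer would presumably also close the connection before removing the file.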

3. The workflow would then be something like:

a) The user reads in the data from an external source and gets a data.frame 
from it. 

b) This data.frame can then be used to construct an mmstruct object by 
formatting the variables and reading them into the newly constructed SQLite 
database. 

c) Given the data structure in the database, the user can sort the data 
by secID, timestamp etc. and can use several algorithms for cleaning the data 
(package-specific, not in the database). 

d) Example: The user makes a query to get only the price from entries with 
compID = "AA" and tradetime < "2012-03-09", or with tradetime restricted to the 
first trading day of each month. The result can then be converted, e.g., to a 
'ts' object in R by coercion. 

e) In addition the user can p

[Rd] dbeta may hang R session for very large values of the shape parameters

2013-09-18 Thread Kosmidis, Ioannis
Dear all,

We received a bug report for betareg: in some cases the optim call in 
betareg.fit would hang the R session, and the command could not be interrupted 
by Ctrl-C…

We narrowed down the problem to the dbeta function which is used for the log 
likelihood evaluation in betareg.fit. 

In particular, the following command hangs the R session at 100% CPU usage on 
all systems we tried (OS X 10.8.4, Debian GNU Linux, Ubuntu 12.04), with 
both R-3.0.1 and the R-devel version (on each system I waited 3 minutes 
before killing R):

## Warning: this will hang the R session
dbeta(0.9, 1e+308, 10)

Furthermore, through a trial and error investigation using the following code

## Warning: this will hang the R session
x <- 0.9
for (i in 0:100) {
 a <- 1e+280*2^i
 b <- 10
 cat("shape1 =", a, "\n")
 cat("shape2 =", b, "\n")
 cat("Beta density", dbeta(x, shape1 = a, shape2 = b), "\n")
 cat("===\n")
}

I noticed that:
* this seems to happen when shape1 is about 1e+308, seemingly irrespective of 
the value of shape2 (run the above with another value of b), and apparently 
only when x >= 0.9 and x < 1 (run the above lines with x <- 0.8, for example, 
and everything works as expected). 
* similar problems are encountered for small x values when shape2 is massive.

I am not sure why this happens but it looks deep to me. The temporary fix for 
the purposes of betareg was a hack (a simple if command that returns NA for the 
log likelihood if any shape parameter has values greater than 1e+300 say). 
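
A minimal sketch of such a guard, for illustration only: safe_dbeta() and its cutoff argument are hypothetical names and not the code actually used in betareg. The wrapper returns NA without ever calling dbeta() when a shape parameter exceeds the cutoff.

```r
# Hypothetical wrapper (not betareg's actual code): skip the call to dbeta()
# when any shape parameter exceeds a cutoff, and return NA instead.
safe_dbeta <- function(x, shape1, shape2, log = FALSE, cutoff = 1e300) {
  # na.rm = TRUE so that missing shape values fall through to dbeta(),
  # which handles them itself
  if (any(c(shape1, shape2) > cutoff, na.rm = TRUE)) {
    return(rep(NA_real_, length(x)))
  }
  dbeta(x, shape1 = shape1, shape2 = shape2, log = log)
}

guarded  <- safe_dbeta(0.9, 1e+308, 10)   # dbeta() is never called here
ordinary <- safe_dbeta(0.9, 2, 10)        # passes straight through to dbeta()
```

The guard returns one NA per element of x, which is sufficient for the scalar-shape case discussed here; it does not attempt to mirror the full recycling rules of dbeta().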

Nevertheless, I thought that this is an issue worth reporting to R-devel 
(instead of R-help), especially since dbeta may be used within generic 
optimisers and figuring that dbeta is the problem can be hard --- it took us 
some time before we started suspecting dbeta.

Interestingly, this appears to happen close to what R considers infinity. Typing
1.799e+308
into R returns Inf.
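
For reference, the largest finite double is available in R as .Machine$double.xmax (about 1.797693e+308), so the constant above overflows when it is parsed. A short check, with illustrative variable names:

```r
# Largest finite double-precision value on this platform
xmax <- .Machine$double.xmax

# A constant just below the maximum is still finite ...
below <- 1.797e+308
# ... while one just above it overflows to Inf when parsed
above <- 1.799e+308

print(xmax)
print(is.finite(below))
print(is.infinite(above))
```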

I hope the above analysis, limited in scope as it is, is informative.

Best regards,
Ioannis



-- 
Dr Ioannis Kosmidis
Department of Statistical  Science,
University College,
London, WC1E 6BT, UK
Webpage: http://www.ucl.ac.uk/~ucakiko



[Rd] Vignette problem and CRAN policies

2013-09-18 Thread Spencer Graves

Hello, All:


	  The vignette with the sos package used "upquote.sty", required for R 
Journal when it was published in 2009.  Current CRAN policy disallows 
"upquote.sty", and I've so far not found a way to pass "R CMD check" 
with sos without upquote.sty.



	  I changed sos.Rnw per an email exchange with Prof. Ripley without 
solving the problem; see below.  The key error messages (see the results 
of "R CMD build" below) appear to be "sos.tex:16: LaTeX Error: 
Environment article undefined" and " sos.tex:558: LaTeX Error: 
\begin{document} ended by \end{article}."  When the article worked, it 
had both \begin{document} and \begin{article}, with matching \end 
statements for both.  I've tried commenting out either without success.



	  The current nonworking code is available on R-Forge via anonymous SVN 
checkout using "svn checkout 
svn://svn.r-forge.r-project.org/svnroot/rsitesearch/".  Any suggestions 
on how to fix this would be greatly appreciated.



   Thanks,
   Spencer


## COMPLETE RESULTS FROM R CMD check 


Microsoft Windows [Version 6.1.7600]
Copyright (c) 2009 Microsoft Corporation.  All rights reserved.

C:\Users\sgraves>cd 2013
C:\Users\sgraves\2013>cd R_pkgs
C:\Users\sgraves\2013\R_pkgs>cd sos
C:\Users\sgraves\2013\R_pkgs\sos>cd pkg
C:\Users\sgraves\2013\R_pkgs\sos\pkg>R CMD build sos
* checking for file 'sos/DESCRIPTION' ... OK
* preparing 'sos':
* checking DESCRIPTION meta-information ... OK
* installing the package to re-build vignettes
* creating vignettes ... ERROR
Loading required package: brew

Attaching package: 'sos'

The following object is masked from 'package:utils':

 ?

Loading required package: WriteXLS
Perl found.

The following Perl modules were not found on this system:

Text::CSV_XS

If you have more than one Perl installation, be sure the correct one was used here.

Otherwise, please install the missing modules. See the package INSTALL file for more information.

Loading required package: RODBC
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
   character data 'Adrian Baddeley and Rolf Turner
   with substantial contributions of code by
   Kasper Klitgaard Berthelsen; Abdollah Jalilian; Marie-Colette van Lieshout;
   Ege Rubak; Dominic Schuhmacher; and Rasmus Waagepetersen.
   Additional contributions by Q.W. Ang; S. Azaele; C. Beale; R. Bernhardt;
   T. Bendtsen; A. Bevan; B. Biggerstaff; R. Bivand; F. Bonneu; J. Burgos;
   S. Byers; Y.M. Chang; J.B. Chen; I. Chernayavsky; Y.C. Chin; B. Christensen;
   J.-F. Coeurjolly; R. Corria Ainslie; M. de la Cruz; P. Dalgaard; P.J. Diggle;
   P. Donnelly; I. Dryden; S. Eglen; O. Flores; N. Funwi-Gabga; A. Gault;
   M. Genton; J. Gilbey; J. Goldstick; P. Grabarnik; C. Graf; J. Franklin;
   U. Hahn; A. Hardegen; M. Hering; M.B. Hansen; M. Hazelton; J. Heikkinen;
   K. Hornik; R. Ihaka; A. Jammalamadaka; R. John-Chandran; D. Johnson; M. Kuhn;
   J. Laake; F. Lavancier; T. Lawrence; R.A. Lamb; J. Lee; G.P. Leser; [... truncated]
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
   character data 'John Fox [aut, cre], Sanford Weisberg [aut], Douglas Bates
   [ctb], Steve Ellison [ctb], David Firth [ctb], Michael Friendly [ctb],
   Gregor Gorjanc [ctb], Spencer Graves [ctb], Richard Heiberger [ctb], Rafael
   Laboissiere [ctb], Georges Monette [ctb], Henric Nilsson [ctb], Derek Ogle
   [ctb], Brian Ripley [ctb], Achim Zeileis [ctb], R-Core [ctb]' truncated to
   255 bytes in column 'Author'
Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
   character data 'John Fox [aut, cre], Liviu Andronic [ctb], Michael Ash [ctb],
   Milan Bouchet-Valat [ctb], Theophilius Boye [ctb], Stefano Calza [ctb], Andy
   Chang [ctb], Philippe Grosjean [ctb], Richard Heiberger [ctb], Kosar Karimi
   Pour [ctb], G. Jay Kerns [ctb], Renaud Lancelot [ctb], Matthieu Lesnoff [ctb],
   Uwe Ligges [ctb], Samir Messad [ctb], Martin Maechler [ctb], Robert Muenchen
   [ctb], Duncan Murdoch [ctb], Erich Neuwirth [ctb], Dan Putler [ctb], Brian
   Ripley [ctb], Miroslav Ristic [ctb], Peter Wolf [ctb]' truncated to 255 bytes
   in column 'Author'


Perl found.

The following Perl modules were not found on this system:

Text::CSV_XS

If you have more than one Perl installation, be sure the correct one was used here.

Otherwise, please install the missing modules. See the package INSTALL file for more information.

Warning in odbcUpdate(channel, query, mydata, coldata[m, ], test = test,  :
   character data 'Adrian Baddeley and Rolf Turner
   with substantial contributions of code by
   Kasper Klitgaard Berthelsen; Abdollah Jalilian; Marie-Colette van Lieshout;
   Ege Rubak; Dominic Schuhmacher; and Rasmus Waagepetersen.
   Additional contr