Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?
Le lun. 9 déc. 2019 à 09:57, Tomas Kalibera a écrit : > On 12/9/19 2:54 PM, Laurent Gautier wrote: > > > > Le lun. 9 déc. 2019 à 05:43, Tomas Kalibera a > écrit : > >> On 12/7/19 10:32 PM, Laurent Gautier wrote: >> >> Thanks for the quick response Tomas. >> >> The same error is indeed happening when trying to have a zero-length >> variable name in an environment. The surprising bit is then "why is this >> happening during parsing" (that is why are variables assigned to an >> environment) ? >> >> The emitted R error (in the R console) is not a parse (syntax) error, but >> an error emitted during parsing when the parser tries to intern a name - >> look it up in a symbol table. Empty string is not allowed as a symbol name, >> and hence the error. In the call "list(''=1)" , the empty name is what >> could eventually become a name of a local variable inside list(), even >> though not yet during parsing. >> > > Thanks Tomas. > > I guess this has do with R expressions being lazily evaluated, and names > of arguments in a call are also part of the expression. Now the puzzling > part is why is that at all part of the parsing: I would have expected > R_ParseVector() to be restricted to parsing... Now it feels like > R_ParseVector() is performing parsing, and a first level of evalution for > expressions that "should never work" (the empty name). > > Think of it as an exception in say Python. Some failures during parsing > result in an exception (called error in R and implemented using a long > jump). Any time you are calling into R you can get an error; out of memory > is also signalled as R error. > The surprising bit for me was that I had expected the function to solely perform parsing. I did expect an exception (and a jmp smashing the stack) when the function concerned is in the C-API, is parsing a string, and is using a parameter (pointer) to store whether parsing was a failure or a success. Since you are making a comparison with Python, the distinction I am making between parsing and evaluation seem to apply there. For example: ``` >>> import parser >>> parser.expr('1+') Traceback (most recent call last): File "", line 1, in File "", line 1 1+ ^ SyntaxError: unexpected EOF while parsing >>> p = parser.expr('list(""=1)') >>> p >>> eval(p) Traceback (most recent call last): File "", line 1, in TypeError: eval() arg 1 must be a string, bytes or code object >>> list(""=1) File "", line 1 SyntaxError: keyword can't be an expression ``` > There is probably some error in how the external code is handling R >> errors (Fatal error: unable to initialize the JIT, stack smashing, etc) >> and possibly also how R is initialized before calling ParseVector. Probably >> you would get the same problem when running say "stop('myerror')". Please >> note R errors are implemented as long-jumps, so care has to be taken when >> calling into R, Writing R Extensions has more details (and section 8 >> specifically about embedding R). This is unlike parse (syntax) errors >> signaled via return value to ParseVector() >> > > The issue is that the segfault (because of stack smashing, therefore > because of what also suspected to be an incontrolled jump) is happening > within the execution of R_ParseVector(). I would think that an issue with > the initialization of R is less likely because the project is otherwise > used a fair bit and is well covered by automated continuous tests. > > After looking more into R's gram.c I suspect that an execution context is > required for R_ParseVector() to know to properly work (know where to jump > in case of error) when the parsing code decides to fail outside what it > thinks is a syntax error. If the case, this would make R_ParseVector() > function well when called from say, a C-extension to an R package, but fail > the way I am seeing it fail when called from an embedded R. > > Yes, contexts are used internally to handle errors. For external use > please see Writing R Extensions, section 6.12. > I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and this is seems to help me overcome the issue. Thanks for the pointer. Best, Laurent > Best > Tomas > > > Best, > > Laurent > >> Best, >> Tomas >> >> >> We are otherwise aware that the error is not occurring in the R console, >> but can be traced to a call to R_ParseVector() in R's C API:( >> https://github.com/rpy2/rpy2/blob/master/rpy2/rinterface_lib/_rinterface_capi.py#L509 >> ). >> >> Our specific setup is calling an embedded R from Python, using the cffi >> library. An error on end was the first possibility considered, but the >> puzzling specificity of the error (as shown below other parsing errors are >> handled properly) and the difficulty tracing what is in happening in >> R_ParseVector() made me ask whether someone on this list had a suggestion >> about the possible issue" >> >> ``` >> >> >>> import rpy2.rinterface as ri>>> ri.initr()>>> e = ri.parse("list(''=1+") >> >>> -
Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?
Laurent, the main point here is that ParseVector() just like any other R API has to be called in a correct context since it can raise errors so the issue was that your C code has a bug of not setting R correctly (my guess would be your'e not creating the initial context necessary in embedded R). There are many different errors, your is just one of many that can occur - any R API call that does allocation (and parsing obviously does) can cause errors. Note that this is true for pretty much all R API functions. Cheers, Simon > On Dec 14, 2019, at 11:25 AM, Laurent Gautier wrote: > > Le lun. 9 déc. 2019 à 09:57, Tomas Kalibera a > écrit : > >> On 12/9/19 2:54 PM, Laurent Gautier wrote: >> >> >> >> Le lun. 9 déc. 2019 à 05:43, Tomas Kalibera a >> écrit : >> >>> On 12/7/19 10:32 PM, Laurent Gautier wrote: >>> >>> Thanks for the quick response Tomas. >>> >>> The same error is indeed happening when trying to have a zero-length >>> variable name in an environment. The surprising bit is then "why is this >>> happening during parsing" (that is why are variables assigned to an >>> environment) ? >>> >>> The emitted R error (in the R console) is not a parse (syntax) error, but >>> an error emitted during parsing when the parser tries to intern a name - >>> look it up in a symbol table. Empty string is not allowed as a symbol name, >>> and hence the error. In the call "list(''=1)" , the empty name is what >>> could eventually become a name of a local variable inside list(), even >>> though not yet during parsing. >>> >> >> Thanks Tomas. >> >> I guess this has do with R expressions being lazily evaluated, and names >> of arguments in a call are also part of the expression. Now the puzzling >> part is why is that at all part of the parsing: I would have expected >> R_ParseVector() to be restricted to parsing... Now it feels like >> R_ParseVector() is performing parsing, and a first level of evalution for >> expressions that "should never work" (the empty name). >> >> Think of it as an exception in say Python. Some failures during parsing >> result in an exception (called error in R and implemented using a long >> jump). Any time you are calling into R you can get an error; out of memory >> is also signalled as R error. >> > > > The surprising bit for me was that I had expected the function to solely > perform parsing. I did expect an exception (and a jmp smashing the stack) > when the function concerned is in the C-API, is parsing a string, and is > using a parameter (pointer) to store whether parsing was a failure or a > success. > > Since you are making a comparison with Python, the distinction I am making > between parsing and evaluation seem to apply there. For example: > > ``` import parser parser.expr('1+') > Traceback (most recent call last): > File "", line 1, in > File "", line 1 >1+ > ^ > SyntaxError: unexpected EOF while parsing p = parser.expr('list(""=1)') p > eval(p) > Traceback (most recent call last): > File "", line 1, in > TypeError: eval() arg 1 must be a string, bytes or code object > list(""=1) > File "", line 1 > SyntaxError: keyword can't be an expression > ``` > > >> There is probably some error in how the external code is handling R >>> errors (Fatal error: unable to initialize the JIT, stack smashing, etc) >>> and possibly also how R is initialized before calling ParseVector. Probably >>> you would get the same problem when running say "stop('myerror')". Please >>> note R errors are implemented as long-jumps, so care has to be taken when >>> calling into R, Writing R Extensions has more details (and section 8 >>> specifically about embedding R). This is unlike parse (syntax) errors >>> signaled via return value to ParseVector() >>> >> >> The issue is that the segfault (because of stack smashing, therefore >> because of what also suspected to be an incontrolled jump) is happening >> within the execution of R_ParseVector(). I would think that an issue with >> the initialization of R is less likely because the project is otherwise >> used a fair bit and is well covered by automated continuous tests. >> >> After looking more into R's gram.c I suspect that an execution context is >> required for R_ParseVector() to know to properly work (know where to jump >> in case of error) when the parsing code decides to fail outside what it >> thinks is a syntax error. If the case, this would make R_ParseVector() >> function well when called from say, a C-extension to an R package, but fail >> the way I am seeing it fail when called from an embedded R. >> >> Yes, contexts are used internally to handle errors. For external use >> please see Writing R Extensions, section 6.12. >> > > I have wrapped my call to R_ParseVector() in a R_tryCatchError(), and this > is seems to help me overcome the issue. Thanks for the pointer. > > Best, > > > Laurent > > >> Best >> Tomas >> >> >> Best, >> >> Laurent >> >>> Best, >>> Tomas >
Re: [Rd] Inconsistencies in wilcox.test
> Martin Maechler > on Thu, 12 Dec 2019 17:20:47 +0100 writes: > Karolis Koncevičius > on Mon, 9 Dec 2019 23:43:36 +0200 writes: >> So I tried adding Infinity support for all cases. And it >> is (as could be expected) more complicated than I >> thought. > "Of course !" Thank you, Karolis, in any case! >> It is easy to add Inf support for the test. The problems >> start with conf.int=TRUE. >> Currently confidence intervals are computed via >> `uniroot()` and, in the case of infinities, we are >> computationally looking for roots over infinite interval >> which results in an error. I suspect this is the reason >> Inf values were removed in the first place. > Maybe. It's still wrong to be done "up front". I'm sure > 98% (or so ;-) of all calls to wilcox.test() do *not* use > conf.int = TRUE >> Just a note, I found a few more errors/inconsistencies >> when requesting confidence intervals with paired=TRUE >> (due to Infinities being left in). >> Current error in Inf-Inf scenario: >> wilcox.test(c(1,2,Inf), c(4,8,Inf), paired=TRUE, >> conf.int=TRUE) Error in if (ZEROES) x <- x[x != 0] : >> missing value where TRUE/FALSE needed > Good catch .. notably as it also happens when > conf.int=FALSE as by default. My version of wilcox.test() > now does give the same as when the to 'Inf' are left away. >> NaN confidence intervals: >> wilcox.test(c(1:9,Inf), c(21:28,Inf,30), paired=TRUE, >> conf.int=TRUE) >> Wilcoxon signed rank test with continuity correction >> data: c(1:9, Inf) and c(21:28, Inf, 30) V = 9.5, p-value >> = 0.0586 alternative hypothesis: true location shift is >> not equal to 0 0 percent confidence interval: NaN NaN >> sample estimates: midrange NaN > I don't see a big problem here. The NaN's in some sense > show the best that can be computed with this data. > Statistic and P-value, but no conf.int. >> The easiest "fix" for consistency would be to simply >> remove Infinity support from the paired=TRUE case. > I strongly disagree. We are not pruning good > functionality just for some definition of consistency. >> But going with the more desirable approach of adding >> Infinity support for non-paired cases - it is currently >> not clear to me what confidence intervals and >> pseudomedian should be. Specially when Infinities are on >> both sides. > I deem that not to be a big deal. They are not defined > given the default formulas and that is reflected by NA / > NaN in those parts of the result. >> Regards, Karolis Koncevičius. > But I have also spent a few hours now on > wilcox.test.default() behavior notably also looking at the > "rounding" / "machine precision" situation, and also on > your remark that the 'method: ...' does not indicate well > enough what was computed. > In my (not yet committed) but hereby proposed enhancement > of wilcox.test(), I have a new argument, 'digits.rank = > Inf' (the default 'Inf' corresponding to the current > behavior) with help page documentation: > digits.rank: a number; if finite, ‘rank(signif(r, > digits.rank))’ will be used to compute ranks for the test > statistic instead of (the default) ‘rank(r)’. > and then in 'Details :' > For stability reasons, it may be advisable to use > rounded data or to set ‘digits.rank = 7’, say, such that > determination of ties does not depend on very small > numeric differences (see the example). > and then in 'Examples: ' > ## accuracy in ties determination via 'digits.rank': > wilcox.test( 4:2, 3:1, paired=TRUE) # Warning: cannot > compute exact p-value with ties wilcox.test((4:2)/10, > (3:1)/10, paired=TRUE) # no ties => *no* warning > wilcox.test((4:2)/10, (3:1)/10, paired=TRUE, digits.rank = > 9) # same ties as (4:2, 3:1) > -- > Lastly, I propose to replace "test" by "exact test" in the > 'method' component (and print out) in case exact > computations were used. This information should be part > of the returned "htest" object, and not only visible from > the arguments and warnings that are printed during the > computations. This last change is in some sense the "most > back-incompatible" change of these, because many > wilcox.test() printouts would slightly change, e.g., >> w0 <- wilcox.test( 1:5, 4*(0:4), paired=TRUE) > Wilcoxon signed rank exact test > data: 1:5 and 4 * (0:4) V = 1, p-value = 0.125 > alternative hypothesis: true location shift is not equal > to 0 > where before (in R <= 3.6.x) it is just > Wilcoxon signed rank test > data: . ... ... > but I think R 4.0.0 is a good occasion for such a change. > Constructive
Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?
Hi Simon, Widespread errors would have caught my earlier as the way that code is using only one initialization of the embedded R, is used quite a bit, and is covered by quite a few unit tests. This is the only situation I am aware of in which an error occurs. What is a "correct context", or initial context, the code should from ? Searching for "context" in the R-exts manual does not return much. Best, Laurent Le sam. 14 déc. 2019 à 12:20, Simon Urbanek a écrit : > Laurent, > > the main point here is that ParseVector() just like any other R API has to > be called in a correct context since it can raise errors so the issue was > that your C code has a bug of not setting R correctly (my guess would be > your'e not creating the initial context necessary in embedded R). There are > many different errors, your is just one of many that can occur - any R API > call that does allocation (and parsing obviously does) can cause errors. > Note that this is true for pretty much all R API functions. > > Cheers, > Simon > > > > > On Dec 14, 2019, at 11:25 AM, Laurent Gautier > wrote: > > > > Le lun. 9 déc. 2019 à 09:57, Tomas Kalibera a > > écrit : > > > >> On 12/9/19 2:54 PM, Laurent Gautier wrote: > >> > >> > >> > >> Le lun. 9 déc. 2019 à 05:43, Tomas Kalibera > a > >> écrit : > >> > >>> On 12/7/19 10:32 PM, Laurent Gautier wrote: > >>> > >>> Thanks for the quick response Tomas. > >>> > >>> The same error is indeed happening when trying to have a zero-length > >>> variable name in an environment. The surprising bit is then "why is > this > >>> happening during parsing" (that is why are variables assigned to an > >>> environment) ? > >>> > >>> The emitted R error (in the R console) is not a parse (syntax) error, > but > >>> an error emitted during parsing when the parser tries to intern a name > - > >>> look it up in a symbol table. Empty string is not allowed as a symbol > name, > >>> and hence the error. In the call "list(''=1)" , the empty name is what > >>> could eventually become a name of a local variable inside list(), even > >>> though not yet during parsing. > >>> > >> > >> Thanks Tomas. > >> > >> I guess this has do with R expressions being lazily evaluated, and names > >> of arguments in a call are also part of the expression. Now the puzzling > >> part is why is that at all part of the parsing: I would have expected > >> R_ParseVector() to be restricted to parsing... Now it feels like > >> R_ParseVector() is performing parsing, and a first level of evalution > for > >> expressions that "should never work" (the empty name). > >> > >> Think of it as an exception in say Python. Some failures during parsing > >> result in an exception (called error in R and implemented using a long > >> jump). Any time you are calling into R you can get an error; out of > memory > >> is also signalled as R error. > >> > > > > > > The surprising bit for me was that I had expected the function to solely > > perform parsing. I did expect an exception (and a jmp smashing the stack) > > when the function concerned is in the C-API, is parsing a string, and is > > using a parameter (pointer) to store whether parsing was a failure or a > > success. > > > > Since you are making a comparison with Python, the distinction I am > making > > between parsing and evaluation seem to apply there. For example: > > > > ``` > import parser > parser.expr('1+') > > Traceback (most recent call last): > > File "", line 1, in > > File "", line 1 > >1+ > > ^ > > SyntaxError: unexpected EOF while parsing > p = parser.expr('list(""=1)') > p > > > eval(p) > > Traceback (most recent call last): > > File "", line 1, in > > TypeError: eval() arg 1 must be a string, bytes or code object > > > list(""=1) > > File "", line 1 > > SyntaxError: keyword can't be an expression > > ``` > > > > > >> There is probably some error in how the external code is handling R > >>> errors (Fatal error: unable to initialize the JIT, stack smashing, > etc) > >>> and possibly also how R is initialized before calling ParseVector. > Probably > >>> you would get the same problem when running say "stop('myerror')". > Please > >>> note R errors are implemented as long-jumps, so care has to be taken > when > >>> calling into R, Writing R Extensions has more details (and section 8 > >>> specifically about embedding R). This is unlike parse (syntax) errors > >>> signaled via return value to ParseVector() > >>> > >> > >> The issue is that the segfault (because of stack smashing, therefore > >> because of what also suspected to be an incontrolled jump) is happening > >> within the execution of R_ParseVector(). I would think that an issue > with > >> the initialization of R is less likely because the project is otherwise > >> used a fair bit and is well covered by automated continuous tests. > >> > >> After looking more into R's gram.c I suspect that an execution context > is > >> required for R_ParseVector() to know to properly work (kno
Re: [Rd] Inconsistent behavior for the C AP's R_ParseVector() ?
Laurent, > On Dec 14, 2019, at 5:29 PM, Laurent Gautier wrote: > > Hi Simon, > > Widespread errors would have caught my earlier as the way that code is > using only one initialization of the embedded R, is used quite a bit, and > is covered by quite a few unit tests. This is the only situation I am aware > of in which an error occurs. > It may or may not be "widespread" - almost all R API functions can raise errors (e.g., unable to allocate). You'll only find out once they do and that's too late ;). > What is a "correct context", or initial context, the code should from ? > Searching for "context" in the R-exts manual does not return much. > It depends which embedded API use - see R-ext 8.1 the two options are run_Rmainloop() and R_ReplDLLinit() which both setup the top-level context with SETJMP. If you don't use either then you have to use one of the advanced R APIs that do it such as R_ToplevelExec() or R_UnwindProtect(), otherwise your point to abort to on error doesn't exist. Embedding R is much more complex than many think ... Cheers, Simon > Best, > > Laurent > > > Le sam. 14 déc. 2019 à 12:20, Simon Urbanek a > écrit : > >> Laurent, >> >> the main point here is that ParseVector() just like any other R API has to >> be called in a correct context since it can raise errors so the issue was >> that your C code has a bug of not setting R correctly (my guess would be >> your'e not creating the initial context necessary in embedded R). There are >> many different errors, your is just one of many that can occur - any R API >> call that does allocation (and parsing obviously does) can cause errors. >> Note that this is true for pretty much all R API functions. >> >> Cheers, >> Simon >> >> >> >>> On Dec 14, 2019, at 11:25 AM, Laurent Gautier >> wrote: >>> >>> Le lun. 9 déc. 2019 à 09:57, Tomas Kalibera a >>> écrit : >>> On 12/9/19 2:54 PM, Laurent Gautier wrote: Le lun. 9 déc. 2019 à 05:43, Tomas Kalibera >> a écrit : > On 12/7/19 10:32 PM, Laurent Gautier wrote: > > Thanks for the quick response Tomas. > > The same error is indeed happening when trying to have a zero-length > variable name in an environment. The surprising bit is then "why is >> this > happening during parsing" (that is why are variables assigned to an > environment) ? > > The emitted R error (in the R console) is not a parse (syntax) error, >> but > an error emitted during parsing when the parser tries to intern a name >> - > look it up in a symbol table. Empty string is not allowed as a symbol >> name, > and hence the error. In the call "list(''=1)" , the empty name is what > could eventually become a name of a local variable inside list(), even > though not yet during parsing. > Thanks Tomas. I guess this has do with R expressions being lazily evaluated, and names of arguments in a call are also part of the expression. Now the puzzling part is why is that at all part of the parsing: I would have expected R_ParseVector() to be restricted to parsing... Now it feels like R_ParseVector() is performing parsing, and a first level of evalution >> for expressions that "should never work" (the empty name). Think of it as an exception in say Python. Some failures during parsing result in an exception (called error in R and implemented using a long jump). Any time you are calling into R you can get an error; out of >> memory is also signalled as R error. >>> >>> >>> The surprising bit for me was that I had expected the function to solely >>> perform parsing. I did expect an exception (and a jmp smashing the stack) >>> when the function concerned is in the C-API, is parsing a string, and is >>> using a parameter (pointer) to store whether parsing was a failure or a >>> success. >>> >>> Since you are making a comparison with Python, the distinction I am >> making >>> between parsing and evaluation seem to apply there. For example: >>> >>> ``` >> import parser >> parser.expr('1+') >>> Traceback (most recent call last): >>> File "", line 1, in >>> File "", line 1 >>> 1+ >>>^ >>> SyntaxError: unexpected EOF while parsing >> p = parser.expr('list(""=1)') >> p >>> >> eval(p) >>> Traceback (most recent call last): >>> File "", line 1, in >>> TypeError: eval() arg 1 must be a string, bytes or code object >>> >> list(""=1) >>> File "", line 1 >>> SyntaxError: keyword can't be an expression >>> ``` >>> >>> There is probably some error in how the external code is handling R > errors (Fatal error: unable to initialize the JIT, stack smashing, >> etc) > and possibly also how R is initialized before calling ParseVector. >> Probably > you would get the same problem when running say "stop('myerror')". >> Please > note R errors are implemented as long-jumps, so care has to be taken >> w