[Rd] predict.loess() segfaults for large n?
Hi,

I am segfaulting when using predict.loess() (checked with r62092). I've traced the source with the help of valgrind (output pasted below) and it appears that this is due to int overflow when allocating an int work array in loess_workspace():

    liv = 50 + ((int)pow((double)2, (double)D) + 4) * nvmax + 2 * N;

where liv is a (global) int. For D=1 (one x variable), this overflows at approx N = 4089 where N is the fitted sample size (not prediction sample size).

I am aware that you are in the process of introducing long vectors but a quick fix would be to error when predict.loess(..., se=TRUE) and N is too large. (Ideally, one would use long int but does fortran portably support long int?) The threshold N value may depend on surface type (above is for surface=="interpolate").

The following sample code does not result in segfault but when run with valgrind, it produces the warning about large range. (In the code that segfaults N is about 77,000).

    set.seed(1)
    n = 5000 # n=4000 seems ok
    x = rnorm(n)
    y = x + rnorm(n)
    yf = loess(y~x, span=0.75, control=loess.control(trace.hat="approximate"))
    print( predict(yf, data.frame(x=1), se=TRUE) )

##---valgrind output with segfault (abridged):
> test4()
==30841== Warning: set address range perms: large range [0x3962a040, 0x5fb42608) (defined)
==30841== Warning: set address range perms: large range [0x5fb43040, 0xf8c8e130) (defined)
==30841== Invalid write of size 4
==30841==    at 0xCD719F0: ehg139_ (loessf.f:1444)
==30841==    by 0xCD72E0C: ehg131_ (loessf.f:467)
==30841==    by 0xCD73A5A: lowesb_ (loessf.f:1530)
==30841==    by 0xCD2C774: loess_ise (loessc.c:219)
==30841==    by 0x486C7F: do_dotCode (dotcode.c:1744)
==30841==    by 0x4AB040: bcEval (eval.c:4544)
==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
==30841==    by 0x4BAD87: Rf_applyClosure (eval.c:960)
==30841==    by 0x4B6D5E: Rf_eval (eval.c:611)
==30841==    by 0x4B7A1E: do_eval (eval.c:2193)
==30841==    by 0x4AB040: bcEval (eval.c:4544)
==30841==    by 0x4B6B3F: Rf_eval (eval.c:498)
==30841==  Address 0xf8cd4144 is not stack'd, malloc'd or (recently) free'd
==30841==

 *** caught segfault ***
address 0xf8cd4144, cause 'memory not mapped'

Traceback:
 1: predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se)
 2: eval(expr, envir, enclos)
 3: eval(substitute(expr), data, enclos = parent.frame())
 4: with.default(object, predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se))
 5: with(object, predLoess(y, x, newx, s, weights, pars$robust, pars$span, pars$degree, pars$normalize, pars$parametric, pars$drop.square, pars$surface, pars$cell, pars$family, kd, divisor, se = se))
 6: predict.loess(y2, data.frame(hours = xmin), se = TRUE)
 7: predict(y2, data.frame(hours = xmin), se = TRUE)
 8: test4()
aborting ...
==30841==

--
+---
| Hiroyuki Kawakatsu
| Business School, Dublin City University
| Dublin 9, Ireland. Tel +353 (0)1 700 7496

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
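The "error when N is too large" guard proposed in the message above could look roughly like the following. This is only a minimal sketch, assuming the expression quoted in the message is the whole size computation; `checked_liv` is a hypothetical helper name and is not part of R's sources. It evaluates the same formula in 64-bit arithmetic and reports whether the result still fits in an int.

```c
#include <limits.h>

/* Hypothetical helper (not part of R): evaluate
 *     50 + (2^D + 4) * nvmax + 2 * N
 * in 64-bit arithmetic. Returns 1 and stores the length in *liv when it
 * fits in an int; returns 0 for invalid arguments or when the value
 * would overflow an int. */
static int checked_liv(int D, int nvmax, int N, int *liv)
{
    if (D < 0 || D > 30 || nvmax < 0 || N < 0)
        return 0;

    long long cells = (1LL << D) + 4;                    /* 2^D + 4 */
    long long total = 50 + cells * (long long)nvmax + 2LL * N;

    if (total > INT_MAX)
        return 0;                                        /* would overflow */

    *liv = (int)total;
    return 1;
}
```

In package code the failure branch would presumably raise an R error before any work array is allocated, rather than return a status code.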
Re: [Rd] conflict between rJava and data.table
Simon Urbanek wrote :
> Can you elaborate on the details as of where this will be a problem? Packages should not be affected since they should be importing the namespaces from the packages they use, so the only problem would be in a package that uses both data.table and rJava -- and this is easily resolved in the namespace of such package. So there is no technical reason why you can't have multiple definitions of J - that's what namespaces are for.

Right. It's users using J() in their own code, iiuc. rJava's manual says "J is the high-level access to Java." When they use J() on its own they probably want the rJava one, but if data.table is higher they get that one. They don't want to have to write out rJava::J(...).

It is not just rJava but package XLConnect, too. If there's a better way would be interested but I didn't mind removing J from data.table.

Bunny/Matt,

To add to Steve's reply here's some background. This is well documented in NEWS and Googling "data.table J rJava" and similar returns useful links to NEWS and datatable-help (so you shouldn't have needed to post to r-devel).

From 1.8.2 (Jul 2012) :

  o The J() alias is now deprecated outside DT[...], but will still work inside DT[...], as in DT[J(...)]. J() is conflicting with function J() in package XLConnect (#1747) and rJava (#2045). For data.table to change is easier, with some efficiency advantages too. The next version of data.table will issue a warning from J() when used outside DT[...]. The version after will remove it. Only then will the conflict with rJava and XLConnect be resolved. Please use data.table() directly instead of J(), outside DT[...].

From 1.8.4 (Nov 2012) :

  o J() now issues a warning (when used *outside* DT[...]) that using it outside DT[...] is deprecated. See item below in v1.8.2. Use data.table() directly instead of J(), outside DT[...]. Or, define an alias yourself. J() will continue to work *inside* DT[...] as documented.

From 1.8.7 (soon to be on CRAN) :

  o The J() alias is now removed *outside* DT[...], but will still work inside DT[...]; i.e., DT[J(...)] is fine. As warned in v1.8.2 (see below in this file) and deprecated with warning() in v1.8.6. This resolves the conflict with function J() in package XLConnect (#1747) and rJava (#2045). Please use data.table() directly instead of J(), outside DT[...].

Matthew
Re: [Rd] conflict between rJava and data.table
On Mar 1, 2013, at 8:03 AM, Matthew Dowle wrote:

> Simon Urbanek wrote :
>> Can you elaborate on the details as of where this will be a problem? Packages should not be affected since they should be importing the namespaces from the packages they use, so the only problem would be in a package that uses both data.table and rJava -- and this is easily resolved in the namespace of such package. So there is no technical reason why you can't have multiple definitions of J - that's what namespaces are for.
>
> Right. It's users using J() in their own code, iiuc. rJava's manual says "J is the high-level access to Java." When they use J() on its own they probably want the rJava one, but if data.table is higher they get that one. They don't want to have to write out rJava::J(...).
>
> It is not just rJava but package XLConnect, too. If there's a better way would be interested but I didn't mind removing J from data.table.

For packages there is really no issue - if something breaks in XLConnect then the authors are probably importing the wrong function in their namespace (I still didn't see a reproducible example, though). The only difference is for interactive use so not having conflicting J() [if possible] would be actually useful there, since J() in rJava is primarily intended for interactive use.

Cheers,
Simon
Re: [Rd] conflict between rJava and data.table
On 01.03.2013 16:13, Simon Urbanek wrote:
> On Mar 1, 2013, at 8:03 AM, Matthew Dowle wrote:
>> Simon Urbanek wrote :
>>> Can you elaborate on the details as of where this will be a problem? Packages should not be affected since they should be importing the namespaces from the packages they use, so the only problem would be in a package that uses both data.table and rJava -- and this is easily resolved in the namespace of such package. So there is no technical reason why you can't have multiple definitions of J - that's what namespaces are for.
>>
>> Right. It's users using J() in their own code, iiuc. rJava's manual says "J is the high-level access to Java." When they use J() on its own they probably want the rJava one, but if data.table is higher they get that one. They don't want to have to write out rJava::J(...).
>>
>> It is not just rJava but package XLConnect, too. If there's a better way would be interested but I didn't mind removing J from data.table.
>
> For packages there is really no issue - if something breaks in XLConnect then the authors are probably importing the wrong function in their namespace (I still didn't see a reproducible example, though). The only difference is for interactive use so not having conflicting J() [if possible] would be actually useful there, since J() in rJava is primarily intended for interactive use.

Yes that's what I wrote above isn't it? i.e.

>> It's users using J() in their own code, iiuc.
>> "J is the high-level access to Java."

Not just interactive use (i.e. at the R prompt) but inside their functions and scripts, too. Although, I don't know the rJava package at all. So why J() might be used for interactive use but not in functions and scripts isn't clear to me.

Any use of J from example(J) will serve as a reproducible example; e.g.,

    library(rJava)        # load rJava first
    library(data.table)   # then data.table
    J("java.lang.Double")

There is no error or warning, but the user would be returned a 1 row 1 column data.table rather than something related to Java. Then the errors/warnings follow from there.

The user can either load the packages the other way around, or, use ::

    library(rJava)        # load rJava first
    library(data.table)   # then data.table
    rJava::J("java.lang.Double")   # ok now
Re: [Rd] conflict between rJava and data.table
On Mar 1, 2013, at 11:40 AM, Matthew Dowle wrote:

> On 01.03.2013 16:13, Simon Urbanek wrote:
>> On Mar 1, 2013, at 8:03 AM, Matthew Dowle wrote:
>>> Simon Urbanek wrote :
>>>> Can you elaborate on the details as of where this will be a problem? Packages should not be affected since they should be importing the namespaces from the packages they use, so the only problem would be in a package that uses both data.table and rJava -- and this is easily resolved in the namespace of such package. So there is no technical reason why you can't have multiple definitions of J - that's what namespaces are for.
>>>
>>> Right. It's users using J() in their own code, iiuc. rJava's manual says "J is the high-level access to Java." When they use J() on its own they probably want the rJava one, but if data.table is higher they get that one. They don't want to have to write out rJava::J(...).
>>>
>>> It is not just rJava but package XLConnect, too. If there's a better way would be interested but I didn't mind removing J from data.table.
>>
>> For packages there is really no issue - if something breaks in XLConnect then the authors are probably importing the wrong function in their namespace (I still didn't see a reproducible example, though). The only difference is for interactive use so not having conflicting J() [if possible] would be actually useful there, since J() in rJava is primarily intended for interactive use.
>
> Yes that's what I wrote above isn't it? i.e.
>
>>> It's users using J() in their own code, iiuc.
>>> "J is the high-level access to Java."
>
> Not just interactive use (i.e. at the R prompt) but inside their functions and scripts, too. Although, I don't know the rJava package at all. So why J() might be used for interactive use but not in functions and scripts isn't clear to me.
>
> Any use of J from example(J) will serve as a reproducible example; e.g.,
>
>    library(rJava)        # load rJava first
>    library(data.table)   # then data.table
>    J("java.lang.Double")
>
> There is no error or warning, but the user would be returned a 1 row 1 column data.table rather than something related to Java. Then the errors/warnings follow from there.
>
> The user can either load the packages the other way around, or, use ::
>
>    library(rJava)        # load rJava first
>    library(data.table)   # then data.table
>    rJava::J("java.lang.Double")   # ok now

Matt,

there are two entirely separate uses

a) interactive use
b) use in packages

you are describing a) and as I said in the latter part above J() in rJava is meant for that so it would be useful to not have a conflict there. However, in the first part of my e-mail I was referring to b) where there is no conflict, because packages define which package a symbol will come from, so the user search path plays no role. Today, all packages should be using imports so search path pollution should no longer be an issue, so the order in which the user attached packages to their search path won't affect the functionality of the packages (that's why namespaces are mandatory). Therefore, if XLConnect breaks (again, I don't know, I didn't see it) due to the order on the search path, it indicates there is a bug in its namespace as it's apparently importing the wrong J - it should be importing it from rJava and not data.table.

Is that more clear?

Cheers,
Simon
Re: [Rd] conflict between rJava and data.table
On 01.03.2013 20:19, Simon Urbanek wrote:
> On Mar 1, 2013, at 11:40 AM, Matthew Dowle wrote:
>> On 01.03.2013 16:13, Simon Urbanek wrote:
>>> On Mar 1, 2013, at 8:03 AM, Matthew Dowle wrote:
>>>> Simon Urbanek wrote :
>>>>> Can you elaborate on the details as of where this will be a problem? Packages should not be affected since they should be importing the namespaces from the packages they use, so the only problem would be in a package that uses both data.table and rJava -- and this is easily resolved in the namespace of such package. So there is no technical reason why you can't have multiple definitions of J - that's what namespaces are for.
>>>>
>>>> Right. It's users using J() in their own code, iiuc. rJava's manual says "J is the high-level access to Java." When they use J() on its own they probably want the rJava one, but if data.table is higher they get that one. They don't want to have to write out rJava::J(...).
>>>>
>>>> It is not just rJava but package XLConnect, too. If there's a better way would be interested but I didn't mind removing J from data.table.
>>>
>>> For packages there is really no issue - if something breaks in XLConnect then the authors are probably importing the wrong function in their namespace (I still didn't see a reproducible example, though). The only difference is for interactive use so not having conflicting J() [if possible] would be actually useful there, since J() in rJava is primarily intended for interactive use.
>>
>> Yes that's what I wrote above isn't it? i.e.
>>
>>>> It's users using J() in their own code, iiuc.
>>>> "J is the high-level access to Java."
>>
>> Not just interactive use (i.e. at the R prompt) but inside their functions and scripts, too. Although, I don't know the rJava package at all. So why J() might be used for interactive use but not in functions and scripts isn't clear to me.
>>
>> Any use of J from example(J) will serve as a reproducible example; e.g.,
>>
>>    library(rJava)        # load rJava first
>>    library(data.table)   # then data.table
>>    J("java.lang.Double")
>>
>> There is no error or warning, but the user would be returned a 1 row 1 column data.table rather than something related to Java. Then the errors/warnings follow from there.
>>
>> The user can either load the packages the other way around, or, use ::
>>
>>    library(rJava)        # load rJava first
>>    library(data.table)   # then data.table
>>    rJava::J("java.lang.Double")   # ok now
>
> Matt,
>
> there are two entirely separate uses
>
> a) interactive use
> b) use in packages
>
> you are describing a) and as I said in the latter part above J() in rJava is meant for that so it would be useful to not have a conflict there.

Yes (a) is the problem. Good, so I did the right thing in July 2012 by starting to deprecate J in data.table when this problem was first reported.

> However, in the first part of my e-mail I was referring to b) where there is no conflict, because packages define which package a symbol will come from, so the user search path plays no role. Today, all packages should be using imports so search path pollution should no longer be an issue, so the order in which the user attached packages to their search path won't affect the functionality of the packages (that's why namespaces are mandatory). Therefore, if XLConnect breaks (again, I don't know, I didn't see it) due to the order on the search path, it indicates there is a bug in its namespace as it's apparently importing the wrong J - it should be importing it from rJava and not data.table.
>
> Is that more clear?

Yes, thanks. (b) isn't a problem. rJava and XLConnect aren't breaking, the users aren't reporting that. It's merely problem (a); e.g. where end users of both rJava and data.table use J() in their own code.
[Rd] .Call interface: Use R SEXP as C mutable *char
Dear R Developers,

DISCLAIMER: I am new to package development in R and new to this list.

I am trying to do something along the lines of:

    SEXP test_fun (SEXP filename) {
        const char *inputfile = translateChar(STRING_ELT(filename, 0));
        int abc = some_function(inputfile);
        ...
    }

The code compiles fine, but I get a warning:

    "passing argument of 'some_function' discards qualifiers from pointer target type"

I read up on my issue and found this posting: https://stat.ethz.ch/pipermail/r-devel/2011-June/061221.html

I gather that 'some_function' (which is a function from another library) takes just 'char *' as its argument type, so the 'const' qualifier is discarded.

Of course I want my package to compile without warnings. All my other attempts led to similar 'discard' warnings (mainly initializations of helper variables).

What is the recommended approach here?

Best Regards,
Michael Bach
Re: [Rd] .Call interface: Use R SEXP as C mutable *char
Michael,

On Mar 1, 2013, at 4:53 PM, Michael Bach wrote:

> Dear R Developers,
>
> DISCLAIMER: I am new to package development in R and new to this list.
>
> I am trying to do something along the lines of:
>
>    SEXP test_fun (SEXP filename) {
>        const char *inputfile = translateChar(STRING_ELT(filename, 0));
>        int abc = some_function(inputfile);
>        ...
>    }
>
> The code compiles fine, but I get a warning:
> "passing argument of 'some_function' discards qualifiers from pointer target type"
>
> I read up on my issue and found this posting:
> https://stat.ethz.ch/pipermail/r-devel/2011-June/061221.html
>
> I gather that the 'some_function' (which is a function from another library) takes just '*char' as argument type so the 'const' qualifier is discarded.
>
> Of course I want my package to compile without warnings. All my other attempts led to similar 'discard' warnings (mainly initializations of helper variables).
>
> What is the recommended approach here?

Well, it really depends on some_function. The issue here is that the inputfile you get is immutable (aka read-only). However, the warning tells you that some_function() declares that it wants to modify its input, so you cannot pass an immutable object to it. So there are two options (rather just one, really ;)):

a) some_function() really means it, you have to create a copy - there are many ways to do it, this is just one of them, pick your best

    static char buf[512];
    if (strlen(inputfile) + 1 > sizeof(buf)) Rf_error("File name is too long");
    strcpy(buf, inputfile);
    int abc = some_function(buf);

b) some_function() doesn't really mean it - it's just a bug in the declaration and the author really meant

    int some_function(const char *fn)

This is dangerous, because you have to know for sure that this is a bug that will be fixed. Meanwhile you can work around the bug with

    int abc = some_function((char*) inputfile);

but that will remove all checking so if some_function() decides to actually modify the argument (which it legally can as it was telling you it will), you are in deep trouble, because memory is being corrupted affecting the whole R. So don't do this!

Cheers,
Simon
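Option (a) above can be tried outside of R with a small stand-alone sketch. Everything here is an assumption made for illustration: `some_function` is a stand-in that really does modify its argument, `call_with_copy` is a hypothetical wrapper name, and a return code replaces Rf_error() so that the block compiles without R's headers.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for the third-party function from the thread (hypothetical
 * behaviour): it takes a non-const pointer and modifies the string in
 * place, replacing every '\\' with '/'. Returns the number of
 * characters it changed. */
static int some_function(char *fn)
{
    int changed = 0;
    for (; *fn != '\0'; fn++) {
        if (*fn == '\\') {
            *fn = '/';
            changed++;
        }
    }
    return changed;
}

/* Hypothetical wrapper: copy the read-only string into a mutable,
 * caller-supplied buffer and pass the copy on. Returns -1 when the
 * buffer is too small, otherwise the result of some_function(). */
static int call_with_copy(const char *inputfile, char *buf, size_t bufsize)
{
    size_t len = strlen(inputfile);
    if (len + 1 > bufsize)
        return -1;                     /* package code would call Rf_error() */
    memcpy(buf, inputfile, len + 1);   /* copy including the terminating NUL */
    return some_function(buf);         /* the original string is untouched */
}
```

Inside a package, a buffer obtained from R_alloc() sized to strlen(inputfile) + 1 would avoid the fixed length limit; the copy-then-call pattern stays the same.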