Re: [Rd] R-devel does not update the C++ returned variables

2015-03-03 Thread Martin Maechler
> Hervé Pagès 
> on Mon, 2 Mar 2015 13:00:47 -0800 writes:

> Hi,
> On 03/02/2015 12:18 PM, Dénes Tóth wrote:
>> 
>> 
>> On 03/02/2015 04:37 PM, Martin Maechler wrote:
>>> 
 On 2 March 2015 at 09:09, Duncan Murdoch wrote:
 | I generally recommend that people use Rcpp, which hides a lot of the
 | details.  It will generate your .Call calls for you, and generate the
 | C++ code that receives them; you just need to think about the real
 | problem, not the interface.  It has its own learning curve, but I think
 | it is easier than using the low-level code that you need to work with .Call.
>>> 
 Thanks for that vote, and I second that.
>>> 
 And these days the learning curve is a lot flatter than it was a decade ago:
>>> 
R> Rcpp::cppFunction("NumericVector doubleThis(NumericVector x) {
 return(2*x); }")
R> doubleThis(c(1,2,3,21,-4))
 [1]  2  4  6 42 -8
R> 
>>> 
 That defined, compiled, loaded and run/illustrated a simple function.
>>> 
 Dirk
>>> 
>>> Indeed impressive,  ... and it also works with integer vectors,
>>> something that is also not 100% trivial when working with compiled code.
>>> 
>>> When testing that, I went a step further:
>>> 
>>> ## now "test":
>>> require(microbenchmark)
>>> i <- 1:10
>> 
>> Note that the relative speed of the algorithms also depends on the size
>> of the input vector. i + i becomes the winner for longer vectors (e.g. i
>> <- 1:1e6), but a proper Rcpp version is still approximately twice as fast.

> The difference in speed is probably due to the fact that R does safe
> arithmetic. C or C++ do not:

>> doubleThisInt(i)
> [1]  2147483642  2147483644  2147483646  NA -2147483646 
> -2147483644

>> 2L * i
> [1] 2147483642 2147483644 2147483646 NA NA NA
> Warning message:
> In 2L * i : NAs produced by integer overflow

> H.

Exactly, excellent, Hervé!

Luke also told me so in a private message.
Also, 'i + i' looks up 'i' twice, which is relatively costly
for very small i, as in my example.

This ("no safe integer arithmetic in C, but in R") is another
good example {as Martin Morgan's}  why using 
Rccp -- or .Call() directly -- may be a too sharp edged sword and
maybe should be advocated for good programmers only.
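
For illustration only -- a minimal sketch (not from the thread) of how one
might make the Rcpp integer version overflow-aware; the INT_MAX/2 guard and
the name doubleThisIntSafe are my assumptions, not something Rcpp does for you:

    Rcpp::cppFunction(includes = "#include <climits>", code = '
        IntegerVector doubleThisIntSafe(IntegerVector x) {
            int n = x.size();
            IntegerVector out(n);
            for (int i = 0; i < n; i++) {
                // propagate NA, and return NA where 2*x would overflow int
                if (x[i] == NA_INTEGER || x[i] > INT_MAX / 2 || x[i] < -(INT_MAX / 2))
                    out[i] = NA_INTEGER;
                else
                    out[i] = 2 * x[i];
            }
            return out;
        }')
    doubleThisIntSafe(c(1073741823L, 1073741824L, NA))
    ## [1] 2147483646         NA         NA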

Martin


>> 
>> Rcpp::cppFunction("NumericVector doubleThisNum(NumericVector x) {
>> return(2*x); }")
>> Rcpp::cppFunction("IntegerVector doubleThisInt(IntegerVector x) {
>> return(2*x); }")
>> i <- 1:1e6
>> mb <- microbenchmark::microbenchmark(doubleThisNum(i), doubleThisInt(i),
>> i*2, 2*i, i*2L, 2L*i, i+i, times=100)
>> plot(mb, log="y", notch=TRUE)
>> 
>> 
>>> (mb <- microbenchmark(doubleThis(i), i*2, 2*i, i*2L, 2L*i, i+i,
>>> times=2^12))
>>> ## Lynne (i7; FC 20), R Under development ... (2015-03-02 r67924):
>>> ## Unit: nanoseconds
>>> ##           expr min  lq      mean median   uq   max neval cld
>>> ##  doubleThis(i) 762 985 1319.5974   1124 1338 17831  4096   b
>>> ##          i * 2 124 151  258.4419    164  221     4  4096  a
>>> ##          2 * i 127 154  266.4707    169  216 20213  4096  a
>>> ##         i * 2L 143 164  250.6057    181  234 16863  4096  a
>>> ##         2L * i 144 177  269.5015    193  237 16119  4096  a
>>> ##          i + i 152 183  272.6179    199  243 10434  4096  a
>>> 
>>> plot(mb, log="y", notch=TRUE)
>>> ## hmm, looks like even the simple arithm. differ slightly ...
>>> ##
>>> ## ==> zoom in:
>>> plot(mb, log="y", notch=TRUE, ylim = c(150,300))
>>> 
>>> dev.copy(png, file="mbenchm-doubling.png")
>>> dev.off() # [ <- why do I need this here for png ??? ]
>>> ##--> see the appended *png graphic
>>> 
>>> Those who've learnt EDA, or otherwise know about boxplot notches, will
>>> know that they provide somewhat informal but robust pairwise tests at an
>>> approximate 5% level.
>>> From these, one *could* - possibly wrongly - conclude that
>>> 'i * 2' is significantly faster than both 'i * 2L' and also
>>> 'i + i', which I find astonishing, given that i is integer here...
>>> 
>>> Probably no reason for deep thoughts here, but if someone is
>>> enticed, this may be slightly interesting to read.
>>> 
>>> Martin Maechler, ETH Zurich
>>> 
>>> 
>>> 
>>> __
>>> R-devel@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>> 
>> 
>> __
>> R-devel@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-devel

> -- 
> Hervé Pagès

> Program in Computational Biology
> Division of Public Health Scien

Re: [Rd] Import data set from another package?

2015-03-03 Thread Prof Brian Ripley

On 02/03/2015 22:48, Therneau, Terry M., Ph.D. wrote:

I've moved nlme from Depends to Imports in my coxme package. However, a
few of the examples for lmekin use one of the data sets from nlme.  This
is on purpose, to show how the results are the same and how they differ.

  If I use  data(nlme::ergoStool)  the data is not found,
data(nlme:::ergoStool) does no better.
  If I add importFrom(nlme, "ergoStool") the error message is that
ergoStool is not exported.

There likely is a simple way, but I currently don't see it.


There were some off-the-mark suggestions in this thread.  If you just 
want a dataset from a package, use


data("ergoStool", package = "nlme")

In particular, it is somewhat wasteful to load a large namespace like 
nlme when it is not needed.
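
For instance, a minimal sketch (mine, not from the thread) of how an example
could pull the dataset in without attaching nlme; the requireNamespace()
guard is only an assumption, useful if nlme were merely suggested:

    if (requireNamespace("nlme", quietly = TRUE)) {
        data("ergoStool", package = "nlme")
        head(ergoStool)
    }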



--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Emeritus Professor of Applied Statistics, University of Oxford
1 South Parks Road, Oxford OX1 3TG, UK

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Martin Maechler
Diverted from R-help :
 as it gets into musing about new R language "primitives"

> William Dunlap 
> on Fri, 27 Feb 2015 08:04:36 -0800 writes:

> You could define functions like

> is.true <- function(x) !is.na(x) & x
> is.false <- function(x) !is.na(x) & !x

> and use them in your selections.  E.g.,
>> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>> x[is.true(x$c >= 6), ]
> a  b  c
> 7   7  8  7
> 10 10 11 10

> Bill Dunlap
> TIBCO Software
> wdunlap tibco.com

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"

namespace-hidden for a while [note the comment on the last one!],
and has used them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast 

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
## 
## anyFalse <- function(x) isTRUE(any(!x))   ## ~= any0
## any0 <- function(x) isTRUE(any(x == 0))   ## ~= anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason *not* to go there (to internal C code) at all in R is that
S3 and S4 dispatch for '==' ('!=', etc., the 'Compare' group generics)
and 'is.na()' is well established, and package writers have
programmed methods for these.
Ensuring that S3 and S4 dispatch also works "correctly" inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0()  would no longer be
equivalent with  !is.na(x) & x == 0
as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() method.

OTOH, simple R versions such as your 'is.true' (called 'is1'
inside Matrix) may be optimizable a bit by the byte compiler (and
JIT and other such tricks) and still keep the full
semantics, including correct method dispatch.
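
As a small, hedged illustration of that point (mine, not Martin's): the
plain-R is.true() goes through whatever comparison method the object
defines; the toy S3 class "pct" and its '>=' method below are pure
assumptions made up for the example:

    is.true <- function(x) !is.na(x) & x
    pct <- structure(c(10, NA, 80), class = "pct")                  # percentages in 0..100
    ">=.pct" <- function(e1, e2) unclass(e1) >= unclass(e2) * 100   # compare against a fraction
    is.true(pct >= 0.5)
    ## [1] FALSE FALSE  TRUE   # the '>=.pct' method was honoured; NA dropped to FALSE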

Martin Maechler, ETH Zurich


> On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
> dimitri.liakhovit...@gmail.com> wrote:

>> Thank you very much, Duncan.
>> All this being said:
>> 
>> What would you say is the most elegant and most safe way to solve such
>> a seemingly simple task?
>> 
>> Thank you!
>> 
>> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
>>  wrote:
>> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
>> >> So, Duncan, do I understand you correctly:
>> >>
>> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
>> >> a logical value of NA.
>> >
>> > Yes, when x$x is NA.  (Though I think you meant x$c.)
>> >
>> >> When this logical value is applied to a row, the R says: hell, I don't
>> >> know if I should keep it or not, so, just in case, I am going to keep
>> >> it, but I'll replace all the values in this row with NAs?
>> >
>> > Yes.  Indexing with a logical NA is probably a mistake, and this is one
>> > way to signal it without actually triggering a warning or error.
>> >
>> > BTW, I should have mentioned that the example where you indexed using
>> > -which(x$c>=6) is a bad idea:  if none of the entries were 6 or more,
>> > this would be indexing with an empty vector, and you'd get nothing, not
>> > everything.
>> >
>> > Duncan Murdoch
>> >
>> >
>> >>
>> >> On Fri, Feb 27, 2015 at 9:13 AM, Duncan Murdoch
>> >>  wrote:
>> >>> On 27/02/2015 9:04 AM, Dimitri Liakhovitski wrote:
>>  I know how to get the output I need, but I would benefit from an
>>  explanation why R behaves the way it does.
>> 
>>  # I have a data frame x:
>>  x = data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>  x
>>  # I want to toss rows in x that contain values >=6. But I don't want
>>  to toss my NAs there.
>> 
>>  subset(x,c<6) # Works correctly, but removes NAs in c, understand 
why
>>  x[which(x$c<6),] # Works correctly, but removes NAs in c, understand
>> why
>>  x[-which(x$c>=6),] # output I need
>> 
>>  # Here is my question: why does the following line replace the 
values
>>  of all rows that contain an NA # in x$c with NAs?
>> 
>>  x[x$c<6,]  # Leaves rows with c=NA, but makes the whole row an NA.
>> Why???
>>  x[(x$c<6) | is.na(x$c),] # output I need - I have to be
>> super-explicit
>> 
>>  Thank you very much!
>> >>>
>> >>> Most of your examples (except the ones using which()) are doing 
logical
>> >>> indexing.  In logical indexing, TRUE keeps a line, FALSE drops the
>> line,
>> >>> and NA returns NA.  Since "x$c < 6" is NA if x$c is NA, you get the
>> >>> third kind of indexing.
>> >>>
>> >>> Your last example 

[Rd] Asking for tasks of summer code 2015

2015-03-03 Thread han cao
Hey everyone:
I am a Master's student at Saarland University, Germany, majoring in
Bioinformatics. I am interested in statistical learning, which will also be
my main line of work in the future, implemented in R. So I'd like to join
Google Summer of Code this year by taking on tasks in your community. However,
I cannot find whether any tasks are available for this year -- can
anyone tell me?


Hank Cao

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Asssistance

2015-03-03 Thread Evans Otieno Ochiaga
Hi to All,

I am building a package in R and whenever I run command "R CMD build OAR"
in the terminal, I get the following error:

* checking for file ‘OAR/DESCRIPTION’ ... OK
* preparing ‘OAR’:
* checking DESCRIPTION meta-information ... ERROR
Malformed Depends or Suggests or Imports or Enhances field.
Offending entries:
  R (>=3.0.2)
Entries must be names of packages optionally followed by '<=' or '>=',
white space, and a valid version number in parentheses.

See the information on DESCRIPTION files in section 'Creating R
packages' of the 'Writing R Extensions' manual.

This is my first time building a package with R, and it is very hard for me
to figure out where the problem is. I kindly ask for your assistance in
fixing the problem. Below is my function:

bcidata <- read.csv("~/Desktop/Files_for_Package/data.csv"); bcidata

Modelsfunc<- function(bcidata){

  occupancymean.data.frame <- NULL

  for (k in seq(2.5,250,by=2.5)){

i <- 1000/k

j <- 500/k

bcidata$Xgrid <- cut(bcidata$PX, breaks = i, include.lowest = T)

bcidata$Ygrid <- cut(bcidata$PY, breaks = j, include.lowest = T)

bcidata$IDgrid <- with(bcidata, interaction(Xgrid,Ygrid))

bcidata$IDNgrid <- factor(bcidata$IDgrid)

levels(bcidata$IDgrid) <- seq_along(levels(bcidata$IDgrid))

bcidata$count <- ave(bcidata$PX, bcidata$IDgrid, FUN = length)

aggregate <- aggregate(bcidata$PX,bcidata[,c("Xgrid","Ygrid","IDNgrid")],
FUN = length)

Totalgrids <- length(levels(bcidata$IDgrid))

Occupiedgrids <- length(aggregate$IDNgrid)

sum <- sum(aggregate$x)

TotalArea <- 50

Area <- (1000/i*500/j)

Occupancy <- (Occupiedgrids/Totalgrids)

Mean <- length(bcidata$Latin)/(Occupiedgrids)

Variance <- var(aggregate$x)

occupancymean.data.frame <- rbind(occupancymean.data.frame,
data.frame(Area, Totalgrids, Occupiedgrids, Occupancy, Mean, Variance))

  }

  occupancymean.data.frame

  Occupancy <- occupancymean.data.frame$Occupancy

  Mean <- occupancymean.data.frame$Mean

  poission <- nls(Occupancy ~ 1-exp(-rho*Mean), start = list(rho = 2.1),
data = occupancymean.data.frame)

  nachman <- nls(Occupancy ~ 1-exp(-alpha*Mean^beta), start = list(alpha =
0.2, beta = 0.1), data = occupancymean.data.frame)

  logistic <- nls(Occupancy ~ (alpha*Mean^beta)/(1+alpha*Mean^beta), start
= list(alpha = 0.2, beta = 0.1),data = occupancymean.data.frame)

  nbd <- nls(Occupancy ~ 1-(1+(Mean)/k)^-k, start = list(k = 1), data =
occupancymean.data.frame)

  power <- nls(Occupancy ~ alpha*Mean^beta, start = list(alpha = 0.2, beta=
0.1), data = occupancymean.data.frame)

  inbd <- nls(Occupancy ~
1-(alpha*(Mean^(beta-1)))^(Mean/(1-alpha*Mean^(beta-1))), start =
list(alpha = 0.2, beta = 0.3),

  data = occupancymean.data.frame)

  fnbd <- nls(Occupancy ~ 1- (gamma(N +
k/(Mean*A/N)-k)*gamma(k/(Mean*A/N)))/(gamma(k/(Mean*A/N)-k)*gamma(N+k/(Mean*A/N))),


  start = list(k = 0.2, A = 0.1, N = 0.2), data =  occupancymean
.data.frame)

  bayesianII <- nls(Occupancy ~ 1-(theta*beta^(2*(TotalArea
*Mean/sum)^0.5)*delta^(TotalArea*Mean/sum)), start = list(theta=0.9956,
beta=1, delta=1), data = occupancymean.data.frame)


  return(list(summary(poission), summary(nachman), summary(logistic),
summary(nbd),

  summary(power), summary(inbd), summary(fnbd), summary(
bayesianII)))

}

Modelsfunc(bcidata)

Your assistance will be highly appreciated. Thanks in advance.

Regards,


*Evans Ochiaga*

*African Institute for Mathematical Sciences*

*6 Melrose Road*

*Muizenberg, South Africa*

*MSc in Mathematical Sciences*
*+27 84 61 69 183*

*"When I cannot understand my Father’s leading, And it seems to be but hard
and cruel fate, Still I hear that gentle whisper ever pleading, God is
working, God is faithful—Only wait."*

[[alternative HTML version deleted]]

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Asssistance

2015-03-03 Thread Duncan Murdoch
On 03/03/2015 5:47 AM, Evans Otieno Ochiaga wrote:
> Hi to All,
> 
> I am building a package in R and whenever I run command "R CMD build OAR"
> in the terminal, I get the following error:
> 
> * checking for file ‘OAR/DESCRIPTION’ ... OK
> * preparing ‘OAR’:
> * checking DESCRIPTION meta-information ... ERROR
> Malformed Depends or Suggests or Imports or Enhances field.
> Offending entries:
>   R (>=3.0.2)
> Entries must be names of packages optionally followed by '<=' or '>=',
> white space, and a valid version number in parentheses.

That looks okay; I'm guessing it is out of place.  Can you show us your
DESCRIPTION file?

Duncan Murdoch

> 
> See the information on DESCRIPTION files in section 'Creating R
> packages' of the 'Writing R Extensions' manual.
> 
> This is my first time to build a package using R and it's very hard for me
> to figure out where the problem is. I kindly call for your assistance in
> fixing the problem. Below is my function;
> 
> bcidata <- read.csv("~/Desktop/Files_for_Package/data.csv"); bcidata
> 
> Modelsfunc<- function(bcidata){
> 
>   occupancymean.data.frame <- NULL
> 
>   for (k in seq(2.5,250,by=2.5)){
> 
> i <- 1000/k
> 
> j <- 500/k
> 
> bcidata$Xgrid <- cut(bcidata$PX, breaks = i, include.lowest = T)
> 
> bcidata$Ygrid <- cut(bcidata$PY, breaks = j, include.lowest = T)
> 
> bcidata$IDgrid <- with(bcidata, interaction(Xgrid,Ygrid))
> 
> bcidata$IDNgrid <- factor(bcidata$IDgrid)
> 
> levels(bcidata$IDgrid) <- seq_along(levels(bcidata$IDgrid))
> 
> bcidata$count <- ave(bcidata$PX, bcidata$IDgrid, FUN = length)
> 
> aggregate <- aggregate(bcidata$PX,bcidata[,c("Xgrid","Ygrid","IDNgrid")],
> FUN = length)
> 
> Totalgrids <- length(levels(bcidata$IDgrid))
> 
> Occupiedgrids <- length(aggregate$IDNgrid)
> 
> sum <- sum(aggregate$x)
> 
> TotalArea <- 50
> 
> Area <- (1000/i*500/j)
> 
> Occupancy <- (Occupiedgrids/Totalgrids)
> 
> Mean <- length(bcidata$Latin)/(Occupiedgrids)
> 
> Variance <- var(aggregate$x)
> 
> occupancymean.data.frame <- rbind(occupancymean.data.frame,
> data.frame(Area, Totalgrids, Occupiedgrids, Occupancy, Mean, Variance))
> 
>   }
> 
>   occupancymean.data.frame
> 
>   Occupancy <- occupancymean.data.frame$Occupancy
> 
>   Mean <- occupancymean.data.frame$Mean
> 
>   poission <- nls(Occupancy ~ 1-exp(-rho*Mean), start = list(rho = 2.1),
> data = occupancymean.data.frame)
> 
>   nachman <- nls(Occupancy ~ 1-exp(-alpha*Mean^beta), start = list(alpha =
> 0.2, beta = 0.1), data = occupancymean.data.frame)
> 
>   logistic <- nls(Occupancy ~ (alpha*Mean^beta)/(1+alpha*Mean^beta), start
> = list(alpha = 0.2, beta = 0.1),data = occupancymean.data.frame)
> 
>   nbd <- nls(Occupancy ~ 1-(1+(Mean)/k)^-k, start = list(k = 1), data =
> occupancymean.data.frame)
> 
>   power <- nls(Occupancy ~ alpha*Mean^beta, start = list(alpha = 0.2, beta=
> 0.1), data = occupancymean.data.frame)
> 
>   inbd <- nls(Occupancy ~
> 1-(alpha*(Mean^(beta-1)))^(Mean/(1-alpha*Mean^(beta-1))), start =
> list(alpha = 0.2, beta = 0.3),
> 
>   data = occupancymean.data.frame)
> 
>   fnbd <- nls(Occupancy ~ 1- (gamma(N +
> k/(Mean*A/N)-k)*gamma(k/(Mean*A/N)))/(gamma(k/(Mean*A/N)-k)*gamma(N+k/(Mean*A/N))),
> 
> 
>   start = list(k = 0.2, A = 0.1, N = 0.2), data =  occupancymean
> .data.frame)
> 
>   bayesianII <- nls(Occupancy ~ 1-(theta*beta^(2*(TotalArea
> *Mean/sum)^0.5)*delta^(TotalArea*Mean/sum)), start = list(theta=0.9956,
> beta=1, delta=1), data = occupancymean.data.frame)
> 
> 
>   return(list(summary(poission), summary(nachman), summary(logistic),
> summary(nbd),
> 
>   summary(power), summary(inbd), summary(fnbd), summary(
> bayesianII)))
> 
> }
> 
> Modelsfunc(bcidata)
> 
> Your assistance will be highly appreciated. Thanks in advance.
> 
> Regards,
> 
> 
> *Evans Ochiaga*
> 
> *African Institute for Mathematical Sciences*
> 
> *6 Melrose Road*
> 
> *Muizenberg, South Africa*
> 
> *Msc in Mathematical Sciences+27 84 61 69 183 *
> 
> *"When I cannot understand my Father’s leading, And it seems to be but hard
> and cruel fate, Still I hear that gentle whisper ever pleading, God is
> working, God is faithful—Only wait."*
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Asssistance

2015-03-03 Thread Gregor Kastner
Hi Evans,

> * checking for file ‘OAR/DESCRIPTION’ ... OK
> * preparing ‘OAR’:
> * checking DESCRIPTION meta-information ... ERROR
> Malformed Depends or Suggests or Imports or Enhances field.
> Offending entries:
>   R (>=3.0.2)
> Entries must be names of packages optionally followed by '<=' or '>=',
> white space, and a valid version number in parentheses.

The _white space_ (see explanation above) seems to be missing.

Try "R (>= 3.0.2)"

Best,
/g

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Import data set from another package?

2015-03-03 Thread Therneau, Terry M., Ph.D.

As I expected: there was something simple and obvious, which I somehow could 
not see.
Thanks for the pointer.

Terry T.


On 03/03/2015 03:12 AM, Prof Brian Ripley wrote:

On 02/03/2015 22:48, Therneau, Terry M., Ph.D. wrote:

I've moved nlme from Depends to Imports in my coxme package. However, a
few of the examples for lmekin use one of the data sets from nlme.  This
is on purpose, to show how the results are the same and how they differ.

  If I use  data(nlme::ergoStool)  the data is not found,
data(nlme:::ergoStool) does no better.
  If I add importFrom(nlme, "ergoStool") the error message is that
ergoStool is not exported.

There likely is a simple way, but I currently don't see it.


There were some off-the-mark suggestions in this thread.  If you just want a
dataset from a package, use

data("ergoStool", package = "nlme")

In particular, it is somewhat wasteful to load a large namespace like nlme
when it is not needed.




__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Asking for tasks of summer code 2015

2015-03-03 Thread beleites,claudia
On Mo, 2015-03-02 at 16:53 +0100, han cao wrote:
> Hey everyone:
> I am a Master's student at Saarland University, Germany, majoring in
> Bioinformatics. I am interested in statistical learning, which will also be
> my main line of work in the future, implemented in R. So I'd like to join
> Google Summer of Code this year by taking on tasks in your community. However,
> I cannot find whether any tasks are available for this year -- can
> anyone tell me?

The R GSoC pages moved to GitHub:
https://github.com/rstats-gsoc/gsoc2015/wiki


Best,

Claudia

> 
> 
> Hank Cao
> 
>   [[alternative HTML version deleted]]
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Feature request: copy attributes in gzcon

2015-03-03 Thread Jeroen Ooms
The `gzcon` function both modifies and copies a connection object:

  # compressed text
  con1 <- url("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz";)
  con2 <- gzcon(con1)

  # almost indistinguishable
  con1==con2
  identical(summary(con2), summary(con1))

  # both support gzip
  readLines(con1, n = 3)
  readLines(con2, n = 3)

  # opening one opens both
  isOpen(con2)
  open(con1)
  isOpen(con2)

In the example, `con1` and `con2` are two different objects
interfacing the same connection. It might seem as if gzcon has simply
returned the modified connection object, but the documentation
explains that it in fact creates a copy referencing the same
connection but with a "modified internal structure".

It is unclear to me how `con1` is different from `con2`, but given
that they represent one and the same connection, would it be possible
to make gzcon copy over attributes from the input connection to the
output object?

This would allow custom connection implementations such as the curl
package to use attributes for storing additional metadata about the
connection. Currently those attributes get dropped after calling gzcon
on the connection:

  library(curl)
  con <- curl("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz";)
  attr(con, "foo") <- "bar"

  con <- gzcon(con)
  attr(con, "foo")

It would be very helpful if gzcon would instead copy attributes onto
the output object, such that any potential meta-data about the
connection as stored in attributes gets retained.
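
Until/unless gzcon() does this, a hedged sketch of a user-level workaround
(the helper name gz_keep_attrs is hypothetical, not part of curl or base R):

    gz_keep_attrs <- function(con) {
        out <- gzcon(con)
        ## copy attributes present on the input but absent on the new object
        extra <- setdiff(names(attributes(con)), names(attributes(out)))
        for (a in extra) attr(out, a) <- attr(con, a)
        out
    }

    con <- curl("http://www.stats.ox.ac.uk/pub/datasets/csb/ch12.dat.gz")
    attr(con, "foo") <- "bar"
    con <- gz_keep_attrs(con)
    attr(con, "foo")   # "bar"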

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès



On 03/03/2015 02:28 AM, Martin Maechler wrote:

Diverted from R-help :
 as it gets into musing about new R language "primitives"


William Dunlap 
 on Fri, 27 Feb 2015 08:04:36 -0800 writes:


 > You could define functions like

 > is.true <- function(x) !is.na(x) & x
 > is.false <- function(x) !is.na(x) & !x

 > and use them in your selections.  E.g.,
 >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
 >> x[is.true(x$c >= 6), ]
 > a  b  c
 > 7   7  8  7
 > 10 10 11 10

 > Bill Dunlap
 > TIBCO Software
 > wdunlap tibco.com

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"


Note that using %in% to block propagation of NAs is about 2x faster:

> x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
> microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
Unit: milliseconds
                     expr       min        lq      mean   median        uq      max neval
 as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028 40.36472   100
           !is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576 11.90916   100





namespace hidden for a while  [note the comment of the last one!]
and using them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
##
## anyFalse <- function(x) isTRUE(any(!x))## ~= any0
## any0 <- function(x) isTRUE(any(x == 0)) ## ~= anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all with R is that
S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics)
and 'is.na() have been known and package writers have
programmed methods for these.
To ensure that S3 and S4 dispatch works "correctly" also inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0() would no longer be
equivalent with  !is.na(x) & x == 0
as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na() method.


Excellent point. Thank you! It really makes a big difference for
developers who maintain a complex hierarchy of S4 classes and methods
when functions like is.true, anyFalse, etc..., which can be expressed in
terms of more basic operations like ==, !=, !, is.na, etc..., just work
out-of-the-box on objects for which these basic operations are defined.

There is conceptually a small set of "building blocks", at least for
objects with a vector-like or list-like semantic, that can be used
to formally describe the semantic of many functions in base R. This
is what the man page for anyNA does by saying:

  anyNA implements any(is.na(x))

even though the actual implementation differs. But that's OK, as long
as anyNA is equivalent to doing any(is.na(x)) on any object for which
the building block is.na() is implemented.

Unfortunately there is no clearly identified set of building blocks
in base R. For example, if I want the comparison operations to work
on my object, I need to implement ==, >, <, !=, <=, and >= (the
'Compare' group generics) even though it should be enough to implement
== and >=, because all the others can be described in terms of these
2 building blocks. unique/duplicated is another example (unique(x) is
conceptually x[!duplicated(x)]). And so on...
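
A small sketch in plain R of that "building block" reading (my illustration,
not from the mail):

    my_unique <- function(x) x[!duplicated(x)]    # unique() in terms of duplicated()
    my_ne     <- function(e1, e2) !(e1 == e2)     # "!=" in terms of "=="

    v <- c(3, 1, 3, NA, 1)
    identical(my_unique(v), unique(v))   # TRUE
    identical(my_ne(v, 1),  v != 1)      # TRUE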

Cheers,
H.



OTOH, simple R versions such as your  'is.true',  called 'is1'
inside Matrix maybe optimizable a bit by the byte compiler (and
jit and other such tricks) and still keep the full
semantic including correct method dispatch.

Martin Maechler, ETH Zurich


 > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
 > dimitri.liakhovit...@gmail.com> wrote:

 >> Thank you very much, Duncan.
 >> All this being said:
 >>
 >> What would you say is the most elegant and most safe way to solve such
 >> a seemingly simple task?
 >>
 >> Thank you!
 >>
 >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
 >>  wrote:
 >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
 >> >> So, Duncan, do I understand you correctly:
 >> >>
 >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so it returns
 >> >> a logical value of NA.
 >> >
 >> > Yes, when x$x is NA.  (Though I think you meant x$c.)
 >> >
 >> >> When this logical value is applied to a row, the R says: hell, I 
don't
 >> >> know if I should keep it or not, so, just in case, I am going to keep
 >> >> it, but I'll replace all the values in this row with NAs?
 >> >
 >> > Yes.  Indexing with a logical NA is probably a mistake, and this is 
one
 >> > way to signal it without actually trigge

Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Stephanie M. Gogarten



On 3/3/15 1:26 PM, Hervé Pagès wrote:



On 03/03/2015 02:28 AM, Martin Maechler wrote:

Diverted from R-help :
 as it gets into musing about new R language "primitives"


William Dunlap 
 on Fri, 27 Feb 2015 08:04:36 -0800 writes:


 > You could define functions like

 > is.true <- function(x) !is.na(x) & x
 > is.false <- function(x) !is.na(x) & !x

 > and use them in your selections.  E.g.,
 >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
 >> x[is.true(x$c >= 6), ]
 > a  b  c
 > 7   7  8  7
 > 10 10 11 10

 > Bill Dunlap
 > TIBCO Software
 > wdunlap tibco.com

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"


Note that using %in% to block propagation of NAs is about 2x faster:

 > x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
 > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
Unit: milliseconds
 expr   minlq  mean   medianuq
  as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028
!is.na(x) & x 11.202808 11.402437 11.469101 11.44848 11.517576
   max neval
  40.36472   100
  11.90916   100


Unfortunately %in% does not preserve matrix dimensions:

> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> dim(x)
[1] 50 10
> dim(!is.na(x) & x)
[1] 50 10
> dim(as.logical(x) %in% TRUE)
NULL

Stephanie







namespace hidden for a while  [note the comment of the last one!]
and using them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
##
## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0
## any0 <- function(x) isTRUE(any(x == 0))  ## ~= anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all with R is
that
S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics)
and 'is.na() have been known and package writers have
programmed methods for these.
To ensure that S3 and S4 dispatch works "correctly" also inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0() would no longer be
equivalent with  !is.na(x) & x == 0
as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na()
method.


Excellent point. Thank you! It really makes a big difference for
developers who maintain a complex hierarchy of S4 classes and methods,
when functions like is.true, anyFalse, etc..., which can be expressed in
terms of more basic operations like ==, !=, !, is.na, etc..., just work
out-of-the-box on objects for which these basic operations are defined.

There is conceptually a small set of "building blocks", at least for
objects with a vector-like or list-like semantic, that can be used
to formally describe the semantic of many functions in base R. This
is what the man page for anyNA does by saying:

   anyNA implements any(is.na(x))

even though the actual implementation differs, but that's ok, as long
as anyNA is equivalent to doing any(is.na(x)) on any object for which
building block is.na() is implemented.

Unfortunately there is no clearly identified set of building blocks
in base R. For example, if I want the comparison operations to work
on my object, I need to implement ==, >, <, !=, <=, and >= (the
'Compare' group generics) even though it should be enough to implement
== and >=, because all the others can be described in terms of these
2 building blocks. unique/duplicated is another example (unique(x) is
conceptually x[!duplicated(x)]). And so on...

Cheers,
H.



OTOH, simple R versions such as your  'is.true',  called 'is1'
inside Matrix maybe optimizable a bit by the byte compiler (and
jit and other such tricks) and still keep the full
semantic including correct method dispatch.

Martin Maechler, ETH Zurich


 > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
 > dimitri.liakhovit...@gmail.com> wrote:

 >> Thank you very much, Duncan.
 >> All this being said:
 >>
 >> What would you say is the most elegant and most safe way to
solve such
 >> a seemingly simple task?
 >>
 >> Thank you!
 >>
 >> On Fri, Feb 27, 2015 at 10:02 AM, Duncan Murdoch
 >>  wrote:
 >> > On 27/02/2015 9:49 AM, Dimitri Liakhovitski wrote:
 >> >> So, Duncan, do I understand you correctly:
 >> >>
 >> >> When I use x$x<6, R doesn't know if it's TRUE or FALSE, so
it returns
 >> >> a logical value of NA.
 >> >
 >> > Yes, when x$x is NA.  (Though I think you meant x$c.)
 >> >
 >> >> When this logical value is applied to a row, the R says:
hell, I don't
 >> >> know i

Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Gabriel Becker
Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions, because
it coerces to a logical vector:

> x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> dim(as.logical(x))
NULL

~G

On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten <
sdmor...@u.washington.edu> wrote:

>
>
> On 3/3/15 1:26 PM, Hervé Pagès wrote:
>
>>
>>
>> On 03/03/2015 02:28 AM, Martin Maechler wrote:
>>
>>> Diverted from R-help :
>>>  as it gets into musing about new R language "primitives"
>>>
>>>  William Dunlap 
  on Fri, 27 Feb 2015 08:04:36 -0800 writes:

>>>
>>>  > You could define functions like
>>>
>>>  > is.true <- function(x) !is.na(x) & x
>>>  > is.false <- function(x) !is.na(x) & !x
>>>
>>>  > and use them in your selections.  E.g.,
>>>  >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
>>>  >> x[is.true(x$c >= 6), ]
>>>  > a  b  c
>>>  > 7   7  8  7
>>>  > 10 10 11 10
>>>
>>>  > Bill Dunlap
>>>  > TIBCO Software
>>>  > wdunlap tibco.com
>>>
>>> Yes; the Matrix package has had these
>>>
>>> is0  <- function(x) !is.na(x) & x == 0
>>> isN0 <- function(x)  is.na(x) | x != 0
>>> is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"
>>>
>>
>> Note that using %in% to block propagation of NAs is about 2x faster:
>>
>>  > x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
>>  > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
>> Unit: milliseconds
>>  expr   minlq  mean   medianuq
>>   as.logical(x) %in% TRUE  6.034744  6.264382  6.999083  6.29488  6.346028
>> !is.na(x) & x 11.202808 11.402437 11.469101 11.44848
>> 11.517576
>>max neval
>>   40.36472 100
>>   11.90916   100
>>
>
> Unfortunately %in% does not preserve matrix dimensions:
>
> > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
> > dim(x)
> [1] 50 10
> > dim(!is.na(x) & x)
> [1] 50 10
> > dim(as.logical(x) %in% TRUE)
> NULL
>
> Stephanie
>
>
>
>>
>>
>>
>>> namespace hidden for a while  [note the comment of the last one!]
>>> and using them for readability in its own code.
>>>
>>> Maybe we should (again) consider providing some versions of
>>> these with R ?
>>>
>>> The Matrix package also has had fast
>>>
>>> allFalse <- all0 <- function(x) .Call(R_all0, x)
>>> anyFalse <- any0 <- function(x) .Call(R_any0, x)
>>> ##
>>> ## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0
>>> ## any0 <- function(x) isTRUE(any(x == 0))  ## ~= anyFalse
>>>
>>> namespace hidden as well, already, which probably could also be
>>> brought to base R.
>>>
>>> One big reason to *not* go there (to internal C code) at all with R is
>>> that
>>> S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group generics)
>>> and 'is.na() have been known and package writers have
>>> programmed methods for these.
>>> To ensure that S3 and S4 dispatch works "correctly" also inside
>>> such new internals is much less easily achieved, and so
>>> such a C-based internal function  is0() would no longer be
>>> equivalent with  !is.na(x) & x == 0
>>> as soon as 'x' is an "object" with a '==', 'Compare' and/or an is.na()
>>> method.
>>>
>>
>> Excellent point. Thank you! It really makes a big difference for
>> developers who maintain a complex hierarchy of S4 classes and methods,
>> when functions like is.true, anyFalse, etc..., which can be expressed in
>> terms of more basic operations like ==, !=, !, is.na, etc..., just work
>> out-of-the-box on objects for which these basic operations are defined.
>>
>> There is conceptually a small set of "building blocks", at least for
>> objects with a vector-like or list-like semantic, that can be used
>> to formally describe the semantic of many functions in base R. This
>> is what the man page for anyNA does by saying:
>>
>>anyNA implements any(is.na(x))
>>
>> even though the actual implementation differs, but that's ok, as long
>> as anyNA is equivalent to doing any(is.na(x)) on any object for which
>> building block is.na() is implemented.
>>
>> Unfortunately there is no clearly identified set of building blocks
>> in base R. For example, if I want the comparison operations to work
>> on my object, I need to implement ==, >, <, !=, <=, and >= (the
>> 'Compare' group generics) even though it should be enough to implement
>> == and >=, because all the others can be described in terms of these
>> 2 building blocks. unique/duplicated is another example (unique(x) is
>> conceptually x[!duplicated(x)]). And so on...
>>
>> Cheers,
>> H.
>>
>>
>>> OTOH, simple R versions such as your  'is.true',  called 'is1'
>>> inside Matrix maybe optimizable a bit by the byte compiler (and
>>> jit and other such tricks) and still keep the full
>>> semantic including correct method dispatch.
>>>
>>> Martin Maechler, ETH Zurich
>>>
>>>
>>>  > On Fri, Feb 27, 2015 at 7:27 AM, Dimitri Liakhovitski <
>>>  > dimitri.liakhov

Re: [Rd] [R] Why does R replace all row values with NAs

2015-03-03 Thread Hervé Pagès



On 03/03/2015 02:17 PM, Gabriel Becker wrote:

Stephanie,

Actually, it's as.logical that isn't preserving matrix dimensions,
because it coerces to a logical vector:

 > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE), nrow=50)
 > dim(as.logical(x))


It's true, as.logical() doesn't help here but Stephanie is right, %in%
does not preserve the dimensions either:

> dim(x %in% 1:5)
NULL

That's because match() itself doesn't preserve the dimensions:

> dim(match(x, 1:5))
NULL

So maybe my fast is.true() should be:

is.true <- function(x)
{
  ans <- as.logical(x) %in% TRUE
  if (is.null(dim(x))) {
names(ans) <- names(x)
  } else {
dim(ans) <- dim(x)
dimnames(ans) <- dimnames(x)
  }
  ans
}

or something like that...
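
A quick check of that sketch on a small named matrix (my example, not from
the mail):

    m <- matrix(c(1, NA, 0, 7), nrow = 2,
                dimnames = list(c("r1", "r2"), c("a", "b")))
    is.true(m > 0)
    ##        a     b
    ## r1  TRUE FALSE
    ## r2 FALSE  TRUE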

H.


NULL

~G

On Tue, Mar 3, 2015 at 2:09 PM, Stephanie M. Gogarten
mailto:sdmor...@u.washington.edu>> wrote:



On 3/3/15 1:26 PM, Hervé Pagès wrote:



On 03/03/2015 02:28 AM, Martin Maechler wrote:

Diverted from R-help :
 as it gets into musing about new R language "primitives"

William Dunlap <wdun...@tibco.com>
   on Fri, 27 Feb 2015 08:04:36 -0800 writes:


  > You could define functions like

  > is.true <- function(x) !is.na(x) & x
  > is.false <- function(x) !is.na(x) & !x

  > and use them in your selections.  E.g.,
  >> x <- data.frame(a=1:10,b=2:11,c=c(1,NA,3,NA,5,NA,7,NA,NA,10))
  >> x[is.true(x$c >= 6), ]
  > a  b  c
  > 7   7  8  7
  > 10 10 11 10

  > Bill Dunlap
  > TIBCO Software
  > wdunlap tibco.com 

Yes; the Matrix package has had these

is0  <- function(x) !is.na(x) & x == 0
isN0 <- function(x)  is.na(x) | x != 0
is1  <- function(x) !is.na(x) & x   # also == "isTRUE componentwise"


Note that using %in% to block propagation of NAs is about 2x faster:

  > x <- sample(c(NA_integer_, 1:1), 50, replace=TRUE)
  > microbenchmark(as.logical(x) %in% TRUE, !is.na(x) & x)
Unit: milliseconds
  expr   minlq  mean
  medianuq
   as.logical(x) %in% TRUE  6.034744  6.264382  6.999083
6.29488  6.346028
 !is.na (x) & x 11.202808 11.402437
11.469101 11.44848 11.517576
max neval
40.36472 100 
   11.90916   100


Unfortunately %in% does not preserve matrix dimensions:

 > x <- matrix(sample(c(NA_integer_, 1:100), 500, replace=TRUE),
nrow=50)
 > dim(x)
[1] 50 10
 > dim(!is.na(x) & x)
[1] 50 10
 > dim(as.logical(x) %in% TRUE)
NULL

Stephanie






namespace hidden for a while  [note the comment of the last
one!]
and using them for readability in its own code.

Maybe we should (again) consider providing some versions of
these with R ?

The Matrix package also has had fast

allFalse <- all0 <- function(x) .Call(R_all0, x)
anyFalse <- any0 <- function(x) .Call(R_any0, x)
##
## anyFalse <- function(x) isTRUE(any(!x)) ## ~= any0
## any0 <- function(x) isTRUE(any(x == 0))  ## ~=
anyFalse

namespace hidden as well, already, which probably could also be
brought to base R.

One big reason to *not* go there (to internal C code) at all
with R is
that
S3 and S4 dispatch for '==' ('!=', etc, the 'Compare' group
generics)
and 'is.na()' have been known and package
writers have
programmed methods for these.
To ensure that S3 and S4 dispatch works "correctly" also inside
such new internals is much less easily achieved, and so
such a C-based internal function  is0() would no longer be
equivalent with  !is.na(x) & x == 0
as soon as 'x' is an "object" with a '==', 'Compare' and/or
an is.na()
method.


Excellent point. Thank you! It really makes a big difference for
developers who maintain a complex hierarchy of S4 classes and
methods,
when functions like is.true, anyFalse, etc..., which can be
expressed in
terms of more basic operations like ==, !=, !, is.na, etc..., just work
out-of-the-box on objects for which these basic operations are
de

Re: [Rd] Errors on Windows with grep(fixed=TRUE) on UTF-8 strings

2015-03-03 Thread Winston Chang
After a bit more investigation, I think I've found the cause of the bug,
and I have a patch.

This bug happens with grep(), when:
* Running on Windows.
* The search uses fixed=TRUE.
* The search pattern is a single byte.
* The current locale has a multibyte encoding.

===
Here's an example that demonstrates the bug:

# First, create a 3-byte UTF-8 character
y <- rawToChar(as.raw(c(0xe6, 0xb8, 0x97)))
Encoding(y) <- "UTF-8"
y
# [1] "渗"

# In my default locale, grep with a single-char pattern and fixed=TRUE
# returns integer(0), as expected.
Sys.getlocale("LC_CTYPE")
# [1] "English_United States.1252"
grep("a", y, fixed = TRUE)
# integer(0)

# When using a multibyte locale, grep with a single-char
# pattern and fixed=TRUE results in an error.
Sys.setlocale("LC_CTYPE", "chinese")
grep("a", y, fixed = TRUE)
# Error in grep("a", y, fixed = TRUE) : invalid multibyte string at '<97>'


===

I believe the problem is in the main/grep.c file, in the fgrep_one
function. It tests first for a multi-byte character string locale
(`mbcslocale`), and then for `use_UTF8`, like so:

if (!useBytes && mbcslocale) {
...
} else if (!useBytes && use_UTF8) {
...
} else ...

This can be seen at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L668-L692

A similar pattern occurs in the fgrep_one_bytes function, at
https://github.com/wch/r-source/blob/e92b4c1cba05762480cd3898335144e5dd111cb7/src/main/grep.c#L718-L736


I believe that the test order should be reversed; it should test first
for `use_UTF8`, and then for `mbcslocale`. This pattern occurs in a few
places in grep.c. It looks like this:

if (!useBytes && use_UTF8) {
...
} else if (!useBytes && mbcslocale) {
...
} else ...


===
This patch does what I described; it simply tests for `use_UTF8` first,
and then `mbcslocale`, in both fgrep_one and fgrep_one_bytes. I made
this patch against the 3.1.2 sources, and tested the example code above.
In both cases, grep() returned integer(0), as expected.

(The reason I made this change against 3.1.2 is that I had problems
getting the current trunk to compile on both Linux and Windows.)


diff --git src/main/grep.c src/main/grep.c
index 6e6ec3e..348c63d 100644
--- src/main/grep.c
+++ src/main/grep.c
@@ -664,27 +664,27 @@ static int fgrep_one(const char *pat, const char *target,
}
return -1;
 }
-if (!useBytes && mbcslocale) { /* skip along by chars */
-   mbstate_t mb_st;
+if (!useBytes && use_UTF8) {
int ib, used;
-   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) {
if (next != NULL) *next = ib + plen;
return i;
}
-   used = (int) Mbrtowc(NULL,  target+ib, MB_CUR_MAX, &mb_st);
+   used = utf8clen(target[ib]);
if (used <= 0) break;
ib += used;
}
-} else if (!useBytes && use_UTF8) {
+} else if (!useBytes && mbcslocale) { /* skip along by chars */
+   mbstate_t mb_st;
int ib, used;
+   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) {
if (next != NULL) *next = ib + plen;
return i;
}
-   used = utf8clen(target[ib]);
+   used = (int) Mbrtowc(NULL,  target+ib, MB_CUR_MAX, &mb_st);
if (used <= 0) break;
ib += used;
}
@@ -714,21 +714,21 @@ static int fgrep_one_bytes(const char *pat, const char *target, int len,
if (*p == pat[0]) return i;
return -1;
 }
-if (!useBytes && mbcslocale) { /* skip along by chars */
-   mbstate_t mb_st;
+if (!useBytes && use_UTF8) { /* not really needed */
int ib, used;
-   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) return ib;
-   used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
+   used = utf8clen(target[ib]);
if (used <= 0) break;
ib += used;
}
-} else if (!useBytes && use_UTF8) { /* not really needed */
+} else if (!useBytes && mbcslocale) { /* skip along by chars */
+   mbstate_t mb_st;
int ib, used;
+   mbs_init(&mb_st);
for (ib = 0, i = 0; ib <= len-plen; i++) {
if (strncmp(pat, target+ib, plen) == 0) return ib;
-   used = utf8clen(target[ib]);
+   used = (int) Mbrtowc(NULL, target+ib, MB_CUR_MAX, &mb_st);
if (used <= 0) break;
ib += used;
}


-Winston

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel