Hi all,

In what encoding does format.POSIXct return its output? It doesn't seem to be UTF-8:
Sys.setlocale("LC_ALL", "Japanese_Japan.932")

times <- c("1970-01-01 01:00:00 UTC", "1970-02-02 22:00:00 UTC")
ampm <- format(as.POSIXct(times), format = "%p")

x <- gsub(">", "*", paste(ampm, collapse = "+>"))
y <- "午前+*午後"

identical(x, y)
# [1] TRUE

# But, confusingly, ...
charToRaw(x)
# [1] e5 8d 88 e5 89 8d 2b 2a e5 8d 88 e5 be 8c
charToRaw(y)
# [1] 8c df 91 4f 2b 2a 8c df 8c e3

# So there's at least a small bug with identical(), and this
# causes a problem when you attempt to do stuff with the string:
gsub("+", "*", x, fixed = T)
# Error in gsub("+", "*", x, fixed = T) :
#   invalid multibyte string at '<8c>'
gsub("+", "*", y, fixed = T)
# [1] "午前**午後"

My session info is:

R version 3.0.0 (2013-04-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)

locale:
[1] LC_COLLATE=Japanese_Japan.932  LC_CTYPE=Japanese_Japan.932
[3] LC_MONETARY=Japanese_Japan.932 LC_NUMERIC=C
[5] LC_TIME=Japanese_Japan.932

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_3.0.0

Any ideas?

Thanks!

Hadley

--
Chief Scientist, RStudio
http://had.co.nz/

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
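A minimal, locale-independent sketch for poking at the same string: it rebuilds "午前+*午後" from \u escapes (so it runs without the Japanese_Japan.932 locale), inspects the declared encoding with Encoding() and the stored bytes with charToRaw(), and re-encodes with enc2utf8() before the fixed-string gsub(). The helper as_utf8() is hypothetical, not part of base R, and it is only an assumption, not verified here, that normalising the output of format.POSIXct this way avoids the error under the CP932 locale.

```r
# Build "午前+*午後" from Unicode escapes so the example runs in any locale.
# \u5348 = 午, \u524d = 前, \u5f8c = 後
x <- "\u5348\u524d+*\u5348\u5f8c"

# Declared encoding mark (typically "UTF-8" for strings containing \u escapes)
Encoding(x)

# The bytes actually stored: the UTF-8 sequence, i.e. the same bytes that
# charToRaw(x) showed for the format.POSIXct result in the reproduction
charToRaw(x)

# Hypothetical helper (not part of base R): re-encode every element to
# UTF-8 and mark it as such before doing fixed-string matching.
as_utf8 <- function(s) enc2utf8(as.character(s))

# Replace the literal "+" with "*"; both strings are now consistently UTF-8
z <- gsub("+", "*", as_utf8(x), fixed = TRUE)
z
```

enc2native() is the mirror-image choice if you would rather work in the session's native encoding (CP932 here) than in UTF-8.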