On Wed, 21 Dec 2005, Roger D. Peng wrote:

> Well, who am I to break this long-standing ritual? :)
>
> Interestingly, while the printed output looks wrong, I get
>
> > v <- paste(0:10, "asdf", sep = ".")
> > a <- sub(".asdf", "", v, fixed = TRUE)
> > b <- as.character(0:10)
> > identical(a, b)
> [1] TRUE
>
>

identical is wrong!  R character strings have a true length and a C-style
length: print() prints all the characters, even those after embedded nuls.
identical uses

    if(strcmp(CHAR(STRING_ELT(x, i)), CHAR(STRING_ELT(y, i))) != 0)

which is C-style.

The issue is at character.c:1015, where nr gets trashed: note that the first
answer in the vector is correct.  So it is easy to fix.

This code has been in its current form for years, so I don't think this is
at all related to the release of 2.2.1.

> Peter Dalgaard wrote:
>> "Roger D. Peng" <[EMAIL PROTECTED]> writes:
>>
>>
>>> I've noticed what I think is curious behavior in using 'sub(fixed = TRUE)'
>>> and was wondering if my expectation is incorrect.  Here is one example:
>>>
>>> v <- paste(0:10, "asdf", sep = ".")
>>> sub(".asdf", "", v, fixed = TRUE)
>>>
>>> The results I get are
>>>
>>>> sub(".asdf", "", v, fixed = TRUE)
>>> [1] "0"               "1\0st\0\0"       "2\0<af>\001\0\0" "3\0<af>\001\0\0"
>>> [5] "4\0mes\0"        "5\0<ba>\001\0\0" "6\0\0\0\0\0"     "7\0\0\0m\0"
>>> [9] "8\0\0\0t\0"      "9\0<fe>\0\0\0"   "10\0\0\0\0\0"
>>>>
>>>
>>> I expected "0" in the first entry and everything else would be unchanged.
>>> Your results may vary since every time I run 'sub()' in this way, I get a
>>> slightly different answer in entries 2 through 11.
>>>
>>> As it turns out, 'gsub(fixed = TRUE)' gives me the answer I *actually*
>>> wanted, which was to replace the string in every entry.  But I still think
>>> the behavior of 'sub(fixed = TRUE)' is a bit odd.
>>>
>>>> version
>>>          _
>>> platform x86_64-unknown-linux-gnu
>>> arch     x86_64
>>> os       linux-gnu
>>> system   x86_64, linux-gnu
>>> status
>>> major    2
>>> minor    2.1
>>> year     2005
>>> month    12
>>> day      20
>>> svn rev  36812
>>> language R
>>>>
>>
>>
>> Argh...
>>
>> year     2005
>> month    12
>> day      21
>>
>> and something like this gets discovered. It's a ritual, I tell ya, a ritual!
>>
>> If you look at the output and terminate all strings at the embedded
>> \0, it looks much more sensible, so it should be fairly easy to spot
>> the cause of this bug...
>>
>
> --
> Roger D. Peng  |  http://www.biostat.jhsph.edu/~rpeng/
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>

--
Brian D. Ripley,                  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595