On Feb 3, 2012, at 18:03 , G See wrote:

> On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <pda...@gmail.com> wrote:
>>
>> So that's a nonbreak space alright. Next question: How did it get there? I'm
>> mildly surprised that it crept into the data frame, I would expect it to
>> happen much easier with things typed on the keyboard (Alt-Spc on my Mac
>> keyboard, e.g.).
>>
>
> Peter,
> I won't venture to guess how, but this will do it.
>
>> library(XML)
>> x <- readHTMLTable("http://earnings.com/company.asp?client=cb&ticker=GOOG",
>> stringsAsFactors=FALSE)[[21]]
>> charToRaw(x[28, 4])
> [1] 6e 2f 61 c2 a0
>
> Garrett

OK, if you look at the source for that page, it actually contains stuff like

<td align="center">n/a&nbsp;</td>

and &nbsp; is the infamous \uA0 alias nonbreak space. So the odd thing might actually be that the Mac manages to lose the trailing nonbreak space, whereas other systems do not.

AFAICS, this boils down to the matching of [[:space:]] inside

> XML:::trim
function (x)
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>

A locale dependency, perhaps?

--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com
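
For anyone wanting to check this on their own system, here is a minimal sketch of the [[:space:]] question. trim_posix copies the pattern from XML:::trim as printed above; whether it strips the trailing U+00A0 presumably depends on how the platform and locale classify that character, so no result is claimed for it. trim_nbsp is a hypothetical helper (not part of the XML package) that names U+00A0 explicitly in the character class, on the assumption that this sidesteps the classification question.

```r
# "n/a" followed by U+00A0 (no-break space); in UTF-8 the bytes are 6e 2f 61 c2 a0
s <- "n/a\u00a0"
charToRaw(enc2utf8(s))

# Same pattern as XML:::trim. Whether U+00A0 counts as [[:space:]] may
# differ between platforms and locales, so the result is not assumed here.
trim_posix <- function(x) gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)

# Hypothetical helper (an assumption, not from the XML package): adds U+00A0
# to the character class so the trailing no-break space is matched explicitly.
trim_nbsp <- function(x) {
  gsub("(^[[:space:]\u00a0]+|[[:space:]\u00a0]+$)", "", x)
}

nchar(trim_posix(s))  # 3 if the no-break space was stripped, 4 if not
nchar(trim_nbsp(s))   # 3
```

Comparing the two nchar() results on a Mac and on another system should show whether the difference really comes from [[:space:]].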