On Feb 3, 2012, at 18:03 , G See wrote:

> On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <pda...@gmail.com> wrote:
>> 
>> So that's a nonbreak space alright. Next question: How did it get there? I'm 
>> mildly surprised that it crept into the data frame, I would expect it to 
>> happen much easier with things typed on the keyboard (Alt-Spc on my Mac 
>> keyboard, e.g.).
>> 
> 
> Peter,
> I won't venture to guess how, but this will do it.
> 
>> library(XML)
>> x <- readHTMLTable("http://earnings.com/company.asp?client=cb&ticker=GOOG",
>> stringsAsFactors=FALSE)[[21]]
>> charToRaw(x[28, 4])
> [1] 6e 2f 61 c2 a0
> 
> Garrett
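
For anyone wanting to check the bytes above without fetching the page, here is a minimal sketch. It assumes only base R; the variable names (bytes, s, cp) are made up for illustration. It shows that the trailing c2 a0 is the UTF-8 encoding of code point 160 (U+00A0).

```r
# Bytes as reported by charToRaw(x[28, 4]) in the quoted message
bytes <- as.raw(c(0x6e, 0x2f, 0x61, 0xc2, 0xa0))

# Reassemble the bytes into a string and declare them to be UTF-8
s <- rawToChar(bytes)
Encoding(s) <- "UTF-8"

# Code points of the string: the last one is 160, i.e. U+00A0
cp <- utf8ToInt(s)
print(cp)

# Going the other way: building the string from code points
# gives back the same five bytes
print(identical(charToRaw(intToUtf8(c(110, 47, 97, 160))), bytes))
```

The first three code points are the plain ASCII "n/a"; only the last one takes two bytes in UTF-8.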


OK, if you look at the source for that page, it actually contains stuff like

<td align="center">n/a&#160;</td>

and &#160; is the infamous \u00A0, alias the non-breaking space. So the odd thing 
might actually be that the Mac manages to lose the trailing non-breaking space, 
whereas other systems do not. AFAICS, this boils down to whether [[:space:]] 
matches that character inside

> XML:::trim
function (x) 
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>

A locale dependency, perhaps?
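
One way to probe that guess on a given machine is sketched below. It is only an illustration: trim_nbsp is a hypothetical helper, not something the XML package provides, and the result of the grepl() call is deliberately not stated because it may differ between platforms and locales.

```r
# U+00A0 built from its code point, so it does not depend on the locale
nbsp <- intToUtf8(0xA0)
x <- paste0("n/a", nbsp)

# Report the locale, then ask whether [[:space:]] matches U+00A0 here.
# This is the part suspected to vary between systems.
print(Sys.getlocale("LC_CTYPE"))
print(grepl("[[:space:]]", nbsp))

# Hypothetical helper (not part of XML): same idea as XML:::trim, but
# with U+00A0 listed explicitly so the result no longer depends on how
# [[:space:]] is interpreted
trim_nbsp <- function(s) {
  gsub("^[[:space:]\u00a0]+|[[:space:]\u00a0]+$", "", s)
}

# Only the three bytes of "n/a" should remain
print(charToRaw(trim_nbsp(x)))
```

If grepl() returns FALSE on one system and TRUE on another, that would be consistent with the trailing character surviving in one place and not the other.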

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd....@cbs.dk  Priv: pda...@gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
