On Feb 3, 2012, at 18:03, G See wrote:
> On Fri, Feb 3, 2012 at 10:39 AM, peter dalgaard <[email protected]> wrote:
>>
>> So that's a nonbreak space alright. Next question: How did it get there? I'm
>> mildly surprised that it crept into the data frame, I would expect it to
>> happen much easier with things typed on the keyboard (Alt-Spc on my Mac
>> keyboard, e.g.).
>>
>
> Peter,
> I won't venture to guess how, but this will do it.
>
>> library(XML)
>> x <- readHTMLTable("http://earnings.com/company.asp?client=cb&ticker=GOOG",
>> stringsAsFactors=FALSE)[[21]]
>> charToRaw(x[28, 4])
> [1] 6e 2f 61 c2 a0
>
> Garrett
OK, if you look at the source for that page, it actually contains stuff like
<td align="center">n/a&nbsp;</td>
and &nbsp; is the infamous \uA0, alias nonbreak space. So the odd thing might
actually be that the Mac manages to lose the trailing nonbreak space, whereas
other systems do not. AFAICS, this boils down to the matching of [[:space:]]
inside
> XML:::trim
function (x)
gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
<environment: namespace:XML>
A locale dependency, perhaps?
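
For anyone who wants to poke at this without fetching the page, here is a
minimal sketch. It rebuilds the string from the bytes in Garrett's
charToRaw() output (typed in by hand, and assumed to be UTF-8), applies the
same pattern that XML:::trim uses, and then tries a variant with U+00A0
added to the character class. The helper names trim_posix and trim_nbsp are
made up for illustration; they are not part of the XML package.

```r
# Bytes copied by hand from the charToRaw() output quoted above.
x <- rawToChar(as.raw(c(0x6e, 0x2f, 0x61, 0xc2, 0xa0)))
Encoding(x) <- "UTF-8"   # assumption: the bytes are UTF-8

utf8ToInt(x)             # 110 47 97 160, so the last character is U+00A0

# Same regular expression as XML:::trim. Whether [[:space:]] matches
# U+00A0 appears to vary between systems, so no result is claimed here.
trim_posix <- function(x) gsub("(^[[:space:]]+|[[:space:]]+$)", "", x)
nchar(trim_posix(x))

# Hypothetical variant: list U+00A0 explicitly instead of relying on
# how the platform defines [[:space:]].
trim_nbsp <- function(x) {
  gsub("(^[[:space:]\u00a0]+|[[:space:]\u00a0]+$)", "", x)
}
trim_nbsp(x)             # "n/a"
```

If nchar(trim_posix(x)) comes out as 3 on one machine and 4 on another,
that would be consistent with the matching of [[:space:]] being the
culprit.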
--
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: [email protected] Priv: [email protected]
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help