Hello dear R-help mailing list.


I would like htmlParse to work correctly with Hebrew, but it keeps
scrambling the Hebrew text in the pages I feed into it.

For example:

# Why can't I parse the Hebrew correctly?

library(RCurl)
library(XML)
u <- "http://humus101.com/?p=2737"
a <- getURL(u)
a   # Here the Hebrew displays fine.
a2 <- htmlParse(a)
a2  # Here the Hebrew comes out scrambled.

Neither of these fixes it:

htmlParse(a, encoding = "utf-8")

htmlParse(a, encoding = "iso8859-8")

This is my locale:

> Sys.getlocale()
[1] "LC_COLLATE=Hebrew_Israel.1255;LC_CTYPE=Hebrew_Israel.1255;LC_MONETARY=Hebrew_Israel.1255;LC_NUMERIC=C;LC_TIME=Hebrew_Israel.1255"

Any suggestions?


Thanks in advance,
Tal



----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com |  972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
www.r-statistics.com (English)
----------------------------------------------------------------------------------------------


