[R] Chinese characters in HTML source captured by download.file() are garbled; how to convert them to readable text?

Yong Wang Sun, 28 Jul 2013 20:34:22 -0700

Dear list,
I am using R to download the HTML source of numerous pages, from which
data will be extracted and processed further.
The problem is that the Chinese characters in the HTML source are all
garbled, and I cannot find a way to convert them to something readable.
The problem persists on both Ubuntu 10 and Windows 7, in an English
environment. I have not yet tried an operating system set to Chinese.
I know practically nothing about encodings, and an extensive online search
has not solved the problem.


# the code
# download the page source to a local file, then read it back line by line
download.file("https://www.google.com.hk/finance/company_news?q=SHA:601857&gl=cn&num=200",
              destfile = "tmp.txt")
test <- readLines("tmp.txt", encoding = "UTF-8")

    # the garbled text in "tmp.txt" and "test" looks like this:
    #ï¿½ï¿½&#22269;ï¿½Ûªoï¿½ÑµMï¿½aï¿½Ñ¥ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½ï¿½qï¿½]ï¿½


Any help is highly appreciated.

yong


______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
