Re: [R] Chinese characters encoding problem with XML

2008-12-31 Thread Wind2
The problem seems to lie in the XML methods; the parsed object xml itself is fine. Its header reads as follows:

http://www.w3.org/TR/html4/loose.dtd"> 深圳国投

The header carries the correct charset=gb2312, the same as the web page itself. But:

> doc <- xmlRoot(xml)
> doc[[1]]
娣卞湷鍥芥姇

The charset has been changed to UTF-8.

> doc1<-xmlRoot(
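
The garbled text in this message looks consistent with UTF-8 bytes being displayed as if they were GBK (a superset of gb2312). That is an assumption about the poster's console, not something the thread confirms. A minimal sketch of that mechanism in base R, with no XML package and no network access; the variable names are hypothetical:

```r
# Hypothetical illustration of the suspected mix-up, base R only.
# Unicode escapes keep the script independent of this file's own encoding.
original <- "\u6df1\u5733\u56fd\u6295"  # the four characters 深圳国投

# The raw UTF-8 bytes of the text (three bytes per character here).
utf8_bytes <- charToRaw(enc2utf8(original))

# Assumption: the bytes are misread as GBK. Decoding them that way
# should give garbled text resembling the output quoted in the thread.
garbled <- iconv(list(utf8_bytes), from = "GBK", to = "UTF-8")

# Undo the damage: re-encode the garbled text to GBK bytes, then
# declare those bytes to be what they really are, namely UTF-8.
gbk_bytes <- iconv(garbled, from = "UTF-8", to = "GBK", toRaw = TRUE)[[1]]
repaired <- rawToChar(gbk_bytes)
Encoding(repaired) <- "UTF-8"

print(garbled)
print(identical(repaired, original))
```

If this is what is happening, the text inside doc is intact and only needs converting to the console's encoding before printing, for example with iconv(x, from = "UTF-8", to = "GBK"). Whether "GBK" is available as a conversion name depends on the platform's iconv.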

[R] Chinese characters encoding problem with XML

2008-12-30 Thread Wind
XML is a good tool for reading data from the web within R, but I wonder how to get the encoding right.

library(XML)
url <- 'http://www.szitic.com/docc/jz-lmzq.html'
xml <- htmlTreeParse(url, useInternalNodes=TRUE)
q <- "//tbody/tr/td"
dat <- unlist(xpathApply(xml, q, xmlValue))
df <- as.data.frame(t(m