Hi Matt,

Thanks for having a look at this. I just spent some time looking around and couldn't find any R function to decode decimal HTML character codes.
Do you (or someone else on the list) know how to program this sort of thing? (Is there a formula for the translation?)

P.S. For it to work on my end I added the encoding parameter:
readLines("http://biostatmatt.com/temp/Hebrew-decoded", warn=FALSE, encoding="UTF-8")

P.P.S. The Hebrew word I used means "peace".

Cheers,
Tal

----------------Contact Details:-------------------------------------------------------
Contact me: tal.gal...@gmail.com | 972-52-7275845
Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English)
----------------------------------------------------------------------------------------------

On Thu, Dec 9, 2010 at 8:38 PM, Matt Shotwell <shotw...@musc.edu> wrote:

> Tal,
>
> It looks like the data you received contains HTML numeric character
> references. That is, '&#1513;' is just an ASCII HTML representation of a
> Unicode character (code point 1513, written in decimal). It's not encoded
> in a special manner.
>
> The trick is to replace each HTML-encoded character with the character it
> stands for, i.e. to "decode" the character. I don't know of any R
> function that does this, but there are web services, for example:
> http://www.hashemian.com/tools/html-url-encode-decode.php
>
> I decoded your file using this service and posted it on my website. You
> can see the difference by running:
>
> readLines("http://biostatmatt.com/temp/Hebrew-original", warn=FALSE)
>
> readLines("http://biostatmatt.com/temp/Hebrew-decoded", warn=FALSE)
>
> The second should display the Hebrew characters correctly (it does in my
> terminal). The next thing to think about is how to automate this in R
> without using the web service... We may need to write an HTMLDecode
> function if there isn't one already.
>
> By the way, what's the Hebrew text in English?
>
> Best,
> Matt
>
>
> On Thu, 2010-12-09 at 12:21 -0500, Tal Galili wrote:
> > I am bumping this question in the hope that someone might be able to
> > advise.
> > This Hebrew and R business is not as smooth as I had hoped...
> >
> > Thanks,
> > Tal
> >
> > Older message:
> >
> > On Tue, Dec 7, 2010 at 2:30 PM, Tal Galili <tal.gal...@gmail.com> wrote:
> >
> > > Hello all,
> > >
> > > # I am trying to read the text in this URL:
> > > u <- "http://google.com/complete/search?output=toolbar&q=%d7%a9%d7%9c%d7%95%d7%9d"
> > > # By using this command:
> > > readLines(u)
> > >
> > > And no matter what variation I tried, I keep getting this output:
> > > [1] "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion
> > > data=\"&#1513;&#1500;&#1493;&#1501;\"/>< (etc...)
> > >
> > > Instead of this output:
> > > <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="ש××× "/>
> > > <num_queries int="16800000"/></CompleteSuggestion><CompleteSuggestion><suggestion
> > > data="ש××× ×× ××"/><num_queries int="232000"/></CompleteSuggestion>
> > > <CompleteSuggestion><suggestion data="ש××× ×¢××××"/
> > > (etc....)
> > >
> > > I tried:
> > > readLines(u, encoding = "latin1")
> > > readLines(u, encoding = "UTF-8")
> > > And also changing the locale with Sys.setlocale:
> > > Sys.setlocale("LC_ALL", "Hebrew")   # must be done for Hebrew to work.
> > > Sys.setlocale("LC_ALL", "English")  # also tried an English locale.
> > >
> > > Are there any more options I could try to get this text properly encoded?
> > >
> > > Thanks!
> > > Tal
>
> --
> Matthew S. Shotwell
> Graduate Student
> Division of Biostatistics and Epidemiology
> Medical University of South Carolina
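On the "formula" question: an HTML numeric character reference &#N; names the Unicode code point N in decimal (&#xH; is the hexadecimal form), so base R's intToUtf8(N) already performs the translation; for example, &#1513; is U+05E9, the Hebrew letter shin. Below is a minimal sketch of the HTMLDecode-style helper Matt mentions, in base R only. The name html_decode_numeric is hypothetical (it is not an existing R function), and it assumes the input contains only numeric references: named entities such as &amp; are left untouched.

```r
# Hypothetical helper, not part of base R or any package:
# decode HTML numeric character references ("&#1513;" or "&#x5E9;")
# into the UTF-8 characters they stand for. Named entities are ignored.
html_decode_numeric <- function(x) {
  # Match decimal (&#1513;) or hexadecimal (&#x5E9;) references
  m <- gregexpr("&#([0-9]+|[xX][0-9a-fA-F]+);", x)
  regmatches(x, m) <- lapply(regmatches(x, m), function(refs) {
    if (length(refs) == 0L) return(character(0))  # nothing to replace
    body <- gsub("^&#|;$", "", refs)               # strip "&#" and ";"
    is_hex <- grepl("^[xX]", body)
    code <- integer(length(body))
    code[is_hex]  <- strtoi(substring(body[is_hex], 2L), base = 16L)
    code[!is_hex] <- strtoi(body[!is_hex], base = 10L)
    # The "formula": each code point maps to one UTF-8 character
    intToUtf8(code, multiple = TRUE)
  })
  x
}

# The four references from the Google output above spell "shalom"
html_decode_numeric("&#1513;&#1500;&#1493;&#1501;")
```

Because it is vectorised, it could be applied directly to the lines returned by readLines(u). In a non-UTF-8 locale the decoded text may print as <U+05E9>-style escapes even though the strings themselves are correct.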
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.