Tal,

It looks like the data you received contains HTML hexadecimal character references. That is, '&#x5e9;' is just a plain ASCII way of writing the Unicode character ש (code point U+05E9) in HTML's hex notation. It's not encoded in any special manner.
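A reference such as &#x5e9; simply spells out a Unicode code point in hexadecimal, so it can be turned back into a character in base R. Below is a minimal sketch of such a decoder. It assumes every reference has the hexadecimal form &#xHHHH;. Decimal references (e.g. &#1513;) and named entities (e.g. &amp;) are left unchanged. The function name html_hex_decode is hypothetical. It uses only gregexpr(), regmatches(), strtoi() and intToUtf8().

```r
# Hypothetical helper: decode hexadecimal HTML numeric character
# references (e.g. "&#x5e9;") into the characters they stand for.
# Assumption: only the &#xHHHH; form is handled; decimal (&#NNNN;)
# references and named entities (&amp;) are left as they are.
html_hex_decode <- function(x) {
  vapply(x, function(s) {
    # Locate every &#x...; reference in this string
    m <- gregexpr("&#[xX][0-9A-Fa-f]+;", s)
    refs <- regmatches(s, m)[[1]]
    if (length(refs) == 0L) return(s)
    # Drop the "&#x" prefix and ";" suffix, then parse the hex code point
    hex <- substr(refs, 4L, nchar(refs) - 1L)
    chars <- vapply(strtoi(hex, 16L), intToUtf8, character(1))
    # Put each decoded character in place of its reference
    regmatches(s, m) <- list(chars)
    s
  }, character(1), USE.NAMES = FALSE)
}

x <- "<suggestion data=\"&#x5e9;&#x5dc;&#x5d5;&#x5dd;\"/>"
html_hex_decode(x)
```

Applied to the result of readLines(u, warn = FALSE), this should turn the data="&#x...;" attributes into Hebrew text, provided the R session can display UTF-8.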
The trick is to replace each HTML-encoded hex character with its binary representation, that is, to "decode" the character. I don't know of any R function that does this, but there are web services that do, for example: http://www.hashemian.com/tools/html-url-encode-decode.php

I decoded your file using this service and posted it on my website. You can see the difference by running:

readLines("http://biostatmatt.com/temp/Hebrew-original", warn=FALSE)
readLines("http://biostatmatt.com/temp/Hebrew-decoded", warn=FALSE)

The second should display the Hebrew characters correctly (it does in my terminal). The next step is to automate this in R without using the web service. We may need to write an HTMLDecode function if there isn't one already.

By the way, what's the Hebrew text in English?

Best,
Matt

On Thu, 2010-12-09 at 12:21 -0500, Tal Galili wrote:
> I am bumping this question in the hope that someone might be able to
> advise.
> This Hebrew and R business is not as smooth as I had hoped...
>
> Thanks,
> Tal
>
> Older message:
>
> On Tue, Dec 7, 2010 at 2:30 PM, Tal Galili <tal.gal...@gmail.com> wrote:
>
> > Hello all,
> >
> > # I am trying to read the text in this URL:
> > u <- "http://google.com/complete/search?output=toolbar&q=%d7%a9%d7%9c%d7%95%d7%9d"
> > # By using this command:
> > readLines(u)
> >
> > And no matter what variation I tried, I keep getting this output:
> > [1] "<?xml version=\"1.0\"?><toplevel><CompleteSuggestion><suggestion data=\"&#x5e9;&#x5dc;&#x5d5;&#x5dd;\"/>< (etc...)
> >
> >
> > Instead of this output:
> > <?xml version="1.0"?><toplevel><CompleteSuggestion><suggestion data="שלום"/>
> > <num_queries int="16800000"/></CompleteSuggestion><CompleteSuggestion>
> > <suggestion data="שלום חנוך"/><num_queries int="232000"/></CompleteSuggestion>
> > <CompleteSuggestion><suggestion data="שלום עליכם"/ (etc....)
> >
> >
> > I tried:
> > readLines(u, encoding = "latin1")
> > readLines(u, encoding = "UTF-8")
> > And also changing Sys.setlocale:
> > Sys.setlocale("LC_ALL", "Hebrew") # must be done for Hebrew to work.
> > Sys.setlocale("LC_ALL", "English") # switch back to the English locale
> >
> > Are there any more options I could try to get this text properly encoded?
> >
> > Thanks!
> > Tal
> >
> > ----------------Contact Details:-------------------------------------------------------
> > Contact me: tal.gal...@gmail.com | 972-52-7275845
> > Read me: www.talgalili.com (Hebrew) | www.biostatistics.co.il (Hebrew) |
> > www.r-statistics.com (English)
> > ----------------------------------------------------------------------------------------------
>

--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.