On 16/05/2011 8:33 AM, Lyolya wrote:
Dear Duncan,
Thank you very much for your reply!
I have tried what you have suggested. R was definitely assuming a different
text encoding, and after trying the l10n_info() command, I got the
following:
l10n_info()
$MBCS
[1] TRUE
$`UTF-8`
[1] TRUE
$`Latin-1`
[1] FALSE
My data is a data frame (stored in both .xls and .dbf files) that represents
the secondary housing market in Moscow for a given period of time. The
problem is that the factors (such as the general condition of the dwelling
and the material the house is built of) are stored as Russian strings, and R
does not read them correctly. This makes the analysis really complicated.
In order to read the file, I do the following:
require(foreign)
MSL_1010 <- read.dbf("MSL_1010.dbf") # I tried both as.is=TRUE and as.is=FALSE
When it comes to strings, R then shows something like: \x96\x80\x8e.
I'm not familiar with Russian encodings. If you know which encoding the
file uses, you may be able to use iconv() to convert it to UTF-8, which
l10n_info() reports as native to your system. To simplify things, use
read.dbf( "MSL_1010.dbf", as.is = TRUE)
so that you don't have to worry about factors and factor names. Then try
iconv(x, from="KOI8-R", to="UTF-8")
where x is one of the character vectors with bad characters. If that
doesn't work, try a different possible encoding (e.g. cp1251).
Duncan Murdoch
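As a rough sketch of the trial-and-error approach described in the reply above, the following tries several candidate encodings on the bytes quoted in this thread and then converts every character column of a data frame. The CP866 candidate, the helper name to_utf8 and the toy data frame are assumptions added for illustration and do not come from the original messages. CP866 is included because, in that code page, the bytes \x96\x80\x8e map to the Cyrillic letters Ц, А, О. Whether iconv() recognises a given encoding name depends on the platform (iconvlist() shows what is available).

```r
# Bytes shown in the thread for one mis-read string (\x96\x80\x8e).
x <- rawToChar(as.raw(c(0x96, 0x80, 0x8e)))

# Hypothetical helper: convert to UTF-8, returning NA if the platform's
# iconv does not support the requested encoding.
to_utf8 <- function(v, enc) {
  tryCatch(iconv(v, from = enc, to = "UTF-8"),
           error = function(e) rep(NA_character_, length(v)))
}

# Candidate encodings: KOI8-R and CP1251 are from the thread; CP866 is an
# added guess, since DBF files are often written with DOS code pages.
candidates <- c("KOI8-R", "CP1251", "CP866")
decoded <- vapply(candidates, function(enc) to_utf8(x, enc), character(1))
print(decoded)  # inspect by eye: only the right encoding gives readable Russian

# Once an encoding looks right, convert all character columns at once.
# 'df' is a toy stand-in for the result of read.dbf(..., as.is = TRUE).
enc <- "CP866"
df <- data.frame(district = x, rooms = 2L, stringsAsFactors = FALSE)
is_chr <- vapply(df, is.character, logical(1))
df[is_chr] <- lapply(df[is_chr], to_utf8, enc = enc)
```

Reading with as.is = TRUE first keeps the columns as plain character vectors, so the conversion can be applied before any factor levels are created.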
On 14 May 2011 01:08, Duncan Murdoch<murdoch.dun...@gmail.com> wrote:
> On 13/05/2011 4:57 PM, lyolya wrote:
>
>> Hello,
>>
>> I am experiencing a problem in reading a database in Russian. The problem
>> appears when it comes to char variables. I have already tried changing the
>> encoding, i.e.
>>
>> options(encoding="UTF-8")
>>
>> and
>>
>> options(encoding="KOI8-R")
>>
>> but every time something unreadable appears in the data frame,
>> like \x82\xa2\xae\xef etc.
>>
>> Could you please answer whether it is possible to operate with Russian
>> strings in R, and, if yes, how to get to do that. Thank you, in advance.
>>
>
> Yes, it is possible. You can test it using a text editor that supports
> Russian. Just put
>
> x<- " some Russian text "
>
> into the file, then use source() to read the file. Two outcomes are
> likely:
>
> x will be defined to be a string holding Russian text, and it will display
> properly.
>
> OR
>
> it will be defined to be a string with lots of escapes or mis-displayed
> characters in it. In the latter case, the problem is that R is assuming a
> different encoding than your text editor. The l10n_info() function will
> display information about what R is expecting.
>
> If none of the above helps you to get your code working, then you'll have
> to give details on exactly what you're doing to read the file, and exactly
> what is in the file.
>
> Duncan Murdoch
>
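The source() check described in the quoted reply can also be tried without a text editor. The following is a minimal sketch under stated assumptions: the temporary file, the sample word (the Russian for "Moscow", written with \u escapes) and the guard on the UTF-8 flag are all illustrative choices, not part of the original advice.

```r
# What R believes about the current locale.
info <- l10n_info()
print(info[c("MBCS", "UTF-8", "Latin-1")])

# Write a one-line script as UTF-8 bytes; the \u escapes spell the
# Russian word for Moscow.
script <- tempfile(fileext = ".R")
line <- enc2utf8("x <- \"\u041c\u043e\u0441\u043a\u0432\u0430\"")
writeLines(line, script, useBytes = TRUE)

# In a UTF-8 session the text should come back intact; in other locales
# the result depends on whether the locale can represent Cyrillic.
if (isTRUE(info[["UTF-8"]])) {
  source(script, encoding = "UTF-8")
  print(x)
  print(nchar(x))
}
```

If x prints as escapes or as the wrong characters, the encoding R assumes for the file differs from the one the file was written in.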
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.