On 09/10/2013 10:37, Milan Bouchet-Valat wrote:
On Tuesday 08 October 2013 at 16:02 -0700, Ira Sharenow wrote:
A colleague is sending me quite a few files that have been saved with MS
SQL Server 2005. I am using R 2.15.1 on Windows 7.

I am trying to read in the files using standard techniques. Although the
file has a csv extension, when I open it in Excel or WordPad and choose
Save As, I see that it is Unicode Text. Notepad indicates that the
encoding is Unicode. Right now I have to do a few things from within
Excel (such as Text to Columns) and eventually save as a true csv file
before I can read it into R and then use it.

Is there an easy way to solve this from within R? I am also open to easy
SQL Server 2005 solutions.

I tried the following from within R.

testDF = read.table("Info06.csv", header = TRUE, sep = ",")

testDF2 =  iconv(x = testDF, from = "Unicode", to = "")

Error in iconv(x = testDF, from = "Unicode", to = "") :

unsupported conversion from 'Unicode' to '' in codepage 1252

# The next line did not produce an error message

testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")

testDF3[1:6,  1:3]

Error in testDF3[1:6, 1:3] : incorrect number of dimensions

# The next line did not produce an error message

testDF4 =  iconv(x = testDF, from = "macroman" , to = "")

testDF4[1:6,  1:3]

Error in testDF4[1:6, 1:3] : incorrect number of dimensions

  Encoding(testDF3)

[1] "unknown"

  Encoding(testDF4)

[1] "unknown"

This is the first few lines from WordPad

Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2

2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834

2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929

What's the actual problem? You did not state one. Do you get accented
characters that are not printed correctly after importing the file? The
two lines above do not appear to contain any non-ASCII characters, so
encoding would not matter.

It is most likely UCS-2. That has embedded NULs, so the encoding does matter. All 8-bit encodings extend ASCII: others do not, in general.
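
If the file really is little-endian UCS-2 (what Windows calls "Unicode"), which has not been confirmed for Info06.csv, something along these lines may read it directly. This is only a sketch: the sample file is made up to mimic the first lines shown above, and the object names are placeholders. The relevant part is the fileEncoding argument to read.csv().

```r
# Build a small stand-in for the SQL Server export: UCS-2LE text preceded by
# a byte order mark (FF FE). The contents are invented for illustration.
tmp <- tempfile(fileext = ".csv")
txt <- paste0("Date,StockID,Price\n",
              "2006-01-03 00:00:00.000,@Stock1,2.53\n",
              "2006-01-03 00:00:00.000,@Stock2,1.3275\n")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xFF, 0xFE)), con)                        # byte order mark
writeBin(iconv(txt, from = "UTF-8", to = "UCS-2LE",
               toRaw = TRUE)[[1]], con)                     # 2 bytes per char
close(con)

# Re-encode while reading, so the embedded NULs never reach the parser.
testDF <- read.csv(tmp, fileEncoding = "UCS-2LE", stringsAsFactors = FALSE)
str(testDF)

unlink(tmp)
```

For the real file the call would presumably be read.csv("Info06.csv", fileEncoding = "UCS-2LE"). Note also that iconv() works on character vectors: given a data frame it coerces it with as.character(), which is why the results in the original attempts no longer had two dimensions.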


--
Brian D. Ripley,                  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford,             Tel:  +44 1865 272861 (self)
1 South Parks Road,                     +44 1865 272866 (PA)
Oxford OX1 3TG, UK                Fax:  +44 1865 272595

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
