Thanks for the suggestion. Using R version 3.0.2, I tried:

> testDF7 = iconv(x = test07, from = "UCS-2", to = "")

> Encoding(testDF7)

[1] "unknown"

> testDF7[1:6]

[1] NA NA NA NA NA NA

So using "UCS-2" produced the same results as before.
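
For reference, ?iconv says that NA is returned for any element that cannot be converted (unless sub is supplied), so NAs in the output can come from a failed conversion rather than from missing data. A minimal sketch with made-up strings; x, y and z are illustrative names, not objects from the session above:

```r
# Illustrative input: the second element ends in the single byte 0xE9
# (Latin-1 "e acute"), which is not valid when declared as UTF-8.
x <- c("plain", rawToChar(as.raw(c(0x63, 0x61, 0x66, 0xE9))))

# Default sub = NA: elements that cannot be converted come back as NA.
y <- iconv(x, from = "UTF-8", to = "ASCII")
is.na(y)

# With sub = "byte", unconvertible bytes are replaced by a hex marker
# instead of turning the whole element into NA.
z <- iconv(x, from = "UTF-8", to = "ASCII", sub = "byte")
anyNA(z)
```

If that is what is happening here, an all-NA result would suggest the declared source encoding does not match the bytes actually held in the strings.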

I do not think there are any NA values in the data. I cleaned up the csv
file from within Excel, then read it into R:

>  sum(is.na(workingDF))

[1] 0

Also the Excel COUNTBLANK function gave me zero.
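
A possible alternative to converting after the fact is to declare the encoding when the file is read, via the fileEncoding argument of read.csv. The sketch below is only an assumption about what might work for these files: it builds a small stand-in file (tmp and demoDF are hypothetical names, and the three columns are a cut-down version of the real header) and assumes the data is little-endian UCS-2/UTF-16 with a byte-order mark, as Windows "Unicode" text usually is.

```r
# Build a small stand-in for the SQL Server export: UCS-2LE text with a BOM.
tmp <- tempfile(fileext = ".csv")
txt <- paste0("Date,StockID,Price\n",
              "2006-01-03 00:00:00.000,@Stock1,2.53\n",
              "2006-01-03 00:00:00.000,@Stock2,1.3275\n")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xFF, 0xFE)), con)   # little-endian byte-order mark
writeBin(iconv(txt, from = "UTF-8", to = "UCS-2LE", toRaw = TRUE)[[1]], con)
close(con)

# Declare the encoding at read time so R re-encodes the file while parsing it.
demoDF <- read.csv(tmp, fileEncoding = "UCS-2LE", stringsAsFactors = FALSE)
str(demoDF)
```

On the real file the call would presumably be read.csv("Info06.csv", fileEncoding = "UCS-2LE"), with "UTF-16LE" as the other name worth trying; I have not verified either against the actual export.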

On 10/9/2013 11:33 PM, Prof Brian Ripley wrote:
> On 09/10/2013 10:37, Milan Bouchet-Valat wrote:
>> Le mardi 08 octobre 2013 à 16:02 -0700, Ira Sharenow a écrit :
>>> A colleague is sending me quite a few files that have been saved with MS
>>> SQL Server 2005. I am using R 2.15.1 on Windows 7.
>>>
>>> I am trying to read in the files using standard techniques. Although the
>>> file has a csv extension, when I go to Excel or WordPad and do SAVE AS I
>>> see that it is Unicode Text. Notepad indicates that the encoding is
>>> Unicode. Right now I have to do a few things from within Excel (such as
>>> Text to Columns) and eventually save as a true csv file before I can
>>> read it into R and then use it.
>>>
>>> Is there an easy way to solve this from within R? I am also open to easy
>>> SQL Server 2005 solutions.
>>>
>>> I tried the following from within R.
>>>
>>> testDF = read.table("Info06.csv", header = TRUE, sep = ",")
>>>
>>>> testDF2 =  iconv(x = testDF, from = "Unicode", to = "")
>>>
>>> Error in iconv(x = testDF, from = "Unicode", to = "") :
>>>
>>> unsupported conversion from 'Unicode' to '' in codepage 1252
>>>
>>> # The next line did not produce an error message
>>>
>>>> testDF3 =  iconv(x = testDF, from = "UTF-8" , to = "")
>>>
>>>> testDF3[1:6,  1:3]
>>>
>>> Error in testDF3[1:6, 1:3] : incorrect number of dimensions
>>>
>>> # The next line did not produce an error message
>>>
>>>> testDF4 =  iconv(x = testDF, from = "macroman" , to = "")
>>>
>>>> testDF4[1:6,  1:3]
>>>
>>> Error in testDF4[1:6, 1:3] : incorrect number of dimensions
>>>
>>>>   Encoding(testDF3)
>>>
>>> [1] "unknown"
>>>
>>>>   Encoding(testDF4)
>>>
>>> [1] "unknown"
>>>
>>> These are the first few lines from WordPad:
>>>
>>> Date,StockID,Price,MktCap,ADV,SectorID,Days,A1,std1,std2
>>>
>>> 2006-01-03 00:00:00.000,@Stock1,2.53,467108197.38,567381.144444444,4,133.14486997089,-0.0162107939626307,0.0346283580367959,0.0126471695454834
>>>
>>> 2006-01-03 00:00:00.000,@Stock2,1.3275,829803070.531114,6134778.93292,5,124.632223896458,0.071513138376339,0.0410694546850102,0.0172091268025929
>>>
>> What's the actual problem? You did not state any. Do you get accented
>> characters that are not printed correctly after importing the file? In
>> the two lines above it does not look like there would be any non-ASCII
>> characters in this file, so encoding would not matter.
>
> It is most likely UCS-2.  That has embedded NULs, so the encoding does 
> matter.  All 8-bit encodings extend ASCII: others do not, in general.
>
>
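
One way to check the UCS-2 diagnosis directly is to look at the first few bytes of the file: a little-endian UCS-2/UTF-16 file normally starts with the byte-order mark FF FE, and ASCII characters are followed by a NUL byte. A rough sketch on a made-up stand-in file (tmp, head_bytes, has_bom and has_nuls are illustrative names; for the real file one would point readBin at "Info06.csv" instead):

```r
# Hypothetical demo file: ASCII text stored as UCS-2LE with a byte-order mark.
tmp <- tempfile(fileext = ".csv")
con <- file(tmp, open = "wb")
writeBin(as.raw(c(0xFF, 0xFE)), con)
writeBin(iconv("Date,StockID\n", from = "UTF-8", to = "UCS-2LE",
               toRaw = TRUE)[[1]], con)
close(con)

# Read the leading bytes as raw, without any decoding.
head_bytes <- readBin(tmp, what = "raw", n = 8)
head_bytes

# FF FE at the start indicates a little-endian BOM; NUL bytes among the
# first characters indicate a two-byte encoding rather than an 8-bit one.
has_bom  <- identical(head_bytes[1:2], as.raw(c(0xFF, 0xFE)))
has_nuls <- any(head_bytes == as.raw(0))
```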


