If you use check.names=FALSE in your call to read.csv you can see that the first column name starts with the 3 bytes ef bb bf, which is the UTF-8 "byte-order mark" that Microsoft applications like to put at the start of a text file stored in UTF-8.
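To look for those three bytes and drop them without downloading anything, here is a minimal sketch. strip_bom is a hypothetical helper name, not an existing R function, and the small temporary CSV only stands in for Vehicles0514.csv (its second column name and data row are made up for illustration).

```r
# Hypothetical helper: drop a leading UTF-8 BOM (bytes ef bb bf) from each string.
strip_bom <- function(x) {
  bom <- as.raw(c(0xef, 0xbb, 0xbf))
  vapply(x, function(s) {
    b <- charToRaw(s)
    # Only strip when the string really starts with the three BOM bytes.
    if (length(b) >= 3L && identical(b[1:3], bom)) b <- b[-(1:3)]
    rawToChar(b)
  }, character(1), USE.NAMES = FALSE)
}

# Stand-in file (an assumption): a tiny CSV whose first three bytes are the BOM.
tmp <- tempfile(fileext = ".csv")
writeBin(c(as.raw(c(0xef, 0xbb, 0xbf)),
           charToRaw("Accident_Index,Vehicle_Reference\nA1,1\n")), tmp)

d <- read.csv(tmp, stringsAsFactors = FALSE, check.names = FALSE)

# Depending on platform and locale, the BOM may or may not still be present
# in the first name at this point, so inspect the bytes rather than the print.
charToRaw(names(d)[1])

# Safe either way: names without a BOM are returned unchanged.
names(d) <- strip_bom(names(d))
names(d)[1]
```

With the objects from the question, the same call would be names(v0514) <- strip_bom(names(v0514)), and likewise for a0514 and c0514. Names containing non-ASCII characters come back from rawToChar() without an encoding mark, so treat this as a sketch for ASCII headers like these.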
> v0514 <- read.csv(unz(temp, file0514[1]), stringsAsFactors=FALSE, check.names=FALSE)
> names(v0514)[1]
[1] "Accident_Index"
> charToRaw(names(v0514)[1])
[1] ef bb bf 41 63 63 69 64 65 6e 74 5f 49 6e 64 65 78

I thought that adding fileEncoding="UTF-8-BOM" or perhaps encoding="UTF-8-BOM" would take care of the issue, but neither does it for me. You can remove the three bytes by hand with substring():

> substring(names(v0514)[1],4)
[1] "Accident_Index"

Bill Dunlap
TIBCO Software
wdunlap tibco.com

On Thu, Feb 9, 2017 at 4:13 PM, jing hua zhao <jinghuaz...@hotmail.com> wrote:
> Dear R-devel,
>
> I appear to see differences in behavior of unz between Windows and Linux.
>
> url0514 <- "http://data.dft.gov.uk.s3.amazonaws.com/road-accidents-safety-data/Stats19_Data_2005-2014.zip"
> file0514 <- c("Vehicles0514.csv","Casualties0514.csv","Accidents0514.csv")
>
> temp <- tempfile()
> download.file(url0514,temp)
> a0514 <<- read.csv(unz(temp, file0514[3]))
> c0514 <<- read.csv(unz(temp, file0514[2]))
> v0514 <<- read.csv(unz(temp, file0514[1]))
>
> Under Windows, I noticed that there are variables i..Accident_Index in
> objects [a|c|v]0514, but this is not the case if zip file contains only one
> file, i.e.,
>
> file2015 <- c("Vehicles_2015.csv","Casualties_2015.csv","Accidents_2015.csv")
> url2015 <- "http://data.dft.gov.uk/road-accidents-safety-data/RoadSafetyData_2015.zip"
> download.file(url2015,temp)
> v2015 <<- read.csv(unz(temp, file2015[1]))
> c2015 <<- read.csv(unz(temp, file2015[2]))
> a2015 <<- read.csv(unz(temp, file2015[3]))
>
> so to combine [a|c|v]0514 and [a|c|v]2015, I need to add something like
>
> names(a0514)[names(a0514)=="ï..Accident_Index"] <- "Accident_Index"
> names(c0514)[names(c0514)=="ï..Accident_Index"] <- "Accident_Index"
> names(v0514)[names(v0514)=="ï..Accident_Index"] <- "Accident_Index"
>
> This is unnecessary under Linux (RHEL), since those i..Accident_Index have no
> i.. prefix.
>
> Do I miss anything here?
>
> Many thanks,
>
> Jing Hua Zhao
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel