On Jul 14, 2013, at 10:57 AM, David Winsemius wrote:

> 
> On Jul 14, 2013, at 9:48 AM, Houhou Li wrote:
> 
>> Hi,
>> 
>> I have several really big data files in csv format like this: the first line 
>> is the header, the second to fourth lines have info about the file and are 
>> the lines I need to skip (data in the 2nd-4th lines do not correspond to 
>> variable names in the header), from the fifth line, real data begins, but 
>> the last line is not a data line, it's the string "Done" instead of normal 
>> EOF character. All data is numeric. I tried to use read.table(), read.csv() 
>> with colClasses="numeric" and scan(), but couldn't make them work. Can 
>> anyone help me? How can I get rid of the last line "Done" automatically? I 
>> would like to use R script to do it automatically, not to do formatting in 
>> Excel then read back to R. Thank you very much, here is an example of the 
>> data:
> 
> Deleting the last line in Excel would not make sense unless the data is 
> already in Excel. Better would be to use a text editor, which is less likely 
> to corrupt the data.
> 
>> 
>> Tag,X,Y,BlobRegion,swaths,fr_int_20,fr_int_60,i60,RawTothgt,RawHtlc,RawRad20,RawRad40,RawRad60,RawRad80,CCV,BlobPerim,n_pts,n_pts_i255,vts,vts2,vtg,home,sum_ht,sum_ht_sq,dcch,dcch2,nb_ccv,n_nb,nb_sum_hts,nb_sum_hts2,z_tip_dist,nb_MassLen,n_f_rtns20,n_f_rtns60,max_fl_pt_count,loreyrawht,p00ile_cm,p25ile_cm,p50ile_cm,p75ile_cm,iq25,iq50,iq75,mean_intns
>> 01_24_2013.001,SF12
>>        5413
>>   509627.82,  4869704.98,   509999.83,  4869999.98
>> 123,509692.55,4869856.64,18,0,80.53,81.03,84,36.2100,17.1521,4.0359,4.0359,3.8881,2.9217,1737.13,31.42,210,210,0.828,0.955,0.281,28.50,5746.46,163727.12,0.764,1.000,1147.23,33,769.16,19024.42,0.01,0.09,174,163,174,34.90,140,2369,2849,3157,33,81,110,71.59
>> 159,509679.19,4869855.54,18,0,77.62,78.97,75,30.4000,11.2000,2.5319,2.5129,2.3365,1.8315,3248.82,21.42,90,90,0.877,0.936,0.589,22.91,2000.74,46861.45,0.691,0.999,1772.06,14,365.47,10233.32,0.04,0.68,81,66,81,33.29,905,1869,2272,2633,55,82,98,71.62
> 
> Read the first line with readLines() using n=1, saving it as 'colnams'.
> Read the data with dat <- read.table( ... ) using skip=4, sep=",", and fill = TRUE.
> Delete the last row, which holds "Done" followed by a large number of NAs.
> Assign the names: names(dat) <- scan(text=colnams, what=character(0), sep="," )
> 
> (Tested. Expected results achieved.)

 Lines <- 
"Tag,X,Y,BlobRegion,swaths,fr_int_20,fr_int_60,i60,RawTothgt,RawHtlc,RawRad20,RawRad40,RawRad60,RawRad80,CCV,BlobPerim,n_pts,n_pts_i255,vts,vts2,vtg,home,sum_ht,sum_ht_sq,dcch,dcch2,nb_ccv,n_nb,nb_sum_hts,nb_sum_hts2,z_tip_dist,nb_MassLen,n_f_rtns20,n_f_rtns60,max_fl_pt_count,loreyrawht,p00ile_cm,p25ile_cm,p50ile_cm,p75ile_cm,iq25,iq50,iq75,mean_intns
01_24_2013.001,SF12
       5413
  509627.82,  4869704.98,   509999.83,  4869999.98
123,509692.55,4869856.64,18,0,80.53,81.03,84,36.2100,17.1521,4.0359,4.0359,3.8881,2.9217,1737.13,31.42,210,210,0.828,0.955,0.281,28.50,5746.46,163727.12,0.764,1.000,1147.23,33,769.16,19024.42,0.01,0.09,174,163,174,34.90,140,2369,2849,3157,33,81,110,71.59
159,509679.19,4869855.54,18,0,77.62,78.97,75,30.4000,11.2000,2.5319,2.5129,2.3365,1.8315,3248.82,21.42,90,90,0.877,0.936,0.589,22.91,2000.74,46861.45,0.691,0.999,1772.06,14,365.47,10233.32,0.04,0.68,81,66,81,33.29,905,1869,2272,2633,55,82,98,71.62
Done"
 colnams <- readLines(textConnection(Lines), n=1)  # header line only
 scan(text=colnams, what=character(0), sep="," ) # check scan code
# (output snipped)
 dat <- read.table( text=Lines, skip=4, sep=",", fill = TRUE)  # fill=TRUE pads the short "Done" row with NA
 dat <- dat[-NROW(dat), ]  # drop the trailing "Done" row
 names(dat) <- scan(text=colnams, what=character(0), sep="," )
# Read 44 items
 dat
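
One caveat with the approach above: because "Done" is read into the first column before the row is dropped, that column (Tag) will not come back as numeric. A possible alternative, offered only as a sketch, is to read all lines as text, drop lines 2-4 and the last line, and then parse what remains. The helper name read_trimmed is made up for illustration, the example uses only the first three columns of the posted data to stay short, and readLines() pulls the whole file into memory, which may or may not be acceptable for really big files.

```r
# Hypothetical helper (not from the original post): 'con' may be a file
# path or an open connection. Assumes the layout described in the question:
# header on line 1, three info lines to skip, "Done" as the final line.
read_trimmed <- function(con) {
  all_lines <- readLines(con)
  # Drop info lines 2-4 and the trailing "Done" line before parsing
  keep <- all_lines[-c(2:4, length(all_lines))]
  read.csv(text = keep, header = TRUE, colClasses = "numeric")
}

# Shortened stand-in for one file: first three columns of the posted data
Lines <- "Tag,X,Y
01_24_2013.001,SF12
       5413
  509627.82,  4869704.98,   509999.83,  4869999.98
123,509692.55,4869856.64
159,509679.19,4869855.54
Done"

tc <- textConnection(Lines)
dat2 <- read_trimmed(tc)
close(tc)
str(dat2)  # every column, including Tag, should now be numeric
```

For several files, something like lapply(list.files(pattern = "\\.csv$"), read_trimmed) would apply the same helper to each, assuming they all share this layout.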

> -- 
> David
> 
> 
> David Winsemius
> Alameda, CA, USA
> 
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
