On Jan 8, 2015, at 10:20 AM, DVL wrote:

> I'm trying to import a many gigabyte .txt file to analyze. It is asterisk
> delimited. I'm having an issue with the date field in the dataset. In the
> first 165 lines dates are listed as:
> YYYY-MM-DD HH:MM:SS
>
> Then on the 166th line and in other places the date spans two lines:
> YYYY-MM-DD
> HH:MM:SS
>
> This causes a problem because R thinks it has reached the end of a row in
> the table. How can I solve this?

It would probably be easiest to edit the file in a text editor. You could also read the file in with readLines() and do all the work in R, but that sounds a bit more painful than option 1 to my reading. If the problems are only those exactly as you describe, this could be an untested outline of a solution:

    dat <- readLines("/pat/fil.ext")
    # flag broken fragments: lines that hold only the 10-character date
    marks <- nchar(dat) == 10          # or: marks <- !grepl("[*]", dat)
    # flag the short lines that immediately follow each fragment
    nxt <- c(FALSE, head(marks, -1))
    # append shortened lines after broken fragments
    dat[marks] <- paste(dat[marks], dat[nxt])
    final <- dat[!nxt]                 # remove shorter lines

> View this message in context:
> http://r.789695.n4.nabble.com/Huge-Dataset-Dates-Span-two-Lines-tp4701523.html
> Sent from the R help mailing list archive at Nabble.com.

Nabble is not the Rhelp Archive, and it also suppresses this message, which you should be sure to read:

*______________________________________________
*R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
*https://stat.ethz.ch/mailman/listinfo/r-help
*PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
*and provide commented, minimal, self-contained, reproducible code.

--
David Winsemius
Alameda, CA, USA
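
To try the readLines() idea without touching the real file, here is a minimal sketch on made-up sample lines. It assumes the date sits in the middle of a record, so a continuation line is recognised by starting with a bare HH:MM:SS time stamp; the sample data and the names cont, frag and tbl are hypothetical, and the pattern may need adjusting for the actual file.

```r
# Made-up sample: the second record is split after the date.
dat <- c("1*2015-01-08 10:20:00*a",
         "2*2015-01-08",
         "10:21:00*b",
         "3*2015-01-08 10:22:00*c")

# Assumption: a continuation line starts with a bare time stamp.
cont <- grepl("^[0-9]{2}:[0-9]{2}:[0-9]{2}", dat)
stopifnot(!cont[1])                  # the first line cannot be a continuation

# The line just before each continuation holds the first half of the record.
frag <- c(tail(cont, -1), FALSE)

# Rejoin the halves with a single space, matching the intact date format.
dat[frag] <- paste(dat[frag], dat[cont])
final <- dat[!cont]                  # drop the now-merged continuation lines

# Parse the repaired lines as an asterisk-delimited table.
tbl <- read.table(text = final, sep = "*", stringsAsFactors = FALSE)
```

On the real data, dat would come from readLines() and the repaired lines could be written back out with writeLines() before importing, if that is more convenient.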