You can do something like this: count the number of fields in each line of the file and use the max to determine the number of columns for read.table:
file <- '/tempxx.txt'
# count.fields() splits on whitespace by default, so pass the ";" separator
# used in the example input quoted below; quote="" and comment.char="" guard
# against stray quotes or "#" characters in the data
maxFields <- max(count.fields(file, sep=";", quote="", comment.char=""))
# now set up read.table for the max number of fields
input <- read.table(file, sep=";", quote="", comment.char="",
    colClasses=rep(NA, maxFields), fill=TRUE,
    col.names=paste("V", seq(maxFields), sep=''))

On Sun, May 31, 2009 at 6:06 AM, Martin Tomko <martin.to...@geo.uzh.ch> wrote:

> Dear Jim,
> with the help of Ted, we diagnosed that the cause is the extreme
> variability in line length during reading in. As the table column number is
> apparently determined from the first five lines, whatever exceeds this length
> automatically ends up on the next line.
> I am now trying to find a way to read in the data despite this. I have no
> control over the table extent; the only thing that would make sense
> according to my data would be to read in a fixed number of columns and merge
> all remaining columns as a long string in the last one. No idea how to do
> this, though.
>
> Thanks
> Martin
>
>
> jim holtman wrote:
>
>> It is still not clear to me exactly how you want to read the lines in. If
>> the lines have a variable number of fields, and some of the lines might be
>> wrapped, is there some way to determine where the start of each line is?
>> If you are reading them in with read.csv, then the system is assuming
>> that each line starts a new row. If this is not the case, then you will
>> have to state the rules that determine where the lines start. You can
>> always read the data in with 'scan' to separate each line and then do
>> whatever processing is required to put together the rows in a data frame
>> that you want.
>> In one of your examples, you indicated that the line was split starting
>> at the word "kempten"; if this is in the middle of the line, then you would
>> have to create the break after reading the line in with 'scan' and then
>> create the rows in the data frame. All of this can be done in R if you can
>> state what the criteria are.
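A minimal sketch of the "fixed number of columns, merge the rest" idea raised above, reading whole lines with readLines() and splitting them with strsplit() rather than scan(). The helper name read_fixed_plus_rest, its n_fixed and sep arguments, and the two sample lines are made up for illustration; the default of 7 leading fields and the ";" separator are taken from the example input in this thread.

```r
# Hypothetical helper (not from the thread): keep the first n_fixed fields of
# each line as separate columns and merge all remaining fields into one string.
read_fixed_plus_rest <- function(path, n_fixed = 7, sep = ";") {
  lines <- readLines(path)
  lines <- lines[nzchar(lines)]                 # drop empty lines
  parts <- strsplit(lines, sep, fixed = TRUE)   # one character vector per line

  # First n_fixed fields per line; short lines are padded with NA
  fixed <- t(vapply(parts, function(p) p[seq_len(n_fixed)],
                    character(n_fixed)))

  # Everything after the first n_fixed fields, glued back together with sep
  rest <- vapply(parts, function(p) {
    if (length(p) > n_fixed) paste(p[-seq_len(n_fixed)], collapse = sep) else ""
  }, character(1))

  out <- as.data.frame(fixed, stringsAsFactors = FALSE)
  names(out) <- paste0("V", seq_len(n_fixed))
  # Let R guess numeric/integer types for the fixed columns only
  out[] <- lapply(out, type.convert, as.is = TRUE)
  out$rest <- rest
  out
}

# Made-up sample lines in the same shape as the input quoted in this thread
demo_path <- tempfile(fileext = ".txt")
writeLines(c("1;10;2;8.5;47.1;id_a;30;bird;statue;eagle;",
             "2;20;3;10.3;47.7;id_b;36;tower;"), demo_path)
dat <- read_fixed_plus_rest(demo_path)
str(dat)
```

Because each input line is handled on its own, the number of fields on any one line cannot push data onto another row. For a batch of files, something like lapply(list.files(dir, full.names=TRUE), read_fixed_plus_rest) should work, with dir standing in for wherever the files live.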
>> On Sat, May 30, 2009 at 4:32 AM, Martin Tomko
>> <martin.to...@geo.uzh.ch> wrote:
>>
>>     Jim,
>>     the two lines I put in are the actual problematic input lines.
>>     In these examples, there are no quotes nor # signs, although I
>>     have no means to make sure they do not occur in the inputs (any
>>     hints how I could deal with that?).
>>     I am trying to avoid as much pre-processing outside R as possible,
>>     and I have to process about 500 files with up to 3000 records
>>     each, so I need a more or less automated/batch solution, and any
>>     string substitution will have to occur in R. But for the moment, I
>>     do not see a reason for substitution, and the wrapping still occurs.
>>
>>     Cheers
>>     Martin
>>
>>
>>     jim holtman wrote:
>>
>>         You need to supply the actual input line so we can see what is
>>         happening. Are you sure you do not have unbalanced quotes in
>>         your input (try quote='') or do you have comment characters
>>         ("#") in your input?
>>
>>         On Fri, May 29, 2009 at 3:15 PM, Martin Tomko
>>         <martin.to...@geo.uzh.ch> wrote:
>>
>>             Dear All,
>>             I am observing a strange behavior and searching the
>>             archives and help pages didn't help much.
>>             I have a csv with a variable number of fields in each line.
>>
>>             I use
>>             dataPoints <- read.csv(inputFile, head=FALSE, sep=";", fill=TRUE);
>>
>>             to read it in, and it works. But some lines are long and
>>             'wrap', or split and continue on the next line. So when I
>>             check the dim of the frame, it is not correct, and I can see
>>             when I do a printout that the line is split into two in the
>>             frame. I checked the input file and all is good.
>>
>>             An example of the input is:
>>             37;2175168475;13;8.522729;47.19537;16366...@n00;30;sculpture;bird;tourism;animal;statue;canon;eos;rebel;schweiz;switzerland;eagle;swiss;adler;skulptur;zug;1750;28;tamron;f28;canton;tourismus;vogel;baar;kanton;xti;tamron1750;1750mm;tamron1750mm;400d;rabbitriotnet;
>>
>>             where the last value occurs on the next line in the data
>>             frame.
>>
>>             It does not have to be the last value; in the following
>>             example, the word "kempten" starts the next line:
>>             39;167757703;12;10.309295;47.724545;21903...@n00;36;white;building;tower;clock;clouds;germany;bayern;deutschland;bavaria;europa;europe;eagle;adler;eu;wolke;dome;townhall;rathaus;turm;weiss;allemagne;europeanunion;bundesrepublik;gebaeude;glocke;brd;allgau;kuppel;europ;kempten;niemcy;europo;federalrepublic;europaischeunion;europaeischeunion;germanio;
>>
>>             What could be the reason?
>>
>>             I was thinking about solving the issue by using a different
>>             separator for the first 7 fields and concatenating all of
>>             the remaining values into a single string value, but could
>>             not figure out how to do such a substitution in R.
>>             Unfortunately, on my system I cannot specify a range for
>>             sed...
>>
>>             Thanks for any help/pointers
>>             Martin
>>
>>             ______________________________________________
>>             R-help@r-project.org mailing list
>>             https://stat.ethz.ch/mailman/listinfo/r-help
>>             PLEASE do read the posting guide
>>             http://www.R-project.org/posting-guide.html
>>             and provide commented, minimal, self-contained,
>>             reproducible code.
>>
>>
>>         --
>>         Jim Holtman
>>         Cincinnati, OH
>>         +1 513 646 9390
>>
>>         What is the problem that you are trying to solve?
>
>
> --
> Martin Tomko
> Postdoctoral Research Assistant Geographic Information Systems Division
> Department of Geography
> University of Zurich - Irchel
> Winterthurerstr. 190
> CH-8057 Zurich, Switzerland
>
> email: martin.to...@geo.uzh.ch
> site: http://www.geo.uzh.ch/~mtomko
> mob: +41-788 629 558
> tel: +41-44-6355256
> fax: +41-44-6356848

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
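P.S. A small reproducible sketch of the wrapping discussed in this thread, in case it helps to see it in isolation. read.table() settles on the number of columns from the first five lines of input unless col.names is longer, so a longer line further down spills onto an extra row when fill=TRUE. The sample data and the variable names (path, n, wrapped, fixed) are made up for illustration.

```r
# Made-up sample: five short lines followed by one longer line
path <- tempfile(fileext = ".txt")
writeLines(c(rep("a;b", 5), "c;d;e;f"), path)

# Fields per line; the long line only appears in position 6
n <- count.fields(path, sep = ";", quote = "", comment.char = "")
print(n)

# Column count guessed from the first five lines (2 columns), so the
# sixth line is spread over more than one row
wrapped <- read.table(path, sep = ";", fill = TRUE,
                      quote = "", comment.char = "")

# Column count forced through col.names: one row per input line
fixed <- read.table(path, sep = ";", fill = TRUE,
                    quote = "", comment.char = "",
                    col.names = paste0("V", seq_len(max(n))))

print(dim(wrapped))
print(dim(fixed))
```

Comparing nrow() of the result against length(count.fields(...)) is a quick check for whether any line has been wrapped in a given file.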