You could pre-process your data into a more sensible format. Or you could use scan() to read each line of the file, count the number of colons, then use read.table() with ncolons + 1 columns. Or you could use read.table() with many more columns than are ever going to be in the data, then delete the empty ones. Or you could use read.table() to read everything in as a single column, then use strsplit() to split it at the colons.
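As a rough sketch of two of those options (counting fields first, and splitting with strsplit()), something like the following should work. The sample file and its contents are made up for illustration and only stand in for /tmp/testfile.txt. I have used count.fields() and readLines() rather than scan() and a single-column read.table() because they are shorter here; that is my substitution, not the only way to do it.

```r
# Hypothetical sample data: the third line has an extra ":" in the description.
path <- tempfile(fileext = ".txt")
writeLines(c("1:>sp|A|ONE first protein",
             "2:>sp|B|TWO second protein",
             "3:>sp|C|THREE NADPH:adrenodoxin oxidoreductase"),
           path)

# Option 1: count the fields on every line, then tell read.table() how many
# columns to expect instead of letting it guess from the first five lines.
n_fields <- count.fields(path, sep = ":", quote = "", comment.char = "")
n_cols <- max(n_fields)
wide <- read.table(path, sep = ":", header = FALSE, quote = "",
                   comment.char = "", fill = TRUE, as.is = TRUE,
                   col.names = paste0("V", seq_len(n_cols)))

# Option 2: read whole lines, split at the colons, and pad short rows with NA.
lines <- readLines(path)
parts <- strsplit(lines, ":", fixed = TRUE)
n_max <- max(lengths(parts))
padded <- t(vapply(parts,
                   function(p) c(p, rep(NA_character_, n_max - length(p))),
                   character(n_max)))
split_df <- as.data.frame(padded, stringsAsFactors = FALSE)
```

Both give one row per input line with as many columns as the widest line needs. Note that the short rows are filled differently: read.table() with fill=TRUE leaves empty strings in a character column, while the strsplit() version pads with NA.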
There are generally lots of ways to do things, but they vary in efficiency both on the programming side and the execution side. For instance, the lots-of-columns solution is by far the easiest on the programmer, but it is terribly inefficient and may fail completely for very large datasets.

Sarah

On Sun, Mar 6, 2011 at 12:47 PM, Johannes Graumann <johannes_graum...@web.de> wrote:
> Thank you for pointing this out. This is really inconvenient as I do not
> know a priori how many and where those darn cases containing an additional
> (or more) ":" might be ...
>
> This seems to work, but will fail if there's a "1:sdfjhlfkh:2:adlkjf"
> somewhere (1 & 2 both integerable).
>
> na.exclude(as.integer(scan("/tmp/testfile.txt",sep=":",what="integer")))
>
> More robust pointers anyone?
>
> Joh
>
> Sarah Goslee wrote:
>
>> Not so much a mystery. read.table() only looks at the first 5 lines when
>> deciding how many columns your file has (as described in the Details
>> section of the help).
>>
>> The easiest solution is to add a col.names argument to read.table() with
>> the correct number of names.
>>
>> You may want to also include as.is=TRUE if you don't want your data to
>> be imported as factors. If you expect character but have factor you may
>> get unexpected results later.
>>
>> Sarah
>>
>> On Sun, Mar 6, 2011 at 5:04 AM, Johannes Graumann
>> <johannes_graum...@web.de> wrote:
>>> Hello,
>>>
>>> Please have a look at the code below, which I use to read in the attached
>>> file. As line 18 of the file reads "1065:>sp|Q9V3T9|ADRO_DROME
>>> NADPH:adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
>>> melanogaster GN=dare PE=2 SV=1", I expect the code below to produce a 3
>>> column data frame with most of the last column empty and line 18 to
>>> produce a data.frame row like so:
>>>
>>> V1
>>> 1065
>>> V2
>>> >sp|Q9V3T9|ADRO_DROME NADPH
>>> V3
>>> adrenodoxin oxidoreductase, mitochondrial OS=Drosophila
>>> melanogaster GN=dare PE=2 SV=1
>>>
>>> Why is that not so?
>>>
>>> Thanks for any hint.
>>>
>>> Sincerely, Joh
>>>
>>> read.table(
>>>   "/tmp/testfile.txt",
>>>   sep=":",
>>>   header=FALSE,
>>>   quote="",
>>>   fill=TRUE
>>> )[19,]
>>

--
Sarah Goslee
http://www.functionaldiversity.org