On Feb 9, 2014, at 2:48 PM, Burhan ul haq wrote:

> Hi,
>
> I am trying to read in a file, which is not delimited by any specific
> characters.
>
> Something as follows:
> ## -------------------------------------------------------------------

Lines <- readLines(textConnection("GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
1,10038, Carl Allwood M Sutton & Ashfield Harriers 02:38:40 1 02:38:40
2,10098, Adam Holland M Votwo/USN 02:41:25 2 02:41:25
3,13007, Pumlani Bangani M 02:43:23 3 02:43:23
4,10028, Anthony Jackson M Sittingbourne Striders 02:44:39 4 02:44:39
5,10187, Peter Stockdale M 02:45:26 5 02:45:25
6,10064, Jared Bethell M Harlow RC 02:46:43 6 02:46:40
7,13003, Sarah Harris F 35 Long Eaton RC 02:47:47 7 02:47:44
8,13009, Rod Harris M 02:47:47 8 02:47:45
9,10033, Carl Sommer M Huncote Harriers 02:47:59 9 02:47:58
10,10037, Peter Swaine M Charnwood AC 02:49:28 10 02:49:27
11,10048, Pavel Toropov M 02:50:41 11 02:50:41
12,10008, Derek Dunne M 45 Treasury Running Club 02:51:42 12 02:51:40
13,10044, Matthew Nutt M Scunthorpe 02:52:20 13 02:52:15
14,10380, Ludovic Renou M 02:53:37 14 02:53:34
15,10056, Alex Keenan M 02:53:48 15 02:53:47"))
Lines1 <- sub("( M | F )", ",\\1,", Lines)
Lines2 <- sub("( \\d+ )", ",\\1,", Lines1)

Need to edit header to have commas as separators. You can then just use:

read.table(text=Lines2, sep=",", header=TRUE)

>
>
> As I failed to read it in via R or Excel, I used a text editor with
> regular expressions, sublime to be exact. I was trying to convert it
> in CSV format, and was successful to put commas for the first two
> entries, as follows:
>
> ## -------------------------------------------------------------------
> GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
> 1,10038, Carl Allwood ,M ,Sutton & Ashfield Harriers 02:38:40 1 02:38:40
> 2,10098, Adam Holland ,M ,Votwo/USN 02:41:25 2 02:41:25
> 3,13007, Pumlani Bangani ,M ,02:43:23 3 02:43:23
> 4,10028, Anthony Jackson ,M ,Sittingbourne Striders 02:44:39 4 02:44:39
> 5,10187, Peter Stockdale ,M ,02:45:26 5 02:45:25
> 6,10064, Jared Bethell ,M ,Harlow RC 02:46:43 6 02:46:40
> 7,13003, Sarah Harris ,F ,35 Long Eaton RC 02:47:47 7 02:47:44
> 8,13009, Rod Harris ,M ,02:47:47 8 02:47:45
> 9,10033, Carl Sommer ,M ,Huncote Harriers 02:47:59 9 02:47:58
> 10,10037, Peter Swaine ,M ,Charnwood AC 02:49:28 10 02:49:27
> 11,10048, Pavel Toropov ,M ,02:50:41 11 02:50:41
> 12,10008, Derek Dunne ,M ,45 Treasury Running Club 02:51:42 12 02:51:40
> 13,10044, Matthew Nutt ,M ,Scunthorpe 02:52:20 13 02:52:15
> 14,10380, Ludovic Renou ,M ,02:53:37 14 02:53:34
> 15,10056, Alex Keenan ,M ,02:53:48 15 02:53:47
> ## -------------------------------------------------------------------
>
> I am failing after that, I tried to search the expression:
> (.)*(\d{2}:\d{2}:\d{2})( )
> and replace it with: \1,\2,\3, with the result:
>
> ## -------------------------------------------------------------------
> GunPos RaceNo Name Gender Cat Club GunTime ChipPos ChipTime
> ,02:38:40, 1 02:38:40
> ,02:41:25, 2 02:41:25
> ## -------------------------------------------------------------------
>
> How do I fix the regular expression here.
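
On the regular expression: in (.)* the * sits outside the parentheses, so the group matches one character at a time and, in most regex engines, \1 ends up holding only the last character matched. Everything else in front of the time is consumed and dropped, which is why the lines come back starting with a comma. Putting the quantifier inside the group and making it lazy, (.*?), keeps that text. What follows is only a minimal sketch in R (not Sublime), assuming every row is laid out like the sample above: a single-letter M or F gender field, an optional numeric Cat, an optional Club, then GunTime, ChipPos and ChipTime. The names rows, tm, pat, piped and res are made up for illustration, and using "|" as the separator assumes no field contains that character.

```r
# Three rows copied from the sample data above.
rows <- c(
  "1,10038, Carl Allwood M Sutton & Ashfield Harriers 02:38:40 1 02:38:40",
  "3,13007, Pumlani Bangani M 02:43:23 3 02:43:23",
  "7,13003, Sarah Harris F 35 Long Eaton RC 02:47:47 7 02:47:44"
)

tm <- "\\d{2}:\\d{2}:\\d{2}"   # hh:mm:ss

# Pattern from the question: (.)* leaves only one character in \\1.
broken <- sub(paste0("(.)*(", tm, ")( )"), "\\1,\\2,\\3,", rows, perl = TRUE)

# Quantifier inside the group and lazy: \\1 keeps all text before the first time.
fixed <- sub(paste0("^(.*?) ?(", tm, ") "), "\\1,\\2,", rows, perl = TRUE)

# One anchored pattern that captures all nine fields in a single pass.
pat <- paste0(
  "^(\\d+),(\\d+),\\s*",    # GunPos, RaceNo
  "(.*?)\\s+([MF])\\s+",    # Name (lazy, any number of parts), Gender
  "(\\d*)\\s*(.*?)\\s*",    # Cat (may be empty), Club (may be empty)
  "(", tm, ")\\s+(\\d+)\\s+(", tm, ")\\s*$"   # GunTime, ChipPos, ChipTime
)

ok    <- grepl(pat, rows, perl = TRUE)   # which lines fit the assumed layout
piped <- sub(pat, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8|\\9", rows[ok], perl = TRUE)

res <- read.table(
  text = piped, sep = "|", quote = "", comment.char = "",
  col.names  = c("GunPos", "RaceNo", "Name", "Gender", "Cat",
                 "Club", "GunTime", "ChipPos", "ChipTime"),
  colClasses = c("integer", "integer", "character", "character", "integer",
                 "character", "character", "integer", "character")
)

rows[!ok]   # lines that did not fit and need a manual look
```

Because the Name group is lazy and stops at the first standalone M or F, hyphenated or three-part names should pass through unchanged. A name containing a standalone M or F initial, or a club whose name begins with digits, would be split in the wrong place, so it is worth checking rows[!ok] and a few parsed rows against the full file.
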
> If you examine the later
> entries some name contains hyphen, or have three parts, so other
> approaches do not work well.
>
> Secondly, is there a better way to handle this problem. The original
> input file is in pdf format. I copied the text, and made a txt file out
> of it.
>
> The input txt file is attached.
>
> Thanks in advance for any suggestions.
>
>

David Winsemius
Alameda, CA, USA

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.