Try this:

> cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
+       "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
+       "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ]Babylon"),
+     sep = "\n", file = "tmp.txt")
>
> # read in the data and parse it assuming it has the same structure
> input <- readLines('tmp.txt')
> # parse it item by item
> x.id <- sub(".*\\[ID: ([[:digit:]]+).*", "\\1", input)
> x.writer <- sub(".*\\[Writer:([^]]+).*", '\\1', input)
> x.rating <- sub(".*\\[Rating: ([0-9.]+).*", '\\1', input)
> x.prog <- sub(".*\\](.*)", '\\1', input)
> # create dataframe
> data.frame(id=x.id, writer=x.writer, rating=x.rating, prog=x.prog)
   id                   writer rating        prog
1 001            Steven Moffat    8.9  Doctor Who
2 002              Joss Whedon    8.8       Buffy
3 003   J. Michael Straczynski    7.4     Babylon
>
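
One thing to watch: the title in the original question is "Babylon [5]". With that input the greedy `.*\\]` in the x.prog line matches through the final `]`, so the captured title comes back empty. Below is a minimal sketch of one way around that, not the only one. It matches the three bracketed fields explicitly with a single anchored pattern and takes everything after them as the title. It assumes every line has exactly the ID / Writer / Rating layout shown, and it assumes a newer R than the 2.11.0 in this thread, since regexec(), regmatches() and trimws() were added in later releases. The names `pattern`, `m` and `fields` are made up for illustration.

```r
# Sample lines, including the "Babylon [5]" title from the original question
input <- c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
           "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
           "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]")

# One anchored pattern with four capture groups: ID, Writer, Rating, title.
# The three bracketed fields are matched explicitly, so any brackets in the
# trailing title are left alone.
pattern <- paste0("^\\[ID:[[:space:]]*([^]]*)\\][[:space:]]*",
                  "\\[Writer:[[:space:]]*([^]]*)\\][[:space:]]*",
                  "\\[Rating:[[:space:]]*([^]]*)\\][[:space:]]*",
                  "(.*)$")

# regmatches() returns, per line, the full match followed by the four groups
m <- regmatches(input, regexec(pattern, input))

# Fail early if any line does not have the assumed layout
stopifnot(all(sapply(m, length) == 5))

# Drop the full match, trim surrounding whitespace, stack into a matrix
fields <- do.call(rbind, lapply(m, function(x) trimws(x[-1])))

DF <- data.frame(ID     = fields[, 1],
                 Writer = fields[, 2],
                 Rating = fields[, 3],
                 Text   = fields[, 4],
                 stringsAsFactors = FALSE)
print(DF)
```

All four columns come back as character, which matches the DF asked for in the question; wrap Rating in as.numeric() if numbers are wanted.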
On Thu, May 6, 2010 at 9:58 AM, Tony B <tony.bre...@googlemail.com> wrote:

> Dear all
>
> Let's say I have a plain text file as follows:
>
> > cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
> +       "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
> +       "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]"),
> +     sep = "\n", file = "tmp.txt")
>
> I would somehow like to read this file into R and convert it into a
> data frame like this:
>
> > DF <- data.frame(ID = c("001", "002", "003"),
> +                  Writer = c("Steven Moffat", "Joss Whedon", "J. Michael Straczynski"),
> +                  Rating = c("8.9", "8.8", "7.4"),
> +                  Text = c("Doctor Who", "Buffy", "Babylon [5]"),
> +                  stringsAsFactors = FALSE)
>
> My initial thoughts were to use readLines on the text file, then apply
> some regular expressions and strsplit(..); but having confused myself
> after several attempts, I was wondering if there is another way,
> perhaps using read.table instead? My end goal is to convert DF into
> an XML structure.
>
> Thank you kindly in advance for your time,
> Tony Breyal
>
> # Windows Vista
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
>     LC_CTYPE=English_United Kingdom.1252
>     LC_MONETARY=English_United Kingdom.1252
>     LC_NUMERIC=C
>     LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats graphics grDevices utils datasets methods base
>
> other attached packages:
> [1] XML_2.8-1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?