Try this:

> cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
+       "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
+       "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ]Babylon"),
+       sep = "\n", file = "tmp.txt")
>
> # read in the data and parse it assuming it has the same structure
> input <- readLines('tmp.txt')
> # parse it item by item
> x.id <- sub(".*\\[ID: ([[:digit:]]+).*", "\\1", input)
> x.writer <- sub(".*\\[Writer:([^]]+).*", '\\1', input)
> x.rating <- sub(".*\\[Rating: ([0-9.]+).*", '\\1', input)
> x.prog <- sub(".*\\](.*)", '\\1', input)
> #create dataframe
> data.frame(id=x.id, writer=x.writer, rating=x.rating, prog=x.prog)
   id                   writer rating        prog
1 001           Steven Moffat     8.9  Doctor Who
2 002             Joss Whedon     8.8       Buffy
3 003  J. Michael Straczynski     7.4     Babylon
>
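
Two caveats about the sub() calls above: the writer and programme values keep the spaces that surround them inside the brackets, and the greedy ".*\\](.*)" pattern looks for the last "]" on the line, so a title such as "Babylon [5]" from the original question would come back empty. Here is a minimal sketch of one way around both, assuming every line carries the same three bracketed fields in the same order. The pattern and the names `pattern`, `parts` and `fields` are illustrative only, and regexec() and trimws() need a more recent R than the 2.11.0 shown in this thread.

```r
# Sample lines from the original question, including the trailing "[5]".
input <- c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
           "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
           "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]")

# One anchored pattern with four capture groups. "[^]]*" cannot run past
# the closing bracket of its own field, so only the last group (the free
# text) is allowed to contain brackets.
pattern <- "^\\[ID:([^]]*)\\] *\\[Writer:([^]]*)\\] *\\[Rating:([^]]*)\\](.*)$"

# regexec() locates the full match and each group; regmatches() extracts
# them as one character vector per input line (full match first).
parts <- regmatches(input, regexec(pattern, input))

# Stack the per-line vectors into a matrix: column 1 is the full match,
# columns 2 to 5 are the captured fields. Lines that do not match the
# pattern yield character(0) and are silently dropped by rbind().
fields <- do.call(rbind, parts)

# trimws() removes the padding spaces around each captured value.
DF <- data.frame(ID     = trimws(fields[, 2]),
                 Writer = trimws(fields[, 3]),
                 Rating = trimws(fields[, 4]),
                 Text   = trimws(fields[, 5]),
                 stringsAsFactors = FALSE)
print(DF)
```

In practice `input` would come from readLines("tmp.txt") as before. Because non-matching lines are dropped without warning, it is worth comparing nrow(DF) with length(input) on real data.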


On Thu, May 6, 2010 at 9:58 AM, Tony B <tony.bre...@googlemail.com> wrote:

> Dear all
>
> Let's say I have a plain text file as follows:
>
> > cat(c("[ID: 001 ] [Writer: Steven Moffat ] [Rating: 8.9 ] Doctor Who",
> +       "[ID: 002 ] [Writer: Joss Whedon ] [Rating: 8.8 ] Buffy",
> +       "[ID: 003 ] [Writer: J. Michael Straczynski ] [Rating: 7.4 ] Babylon [5]"),
> +       sep = "\n", file = "tmp.txt")
>
> I would like to read this file into R and convert it into a
> data frame like this:
>
> > DF <- data.frame(ID = c("001", "002", "003"),
> +                 Writer = c("Steven Moffat", "Joss Whedon",
> +                            "J. Michael Straczynski"),
> +                 Rating = c("8.9", "8.8", "7.4"),
> +                 Text = c("Doctor Who", "Buffy", "Babylon [5]"),
> +                 stringsAsFactors = FALSE)
>
>
> My initial thought was to use readLines on the text file, then apply
> some regular expressions and strsplit(..); but having confused myself
> after several attempts, I was wondering if there is another way,
> perhaps using read.table instead?  My end goal is to convert DF
> into an XML structure.
>
> Thank you kindly in advance for your time,
> Tony Breyal
>
> # Windows Vista
> > sessionInfo()
> R version 2.11.0 (2010-04-22)
> i386-pc-mingw32
>
> locale:
> [1] LC_COLLATE=English_United Kingdom.1252
> [2] LC_CTYPE=English_United Kingdom.1252
> [3] LC_MONETARY=English_United Kingdom.1252
> [4] LC_NUMERIC=C
> [5] LC_TIME=English_United Kingdom.1252
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
> other attached packages:
> [1] XML_2.8-1
>
> loaded via a namespace (and not attached):
> [1] tools_2.11.0
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>



-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
