Just one comment. The code posted works as shown, but if in your case Lines is actually a character vector of separate lines, rather than one big string as in my example, then you will need to add a simplify = c argument to each strapply call.
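As a rough sketch of that case (the two-record Lines vector below is made up for illustration, and converting with as.Date/as.numeric after the extraction is my choice here, not something from the earlier posts), the calls could look like this. With simplify = c the per-line results should come back as one flat vector, so the trailing [[1]] from the earlier code is left off:

```r
library(gsubfn)

# Hypothetical input: same kind of records, but one element per line
Lines <- c('- <Temp diffgr:id="Temp14" msdata:rowOrder="13">',
           '<Date>2005-01-17T00:00:00+05:30</Date>',
           '<SecurityID>10149</SecurityID>',
           '<PriceClose>1288.40002</PriceClose>',
           '</Temp>',
           '- <Temp diffgr:id="Temp15" msdata:rowOrder="14">',
           '<Date>2005-01-18T00:00:00+05:30</Date>',
           '<SecurityID>10149</SecurityID>',
           '<PriceClose>1291.69995</PriceClose>',
           '</Temp>')

# simplify = c combines the matches from all lines into a single
# character vector; lines without a match contribute nothing
dates  <- strapply(Lines, "....-..-..", simplify = c)
prices <- strapply(Lines, "[0-9]+[.][0-9]+", simplify = c)

# Convert after flattening, so the result does not depend on how
# c() treats classed values such as Date
DF <- data.frame(date = as.Date(dates), price = as.numeric(prices))
```

The same two vectors can be passed to zoo(as.numeric(prices), as.Date(dates)) if a zoo series is preferred.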
On Wed, Nov 5, 2008 at 7:32 AM, Gabor Grothendieck <[EMAIL PROTECTED]> wrote:
> Here is another solution made slightly shorter by using
> strapply twice:
>
> z <- zoo(strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]],
>          strapply(Lines, "....-..-..", as.Date)[[1]])
>
> or to create a data frame:
>
> DF <- data.frame(date = strapply(Lines, "....-..-..", as.Date)[[1]],
>                  price = strapply(Lines, "[0-9]+[.][0-9]+", as.numeric)[[1]])
>
> On Wed, Nov 5, 2008 at 6:22 AM, Gabor Grothendieck
> <[EMAIL PROTECTED]> wrote:
>> As others have pointed out, it's close to XML but not quite
>> there; however, you could use strapply in gsubfn to extract
>> the data. It pulls out the data matching the regular expression,
>> giving a vector, vec, consisting of: date price date price ...
>> Pulling out the odd and even elements separately and
>> converting them to Date and numeric, respectively, gives the
>> resulting data.frame.
>>
>> See
>> http://gsubfn.googlecode.com
>> for more on the gsubfn package and
>> the three zoo vignettes in the zoo package for more on zoo.
>>
>> Lines <- '- <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>> <Date>2005-01-17T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1288.40002</PriceClose>
>> </Temp>
>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>> <Date>2005-01-18T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1291.69995</PriceClose>
>> </Temp>
>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>> <Date>2005-01-19T00:00:00+05:30</Date>
>> <SecurityID>10149</SecurityID>
>> <PriceClose>1288.19995</PriceClose>
>> </Temp>'
>>
>> library(gsubfn)
>> vec <- strapply(Lines, "....-..-..|[0-9]+[.][0-9]+")[[1]]
>> ix <- seq_along(vec) %% 2 == 1
>> DF <- data.frame(date = as.Date(vec[ix]), price = as.numeric(vec[!ix]))
>>
>> # or, instead of the last line, you could convert it to a zoo object so
>> # that it's in a more convenient form for time series manipulation:
>>
>> library(zoo)
>> z <- zoo(as.numeric(vec[!ix]), as.Date(vec[ix]))
>>
>>
>>
>> On Wed, Nov 5, 2008 at 1:22 AM, RON70 <[EMAIL PROTECTED]> wrote:
>>>
>>> Hi everyone,
>>>
>>> I have this kind of raw dataset:
>>>
>>> - <Temp diffgr:id="Temp14" msdata:rowOrder="13">
>>> <Date>2005-01-17T00:00:00+05:30</Date>
>>> <SecurityID>10149</SecurityID>
>>> <PriceClose>1288.40002</PriceClose>
>>> </Temp>
>>> - <Temp diffgr:id="Temp15" msdata:rowOrder="14">
>>> <Date>2005-01-18T00:00:00+05:30</Date>
>>> <SecurityID>10149</SecurityID>
>>> <PriceClose>1291.69995</PriceClose>
>>> </Temp>
>>> - <Temp diffgr:id="Temp16" msdata:rowOrder="15">
>>> <Date>2005-01-19T00:00:00+05:30</Date>
>>> <SecurityID>10149</SecurityID>
>>> <PriceClose>1288.19995</PriceClose>
>>> </Temp>
>>>
>>> I was looking for some R procedure to extract data from this; the result
>>> should be in the following format:
>>>
>>> 2005-01-17 1288.40002
>>> 2005-01-18 1291.69995
>>> 2005-01-19 1288.19995
>>>
>>> Can R help me to do this?
>>>
>>> --
>>> View this message in context:
>>> http://www.nabble.com/How-to-extract-following-data-tp20336690p20336690.html
>>> Sent from the R help mailing list archive at Nabble.com.
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.