Thank you so much, Jeff. It worked for this example, but when I read the data from a file (c:\data\test.txt) it did not work:

KLEM <- "c:/data"  # forward slashes: in R strings "\d" is not a valid escape and "\t" is a TAB
KR <- file.path(KLEM, "test.txt")
# not interested in the first 46 lines; readLines() has no skip argument,
# so drop them by indexing
indta <- readLines(KR)[-(1:46)]
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)
# Replace the matched portion (entire string) with the first capture string
v1 <- as.numeric(sub(pattern, "\\1", indta[firstlines]))
# Replace the matched portion (entire string) with the second capture string
v2 <- as.numeric(sub(pattern, "\\2", indta[firstlines]))
# Convert the lines just after the first lines to numeric
v3 <- as.numeric(indta[firstlines + 1])
# put it all into a data frame
result <- data.frame(Group = v1, Mean = v2, SE = v3)
result
[1] Group Mean SE
<0 rows> (or 0-length row.names)

Thank you in advance

On Tue, May 31, 2016 at 1:12 AM, Jeff Newmiller <jdnew...@dcn.davis.ca.us> wrote:
> Please learn to post in plain text (the setting is in your email client...
> somewhere), as HTML is "What We See Is Not What You Saw" on this mailing
> list.
> In conjunction with that, try reading some of the fine material
> mentioned in the Posting Guide about making reproducible examples like this
> one:
>
> # You could read in a file
> # indta <- readLines( "out.txt" )
> # but there is no "current directory" in an email
> # so here I have used the dput() function to make source code
> # that creates a self-contained R object
>
> indta <- c(
> "Mean of weight group 1, SE of mean : 72.289037489555276",
> " 11.512956539215610",
> "Average weight of group 2, SE of Mean : 83.940053900595013",
> " 10.198495690144522",
> "group 3 mean , SE of Mean : 78.310441258245469",
> " 13.015876679555",
> "Mean of weight of group 4, SE of Mean : 76.967516495101669",
> " 12.1254882985", "")
>
> # Regular expression patterns are discussed all over the internet
> # in many places OTHER than R
> # You can start with ?regex, but there are many fine tutorials also
>
> pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
> # For this task the regex has to match the whole "first line" of each set
> # ^ =match starting at the beginning of the string
> # .* =any character, zero or more times
> # "group " =match these characters
> # ( =first capture string starts here
> # \\d =any digit (first backslash for R, second backslash for regex)
> # + =one or more of the preceding (any digit)
> # ) =end of first capture string
> # [^:] =any non-colon character
> # * =zero or more of the preceding (non-colon character)
> # : =match a colon exactly
> # " *" =match zero or more spaces
> # ( =second capture string starts here
> # [ =start of a set of equally acceptable characters
> # -+ =either of these characters is acceptable
> # 0-9 =any digit would be acceptable
> # . =a period is acceptable (this is inside the [])
> # eE =in case you get exponential notation input
> # ] =end of the set of acceptable characters (number)
> # * =number of acceptable characters can be zero or more
> # ) =second capture string stops here
> # .* =zero or more of any character (just in case)
> # $ =at end of pattern, requires that the match reach the end
> #    of the string
>
> # identify indexes of strings that match the pattern
> firstlines <- grep( pattern, indta )
> # Replace the matched portion (entire string) with the first capture string
> v1 <- as.numeric( sub( pattern, "\\1", indta[ firstlines ] ) )
> # Replace the matched portion (entire string) with the second capture string
> v2 <- as.numeric( sub( pattern, "\\2", indta[ firstlines ] ) )
> # Convert the lines just after the first lines to numeric
> v3 <- as.numeric( indta[ firstlines + 1 ] )
> # put it all into a data frame
> result <- data.frame( Group = v1, Mean = v2, SE = v3 )
>
> Figuring out how to deliver your result (output) is a separate question that
> depends on where you want it to go.
>
>
> On Mon, 30 May 2016, Val wrote:
>
>> Hi all,
>>
>> I have a messy text file from which I want to extract some information.
>> Here is the text file (out.txt). Each record has two lines: the mean is
>> on the first line and the SE of the mean is on the second line. Here is
>> a sample of the data.
>>
>> Mean of weight group 1, SE of mean : 72.289037489555276
>> 11.512956539215610
>> Average weight of group 2, SE of Mean : 83.940053900595013
>> 10.198495690144522
>> group 3 mean , SE of Mean : 78.310441258245469
>> 13.015876679555
>> Mean of weight of group 4, SE of Mean : 76.967516495101669
>> 12.1254882985
>>
>> I want to produce the following table.
>> How do I read the file first and then produce the table?
>>
>> Gr1 72.289037489555276 11.512956539215610
>> Gr2 83.940053900595013 10.198495690144522
>> Gr3 78.310441258245469 13.015876679555
>> Gr4 76.967516495101669 12.1254882985
>>
>> Thank you in advance
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> ---------------------------------------------------------------------------
> Jeff Newmiller
> DCN:<jdnew...@dcn.davis.ca.us>
> Research Engineer (Solar/Batteries/Software/Embedded Controllers)
> ---------------------------------------------------------------------------
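
A hedged sketch for the file-reading follow-up at the top of this thread, not a confirmed diagnosis of the 0-row result. It rests on assumptions: the file written below is a hypothetical stand-in for c:\data\test.txt (46 filler lines followed by the sample records), and the "Gr" labels follow the table requested in the original question. It illustrates two R details that are easy to trip over when reading from a file on Windows. First, a backslash in an R string literal starts an escape sequence, so "\test.txt" begins with a TAB character; a path is safer built with forward slashes via file.path(). Second, readLines() has no skip argument, so header lines have to be dropped by indexing the result.

```r
# Hedged sketch: "test.txt" in tempdir() is a hypothetical stand-in for
# c:\data\test.txt, i.e. 46 filler lines followed by the sample records.
sample_lines <- c(
  "Mean of weight group 1, SE of mean : 72.289037489555276",
  " 11.512956539215610",
  "Average weight of group 2, SE of Mean : 83.940053900595013",
  " 10.198495690144522",
  "group 3 mean , SE of Mean : 78.310441258245469",
  " 13.015876679555",
  "Mean of weight of group 4, SE of Mean : 76.967516495101669",
  " 12.1254882985"
)
path <- file.path(tempdir(), "test.txt")
writeLines(c(paste("filler line", 1:46), sample_lines), path)

# In an R string literal "\t" is a TAB escape, so "\test.txt" is TAB + "est.txt"
# (8 characters), not "test.txt". A real Windows path could be written as
# file.path("c:/data", "test.txt"); R accepts forward slashes there too.
print(nchar("\test.txt"))

# readLines() has no 'skip' argument; drop the first 46 lines by indexing.
indta <- readLines(path)[-(1:46)]

# Same pattern and steps as in the thread.
pattern <- "^.*group (\\d+)[^:]*: *([-+0-9.eE]*).*$"
firstlines <- grep(pattern, indta)
v1 <- as.numeric(sub(pattern, "\\1", indta[firstlines]))
v2 <- as.numeric(sub(pattern, "\\2", indta[firstlines]))
v3 <- as.numeric(indta[firstlines + 1])

# "Gr" labels as in the requested table; stringsAsFactors = FALSE keeps
# Group as character in R versions before 4.0 as well.
result <- data.frame(Group = paste0("Gr", v1), Mean = v2, SE = v3,
                     stringsAsFactors = FALSE)
print(result, digits = 15)
```

If grep() still finds no matches on the real file, print(head(indta)) shows what readLines() actually returned (for example, a different capitalization of "group" or an unexpected encoding). Comparing those lines against the pattern is one way to narrow down why the data frame comes back empty.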