> On Jan 4, 2017, at 8:51 PM, David Wolfskill <r...@catwhisker.org> wrote: > > On Wed, Jan 04, 2017 at 08:33:46PM -0800, David Winsemius wrote: >> ... >> Perhaps something like this: >> >> # function to read the values in 'values': >> parse_values <- function(x) {scan(text= gsub( "\\[|\\]","",x), sep=",") } >> >> # the apply function reads line-by-line >> new_dat <- apply(test_data, 1, function(d) data.frame( as.list(d[!names(d) >> %in% "values"]), nvals <- parse_values(d['values']) ) ) > > Hmmm.... OK; that looks a lot better than the stuff that was coming to > my mind -- thanks! :-) > >> ... >> # Could suppress the report from scan by adding quiet = TRUE >> # now take this list of 4 line data.frames and "rbind" them >> # If you wanted these to remain character you would use >> stringsAsFactors=FALSE in the data.frame call >>> new_df <- do.call("rbind", new_dat) > > Aye. > >> ... >>> (I will also end up collecting all of the records for a given timestamp >>> and hostname, and creating one very wide record with all of the data >>> from the set of records thus found. I already have (yes, Perl) code to >>> do this -- though if there's a reasonable way to avoid that, I'm >>> interested.) >> >> I thought you wanted the data in long form. > > Sorry; I'm not understanding what you mean: My background is a lot more > toward systems administration than statistical analysis. > > The repository I'm using has a rather large number of individual metrics > from a given server -- each provided on a separate row. (That's why one > of the columns is called "name" -- it provides the (base) "name" of the > metric that corresponds to the "values" on the given row.) I'll plan to > assemble the rows for a given server & timestamp into a single row -- > thuse, I would have the tcp_connection_count for the "last ACK" state > and for the "fin_wait_2" state, as well as CpuSystem, CpuUser, CpuIdle, > ... for the given server & timestamp on a single row (eventually).
If you wanted it in wide form, you could just join the individual id columns to a named list made from the parsed 'values'. > new_dat <- apply(test_data, 1, function(d) data.frame( as.list(d[!names(d) > %in% "values"]), as.list( setNames( parse_values(d['values']), paste0( "V", > 1:4) ) ) ) ) Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items Read 4 items > new_df <- do.call("rbind", new_dat) > > str(new_df) 'data.frame': 15 obs. of 14 variables: $ start : Factor w/ 1 level "1482793200": 1 1 1 1 1 1 1 1 1 1 ... $ hostname : Factor w/ 2 levels "c001.example.net",..: 1 1 1 1 1 1 1 1 1 1 ... $ mtype : Factor w/ 3 levels "health","net",..: 1 1 2 2 2 2 2 2 2 3 ... $ limit_type: Factor w/ 3 levels "fill","serve",..: 1 2 3 3 3 3 3 3 3 3 ... $ hw : Factor w/ 2 levels "1.16","1.21": 1 1 1 1 1 1 1 1 1 1 ... $ fw : Factor w/ 2 levels "2017Q1.1.1","2016Q4.2.13": 1 1 1 1 1 1 1 1 1 1 ... $ tcp_state : Factor w/ 6 levels "","closed","closing",..: 1 1 1 1 2 3 4 5 6 1 ... $ value_type: Factor w/ 2 levels "limit","": 1 1 2 2 2 2 2 2 2 2 ... $ nic : Factor w/ 3 levels "all","","mce0": 1 1 2 2 2 2 2 2 2 2 ... $ name : Factor w/ 8 levels "in_download_window",..: 1 1 2 3 4 4 4 4 4 5 ... $ V1 : num 0 0 260411 18436 5 ... $ V2 : num 0 0 258470 18249 3 ... $ V3 : num 0 0 260579 18202 3 ... $ V4 : num 0 0 258763 17818 3 ... > >> ... > > Thanks again! > > Peace, > david > -- > David H. Wolfskill r...@catwhisker.org > Epistemology for post-truthers: How do we select parts of reality to ignore? > > See http://www.catwhisker.org/~david/publickey.gpg for my public key. David Winsemius Alameda, CA, USA ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.