Problem solved by Josh O'Brien on Stack Overflow:
http://stackoverflow.com/questions/12393004/parsing-back-to-messy-api-strcuture/12435389#12435389
some_magic <- function(df) {
    ## Replace NA with "", converting column types as needed
    df[] <- lapply(df, function(X) {
                if(any(is.na(X))) {X[is.na(X)] <- ""; X} else {X}
            })

    ## Print integers in first column as 2-digit character strings
    ## (DO NOTE: Hardwiring the number of printed digits here is probably
    ## inadvisable, though needed to _exactly_ reconstitute RAW.API.)
    df[[1]] <- sprintf("%02.0f", df[[1]])

    ## Separately build header and table body, then suture them together
    l1 <- paste(names(df), collapse=",")
    l2 <- capture.output(write.table(df, sep=",", col.names=FALSE,
                                     row.names=FALSE))
    out <- paste0(c(l1, l2, ""), collapse="\n")

    ## Reattach attributes
    att <- list("`Content-Type`" = structure(c("text/html", "utf-8"),
                .Names = c("", "charset")))
    attributes(out) <- att
    out
}

identical(some_magic(df), RAW.API)
# [1] TRUE

On Thu, Sep 13, 2012 at 11:32 AM, Eric Fail <eric.f...@gmx.us> wrote:
> Dear Jim,
>
> Thank you for your response; I appreciate your effort!
>
> It is close, I must admit that. What I am looking for is an object
> that is identical to 'RAW.API', or at least identical in structure (I guess
> I do not need the ","`Content-Type`" = structure(c("text/html",
> "utf-8"), .Names = c("",
> "charset")))" part).
>
> When I investigate 'x.out' it also has the NAs. I've tried to fix
> it, but I had to give up. It is strange because getting there seems so
> easy (warning: false logic!).
>
> Here is what I got on my looong and alternative route, in the hope that
> someone on the list might be able to help:
>
> RAW.API <-
> structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names =
> c("","charset")))
>
> # I used an alternative way of converting it to a dataset to keep the
> # leading 0 in the id variables
> x <- read.table(file = textConnection(RAW.API ), header = TRUE, sep =
> ",", na.strings = "", stringsAsFactors = FALSE, colClasses ="character")
> x
>
> # now put it back into the same string; write.csv does quote alphanumerics
> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
> unlockBinding("output", env = .GlobalEnv)
> # fixes the problem with the header
> output[1] <- gsub("\\\"", "", output[1])
> # removes NAs
> output <- gsub("NA", "\"\"", output)
> # removes "\ at the beginning of each line
> output <- gsub("^\\\"", "", output)
> # removes an " at the end of each line
> output <- gsub("\\\"$", "", output)
> # same as before
> x.out <- paste(output, collapse = '\n\"')
> # adds a line break at the end
> x.out <- gsub("$", "\n", x.out)
>
> # so much manual gsub ...
>
> Any help would be very much appreciated.
>
> On Wed, Sep 12, 2012 at 5:54 PM, jim holtman <jholt...@gmail.com> wrote:
>> This is close, but it does quote the header names, but does produce
>> the same dataframe when read back in:
>>
>>> RAW.API <-
>>> structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
>>> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("",
>>> "charset")))
>>> x <- read.csv(textConnection(RAW.API), as.is = TRUE)
>>> x
>>   id     event_arm name        dob pushed_text pushed_calc complete
>> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>>
>>> # now put it back into the same string; write.csv does quote alphanumerics
>>> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
>>> x.out <- paste(output, collapse = '\n')
>>> # read it back in to show it is the same
>>> x.in <- read.csv(textConnection(x.out), as.is = TRUE)
>>> x.in
>>   id     event_arm name        dob pushed_text pushed_calc complete
>> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>>
>>
>>
>> On Wed, Sep 12, 2012 at 8:21 PM, Eric Fail <eric.f...@gmx.us> wrote:
>>> Dear R experts,
>>>
>>> I'm reading data from an online database via API and it gets delivered in
>>> this messy comma separated structure,
>>>
>>>> RAW.API <-
>>>> structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
>>>> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("",
>>>> "charset")))
>>>
>>> I have this script that nicely parses it into a data frame,
>>>
>>>> (df <- read.table(file = textConnection(RAW.API), header = TRUE,
>>>>     sep = ",", na.strings = "", stringsAsFactors = FALSE))
>>>>   id     event_arm name        dob pushed_text pushed_calc complete
>>>> 1  1 event_1_arm_1 John 1979-05-01        <NA>          NA        2
>>>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>>>> 3  1 event_3_arm_1 John 2012-09-10        <NA>          NA        2
>>>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>>>> 5  2 event_2_arm_1 Mary 1978-09-12        <NA>          NA        2
>>>
>>> I then do some calculations and write them to pushed_text and pushed_calc
>>> whereafter I need to format the data back to the messy comma separated
>>> structure it came in.
>>>
>>> I imagine something like this,
>>>
>>>> API.back <- `some magic command`(df, ...)
>>>
>>>> identical(RAW.API, API.back)
>>>> [1] TRUE
>>>
>>> Some command that can format my data from the data frame I made, df, back
>>> to the structure that the raw API-object came in, RAW.API.
>>>
>>> Any help would be appreciated.
>>>
>>> Thanks for reading.
>>>
>>> Eric
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.
>>
>>
>>
>> --
>> Jim Holtman
>> Data Munger Guru
>>
>> What is the problem that you are trying to solve?
>> Tell me what you want to do, not how you want to do it.
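
A possible variation on some_magic(), offered only as a sketch: the caveat about hardwiring the number of printed digits might be avoided by reading the id column as character in the first place, so its leading zeros survive the round trip and no sprintf() is needed. The helper name to_api_string and its quote_cols argument are hypothetical (they do not appear anywhere in the thread), and the sketch assumes the API quotes every column except the last one, as RAW.API does.

```r
## The raw API string from the thread, rebuilt line by line for readability
RAW.API <- structure(
    paste0(c('id,event_arm,name,dob,pushed_text,pushed_calc,complete',
             '"01","event_1_arm_1","John","1979-05-01","","",2',
             '"01","event_2_arm_1","John","2012-09-02","abc","123",1',
             '"01","event_3_arm_1","John","2012-09-10","","",2',
             '"02","event_1_arm_1","Mary","1951-09-10","def","456",2',
             '"02","event_2_arm_1","Mary","1978-09-12","","",2',
             ''), collapse = "\n"),
    "`Content-Type`" = structure(c("text/html", "utf-8"),
                                 .Names = c("", "charset")))

## Read 'id' as character so "01" is not turned into the integer 1;
## the remaining columns are still type-converted as usual
df <- read.table(text = RAW.API, header = TRUE, sep = ",", na.strings = "",
                 colClasses = c(id = "character"), stringsAsFactors = FALSE)

## Hypothetical helper (not from the thread): rebuild the API string,
## quoting only the columns whose indices are given in 'quote_cols'
to_api_string <- function(df, quote_cols) {
    ## write.table() only quotes character/factor columns and prints NA
    ## unquoted, so coerce the quoted columns and replace NA with ""
    df[quote_cols] <- lapply(df[quote_cols], function(x) {
        x <- as.character(x)
        x[is.na(x)] <- ""
        x
    })
    header <- paste(names(df), collapse = ",")
    body <- capture.output(write.table(df, sep = ",", quote = quote_cols,
                                       col.names = FALSE, row.names = FALSE))
    paste0(c(header, body, ""), collapse = "\n")
}

## Assumption: columns 1 to 6 are quoted by the API, 'complete' is not
API.back <- to_api_string(df, quote_cols = 1:6)

## Copy the attributes over from the original rather than retyping them
attributes(API.back) <- attributes(RAW.API)
identical(API.back, RAW.API)
```

Passing the column indices explicitly keeps the quoting decision in one place; if the API quoted a different set of columns, only quote_cols would need to change. Whether that is preferable to the sprintf() approach depends on whether the ids always arrive as quoted strings.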