Dear Jim,

Thank you for your response. I appreciate your effort!

It is close, I must admit that. What I am looking for is an object that is identical to 'RAW.API', or at least identical in structure (I guess I do not need the

, "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("", "charset"))

part). When I inspect 'x.out' it also has the NAs. I have tried to fix it, but I had to give up. It is strange, because getting there seems so easy (warning: false logic!).

Here is what I got on my long, alternative route, in the hope that someone on the list might be able to help:

RAW.API <- structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n", "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("","charset")))

# I used an alternative way of converting it to a dataset to keep the leading 0 in the id variables
x <- read.table(file = textConnection(RAW.API), header = TRUE, sep = ",", na.strings = "", stringsAsFactors = FALSE, colClasses = "character")
x

# now put it back into the same string; write.csv does quote alphanumerics
write.csv(x, textConnection('output', 'w'), row.names = FALSE)
unlockBinding("output", env = .GlobalEnv)

# fixes the problem with the header
output[1] <- gsub("\\\"", "", output[1])
# removes NAs
output <- gsub("NA", "\"\"", output)
# removes the " at the beginning of each line
output <- gsub("^\\\"", "", output)
# removes the " at the end of each line
output <- gsub("\\\"$", "", output)
# same as before
x.out <- paste(output, collapse = '\n\"')
# adds a line break at the end
x.out <- gsub("$", "\n", x.out)
# so much manual gsub ...

Any help would be very much appreciated.
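For illustration, here is one possible way to rebuild the string by pasting the fields together directly instead of post-processing write.csv output. It is only a sketch under assumptions read off the sample data: the header line is unquoted, every column except 'complete' is wrapped in double quotes, missing values are written as "", and no field contains an embedded double quote. The function name to_api_string and its 'unquoted' argument are made up for this example, and raw_api holds only the text of RAW.API, without the Content-Type attribute.

```r
# Text of the API response (attributes left out for this sketch).
raw_api <- paste0(
  'id,event_arm,name,dob,pushed_text,pushed_calc,complete\n',
  '"01","event_1_arm_1","John","1979-05-01","","",2\n',
  '"01","event_2_arm_1","John","2012-09-02","abc","123",1\n',
  '"01","event_3_arm_1","John","2012-09-10","","",2\n',
  '"02","event_1_arm_1","Mary","1951-09-10","def","456",2\n',
  '"02","event_2_arm_1","Mary","1978-09-12","","",2\n'
)

# Read everything as character so the leading 0 in id survives.
x <- read.csv(text = raw_api, colClasses = "character", na.strings = "")

# Hypothetical helper: format a data frame back into the API layout.
# Assumes no field contains a double quote (nothing is escaped here).
to_api_string <- function(df, unquoted = "complete") {
  df[] <- lapply(df, as.character)   # work on character columns only
  df[is.na(df)] <- ""                # missing values become empty fields
  quoted <- setdiff(names(df), unquoted)
  df[quoted] <- lapply(df[quoted], function(col) paste0('"', col, '"'))
  header <- paste(names(df), collapse = ",")
  rows <- do.call(paste, c(unname(as.list(df)), sep = ","))
  # one line per record, each terminated by a newline
  paste0(paste(c(header, rows), collapse = "\n"), "\n")
}

api_back <- to_api_string(x)
identical(api_back, raw_api)
```

Because identical() also compares attributes, a comparison against the full RAW.API object would additionally need the attributes copied over, for example with attributes(api_back) <- attributes(RAW.API).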

On Wed, Sep 12, 2012 at 5:54 PM, jim holtman <jholt...@gmail.com> wrote:
> This is close, but it does quote the header names, but does produce
> the same dataframe when read back in:
>
>> RAW.API <-
>> structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
>> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("",
>> "charset")))
>> x <- read.csv(textConnection(RAW.API), as.is = TRUE)
>> x
>   id     event_arm name        dob pushed_text pushed_calc complete
> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
> 5  2 event_1_arm_1 Mary 1978-09-12                      NA        2
>>
>> # now put it back into the same string; write.csv does quote alphanumerics
>> write.csv(x, textConnection('output', 'w'), row.names = FALSE)
>> x.out <- paste(output, collapse = '\n')
>> # read it back in to show it is the same
>> x.in <- read.csv(textConnection(x.out), as.is = TRUE)
>> x.in
>   id     event_arm name        dob pushed_text pushed_calc complete
> 1  1 event_1_arm_1 John 1979-05-01                      NA        2
> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
> 3  1 event_3_arm_1 John 2012-09-10                      NA        2
> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
> 5  2 event_2_arm_1 Mary 1978-09-12                      NA        2
>>
>
>
> On Wed, Sep 12, 2012 at 8:21 PM, Eric Fail <eric.f...@gmx.us> wrote:
>> Dear R experts,
>>
>> I'm reading data from an online database via API and it gets delivered in
>> this messy comma separated structure,
>>
>>> RAW.API <-
>>> structure("id,event_arm,name,dob,pushed_text,pushed_calc,complete\n\"01\",\"event_1_arm_1\",\"John\",\"1979-05-01\",\"\",\"\",2\n\"01\",\"event_2_arm_1\",\"John\",\"2012-09-02\",\"abc\",\"123\",1\n\"01\",\"event_3_arm_1\",\"John\",\"2012-09-10\",\"\",\"\",2\n\"02\",\"event_1_arm_1\",\"Mary\",\"1951-09-10\",\"def\",\"456\",2\n\"02\",\"event_2_arm_1\",\"Mary\",\"1978-09-12\",\"\",\"\",2\n",
>>> "`Content-Type`" = structure(c("text/html", "utf-8"), .Names = c("",
>>> "charset")))
>>
>> I have this script that nicely parses it into a data frame,
>>
>>> (df <- read.table(file = textConnection(RAW.API), header = TRUE,
>> sep = ",", na.strings = "", stringsAsFactors = FALSE))
>>>   id     event_arm name        dob pushed_text pushed_calc complete
>>> 1  1 event_1_arm_1 John 1979-05-01        <NA>          NA        2
>>> 2  1 event_2_arm_1 John 2012-09-02         abc         123        1
>>> 3  1 event_3_arm_1 John 2012-09-10        <NA>          NA        2
>>> 4  2 event_1_arm_1 Mary 1951-09-10         def         456        2
>>> 5  2 event_2_arm_1 Mary 1978-09-12        <NA>          NA        2
>>
>> I then do some calculations and write them to pushed_text and pushed_calc
>> whereafter I need to format the data back to the messy comma separated
>> structure it came in.
>>
>> I imagine something like this,
>>
>>> API.back <- `some magic command`(df, ...)
>>
>>> identical(RAW.API, API.back)
>>> [1] TRUE
>>
>> Some command that can format my data from the data frame I made, df, back to
>> the structure that the raw API-object came in, RAW.API.
>>
>> Any help would be appreciated.
>>
>> Thanks for reading.
>>
>> Eric
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
>
>
> --
> Jim Holtman
> Data Munger Guru
>
> What is the problem that you are trying to solve?
> Tell me what you want to do, not how you want to do it.