(Note: This follows an earlier reply mistakenly sent only to Duncan.) Multiple "amens!" to Duncan's comments...
However: Here is a start at my interpretation of how to do what you want. Note first that your "example" header listed 4 fields, but the data line showed only 3. I modified your example to use 3 text fields, only one of which, I assume, has brackets ([...]) in it.

Here is a little example of how to use regexes to replace the commas within the brackets by "-", which would presumably then allow you to easily convert the text into a data frame, e.g. using textConnection() and read.csv(). Obviously, if this is not what you meant, read no further.

## Example
txt <- c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ",
         "Jan, DBA, R101",
         "Mary, [Stats, Designer, R], t14")

## which records need to be modified?
wh <- grep("\\[.+\\]", txt)

## bracketed expressions, changing "," to "-"
fixup <- gsub(" *, *", "-", sub(".+(\\[.+\\]).+", "\\1", txt[wh]))

## Unfortunately, the "replacement" argument in sub() is not vectorized, so we need a loop:
## replace original bracketed text with fixed up bracketed text
for (i in seq_along(wh)) txt[wh[i]] <- sub("\\[.+\\]", fixup[i], txt[wh[i]])

> txt
[1] "Sam, [HadoopAnalyst-DBA-Developer], R46443 "
[2] "Jan, DBA, R101"
[3] "Mary, [Stats-Designer-R], t14"

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip)

On Sun, Apr 7, 2019 at 9:00 AM Duncan Murdoch <murdoch.dun...@gmail.com> wrote:

> On 06/04/2019 10:03 a.m., Amit Govil wrote:
> > Hi,
> >
> > I have a bunch of csv files to read in R. I'm unable to read them correctly
> > because in some of the files, there is a column ("Role") which has commas in
> > the values.
> >
> > Sample data:
> >
> > User, Role, Rule, GAPId
> > Sam, [HadoopAnalyst, DBA, Developer], R46443
> >
> > I'm trying to play with the below code but it doesn't work:
>
> Since you didn't give a reproducible example, you should at least say
> what "doesn't work" means.
>
> But here's some general advice: if you want to debug code, don't write
> huge expressions like the chain of functions below, put things in
> temporary variables and make sure you get what you were expecting at
> each stage.
>
> Instead of
>
> > files <- list.files(pattern='.*REDUNDANT(.*).csv$')
> >
> > tbl <- sapply(files, function(f) {
> >   gsub('\\[|\\]', '"', readLines(f)) %>%
> >     read.csv(text = ., check.names = FALSE)
> > }) %>%
> >   bind_rows(.id = "id") %>%
> >   select(id, User, Rule) %>%
> >   distinct()
>
> try
>
> files <- list.files(pattern='.*REDUNDANT(.*).csv$')
>
> tmp1 <- sapply(files, function(f) {
>   gsub('\\[|\\]', '"', readLines(f)) %>%
>     read.csv(text = ., check.names = FALSE)
> })
>
> tmp2 <- tmp1 %>% bind_rows(.id = "id")
>
> tmp3 <- tmp2 %>% select(id, User, Rule)
>
> tbl <- tmp3 %>% distinct()
>
> (You don't need pipes here, but it will make it easier to put the giant
> expression back together at the end.)
>
> Then look at tmp1, tmp2, tmp3 as well as tbl to see where things went
> wrong.
>
> Duncan Murdoch
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
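
P.S. As a possible follow-up to the regex approach at the top of this message, here is a minimal sketch of the last step (getting the fixed-up text into a data frame). It assumes at most one bracketed group per line. The helper name fix_brackets() is made up for illustration, and the three column names are a guess based on the sample header, which listed a fourth field (GAPId) that the sample row did not contain. It uses regmatches<-() instead of a loop, which is a different technique from the sub() loop above, not a correction of it.

```r
## Hypothetical helper (not from the thread): replace the commas inside the
## bracketed part of each line with "-"; lines without brackets are unchanged.
## Assumes at most one [...] group per line.
fix_brackets <- function(lines) {
  m <- regexpr("\\[.+\\]", lines)   # match position per line, -1 if no brackets
  ## regmatches() returns only the matched substrings, and the replacement
  ## form writes one value back per matched line.
  regmatches(lines, m) <- gsub(" *, *", "-", regmatches(lines, m))
  lines
}

txt <- c("Sam, [HadoopAnalyst, DBA, Developer], R46443 ",
         "Jan, DBA, R101",
         "Mary, [Stats, Designer, R], t14")

fixed <- fix_brackets(txt)

## Column names are an assumption; adjust to the real files.
df <- read.csv(text = fixed, header = FALSE, strip.white = TRUE,
               stringsAsFactors = FALSE,
               col.names = c("User", "Role", "Rule"))
print(df)
```

If the individual roles are needed later, something like strsplit(gsub("\\[|\\]", "", df$Role), "-", fixed = TRUE) should recover them, provided no role name itself contains a "-".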