try this:

> f.extract <- function(formula)
+ {
+     # pattern to match the initial chemical
+     # assumes chemical starts with an upper case and optional lower case followed
+     # by zero or more digits.
+     first <- "^([[:upper:]][[:lower:]]?)([0-9]*).*"
+     # inverse of above to remove the initial chemical
+     last <- "^[[:upper:]][[:lower:]]?[0-9]*(.*)"
+     result <- list()
+     extract <- formula
+     # repeat as long as there is data
+     while ((start <- nchar(extract)) > 0){
+         chem <- sub(first, '\\1 \\2', extract)
+         extract <- sub(last, '\\1', extract)
+         # if the number of characters is the same, then there was an error
+         if (nchar(extract) == start){
+             warning("Invalid formula:", formula)
+             return(NULL)
+         }
+         # append to the list
+         result[[length(result) + 1L]] <- strsplit(chem, ' ')[[1]]
+     }
+     result
+ }
> f.extract("C5H11BrO")
[[1]]
[1] "C" "5"

[[2]]
[1] "H"  "11"

[[3]]
[1] "Br"

[[4]]
[1] "O"

> f.extract("H2O")
[[1]]
[1] "H" "2"

[[2]]
[1] "O"

> f.extract("CCC")
[[1]]
[1] "C"

[[2]]
[1] "C"

[[3]]
[1] "C"

> f.extract("Crr")  # bad
NULL
Warning message:
In f.extract("Crr") : Invalid formula:Crr
>

On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <han...@depauw.edu> wrote:
> Hello R Folks...
>
> I've been looking around the 'net and I see many complex solutions in
> various languages to this question, but I have a pretty simple need (and I'm
> not much good at regex). I want to use a chemical formula as a function
> argument. The formula would be in "Hill order" which is to list C, then H,
> then all other elements in alphabetical order. My example will have only a
> limited number of elements, few enough that one can search directly for each
> element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that
> for oxygen and bromine, O or Br, there is no following number meaning a 1 is
> implied).
>
> Let's say
>
>> form <- "C5H11BrO"
>
> I'd like to get the count of each element, so in this case I need to extract
> C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular
> weight by multiplying). Sounds pretty simple, but my experiments with grep
> and strsplit don't immediately clue me into an obvious solution. As I said,
> I don't need a general solution to the problem of calculating molecular
> weight from an arbitrary formula, that seems quite challenging, just a way
> to convert "form" into a list or data frame which I can then do the math on.
>
> Here's hoping this is a simple issue for more experienced R users! TIA,
> Bryan
> ***********
> Bryan Hanson
> Professor of Chemistry & Biochemistry
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
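For the molecular-weight step described in the question, here is a minimal sketch. It is not from the thread: formula_counts(), mol_weight() and the atomic_weight table are hypothetical names, the table lists only four elements as an illustration, and the parsing uses regmatches()/gregexpr() in one pass instead of the loop in f.extract(). An element with no following number is given a count of 1.

```r
# Hypothetical helper: split a formula into a data frame of elements and counts.
formula_counts <- function(formula) {
  # one token per element: upper-case letter, optional lower-case letter,
  # then zero or more digits
  pattern <- "[[:upper:]][[:lower:]]?[[:digit:]]*"
  tokens <- regmatches(formula, gregexpr(pattern, formula))[[1]]
  # if the tokens do not account for every character, the formula is invalid
  if (sum(nchar(tokens)) != nchar(formula)) {
    warning("Invalid formula: ", formula)
    return(NULL)
  }
  element <- sub("[[:digit:]]*$", "", tokens)   # drop the trailing digits
  count <- sub("^[[:alpha:]]+", "", tokens)     # drop the leading letters
  count[count == ""] <- "1"                     # missing number implies 1
  data.frame(element = element, count = as.integer(count),
             stringsAsFactors = FALSE)
}

# Illustrative lookup table only (standard atomic weights, rounded);
# extend it with whatever elements are needed.
atomic_weight <- c(C = 12.011, H = 1.008, O = 15.999, Br = 79.904)

# Hypothetical helper: sum count * atomic weight over the parsed elements.
# Elements missing from the table yield NA.
mol_weight <- function(formula) {
  parts <- formula_counts(formula)
  if (is.null(parts)) return(NA_real_)
  sum(parts$count * atomic_weight[parts$element])
}

formula_counts("C5H11BrO")
mol_weight("C5H11BrO")
```

Returning a data frame keeps the counts as integers, so the multiplication is a single vectorised step, and an invalid formula such as "Crr" still gives a warning and NULL, as in f.extract().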
--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?