Re: [R] Parsing a Simple Chemical Formula

Spencer Graves Sun, 26 Dec 2010 20:26:02 -0800

Mike Marchywka's post mentioned a CRAN package, "rpubchem", missed by my search for "chemical formula". A further search for "chemical" and "chemistry" still missed it. "compound" found it. Adding "compounds" and combining them with "union" produced a list of 564 links in 219 packages; 7 of the help pages were for "rpubchem". The package with the most matches is "seacarb" (seawater carbonate chemistry with R: 21 matches), followed by "CHNOSZ", previously mentioned (19 matches). "rpubchem" is the 22nd package on this list (5 matches, with a max score of 32, less than the max score of 2 other packages with 5 matches).


      Spencer


On 12/26/2010 7:36 PM, Bryan Hanson wrote:

Hi David & others...
I did find the function you recommended, plus, it's even easier (but a little hidden in the doc): > element(form, "mass"). But, this uses the atomic masses from the periodic table, which are weighted averages of the isotopes of each element. What I'm doing actually involves mass spectrometry, so I need the isotope masses, which are integers (think 12C, 13C, 14C, but the periodic table says 12.011 reflecting the relative abundances). I used Gabor's solution and got my little function humming. Plus, I have several things to read through from the various recommendations.
Thanks again, Bryan
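
For the mass-spectrometry case described above, a minimal sketch, assuming the element counts have already been extracted and that integer mass numbers are what is wanted. The `nominal` lookup vector and the `nominal_mass` helper are hypothetical names for illustration; they are not part of CHNOSZ or any other package mentioned in this thread.

```r
# Hypothetical lookup of integer mass numbers for the isotopes of interest.
nominal <- c(C = 12, H = 1, O = 16, Br = 79)

# counts: named numeric vector of element counts, e.g. c(C = 5, H = 11).
# masses: named numeric vector of per-atom masses (defaults to `nominal`).
nominal_mass <- function(counts, masses = nominal) {
  unknown <- setdiff(names(counts), names(masses))
  if (length(unknown) > 0) {
    stop("no mass available for: ", paste(unknown, collapse = ", "))
  }
  # Look up each element by name, weight by its count, and add up.
  sum(counts * masses[names(counts)])
}

counts <- c(C = 5, H = 11, Br = 1, O = 1)                 # C5H11BrO
m79 <- nominal_mass(counts)                               # all Br taken as 79Br
m81 <- nominal_mass(counts, replace(nominal, "Br", 81))   # all Br taken as 81Br
```

Passing the mass table as an argument keeps the isotope choice explicit, so the same counts can be evaluated for each isotopologue of interest.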

On Dec 26, 2010, at 10:21 PM, David Winsemius wrote:
On Dec 26, 2010, at 8:28 PM, Bryan Hanson wrote:
Thanks Spencer, I'll definitely have a look at this package and its vignettes. I believe I have looked at it before, but didn't catch it on this particular search. Bryan
Using the thermo list that the makeup function accesses to get its valid atomic symbols, one can arrive at the answer you posited would be too difficult in your first posting, the atomic weight from the formulae:
> str(thermo$element)
'data.frame':    130 obs. of  6 variables:
$ element: chr  "Z" "O" "H" "He" ...
$ state  : chr  "aq" "gas" "gas" "gas" ...
$ source : chr  "CWM89" "CWM89" "CWM89" "CWM89" ...
$ mass   : num  0 16 1.01 4 20.18 ...
$ s      : num  -15.6 49 31.2 30.2 35 ...
$ n      : int  1 2 2 1 1 1 1 1 2 2 ...

patts <- paste("^", rownames(makeup(form)), "$", sep="")
makuform<- makeup(form)
makuform$amass <- sapply(patts, function(x) {return( thermo$element[grep(x, thermo$element[[1]])[1], "mass"])} )
sum(makuform$amass *makuform$count)
# [1] 167.0457
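
The per-symbol grep lookup above can also be written as an exact match, which avoids building anchored patterns. This is only a sketch under stated assumptions: `elem` is a small hypothetical stand-in for `thermo$element` (with rounded standard atomic weights), and `mk` is built by hand to have the same shape as the `makeup()` output shown in this thread, so the example runs without CHNOSZ installed.

```r
# Hypothetical stand-in for thermo$element, with rounded atomic weights.
elem <- data.frame(element = c("C", "H", "Br", "O"),
                   mass    = c(12.011, 1.00794, 79.904, 15.9994),
                   stringsAsFactors = FALSE)

# Hand-built table shaped like the makeup() result (symbols as row names).
mk <- data.frame(count = c(5, 11, 1, 1),
                 row.names = c("C", "H", "Br", "O"))

# match() gives the row of the first exact hit for each symbol, NA if absent.
idx <- match(rownames(mk), elem$element)
stopifnot(!anyNA(idx))
mk$amass <- elem$mass[idx]

# Molecular weight: per-atom mass times count, summed over elements.
mw <- sum(mk$amass * mk$count)
round(mw, 4)
```

Because match() compares whole strings, "C" cannot accidentally pick up "Ca" or "Cl", which is what the "^" and "$" anchors guard against in the grep version.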
On Dec 26, 2010, at 8:16 PM, Spencer Graves wrote:
p.s. help(pac=CHNOSZ) reveals that this package has 3 vignettes. I have not looked at these vignettes, but most vignettes provide excellent introductions (though rarely with complete coverage) of important capabilities of the package. (The 'sos' package includes a vignette, which exposes more capabilities than the example below.)
######################
   Have you considered the 'CHNOSZ' package?
makeup("C5H11BrO")
   count
C      5
H     11
Br     1
O      1


   I found this using the 'sos' package as follows:


library(sos)
cf <- ???'chemical formula'
found 21 matches;  retrieving 2 pages
cf
The print method for "cf" opened the results in a web browser, which showed that the "CHNOSZ" package had 14 of these 21 matches, and the other 7 were in 7 different packages. Moreover, the "CHNOSZ" package is devoted to "Chemical Thermodynamics and Activity Diagrams" and provides many more capabilities that might interest you.
   Hope this helps.
   Spencer


On 12/26/2010 5:01 PM, Bryan Hanson wrote:
Well let me just say thanks and WOW! Four great ideas, each worthy of study and I'll learn several things from each. Interestingly, these solutions seem more general and more compact than the solutions I found on the 'net using python and perl. More evidence for the power of R! A big thanks to each of you! Bryan

On Dec 26, 2010, at 7:26 PM, Gabor Grothendieck wrote:
On Sun, Dec 26, 2010 at 6:29 PM, Bryan Hanson <han...@depauw.edu> wrote:
Hello R Folks...
I've been looking around the 'net and I see many complex solutions in various languages to this question, but I have a pretty simple need (and I'm not much good at regex). I want to use a chemical formula as a function argument. The formula would be in "Hill order" which is to list C, then H, then all other elements in alphabetical order. My example will have only a limited number of elements, few enough that one can search directly for each element. So some examples would be C5H12, or C5H12O or C5H11BrO (note that for oxygen and bromine, O or Br, there is no following number meaning a 1 is implied).

Let's say
form <- "C5H11BrO"
I'd like to get the count of each element, so in this case I need to extract C and 5, H and 11, Br and 1, O and 1 (I want to calculate the molecular weight by multiplying). Sounds pretty simple, but my experiments with grep and strsplit don't immediately clue me into an obvious solution. As I said, I don't need a general solution to the problem of calculating molecular weight from an arbitrary formula, that seems quite challenging, just a way to convert "form" into a list or data frame which I can then do the math on.

Here's hoping this is a simple issue for more experienced R users!
TIA,
This can be done by strapply in gsubfn. It matches the regular expression to the target string passing the back references (the parenthesized portions of the regular expression) through a specified function as successive arguments.
Thus the first arg is form, your input string. The second arg is the regular expression which matches an upper case letter optionally followed by lower case letters and all that is optionally followed by digits. The third arg is a function shown in a formula representation. strapply passes the back references (i.e. the portions within parentheses) to the function as the two arguments. Finally simplify is another function in formula notation which turns the result into a matrix and then a data frame. Finally we make the second column of the data frame numeric.

library(gsubfn)

DF <- strapply(form,
    "([A-Z][a-z]*)(\\d*)",
    ~ c(..1, if (nchar(..2)) ..2 else 1),
    simplify = ~ as.data.frame(t(matrix(..1, 2)), stringsAsFactors = FALSE))
DF[[2]] <- as.numeric(DF[[2]])

DF looks like this:
DF
  V1 V2
1  C  5
2  H 11
3 Br  1
4  O  1
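
For comparison, the same regular-expression idea can be sketched in base R without gsubfn. This is not the method posted above, only an assumed equivalent for simple formulas; `parse_formula` is a hypothetical helper name, and it does not handle parentheses, charges, or hydrates.

```r
parse_formula <- function(form) {
  # Each token: one upper-case letter, optional lower-case letters,
  # optional trailing digits (e.g. "C5", "H11", "Br", "O").
  tokens <- regmatches(form, gregexpr("[A-Z][a-z]*[0-9]*", form))[[1]]

  # Split every token into its symbol and its digit string.
  element <- sub("[0-9]*$", "", tokens)
  digits  <- sub("^[A-Za-z]+", "", tokens)

  # A missing count means an implied 1.
  digits[digits == ""] <- "1"

  data.frame(element = element,
             count   = as.numeric(digits),
             stringsAsFactors = FALSE)
}

DF2 <- parse_formula("C5H11BrO")
DF2
```

The result has one row per element symbol in the order it appears in the formula, so it can be joined to a table of masses in the same way as DF.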



--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
David Winsemius, MD
West Hartford, CT

