Re: [R] Pattern match

neetika nath Thu, 21 Apr 2011 02:28:32 -0700

Thank you Dennis,

Yes, the problem is the input file. I have an .rdf file in the same format
I posted earlier. When I open that file in Notepad++, the lines are broken
with CR+LF characters. Is there any way to retrieve the SpeciesScientific
information without changing the input file?
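
One possible way to do this entirely in R is sketched below. It rests on an assumption drawn from the posted sample, not from any file specification: a trailing '+' marks a line that continues on the next line. The temporary file and the names sample_lines, rdf_path, txt and species are hypothetical stand-ins; a real script would point readLines() at the actual .rdf file.

```r
# Hypothetical sample mirroring the posted lines; a real script would use
# the path to the .rdf file instead of a temporary file.
sample_lines <- c(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+",
  "H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+",
  "255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+",
  "eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+"
)
rdf_path <- tempfile(fileext = ".rdf")
con <- file(rdf_path, "wb")                  # binary mode: write CR+LF verbatim
writeLines(sample_lines, con, sep = "\r\n")
close(con)

# Read the file as-is; the file on disk is never modified.
lines <- readLines(rdf_path, warn = FALSE)
lines <- sub("\r$", "", lines)   # defensive: drop a trailing CR if one survived

# Assumption: a trailing '+' is a continuation marker, so remove it and
# glue the physical lines back into one string.
lines <- sub("\\+$", "", lines)
txt <- paste(lines, collapse = "")

# Pull out every SpeciesScientific=(...) entry; [^)]* stops at the first ')'.
m <- regmatches(txt, gregexpr("SpeciesScientific=\\([^)]*\\)", txt))[[1]]
species <- sub("^SpeciesScientific=\\((.*)\\)$", "\\1", m)
species
```

Because the lines are joined before matching, the entry that is split as "Sp+" / "eciesScientific" is found as well. In this sample the '+' always appears as the 81st character of a line, so if real values can legitimately end in '+', testing nchar(lines) == 81 before stripping may be the safer rule.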


Thank you

On Wed, Apr 20, 2011 at 9:49 PM, Dennis Murphy <djmu...@gmail.com> wrote:

> Hi:
>
> This is a bit of a roundabout approach; I'm sure that folks with regex
> expertise will trump this in a heartbeat. I modified the last piece of
> the string a bit to accommodate the approach below. Depending on where
> the strings have line breaks, you may have some odd '\n' characters
> inserted.
>
> # Step 1: read the input as a single character string
> u <- "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond=(255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);SpeciesScientific=(Achromobacter
> cycloclastes);SpeciesCommon=(Bacteria);Reactive=(Ce+)"
>
> # Step 2: Split the input by the ';' delimiter and then use lapply()
> # to split variable names from values.
> # This results in a nested list for ulist2.
> ulist <- strsplit(u, ';')
> ulist2 <- lapply(ulist, function(s) strsplit(s, '='))
>
> # Step 3: Break out the results into a matrix whose first column is
> # the variable name and whose second column is the value (with parens
> # included). This avoids dealing with nested lists.
> v <- matrix(unlist(ulist2), ncol = 2, byrow = TRUE)
>
> # Step 4: Strip off the parens
> w <- apply(v, 2, function(s) gsub('([\\(\\)])', '', s))
> colnames(w) <- c('Name', 'Value')
> w
>      Name                 Value
>  [1,] "SpeciesCommon"      "Human"
>  [2,] "SpeciesScientific"  "Homo sapiens"
>  [3,] "ReactiveCentres"    "N,C,C,C,+H,O,C,C,C,C,O,H"
>  [4,] "BondInvolved"       "C-H"
>  [5,] "EzCatDBID"          "S00343"
>  [6,] "BondFormed"         "O-H,O-H"
>  [7,] "Bond"               "255B"
>  [8,] "Cofactors"          "CuII,CU,501,A,CuII,CU,502,A"
>  [9,] "CatalyticSwissProt" "P25006"
> [10,] "SpeciesScientific"  "Achromobacter\ncycloclastes"
> [11,] "SpeciesCommon"      "Bacteria"
> [12,] "Reactive"           "Ce+"
>
> # Step 5: Subset out the values of the SpeciesScientific variables
> subset(as.data.frame(w), Name == 'SpeciesScientific', select = 'Value')
>                         Value
> 2                 Homo sapiens
> 10 Achromobacter\ncycloclastes
>
>
> One possible 'advantage' of this approach is that if you have a number
> of string records of this type, you can create nested lists for each
> string and then manipulate the lists to get what you need. Hopefully
> you can use some of these ideas for other purposes as well.
>
> Dennis
>
>
>
> On Wed, Apr 20, 2011 at 10:17 AM, Neeti <nikkiha...@gmail.com> wrote:
> > Hi ALL,
> >
> > I have a very simple question regarding pattern matching. Could anyone tell
> > me how I can use R to retrieve a string pattern from a text file? For
> > example, my file contains the following information:
> >
> > SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);ReactiveCentres=(N,C,C,C,+
> > H,O,C,C,C,C,O,H);BondInvolved=(C-H);EzCatDBID=(S00343);BondFormed=(O-H,O-H);Bond+
> > 255B);Cofactors=(Cu(II),CU,501,A,Cu(II),CU,502,A);CatalyticSwissProt=(P25006);Sp+
> > eciesScientific=(Achromobacter cycloclastes);SpeciesCommon=(Bacteria);ReactiveCe+
> >
> > and I want to extract the SpeciesScientific=(?) information from this file.
> > The problem is in the 3rd line, where the word SpeciesScientific is split
> > by a '+'.
> >
> > Could anyone help me please?
> > Thank you
> >
> >
> > --
> > View this message in context: http://r.789695.n4.nabble.com/Pattern-match-tp3463625p3463625.html
> > Sent from the R help mailing list archive at Nabble.com.
> >
> > ______________________________________________
> > R-help@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> >
>
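
Dennis's remark about handling a number of string records of this type can be sketched by wrapping the split steps in a function and applying it with lapply(). This is only an illustration under stated assumptions: parse_record() and the records vector are hypothetical names, not part of the original code, and unlike Step 4 the helper removes only the enclosing parentheses, so values such as Cu(II) keep theirs.

```r
# Hypothetical helper: turn one 'Name=(Value);Name=(Value)' record into a
# two-column data frame.
parse_record <- function(rec) {
  # Replace embedded line breaks with a space before splitting.
  rec <- gsub("[\r\n]+", " ", rec)
  fields <- strsplit(rec, ";", fixed = TRUE)[[1]]
  fields <- fields[nzchar(fields)]
  # Split each field at its first '=' only.
  name  <- sub("=.*$", "", fields)
  value <- sub("^[^=]*=", "", fields)
  # Remove only the enclosing parentheses; inner ones are kept.
  value <- sub("^\\((.*)\\)$", "\\1", value)
  data.frame(Name = name, Value = value, stringsAsFactors = FALSE)
}

# Hypothetical records, shortened from the example in this thread.
records <- c(
  "SpeciesCommon=(Human);SpeciesScientific=(Homo sapiens);Cofactors=(Cu(II),CU,501,A)",
  "SpeciesScientific=(Achromobacter\ncycloclastes);SpeciesCommon=(Bacteria)"
)

# One data frame per record, then the SpeciesScientific value from each.
parsed <- lapply(records, parse_record)
species <- unlist(lapply(parsed, function(d) {
  d$Value[d$Name == "SpeciesScientific"]
}))
species
```

Keeping one data frame per record leaves the other fields available for later use; do.call(rbind, parsed) would stack them into a single table if that is more convenient.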

