Re: [R] How to remove some rows from a data.frame

affy snp Mon, 24 Dec 2007 08:42:47 -0800

Hi Henrique,

I read in the matrix as x
and replicated your code:


f <- function(x)
{
cbind.data.frame(chr=unique(x$chr),
                Start=min(x$pos),
                End=max(x$pos),
                Rows=nrow(x),
                Pattern=paste("(", x$s1, x$s2, ")")
                )
}

do.call("rbind", lapply(lapply(split(df, paste(df$s1, df$s2)), f), unique))

Am I doing this right?

Allen
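
A possible explanation for the "first argument must be a vector" error
quoted below, offered as a guess rather than a confirmed diagnosis: if no
data frame named df exists in the session, the name df resolves to the
F-distribution density function stats::df, and split() cannot split a
function. Since the data were read in as x, the calls would need to use x.
The sketch below also groups by runs of consecutive rows within the same
chr, which splitting on the pattern alone does not do. The helper name
summarise_run and the variables key, runs and run_id are made up for
illustration; the data values are copied from the thread.

```r
# Example data from the thread, stored under the name x.
x <- data.frame(
  chr = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L),
  pos = c(77465510L, 78696291L, 79681704L, 80950808L, 82255496L,
          228801510L, 230957584L, 237932418L, 65724492L, 65879898L,
          67718674L, 68318411L, 68454651L, 68567494L),
  s1  = c(-1L, -1L, -1L, -1L, -1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
  s2  = c(0L, 0L, 0L, 0L, 0L, -1L, -1L, -1L, 1L, 1L, 0L, 0L, 0L, 0L)
)

# One key per row combining chr and the (s1 s2) pattern.
# A new run starts whenever the key differs from the previous row.
key    <- paste(x$chr, x$s1, x$s2)
runs   <- rle(key)
run_id <- rep(seq_along(runs$lengths), runs$lengths)

# Summarise one run of consecutive rows (hypothetical helper).
summarise_run <- function(d) {
  data.frame(chr     = d$chr[1],
             Start   = d$pos[1],          # pos of the first row in the run
             End     = d$pos[nrow(d)],    # pos of the last row in the run
             Rows    = nrow(d),
             Pattern = paste0("(", d$s1[1], " ", d$s2[1], ")"))
}

result <- do.call(rbind, lapply(split(x, run_id), summarise_run))
print(result)
```

If x was read in as a matrix rather than a data frame, x$chr will not
work; converting first with as.data.frame(x) is one way around that.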


On Dec 24, 2007 11:32 AM, Henrique Dallazuanna <[EMAIL PROTECTED]> wrote:

> My object df is this:
>
>  df <- structure(list(BAC = structure(c(13L, 3L, 8L, 14L, 12L, 4L, 2L,
> 5L, 7L, 9L, 11L, 10L, 6L, 1L), .Label = c("CTD-2003M22", "RP11-155C15",
> "RP11-198H14", "RP11-210E16", "RP11-210F8", "RP11-218N6", "RP11-263L17",
> "RP11-267M21", "RP11-340F16", "RP11-474G23", "RP11-68A1", "RP11-6B16",
> "RP11-80G24", "RP11-89A19"), class = "factor"), chr = c(1L, 1L,
> 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L), pos = c(77465510L,
> 78696291L, 79681704L, 80950808L, 82255496L, 228801510L, 230957584L,
> 237932418L, 65724492L, 65879898L, 67718674L, 68318411L, 68454651L,
> 68567494L), s1 = c(-1L, -1L, -1L, -1L, -1L, 0L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L, 0L), s2 = c(0L, 0L, 0L, 0L, 0L, -1L, -1L, -1L, 1L,
> 1L, 0L, 0L, 0L, 0L), Count = c(1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
> 1, 1, 1, 1)), .Names = c("BAC", "chr", "pos", "s1", "s2", "Count"
> ), row.names = c(NA, -14L), class = "data.frame")
>
> works for me
>
> On 24/12/2007, affy snp <[EMAIL PROTECTED]> wrote:
> > Dear Henrique,
> >
> > Thanks a lot for the help! I got this:
> >
> > > f <- function(x)
> > + {
> > + cbind.data.frame(chr=unique(x$chr),
> > +                 Start=min(x$pos),
> > +                 End=max(x$pos),
> > +                 Rows=nrow(x),
> > +                 Pattern=paste("(", x$s1, x$s2, ")")
> > +                 )
> > + }
> > >
> > > do.call("rbind", lapply(lapply(split(df, paste(df$s1, df$s2)), f), unique))
> > Error in split.default(df, paste(df$s1, df$s2)) :
> >   first argument must be a vector
> >
> >
> > Any clue?
> >
> > Best,
> >      Allen
> >
> >
> > On Dec 24, 2007 11:27 AM, Henrique Dallazuanna < [EMAIL PROTECTED]> wrote:
> > > Try this:
> > >
> > > f <- function(x)
> > > {
> > > cbind.data.frame (chr=unique(x$chr),
> > >                 Start=min(x$pos),
> > >                 End=max(x$pos),
> > >                 Rows=nrow(x),
> > >                 Pattern=paste("(", x$s1, x$s2, ")")
> > >                 )
> > > }
> > >
> > > do.call("rbind", lapply(lapply(split(df, paste(df$s1, df$s2)), f), unique))
> > >
> > >
> > >
> > >
> > >
> > > On 24/12/2007, affy snp <[EMAIL PROTECTED] > wrote:
> > > > Thanks Moshe! I apologize for not being clear about the
> > > > second part. Again, below is what the data look like. The
> > > > patterns for columns s1 and s2, with their counts, are:
> > > >
> > > > (-1 -1)  (-1 0)  (-1 1)  (0 -1)  (0 0)  (0 1)  (1 -1)  (1 0)  (1 1)
> > > > 104      131     57      631     305    668    33      15     107
> > > >
> > > > There are 9 patterns, in other words the 9 combinations of -1, 0 and 1
> > > > given in the parentheses. The number of occurrences is underneath each.
> > > > What I wish to do is this: scan the data from the beginning, and
> > > > whenever consecutive rows have the same pattern (one of the 9
> > > > combinations above), record the following information:
> > > >
> > > > the number in the 'chr' column, the number in the 'pos' column for the
> > > > first row of the run, the number in the 'pos' column for the last row
> > > > of the run, how many rows the run contains, and the run's pattern.
> > > >
> > > > I forgot to state one requirement for the definition of consecutive
> > > > rows: they must be adjacent in order and have the same 'chr' number.
> > > >
> > > > To illustrate this, consider the following data:
> > > >
> > > > BAC                 chr    pos          s1   s2
> > > > RP11-80G24    1    77465510    0    0
> > > > RP11-198H14    1    78696291    -1    0
> > > > RP11-267M21    1    79681704    -1    0
> > > > RP11-89A19      1    80950808    -1    0
> > > > RP11-6B16        1    82255496    -1    0
> > > > RP11-210E16    2    228801510    -1   0
> > > >
> > > > Even though rows 2-6 share the same pattern (-1 0) and are
> > > > consecutive, row 6 has a different 'chr' number from the other
> > > > rows. Therefore, we do not count row 6 and end up with:
> > > > chr    Start       End         #of_rows    pattern
> > > > 1      78696291    82255496    4           (-1 0)
> > > >
> > > > Hope this is clear. Thank you once again and Merry X'mas!
> > > >
> > > > Best,
> > > >     Allen
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > > BAC                 chr    pos          s1   s2
> > > > > RP11-80G24    1    77465510    -1    0
> > > > > RP11-198H14    1    78696291    -1    0
> > > > > RP11-267M21    1    79681704    -1    0
> > > > > RP11-89A19      1    80950808    -1    0
> > > > > RP11-6B16        1    82255496    -1    0
> > > > > RP11-210E16    1    228801510    0    -1
> > > > > RP11-155C15    1    230957584    0    -1
> > > > > RP11-210F8      1    237932418    0    -1
> > > > > RP11-263L17     2    65724492    0    1
> > > > > RP11-340F16     2    65879898    0    1
> > > > > RP11-68A1        2    67718674    0    0
> > > > > RP11-474G23    2    68318411    0    0
> > > > > RP11-218N6      2    68454651    0    0
> > > > > CTD-2003M22    2    68567494    0    0
> > > > > .....
> > > > >
> > > >
> > > > On Dec 24, 2007 3:54 AM, Moshe Olshansky <[EMAIL PROTECTED]> wrote:
> > > >
> > > > > To answer your first question try
> > > > >
> > > > > M[!(M$s1 == 0 & M$s2 == 0), ]
> > > > >
> > > > > For the second question, you must start with a more
> > > > > precise definition of the grouping criterion.
> > > > >
> > > > > --- affy snp <[EMAIL PROTECTED]> wrote:
> > > > >
> > > > > > Hello list,
> > > > > >
> > > > > > I have a data frame M like:
> > > > > >
> > > > > > BAC                 chr    pos          s1   s2
> > > > > > RP11-80G24    1    77465510    -1    0
> > > > > > RP11-198H14    1    78696291    -1    0
> > > > > > RP11-267M21    1    79681704    -1    0
> > > > > > RP11-89A19      1    80950808    -1    0
> > > > > > RP11-6B16        1    82255496    -1    0
> > > > > > RP11-210E16    1    228801510    0    -1
> > > > > > RP11-155C15    1    230957584    0    -1
> > > > > > RP11-210F8      1    237932418    0    -1
> > > > > > RP11-263L17     2    65724492    0    1
> > > > > > RP11-340F16     2    65879898    0    1
> > > > > > RP11-68A1        2    67718674    0    0
> > > > > > RP11-474G23    2    68318411    0    0
> > > > > > RP11-218N6      2    68454651    0    0
> > > > > > CTD-2003M22    2    68567494    0    0
> > > > > > .....
> > > > > >
> > > > > > > How can I remove the rows that have 0 in both
> > > > > > > columns s1 and s2?
> > > > > > > Something like M[!M$s1=0&!M$s2=0]?
> > > > > >
> > > > > > > Moreover, I want to get a list of the subsets of
> > > > > > > rows that share the same pattern of data. For
> > > > > > > example, the first 8 rows in M can be clustered
> > > > > > > into 2 groups (represented below in 2 rows) and
> > > > > > > shown as:
> > > > > > >
> > > > > > > chr    Start        End          # of rows    Pattern
> > > > > > > 1      77465510     82255496     5            (-1 0)
> > > > > > > 1      228801510    237932418    3            (0 -1)
> > > > > >
> > > > > > > Can anybody help me with this? Thank you very much
> > > > > > > and happy holidays!
> > > > > >
> > > > > > Best,
> > > > > >     Allen
> > > > > >
> > > > > >       [[alternative HTML version deleted]]
> > > > > >
> > > > > > ______________________________________________
> > > > > > R-help@r-project.org mailing list
> > > > > > https://stat.ethz.ch/mailman/listinfo/r-help
> > > > > > PLEASE do read the posting guide
> > > > > > http://www.R-project.org/posting-guide.html
> > > > > > and provide commented, minimal, self-contained,
> > > > > > reproducible code.
> > > > > >
> > > > >
> > > > >
> > > >
> > > >
> > >
> > >
> > > --
> > > Henrique Dallazuanna
> > > Curitiba-Paraná-Brasil
> > > 25° 25' 40" S 49° 16' 22" O
> > >
> >
> >
>
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
>
