If DF is your data frame, then

    DF$xp.bg <- ave(DF$xp.norm, DF$gene, FUN = min)

will create a new column in which each row holds the minimum xp.norm over all rows with the same gene. ave does use split internally, but I think it is worth trying anyway since it is only one short line of code. See help(ave).

On Thu, Jan 28, 2010 at 7:05 AM, Irene Gallego Romero <ig...@cam.ac.uk> wrote:
> Dear R users,
>
> I have a dataframe (main.table) with ~30,000 rows and 6 columns, of
> which here are a few rows:
>
>       id chr window    gene    xp.norm xp.top
> 129 1_32   1     32  TAS1R1 1.28882115  FALSE
> 130 1_32   1     32  ZBTB48 1.28882115  FALSE
> 131 1_32   1     32  KLHL21 1.28882115  FALSE
> 132 1_32   1     32   PHF13 1.28882115  FALSE
> 133 1_33   1     33   PHF13 1.02727430  FALSE
> 134 1_33   1     33   THAP3 1.02727430  FALSE
> 135 1_33   1     33 DNAJC11 1.02727430  FALSE
> 136 1_33   1     33  CAMTA1 1.02727430  FALSE
> 137 1_34   1     34  CAMTA1 1.40312732   TRUE
> 138 1_35   1     35  CAMTA1 1.52104538  FALSE
> 139 1_36   1     36  CAMTA1 1.04853732  FALSE
> 140 1_37   1     37  CAMTA1 0.64794094  FALSE
> 141 1_38   1     38  CAMTA1 1.23026086   TRUE
> 142 1_38   1     38   VAMP3 1.23026086   TRUE
> 143 1_38   1     38    PER3 1.23026086   TRUE
> 144 1_39   1     39    PER3 1.18154967   TRUE
> 145 1_39   1     39    UTS2 1.18154967   TRUE
> 146 1_39   1     39 TNFRSF9 1.18154967   TRUE
> 147 1_39   1     39   PARK7 1.18154967   TRUE
> 148 1_39   1     39  ERRFI1 1.18154967   TRUE
> 149 1_40   1     40 no_gene 1.79796879  FALSE
> 150 1_41   1     41 SLC45A1 0.20193560  FALSE
>
> I want to create two new columns, xp.bg and xp.n.top, using the
> following criteria:
>
> If gene is the same in consecutive rows, xp.bg is the minimum value of
> xp.norm in those rows; if gene is not the same, xp.bg is simply the
> value of xp.norm for that row;
>
> Likewise, if there's a run of contiguous xp.top = TRUE values,
> xp.n.top is the minimum value in that range, and if xp.top is false or
> NA, xp.n.top is NA, or 0 (I don't care).
>
> So, in the above example,
> xp.bg for rows 136:141 should be 0.64794094, and is equal to xp.norm
> for all other rows,
> xp.n.top for row 137 is 1.40312732, 1.18154967 for rows 141:148, and
> 0/NA for all other rows.
>
> Is there a way to combine indexing and if statements or some such to
> accomplish this? I want to do this without using split(main.table,
> main.table$gene), because there are about 20,000 unique entries for
> gene, and one of the entries, no_gene, is repeated throughout. I
> thought briefly of subsetting the rows where xp.top is TRUE, but I
> then don't know how to set the range for min, so that it only looks at
> what would originally have been consecutive rows, and searching the
> help has not proved particularly useful.
>
> Thanks in advance,
> Irene Gallego Romero
>
>
> --
> Irene Gallego Romero
> Leverhulme Centre for Human Evolutionary Studies
> University of Cambridge
> Fitzwilliam St
> Cambridge
> CB1 3QH
> UK
> email: ig...@cam.ac.uk
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
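ave(DF$xp.norm, DF$gene, FUN = min) groups every row sharing a gene, consecutive or not, so the no_gene rows scattered through the full table would all receive one common minimum. The question asks for minima over consecutive runs. A minimal sketch of that variant follows, assuming DF holds only the gene, xp.norm and xp.top columns of the sample rows 129:150. It labels each run of identical consecutive values with rle() and then passes those labels to ave(). The helper run_id() is a hypothetical name introduced here. The same run labels applied to xp.top give xp.n.top.

```r
# Minimal sketch, assuming DF holds only the gene, xp.norm and xp.top
# columns of the sample rows 129:150 posted in the question.
DF <- data.frame(
  gene = c("TAS1R1", "ZBTB48", "KLHL21", "PHF13", "PHF13", "THAP3",
           "DNAJC11", rep("CAMTA1", 6), "VAMP3", "PER3", "PER3", "UTS2",
           "TNFRSF9", "PARK7", "ERRFI1", "no_gene", "SLC45A1"),
  xp.norm = c(rep(1.28882115, 4), rep(1.02727430, 4), 1.40312732,
              1.52104538, 1.04853732, 0.64794094, rep(1.23026086, 3),
              rep(1.18154967, 5), 1.79796879, 0.20193560),
  xp.top = c(rep(FALSE, 8), TRUE, rep(FALSE, 3), rep(TRUE, 8),
             FALSE, FALSE),
  row.names = 129:150,
  stringsAsFactors = FALSE
)

# Hypothetical helper: label runs of identical consecutive values,
# e.g. c("a", "a", "b", "a") -> c(1, 1, 2, 3). Each NA forms its own run.
run_id <- function(x) {
  r <- rle(x)
  rep(seq_along(r$lengths), r$lengths)
}

# xp.bg: minimum xp.norm within each run of consecutive rows with the same
# gene (as.character() because rle() rejects factors)
DF$xp.bg <- ave(DF$xp.norm, run_id(as.character(DF$gene)), FUN = min)

# xp.n.top: minimum xp.norm within each run of consecutive xp.top values,
# kept only where xp.top is TRUE; NA where xp.top is FALSE or NA
top_min <- ave(DF$xp.norm, run_id(DF$xp.top), FUN = min)
DF$xp.n.top <- ifelse(DF$xp.top %in% TRUE, top_min, NA)

print(DF[, c("gene", "xp.norm", "xp.bg", "xp.top", "xp.n.top")])

# Why runs matter: grouping by gene alone pools the separated "no_gene" rows.
g <- c("A", "no_gene", "B", "no_gene")
v <- c(1, 5, 2, 3)
print(ave(v, g, FUN = min))          # 1 3 2 3
print(ave(v, run_id(g), FUN = min))  # 1 5 2 3
```

On the sample, this gives xp.bg = 0.64794094 for rows 136:141, xp.n.top = 1.40312732 for row 137, and 1.18154967 for rows 141:148. Under the stated rule, PHF13 (rows 132:133) and PER3 (rows 143:144) are also consecutive repeats, so xp.bg changes for rows 132 and 143 as well. ave() still uses split internally, but it now splits on run labels rather than on the roughly 20,000 gene names.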