On Fri, Oct 8, 2010 at 10:18 AM, Gabor Grothendieck <ggrothendi...@gmail.com> wrote:
> On Fri, Oct 8, 2010 at 1:19 AM, burgundy <saub...@yahoo.com> wrote:
>>
>> Hello,
>>
>> I have a dataframe (tab separated file) which looks like the example below -
>> two values separated by a comma, and tab separation between each of these.
>>
>>      [,1]   [,2]   [,3]    [,4]
>> [1,] 0,1    1,3    40,10   0,0
>> [2,] 20,5   4,2    10,40   10,0
>> [3,] 0,11   1,2    120,10  0,0
>>
>> I would like to calculate the percentage of the smallest number separated by
>> the comma by:
>> 1) summing the values e.g. for [1,3] where 40,10, 40+10 = 50
>> 2) taking the first value and dividing it by the total e.g. for [1,3], 40/50
>> = 0.8
>> 3) where the value generated by 2) is >0.5, print 1-value, otherwise, leave
>> value e.g. for [1,3], where value is 0.8, print 1-0.8 = 0.2
>>
>> plan to generate file like:
>>
>>      [,1]  [,2]  [,3]  [,4]
>> [1,] 1     0.25  0.2   0
>> [2,] 0.2   0.33  0.2   1
>> [3,] 1     0.33  0.08  0
>
> Try using gsubfn in gsubfn (http://gsubfn.googlecode.com).  Using that
> match a regular expression consisting of digits, a comma and digits
> capturing the two strings of digits and passing them to function f
> replacing the expression with the output of f.  Then read the
> resulting text into a data frame.
>
> library(gsubfn)
> L <- c(" 0,1 1,3 40,10 0,0", " 20,5 4,2 10,40 10,0",
>        " 0,11 1,2 120,10 0,0")
>
> f <- function(a, b) { x <- as.numeric(c(a, b)); min(x)/sum(x) }
> L2 <- gsubfn("(\\d+),(\\d+)", f, L)
>
> DF <- read.table(textConnection(L2))
>
> which gives:
>
> > DF
>    V1        V2         V3  V4
> 1 0.0 0.2500000 0.20000000 NaN
> 2 0.2 0.3333333 0.20000000   0
> 3 0.0 0.3333333 0.07692308 NaN

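Note that the 0,0 cells come out as NaN in the output above, because
min(x)/sum(x) is 0/0 there. The desired output in the original post shows
0 for those cells, so here is a minimal sketch of a variant, under the
assumption that 0 is what is wanted when both counts are zero. The name f0
is made up for illustration, and the block uses base R only (strsplit
rather than gsubfn) so that it runs on its own:

```r
# Variant of f that returns 0 instead of NaN when both values are 0.
# Assumption: a "0,0" cell should be reported as 0, as in the desired
# output of the original post.  f0 is a hypothetical name.
f0 <- function(a, b) {
  x <- as.numeric(c(a, b))
  total <- sum(x)
  if (total == 0) 0 else min(x) / total
}

L <- c(" 0,1 1,3 40,10 0,0", " 20,5 4,2 10,40 10,0",
       " 0,11 1,2 120,10 0,0")

# Split each line into its whitespace-separated "a,b" cells.
cells <- strsplit(trimws(L), "[[:space:]]+")

# For each line, split every cell on the comma and apply f0;
# sapply returns one column per line, so transpose to get one row per line.
M <- t(sapply(cells, function(row) {
  sapply(strsplit(row, ",", fixed = TRUE), function(p) f0(p[1], p[2]))
}))
M
```

f0 takes the same two arguments as f, so it should also be usable in
place of f in the gsubfn call quoted above.
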
A further simplification would be to use strapply from the same
package.  It eliminates the need for read.table at the end:

> strapply(L, "(\\d+),(\\d+)", f, simplify = rbind)
     [,1]      [,2]       [,3] [,4]
[1,]  0.0 0.2500000 0.20000000  NaN
[2,]  0.2 0.3333333 0.20000000    0
[3,]  0.0 0.3333333 0.07692308  NaN

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.