Sorry about this one being long, and I apologise beforehand if there is something obvious here that I have missed. I am new to creating my own functions in R, and I am uncertain of how they work.
I have a data set that I have read into a data frame:

> gctable[1:5,]
     refseq geometry X60_origin X60_terminus  length  kingdom
1 NC_009484      cir    1790000       773000 3389227 Bacteria
2 NC_009484      cir    1790000       773000 3389227 Bacteria
3 NC_009484      cir    1790000       773000 3389227 Bacteria
4 NC_009484      cir    1790000       773000 3389227 Bacteria
5 NC_009484      cir    1790000       773000 3389227 Bacteria
                  grp feature gene begin dir gc_content replicor LEADLAG
1 Alphaproteobacteria     CDS  CDS   261   +   0.654244    RIGHT    LEAD
2 Alphaproteobacteria     CDS  CDS  1737   -   0.651408    RIGHT     LAG
3 Alphaproteobacteria     CDS  CDS  2902   +   0.607843    RIGHT    LEAD
4 Alphaproteobacteria     CDS  CDS  3693   +   0.617647    RIGHT    LEAD
5 Alphaproteobacteria     CDS  CDS  4227   +   0.699208    RIGHT    LEAD

Most of these columns are factors. I have a function that I would like to apply to this data frame. Right now I cannot get it to work, and that seems to be because the columns in the data frame are factors. I tested it with a data frame created from vectors, and it worked fine.

The function:

percentdistance <- function(origin, terminus, length, begin, replicor){
  # debugging output: show the arguments as received
  print(c(origin, terminus, length, begin, replicor))
  d = 0
  if (terminus > origin) {
    if (replicor == "LEFT") {
      d = -((origin - begin) %% length)
    } else {
      d = (begin - origin)
    }
  } else {
    if (replicor == "LEFT") {
      d = (origin - begin)
    } else {
      d = -((begin - origin) %% length)
    }
  }
  d / length * 2
}

The error I get:

> percentdistance(gctable$X60_origin, gctable$X60_terminus, gctable$length,
+   gctable$begin, gctable$replicor)
    [1] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
   [19] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
   [37] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
   [55] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
   [73] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
   [91] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
  [109] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
  [127] 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87 87
.....
[99919] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[99937] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[99955] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[99973] 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
[99991] 2 2 2 2 2 2 2 2 2
 [ reached getOption("max.print") -- omitted 8526091 entries ]]
Error in if (terminus > origin) { :
        missing value where TRUE/FALSE needed
In addition: Warning messages:
1: > not meaningful for factors in: Ops.factor(terminus, origin)
2: the condition has length > 1 and only the first element will be used in: if (terminus > origin) {

This worked nicely when the inputs were columns from a data frame created from vectors. I have also tried the different apply functions, although I am uncertain which one would be appropriate here.

I would like to use this function to create a new data frame which would look something like this:

new_frame = (gctable$feature, gctable$gene, gctable$kingdom, gctable$grp, gctable$gc_content, percentdistance(gctable))

I am uncertain of how to proceed. Should I deconstruct the data frame within the function, or should I get just the numbers out of the factors and pass those to the function? Or is my solution way off from how things are done in R?

Thank you very much for your help!

Karin
-- 
Karin Lagesen, PhD student
[EMAIL PROTECTED]
http://folk.uio.no/karinlag

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
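
The two warnings above point at two separate issues: comparison operators such as > are not defined for factors, and if() only looks at a single TRUE/FALSE rather than a whole column. A minimal sketch of one possible way around both is given here, assuming the factor columns really do hold numbers. The names gc_demo, factor_to_numeric and percentdistance_vec are hypothetical, and the toy values only echo a few numbers from the printout; this has not been run against the real gctable.

```r
# Toy stand-in for gctable (hypothetical data): numeric columns stored as
# factors, mimicking the situation described in the post.
gc_demo <- data.frame(
  X60_origin   = factor(c("1790000", "1790000", "773000", "773000")),
  X60_terminus = factor(c("773000", "773000", "1790000", "1790000")),
  length       = factor(c("3389227", "3389227", "3389227", "3389227")),
  begin        = factor(c("261", "1737", "2902", "3693")),
  replicor     = factor(c("RIGHT", "LEFT", "RIGHT", "LEFT"))
)

# as.numeric() on a factor returns the internal level codes, not the values,
# so convert to character first. Non-numeric entries become NA with a warning.
factor_to_numeric <- function(f) as.numeric(as.character(f))

# Vectorised variant of percentdistance(): ifelse() evaluates the condition
# element by element, whereas if() expects a single TRUE/FALSE.
percentdistance_vec <- function(origin, terminus, length, begin, replicor) {
  left <- as.character(replicor) == "LEFT"
  d <- ifelse(
    terminus > origin,
    ifelse(left, -((origin - begin) %% length), begin - origin),
    ifelse(left, origin - begin, -((begin - origin) %% length))
  )
  d / length * 2
}

# Convert the factor columns once, then call the function on whole columns.
pd <- percentdistance_vec(
  origin   = factor_to_numeric(gc_demo$X60_origin),
  terminus = factor_to_numeric(gc_demo$X60_terminus),
  length   = factor_to_numeric(gc_demo$length),
  begin    = factor_to_numeric(gc_demo$begin),
  replicor = gc_demo$replicor
)

# Assemble the result next to the columns of interest.
new_frame <- data.frame(replicor = gc_demo$replicor, percentdistance = pd)
print(new_frame)
```

If the numeric columns were read as factors because of stray non-numeric entries in the input file, those entries will show up as NA after the conversion, so it may be worth checking the file (or the colClasses argument of read.table) before relying on the result.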