[Rd] strsplit convert data
I am using the following code but I do not know the debug and run for correct errors library (tcltk) file <-tclvalue (tkgetOpenFile ()) if (nchar (file)) { tkmessageBox ("Select the file") } else { tkmessageBox (message = paste ("Was select file", file)) Dataset <- read.table (file, header = FALSE, sep = "", na.strings = "NA", dec =".", strip.white = TRUE) Dataset } input <- readLine (Dataset) input close (Dataset) input <- gsub ('*','', input) in.s <- strsplit (input, '') id <- sort (unique (unlist (in.s))) # Create the output matrix output <- matrix (0, ncol = length (id), nrow = length (in.s)) colNames (output) <- id for (i in seq_along (in.s)) { output [i, unlist (in.s [[i]])] <- 1 } write.csv (output, file = res.csv) Thanks -- View this message in context: http://r.789695.n4.nabble.com/strsplit-convert-data-tp3932704p3932704.html Sent from the R devel mailing list archive at Nabble.com. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] C function is wrong under Windows 7
Dear mailing list, I have a C function that gives me a wrong result when I run it under Windows 7. This is the code under Linux (RHEL5): > library(phenoTest) > data(epheno) > sign <- sample(featureNames(epheno))[1:20] > score <- getFc(epheno)[,1] > head(score) 1007_s_at 1053_at117_at121_at 1255_g_at 1294_at -1.183019 1.113544 1.186186 -1.034779 -1.044456 -1.023471 > s <- which(names(score) %in% sign) > es.c <- .Call('getEs',score,s,PACKAGE='phenoTest') > head(es.c) [1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041 [6] -0.006122449 > es.c <- .Call('getEs',score,s,PACKAGE='phenoTest') > head(es.c) [1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041 [6] -0.006122449 > sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phenoTest_1.1.1 RSQLite_0.8-4DBI_0.2-5 [4] Heatplus_1.22.0 annotate_1.30.1 AnnotationDbi_1.14.1 [7] Biobase_2.12.2 loaded via a namespace (and not attached): [1] affyio_1.20.0 biomaRt_2.8.1 Biostrings_2.21.6 [4] Category_2.18.0 cluster_1.13.3 gdata_2.7.1 [7] genefilter_1.34.0 gplots_2.10.1 graph_1.30.0 [10] grid_2.13.0 GSEABase_1.14.0 gtools_2.6.2 [13] hgu133a.db_2.5.0Hmisc_3.8-3 hopach_2.12.0 [16] IRanges_1.11.11 lattice_0.19-23 limma_3.8.3 [19] Matrix_0.9996875-3 mgcv_1.7-5 nlme_3.1-100 [22] oligoClasses_1.14.0 RBGL_1.22.0 RCurl_1.6-10 [25] SNPchip_1.16.0 splines_2.13.0 survival_2.36-5 [28] tools_2.13.0XML_3.4-0 xtable_1.5-6 As you see es.c is correct. I checked it doing the same computation with R. It also runs without problems under Mac. I run valgrind on the same piece of code and got no errors. This is the same piece of code under Windows 7: > library(phenoTest) > data(epheno) > sign <- sample(featureNames(epheno))[1:20] > score <- getFc(epheno)[,1] > head(score) 1007_s_at 1053_at117_at121_at 1255_g_at 1294_at -1.183019 1.113544 1.186186 -1.034779 -1.044456 -1.023471 > s <- which(names(score) %in% sign) > es.c <- .Call('getEs',score,s,PACKAGE='phenoTest') > head(es.c) [1] 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 > es.c <- .Call('getEs',score,s,PACKAGE='phenoTest') > head(es.c) [1] 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 > sessionInfo() R version 2.14.0 alpha (2011-10-13 r57240) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C [5] LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phenoTest_1.1.1 RSQLite_0.10.0DBI_0.2-5 [4] Heatplus_1.99.0 annotate_1.31.1 AnnotationDbi_1.15.36 [7] Biobase_2.13.10 loaded via a namespace (and not attached): [1] affyio_1.21.2biomaRt_2.9.3Biostrings_2.21.11 [4] Category_2.19.1 cluster_1.14.0 gdata_2.8.2 [7] genefilter_1.35.0gplots_2.10.1graph_1.31.2 [10] grid_2.14.0 GSEABase_1.15.0 gtools_2.6.2 [13] hgu133a.db_2.6.3 Hmisc_3.8-3 hopach_2.13.1 [16] IRanges_1.11.31 lattice_0.19-33 limma_3.9.21 [19] Matrix_1.0-0 mgcv_1.7-8 nlme_3.1-102 [22] oligoClasses_1.15.56 RBGL_1.29.0 RCurl_1.6-10.1 [25] SNPchip_1.17.0 splines_2.14.0 survival_2.36-10 [28] tools_2.14.0 XML_3.4-2.2 xtable_1.6-0 [31] zlibbioc_0.1.8 es.c is not correct under Windows. It also gives a different result when i rerun the same function. This is the C code: #include "getEs.h" #include #include double absolute(double x) { if (x<0) return (-x); else return (x); } void cumsum(double *x, int len) { int i; for (i = 1; i < len; ++i) { *(x + i) = *(x + i) + *(x + i -1); } } double getNr(double *fchr, int *sign, int signLen) { int i; double nr; nr = 0.0; for (i = 0; i < signLen; ++i) { nr = absolute(fchr[sign[i] -1]) + nr; } return nr; } void getPhit(double *fchr, int *sign, int signLen, double nr, double *phit) { int i; for (i = 0; i < signLen; ++i) { *(phit + sign[i]-1) = absolute(*(fchr + sign[i]-1)) / nr; } } void getPmiss(int *sign, int fchrLen, int signLen, double *pmiss) { int i; double tmp = 1.0 / (fchrLen-signLen); for (i = 0; i < fchrLen; ++i) { *(pmiss + i) = tmp; } for (i = 0; i < signLen; ++i) { *(pmiss + sign[i]-1) = 0; } } SEXP getEs(SEXP fchr, SE
Re: [Rd] RFC: 'igraph' package update and backward compatibility
On 10/20/2011 11:57 AM, Hadley Wickham wrote: Generally, the absence of versioned dependencies makes it extremely difficult to aggressively improve the design of a package. I think that aggressively varying a given packages' API would be confusing to most users, and damage acceptance of the package. No matter how implemented, the semantic work remains: if ggplot2 V0.89 has features I desire, I must re-examine all of my code in light of all the changes between V0.80 and current. Ugh. I think that many folks desire to upgrade because of some process like "Oh! I see on the list that Hadley fixed [irritating inconvenience] I was wrestling with. I can now use [recent happy thing] on my code I wrote last January. Making that a difficult, thought-intensive process would, IMO, hurt more people than it would help. By a lot. I can not easily remedy them without breaking large amounts of existing code or starting work on ggplot3. My sense is that your position is unusual: you are doing original research all up and down the stack of visualization, representation, analysis... This means that you have bona-fide new structures, which would simply demand new APIs, with much greater frequency than many coders. Maybe the answer is to lower your internal resistance to revising the package name? This would accomplish your multiple-versioning cleanly, and clearly communicate the immiscibility of the Old Code with the New to your customer base. You clearly see a large hump of work to be done before you move to the new version, but you could give yourself permission to rev it every.. what; year? Two? ggplot2012? :) - Allen S. Rout __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] C function is wrong under Windows 7
On Mon, Oct 24, 2011 at 6:04 AM, Evarist Planet wrote: > Dear mailing list, > > I have a C function that gives me a wrong result when I run it under Windows > 7. The fact that you extract pointers to the contents of fchr and sign before coercing them to REALSXP is deeply suspicious to me, though I admit I don't see how it causes the problem, since you don't use the new versions, you don't overwrite them, and the old versions should still be protected as arguments to the function. > SEXP getEs(SEXP fchr, SEXP sign) > { > int i, nfchr, nsign; > double *rfchr = REAL(fchr), *res; > int *rsign = INTEGER(sign); > > PROTECT(fchr = coerceVector(fchr, REALSXP)); > PROTECT(sign = coerceVector(sign, REALSXP)); > > nfchr = length(fchr); > nsign = length(sign); > > SEXP es; > PROTECT(es = allocVector(REALSXP, nfchr)); > res = REAL(es); > > double nr = getNr(rfchr, rsign, nsign); > > SEXP phit; > PROTECT(phit = allocVector(REALSXP, nfchr)); > double *rphit; > rphit = REAL(phit); > for(i = 0; i < nfchr; i++) rphit[i] = 0.0; > getPhit(rfchr, rsign, nsign, nr, rphit); > cumsum(rphit, nfchr); > > SEXP pmiss; > PROTECT(pmiss = allocVector(REALSXP, nfchr)); > double *rpmiss; > rpmiss = REAL(pmiss); > for(i = 0; i < nfchr; i++) rpmiss[i] = 0.0; > getPmiss(rsign, nfchr, nsign, rpmiss); > cumsum(rpmiss, nfchr); > > for (i = 0; i < nfchr; ++i) { > res[i] = rphit[i] - rpmiss[i]; > } > > UNPROTECT(5); > return es; > } > > Could you please help me to find out what I am doing wrong? > > Many thanks in advance, > > -- > Evarist Planet > Research officer, Bioinformatics and Biostatistics unit > IRB Barcelona > Tel (+34) 93 402 0553 > Fax (+34) 93 402 0257 > > evarist.pla...@irbbarcelona.org > http://www.irbbarcelona.org/bioinformatics > > [[alternative HTML version deleted]] > > __ > R-devel@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] C function is wrong under Windows 7
On 10/24/2011 06:04 AM, Evarist Planet wrote: Dear mailing list, I have a C function that gives me a wrong result when I run it under Windows Hi Evarist -- It seems like this can be written reasonably efficiently in R? getEs <-function(fchr, sign) { nfchr <- length(fchr) nsign <- length(sign) nr <- sum(abs(fchr[sign])) phit <- numeric(nfchr) phit[sign] <- abs(fchr[sign]) / nr phit <- cumsum(phit) pmiss <- numeric(nfchr) pmiss[-sign] <- 1 / (nfchr - nsign) pmiss <- cumsum(pmiss) phit - pmiss } es.c <- .Call('getEs',score,s,PACKAGE='phenoTest') all.equal(es.c, getEs(score, s)) (for your C code, it would help to have a simple reproducible example that doesn't rely on phenoTest). Martin 7. This is the code under Linux (RHEL5): library(phenoTest) data(epheno) sign<- sample(featureNames(epheno))[1:20] score<- getFc(epheno)[,1] head(score) 1007_s_at 1053_at117_at121_at 1255_g_at 1294_at -1.183019 1.113544 1.186186 -1.034779 -1.044456 -1.023471 s<- which(names(score) %in% sign) es.c<- .Call('getEs',score,s,PACKAGE='phenoTest') head(es.c) [1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041 [6] -0.006122449 es.c<- .Call('getEs',score,s,PACKAGE='phenoTest') head(es.c) [1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041 [6] -0.006122449 sessionInfo() R version 2.13.0 (2011-04-13) Platform: x86_64-redhat-linux-gnu (64-bit) locale: [1] LC_CTYPE=en_US.UTF-8 LC_NUMERIC=C [3] LC_TIME=en_US.UTF-8LC_COLLATE=en_US.UTF-8 [5] LC_MONETARY=C LC_MESSAGES=en_US.UTF-8 [7] LC_PAPER=en_US.UTF-8 LC_NAME=C [9] LC_ADDRESS=C LC_TELEPHONE=C [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phenoTest_1.1.1 RSQLite_0.8-4DBI_0.2-5 [4] Heatplus_1.22.0 annotate_1.30.1 AnnotationDbi_1.14.1 [7] Biobase_2.12.2 loaded via a namespace (and not attached): [1] affyio_1.20.0 biomaRt_2.8.1 Biostrings_2.21.6 [4] Category_2.18.0 cluster_1.13.3 gdata_2.7.1 [7] genefilter_1.34.0 gplots_2.10.1 graph_1.30.0 [10] grid_2.13.0 GSEABase_1.14.0 gtools_2.6.2 [13] hgu133a.db_2.5.0Hmisc_3.8-3 hopach_2.12.0 [16] IRanges_1.11.11 lattice_0.19-23 limma_3.8.3 [19] Matrix_0.9996875-3 mgcv_1.7-5 nlme_3.1-100 [22] oligoClasses_1.14.0 RBGL_1.22.0 RCurl_1.6-10 [25] SNPchip_1.16.0 splines_2.13.0 survival_2.36-5 [28] tools_2.13.0XML_3.4-0 xtable_1.5-6 As you see es.c is correct. I checked it doing the same computation with R. It also runs without problems under Mac. I run valgrind on the same piece of code and got no errors. This is the same piece of code under Windows 7: library(phenoTest) data(epheno) sign<- sample(featureNames(epheno))[1:20] score<- getFc(epheno)[,1] head(score) 1007_s_at 1053_at117_at121_at 1255_g_at 1294_at -1.183019 1.113544 1.186186 -1.034779 -1.044456 -1.023471 s<- which(names(score) %in% sign) es.c<- .Call('getEs',score,s,PACKAGE='phenoTest') head(es.c) [1] 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 es.c<- .Call('getEs',score,s,PACKAGE='phenoTest') head(es.c) [1] 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 sessionInfo() R version 2.14.0 alpha (2011-10-13 r57240) Platform: i386-pc-mingw32/i386 (32-bit) locale: [1] LC_COLLATE=Spanish_Spain.1252 LC_CTYPE=Spanish_Spain.1252 [3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C [5] LC_TIME=Spanish_Spain.1252 attached base packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] phenoTest_1.1.1 RSQLite_0.10.0DBI_0.2-5 [4] Heatplus_1.99.0 annotate_1.31.1 AnnotationDbi_1.15.36 [7] Biobase_2.13.10 loaded via a namespace (and not attached): [1] affyio_1.21.2biomaRt_2.9.3Biostrings_2.21.11 [4] Category_2.19.1 cluster_1.14.0 gdata_2.8.2 [7] genefilter_1.35.0gplots_2.10.1graph_1.31.2 [10] grid_2.14.0 GSEABase_1.15.0 gtools_2.6.2 [13] hgu133a.db_2.6.3 Hmisc_3.8-3 hopach_2.13.1 [16] IRanges_1.11.31 lattice_0.19-33 limma_3.9.21 [19] Matrix_1.0-0 mgcv_1.7-8 nlme_3.1-102 [22] oligoClasses_1.15.56 RBGL_1.29.0 RCurl_1.6-10.1 [25] SNPchip_1.17.0 splines_2.14.0 survival_2.36-10 [28] tools_2.14.0 XML_3.4-2.2 xtable_1.6-0 [31] zlibbioc_0.1.8 es.c is not correct under Windows. It also gives a different result when i rerun the same function. This is the C code: #include "getEs.h" #include #include double absolute(double x) { if (x<0) return (-x); else return (x); } void cumsum(double *x, int len) { int i; for (i = 1; i< len; ++i) { *(x + i) = *(x + i) + *(x + i -1); } } double getNr(do
Re: [Rd] droplevels: drops contrasts as well
On Fri, Oct 21, 2011 at 5:57 AM, Thaler, Thorn, LAUSANNE, Applied Mathematics wrote: > Dear all, > > Today I figured out that there is a neat function called droplevels, > which, well, drops unused levels in a data frame. I tried the function > with some of my data sets and it turned out that not only the unused > levels were dropped but also the contrasts I set via "C". I had a look > into the code, and this behaviour arises from the fact that droplevels > uses simply factor to drop the unused levels, which uses the default > contrasts as set by options("contrasts"). > > I think this behaviour is annoying, because if one does not look > carefully enough, one looses the contrasts silently. Hence may I suggest > to change the code of droplevels to something like the following: This silently changes the contrasts -- eg, if the first level of the factor is one of the empty levels, the reference level used by contr.treatment() will change. Also, if the contrasts are a matrix rather than specifying a contrast function, the matrix will be invalid for the the new factor. I think just having a warning would be better -- in general it's not clear what (if anything) it means to have the same contrasts on factors with different numbers of levels. -thomas -- Thomas Lumley Professor of Biostatistics University of Auckland __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strsplit convert data
Le lundi 24 octobre 2011 à 03:31 -0700, RMSOPS a écrit : > I am using the following code but I do not know the debug and run for > correct errors > > library (tcltk) > > > file <-tclvalue (tkgetOpenFile ()) >if (nchar (file)) >{ > tkmessageBox ("Select the file") >} > else >{ > tkmessageBox (message = paste ("Was select file", file)) > Dataset <- read.table (file, header = FALSE, sep = "", na.strings = > "NA", dec =".", strip.white = TRUE) > Dataset > } > > input <- readLine (Dataset) > input > close (Dataset) > input <- gsub ('*','', input) > in.s <- strsplit (input, '') > id <- sort (unique (unlist (in.s))) > > # Create the output matrix > output <- matrix (0, ncol = length (id), nrow = length (in.s)) > colNames (output) <- id > > for (i in seq_along (in.s)) { > output [i, unlist (in.s [[i]])] <- 1 > } > > write.csv (output, file = res.csv) Hm, that code triggers many errors. What's the problem you'd like help on? You can't reasonably expect people to fix all of your code... ;-) __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strsplit convert data
Hello To be more specific, my problem is here Line <-dataset $ Items [i] print (line) in.s <- strsplit (line, '') I am reading lines from a file Line 1 A, B, C, D, G Line 2 A, C, E, ... line n F, G the problem is that I can not make the split of the comma, so I can not get the output A B C D E F G O [1,] 1 1 1 1 1 0 0 0 [2,] 1 0 1 0 1 0 0 1 [n] 0 0 0 0 0 1 1 0 -- View this message in context: http://r.789695.n4.nabble.com/strsplit-convert-data-tp3932704p3934808.html Sent from the R devel mailing list archive at Nabble.com. __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] strsplit convert data
On 25 Oct 2011, at 00:34, RMSOPS wrote: > To be more specific, my problem is here > Line <-dataset $ Items [i] > print (line) > in.s <- strsplit (line, '') > > I am reading lines from a file > Line 1 A, B, C, D, G > Line 2 A, C, E, > ... > line n F, G > > the problem is that I can not make the split of the comma, so I can not get > the > output > A B C D E F G O > [1,] 1 1 1 1 1 0 0 0 > [2,] 1 0 1 0 1 0 0 1 > [n] 0 0 0 0 0 1 1 0 Hi, hmm, to be honest I didn't get your point yet but let me try. If you read lines like : "A, B, C, D, G" etc. then you can split them by using: strsplit(line, ', *') [1] "A" "B" "C" "D" "G" [', *' is a regular expression to match a "," followed by 0 or more spaces] Then next issue I see here is that you create num times output matrices in your for loop "for(i in 1:num)" and overwrite the file res.csv all the time. Either you know in beforehand which columns could appear (and how many rows) then you can create your matrix before that loop or you should go with cbind() and rbind() resp. or similar. Hope that helps a bit, --Hans __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel