[Rd] strsplit convert data

2011-10-24 Thread RMSOPS

I am using the following code, but I do not know how to debug it and
 correct the errors it produces when run:

 library (tcltk)


 file <-tclvalue (tkgetOpenFile ())
   if (nchar (file))
   {
 tkmessageBox ("Select the file")
   }
 else
   {
 tkmessageBox (message = paste ("Was select file", file))
 Dataset <- read.table (file, header = FALSE, sep = "", na.strings =
"NA", dec =".", strip.white = TRUE)
  Dataset
 }

 input <- readLine (Dataset)
 input
 close (Dataset)
 input <- gsub ('*','', input)
 in.s <- strsplit (input, '')
 id <- sort (unique (unlist (in.s)))

 # Create the output matrix
  output <- matrix (0, ncol = length (id), nrow = length (in.s))
  colNames (output) <- id

  for (i in seq_along (in.s)) {
  output [i, unlist (in.s [[i]])] <- 1
  }
  
 write.csv (output, file = res.csv)

 Thanks

--
View this message in context: 
http://r.789695.n4.nabble.com/strsplit-convert-data-tp3932704p3932704.html
Sent from the R devel mailing list archive at Nabble.com.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] C function is wrong under Windows 7

2011-10-24 Thread Evarist Planet
Dear mailing list,

I have a C function that gives me a wrong result when I run it under Windows
7.

This is the code under Linux (RHEL5):
> library(phenoTest)
> data(epheno)
> sign <- sample(featureNames(epheno))[1:20]
> score <- getFc(epheno)[,1]
> head(score)
1007_s_at   1053_at    117_at    121_at 1255_g_at   1294_at
-1.183019  1.113544  1.186186 -1.034779 -1.044456 -1.023471
> s <- which(names(score) %in% sign)
> es.c <- .Call('getEs',score,s,PACKAGE='phenoTest')
> head(es.c)
[1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041
[6] -0.006122449
> es.c <- .Call('getEs',score,s,PACKAGE='phenoTest')
> head(es.c)
[1] -0.001020408 -0.002040816 -0.003061224 -0.004081633 -0.005102041
[6] -0.006122449
> sessionInfo()
R version 2.13.0 (2011-04-13)
Platform: x86_64-redhat-linux-gnu (64-bit)

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=C              LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C
 [9] LC_ADDRESS=C               LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] phenoTest_1.1.1      RSQLite_0.8-4        DBI_0.2-5
[4] Heatplus_1.22.0      annotate_1.30.1      AnnotationDbi_1.14.1
[7] Biobase_2.12.2

loaded via a namespace (and not attached):
 [1] affyio_1.20.0       biomaRt_2.8.1       Biostrings_2.21.6
 [4] Category_2.18.0     cluster_1.13.3      gdata_2.7.1
 [7] genefilter_1.34.0   gplots_2.10.1       graph_1.30.0
[10] grid_2.13.0         GSEABase_1.14.0     gtools_2.6.2
[13] hgu133a.db_2.5.0    Hmisc_3.8-3         hopach_2.12.0
[16] IRanges_1.11.11     lattice_0.19-23     limma_3.8.3
[19] Matrix_0.9996875-3  mgcv_1.7-5          nlme_3.1-100
[22] oligoClasses_1.14.0 RBGL_1.22.0         RCurl_1.6-10
[25] SNPchip_1.16.0      splines_2.13.0      survival_2.36-5
[28] tools_2.13.0        XML_3.4-0           xtable_1.5-6

As you can see, es.c is correct; I checked it by doing the same computation
in R. It also runs without problems on a Mac. I ran valgrind on the same
piece of code and got no errors.

This is the same piece of code under Windows 7:
> library(phenoTest)
> data(epheno)
> sign <- sample(featureNames(epheno))[1:20]
> score <- getFc(epheno)[,1]
> head(score)
1007_s_at   1053_at    117_at    121_at 1255_g_at   1294_at
-1.183019  1.113544  1.186186 -1.034779 -1.044456 -1.023471
> s <- which(names(score) %in% sign)
> es.c <- .Call('getEs',score,s,PACKAGE='phenoTest')
> head(es.c)
[1] 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215 1.447208e+215
1.447208e+215
> es.c <- .Call('getEs',score,s,PACKAGE='phenoTest')
> head(es.c)
[1] 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170 3.176615e+170
3.176615e+170
> sessionInfo()

R version 2.14.0 alpha (2011-10-13 r57240)
Platform: i386-pc-mingw32/i386 (32-bit)

locale:
[1] LC_COLLATE=Spanish_Spain.1252  LC_CTYPE=Spanish_Spain.1252
[3] LC_MONETARY=Spanish_Spain.1252 LC_NUMERIC=C
[5] LC_TIME=Spanish_Spain.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] phenoTest_1.1.1       RSQLite_0.10.0        DBI_0.2-5
[4] Heatplus_1.99.0       annotate_1.31.1       AnnotationDbi_1.15.36
[7] Biobase_2.13.10

loaded via a namespace (and not attached):
 [1] affyio_1.21.2        biomaRt_2.9.3        Biostrings_2.21.11
 [4] Category_2.19.1      cluster_1.14.0       gdata_2.8.2
 [7] genefilter_1.35.0    gplots_2.10.1        graph_1.31.2
[10] grid_2.14.0          GSEABase_1.15.0      gtools_2.6.2
[13] hgu133a.db_2.6.3     Hmisc_3.8-3          hopach_2.13.1
[16] IRanges_1.11.31      lattice_0.19-33      limma_3.9.21
[19] Matrix_1.0-0         mgcv_1.7-8           nlme_3.1-102
[22] oligoClasses_1.15.56 RBGL_1.29.0          RCurl_1.6-10.1
[25] SNPchip_1.17.0       splines_2.14.0       survival_2.36-10
[28] tools_2.14.0         XML_3.4-2.2          xtable_1.6-0
[31] zlibbioc_0.1.8

es.c is not correct under Windows. It also gives a different result each
time I rerun the same function.

This is the C code:
#include "getEs.h"
#include 
#include 

double absolute(double x)
{
 if (x<0)
 return (-x);
 else
 return (x);
}

void cumsum(double *x, int len)
{
 int i;
 for (i = 1; i < len; ++i) {
   *(x + i) = *(x + i) + *(x + i -1);
 }
}

double getNr(double *fchr, int *sign, int signLen)
{
 int i;
 double nr;
 nr = 0.0;
 for (i = 0; i < signLen; ++i) {
   nr = absolute(fchr[sign[i] -1]) + nr;
   }
 return nr;
}

void getPhit(double *fchr, int *sign, int signLen, double nr, double *phit)
{
 int i;
 for (i = 0; i < signLen; ++i) {
   *(phit + sign[i]-1) = absolute(*(fchr + sign[i]-1)) / nr;
   }
}

void getPmiss(int *sign, int fchrLen, int signLen, double *pmiss)
{
 int i;
 double tmp = 1.0 / (fchrLen-signLen);
 for (i = 0; i < fchrLen; ++i) {
   *(pmiss + i) = tmp;
   }
 for (i = 0; i < signLen; ++i) {
   *(pmiss + sign[i]-1) = 0;
   }
}

SEXP getEs(SEXP fchr, SE

Re: [Rd] RFC: 'igraph' package update and backward compatibility

2011-10-24 Thread Allen S. Rout

On 10/20/2011 11:57 AM, Hadley Wickham wrote:


> Generally, the absence of versioned dependencies makes it extremely
> difficult to aggressively improve the design of a package.


I think that aggressively varying a given package's API would be
confusing to most users, and would damage acceptance of the package.

No matter how implemented, the semantic work remains: if ggplot2 V0.89
has features I desire, I must re-examine all of my code in light of
all the changes between V0.80 and current.  Ugh.

I think that many folks desire to upgrade because of some process like
"Oh!  I see on the list that Hadley fixed [irritating inconvenience] I
was wrestling with.  I can now use [recent happy thing] on my code I
wrote last January."

Making that a difficult, thought-intensive process would, IMO, hurt more 
people than it would help.  By a lot.



> I can not easily remedy them without breaking large amounts of
> existing code or starting work on ggplot3.


My sense is that your position is unusual: you are doing original
research all up and down the stack of visualization, representation,
analysis...  This means that you have bona-fide new structures, which
would simply demand new APIs, with much greater frequency than many
coders.

Maybe the answer is to lower your internal resistance to revising the
package name?  This would accomplish your multiple-versioning cleanly,
and clearly communicate the immiscibility of the Old Code with the New
to your customer base.  You clearly see a large hump of work to be
done before you move to the new version, but you could give yourself
permission to rev it every.. what; year?  Two?

ggplot2012?   :)

- Allen S. Rout



Re: [Rd] C function is wrong under Windows 7

2011-10-24 Thread Thomas Lumley
On Mon, Oct 24, 2011 at 6:04 AM, Evarist Planet wrote:
> Dear mailing list,
>
> I have a C function that gives me a wrong result when I run it under Windows
> 7.

The fact that you extract pointers to the contents of fchr and sign
before coercing them to REALSXP is deeply suspicious to me, though I
admit I don't see how it causes the problem, since you don't use the
new versions, you don't overwrite them, and the old versions should
still be protected as arguments to the function.

> SEXP getEs(SEXP fchr, SEXP sign)
> {
>  int i, nfchr, nsign;
>  double *rfchr = REAL(fchr), *res;
>  int *rsign = INTEGER(sign);
>
>  PROTECT(fchr = coerceVector(fchr, REALSXP));
>  PROTECT(sign = coerceVector(sign, REALSXP));
>
>  nfchr = length(fchr);
>  nsign = length(sign);
>
>  SEXP es;
>  PROTECT(es = allocVector(REALSXP, nfchr));
>  res = REAL(es);
>
>  double nr = getNr(rfchr, rsign, nsign);
>
>  SEXP phit;
>  PROTECT(phit = allocVector(REALSXP, nfchr));
>  double *rphit;
>  rphit = REAL(phit);
>  for(i = 0; i < nfchr; i++) rphit[i] = 0.0;
>  getPhit(rfchr, rsign, nsign, nr, rphit);
>  cumsum(rphit, nfchr);
>
>  SEXP pmiss;
>  PROTECT(pmiss = allocVector(REALSXP, nfchr));
>  double *rpmiss;
>  rpmiss = REAL(pmiss);
>  for(i = 0; i < nfchr; i++) rpmiss[i] = 0.0;
>  getPmiss(rsign, nfchr, nsign, rpmiss);
>  cumsum(rpmiss, nfchr);
>
>  for (i = 0; i < nfchr; ++i) {
>   res[i] = rphit[i] - rpmiss[i];
>   }
>
>  UNPROTECT(5);
>  return es;
> }
>
> Could you please help me to find out what I am doing wrong?
>
> Many thanks in advance,
>
> --
> Evarist Planet
> Research officer, Bioinformatics and Biostatistics unit
> IRB Barcelona
> Tel (+34) 93 402 0553
> Fax (+34) 93 402 0257
>
> evarist.pla...@irbbarcelona.org
> http://www.irbbarcelona.org/bioinformatics



-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [Rd] C function is wrong under Windows 7

2011-10-24 Thread Martin Morgan

On 10/24/2011 06:04 AM, Evarist Planet wrote:

> Dear mailing list,
>
> I have a C function that gives me a wrong result when I run it under Windows


Hi Evarist --

It seems like this can be written reasonably efficiently in R?

getEs <- function(fchr, sign) {
    nfchr <- length(fchr)
    nsign <- length(sign)

    nr <- sum(abs(fchr[sign]))

    phit <- numeric(nfchr)
    phit[sign] <- abs(fchr[sign]) / nr
    phit <- cumsum(phit)

    pmiss <- numeric(nfchr)
    pmiss[-sign] <- 1 / (nfchr - nsign)
    pmiss <- cumsum(pmiss)

    phit - pmiss
}

es.c <- .Call('getEs',score,s,PACKAGE='phenoTest')
all.equal(es.c, getEs(score, s))

(for your C code, it would help to have a simple reproducible example 
that doesn't rely on phenoTest).
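
One possible shape for such a self-contained check is sketched below. The
simulated score vector, the seed, and the index vector s are assumptions
made up for illustration; they are not phenoTest data. The function body is
the pure-R version given above, repeated so the snippet runs on its own.

```r
# Pure-R reference implementation (same logic as the function above).
getEs <- function(fchr, sign) {
    nfchr <- length(fchr)
    nsign <- length(sign)
    nr <- sum(abs(fchr[sign]))
    phit <- numeric(nfchr)
    phit[sign] <- abs(fchr[sign]) / nr
    phit <- cumsum(phit)
    pmiss <- numeric(nfchr)
    pmiss[-sign] <- 1 / (nfchr - nsign)
    pmiss <- cumsum(pmiss)
    phit - pmiss
}

# Made-up stand-ins for 'score' and 's' (assumption: any double vector and
# any integer index vector of distinct positions will do).
set.seed(1)
score <- rnorm(100)
s <- sort(sample(seq_along(score), 20))

es.r <- getEs(score, s)

# Both cumulative sums end at 1, so the last element should be ~0.
all.equal(es.r[length(es.r)], 0)

# Storage modes that would be handed to the C routine.
is.double(score)
is.integer(s)
```

The two storage-mode checks are there because REAL() and INTEGER() on the C
side read a vector's data as stored and do not convert it. They do not by
themselves explain the Windows result; they only rule out one possibility.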


Martin



Re: [Rd] droplevels: drops contrasts as well

2011-10-24 Thread Thomas Lumley
On Fri, Oct 21, 2011 at 5:57 AM, Thaler, Thorn, LAUSANNE, Applied
Mathematics wrote:
> Dear all,
>
> Today I figured out that there is a neat function called droplevels,
> which, well, drops unused levels in a data frame. I tried the function
> with some of my data sets and it turned out that not only the unused
> levels were dropped but also the contrasts I set via "C". I had a look
> into the code, and this behaviour arises from the fact that droplevels
> uses simply factor to drop the unused levels, which uses the default
> contrasts as set by options("contrasts").
>
> I think this behaviour is annoying, because if one does not look
> carefully enough, one loses the contrasts silently. Hence may I suggest
> changing the code of droplevels to something like the following:

This silently changes the contrasts -- e.g., if the first level of the
factor is one of the empty levels, the reference level used by
contr.treatment() will change.  Also, if the contrasts are a matrix
rather than specifying a contrast function, the matrix will be invalid
for the new factor.

I think just having a warning would be better -- in general it's not
clear what (if anything) it means to have the same contrasts on
factors with different numbers of levels.
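
A small sketch of both effects, using a made-up factor (the data and the
choice of contr.sum are assumptions for illustration only):

```r
# Factor whose first level "a" is unused, with contrasts set via C().
f <- factor(c("b", "c", "b", "d"), levels = c("a", "b", "c", "d"))
f <- C(f, contr.sum)

g <- droplevels(f)

# The contrasts attribute is present on f but gone after droplevels().
is.null(attr(f, "contrasts"))
is.null(attr(g, "contrasts"))

# The first level, which contr.treatment() would use as the reference,
# differs between the two factors.
levels(f)[1]
levels(g)[1]

# A 4-level contrast matrix cannot be reused as-is for a 3-level factor.
nrow(attr(f, "contrasts"))
nlevels(g)
```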

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [Rd] strsplit convert data

2011-10-24 Thread Milan Bouchet-Valat
On Monday 24 October 2011 at 03:31 -0700, RMSOPS wrote:
> I am using the following code, but I do not know how to debug it and
>  correct the errors it produces when run:
> 
>  library (tcltk)
> 
> 
>  file <-tclvalue (tkgetOpenFile ())
>if (nchar (file))
>{
>  tkmessageBox ("Select the file")
>}
>  else
>{
>  tkmessageBox (message = paste ("Was select file", file))
>  Dataset <- read.table (file, header = FALSE, sep = "", na.strings =
> "NA", dec =".", strip.white = TRUE)
>   Dataset
>  }
> 
>  input <- readLine (Dataset)
>  input
>  close (Dataset)
>  input <- gsub ('*','', input)
>  in.s <- strsplit (input, '')
>  id <- sort (unique (unlist (in.s)))
> 
>  # Create the output matrix
>   output <- matrix (0, ncol = length (id), nrow = length (in.s))
>   colNames (output) <- id
> 
>   for (i in seq_along (in.s)) {
>   output [i, unlist (in.s [[i]])] <- 1
>   }
>   
>  write.csv (output, file = res.csv)

Hm, that code triggers many errors. What's the problem you'd like help
with? You can't reasonably expect people to fix all of your code... ;-)



Re: [Rd] strsplit convert data

2011-10-24 Thread RMSOPS

Hello,

   To be more specific, my problem is here:
 line <- dataset$Items[i]
   print (line)
   in.s <- strsplit (line, '')

 I am reading lines from a file:
 Line 1 A, B, C, D, G
 Line 2 A, C, E,
 ...
 Line n F, G

 The problem is that I cannot split on the comma, so I cannot get the
 output:
      A B C D E F G O
 [1,] 1 1 1 1 1 0 0 0
 [2,] 1 0 1 0 1 0 0 1
 [n,] 0 0 0 0 0 1 1 0

--
View this message in context: 
http://r.789695.n4.nabble.com/strsplit-convert-data-tp3932704p3934808.html
Sent from the R devel mailing list archive at Nabble.com.



Re: [Rd] strsplit convert data

2011-10-24 Thread Hans-Jörg Bibiko
On 25 Oct 2011, at 00:34, RMSOPS wrote:
>   To be more specific, my problem is here
> Line <-dataset $ Items [i]
>   print (line)
>   in.s <- strsplit (line, '')
> 
> I am reading lines from a file
> Line 1 A, B, C, D, G
> Line 2 A, C, E,
> ...
> line n F, G
> 
> the problem is that I can not make the split of the comma, so I can not get
> the
> output
>  A B C D E F G O
> [1,] 1 1 1 1 1 0 0 0
> [2,] 1 0 1 0 1 0 0 1
> [n] 0 0 0 0 0 1 1 0

Hi,

hmm, to be honest I didn't get your point yet but let me try.

If you read lines like :

"A, B, C, D, G" etc.

then you can split them by using:

strsplit(line, ', *')
[[1]]
[1] "A" "B" "C" "D" "G"

[', *' is a regular expression to match a "," followed by 0 or more spaces]

The next issue I see here is that you create the output matrix num times in
your for loop "for(i in 1:num)" and overwrite the file res.csv each time.
Either you know beforehand which columns could appear (and how many rows),
in which case you can create your matrix before that loop, or you should go
with cbind() and rbind(), respectively, or similar.
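
A minimal sketch of that approach, assuming the lines have already been read
into a character vector. The vector `lines` below and the temporary output
path are made up for illustration; they stand in for your file contents and
for res.csv.

```r
# Hypothetical input, modelled on the lines shown in the question.
lines <- c("A, B, C, D, G", "A, C, E,", "F, G")

# Split on a comma followed by zero or more spaces.
in.s <- strsplit(lines, ", *")

# All distinct items, sorted; these become the column names.
id <- sort(unique(unlist(in.s)))

# Allocate the whole matrix once, before the loop.
output <- matrix(0, nrow = length(in.s), ncol = length(id),
                 dimnames = list(NULL, id))

# Mark the items present on each line.
for (i in seq_along(in.s)) {
    output[i, in.s[[i]]] <- 1
}
output

# Write the file once, after the loop (path is an assumption).
write.csv(output, file = file.path(tempdir(), "res.csv"), row.names = FALSE)
```

Indexing the matrix by column name avoids having to look up each item's
position by hand, and a trailing comma on a line does not add an empty item.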


Hope that helps a bit,
--Hans