[Rd] e1071: using svm with sparse matrices (PR#8527)

2006-01-27 Thread julien . gagneur
Full_Name: Julien Gagneur
Version: 2.2.1
OS: Linux (Suse 9.3)
Submission from: (NULL) (194.94.44.4)


Using the SparseM library (SparseM_0.66)
and the e1071 library (e1071_1.5-12)


I fail using svm method with a sparse matrix. Here is a sample example.

I experienced the same problem under Windows.



> library(SparseM)
[1] "SparseM library loaded"
> library("e1071")
Loading required package: class
> data(iris)
> attach(iris)
> M=as.matrix(iris[,1:4])
> Msparse=as.matrix.csr(M)
> Species=iris[,5]
> mod=svm(Msparse,Species)
Error in svm.default(Msparse, Species) : object "nac" not found

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] pgamma - inadequate algorithm design and poor coding (PR#8528)

2006-01-27 Thread ripley
R versions 2.1.0 to present.

Examples shown were computed under Windows R-devel, current SVN, but ix86 
Linux shows similar behaviour (sometimes NaN or -Inf rather than Inf, 
depending on the compiler and optimization level used).


The replacement pgamma algorithm used from R 2.1.0 has an inadequate 
design and no supporting documentation whatsoever.  There is no reference 
given to support the algorithm, and it seems very desirable to use only 
algorithms with a published (and preferably refereed) analysis, or at 
least of impeccable provenance.

The following errors were found by investigating an example in the 
d-p-q-r-tests.R regression tests that gave NaN on a real system.

These errors were not present in R 2.0.0, which used a normal 
approximation in that region.  We could fix this by reverting where needed 
to a normal approximation, but that leaves the problem that we have no 
proof of the validity or accuracy of the rest of the algorithm (if indeed 
it is accurate).

?pgamma says

  As from R 2.1.0 'pgamma()' uses a new algorithm (mainly by Morten
  Welinder) which should be uniformly as accurate as AS 239.

Well, it 'should be' but it is not, and we should not be making statements 
like that.  Those in the email quoted in pgamma.c exhibit hubris.

There are also at least two examples of sloppy coding that lead to numeric 
overflow and complete loss of accuracy.


Consider

> pgamma(seq(0.75, 1.25, by=0.05)*1e100, shape = 1e100, log=T)
  [1] -3.768207e+98   NaN   NaN   NaN   NaN
  [6] -6.931472e-01   NaN   NaN   NaN   NaN
[11]  0.00e+00
Warning message:
NaNs produced in: pgamma(q, shape, scale, lower.tail, log.p)
> pgamma(seq(0.75, 1.25, by=0.05)*1e100, shape = 1e100, log=T, lower=F)
  [1]  0.00e+00   NaN   NaN   NaN   NaN
  [6] -6.931472e-01   Inf   Inf   Inf   Inf
[11] -2.685645e+98
Warning message:
NaNs produced in: pgamma(q, shape, scale, lower.tail, log.p)

> pgamma(c(1-1e-10, 1+1e-10)*1e100, shape = 1e100)
[1] NaN NaN

(shape=1e25 is enough to cause a breakdown in the first of these, and 
1e60 in the rest.)

The code has four branches

1) x <= 1

2) x <= alph - 1 && x < 0.8 * (alph + 50)).  This has the comment
/* incl. large alph */, but that is false.

3) if (alph - 1 < x && alph < 0.8 * (x + 50)).  This has the comment
/* incl. large x */, but again false.

4) The rest, which uses an asymptotic expansion in

pt_ = -x * (log(1 + (lambda-x)/x) - (lambda-x)/x)
= -x * log((lambda-x)/x) - (lambda-x)

and naively assumes that this is small enough to use a power series 
expansion in 1/x with coefficients as powers of pt_.  To make matters 
worse, consider

> pgamma(0.9*1e25, 1e25, log=T)
pgamma_raw(x=9e+024, alph=1e+025, low=1, log=1)
  using ppois_asymp()
pt_ = 5.3605156578263e+022
pp*_asymp(): f=-2.0803930924076e-013 np=-5.3605156578263e+022 
nd=-5.3605156578263e+022  f*nd=11151979746.284
[1] -Inf

Hmm, how did that manage to lose *all* the accuracy here?  Hubris again 
appears in the comments.

Here np and nd are on log scale and if they are large they will be almost 
equal (and negative), and f is not large (and if it were we could have 
computed log f).  So we can compute the log of np+f*nd accurately as

log(np*(1+f*nd/np)) = lnp + log(1+f*nd/np) = lnp + log1p(f*exp(lnd-lnp))


Almost all the mass of gamma(shape=1e100) is concentrated at the nearest 
representable value to 1e100:

> qgamma(c(1e-16, 1-1e-16), 1e100)-1e100
[1] 0 0

(if it can be believed, but this can be verified independently).  So being 
accurate in the middle of the range is pretty academic, but one can at 
least avoid returning the nonsense of non-monotone cdfs and NaN/infinite 
probabilities.

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] rbind/cbind unimplemented for raw (RAWSXP) types. (PR#8529)

2006-01-27 Thread hin-tak . leung
Full_Name: Hin-Tak Leung
Version: R 2.2.1
OS: x86_64-redhat-linux-gnu
Submission from: (NULL) (131.111.186.92)


rbind/cbind is unimplemented for raw (RAWSXP) types.

I have a working patch implementing the functionality,
to follow.

--please do not edit the information below--

Version:
 platform = x86_64-redhat-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status =
 major = 2
 minor = 2.1
 year = 2005
 month = 12
 day = 20
 svn rev = 36812
 language = R

Locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:methods, package:stats, package:graphics,
package:grDevices, package:utils, package:datasets, Autoloads, package:base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] sub* assgnment unimplemented for raw (RAWSXP) types. (PR#8530)

2006-01-27 Thread hin-tak . leung
Full_Name: Hin-Tak Leung
Version: R 2.2.1
OS: x86_64-redhat-linux-gnu
Submission from: (NULL) (131.111.186.92)


Matrix subset assignment and [[<- assignment
are unimplemented for raw (RAWSXP) types.

I have a working patch implementing the functionality,
to follow.

--please do not edit the information below--

Version:
 platform = x86_64-redhat-linux-gnu
 arch = x86_64
 os = linux-gnu
 system = x86_64, linux-gnu
 status =
 major = 2
 minor = 2.1
 year = 2005
 month = 12
 day = 20
 svn rev = 36812
 language = R

Locale:
LC_CTYPE=en_US.UTF-8;LC_NUMERIC=C;LC_TIME=en_US.UTF-8;LC_COLLATE=en_US.UTF-8;LC_MONETARY=en_US.UTF-8;LC_MESSAGES=en_US.UTF-8;LC_PAPER=C;LC_NAME=C;LC_ADDRESS=C;LC_TELEPHONE=C;LC_MEASUREMENT=C;LC_IDENTIFICATION=C

Search Path:
 .GlobalEnv, package:methods, package:stats, package:graphics,
package:grDevices, package:utils, package:datasets, Autoloads, package:base

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] rbind/cbind unimplemented for raw (RAWSXP) types. (PR#8529)

2006-01-27 Thread Hin-Tak Leung

[EMAIL PROTECTED] wrote:

Full_Name: Hin-Tak Leung
Version: R 2.2.1
OS: x86_64-redhat-linux-gnu
Submission from: (NULL) (131.111.186.92)


rbind/cbind is unimplemented for raw (RAWSXP) types.

I have a working patch implementing the functionality,
to follow.


Attached in ready-to-patch form and also insert (white spaces
will go wrong) here. Please review, comment and possibly commit,
and hope to see it in R 2.3.x


--- src/main/bind.c.orig2005-10-06 11:25:22.0 +0100
+++ src/main/bind.c 2006-01-27 11:55:32.0 +
@@ -997,6 +997,7 @@
 case CPLXSXP:
 case STRSXP:
 case VECSXP:
+case RAWSXP:
break;
/* we don't handle expressions: we could, but coercion of a matrix
   to an expression is not ideal */
@@ -1164,6 +1165,18 @@
}
}
 }
+else if (mode == RAWSXP) {
+   for (t = args; t != R_NilValue; t = CDR(t)) {
+   u = PRVALUE(CAR(t));
+   if (isMatrix(u) || length(u) >= lenmin) {
+   u = coerceVector(u, RAWSXP);
+   k = LENGTH(u);
+   idx = (!isMatrix(u)) ? rows : k;
+   for (i = 0; i < idx; i++)
+   RAW(result)[n++] = RAW(u)[i % k];
+   }
+   }
+}
 else {
for (t = args; t != R_NilValue; t = CDR(t)) {
u = PRVALUE(CAR(t));
@@ -1385,6 +1398,21 @@
}
}
 }
+else if (mode == RAWSXP) {
+   for (t = args; t != R_NilValue; t = CDR(t)) {
+   u = PRVALUE(CAR(t));
+   if (isMatrix(u) || length(u) >= lenmin) {
+   u = coerceVector(u, RAWSXP);
+   k = LENGTH(u);
+   idx = (isMatrix(u)) ? nrows(u) : (k > 0);
+   for (i = 0; i < idx; i++)
+   for (j = 0; j < cols; j++)
+   RAW(result)[i + n + (j * rows)]
+   = RAW(u)[(i + j * idx) % k];
+   n += idx;
+   }
+   }
+}
 else {
for (t = args; t != R_NilValue; t = CDR(t)) {
u = PRVALUE(CAR(t));

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] sub* assgnment unimplemented for raw (RAWSXP) types. (PR#8530)

2006-01-27 Thread Hin-Tak Leung

[EMAIL PROTECTED] wrote:

Full_Name: Hin-Tak Leung
Version: R 2.2.1
OS: x86_64-redhat-linux-gnu
Submission from: (NULL) (131.111.186.92)


Matrix subset assignment and [[<- assignment
are unimplemented for raw (RAWSXP) types.

I have a working patch implementing the functionality,
to follow.


Same thing, sorry for two bug reports and two patches related
to RAWSXP, but they are independent issues.

Attached in ready-to-patch form and also insert (white spaces
will go wrong) here. Please review, comment and possibly commit,
and hope to see it in R 2.3.x.

==
--- src/main/subassign.c.orig   2005-12-05 23:00:17.0 +
+++ src/main/subassign.c2006-01-27 12:50:47.0 +
@@ -868,6 +868,23 @@
}
}
break;
+case 2424: /* raw   <- raw   */
+
+   for (j = 0; j < ncs; j++) {
+   jj = INTEGER(sc)[j];
+   if (jj == NA_INTEGER) continue;
+   jj = jj - 1;
+   for (i = 0; i < nrs; i++) {
+   ii = INTEGER(sr)[i];
+   if (ii == NA_INTEGER) continue;
+   ii = ii - 1;
+   ij = ii + jj * nr;
+   RAW(x)[ij] = RAW(y)[k];
+   k = (k + 1) % ny;
+   }
+   }
+   break;
+
 default:
error(_("incompatible types (case %d) in matrix subset 
assignment"),

  which);
@@ -1611,6 +1628,11 @@
SET_VECTOR_ELT(x, offset, y);
break;

+   case 2424:  /* raw <- raw */
+
+   RAW(x)[offset] = RAW(y)[0];
+   break;
+
default:
error(_("incompatible types (%d) in [[ assignment"), which);
}
===
__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] e1071: using svm with sparse matrices (PR#8527)

2006-01-27 Thread Kasper Daniel Hansen
First: this is not a bug, more a feature request.

Secondly, even if it was a bug, it is _not_ a bug in R, please read  
the posting rules for bugs. Now a member of R-core has to use  
valuable time to clean up after your bug report.

Correspondence such as this such really be sent to the package  
maintainers (and - perhaps - a cc to R-devel).

Having said all of this, of course it would be nice if svn supported  
sparse matrices.

/Kasper


On Jan 27, 2006, at 2:02 AM, [EMAIL PROTECTED] wrote:

> Full_Name: Julien Gagneur
> Version: 2.2.1
> OS: Linux (Suse 9.3)
> Submission from: (NULL) (194.94.44.4)
>
>
> Using the SparseM library (SparseM_0.66)
> and the e1071 library (e1071_1.5-12)
>
>
> I fail using svm method with a sparse matrix. Here is a sample  
> example.
>
> I experienced the same problem under Windows.
>
>
>
>> library(SparseM)
> [1] "SparseM library loaded"
>> library("e1071")
> Loading required package: class
>> data(iris)
>> attach(iris)
>> M=as.matrix(iris[,1:4])
>> Msparse=as.matrix.csr(M)
>> Species=iris[,5]
>> mod=svm(Msparse,Species)
> Error in svm.default(Msparse, Species) : object "nac" not found
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] e1071: using svm with sparse matrices (PR#8527)

2006-01-27 Thread Liaw, Andy
From: Kasper Daniel Hansen
> 
> First: this is not a bug, more a feature request.
> 
> Secondly, even if it was a bug, it is _not_ a bug in R, please read  
> the posting rules for bugs. Now a member of R-core has to use  
> valuable time to clean up after your bug report.
> 
> Correspondence such as this such really be sent to the package  
> maintainers (and - perhaps - a cc to R-devel).
> 
> Having said all of this, of course it would be nice if svn supported  
> sparse matrices.

Libsvm, the `engine' underneath svm() in `e1071', uses a sparse
representation of the data.  I vaguely recall seeing Chih-Jen Lin's code
that uses the SparseM package to pass sparse data to svm()...  David would
know best, of course.

Andy

 
> /Kasper
> 
> 
> On Jan 27, 2006, at 2:02 AM, [EMAIL PROTECTED] wrote:
> 
> > Full_Name: Julien Gagneur
> > Version: 2.2.1
> > OS: Linux (Suse 9.3)
> > Submission from: (NULL) (194.94.44.4)
> >
> >
> > Using the SparseM library (SparseM_0.66)
> > and the e1071 library (e1071_1.5-12)
> >
> >
> > I fail using svm method with a sparse matrix. Here is a sample  
> > example.
> >
> > I experienced the same problem under Windows.
> >
> >
> >
> >> library(SparseM)
> > [1] "SparseM library loaded"
> >> library("e1071")
> > Loading required package: class
> >> data(iris)
> >> attach(iris)
> >> M=as.matrix(iris[,1:4])
> >> Msparse=as.matrix.csr(M)
> >> Species=iris[,5]
> >> mod=svm(Msparse,Species)
> > Error in svm.default(Msparse, Species) : object "nac" not found
> >
> > __
> > R-devel@r-project.org mailing list
> > https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Tiny typo in ?sprintf

2006-01-27 Thread Stephen Weigand
Greetings,

Reading ?sprintf it seems like the word "be" was omitted:

--- ./src/library/base/man/sprintf.Rd   Tue Sep 20 19:48:55 2005
+++ /tmp/sprintf.Rd Fri Jan 27 13:51:10 2006
@@ -78,7 +78,7 @@
   }
   Further, as from \R 2.1.0, immediately after \code{\%} may come
   \code{1$} to \code{99$} to refer to the numbered argument: this allows
-  arguments to referenced out of order and is mainly intended for
+  arguments to be referenced out of order and is mainly intended for
   translators of error messages.  If this is done it is best if all
   formats are numbered: if not the unnumbered ones process the arguments
   in order.  See the examples.

Thank you,

Stephen

::
Stephen Weigand
Division of Biostatistics
Mayo Clinic Rochester, Minn., USA
Phone (507) 266-1650, fax 284-9542

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] PR#1654

2006-01-27 Thread cain
Full_Name: Kevin Cain
Version: 2.2.1
OS: Windows XP
Submission from: (NULL) (128.95.124.144)


In the following command, R ignores the xaxp command - it places tick marks at
1, 1.5, 2.0, etc regardless of what the third argument is (I want them only at
integer values).

plot(x=c(min(time),max(time)),y=c(min(y-se),max(y+se)),type='n',xaxp=c(1,4,3),
  xlab='Measurement Time Point',
  ylab='Accelerometer Mean VMU/min')

A prior bug report says this problem has been fixed (Graphics-fixed/1654), but
it looks like that referred to the unix version.

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Your false remark on PR#1654

2006-01-27 Thread ripley
The comment says

`Works in 2.3.0: BDR fixed inline [xy]axp'

and 2.2.1 is not 2.3.0.

Your example is not reproducible, so we cannot check.  If you get a copy 
of R-devel (as the FAQ and posting guide ask you to before posting) you 
can check for yourself.

Please do study the FAQ and avoid time-wasting.


On Sat, 28 Jan 2006 [EMAIL PROTECTED] wrote:

> Full_Name: Kevin Cain
> Version: 2.2.1
> OS: Windows XP
> Submission from: (NULL) (128.95.124.144)
>
>
> In the following command, R ignores the xaxp command - it places tick marks at
> 1, 1.5, 2.0, etc regardless of what the third argument is (I want them only at
> integer values).
>
> plot(x=c(min(time),max(time)),y=c(min(y-se),max(y+se)),type='n',xaxp=c(1,4,3),
>  xlab='Measurement Time Point',
>  ylab='Accelerometer Mean VMU/min')
>
> A prior bug report says this problem has been fixed (Graphics-fixed/1654), but
> it looks like that referred to the unix version.
>
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
>
>

-- 
Brian D. Ripley,  [EMAIL PROTECTED]
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel