Re: [Rd] Serial connections?

2010-04-21 Thread Philippe Grosjean
There is another option I have used for a couple of years to pilot 
scientific devices, to program my chemostats, etc., and it is 
platform-independent: Tcl.


Tcl's I/O design is excellent and very robust, and Tcl/Tk is integrated 
into R via the tcltk package.


It is really just a matter of a few lines of code to communicate 
through a serial port from R using Tcl. Something like:


require(tcltk)
.Tcl('set R_com1 [open "com1" r+]') # Works on Windows too!

# There are many config parameters available here... just an example
.Tcl('fconfigure $R_com1 -mode "9600,n,8,1" -buffering none -blocking 0')

# Send a command to your device through the serial port
.Tcl('puts -nonewline $R_com1 {my_cmd}')

# Read a line of text from the serial port
line <- tclvalue(.Tcl('gets $R_com1'))
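
One presumably also closes the channel when done; Tcl's standard close 
command works here:

# Close the serial port again when finished
.Tcl('close $R_com1')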

With a little bit more code, one can program Tcl to call R code 
automatically every time new data is pushed by the connected device 
through the serial port. This is done using something like:


.Tcl('fileevent $R_com1 readable [list Handler_com1 $R_com1]')

Here, "Handler_com1" is a Tcl function. So, some care must be taken 
using tcltk's .Tcl.callback() to trigger the event on the R side. One 
way to deal with this easily is by using tclFun() from the tcltk2 package.
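
For illustration, a minimal sketch (untested; the handler name is purely 
illustrative) of how tclFun() could glue the two sides together:

require(tcltk2)
# R function called whenever the port becomes readable
comHandler <- function() {
    line <- tclvalue(.Tcl('gets $R_com1'))
    if (nchar(line) > 0) cat("Received:", line, "\n")
}
tclFun(comHandler)  # expose it to Tcl under the same name
.Tcl('fileevent $R_com1 readable comHandler')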


In tcltk2, there is also ?tclTaskSchedule, which can be of interest in 
the context of serial port communication: it triggers an R function in 
the background at regular intervals, so one can actively poll the 
serial port for data.
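
A rough sketch of that polling approach (names illustrative; see 
?tclTaskSchedule for the exact interface):

require(tcltk2)
pollCom1 <- function() {
    line <- tclvalue(.Tcl('gets $R_com1'))
    if (nchar(line) > 0) print(line)
}
# Run pollCom1() every 500 ms in the background
tclTaskSchedule(500, pollCom1(), id = "pollCom1", redo = TRUE)
# ... and stop it later with: tclTaskDelete("pollCom1")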


All these tools give me a lot of flexibility to communicate through the 
serial port from R and, most importantly, let me write my code in a 
portable way (tested on Windows XP and Ubuntu Linux).


If there is some interest in this approach, I could initiate a 'tclcom' 
R package on R-Forge and place there the code I have.


Best,

Philippe
..<°}))><
 ) ) ) ) )
( ( ( ( (    Prof. Philippe Grosjean
 ) ) ) ) )
( ( ( ( (    Numerical Ecology of Aquatic Systems
 ) ) ) ) )   Mons University, Belgium
( ( ( ( (
..

On 21/04/10 03:17, shotwelm wrote:

Simon is right of course, there are plenty of sensors that would work
just fine at 9600 baud (like a thermistor rigged to an ADC). There's a
theorem along these lines (Nyquist sampling theorem?). I think piping
the output to R is a clever solution. I added a few lines to the ttys.c
program so that the baud rate is a command line option (e.g. -B9600)
and confirmed it will compile on Linux (2.6.30). Maybe it will save a
step. Microcontrollers really are addictive!
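
For the record, a sketch of what the piping idea could look like on the 
R side (the modified ttys binary, its -B flag, and the device path are 
the assumptions here):

con <- pipe("ttys -B9600 /dev/ttyS0", open = "r")
readings <- as.numeric(readLines(con, n = 100))  # e.g. first 100 samples
close(con)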

For an ioctl package, I was originally thinking of using file
descriptors directly. However, I agree this feels like subverting what
could be an extension of the connections API. Given that "everything is
a file" in POSIX systems, there may be an argument for an ioctl package
that is independent of the connections implementation, say to do things
that connections were not designed to do. For example, interfacing with
V4L2 devices usually involves many ioctl calls, an mmap call, but rarely
read or write calls. But maybe it would just be better to pipe this type
of output to R also...

-Matt

On Tue, 2010-04-20 at 16:42 -0400, Simon Urbanek wrote:

On Apr 20, 2010, at 11:51 AM, shotwelm wrote:


I've done some microcontroller work over serial also. Unfortunately, 
interfacing with a serial port is system dependent, and the mechanisms can be 
quite different, as you probably know. It appears that Simon has a solution 
below that will work if you are willing to accept the default baud rate (9600 
is way too slow for good sensor data


[OT: define "good" ;) - good doesn't mean fast - besides it won't be any good 
if it is too fast to be meaningfully processed -- that's a different story, though :P - 
and it is trivial to change so the solution works in general]



), parity, etc., or use external tools. On POSIX systems, you would need access 
to the termios.h header and the system ioctl function in order to change these 
settings. Although I'm not 100% sure, I don't think R has this capability ... 
yet.
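
To make that concrete, a minimal sketch of the POSIX side (error handling 
trimmed; the device path and function name are illustrative):

#include <fcntl.h>
#include <termios.h>

/* Open a serial device and configure 9600 baud, 8 data bits,
   no parity, 1 stop bit; returns the file descriptor or -1. */
int open_serial_9600(const char *path)   /* e.g. "/dev/ttyS0" */
{
    int fd = open(path, O_RDWR | O_NOCTTY);
    struct termios tio;
    if (fd < 0) return -1;
    tcgetattr(fd, &tio);
    cfsetispeed(&tio, B9600);
    cfsetospeed(&tio, B9600);
    tio.c_cflag &= ~(PARENB | CSTOPB | CSIZE);  /* clear parity/stop/size */
    tio.c_cflag |= CS8 | CREAD | CLOCAL;        /* 8N1, enable receiver */
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}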

I'm new to the list, but I'd be surprised if the R developers that have been 
around awhile haven't already considered adding support for ioctls and the 
POSIX terminal interface. This makes me wonder why it's not there. If there is 
no good reason, I'm starting to see a series of R packages (or core extensions) 
developing.


Good luck ;). The issue is that connections are inherently backend-independent 
which implies that packages have no access to connection internals as they can 
change at any time. This means that you can't enhance them without putting the 
enhancements into R itself. This implies that you have to make a strong case 
since you need a volunteer in R-core to maintain that code etc.



With a package for ioctls, we could use all sorts of cool stuff, like 
Video4Linux2 (webcams, HAM radio, tuners)..

[Rd] New CRAN test platform - Sparc Solaris

2010-04-21 Thread Prof Brian Ripley
The CRAN package check page now shows results on Sparc Solaris 10 
using a server donated by Sun.


This box differs in several respects from the existing Intel-based 
test boxes:


- the CPU is big-endian.

- it does not have extended-precision doubles in FPU registers.  (It 
does support long doubles if selected explicitly.)


- the C/C++ header set is from a different tradition.

- the Sun Studio compiler is used (as has been on one of the Linux 
boxes, and we may phase the latter out in due course).


- the toolset is AT&T-based Unix rather than GNU or BSD.  In 
particular Solaris make, sed and tar and an actual Bourne shell are 
used.


There is only a limited set of additional software installed.  A few 
things have had to be compiled with gcc, notably the packages using 
Gtk+ (and Solaris' own installation of Gtk+ is too old for RGtk2) and 
MCMCpack.


The server does not have a graphics card and this is causing some 
problems with the X11 installation, including bitmap devices and rgl, 
that we are still working on.


Amongst the package issues shown up are

- unterminated final lines in src/Makevars[.in] cause failures.

- there are many packages which require GNU make and fail to declare 
it (see the note after this list).  These include Cairo, JavaGD, Rcpp, 
RcppExamples, RInside, farmR, highlight, phyclust, png and rJava.


- the assumption that 'tar' accepts compressed archives (gdata).

- 60-odd C++-using packages are not correct C++, often because they 
use GNU extensions or have missing headers (and C++ requires headers 
for prototypes), or use C99 functions that are not in C++ and hence 
not in the headers when used from C++ -- Solaris is rather strict 
about the latter.
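
As noted above, the GNU make dependency is declared with a single line 
in the package's DESCRIPTION file:

SystemRequirements: GNU make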



Brian Ripley

--
Brian D. Ripley,  rip...@stats.ox.ac.uk
Professor of Applied Statistics,  http://www.stats.ox.ac.uk/~ripley/
University of Oxford, Tel:  +44 1865 272861 (self)
1 South Parks Road, +44 1865 272866 (PA)
Oxford OX1 3TG, UKFax:  +44 1865 272595



Re: [Rd] Serial connections?

2010-04-21 Thread Simon Urbanek
Philippe,

unfortunately that approach has one major drawback - it does not give you a 
connection. As I said in the previous e-mail, it is fairly easy to talk to a tty 
directly, but the challenge is to turn it into a connection. I don't like the 
Tcl approach for several reasons, one of them being that it's entirely 
unnatural since you have to construct strings of code -- you could just as well 
use rJava and Java serial connections, which have slightly friendlier syntax 
(but I wouldn't recommend that, either), or any other language R interfaces to.

I was thinking about it a bit and I may be tempted to take a stab at a tty 
connection, but I still would not want to mess with ioctl (the other part Matt 
mentioned).

Cheers,
Simon




Re: [Rd] Serial connections?

2010-04-21 Thread Philippe Grosjean

Simon,

I see what you mean, and I agree it would be nice to have a connection. 
That would be the natural way in S.


Yet, connections are driven by R. You cannot open a connection and let 
the connected device trigger some R code when data is available, can you?


Otherwise, I don't understand your problem with "strings of code". A .R 
file also contains a series of "strings of code" interpreted by the R 
parser when you source() it; it is just that that code happens to be S. 
Here the code is Tcl. You could just as well source a .tcl file with 
the same code if you prefer...


Best,

Philippe


[Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Matthew Dowle
From copyVector in duplicate.c :

void copyVector(SEXP s, SEXP t)
{
int i, ns, nt;
nt = LENGTH(t);
ns = LENGTH(s);
switch (TYPEOF(s)) {
...
case INTSXP:
for (i = 0; i < ns; i++)
INTEGER(s)[i] = INTEGER(t)[i % nt];
break;
...

could that be replaced with :

case INTSXP:
for (i=0; i<ns/nt; i++)
memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
nt*sizeof(int));
break;

and similar for the other types in copyVector.  This won't help regular 
vector copies, since those seem to be done by the DUPLICATE_ATOMIC_VECTOR 
macro, see next suggestion below, but it should help copyMatrix which calls 
copyVector, scan.c which calls copyVector on three lines, dcf.c (once) and 
dounzip.c (once).

For the DUPLICATE_ATOMIC_VECTOR macro there is already a comment next to it :

: surely memcpy would be faster here?

which seems to refer to the for loop  :

else { \
int __i__; \
type *__fp__ = fun(from), *__tp__ = fun(to); \
for (__i__ = 0; __i__ < __n__; __i__++) \
  __tp__[__i__] = __fp__[__i__]; \
  } \

Could that loop be replaced by the following ?

   else { \
   memcpy((char *)DATAPTR(to), (char *)DATAPTR(from), __n__*sizeof(type)); \
   }\

In the data.table package, dogroups.c uses this technique, so the principle 
is tested and works well so far.

Are there any road blocks preventing this change, or is anyone already 
working on it ?  If not then I'll try and test it (on Ubuntu 32bit) and 
submit patch with timings, as before.  Comments/pointers much appreciated.

Matthew



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek
Matt,

On Apr 21, 2010, at 11:54 AM, Matthew Dowle wrote:

>> From copyVector in duplicate.c :
> 
> void copyVector(SEXP s, SEXP t)
> {
>int i, ns, nt;
>nt = LENGTH(t);
>ns = LENGTH(s);
>switch (TYPEOF(s)) {
> ...
>case INTSXP:
>for (i = 0; i < ns; i++)
>INTEGER(s)[i] = INTEGER(t)[i % nt];
>break;
> ...
> 
> could that be replaced with :
> 
>case INTSXP:
>for (i=0; i<ns/nt; i++)
>memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t), 
> nt*sizeof(int));
>break;
> 

Won't that miss the last incomplete chunk? (and please don't use DATAPTR on 
INTSXP even though the effect is currently the same)

In general it seems that whether this is efficient depends on nt, since 
calls to memcpy with short lengths are expensive (very small nt, that is).

I ran some empirical tests to compare memcpy vs for() (x86_64, OS X) and the 
results were encouraging - depending on the size of the copied block the 
difference could be quite big:
tiny block (ca. n = 32 or less) - for() is faster
small block (n ~ 1k) - memcpy is ca. 8x faster
as the size increases the gap closes (presumably due to RAM bandwidth 
limitations) so for n = 512M it is ~30%.

Of course this is contingent on the implementation of memcpy, compiler, 
architecture etc. And will only matter if copying is what you do most of the 
time ...
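
For anyone wanting to reproduce this kind of measurement, a minimal 
standalone sketch (block size and repetition count are arbitrary, and the 
checksum merely keeps the compiler from optimizing the copies away):

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(void)
{
    size_t n = 1024;       /* elements per copied block */
    int reps = 100000;     /* repetitions to make the timing visible */
    int *from = malloc(n * sizeof(int));
    int *to   = malloc(n * sizeof(int));
    size_t i;
    int r;
    long check = 0;
    clock_t t0;

    for (i = 0; i < n; i++) from[i] = (int) i;

    t0 = clock();
    for (r = 0; r < reps; r++) {
        for (i = 0; i < n; i++) to[i] = from[i];
        check += to[r % n];
    }
    printf("for()  : %.3fs\n", (double)(clock() - t0) / CLOCKS_PER_SEC);

    t0 = clock();
    for (r = 0; r < reps; r++) {
        memcpy(to, from, n * sizeof(int));
        check += to[r % n];
    }
    printf("memcpy : %.3fs (check %ld)\n",
           (double)(clock() - t0) / CLOCKS_PER_SEC, check);

    free(from);
    free(to);
    return 0;
}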

Cheers,
Simon






[Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
R version 2.10.1 (2009-12-14)
Copyright (C) 2009 The R Foundation for Statistical Computing
ISBN 3-900051-07-0

R is free software and comes with ABSOLUTELY NO WARRANTY.
You are welcome to redistribute it under certain conditions.
Type 'license()' or 'licence()' for distribution details.

  Natural language support but running in an English locale

R is a collaborative project with many contributors.
Type 'contributors()' for more information and
'citation()' on how to cite R or R packages in publications.

Below are two cases where I don't seem to be getting contr.sum
contrasts even though they were specified.  Are these bugs?

The first case is an interaction between continuous and factor variables.

In the second case, contrasts= was specified as an argument to lm.  It
works OK if we set the contrasts through options() but not if we set
them through the lm argument.

>
> # 1. In this case I don't seem to be getting contr.sum contrasts:
>
> options(contrasts = c("contr.sum", "contr.poly"))
> getOption("contrasts")
[1] "contr.sum"  "contr.poly"
> scores <- rep(seq(-2, 2), 3); scores
 [1] -2 -1  0  1  2 -2 -1  0  1  2 -2 -1  0  1  2
> fac <- gl(3, 5); fac
 [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
Levels: 1 2 3
>
> # I get this:
> model.matrix(~ scores:fac)
   (Intercept) scores:fac1 scores:fac2 scores:fac3
1            1          -2           0           0
2            1          -1           0           0
3            1           0           0           0
4            1           1           0           0
5            1           2           0           0
6            1           0          -2           0
7            1           0          -1           0
8            1           0           0           0
9            1           0           1           0
10           1           0           2           0
11           1           0           0          -2
12           1           0           0          -1
13           1           0           0           0
14           1           0           0           1
15           1           0           0           2
attr(,"assign")
[1] 0 1 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"

>
> # But I was expecting this since I am using contr.sum
> cbind(1, model.matrix(~ fac)[,2:3] * scores)
     fac1 fac2
1  1   -2    0
2  1   -1    0
3  1    0    0
4  1    1    0
5  1    2    0
6  1    0   -2
7  1    0   -1
8  1    0    0
9  1    0    1
10 1    0    2
11 1    2    2
12 1    1    1
13 1    0    0
14 1   -1   -1
15 1   -2   -2
>
>
> # 2.
> # here I don't get contr.sum but rather get contr.treatment
> options(contrasts = c("contr.treatment", "contr.poly"))
> getOption("contrasts")
[1] "contr.treatment" "contr.poly"
> model.matrix(lm(seq(15) ~ fac, contrasts = c("contr.sum", "contr.poly")))
   (Intercept) fac2 fac3
1            1    0    0
2            1    0    0
3            1    0    0
4            1    0    0
5            1    0    0
6            1    1    0
7            1    1    0
8            1    1    0
9            1    1    0
10           1    1    0
11           1    0    1
12           1    0    1
13           1    0    1
14           1    0    1
15           1    0    1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.treatment"

> model.matrix(lm(seq(15) ~ fac)) # same
   (Intercept) fac2 fac3
1            1    0    0
2            1    0    0
3            1    0    0
4            1    0    0
5            1    0    0
6            1    1    0
7            1    1    0
8            1    1    0
9            1    1    0
10           1    1    0
11           1    0    1
12           1    0    1
13           1    0    1
14           1    0    1
15           1    0    1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.treatment"

>
> # I was expecting the first one to give me this
> options(contrasts = c("contr.sum", "contr.poly"))
> model.matrix(lm(seq(15) ~ fac))
   (Intercept) fac1 fac2
1            1    1    0
2            1    1    0
3            1    1    0
4            1    1    0
5            1    1    0
6            1    0    1
7            1    0    1
8            1    0    1
9            1    0    1
10           1    0    1
11           1   -1   -1
12           1   -1   -1
13           1   -1   -1
14           1   -1   -1
15           1   -1   -1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"

>
> R.version.string
[1] "R version 2.10.1 (2009-12-14)"
> win.version()
[1] "Windows Vista (build 6002) Service Pack 2"



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Seth Falcon

On 4/21/10 10:45 AM, Simon Urbanek wrote:

Won't that miss the last incomplete chunk? (and please don't use
DATAPTR on INTSXP even though the effect is currently the same)

In general it seems that whether this is efficient depends on nt,
since calls to memcpy with short lengths are expensive (very small
nt, that is).

I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
and the results were encouraging - depending on the size of the
copied block the difference could be quite big: tiny block (ca. n =
32 or less) - for() is faster; small block (n ~ 1k) - memcpy is ca.
8x faster; as the size increases the gap closes (presumably due to
RAM bandwidth limitations) so for n = 512M it is ~30%.




Of course this is contingent on the implementation of memcpy,
compiler, architecture etc. And will only matter if copying is what
you do most of the time ...


Copying of vectors is something that I would expect to happen fairly 
often in many applications of R.


Is for() faster on small blocks by enough that one would want to branch 
based on size?


+ seth

--
Seth Falcon | @sfalcon | http://userprimary.net/



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Romain Francois

Le 21/04/10 17:54, Matthew Dowle a écrit :



From copyVector in duplicate.c :


void copyVector(SEXP s, SEXP t)
{
 int i, ns, nt;
 nt = LENGTH(t);
 ns = LENGTH(s);
 switch (TYPEOF(s)) {
...
 case INTSXP:
 for (i = 0; i<  ns; i++)
 INTEGER(s)[i] = INTEGER(t)[i % nt];
 break;
...

could that be replaced with :

 case INTSXP:
 for (i=0; i<ns/nt; i++)
 memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
 nt*sizeof(int));
 break;

or at least with something like this:

int* p_s = INTEGER(s) ;
int* p_t = INTEGER(t) ;
for( i=0 ; i < ns ; i++){
p_s[i] = p_t[i % nt];
}

since expanding the INTEGER macro over and over has a price.







--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 2:15 PM, Seth Falcon wrote:

> On 4/21/10 10:45 AM, Simon Urbanek wrote:
>> Won't that miss the last incomplete chunk? (and please don't use
>> DATAPTR on INTSXP even though the effect is currently the same)
>> 
>> In general it seems that whether this is efficient depends on nt,
>> since calls to memcpy with short lengths are expensive (very small
>> nt, that is).
>> 
>> I ran some empirical tests to compare memcpy vs for() (x86_64, OS X)
>> and the results were encouraging - depending on the size of the
>> copied block the difference could be quite big: tiny block (ca. n =
>> 32 or less) - for() is faster; small block (n ~ 1k) - memcpy is ca. 8x
>> faster; as the size increases the gap closes (presumably due to RAM
>> bandwidth limitations) so for n = 512M it is ~30%.
>> 
> 
>> Of course this is contingent on the implementation of memcpy,
>> compiler, architecture etc. And will only matter if copying is what
>> you do most of the time ...
> 
> Copying of vectors is something that I would expect to happen fairly often in 
> many applications of R.
> 
> Is for() faster on small blocks by enough that one would want to branch based 
> on size?
> 

Good question. Given that the branching itself adds overhead, possibly not. In 
the best case for() can be ~40% faster (for single-digit n) but that means 
billions of copies to make a difference (since the operation itself is so 
fast). The break-even point on my test machine is n=32 and when I added the 
branching it took 20% hit so I guess it's simply not worth it. The only case 
that may be worth branching is n:1 since that is likely a fairly common use 
(the branching penalty in copy routines is lower than comparing memcpy/for 
implementations since the branching can be done before the outer for loop so 
this may vary case-by-case).

Cheers,
Simon



Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:

> Le 21/04/10 17:54, Matthew Dowle a écrit :
>> 
>>> From copyVector in duplicate.c :
>> 
>> void copyVector(SEXP s, SEXP t)
>> {
>> int i, ns, nt;
>> nt = LENGTH(t);
>> ns = LENGTH(s);
>> switch (TYPEOF(s)) {
>> ...
>> case INTSXP:
>> for (i = 0; i<  ns; i++)
>> INTEGER(s)[i] = INTEGER(t)[i % nt];
>> break;
>> ...
>> 
>> could that be replaced with :
>> 
>> case INTSXP:
>> for (i=0; i<ns/nt; i++)
>> memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
>> nt*sizeof(int));
>> break;
> 
> or at least with something like this:
> 
> int* p_s = INTEGER(s) ;
> int* p_t = INTEGER(t) ;
> for( i=0 ; i < ns ; i++){
>   p_s[i] = p_t[i % nt];
> }
> 
> since expanding the INTEGER macro over and over has a price.
> 

... in packages, yes, but not in core R.

Cheers,
Simon





Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Romain Francois

Le 21/04/10 21:39, Simon Urbanek a écrit :



On Apr 21, 2010, at 3:32 PM, Romain Francois wrote:


Le 21/04/10 17:54, Matthew Dowle a écrit :



 From copyVector in duplicate.c :


void copyVector(SEXP s, SEXP t)
{
 int i, ns, nt;
 nt = LENGTH(t);
 ns = LENGTH(s);
 switch (TYPEOF(s)) {
...
 case INTSXP:
 for (i = 0; i<   ns; i++)
 INTEGER(s)[i] = INTEGER(t)[i % nt];
 break;
...

could that be replaced with :

 case INTSXP:
 for (i=0; i<ns/nt; i++)
 memcpy((char *)DATAPTR(s)+i*nt*sizeof(int), (char *)DATAPTR(t),
 nt*sizeof(int));
 break;

or at least with something like this:

int* p_s = INTEGER(s) ;
int* p_t = INTEGER(t) ;
for( i=0 ; i<  ns ; i++){
p_s[i] = p_t[i % nt];
}

since expanding the INTEGER macro over and over has a price.



... in packages, yes, but not in core R.


Hmm. I was not talking about the overhead of the INTEGER function:

int *(INTEGER)(SEXP x) {
if(TYPEOF(x) != INTSXP && TYPEOF(x) != LGLSXP)
error("%s() can only be applied to a '%s', not a '%s'",
  "INTEGER", "integer", type2char(TYPEOF(x)));
return INTEGER(x);
}



but the one related to the macro.

#define INTEGER(x)  ((int *) DATAPTR(x))
#define DATAPTR(x)  (((SEXPREC_ALIGN *) (x)) + 1)

so the loop expands to :

for (i = 0; i < ns; i++)
  ((int *) (((SEXPREC_ALIGN *) (s)) + 1))[i] =
      ((int *) (((SEXPREC_ALIGN *) (t)) + 1))[i % nt];


I still believe grabbing the pointer just once for s and once for t is 
more efficient ...





--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://bit.ly/9aKDM9 : embed images in Rd documents
|- http://tr.im/OIXN : raster images and RImageJ
|- http://tr.im/OcQe : Rcpp 0.7.7



Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Peter Dalgaard
Gabor Grothendieck wrote:
> Below are two cases where I don't seem to be getting contr.sum
> contrasts even though they were specified.  Are these bugs?
> 
> The first case is an interaction between continuous and factor variables.
> 
> In the second case, contrasts= was specified as an argument to lm.  It
> works OK if we set the contrasts through options() but not if we set
> them through the lm argument.

In case #2, I think you just failed to read the help page.

> model.matrix(~fac, contrasts=list(fac="contr.sum"))
   (Intercept) fac1 fac2
1            1    1    0
2            1    1    0
3            1    1    0
4            1    1    0
5            1    1    0
6            1    0    1
7            1    0    1
8            1    0    1
9            1    0    1
10           1    0    1
11           1   -1   -1
12           1   -1   -1
13           1   -1   -1
14           1   -1   -1
15           1   -1   -1
attr(,"assign")
[1] 0 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"

As for case #1, the rules are tricky in cases where interactions are
present without main effects, but AFAICS, what you observe is
essentially the same effect as

> model.matrix(~fac-1, contrasts=list(fac="contr.sum"))
   fac1 fac2 fac3
1     1    0    0
2     1    0    0
3     1    0    0
4     1    0    0
5     1    0    0
6     0    1    0
7     0    1    0
8     0    1    0
9     0    1    0
10    0    1    0
11    0    0    1
12    0    0    1
13    0    0    1
14    0    0    1
15    0    0    1
attr(,"assign")
[1] 1 1 1
attr(,"contrasts")
attr(,"contrasts")$fac
[1] "contr.sum"


I.e., that R reverts to using indicator variables when the intercept is
absent.


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 4:13 PM, Romain Francois wrote:

> Le 21/04/10 21:39, Simon Urbanek a écrit :
>> 
>> 
>> 
>> ... in packages, yes, but not in core R.
> 
> Hmm. I was not talking about the overhead of the INTEGER function:
> 
> int *(INTEGER)(SEXP x) {
>if(TYPEOF(x) != INTSXP && TYPEOF(x) != LGLSXP)
>   error("%s() can only be applied to a '%s', not a '%s'",
> "INTEGER", "integer", type2char(TYPEOF(x)));
>return INTEGER(x);
> }
> 
> 
> 
> but the one related to the macro.
> 
> #define INTEGER(x)  ((int *) DATAPTR(x))
> #define DATAPTR(x)  (((SEXPREC_ALIGN *) (x)) + 1)
> 
> so the loop expands to :
> 
> for (i = 0; i < ns; i++)
>  ((int *) (((SEXPREC_ALIGN *) (s)) + 1))[i] = ((int *) 
> (((SEXPREC_ALIGN *) (t)) + 1))[i % nt];
> 
> I still believe grabbing the pointer just once for s and once for t is more 
> efficient ...
> 

Nope, since everything involved is loop invariant so the pointer values don't 
change (you'd have to declare s or t volatile to prevent that).

Try using gcc -s and you'll see that the code is the same (depending on the 
version of gcc the order of the first comparison can change so technically the 
INTEGER(x) version can save one add instruction in the degenerate case and be 
faster(!) in old gcc).

Cheers,
Simon






Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard  wrote:
> As for case #1, the rules are tricky in cases where interactions are
> present without main effects, but AFAICS, what you observe is
> essentially the same effect as
>
>> model.matrix(~fac-1, contrasts=list(fac="contr.sum"))
>   fac1 fac2 fac3
> 1     1    0    0
> 2     1    0    0
> 3     1    0    0
> 4     1    0    0
> 5     1    0    0
> 6     0    1    0
> 7     0    1    0
> 8     0    1    0
> 9     0    1    0
> 10    0    1    0
> 11    0    0    1
> 12    0    0    1
> 13    0    0    1
> 14    0    0    1
> 15    0    0    1
> attr(,"assign")
> [1] 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$fac
> [1] "contr.sum"
>
>
> I.e., that R reverts to using indicator variables when the intercept is
> absent.

Is there any nice way of getting contr.sum coding for the interaction
as opposed to the ugly code in my post that I used to force it? i.e.
cbind(1, model.matrix(~ fac)[,2:3] * scores)

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread Simon Urbanek

On Apr 21, 2010, at 4:39 PM, Simon Urbanek wrote:

> Nope, since everything involved is loop invariant so the pointer values don't 
> change (you'd have to declare s or t volatile to prevent that).
> 
> Try using gcc -s

Sorry, I meant gcc -S of course.


> and you'll see that the code is the same (depending on the version of gcc the 
> order of the first comparison can change so technically the INTEGER(x) 
> version can save one add instruction in the degenerate case and be faster(!) 
> in old gcc

[the reason being that INTEGER(x) does not need to be evaluated if the loop is 
not entered whereas p_s and p_t are populated unconditionally - smarter 
compilers will notice that p_s/p_t are not used in that case and thus generate 
identical code in both cases]

Cheers,
Simon





Re: [Rd] suggestion how to use memcpy in duplicate.c

2010-04-21 Thread William Dunlap
If I were worried about the time this loop takes,
I would avoid using i%nt.  For the attached C code,
compiled with gcc 4.3.3 with -O2, I get 
  > # INTEGER() in loop
  > system.time( r1 <- .Call("my_rep1", 1:3, 1e7) )
 user  system elapsed
0.060   0.012   0.071

  > # INTEGER() before loop
  > system.time( r2 <- .Call("my_rep2", 1:3, 1e7) )
 user  system elapsed
0.076   0.008   0.086

  > # replace i%src_length in loop with j=0 before loop and
  > #    if(++j==src_length) j=0 ;
  > # in the loop.
  > system.time( r3 <- .Call("my_rep3", 1:3, 1e7) )
 user  system elapsed
0.024   0.028   0.050
  > identical(r1,r2) && identical(r2,r3)
  [1] TRUE

The C code is:
#define USE_RINTERNALS /* pretend we are in the R kernel */
#include <R.h>
#include <Rinternals.h>


SEXP my_rep1(SEXP s_src, SEXP s_dest_length)
{
int src_length = length(s_src) ;
int dest_length = asInteger(s_dest_length) ;
int i,j ;
SEXP s_dest ;
PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
for(i=0;i<dest_length;i++)
    INTEGER(s_dest)[i] = INTEGER(s_src)[i % src_length] ;
UNPROTECT(1) ;
return s_dest ;
}
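
The my_rep2 and my_rep3 variants were truncated above; a sketch of the
my_rep3 idea reconstructed from the timing comments (the wrap-around
counter replacing i % src_length), not necessarily the code as posted:

SEXP my_rep3(SEXP s_src, SEXP s_dest_length)
{
    int src_length = length(s_src) ;
    int dest_length = asInteger(s_dest_length) ;
    int i, j = 0 ;
    SEXP s_dest ;
    PROTECT(s_dest = allocVector(INTSXP, dest_length)) ;
    if(TYPEOF(s_src) != INTSXP) error("src must be integer data") ;
    for(i = 0; i < dest_length; i++) {
        INTEGER(s_dest)[i] = INTEGER(s_src)[j] ;
        if(++j == src_length) j = 0 ;  /* avoids the division in i % src_length */
    }
    UNPROTECT(1) ;
    return s_dest ;
}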



[Rd] RUnit bug?

2010-04-21 Thread Dominick Samperi
There appears to be a bug in RUnit.

Given a testsuite testsuite.math, say, when I run:

runTestSuite(testsuite.math)

this works fine, provided there are no extraneous files in the
unit test subdirectory.

But if there are any Emacs temp files (with names that
end with '~') then runTestSuite gets confused and tries to
run functions from the temp files as well.
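
A possible workaround, until that is fixed, is to pass an anchored
testFileRegexp when defining the suite so that names ending in '~'
cannot match (the directory path here is illustrative):

library(RUnit)
testsuite.math <- defineTestSuite("math",
    dirs = "unitTests/math",
    testFileRegexp = "^runit.+\\.[rR]$")  # '$' anchor excludes 'foo.R~' files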




[Rd] Question of R CMD check

2010-04-21 Thread rusers.sh
Hi all,
  Today I installed the newest R version 2.10.1 and the other tools
necessary for building R packages under Windows, e.g. Rtools and perl,
all in their newest versions.
  After configuring everything under Windows (the configuration should
be correct), I used it to re-check my old package. I found the following
problem when checking the EXAMPLEs of each function, which did not exist
before this re-installation.

* checking examples ... ERROR
 Running examples in 'stam-Ex.R' failed.

  I used "\dontrun{} % enddontrun" in all the examples of my functions,
so that should be no problem, I think. I checked my package before and
did not find errors. I also browsed the checking results in 'stam-Ex.R'.
It lists all the example code in that file, something like this,
  cleanEx(); nameEx("stcdt")
  ### * stcdt
  flush(stderr()); flush(stdout())
  ### example code

  I have not met this problem before.  Any ideas on solving this?
  Thanks a lot.

-- 
-
Jane Chang
Queen's




Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Berwin A Turlach
G'day Gabor,

On Wed, 21 Apr 2010 14:00:33 -0400
Gabor Grothendieck  wrote:

> Below are two cases where I don't seem to be getting contr.sum
> contrasts even though they were specified.  Are these bugs?

Short answer: no. :)

> The first case is an interaction between continuous and factor
> variables.
> 
> In the second case, contrasts= was specified as an argument to lm.  It
> works OK if we set the contrasts through options() but not if we set
> them through the lm argument.
> 
> >
> > # 1. In this case I don't seem to be getting contr.sum contrasts:
> >
> > options(contrasts = c("contr.sum", "contr.poly"))
> > getOption("contrasts")
> [1] "contr.sum"  "contr.poly"
> > scores <- rep(seq(-2, 2), 3); scores
>  [1] -2 -1  0  1  2 -2 -1  0  1  2 -2 -1  0  1  2
> > fac <- gl(3, 5); fac
>  [1] 1 1 1 1 1 2 2 2 2 2 3 3 3 3 3
> Levels: 1 2 3
> >
> > # I get this:
> > model.matrix(~ scores:fac)
>(Intercept) scores:fac1 scores:fac2 scores:fac3
> 1            1          -2           0           0
> 2            1          -1           0           0
> 3            1           0           0           0
> 4            1           1           0           0
> 5            1           2           0           0
> 6            1           0          -2           0
> 7            1           0          -1           0
> 8            1           0           0           0
> 9            1           0           1           0
> 10           1           0           2           0
> 11           1           0           0          -2
> 12           1           0           0          -1
> 13           1           0           0           0
> 14           1           0           0           1
> 15           1           0           0           2
> attr(,"assign")
> [1] 0 1 1 1
> attr(,"contrasts")
> attr(,"contrasts")$fac
> [1] "contr.sum"

When creating the model matrix, the various levels of a factor are
first coded as indicator variables (both for creation of the main
effects and for creation of interactions).  The contrast matrix is then
applied to remove redundant columns from the design matrix; that is,
columns that are known to be redundant due to the *design* (i.e. the
model formula as a whole).  Depending on whether data is available for
all levels of a factor (or combinations of levels if several factors
are in the model), you can still end up with a design matrix that does
not have full column rank.

In this case, since there is no main effect fac, the three columns that
are put into the design matrix due to the scores:fac term do not create
a singularity, hence the contrast matrix is not applied to these three
columns.

The above description is probably a bit rough and lacking in detail (if
not even wrong in some details).  IMHO, the best explanation of how R
goes from the model formula to the actual design matrix (for linear
models) can still be found in MASS; though, if I am not mistaken,
current versions of R seem to proceed slightly differently from that
description in more complicated models (i.e. several factors and
continuous explanatory variables).

> > # But I was expecting this since I am using contr.sum
> > cbind(1, model.matrix(~ fac)[,2:3] * scores)
>      fac1 fac2
> 1  1   -2    0
> 2  1   -1    0
> 3  1    0    0
> 4  1    1    0
> 5  1    2    0
> 6  1    0   -2
> 7  1    0   -1
> 8  1    0    0
> 9  1    0    1
> 10 1    0    2
> 11 1    2    2
> 12 1    1    1
> 13 1    0    0
> 14 1   -1   -1
> 15 1   -2   -2

If you wish to have the design matrix that contains the interaction
terms as if the model had the main effects too but without the columns
corresponding to the main effects, then just instruct R that you want
to have such a matrix:

R> model.matrix(~ scores*fac)[,-(2:4)]
   (Intercept) scores:fac1 scores:fac2
1            1          -2           0
2            1          -1           0
3            1           0           0
4            1           1           0
5            1           2           0
6            1           0          -2
7            1           0          -1
8            1           0           0
9            1           0           1
10           1           0           2
11           1           2           2
12           1           1           1
13           1           0           0
14           1          -1          -1
15           1          -2          -2

Though, I am not sure why one wants to fit such a model.  


Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Gabor Grothendieck
On Wed, Apr 21, 2010 at 11:38 PM, Berwin A Turlach
 wrote:
>> > # But I was expecting this since I am using contr.sum
>> > cbind(1, model.matrix(~ fac)[,2:3] * scores)
>>      fac1 fac2
>> 1  1   -2    0
>> 2  1   -1    0
>> 3  1    0    0
>> 4  1    1    0
>> 5  1    2    0
>> 6  1    0   -2
>> 7  1    0   -1
>> 8  1    0    0
>> 9  1    0    1
>> 10 1    0    2
>> 11 1    2    2
>> 12 1    1    1
>> 13 1    0    0
>> 14 1   -1   -1
>> 15 1   -2   -2
>
> If you wish to have the design matrix that contains the interaction
> terms as if the model had the main effects too but without the columns
> corresponding to the main effects, then just instruct R that you want
> to have such a matrix:
>
> R> model.matrix(~ scores*fac)[,-(2:4)]
>   (Intercept) scores:fac1 scores:fac2
> 1            1          -2           0
> 2            1          -1           0
> 3            1           0           0
> 4            1           1           0
> 5            1           2           0
> 6            1           0          -2
> 7            1           0          -1
> 8            1           0           0
> 9            1           0           1
> 10           1           0           2
> 11           1           2           2
> 12           1           1           1
> 13           1           0           0
> 14           1          -1          -1
> 15           1          -2          -2
>
> Though, I am not sure why one wants to fit such a model.

To save on degrees of freedom.

I had reduced it to the minimum needed to illustrate, but in fact it's
closer to this (except the coding is not the one needed):

options(contrasts = c("contr.sum", "contr.poly"))
tab <- as.table(matrix(1:21, 7))
dimnames(tab) = list(X = letters[1:7], Y = LETTERS[1:3])
rr <- factor(row(tab))
cc <- factor(col(tab))
scores <- rep(seq(-3,3), 3)
model.matrix( ~ rr + cc + scores:cc)

so the main effects are rr and cc but scores takes the place of rr in
the interaction.

Your description of the process seems right since it would predict
that the following gives the required coding and it does:

model.matrix(~ scores*cc + rr)[,-2]

Thanks.



Re: [Rd] Bugs? when dealing with contrasts

2010-04-21 Thread Peter Dalgaard
Gabor Grothendieck wrote:
> On Wed, Apr 21, 2010 at 4:26 PM, Peter Dalgaard  wrote:
...
>> I.e., that R reverts to using indicator variables when the intercept is
>> absent.
> 
> Is there any nice way of getting contr.sum coding for the interaction
> as opposed to the ugly code in my post that I used to force it? i.e.
> cbind(1, model.matrix(~ fac)[,2:3] * scores)

I think not. In general, an interaction like ~fac:scores indicates three
lines with a common intercept and three different slopes, and changing
the parametrization is not supposed to change the model, whereas your
model inserts a restriction that the slopes sum to zero (if I understand
correctly). So if you want to fit "ugly" models, you get to do a little
ugly footwork.

(A similar, simpler, issue arises if you want to have a 2x2 design with
no effect in one column and/or one row (think clinical trial: placebo
vs. active, baseline vs. treated). You can only do this using explicit
dummy variables, not with the two classifications represented as factors.)
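
A minimal sketch of that explicit-dummy approach for the 2x2 example
(hypothetical data; the labels are illustrative):

treat  <- factor(rep(c("placebo", "active"), each = 10))
period <- factor(rep(c("baseline", "treated"), 10))
y      <- rnorm(20)
# effect only in the active-and-treated cell; all other cells share the intercept
eff <- as.numeric(treat == "active" & period == "treated")
fit <- lm(y ~ eff)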


-- 
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
