Hello,

Here is a more complete patch implementing an iterating scheme inspired from the java Iterable/Iterator design.

Nothing is changed for non S4 objects.


The patch contains 4 generic functions :

is.iterable : indicates if an object is iterable. The default method returns FALSE

iterator: returns the iterator associated with an iterable object. the idea is that this function is only called if the object is iterable.

hasNext: indicates if an iterator has more elements. The default method returns FALSE

getNext: returns the next element of the iterator



Here is an example:


require( methods )

setClass( "SimpleIterable", representation( to="integer" ) )
setClass("SimpleIterator", representation(to="integer", env = "environment") )

setMethod("is.iterable", "SimpleIterable", function(x) TRUE )
setMethod("iterator", "SimpleIterable", function(x){
        env <- new.env()
        assign( "i" , 0L, envir = env )
        new( "SimpleIterator", to = x...@to, env = env )
} )
setMethod( "hasNext", "SimpleIterator", function(x){
        res <- x...@env[["i"]] < x...@to
        res
} )
setMethod( "getNext", "SimpleIterator", function(x){
        new.i <- x...@env[["i"]] + 1L
        assign( "i", new.i, envir = x...@env )
        new.i
} )


> o <- new( "SimpleIterable", to = 10L )
> for( i in o ){
+ if( i == 3L ) next
+ if( i == 5L ) break
+  cat( i, "\n" )
+ }
1
2
4




Here is an example iterating over a java Iterable object (the methods would need a bit more error trapping) without fetching all the elements in advance:

require( rJava )
.jinit()

setMethod( "is.iterable", "jobjRef", function(x){
        .jinherits( x, "java/lang/Iterable" )
} )
setMethod( "iterator", "jobjRef", function(x){
        .jcall( x, "Ljava/util/Iterator;", "iterator" )
} )
setMethod( "hasNext", "jobjRef", function(x){
        .jcall( x, "Z", "hasNext" )
} )
setMethod( "getNext", "jobjRef", function(x){
        .jcall( x, "Ljava/lang/Object;", "next" )
} )

> v <- new( J("java/util/Vector" ) )
> v$add( new( J("java/lang/Double" ), 10.2 ) )
[1] TRUE
> v$add( new( J("java/awt/Point"), 10L, 10L ) )
[1] TRUE
> for( i in v){
+ print( i$getClass()$getName() )
+ }
[1] "java.lang.Double"
[1] "java.awt.Point"


While I'm on this, in the usual for loop :

- why is the switch inside the for. The code would be slighly more efficient if it was the other way ? - why so many calls to TYPEOF, is it not always going to return the same type ?
- why recreating v each time ?

(I probably miss something here)

Romain

On 10/13/2009 11:09 AM, Romain Francois wrote:
Hello,

Consider this :

setClass("track", representation(x="numeric", y="numeric"))
[1] "track"
o <- new( "track", x = 1, y = 2 )
for( i in o ){
+ cat( "hello\n")
+ }
Error: invalid type/length (S4/1) in vector allocation



This happens at those lines of do_for:

n = LENGTH(val);
PROTECT_WITH_INDEX(v = allocVector(TYPEOF(val), 1), &vpi);

because allocVector( S4SXP, 1) does not make sense.



What about detecting S4SXP and attempt to call as.list, similarly to
what lapply does ?

as.list.track <- function(x, ...){ list( x = x...@x, y = x...@y ) }
lapply( o, identity )
$x
[1] 1

$y
[1] 2


That would make for loops more generic, and make it possible to
implement custom "iterators". I'm attaching a patch to eval.c that does
just that. For example :

 > setClass("iterator", representation(to="integer"))
[1] "iterator"
 > o <- new( "iterator", to = 4L )
 > setGeneric( "as.list" )
[1] "as.list"
 > setMethod( "as.list", "iterator", function(x, ...) {
+ seq_len( x...@to )
+ })
[1] "as.list"
 >
 > for( i in o ){
+ cat( i, "\n" )
+ }
1
2
3
4

Obviously that is the cheap way of doing it, something better would be
to abstract a bit more what is an "iterator".



For example in java iterators implement just two methods : hasNext()
that indicates if there is a next object and next() that returns the
next object.


For the long story, one motivation for this is actually to deal with
java iterators (with the devel version of rJava, and this patch), you
might do something like this:

 > v <- new( J("java/util/Vector") )
 > v$add( new( J("java/awt/Point"), 10L, 10L ) )
[1] TRUE
 > v$add( new( J("java/lang/Double"), 10 ) )
[1] TRUE
 > for( item in v ){
+ print( item$getClass()$getName() )
+ }
[1] "java.awt.Point"
[1] "java.lang.Double"

Where the as.list method for java object references returns a list that
is filled by iterating over the object if it implements the Iterable
interface.

The drawback here is that one has to first fully retrieve the list, by
iterating in java, and then process it in R, by iterating again in R.

Romain


--
Romain Francois
Professional R Enthusiast
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
|- http://tr.im/BcPw : celebrating R commit #50000
|- http://tr.im/ztCu : RGG #158:161: examples of package IDPmisc
`- http://tr.im/yw8E : New R package : sos

Index: src/main/eval.c
===================================================================
--- src/main/eval.c     (revision 50053)
+++ src/main/eval.c     (working copy)
@@ -1025,15 +1025,67 @@
        do_browser(call, op, R_NilValue, rho); \
     } } while (0)
 
+Rboolean isIterable( SEXP object, SEXP rho){
+       SEXP expr ;
+       SEXP ans ;
+       Rboolean res ;
+       /* this probably should use DispatchOrEval */
+       PROTECT( expr = lang2(install("is.iterable"), object ));
+       PROTECT( ans = eval( expr, rho ) ) ;
+       res = asLogicalNoNA(ans, expr) ;
+    UNPROTECT(2) ;
+    return res ;
+}
 
+Rboolean hasNext( SEXP object, SEXP rho){
+       SEXP expr ;
+       SEXP ans ;
+       Rboolean res ;
+       /* this probably should use DispatchOrEval */
+       PROTECT( expr = lang2(install("hasNext"), object ));
+       PROTECT( ans = eval( expr, rho ) ) ;
+       res = asLogicalNoNA(ans, expr) ;
+    UNPROTECT(2) ;
+    return res ;
+}
+
+SEXP getNext( SEXP object, SEXP rho){
+       SEXP expr, ans ; 
+       /* this probably should use DispatchOrEval */
+       PROTECT( expr = lang2(install("getNext"), object ));
+       PROTECT( ans =eval( expr, rho ) ) ;
+    UNPROTECT( 2) ;
+    return ans ;
+}
+
+SEXP iterator(SEXP object, SEXP rho){
+       SEXP expr ;
+       SEXP ans ;
+       /* this probably should use DispatchOrEval */
+       PROTECT( expr = lang2(install("iterator"), object ));
+       PROTECT( ans = eval( expr, rho ) ) ;
+       UNPROTECT(2) ;
+    return ans ;
+}
+
+SEXP asList( SEXP object, SEXP rho ){
+       SEXP expr, ans ; 
+       /* this probably should use DispatchOrEval */
+       PROTECT( expr = lang2(install("as.list"), object ));
+       PROTECT( ans =eval( expr, rho ) ) ;
+    UNPROTECT( 2) ;
+    return ans ;
+}
+
+
 SEXP attribute_hidden do_for(SEXP call, SEXP op, SEXP args, SEXP rho)
 {
     int dbg, nm;
     volatile int i, n, bgn;
     SEXP sym, body;
-    volatile SEXP ans, v, val;
+    volatile SEXP ans, v, val ;
     RCNTXT cntxt;
-    PROTECT_INDEX vpi, api;
+    PROTECT_INDEX vpi, api ;
 
     sym = CAR(args);
     val = CADR(args);
@@ -1046,24 +1098,52 @@
     PROTECT(val = eval(val, rho));
     defineVar(sym, R_NilValue, rho);
 
-    /* deal with the case where we are iterating over a factor
-       we need to coerce to character - then iterate */
-
-    if( inherits(val, "factor") ) {
-        PROTECT(ans = asCharacterFactor(val));
-       val = ans;
-       UNPROTECT(2);  /* ans and val from above */
-        PROTECT(val);
+    Rboolean iterate = FALSE ;
+    
+    /* deal with the S4 case, try to dispatch to as.list */
+    if( TYPEOF(val) == S4SXP ){
+       SEXP expr ; 
+       
+       if( isIterable(val, rho) ){
+               /* iterating using the iterator scheme */
+               iterate = TRUE ;
+               
+               /* replace val by its iterator */
+               PROTECT(ans = iterator( val, rho ) ) ;
+               val = ans; 
+               UNPROTECT(2) ; /* ans and val */
+               PROTECT(val );
+               
+               PROTECT_WITH_INDEX(v = R_NilValue, &vpi);
+       } else{ 
+               /* trying as.list. Maybe this should just throw an error */
+               PROTECT(ans = asList( val, rho ) ) ;
+               val = ans ;
+               UNPROTECT(2); /* ans and val from above*/
+               PROTECT(val);
+       }
     }
-
-    if (isList(val) || isNull(val)) {
-       n = length(val);
-       PROTECT_WITH_INDEX(v = R_NilValue, &vpi);
-    }
-    else {
-       n = LENGTH(val);
-       PROTECT_WITH_INDEX(v = allocVector(TYPEOF(val), 1), &vpi);
-    }
+    
+    if( !iterate ){
+       /* deal with the case where we are iterating over a factor
+          we need to coerce to character - then iterate */
+       
+       if( inherits(val, "factor") ) {
+           PROTECT(ans = asCharacterFactor(val));
+               val = ans;
+               UNPROTECT(2);  /* ans and val from above */
+           PROTECT(val);
+       }
+       
+       if (isList(val) || isNull(val)) {
+               n = length(val);
+               PROTECT_WITH_INDEX(v = R_NilValue, &vpi);
+       }
+       else {
+               n = LENGTH(val);
+               PROTECT_WITH_INDEX(v = allocVector(TYPEOF(val), 1), &vpi);
+       }
+    } 
     ans = R_NilValue;
 
     dbg = RDEBUG(rho);
@@ -1079,62 +1159,84 @@
     PROTECT_WITH_INDEX(ans, &api);  /**** ans should no longer be needed. LT */
     begincontext(&cntxt, CTXT_LOOP, R_NilValue, rho, R_BaseEnv, R_NilValue,
                 R_NilValue);
-    switch (SETJMP(cntxt.cjmpbuf)) {
-    case CTXT_BREAK: goto for_break;
-    case CTXT_NEXT: goto for_next;
-    }
-    for (i = 0; i < n; i++) {
-       DO_LOOP_RDEBUG(call, op, args, rho, bgn);
-       switch (TYPEOF(val)) {
-       case LGLSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           LOGICAL(v)[0] = LOGICAL(val)[i];
-           setVar(sym, v, rho);
-           break;
-       case INTSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           INTEGER(v)[0] = INTEGER(val)[i];
-           setVar(sym, v, rho);
-           break;
-       case REALSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           REAL(v)[0] = REAL(val)[i];
-           setVar(sym, v, rho);
-           break;
-       case CPLXSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           COMPLEX(v)[0] = COMPLEX(val)[i];
-           setVar(sym, v, rho);
-           break;
-       case STRSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           SET_STRING_ELT(v, 0, STRING_ELT(val, i));
-           setVar(sym, v, rho);
-           break;
-       case RAWSXP:
-           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
-           RAW(v)[0] = RAW(val)[i];
-           setVar(sym, v, rho);
-           break;
-       case EXPRSXP:
-       case VECSXP:
-           /* make sure loop variable is a copy if needed */
-           if(nm > 0) SET_NAMED(VECTOR_ELT(val, i), 2);
-           setVar(sym, VECTOR_ELT(val, i), rho);
-           break;
-       case LISTSXP:
-           /* make sure loop variable is a copy if needed */
-           if(nm > 0) SET_NAMED(CAR(val), 2);
-           setVar(sym, CAR(val), rho);
-           val = CDR(val);
-           break;
-       default:
-           errorcall(call, _("invalid for() loop sequence"));
-       }
-       REPROTECT(ans = eval(body, rho), api);
-    for_next:
-       ; /* needed for strict ISO C compliance, according to gcc 2.95.2 */
-    }
+       if( iterate ){
+               switch (SETJMP(cntxt.cjmpbuf)) {
+                       case CTXT_BREAK: goto for_break;
+                       case CTXT_NEXT: goto iterate_next;
+               }
+               
+               while( hasNext( val , rho ) == TRUE ){
+                       DO_LOOP_RDEBUG(call, op, args, rho, bgn);
+                       
+                       /* get the next item and set it to the loop symbol */
+                       REPROTECT(v = getNext(val, rho) , vpi);
+                       setVar(sym, v, rho);
+                       
+                       /* eval the loop body */
+                       REPROTECT(ans = eval(body, rho), api);
+                       
+                       iterate_next:
+                       ; /* needed for strict ISO C compliance, according to 
gcc 2.95.2 */
+               }
+       } else{ 
+               
+               switch (SETJMP(cntxt.cjmpbuf)) {
+                       case CTXT_BREAK: goto for_break;
+                       case CTXT_NEXT: goto for_next;
+               } 
+               for (i = 0; i < n; i++) {
+                       DO_LOOP_RDEBUG(call, op, args, rho, bgn);
+                       switch (TYPEOF(val)) {
+                       case LGLSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           LOGICAL(v)[0] = LOGICAL(val)[i];
+                           setVar(sym, v, rho);
+                           break;
+                       case INTSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           INTEGER(v)[0] = INTEGER(val)[i];
+                           setVar(sym, v, rho);
+                           break;
+                       case REALSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           REAL(v)[0] = REAL(val)[i];
+                           setVar(sym, v, rho);
+                           break;
+                       case CPLXSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           COMPLEX(v)[0] = COMPLEX(val)[i];
+                           setVar(sym, v, rho);
+                           break;
+                       case STRSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           SET_STRING_ELT(v, 0, STRING_ELT(val, i));
+                           setVar(sym, v, rho);
+                           break;
+                       case RAWSXP:
+                           REPROTECT(v = allocVector(TYPEOF(val), 1), vpi);
+                           RAW(v)[0] = RAW(val)[i];
+                           setVar(sym, v, rho);
+                           break;
+                       case EXPRSXP:
+                       case VECSXP:
+                           /* make sure loop variable is a copy if needed */
+                           if(nm > 0) SET_NAMED(VECTOR_ELT(val, i), 2);
+                           setVar(sym, VECTOR_ELT(val, i), rho);
+                           break;
+                       case LISTSXP:
+                           /* make sure loop variable is a copy if needed */
+                           if(nm > 0) SET_NAMED(CAR(val), 2);
+                           setVar(sym, CAR(val), rho);
+                           val = CDR(val);
+                           break;
+                       default:
+                           errorcall(call, _("invalid for() loop sequence"));
+                       }
+                       REPROTECT(ans = eval(body, rho), api);
+               for_next:
+                       ; /* needed for strict ISO C compliance, according to 
gcc 2.95.2 */
+               }
+       }
  for_break:
     endcontext(&cntxt);
     UNPROTECT(5);
______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Reply via email to