Thanks for these comments. I should alter the blog post or write some follow up.
This was a weekend blog post that only benefited from a short time of research research. I’m glad people find it useful, but I’m sure a detailed documentation of the features from the authors would be more useful. Romain > Le 24 sept. 2019 à 07:48, Gabriel Becker <gabembec...@gmail.com> a écrit : > > Hi Bob, > > Thanks for sending around the link to that. It looks mostly right and looks > like a useful onramp. There are a few things to watch out for though (I've > cc'ed Romain so he's aware of these comments). @romain I hope you taake the > following comments as they are intended, as help rather than attacks. > > The largest issue I see is that the contract for Get_region is that it > *populates the > provided buffer with a copy of the data. *That buffer is expected to be > safe to destructively modify, shuffle, etc though I don't know if we are > actually doing that anywhere. As such, if I understand his C++ correctly, > that Get_region method is not safe and shouldn't be used. > > The other point is that Dataptr_or_null is not actually *guaranteed *not to > allocate. The default method returns NULL, but we have no way of preventing > an allocation in a user-defined method, and probably (?) no easy way of > detecting that it is occurring before it causes a bug. That said, Romain is > correct that when you are writing Dataptr_or_null methods you should write > them so that they don't allocate, generally. Basically your methods for > Dataptr_or_null shouldn't allocate, but you also should not write code that > relies on hard assumptions that no one's ever will. > > Also, a small nitpick, R's internal mean function doesn't hit Dataptr, it > hits either INTEGER_ELT (which really should probably be a > ITERATE_BY_REGION) or ITERATE_BY_REGION. > > Anyway, I hope that helps. > ~G > > > > > > > On Mon, Sep 23, 2019 at 6:12 PM Bob Rudis <b...@rud.is <mailto:b...@rud.is>> > wrote: > >> Not sure if you're using just C++ or Rcpp for C++ access but >> https://purrple.cat/blog/2018/10/14/altrep-and-cpp/ has some tips on >> using C++ w/ALTREP. >> >>> On Sep 23, 2019, at 3:17 PM, Wang Jiefei <szwj...@gmail.com> wrote: >>> >>> Sorry for post a lot of things, for the first part of code, I copied my >> C++ >>> iter macro by mistake(and you can see an explicit type casting). Here is >>> the macro definition from R_exts/Itermacros.h >>> >>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ >>> >>> strt, nfull, expr) do { \ >>> >>> * const** etype *px = DATAPTR_OR_NULL(sx); * >> \ >>> >>> if (px != NULL) { \ >>> >>> R_xlen_t __ibr_n__ = strt + nfull; \ >>> >>> R_xlen_t nb = __ibr_n__; \ >>> >>> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ >>> >>> expr \ >>> >>> } \ >>> >>> } \ >>> >>> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, vtype, >>> \ >>> >>> strt, nfull, expr); \ >>> >>> } while (0) >>> >>> >>> Best, >>> >>> Jiefei >>> >>> On Mon, Sep 23, 2019 at 3:12 PM Wang Jiefei <szwj...@gmail.com> wrote: >>> >>>> Hi Gabriel, >>>> >>>> I have tried the macro and found a small issue, it seems like the macro >> is >>>> written in C and does an implicit type conversion(const void * to const >> int >>>> *), see below. While it is allowed in C, C++ seems not happy with it. >> Is it >>>> possible to add an explicit type casting so that it can be compatible >> with >>>> both language? >>>> >>>> >>>> #define ITERATE_BY_REGION_PARTIAL(sx, px, idx, nb, etype, vtype, \ >>>> >>>> strt, nfull, expr) do { \ >>>> >>>> *const etype *px = (const** etype *)DATAPTR_OR_NULL(sx); * >>>> \ >>>> >>>> if (px != NULL) { \ >>>> >>>> R_xlen_t __ibr_n__ = strt + nfull; \ >>>> >>>> R_xlen_t nb = __ibr_n__; \ >>>> >>>> for (R_xlen_t idx = strt; idx < __ibr_n__; idx += nb) { \ >>>> >>>> expr \ >>>> >>>> } \ >>>> >>>> } \ >>>> >>>> else ITERATE_BY_REGION_PARTIAL0(sx, px, idx, nb, etype, >>>> vtype, \ >>>> >>>> strt, nfull, expr); \ >>>> >>>> } while (0) >>>> >>>> >>>> Also, I notice that the element type(etype) and vector type(vtype) has >>>> to be specified in the macro. Since the SEXP is the first argument in >> the >>>> macro, it seems redundant to define etype and vtype for they have to >> match >>>> the type of the SEXP. I'm wondering if this is intentional? Will there >> be a >>>> type-free macro in R in the future? Here is a simple type-free macro I'm >>>> using. >>>> >>>> #define type_free_iter(sx, ptr, ind, nbatch,expr)\ >>>> >>>> switch(TYPEOF(sx)){\ >>>> >>>> case INTSXP:\ >>>> >>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, INTEGER, expr);\ >>>> >>>> break; \ >>>> >>>> case REALSXP:\ >>>> >>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, double, REAL, expr);\ >>>> >>>> break; \ >>>> >>>> case LGLSXP:\ >>>> >>>> ITERATE_BY_REGION(sx, ptr, ind, nbatch, int, LOGICAL, expr);\ >>>> >>>> break; \ >>>> >>>> default:\ >>>> >>>> Rf_error("Unknow data type\n"); \ >>>> >>>> break; \ >>>> >>>> } >>>> >>>> >>>> >>>> // [[Rcpp::export]] >>>> >>>> double sillysum(SEXP x) { >>>> >>>> double s = 0.0; >>>> >>>> type_free_iter(x, ptr, ind, nbatch, >>>> >>>> { >>>> >>>> for (int i = 0; i < nbatch; i++) { s = s + ptr[i]; } >>>> >>>> }); >>>> >>>> return s; >>>> >>>> } >>>> >>>> >>>> >>>> >>>> Best, >>>> >>>> Jiefei >>>> >>>> On Wed, Aug 28, 2019 at 2:32 PM Wang Jiefei <szwj...@gmail.com> wrote: >>>> >>>>> Thank you, Gabriel. The loop macro is very helpful. It is also exciting >>>>> to see that there are lots of changes in ALTREP in R devel version. I >>>>> really appreciate your help! >>>>> >>>>> Best, >>>>> Jiefei >>>>> >>>>> On Wed, Aug 28, 2019 at 7:37 AM Gabriel Becker <gabembec...@gmail.com> >>>>> wrote: >>>>> >>>>>> Jiefei, >>>>>> >>>>>> I've been meaning to write up something about this so hopefully this >>>>>> will be an impetus for me to actually do that, but until then, >> responses >>>>>> inline. >>>>>> >>>>>> >>>>>> On Tue, Aug 27, 2019, 7:22 PM Wang Jiefei <szwj...@gmail.com> wrote: >>>>>> >>>>>>> Hi devel team, >>>>>>> >>>>>>> I'm working on C/C++ level ALTREP compatibility for a package. The >>>>>>> package >>>>>>> previously used pointers to access the data of a SEXP, so it would >> not >>>>>>> work >>>>>>> for some ALTREP objects which do not have a pointer. I plan to >> rewrite >>>>>>> the >>>>>>> code and use functions like get_elt, get_region, and get_subset to >>>>>>> access >>>>>>> the values of a vector, so I have a few questions for ALTREP: >>>>>>> >>>>>>> 1. Since an ALTREP do not have to define all of the above >>>>>>> functions(element, region, subset), is there any way to check which >>>>>>> function has been defined for an ALTREP class? I did a search on >>>>>>> RInternal.h and altrep.c but did not find a solution for it. If not, >>>>>>> will >>>>>>> it be added in the future? >>>>>>> >>>>>> >>>>>> Element and region are guaranteed to always be defined and work (for >>>>>> altrep and non-altrep INTSXP, REALSXP, LGLSXPs, etc, we currently >> don't >>>>>> have region for STRSXP or VECSXP, I believe). If the altrep class >> does not >>>>>> provide them then default methods will be used, which may be >> inefficient in >>>>>> some cases but will work. Subset is currently a forward looking stub, >> but >>>>>> once implimented, that will also be guaranteed to work for all valid >> ALTREP >>>>>> classes. >>>>>> >>>>>> >>>>>>> >>>>>>> 2. Given the diversity of ALTREP classes, what is the best way to >> loop >>>>>>> over >>>>>>> an ALTREP object? I hope there can be an all-in-one function which >> can >>>>>>> get >>>>>>> the values from a vector as long as at least one of the above >> functions >>>>>>> has >>>>>>> been defined, so package developers would not be bothered by tons of >>>>>>> `if-else` statement if they want their package to work with ALTREP. >>>>>>> Since >>>>>>> it seems like there is no such function exist, what could be the best >>>>>>> way >>>>>>> to do the loop under the current R version? >>>>>>> >>>>>> >>>>>> The best way to loop over all SEXPs, which supports both altrep and >>>>>> nonaltrep objects is, with the ITERATE_BY_REGION (which has been in R >> for a >>>>>> number of released versions, at least since 3.5.0 I think) and the >> much >>>>>> newer (devel only) ITERATE_BY_REGION_PARTIAL macros defined in >>>>>> R_exts/Itermacros.h >>>>>> >>>>>> The meaning of the arguments is as follows for >> ITERATE_BY_REGION_PARTIAL >>>>>> are as follows (ITERATE_BY_REGION is the same except no strt, and >> nfull). >>>>>> >>>>>> >>>>>> - sx - C level variable name of the SEXP to iterate over >>>>>> - px - variable name to use for the pointer populated with data from >>>>>> a region of sx >>>>>> - idx - variable name to use for the "outer", batch counter in the >>>>>> for loop. This will contain the 0-indexed start position of the >> batch >>>>>> you're currently processing >>>>>> - nb - variable name to use for the current batch size. This will >>>>>> always either be GET_REGION_BUFFSIZE (512), or the number of >> elements >>>>>> remaining in the vector, whichever is smaller >>>>>> - etype - element (C) type, e.g., int, double, of the data >>>>>> - vtype - vector (access API) type, e.g, INTEGER, REAL >>>>>> - strt - the 0-indexed position in the vector to start iterating >>>>>> - nfull - the total number oif elements to iterate over from the >>>>>> vector >>>>>> - expr - the code to process a single batch (Which will do things to >>>>>> px, typically) >>>>>> >>>>>> >>>>>> So code to perform badly implemented not good idea summing of REALSXP >>>>>> data might look like >>>>>> >>>>>> double sillysum(SEXP x) { >>>>>> >>>>>> double s = 0.0; >>>>>> >>>>>> ITERATE_BY_REGION(x, ptr, ind, nbatch, double, REAL, >>>>>> { >>>>>> >>>>>> for(int i = 0; i < nbatch; i++) { s = s + ptr[i];} >>>>>> }) >>>>>> >>>>>> return s; >>>>>> } >>>>>> >>>>>> For meatier examples of ITERATE_BY_REGION's use in practice you can >> grep >>>>>> the R sources. I know it is used in the implementations of the various >>>>>> C-level summaries (summary.c), print and formatting functions, and >> anyNA. >>>>>> >>>>>> Some things to remember >>>>>> >>>>>> - If you have an inner loop like the one above, your total position >>>>>> in the original vector is ind + i >>>>>> - ITERATE_BY_REGION always processes the whole vector, if you need >>>>>> to only do part of it yo'll either need custom breaking for both >> inner and >>>>>> outer loopsl, or in R-devel you can use ITERATE_BY_REGION_PARTIAL >>>>>> - Don't use the variants ending in 0, all they do is skip over >>>>>> things that are a good idea in the case of non-altreps (and some >> very >>>>>> specific altreps). >>>>>> >>>>>> Hope that helps. >>>>>> >>>>>> Best, >>>>>> ~G >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Best, >>>>>>> Jiefei >>>>>>> >>>>>>> [[alternative HTML version deleted]] >>>>>>> >>>>>>> ______________________________________________ >>>>>>> R-devel@r-project.org mailing list >>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel >>>>>>> >>>>>> >>> >>> [[alternative HTML version deleted]] >>> >>> ______________________________________________ >>> R-devel@r-project.org mailing list >>> https://stat.ethz.ch/mailman/listinfo/r-devel >> >> > > [[alternative HTML version deleted]] > > ______________________________________________ > R-devel@r-project.org <mailto:R-devel@r-project.org> mailing list > https://stat.ethz.ch/mailman/listinfo/r-devel > <https://stat.ethz.ch/mailman/listinfo/r-devel> [[alternative HTML version deleted]] ______________________________________________ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel