Thank you very much for your explanation! I'm looking forward to seeing the changes to the R functions in a future release.
Best,
Jiefei

Tierney, Luke <luke-tier...@uiowa.edu> wrote on Fri, May 10, 2019 at 12:22 PM:

> On Fri, 10 May 2019, 介非王 wrote:
>
> > Hi Gabriel,
> >
> > Thanks for your explanation. I totally understand that it is almost
> > impossible to change the data structure of STRSXP. However, what I'm
> > proposing is not about changing the internal representation, but rather
> > about how we design and use the ALTREP API.
> >
> > I may not have stated the workarounds clearly, as English is not my
> > first language. Please let me explain them again in detail.
> >
> > 1. Update the existing R functions. When the ALTREP API Dataptr_or_null
> > returns NULL, use get_element instead (or do as well as we can). I have
> > seen this pattern in some R functions, but there are still some
> > functions that do not follow this rule. For example, the print function
> > blindly calls Dataptr (it does not even call Dataptr_or_null first) and
> > forces me to allocate a large chunk of memory in R. Updating these
> > functions would not completely solve the problem we are discussing, but
> > it would make it less serious.
>
> Fixing print() is pretty high priority (I thought we had done so for R
> 3.6.0 but apparently not). Others will come in over time; filing a
> request with bugzilla is one way to push up priority for a particular
> function or set of functions.
>
> Keep in mind that one option for your implementation is to signal an
> error if a data pointer is requested. You could make that dependent on
> some sort of option setting or make the error continuable by providing
> a restart.
>
> > 2. Update the ALTREP API, return a vector of const char *, and
> > internally wrap them as CHARSXP. This can be a way to "hack" the R data
> > structure with only a little cost to create the CHARSXP header.
>
> That doesn't seem feasible but I may not be understanding what you mean.
>
> > 3. Provide character ALTREP. Instead of using string ALTREP, we can
> > define an alternative CHARSXP.
> > By doing so we would completely solve the problem, since the return
> > value of the Dataptr of a CHARSXP is a const char*. We do not have to
> > change any internal representation of characters; it just requires a
> > remap of the DATAPTR macro (or function?).
>
> Allowing ALTREP CHARSXP objects might be something to consider in the
> future, but the combination of caching and encoding issues makes that
> very complex. I'm not sure it would be a good idea or even
> feasible. In any case it won't happen anytime soon.
>
> Best,
>
> luke
>
> >
> > Again, I sincerely appreciate your time and the detailed explanation
> > you provided. I'm looking forward to seeing any method to solve this
> > problem in the current and future R releases.
> >
> > Best,
> > Jiefei
> >
> > Gabriel Becker <gabembec...@gmail.com> wrote on Thu, May 9, 2019 at 2:07 PM:
> >
> >> Hi Jiefei,
> >>
> >> The issue here is that while the memory consequences of what you're
> >> describing may be true, this is simply how R handles character vector
> >> (what you're calling string) values internally. It doesn't actually
> >> have anything to do with ALTREP. Standard character vector SEXPs have
> >> an array of CHARSXP pointers in their payload (what is returned by
> >> DATAPTR) as well.
> >>
> >> As far as I know, this is important for string caching and is actually
> >> intended to save memory when the same string value appears many times
> >> in an R session (and takes up more bytes than a pointer), though I
> >> haven't dug around R's low-level string handling a ton. Either way
> >> though, this would be a much, much larger change than just changing the
> >> ALTREP API (which for things like this explicitly and intentionally
> >> matches how the C API behaves for non-ALTREP SEXPs, for compatibility).
> >>
> >> Likewise, the reason that get_element is going to return a CHARSXP is
> >> that this is what STRING_ELT(x, i) returns (equivalent to
> >> ((SEXP *) DATAPTR(x))[i]), so I don't think that can be changed either.
> >>
> >> One other thing to note, though, is that if you're asking for the
> >> dataptr (and it isn't read-only) then you're basically stepping out of
> >> ALTREP space anyway, so it makes sense that you get a normally
> >> laid-out STRSXP (with its CHARSXP payload).
> >>
> >> Best,
> >> ~G
> >>
> >> On Thu, May 9, 2019 at 8:09 AM 介非王 <szwj...@gmail.com> wrote:
> >>
> >>> Hello from Bioconductor,
> >>>
> >>> I'm developing a package to share R objects across clusters using the
> >>> boost library. The concept is similar to the mmap package:
> >>> https://cran.r-project.org/web/packages/mmap/index.html . However, I
> >>> ran into a problem when trying to write the Dataptr_method for the
> >>> alternative string.
> >>>
> >>> Based on my understanding, the return value of the Dataptr_method
> >>> function should be a vector of CHARSXP pointers. This design might be
> >>> problematic in two ways:
> >>>
> >>> 1. The behavior of the Dataptr_method function is inconsistent
> >>> between string and the other ALTREP types. For the other types we
> >>> return a vector of pure data in memory allocated outside of R, but for
> >>> the string, we return a vector of R objects allocated by R.
> >>>
> >>> 2. It causes an unnecessary duplication of the data. In order to
> >>> return CHARSXPs to R, it forces me to allocate CHARSXPs and copy the
> >>> entire data to the R process. By contrast, for the other ALTREP types,
> >>> say altreal, I can just return the pointer to R if the data is in
> >>> memory.
> >>>
> >>> The same problem occurs for Elt_method as well but is less serious
> >>> since only one CHARSXP is allocated. Because my package is designed
> >>> for sharing a large R object, a memory allocation is undesirable,
> >>> especially when the data is only read by the code (e.g. the print
> >>> function). I'm not sure if there are any existing solutions in the
> >>> current R version, but I can imagine three workarounds:
> >>>
> >>> 1. Change the behavior of the R functions and use the get_element
> >>> function instead of the Dataptr function. This would make the problem
> >>> less memory-intensive but would still cause an allocation.
> >>>
> >>> 2. Return a vector of const char* in the Dataptr method. It would be
> >>> very efficient and consistent with the return values of the other
> >>> ALTREP types.
> >>>
> >>> 3. Provide an alternative CHARSXP. This might be the best solution
> >>> since STRSXP behaves more like a list than a string, so an
> >>> alternative CHARSXP fits the concept of ALTREP better.
> >>>
> >>> Since I'm not an expert in R, I might be posting a problem that is
> >>> already solved. I would be very happy to receive any suggestions
> >>> regarding this problem.
> >>>
> >>> Best,
> >>> Jiefei
> >>>
> >>> [[alternative HTML version deleted]]
> >>>
> >>> ______________________________________________
> >>> R-devel@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-devel
> >>>
> >>
> >
> >
>
> --
> Luke Tierney
> Ralph E. Wareham Professor of Mathematical Sciences
> University of Iowa                  Phone:             319-335-3386
> Department of Statistics and        Fax:               319-335-3017
>    Actuarial Science
> 241 Schaeffer Hall                  email:   luke-tier...@uiowa.edu
> Iowa City, IA 52242                 WWW:  http://www.stat.uiowa.edu

--
Jiefei Wang
Room 2-501, Tangxuan, Qilin Garden, Nanshan District, Shenzhen
Guangdong, China
Phone (+86)18312589584
szw...@gmail.com
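
For readers following the thread, the first workaround (fall back to per-element access when no data pointer is available) can be sketched in plain C. This is only a minimal model under stated assumptions, not R's API: `alt_strvec`, `packed_store`, `total_bytes`, `packed_dataptr_or_null` and `packed_elt` are hypothetical names standing in for an ALTREP string vector, its Dataptr_or_null method and its element method, and the elements here are plain `const char *` rather than CHARSXP objects.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an ALTREP-like string vector. NOT R's API. */
typedef struct alt_strvec {
    size_t length;
    /* Models Dataptr_or_null: returns a contiguous array of element
       pointers, or NULL if none exists without materializing one. */
    const char *const *(*dataptr_or_null)(const struct alt_strvec *x);
    /* Models the per-element accessor: returns element i without
       materializing the whole vector. */
    const char *(*elt)(const struct alt_strvec *x, size_t i);
    const void *payload; /* backing storage, owned elsewhere */
} alt_strvec;

/* A read-only consumer: total number of bytes over all elements.
   It asks for the pointer array first and falls back to element-wise
   access when the array is not available. */
static size_t total_bytes(const alt_strvec *x)
{
    size_t total = 0;
    const char *const *p = x->dataptr_or_null(x);
    if (p != NULL) {
        /* Fast path: a pointer array already exists. */
        for (size_t i = 0; i < x->length; i++)
            total += strlen(p[i]);
    } else {
        /* Fallback: one element at a time, no bulk allocation. */
        for (size_t i = 0; i < x->length; i++)
            total += strlen(x->elt(x, i));
    }
    return total;
}

/* Hypothetical backing store: fixed-width, NUL-padded records packed
   into one buffer (for example a shared-memory segment). */
typedef struct {
    const char *buf;
    size_t width;
} packed_store;

static const char *const *packed_dataptr_or_null(const alt_strvec *x)
{
    (void)x;
    return NULL; /* no pointer array exists; refuse to build one */
}

static const char *packed_elt(const alt_strvec *x, size_t i)
{
    const packed_store *s = (const packed_store *)x->payload;
    return s->buf + i * s->width; /* points into the shared buffer */
}
```

In this model a consumer such as `total_bytes` never forces the packed store to build a pointer array. In R itself the element accessor still has to return a CHARSXP, so, as noted in the thread, the fallback reduces the allocation to one element at a time rather than removing it.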