On Wed, 19 Jun 2024 09:52:20 +0200, Jan van der Laan <rh...@eoos.dds.nl> writes:
> What is the status of supporting long vectors in data.frames (e.g.
> data.frames with more than 2^31 records)? Is this something that is
> being worked on? Is there a time line for this? Is this something I
> can contribute to?

Apologies if you've already received a better answer off-list. From my
limited understanding, the problem with supporting
larger-than-(2^31-1) dimensions has multiple facets:

- In many parts of R code, there's the assumption that dim() is of
  integer type. That wouldn't be a problem by itself, except...

- R currently lacks a native 64-bit integer type. About a year ago,
  Gabe Becker mentioned that Luke Tierney has been considering
  improvements in this direction, but it's hard to introduce 64-bit
  integers without making the user worry even more about data types
  (numeric != integer != 64-bit integer) or introducing a lot of
  overhead (64-bit integers are twice as large as 32-bit ones and,
  depending on the workload, frequently redundant).

- Two-dimensional objects eventually get transformed into matrices and
  handed to LAPACK for linear algebra operations. Currently, the
  interface R uses to talk to BLAS and LAPACK only supports 32-bit
  signed integers for lengths. 64-bit builds of BLAS and LAPACK do
  exist (e.g. OpenBLAS can be compiled with 64-bit lengths), but we
  haven't taught R to use them.

  (This isn't limited to array dimensions, by the way. If you try to
  svd() a 40000 by 40000 matrix, it will ask for temporary memory with
  a length that overflows a signed 32-bit integer, get a much shorter
  allocation instead, promptly overflow the buffer, and crash the
  process.)

As you can see, these issues are interconnected; work on any one of
them will involve the other two.

-- 
Best regards,
Ivan

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel