Re: [Rd] inflate zlib compressed data using base R or CRAN package?
On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek wrote:
> On Nov 27, 2013, at 8:30 PM, Murray Stokely wrote:
>
>> I think none of these examples describe a zlib compressed data block inside
>> a binary file that the OP asked about, as all of your examples are e.g.
>> prepending gzip or zip headers.
>>
>> Greg, is memDecompress what you are looking for?
>>
>
> I think so.
>
> But this is interesting - I think the documentation of
> memCompress/memDecompress is not quite correct and the parameters are
> misleading. Although it does mention the gzip headers, it is incorrect since
> zlib format is not a subset of the gzip format (albeit they use the same
> compression method), so you cannot extract gzip content using zlib
> decompression - you’ll get internal error -3 in memDecompress(2) if you try
> it since it expects the zlib header which is different from the gzip one.

Interesting. Just to make sure: are you 100% certain about this?
From http://svn.r-project.org/R/trunk/src/main/connections.c:

    case 2: /* gzip */
    {
        uLong inlen = LENGTH(from), outlen = 3*inlen;
        int res;
        Bytef *buf, *p = (Bytef *)RAW(from);
        /* we check for a file header */
        if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
        while(1) {
            buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
            res = uncompress(buf, &outlen, p, inlen);
            if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
            if(res == Z_OK) break;
            error("internal error %d in memDecompress(%d)", res, type);
        }
        ans = allocVector(RAWSXP, outlen);
        memcpy(RAW(ans), buf, outlen);
        break;
    }

That code looks for the 0x1F 0x8B magic number, which is the one for
gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer]. Or are
you saying that the if statement is incorrect? (Disclaimer: I don't
know much about gzip/zlib, but I happen to recognize that gzip magic
number.)

/Henrik

> So “gzip” in type is a misnomer - it should say “zlib” since it can neither
> read nor write the gzip format. Also the documentation should make it clear
> since it’s pointless to try to use this on gzip contents. The better
> alternative would be to support both gzip and zlib since R can deal with both
> - the issue is that it will break code that used type=“gzip” explicitly to
> mean “zlib” so I’m not sure there is a good way out.
>
> Cheers,
> Simon
>
>
>>
>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel wrote:
>>
>>>
>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>> |
>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>> | | I have a binary file type that includes a zlib compressed data block (ie
>>> | | not gzip). Is anyone aware of a way using base R or a CRAN package to
>>> | | decompress this kind of data (from disk or memory). So far I have found
>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
>>> | | writing yet another C level interface to part of zlib.
>>> |
>>> | Unless I am missing something, this is in base R; see help(connections).
>>> |
>>> | Here is a quick demo:
>>> |
>>> | R> write.csv(trees, file="/tmp/trees.csv")  # data we all have
>>> | R> system("gzip -v /tmp/trees.csv")         # as I am lazy here
>>> | /tmp/trees.csv:  50.5% -- replaced with /tmp/trees.csv.gz
>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))    # works out of the box
>>>
>>> Oh, and in case you meant zip file containing a data file, that also works.
>>>
>>> First converting what I did last
>>>
>>> edd@max:/tmp$ gunzip trees.csv.gz
>>> edd@max:/tmp$ zip trees.zip trees.csv
>>>   adding: trees.csv (deflated 50%)
>>> edd@max:/tmp$
>>>
>>> Then reading the csv from inside the zip file:
>>>
>>> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>>>     X Girth Height Volume
>>> 1   1   8.3     70   10.3
>>> 2   2   8.6     65   10.3
>>> 3   3   8.8     63   10.2
>>> 4   4  10.5     72   16.4
>>> 5   5  10.7     81   18.8
>>> 6   6  10.8     83   19.7
>>> 7   7  11.0     66   15.6
>>> 8   8  11.0     75   18.2
>>> 9   9  11.1     80   22.6
>>> 10 10  11.2     75   19.9
>>> 11 11  11.3     79   24.2
>>> 12 12  11.4     76   21.0
>>> 13 13  11.4     76   21.4
>>> 14 14  11.7     69   21.3
>>> 15 15  12.0     75   19.1
>>> 16 16  12.9     74   22.2
>>> 17 17  12.9     85   33.8
>>> 18 18  13.3     86   27.4
>>> 19 19  13.7     71   25.7
>>> 20 20  13.8     64   24.9
>>> 21 21  14.0     78   34.5
>>> 22 22  14.2     80   31.7
>>> 23 23  14.5     74   36.3
>>> 24 24  16.0     72   38.3
>>> 25 25  16.3     77   42.6
>>> 26 26  17.3     81   55.4
>>> 27 27  17.5     82   55.7
>>> 28 28  17.9     80   58.3
>>> 29 29  18.0     80   51.5
>>> 30 30  18.0     80   51.0
>>> 31 31  20.6     87   77.0
>>> R>
>>>
>>> Regards, Dirk
>>>
>>> --
>>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>>>
>>> _
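For anyone who wants to check the header question from R itself, here is a minimal sketch. It assumes that memCompress(type = "gzip") emits a zlib (RFC 1950) stream and that gzfile() writes a real gzip (RFC 1952) file. The helpers is_gzip() and is_zlib(), the "HDR0" prefix and the container layout are made up for illustration; they are not part of base R, nor of the file format the original poster has.

```r
# Classify a raw vector by its leading bytes.
# gzip (RFC 1952) starts with the magic number 0x1f 0x8b.
is_gzip <- function(x) {
  length(x) >= 2 && x[1] == as.raw(0x1f) && x[2] == as.raw(0x8b)
}

# zlib (RFC 1950) has no magic number, only a two-byte CMF/FLG header:
# the low nibble of CMF must be 8 (deflate) and CMF*256 + FLG must be
# a multiple of 31.
is_zlib <- function(x) {
  if (length(x) < 2) return(FALSE)
  cmf <- as.integer(x[1])
  flg <- as.integer(x[2])
  bitwAnd(cmf, 15L) == 8L && (cmf * 256L + flg) %% 31L == 0L
}

payload <- charToRaw("1234")

# Assumption: type = "gzip" produces a zlib stream, not a gzip file.
z <- memCompress(payload, type = "gzip")

# gzfile() writes a gzip file; read its bytes back uninterpreted.
gz_path <- tempfile(fileext = ".gz")
con <- gzfile(gz_path, "wb")
writeBin(payload, con)
close(con)
g <- readBin(gz_path, raw(), n = file.size(gz_path))

# Hypothetical container: 4 bytes of made-up header, then the zlib block.
bin_path <- tempfile(fileext = ".bin")
writeBin(c(charToRaw("HDR0"), z), bin_path)
bytes <- readBin(bin_path, raw(), n = file.size(bin_path))
block <- bytes[-(1:4)]                       # skip the made-up header
restored <- rawToChar(memDecompress(block, type = "gzip"))

print(z)
print(g)
print(restored)
```

Note that is_zlib() only looks at two bytes, so it can give false positives on arbitrary binary data. The sketch also avoids calling memDecompress() on gzip input, since that is exactly the case under dispute here.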
[Rd] How to catch warnings sent by arguments of s4 methods ?
Hello,

I apologize if this has already been addressed. I also submitted this problem on SO:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection

Example code:

setGeneric('my_method', function(x) standardGeneric('my_method'))
setMethod('my_method', 'ANY', function(x) invisible())

withCallingHandlers(my_method(warning('argh')),
    warning = function(w) { stop('got warning:', w) })
# this does not catch the warning

It seems that warnings emitted while the arguments of S4 methods are being evaluated cannot be caught using withCallingHandlers().

Is this expected? Is there a work-around?

Best,
Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
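One possible workaround, offered as a sketch rather than a confirmed answer to the question: evaluate the argument in an ordinary context first, so that any warning is signalled before S4 dispatch starts, and only then call the generic. The names arg and result are illustrative only, and whether the original one-line call is caught may depend on the R version.

```r
library(methods)

setGeneric("my_method", function(x) standardGeneric("my_method"))
setMethod("my_method", "ANY", function(x) invisible())

result <- tryCatch(
  withCallingHandlers(
    {
      # Evaluate the argument here, outside method selection, so the
      # calling handler below sees the warning in the usual way.
      arg <- warning("argh")
      my_method(arg)
    },
    warning = function(w) stop("got warning: ", conditionMessage(w))
  ),
  # Turn the error raised by the handler into a plain string for display.
  error = function(e) conditionMessage(e)
)

print(result)
```

The outer tryCatch() is only there so the sketch runs to completion; in real code the stop() inside the handler would simply propagate.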
Re: [Rd] inflate zlib compressed data using base R or CRAN package?
On Nov 29, 2013, at 4:37 AM, Henrik Bengtsson wrote:

> On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek wrote:
>> On Nov 27, 2013, at 8:30 PM, Murray Stokely wrote:
>>
>>> I think none of these examples describe a zlib compressed data block inside
>>> a binary file that the OP asked about, as all of your examples are e.g.
>>> prepending gzip or zip headers.
>>>
>>> Greg, is memDecompress what you are looking for?
>>>
>>
>> I think so.
>>
>> But this is interesting - I think the documentation of
>> memCompress/memDecompress is not quite correct and the parameters are
>> misleading. Although it does mention the gzip headers, it is incorrect since
>> zlib format is not a subset of the gzip format (albeit they use the same
>> compression method), so you cannot extract gzip content using zlib
>> decompression - you’ll get internal error -3 in memDecompress(2) if you try
>> it since it expects the zlib header which is different from the gzip one.
>
> Interesting. Just to make sure: are you 100% certain about this?

Yes, see below.

> From http://svn.r-project.org/R/trunk/src/main/connections.c:
>
>    case 2: /* gzip */
>    {
>        uLong inlen = LENGTH(from), outlen = 3*inlen;
>        int res;
>        Bytef *buf, *p = (Bytef *)RAW(from);
>        /* we check for a file header */
>        if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
>        while(1) {
>            buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
>            res = uncompress(buf, &outlen, p, inlen);
>            if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
>            if(res == Z_OK) break;
>            error("internal error %d in memDecompress(%d)", res, type);
>        }
>        ans = allocVector(RAWSXP, outlen);
>        memcpy(RAW(ans), buf, outlen);
>        break;
>    }
>
> That code looks for the 0x1F 0x8B magic number, which is the one for
> gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer]. Or are
> you saying that the if statement is incorrect? (Disclaimer: I don't
> know much about gzip/zlib, but I happen to recognize that gzip magic
> number.)
>

The above assumes that zlib is a subset of gzip, which is *not* true - that was the point I was making. zlib has *different* headers than gzip, not just fewer bytes. gzip has lots of other things in the header, and the two even use different checksum methods. To illustrate:

> writeBin(charToRaw("1234"), f<-gzfile("test.gz","wb"))
> close(f)
> readBin("test.gz",raw(),100)
 [1] 1f 8b 08 00 00 00 00 00 00 03 33 34 32 36 01
[16] 00 a3 e0 e3 9b 04 00 00 00
> memCompress("1234")
 [1] 78 9c 33 34 32 36 01 00 01 f8 00 cb

As you can see, gzip uses a different header (it starts with 0x1f 0x8b but then has several other fields such as the modification time) - the compressed payload starts at byte 11 and the trailer is 8 bytes wide: a CRC-32 followed by the 32-bit uncompressed length. In contrast, zlib has no magic number, just a two-byte header followed by the payload (starting at byte 3) and a 32-bit Adler-32 checksum. So the two are entirely incompatible - you cannot decompress the gzip format with a zlib parser and vice versa. The payload is the same, but the headers and trailers are entirely different. That's why Greg was specifically asking about zlib, which does *not* mean gzip.

Cheers,
Simon

> /Henrik
>
>> So “gzip” in type is a misnomer - it should say “zlib” since it can neither
>> read nor write the gzip format. Also the documentation should make it clear
>> since it’s pointless to try to use this on gzip contents. The better
>> alternative would be to support both gzip and zlib since R can deal with
>> both - the issue is that it will break code that used type=“gzip” explicitly
>> to mean “zlib” so I’m not sure there is a good way out.
>>
>> Cheers,
>> Simon
>>
>>
>>>
>>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel wrote:
>>>
>>>>
>>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>>> |
>>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>>> | | I have a binary file type that includes a zlib compressed data block (ie
>>>> | | not gzip). Is anyone aware of a way using base R or a CRAN package to
>>>> | | decompress this kind of data (from disk or memory). So far I have found
>>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>>> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
>>>> | | writing yet another C level interface to part of zlib.
>>>> |
>>>> | Unless I am missing something, this is in base R; see help(connections).
>>>> |
>>>> | Here is a quick demo:
>>>> |
>>>> | R> write.csv(trees, file="/tmp/trees.csv")  # data we all have
>>>> | R> system("gzip -v /tmp/trees.csv")         # as I am lazy here
>>>> | /tmp/trees.csv:  50.5% -- replaced with /tmp/trees.csv.gz
>>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))    # works out of the box
>>>>
>>>> Oh, and in case you meant zip file containing a data file, that also works.
>>>>
>>>> First converting what I did last
>>>>
>>>> edd@max:/tmp$ gunzi