date:20131129

Re: [Rd] inflate zlib compressed data using base R or CRAN package?

2013-11-29 Thread Henrik Bengtsson

On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek
 wrote:
> On Nov 27, 2013, at 8:30 PM, Murray Stokely  wrote:
>
>> I think none of these examples describe a zlib compressed data block inside
>> a binary file, which is what the OP asked about: all of your examples rely
>> on prepended gzip or zip headers.
>>
>> Greg, is memDecompress what you are looking for?
>>
>
> I think so.
>
> But this is interesting: I think the documentation of
> memCompress/memDecompress is not quite correct and the parameters are
> misleading. Although it does mention the gzip headers, it is incorrect, since
> the zlib format is not a subset of the gzip format (albeit they use the same
> compression method), so you cannot extract gzip content using zlib
> decompression - you'll get internal error -3 in memDecompress(2) if you try
> it, since it expects the zlib header, which is different from the gzip one.

Interesting.  Just to make sure: are you 100% certain about this?
From http://svn.r-project.org/R/trunk/src/main/connections.c:

case 2: /* gzip */
{
    uLong inlen = LENGTH(from), outlen = 3*inlen;
    int res;
    Bytef *buf, *p = (Bytef *)RAW(from);
    /* we check for a file header */
    if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
    while(1) {
        buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
        res = uncompress(buf, &outlen, p, inlen);
        if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
        if(res == Z_OK) break;
        error("internal error %d in memDecompress(%d)", res, type);
    }
    ans = allocVector(RAWSXP, outlen);
    memcpy(RAW(ans), buf, outlen);
    break;
}

That code looks for the 0x1F 0x8B magic number, which is the one for
gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer].  Or are
you saying that that if statement is incorrect?  (Disclaimer: I don't
know much about gzip/zlib, but I happen to recognize that gzip magic
number.)
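You can check what that `if` leaves behind directly. After `p += 2`, zlib's `uncompress()` treats the next two bytes of the gzip stream (its CM byte 0x08 and FLG byte 0x00) as a zlib CMF/FLG header. The following is a minimal C sketch that assumes the RFC 1950 header rules. `is_zlib_header()` and `skip_gzip_magic()` are hypothetical helpers, not R or zlib functions, and the skip adds a length check that the quoted R code lacks. For the gzip bytes Simon dumps in his reply (1f 8b 08 00 ...), the skip leaves 08 00. Since 0x0800 % 31 = 2, the header check fails. zlib reports that kind of failure as Z_DATA_ERROR, which is -3, and that matches the "internal error -3" Simon mentions. A real zlib stream such as the memCompress() output (78 9c ...) passes, because 0x789c = 31 * 996.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: RFC 1950 check of a zlib header (CMF = p[0], FLG = p[1]).
   CM (low 4 bits of CMF) must be 8 (deflate), CINFO (high 4 bits) must be <= 7,
   and CMF*256 + FLG must be a multiple of 31 (the FCHECK rule). */
static int is_zlib_header(const unsigned char *p, size_t n)
{
    assert(p != NULL || n == 0);
    if (n < 2) return 0;
    unsigned cmf = p[0], flg = p[1];
    if ((cmf & 0x0fu) != 8u) return 0;   /* compression method is not deflate */
    if ((cmf >> 4) > 7u) return 0;       /* window larger than 32 KiB */
    return (cmf * 256u + flg) % 31u == 0u;
}

/* Hypothetical helper mirroring only the header skip in the quoted
   memDecompress() code: drop a leading 1f 8b and return what uncompress()
   would then read. Unlike the quoted code, it checks the length first. */
static const unsigned char *skip_gzip_magic(const unsigned char *p, size_t *n)
{
    assert(n != NULL);
    if (*n >= 2 && p[0] == 0x1f && p[1] == 0x8b) { p += 2; *n -= 2; }
    return p;
}
```

Calling `skip_gzip_magic()` on the 24-byte test.gz dump and passing the result to `is_zlib_header()` returns 0. Passing the 12-byte memCompress() output directly returns 1.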

/Henrik

> So “gzip” in type is a misnomer - it should say “zlib” since it can neither 
> read nor write the gzip format. Also the documentation should make it clear 
> since it’s pointless to try to use this on gzip contents. The better 
> alternative would be to support both gzip and zlib since R can deal with both 
> — the issue is that it will break code that used type=“gzip” explicitly to 
> mean “zlib” so I’m not sure there is a good way out.
>
> Cheers,
> Simon
>
>
>>
>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel  wrote:
>>
>>>
>>> On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
>>> |
>>> | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
>>> | | I have a binary file type that includes a zlib compressed data block (ie
>>> | | not gzip). Is anyone aware of a way using base R or a CRAN package to
>>> | | decompress this kind of data (from disk or memory). So far I have found
>>> | | Rcompression::decompress on omegahat, but I would prefer to keep
>>> | | dependencies on CRAN (or bioconductor). I am also trying to avoid
>>> | | writing yet another C level interface to part of zlib.
>>> |
>>> | Unless I am missing something, this is in base R; see help(connections).
>>> |
>>> | Here is a quick demo:
>>> |
>>> | R> write.csv(trees, file="/tmp/trees.csv")   # data we all have
>>> | R> system("gzip -v /tmp/trees.csv")   # as I am lazy here
>>> | /tmp/trees.csv:     50.5% -- replaced with /tmp/trees.csv.gz
>>> | R> read.csv(gzfile("/tmp/trees.csv.gz"))  # works out of the box
>>>
>>> Oh, and in case you meant zip file containing a data file, that also works.
>>>
>>> First converting what I did last
>>>
>>> edd@max:/tmp$ gunzip trees.csv.gz
>>> edd@max:/tmp$ zip trees.zip trees.csv
>>>  adding: trees.csv (deflated 50%)
>>> edd@max:/tmp$
>>>
>>> Then reading the csv from inside the zip file:
>>>
>>> R> read.csv(unz("/tmp/trees.zip", "trees.csv"))
>>>     X Girth Height Volume
>>> 1   1   8.3     70   10.3
>>> 2   2   8.6     65   10.3
>>> 3   3   8.8     63   10.2
>>> 4   4  10.5     72   16.4
>>> 5   5  10.7     81   18.8
>>> 6   6  10.8     83   19.7
>>> 7   7  11.0     66   15.6
>>> 8   8  11.0     75   18.2
>>> 9   9  11.1     80   22.6
>>> 10 10  11.2     75   19.9
>>> 11 11  11.3     79   24.2
>>> 12 12  11.4     76   21.0
>>> 13 13  11.4     76   21.4
>>> 14 14  11.7     69   21.3
>>> 15 15  12.0     75   19.1
>>> 16 16  12.9     74   22.2
>>> 17 17  12.9     85   33.8
>>> 18 18  13.3     86   27.4
>>> 19 19  13.7     71   25.7
>>> 20 20  13.8     64   24.9
>>> 21 21  14.0     78   34.5
>>> 22 22  14.2     80   31.7
>>> 23 23  14.5     74   36.3
>>> 24 24  16.0     72   38.3
>>> 25 25  16.3     77   42.6
>>> 26 26  17.3     81   55.4
>>> 27 27  17.5     82   55.7
>>> 28 28  17.9     80   58.3
>>> 29 29  18.0     80   51.5
>>> 30 30  18.0     80   51.0
>>> 31 31  20.6     87   77.0
>>> R>
>>>
>>> Regards, Dirk
>>>
>>> --
>>> Dirk Eddelbuettel | e...@debian.org | http://dirk.eddelbuettel.com
>>>

[Rd] How to catch warnings sent by arguments of s4 methods ?

2013-11-29 Thread Karl Forner

Hello,

I apologize if this has already been addressed. I have also submitted
this problem on SO:
http://stackoverflow.com/questions/20268021/how-to-catch-warnings-sent-during-s4-method-selection

Example code:
setGeneric('my_method', function(x) standardGeneric('my_method'))
setMethod('my_method', 'ANY', function(x) invisible())

withCallingHandlers(my_method(warning('argh')),
                    warning = function(w) stop('got warning: ', conditionMessage(w)))
# this does not catch the warning

It seems that the warnings emitted during the evaluation of the
arguments of S4 methods cannot be caught using
withCallingHandlers().

Is this expected? Is there a workaround?

Best,
Karl Forner

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] inflate zlib compressed data using base R or CRAN package?

2013-11-29 Thread Simon Urbanek

On Nov 29, 2013, at 4:37 AM, Henrik Bengtsson  wrote:

> On Thu, Nov 28, 2013 at 4:48 PM, Simon Urbanek
>  wrote:
>> On Nov 27, 2013, at 8:30 PM, Murray Stokely  wrote:
>> 
>>> I think none of these examples describe a zlib compressed data block inside
>>> a binary file, which is what the OP asked about: all of your examples rely
>>> on prepended gzip or zip headers.
>>> 
>>> Greg, is memDecompress what you are looking for?
>>> 
>> 
>> I think so.
>> 
>> But this is interesting: I think the documentation of
>> memCompress/memDecompress is not quite correct and the parameters are
>> misleading. Although it does mention the gzip headers, it is incorrect, since
>> the zlib format is not a subset of the gzip format (albeit they use the same
>> compression method), so you cannot extract gzip content using zlib
>> decompression - you'll get internal error -3 in memDecompress(2) if you try
>> it, since it expects the zlib header, which is different from the gzip one.
> 
> Interesting.  Just to make sure: are you 100% certain about this?

Yes, see below.

> From http://svn.r-project.org/R/trunk/src/main/connections.c:
> 
> case 2: /* gzip */
> {
>     uLong inlen = LENGTH(from), outlen = 3*inlen;
>     int res;
>     Bytef *buf, *p = (Bytef *)RAW(from);
>     /* we check for a file header */
>     if (p[0] == 0x1f && p[1] == 0x8b) { p += 2; inlen -= 2; }
>     while(1) {
>         buf = (Bytef *) R_alloc(outlen, sizeof(Bytef));
>         res = uncompress(buf, &outlen, p, inlen);
>         if(res == Z_BUF_ERROR) { outlen *= 2; continue; }
>         if(res == Z_OK) break;
>         error("internal error %d in memDecompress(%d)", res, type);
>     }
>     ans = allocVector(RAWSXP, outlen);
>     memcpy(RAW(ans), buf, outlen);
>     break;
> }
> 
> That code looks for the 0x1F 0x8B magic number, which is the one for
> gzip [http://www.gzip.org/zlib/rfc-gzip.html#header-trailer].  Or are
> you saying that that if statement is incorrect?  (Disclaimer: I don't
> know much about gzip/zlib, but I happen to recognize that gzip magic
> number.)
> 

The above assumes that zlib is a subset of gzip, which is *not* true - that was
the point I was making. zlib has *different* headers than gzip, not just fewer
bytes. gzip has lots of other things in the header, and the two even use
different checksums (CRC-32 in gzip, Adler-32 in zlib).
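You can see the different checksums in the `test.gz` and memCompress() dumps in the illustration that follows. Per RFC 1952, a gzip stream ends with a little-endian CRC-32 of the uncompressed data followed by the uncompressed length. Per RFC 1950, a zlib stream ends with a big-endian Adler-32 checksum. The following is a minimal, table-free C sketch of both checksums. The function names are hypothetical, and this is not zlib's own (table-driven) implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* CRC-32 as used in the gzip trailer (RFC 1952): reflected polynomial
   0xEDB88320, initial value and final XOR 0xFFFFFFFF. Bit-at-a-time for
   clarity; zlib itself uses lookup tables. */
static uint32_t crc32_bitwise(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= buf[i];
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Adler-32 as used in the zlib trailer (RFC 1950): two running sums mod 65521. */
static uint32_t adler32_simple(const unsigned char *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    uint32_t a = 1, b = 0;
    for (size_t i = 0; i < len; i++) {
        a = (a + buf[i]) % 65521u;
        b = (b + a) % 65521u;
    }
    return (b << 16) | a;
}

/* gzip stores its trailer fields little-endian; zlib stores Adler-32 big-endian. */
static uint32_t read_le32(const unsigned char *p)
{
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 | (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint32_t read_be32(const unsigned char *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 | (uint32_t)p[2] << 8 | (uint32_t)p[3];
}
```

For "1234", `adler32_simple()` gives 0x01f800cb, which is the last four bytes of the memCompress() output. `crc32_bitwise()` should match `read_le32()` of the bytes a3 e0 e3 9b in test.gz, and those bytes are followed by the length field 04 00 00 00.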

To illustrate:

> writeBin(charToRaw("1234"), f<-gzfile("test.gz","wb"))
> close(f)
> readBin("test.gz",raw(),100)
 [1] 1f 8b 08 00 00 00 00 00 00 03 33 34 32 36 01
[16] 00 a3 e0 e3 9b 04 00 00 00
> memCompress("1234")
 [1] 78 9c 33 34 32 36 01 00 01 f8 00 cb

As you can see, gzip uses a different header (it starts with 0x1f 0x8b but then
has many other fields like the modification time etc.) - the compressed payload
starts at byte 11, and the 8-byte trailer holds a 32-bit CRC followed by the
32-bit uncompressed size. In contrast, zlib has no magic number: it has just a
two-byte header followed by the payload (starting at byte 3) and a 32-bit
Adler-32 checksum. So the two are entirely incompatible - you cannot decompress
the gzip format with a zlib parser and vice versa. The payload is the same, but
the headers and trailers are entirely different. That's why Greg was
specifically asking about zlib, which does *not* mean gzip.
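Simon's layout description can be sketched as a small C splitter. It finds the deflate payload in either format and confirms that the two dumps above carry the same payload bytes (33 34 32 36 01 00). It is a sketch under stated assumptions, not a general parser. It accepts only gzip headers with FLG = 0, meaning no optional name, comment or extra fields, as in the test.gz dump. Files compressed by the gzip command-line tool usually store the file name and so have a longer header. It also accepts only zlib streams without a preset dictionary. `split_stream()`, `same_payload()` and `layout_t` are hypothetical names, and offsets are 0-based, so offset 10 is Simon's "byte 11".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types/names for this sketch. Offsets are 0-based. */
typedef enum { FMT_UNKNOWN, FMT_GZIP, FMT_ZLIB } fmt_t;

typedef struct {
    fmt_t  fmt;
    size_t payload_off;  /* where the raw deflate data starts */
    size_t payload_len;
    size_t trailer_len;  /* gzip: CRC-32 + ISIZE = 8; zlib: Adler-32 = 4 */
} layout_t;

/* Assumptions: gzip header with FLG == 0 (no FEXTRA/FNAME/FCOMMENT/FHCRC),
   zlib header without a preset dictionary (FDICT bit clear). */
static layout_t split_stream(const unsigned char *p, size_t n)
{
    layout_t L = { FMT_UNKNOWN, 0, 0, 0 };
    assert(p != NULL || n == 0);
    if (n >= 18 && p[0] == 0x1f && p[1] == 0x8b && p[2] == 8 && p[3] == 0) {
        /* ID1 ID2 CM FLG MTIME(4) XFL OS = 10 header bytes */
        L.fmt = FMT_GZIP;
        L.payload_off = 10;
        L.trailer_len = 8;
    } else if (n >= 6 && (p[0] & 0x0f) == 8 && (p[0] >> 4) <= 7 &&
               (p[1] & 0x20) == 0 && (p[0] * 256u + p[1]) % 31u == 0) {
        /* CMF FLG = 2 header bytes */
        L.fmt = FMT_ZLIB;
        L.payload_off = 2;
        L.trailer_len = 4;
    } else {
        return L;
    }
    L.payload_len = n - L.payload_off - L.trailer_len;
    return L;
}

/* 1 if both buffers were recognised and carry byte-identical deflate data. */
static int same_payload(const unsigned char *a, layout_t la,
                        const unsigned char *b, layout_t lb)
{
    return la.fmt != FMT_UNKNOWN && lb.fmt != FMT_UNKNOWN &&
           la.payload_len == lb.payload_len &&
           memcmp(a + la.payload_off, b + lb.payload_off, la.payload_len) == 0;
}
```

On the 24-byte test.gz dump, this reports a payload at offset 10 with an 8-byte trailer. On the 12-byte memCompress() output, it reports offset 2 with a 4-byte trailer, and `same_payload()` returns 1. For actual decompression, zlib's `inflateInit2()` auto-detects either header when 32 is added to `windowBits`. That is one way to implement the "support both gzip and zlib" option; it is not shown here.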

Cheers,
Simon

> /Henrik
> 
>> So “gzip” in type is a misnomer - it should say “zlib” since it can neither 
>> read nor write the gzip format. Also the documentation should make it clear 
>> since it’s pointless to try to use this on gzip contents. The better 
>> alternative would be to support both gzip and zlib since R can deal with 
>> both — the issue is that it will break code that used type=“gzip” explicitly 
>> to mean “zlib” so I’m not sure there is a good way out.
>> 
>> Cheers,
>> Simon
>> 
>> 
>>> 
>>> On Wed, Nov 27, 2013 at 5:22 PM, Dirk Eddelbuettel  wrote:
>>> 

 On 27 November 2013 at 18:38, Dirk Eddelbuettel wrote:
 |
 | On 27 November 2013 at 23:49, Dr Gregory Jefferis wrote:
 | | I have a binary file type that includes a zlib compressed data block (ie
 | | not gzip). Is anyone aware of a way using base R or a CRAN package to
 | | decompress this kind of data (from disk or memory). So far I have found
 | | Rcompression::decompress on omegahat, but I would prefer to keep
 | | dependencies on CRAN (or bioconductor). I am also trying to avoid
 | | writing yet another C level interface to part of zlib.
 |
 | Unless I am missing something, this is in base R; see help(connections).
 |
 | Here is a quick demo:
 |
 | R> write.csv(trees, file="/tmp/trees.csv")   # data we all have
 | R> system("gzip -v /tmp/trees.csv")   # as I am lazy here
 | /tmp/trees.csv:     50.5% -- replaced with /tmp/trees.csv.gz
 | R> read.csv(gzfile("/tmp/trees.csv.gz"))  # works out of the box

 Oh, and in case you meant zip file containing a data file, that also works.

 First converting what I did last

 edd@max:/tmp$ gunzi
