(this may be a duplicate post since I attached a file to the previous 
try...sorry about that)

Below are the first few lines of a zlib-compressed byte array written from Java 
with the Deflater class.  

> readBin("row_1",raw(),10000000)
   [1] 4c 45 50 e2 49 d5 86 bc 48 a1 32 5d 49 9d f5 90 48 e0 14 33 49 8f 54 6a 
49 77 c9 48 48 d9 ec 56 47 91 48 f0 47 25 56 ef 47 b8 f5 7b 46 35 25 00 47 73 
11 c5 48 6c 8e b9 47 ca 71 92 46 8d dc aa 45 92 0e

I’m trying to read it into R with Rcompression and I can't get it to work.  I 
think it may be because Java’s Deflater class can be told (see the nowrap 
parameter below) to write the data without the header and checksum.  I can't 
change the Java creation code.  I think uncompress() reads a zlib stream (with 
headers) and gunzip() reads a gzip stream (with headers).  Is there a way to 
read the payload without headers?  It is my understanding that the payload 
(minus the headers) is the same for gzip and zlib.  The Ruby thread at the 
bottom seems to be related.  Thanks for any help!

> compressedData = readBin("row_1",raw(),10000000)
> uncompress(compressedData)
Error in uncompress(compressedData) : corrupted compressed (gzip) source

> gunzip(compressedData)
Error in gunzip(compressedData) : 
  Failed to uncompress the raw data: (-3) incorrect header check
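
One quick way to test the missing-header theory is to look at the first two 
bytes. The sketch below is in Java only because that is where the data comes 
from; the class and method names are made up for illustration. It checks for 
the gzip magic bytes (1f 8b) and for the zlib header rule from RFC 1950 
(compression method 8 in the low nibble of the first byte, and the two header 
bytes together forming a multiple of 31). The dump above starts with 4c 45, 
which passes neither check, so both error messages are at least consistent 
with data that has no zlib or gzip wrapper.

```java
// Hypothetical helper, not part of Rcompression or java.util.zip.
final class CompressionSniffer {

    /** Guess the container format from the first two bytes of the data. */
    static String sniff(byte[] data) {
        if (data.length < 2) {
            return "too short";
        }
        int b0 = data[0] & 0xff;
        int b1 = data[1] & 0xff;

        // gzip streams start with the magic bytes 1f 8b.
        if (b0 == 0x1f && b1 == 0x8b) {
            return "gzip";
        }
        // zlib (RFC 1950): the low nibble of the first byte is 8 (deflate)
        // and the first two bytes, read as a 16-bit number, divide by 31.
        if ((b0 & 0x0f) == 8 && ((b0 << 8) | b1) % 31 == 0) {
            return "zlib";
        }
        return "no zlib or gzip header";
    }

    public static void main(String[] args) {
        // First four bytes of the readBin() dump shown in the post.
        byte[] posted = {0x4c, 0x45, 0x50, (byte) 0xe2};
        System.out.println(sniff(posted)); // prints "no zlib or gzip header"
    }
}
```

A "no header" result does not prove the bytes are a raw deflate stream; it 
only rules out the two wrapped formats that uncompress() and gunzip() expect.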

--------------

Java Deflater

public Deflater(int level, boolean nowrap)

Creates a new compressor using the specified compression level. If 'nowrap' 
is true then the ZLIB header and checksum fields will not be used in order to 
support the compression format used in both GZIP and PKZIP.

Parameters:
level - the compression level (0-9)
nowrap - if true then use GZIP compatible compression

http://java.sun.com/j2se/1.5.0/docs/api/java/util/zip/Deflater.html#Deflater(int, boolean)
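
To make the nowrap behaviour concrete, here is a minimal Java sketch of a 
round trip. It assumes the writer really does call Deflater(level, true), 
which is only my guess about the creation code, and the class and helper 
names are invented for the example. The matching reader is Inflater(true); 
with nowrap set to false the same helpers produce and read the zlib-wrapped 
form, which carries a 2-byte header and a 4-byte Adler-32 checksum around the 
deflate data.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical example class; only Deflater and Inflater are real JDK APIs.
final class RawDeflateRoundTrip {

    /** Compress input; nowrap=true omits the zlib header and checksum. */
    static byte[] deflate(byte[] input, boolean nowrap) {
        Deflater deflater = new Deflater(Deflater.DEFAULT_COMPRESSION, nowrap);
        deflater.setInput(input);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    /** Decompress data; nowrap must match the setting used by the writer. */
    static byte[] inflate(byte[] compressed, boolean nowrap) {
        // Older Inflater(boolean) javadoc says an extra dummy input byte may
        // be required with nowrap; it is not added in this sketch.
        Inflater inflater = new Inflater(nowrap);
        inflater.setInput(compressed);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[4096];
        try {
            while (!inflater.finished()) {
                int n = inflater.inflate(buf);
                if (n == 0 && (inflater.needsInput() || inflater.needsDictionary())) {
                    // Stop instead of looping forever on incomplete input.
                    throw new IllegalArgumentException("truncated or unsupported stream");
                }
                out.write(buf, 0, n);
            }
        } catch (DataFormatException e) {
            throw new IllegalArgumentException("not a valid stream for nowrap=" + nowrap, e);
        } finally {
            inflater.end();
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] data = "row data row data row data".getBytes(StandardCharsets.UTF_8);
        byte[] raw = deflate(data, true);      // headerless deflate data
        byte[] wrapped = deflate(data, false); // zlib header + data + checksum
        System.out.println("raw bytes: " + raw.length + ", wrapped bytes: " + wrapped.length);
        System.out.println("raw round trip ok: " + Arrays.equals(inflate(raw, true), data));
    }
}
```

If a file such as row_1 can be read with inflate(bytes, true) but not with 
inflate(bytes, false), that would support the idea that the R side needs a 
reader for headerless deflate data rather than for zlib or gzip.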

--------------

These threads also seem to be dealing with the same issue:

http://www.groupsrv.com/science/about98918.html

http://www.ruby-forum.com/topic/183400

The Ruby thread says “As could be seen in your first post, you are using 
-MAX_WBITS, which enables old (headerless? don't know what it's called) zlib 
format, that has no gzip header and no checksum. Maybe you should be using 
+MAX_WBITS (the default), which adds necessary header and checksum.”

Ben Stabler
Systems Analysis Group
Parsons Brinckerhoff
503.478.2859


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help