Prof Brian Ripley wrote:
>> I have zlib compressed strings (example is attached)
>
> What is that file?  Not gzip compression:
>
> gannet% file compressed.txt
> compressed.txt: ASCII text, with very long lines
>
> since gzip uses a magic header that 'file' knows about.  And even if
> the header was stripped, such files are 8-bit and yours is ASCII.
> Try
>> x <- 'Johannes Graumann'
>> xx <- charToRaw(x)
>> xxx <- memCompress(xx, "g")
>> rawToChar(xxx)
> [1] "x\x9c\xf3\xca\xcfH\xcc\xcbK-Vp/J,\xcd\0052\001:\n\006\x90"
>
> to see what a real gzipped string looks like.
>
>> and would like to decompress them using memDecompress ...
>>
>> I try this:
>>> connection <- file("compressed.txt","r")
>>> compressed <- readLines(connection)

I am dealing with mass spectrometric data in an XML file format
(mzXML). Most of the data consists of the actual mass spectra, which
are base64 encoded and optionally compressed using zlib
(http://zlib.net), which saves quite a lot of storage space. When they
are compressed, I just get an XML node that looks like this:

<peaks>CONTENT OF THE ORIGINAL ATTACHMENT HERE</peaks>

I would like to be able to decompress that string and thought that
memDecompress was the right tool to do so ...
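Because the <peaks> content is base64 text, which is why 'file' reports plain ASCII, it has to be base64-decoded into raw bytes before memDecompress can see a zlib stream. A minimal sketch in base R, assuming standard base64 (the RFC 4648 alphabet with '=' padding) wrapped around zlib-format data of the kind memCompress(type = "gzip") produces. base64_to_raw(), raw_to_base64() and decode_peaks() are hypothetical helpers written for this example, not functions from base R or from any mzXML package.

```r
# Hypothetical helpers: base R has no built-in base64 decoder.
b64_alphabet <- c(LETTERS, letters, 0:9, "+", "/")

# base64 text -> raw vector
base64_to_raw <- function(s) {
  chars <- strsplit(s, "")[[1]]
  # 6-bit value of each character; '=' padding and whitespace map to NA
  vals <- match(chars, b64_alphabet) - 1L
  vals <- vals[!is.na(vals)]
  # six bits per character, most significant bit first
  bits <- as.vector(sapply(vals, function(v) as.integer(intToBits(v))[6:1]))
  # keep only whole bytes; leftover bits are padding
  bits <- bits[seq_len(length(bits) %/% 8L * 8L)]
  # one column per byte, most significant bit in the first row
  as.raw(colSums(matrix(bits, nrow = 8L) * c(128, 64, 32, 16, 8, 4, 2, 1)))
}

# raw vector -> base64 text (only used here to build test input)
raw_to_base64 <- function(r) {
  bits <- as.vector(sapply(as.integer(r), function(b) as.integer(intToBits(b))[8:1]))
  # zero-fill to a multiple of 6 bits, then read 6 bits per character
  bits <- c(bits, rep(0L, (6L - length(bits) %% 6L) %% 6L))
  vals <- colSums(matrix(bits, nrow = 6L) * c(32, 16, 8, 4, 2, 1))
  pad <- (3L - length(r) %% 3L) %% 3L
  paste0(paste(b64_alphabet[vals + 1], collapse = ""), strrep("=", pad))
}

# text between <peaks> and </peaks> -> decompressed raw bytes
decode_peaks <- function(peaks_text) {
  # type = "gzip" is the zlib format that memCompress(type = "gzip") writes
  memDecompress(base64_to_raw(peaks_text), type = "gzip")
}

# Round trip using the string from Brian Ripley's example
b64 <- raw_to_base64(memCompress(charToRaw("Johannes Graumann"), "gzip"))
rawToChar(decode_peaks(b64))
```

The per-character sapply() is slow for large spectra; a package decoder such as base64enc::base64decode() does the same job in compiled code. For real spectra, decode_peaks() returns binary numbers rather than text. If the <peaks> attributes say precision="32" and byteOrder="network", something like readBin(bytes, what = "double", size = 4, n = length(bytes) / 4, endian = "big") should turn them into m/z and intensity values. Check the attributes in your own file before relying on this.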
> You have not told us the 'at a minimum' information requested in the
> posting guide.  But you should not expect that to read a binary file,
> especially not in a MBCS locale.  We have readBin for that purpose.

I'm actually reading this in as a string from the XML file ...

>>> memDecompress(as.raw(compressed),type="g")
>
> I don't think you know what as.raw does: it does not convert bytes in
> a character string to raw (for which you need charToRaw).
>
> It is always a good idea to look at each stage of your computation:
>
>> as.raw(compressed)
>  [1] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
> [26] 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00

Yup, that was plain stupid: I was just trying to get memDecompress to
run at all (handing it the character string also resulted in an error).

> sessionInfo()
R version 2.10.1 (2009-12-14)
x86_64-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=en_US.UTF-8
 [9] LC_ADDRESS=en_US.UTF-8     LC_TELEPHONE=en_US.UTF-8
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] rkward_0.5.1

loaded via a namespace (and not attached):
[1] tools_2.10.1

Thanks for any further hints,

Joh
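The as.raw versus charToRaw distinction Ripley points out can be checked stage by stage, as he suggests. A small sketch: the string "eJzz" is a hypothetical stand-in for the start of base64-encoded peaks text. It is the base64 encoding of the bytes 78 9c f3, the same first bytes as Ripley's memCompress example.

```r
# Hypothetical start of a <peaks> payload (base64 text, i.e. plain ASCII)
s <- "eJzz"

# as.raw() tries to read the string as a *number*; that fails, so the
# result is a single 00 byte (with coercion warnings), not the string's bytes
as.raw(s)

# charToRaw() returns the bytes of the characters themselves:
# 65 4a 7a 7a, the ASCII codes of "e", "J", "z", "z"
charToRaw(s)

# Those ASCII bytes are still base64 text, not a zlib stream, so
# memDecompress() rejects them; the text must be base64-decoded first
res <- try(memDecompress(charToRaw(s), type = "gzip"), silent = TRUE)
inherits(res, "try-error")
```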