I was able to reproduce this bug. After some investigation, it's clearly localized to gztell (a zlib function) and the z_off_t type. However, there may be a broader cross-compiling problem. I don't know what procedure Brandon used to compile the 32-bit version (I used the gcc -m32 flag), but we should be sure that we're doing this correctly (and document it!) before going on a goose chase. The real issue may or may not be related to zlib; it may only manifest there. My findings are discussed below.
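Because the failure hinges on the width of z_off_t, one quick sanity check for any 32-bit build procedure is whether the z_off_t seen by the caller (R's connections.c) has the same width as the z_off_t zlib itself was compiled with. zlib reports the latter through zlibCompileFlags(), whose bits 7..6 encode the z_off_t size (00 = 16 bits, 01 = 32, 10 = 64, 11 = other, per zlib.h). What follows is a minimal sketch, assuming a zlib recent enough to provide zlibCompileFlags(). The helper names are hypothetical, and the decoding is kept apart from the zlib call so it can be exercised without linking zlib:

```c
#include <stddef.h>

/* Hypothetical helper: decode the z_off_t size field (bits 7..6) of a
 * value returned by zlib's zlibCompileFlags().
 * Encoding per zlib.h: 00 = 16 bits, 01 = 32, 10 = 64, 11 = other.
 * Returns the width in bits, or 0 for "other". */
static unsigned zoff_bits_from_flags(unsigned long flags)
{
    switch ((flags >> 6) & 3UL) {
    case 0:  return 16;
    case 1:  return 32;
    case 2:  return 64;
    default: return 0;
    }
}

/* Hypothetical helper: nonzero if the caller's sizeof(z_off_t), as seen
 * when compiling the caller, matches the width zlib was built with. */
static int zoff_sizes_agree(size_t caller_sizeof_zoff, unsigned long flags)
{
    return caller_sizeof_zoff * 8 == zoff_bits_from_flags(flags);
}
```

In the calling code (after `#include <zlib.h>`, compiled with the same flags such as -m32, and linked with -lz), `zoff_sizes_agree(sizeof(z_off_t), zlibCompileFlags())` returning 0 would suggest that the headers and the library disagree about z_off_t. That would be a build problem rather than a bug in gztell itself.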
-Matt

I checked that R's file function was recognizing the gzip file as such, so that's not the problem. I next modified some code in gzfile_seek, just above and below the call to gztell (line 1230 of connections.c), and defined a small function, z_off_t_print, to print the bits of the z_off_t offset starting from the least significant bit:

static void z_off_t_print(z_off_t u)
{
    /* mask is signed, so the loop ends once the shift reaches the
       sign bit: only the low 63 bits are printed */
    z_off_t mask = 1;
    while (mask > 0) {
        printf("%d", (mask & u) != 0);
        mask <<= 1;
    }
    printf("\n");
}

static double gzfile_seek(Rconnection con, double where, int origin, int rw)
{
    gzFile fp = ((Rgzfileconn)(con->private))->fp;

    /** begin modified code **/
    z_off_t pos = 0;  /* initialized so the "before" print is well defined */
    printf("sizeof(z_off_t): %u\n", (unsigned) sizeof(z_off_t));
    printf("sizeof(double): %u\n", (unsigned) sizeof(double));
    printf("before gztell():\n");
    z_off_t_print(pos);
    pos = gztell(fp);
    printf("after gztell():\n");
    z_off_t_print(pos);
    printf("(double) pos: %f\n", (double) pos);
    /** end modified code **/
    ...

Here's what happens when I run code similar to yours in the 32-bit build:

> zz <- gzfile("ex.gz", "w")  # compressed file
> cat("TITLE extra line", "2 3 5 7",
+ "", "11 13 17", file = zz, sep = "\n")
> close(zz)
> blah = file("ex.gz", "r")
> seek(blah, 5)
sizeof(z_off_t): 8
sizeof(double): 8
before gztell():
000000000000000000000000000000000000000000000000000000000000000
after gztell():
000000000000000000000000000000000000110000111011110111001001000
(double) pos: 665367468683821056.000000
[1] 6.653675e+17
> seek(blah)
before gztell():
000000000000000000000000000000000000000000000000000000000000000
after gztell():
101000000000000000000000000000000000110000111011110111001001000
(double) pos: 665367468683821056.000000
[1] 6.653675e+17

Hence, gztell is doing what we expect in the least significant 32 bits (101, read least significant bit first, is decimal 5), but returns junk in the most significant 32 bits.
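The 32-bit numbers can be checked by hand. The printed value 665367468683821056 is exactly 154917936 * 2^32. That means a zero low word (the position before seek(blah, 5)) sits under a junk high word. The second call holds the same junk plus 5. Converting to double rounds it back to the identical number, because doubles between 2^59 and 2^60 are spaced 128 apart. That is why both calls print 6.653675e+17. Note also that z_off_t_print never reaches bit 63, since its signed mask stops at the sign bit, so the bit strings have 63 digits. The sketch below is a minimal, hypothetical alternative (the names are illustrative, not from connections.c). It uses an unsigned 64-bit value so that every bit, including the top one, is visited with well-defined shifts, and it splits an offset into its 32-bit halves:

```c
#include <stdint.h>

/* Hypothetical helper: write all 64 bits of v into out, least
 * significant bit first (same order as z_off_t_print). Shifting an
 * unsigned value is well defined for every bit position.
 * out must hold at least 65 chars; the result is NUL-terminated. */
static void bits_lsb_first(uint64_t v, char out[65])
{
    for (int i = 0; i < 64; i++)
        out[i] = ((v >> i) & 1u) ? '1' : '0';
    out[64] = '\0';
}

/* Hypothetical helpers: low and high 32-bit halves of a 64-bit offset. */
static uint32_t low_word(uint64_t v)  { return (uint32_t) v; }
static uint32_t high_word(uint64_t v) { return (uint32_t) (v >> 32); }
```

With pos cast to uint64_t, high_word(pos) would be 154917936 for both 32-bit calls above and 0 for both 64-bit calls, while low_word(pos) gives 0 and then 5 in each build.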
Here are the results for the 64-bit build:

> zz <- gzfile("ex.gz", "w")  # compressed file
> cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> close(zz)
> blah = file("ex.gz", "r")
> seek(blah, 5)
sizeof(z_off_t): 8
sizeof(double): 8
before gztell():
000000000000000000000000000000000000000000000000000000000000000
after gztell():
000000000000000000000000000000000000000000000000000000000000000
(double) pos: 0.000000
[1] 0
> seek(blah)
before gztell():
000000000000000000000000000000000000000000000000000000000000000
after gztell():
101000000000000000000000000000000000000000000000000000000000000
(double) pos: 5.000000
[1] 5

No problems with the 64-bit build.

On Tue, 2010-06-22 at 13:04 -0400, Brandon Whitcher wrote:
> I have installed both 32-bit and 64-bit versions of R2.12.0 (2010-06-15
> r52300) on my Ubuntu 10.04 64-bit system. I observe the following behavior
> when running the examples from base::connections. There appears to be a
> problem with seek() on a .gz file when using a 32-bit installation of
> R2.12.0, but the problem doesn't appear in the 64-bit installation. I
> realize that seek() has been difficult in the past, and I don't want to open
> old wounds, but is this a known problem? Is this easily fixable? I have a
> package that relies on seek() when accessing gzipped files.
>
> Using the 32-bit installation...
>
> > zz <- file("ex.data", "w")  # open an output file connection
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > cat("One more line\n", file = zz)
> > close(zz)
> > blah = file("ex.data", "r")
> > seek(blah)
> [1] 0
> >
> > zz <- gzfile("ex.gz", "w")  # compressed file
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > close(zz)
> > blah = file("ex.gz", "r")
> > seek(blah)
> [1] 7.80707e+17
> >
> > zz <- bzfile("ex.bz2", "w")  # bzip2-ed file
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > close(zz)
> > blah = file("ex.bz2", "r")
> > seek(blah)
> Error in seek.connection(blah) : 'seek' not enabled for this connection
>
> Using the 64-bit installation...
>
> > zz <- file("ex.data", "w")  # open an output file connection
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > cat("One more line\n", file = zz)
> > close(zz)
> > blah = file("ex.data", "r")
> > seek(blah)
> [1] 0
> >
> > zz <- gzfile("ex.gz", "w")  # compressed file
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > close(zz)
> > blah = file("ex.gz", "r")
> > seek(blah)
> [1] 0
> >
> > zz <- bzfile("ex.bz2", "w")  # bzip2-ed file
> > cat("TITLE extra line", "2 3 5 7", "", "11 13 17", file = zz, sep = "\n")
> > close(zz)
> > blah = file("ex.bz2", "r")
> > seek(blah)
> Error in seek.connection(blah) : 'seek' not enabled for this connection
>
> thanks,
>
> Brandon
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Matthew S. Shotwell
Graduate Student
Division of Biostatistics and Epidemiology
Medical University of South Carolina
http://biostatmatt.com