I don't know what exactly there could be at byte offset
2848 in some buffer, but files that bzip2 --test reports as fine
can't be processed by BZip2CompressorInputStream:
~
$
_IFL="/home/lbrtchx/cmllpz/LklWb/org/wikimedia/dumps/enwiki/20200920/enwiki-20200920-pages-articles-multistrea
The files decompress fine using Linux bzip2:
$ time bzip2 --decompress --verbose --keep
"enwiki-20200920-pages-articles-multistream1.xml-p1p41242.bz2"
enwiki-20200920-pages-articles-multistream1.xml-p1p41242.bz2: done
real 2m22.089s
user 2m6.664s
sys 0m7.184s
$ time bzip2 --decomp
user 128m4.964s
sys 1m9.108s
$ time bzip2 --decompress --verbose --keep "${_IFL}"
enwiki-latest-pages-articles.xml.bz2: done
real 147m59.737s
user 124m31.476s
sys 8m1.516s
$
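One possible cause, not confirmed anywhere in this thread: the file names say "multistream", and a multistream .bz2 file is several complete bzip2 streams written back to back. A decoder that handles only a single stream stops cleanly at the end of the first one, with no error, while the bzip2 command-line tool reads all of them. Commons Compress has a two-argument constructor, BZip2CompressorInputStream(InputStream, boolean decompressConcatenated), meant for concatenated input; whether passing true resolves the problem reported here is an assumption. The sketch below reproduces the effect with Python's standard bz2 module rather than with the Java class. The sample data and the helper decompress_all are made up for illustration.

```python
import bz2

# Two independently compressed bzip2 streams written back to back,
# which is how a "multistream" file is laid out (illustrative data only).
first = bz2.compress(b"header\n")
second = bz2.compress(b"page one\npage two\n")
multistream = first + second

# A single-stream decoder stops at the end of the first stream.
# It raises no error; the remaining bytes are simply left undecoded.
decoder = bz2.BZ2Decompressor()
partial = decoder.decompress(multistream)
leftover = decoder.unused_data


def decompress_all(data: bytes) -> bytes:
    """Decode every concatenated stream by restarting on the leftover bytes."""
    chunks = []
    while data:
        d = bz2.BZ2Decompressor()
        chunks.append(d.decompress(data))
        if not d.eof:
            # The input ended in the middle of a stream.
            raise EOFError("truncated bzip2 stream")
        data = d.unused_data
    return b"".join(chunks)


full = decompress_all(multistream)
```

Here partial holds only the first stream's contents and leftover holds the whole second stream, still compressed. That is the same shape as a reader that "just stops" partway through a file that bzip2 --test accepts.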
On 10/13/20, Albretch Mueller wrote:
As part of my corpora research work I have to work with such large
text files. Wikipedia dumps are bzip2 so I have been working with:
commons/compress/compressors/bzip2/BZip2CompressorInputStream.html
and I consistently notice that it just stops processing without an
error of any kind.
I che
On Sun, Jun 8, 2008 at 10:18 AM, Phil Steitz <[EMAIL PROTECTED]>
> It's probably best to take the discussion to the dev list.
~
Hi,
~
this thread started in [EMAIL PROTECTED] as
"commons.apache.org/math/stat/"
~
Formal need for a way to keep incremental statistics as part of the package:
~
If yo