Re: [compress] BZip2CompressorInputStream stops working without rhyme or reason ...

2020-10-16 Thread Peter Lee
Hi Albretch, this seems better suited to discussion in JIRA: https://issues.apache.org/jira/projects/COMPRESS/issues

Re: [compress] BZip2CompressorInputStream stops working without rhyme or reason ...

2020-10-14 Thread Albretch Mueller
I don't know what exactly there could be at byte offset 2848 in some buffer, but files reported to be fine by bzip2 --test can't be processed by BZip2CompressorInputStream: ~ $ _IFL="/home/lbrtchx/cmllpz/LklWb/org/wikimedia/dumps/enwiki/20200920/enwiki-20200920-pages-articles-multistrea

Re: [compress] BZip2CompressorInputStream stops working without rhyme or reason ...

2020-10-14 Thread Albretch Mueller
The files decompress fine using Linux bzip2: $ time bzip2 --decompress --verbose --keep "enwiki-20200920-pages-articles-multistream1.xml-p1p41242.bz2" enwiki-20200920-pages-articles-multistream1.xml-p1p41242.bz2: done real 2m22.089s user 2m6.664s sys 0m7.184s $ time bzip2 --decomp

Re: [compress] BZip2CompressorInputStream stops working without rhyme or reason ...

2020-10-13 Thread Albretch Mueller
$ _IFL="enwiki-20141008-pages-articles-multistream.xml.bz2" $ time bzip2 --test --verbose "${_IFL}" enwiki-20141008-pages-articles-multistream.xml.bz2: ok real 93m51.202s user 92m31.600s sys 0m35.188s $ time bzip2 --decompress --verbose --keep "${_IFL}" enwiki-20141008-pages-articl

[compress] BZip2CompressorInputStream stops working without rhyme or reason ...

2020-10-13 Thread Albretch Mueller
As part of my corpora research work I have to work with very large text files. Wikipedia dumps are bzip2-compressed, so I have been working with: commons/compress/compressors/bzip2/BZip2CompressorInputStream.html and I consistently notice that it just stops processing without an error of any kind. I che
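
Editorial sketch, not part of the thread: one possible explanation for a decoder that stops early with no error, while bzip2 --test reports the file as fine, is that "multistream" dumps are several bzip2 streams concatenated into one file. This is an assumption; the snippets above do not confirm it. The Python standard-library example below (the data and helper names are made up for illustration) shows a single-stream decoder ending quietly after the first stream, and a loop that continues into the following ones. In Commons Compress, the corresponding switch is the two-argument constructor BZip2CompressorInputStream(InputStream in, boolean decompressConcatenated); the one-argument constructor reads only the first stream.

```python
import bz2

# Two independent bzip2 streams glued together, mimicking a "multistream" file.
# The payloads are illustrative placeholders, not real dump content.
first = bz2.compress(b"first stream\n")
second = bz2.compress(b"second stream\n")
multistream = first + second


def decode_first_stream_only(data: bytes) -> bytes:
    """Decode like a single-stream reader: stop at the first end-of-stream marker."""
    decoder = bz2.BZ2Decompressor()
    out = decoder.decompress(data)
    # No exception is raised here: decoder.eof is True and the remaining
    # compressed bytes are simply left in decoder.unused_data.
    return out


def decode_all_streams(data: bytes) -> bytes:
    """Decode every concatenated stream by restarting on the leftover bytes."""
    chunks = []
    while data:
        decoder = bz2.BZ2Decompressor()
        chunks.append(decoder.decompress(data))
        if not decoder.eof:
            raise EOFError("compressed data ended before the end-of-stream marker")
        data = decoder.unused_data
    return b"".join(chunks)


if __name__ == "__main__":
    print(decode_first_stream_only(multistream))
    print(decode_all_streams(multistream))
```

If this is the cause, output that ends exactly at a stream boundary, with no exception, is the expected symptom of single-stream decoding.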