[ 
https://issues.apache.org/jira/browse/TIKA-3110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17134328#comment-17134328
 ] 

Stefan Bodewig commented on TIKA-3110:
--------------------------------------

The short answer is: yes.

The longer version: don't assume anything when dealing with archiving formats 
that have been around for decades. If only there was *the* tar format. :)

I will avoid the terms block and record as GNU tar and BSD tar seem to use them 
differently.

Traditionally the tar format contains chunks of 512 bytes of data and groups 20 
such chunks into a larger unit, likely because it could be written to a tape 
more easily if you wrote this bigger amount of data. Back then all tar archives 
would consist of 10 KiB units and the archive would be padded with 0s to make 
it reach a multiple of 10 KiB if the last entry didn't fill the unit entirely.

A lot of dialects spawned. Some tar tools will not fill the last unit. To make 
things worse, tar archives are supposed to signal EOF by two 512 byte chunks of 
zeros. Some archivers create such markers, others only add one chunk, others 
don't do either.

consumeRemainderOfLastBlock tries to consume the whole 10 KiB unit it is 
looking at and if the stream ends prematurely, well, then it has probably been 
created by an archiver that didn't care and we won't complain.
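
To make the lenient behaviour concrete, here is a minimal sketch in plain 
java.io. It is not the Commons Compress code: the class TolerantTarTail and 
its methods are hypothetical names made up for illustration, and the unit size 
is passed in as a parameter because tools differ in how they group chunks. 
It counts how many all-zero EOF chunks follow the last entry (0, 1 or 2) and 
skips the rest of the current unit without treating an early end of stream as 
an error.

```java
import java.io.IOException;
import java.io.InputStream;

/** Hypothetical helper for illustration only, not part of Commons Compress. */
final class TolerantTarTail {

    /** Size of a single tar chunk in bytes. */
    static final int CHUNK_SIZE = 512;

    private TolerantTarTail() {
    }

    /** Reads up to len bytes, stopping early only at end of stream. */
    static int readUpTo(InputStream in, byte[] buf, int len) throws IOException {
        int total = 0;
        while (total < len) {
            int n = in.read(buf, total, len - total);
            if (n < 0) {
                break; // end of stream reached before len bytes were read
            }
            total += n;
        }
        return total;
    }

    /** Returns true if every byte of the chunk is zero. */
    static boolean isZeroChunk(byte[] chunk) {
        for (byte b : chunk) {
            if (b != 0) {
                return false;
            }
        }
        return true;
    }

    /**
     * Counts the all-zero EOF chunks at the current position: 2 for a
     * well-formed archive, 1 or 0 for archivers that write fewer markers.
     */
    static int countEofChunks(InputStream in) throws IOException {
        byte[] chunk = new byte[CHUNK_SIZE];
        int zeroChunks = 0;
        while (zeroChunks < 2
                && readUpTo(in, chunk, CHUNK_SIZE) == CHUNK_SIZE
                && isZeroChunk(chunk)) {
            zeroChunks++;
        }
        return zeroChunks;
    }

    /**
     * Skips what is left of the current unit. A short count in the return
     * value means the stream ended prematurely, which is accepted silently.
     */
    static long consumeRemainderOfUnit(InputStream in, long bytesReadSoFar, int unitSize)
            throws IOException {
        long offsetInUnit = bytesReadSoFar % unitSize;
        if (offsetInUnit == 0) {
            return 0; // already at a unit boundary, nothing to skip
        }
        long remaining = unitSize - offsetInUnit;
        byte[] buf = new byte[CHUNK_SIZE];
        long skipped = 0;
        while (skipped < remaining) {
            int want = (int) Math.min(buf.length, remaining - skipped);
            int n = readUpTo(in, buf, want);
            skipped += n;
            if (n < want) {
                break; // archive was not padded to a full unit
            }
        }
        return skipped;
    }
}
```

A caller would invoke countEofChunks once no further entry header is found and 
then consumeRemainderOfUnit with the number of bytes read so far. Neither 
method throws on a short archive, which mirrors the "we won't complain" 
approach described above.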

 

> cannot extract metadata from 7z .tar archive
> --------------------------------------------
>
>                 Key: TIKA-3110
>                 URL: https://issues.apache.org/jira/browse/TIKA-3110
>             Project: Tika
>          Issue Type: Bug
>          Components: mime, parser
>    Affects Versions: 1.24.1
>            Reporter: Alex
>            Priority: Major
>         Attachments: 7ztar.tar
>
>
> When I extract metadata from a .tar archive which was created by Linux bash 
> it works as I expect, but if the .tar archive was created by 7z I get an error:
>  TikaException: TIKA-198: Illegal IOException from 
> org.apache.tika.parser.pkg.PackageParser@4d0f2471 
> I created a project on GitHub for your convenience. It includes 2 files and 
> code to play around with: [https://github.com/AlexOkayJ/apache-tika-tar-issue.git]
>   



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
