This looks great. Well done, Tika!
Thank you for sharing, Tim.
Cheers
Stefan
On 2017-07-05, Allison, Timothy B. wrote:
> Fellow file-philes on [compress],
> Sebastian Nagel has added file type id via Apache Tika to Common Crawl.
> While Tika is not 100% accurate, this means that we have
Fellow file-philes on [compress],
Sebastian Nagel has added file type id via Apache Tika to Common Crawl. While
Tika is not 100% accurate, this means that we have far better clarity on MIME
type than relying on the HTTP header and file suffix. So, for testing purposes,
you (or we over on Tika)