Re: [compress] FW: Tika content detection and crawled "remote" content

2017-07-05 Thread Stefan Bodewig
This looks great, well done Tika!
Thank you for sharing, Tim
Cheers
Stefan
On 2017-07-05, Allison, Timothy B. wrote:
> Fellow file-philes on [compress],
> Sebastian Nagel has added file type id via Apache Tika to Common Crawl.
> While Tika is not 100% accurate, this means that we have

[compress] FW: Tika content detection and crawled "remote" content

2017-07-05 Thread Allison, Timothy B.
Fellow file-philes on [compress],
Sebastian Nagel has added file type id via Apache Tika to Common Crawl. While Tika is not 100% accurate, this means that we have far better clarity on MIME type than we would get by relying on the HTTP header plus the file suffix. So, for testing purposes, you (or we over on Tika)