Still waiting for reports... We've had quite a few files go from application/x-123 to image/x-tga via TIKA-2527.
I think this is expected because they all appear to be embedded files, with file names that end in .tga. But I wanted to confirm this is expected.

There's also one example of: application/x-stata-dta -> image/x-tga, which is probably wrong:
http://162.242.228.174/docs/commoncrawl2_likely_broken/BT/BTTVHEUDLE7WODDGPYT6LLA6LXMHS3CX.dta

-----Original Message-----
From: Allison, Timothy B. [mailto:[email protected]]
Sent: Wednesday, March 28, 2018 10:55 AM
To: [email protected]
Subject: 1.18 pre rc regression tests

All,

I've run the initial regression tests. The corpus size is now big enough that I have to migrate the H2 tables to postgres before writing the reports. I'll post the reports as soon as they're finally ready, but I'm starting to go through some results now.

Cheers,

Tim
