[
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342626#comment-17342626
]
Andrei Dobrescu commented on TIKA-3392:
---------------------------------------
I did a bit of research before posting this issue. The thing is:
- Every Android app bundles its own dependent libraries. So if app A1 imports
library L1 at version V1 and app A2 imports library L1 at version V2, that is
fine, because each APK is self-contained.
- The exception is the Android SDK. The SDK is the only system-level library,
common to all apps. It contains Java SE classes and Android-specific classes,
such as the UI toolkit. The problem is that when they developed Android, some
genius at Google thought it was a good idea to put the JSON.org, Apache HTTP
client, org.xml.* and org.xmlpull.* libraries in the SDK. [You can find
the documentation of the SDK
here|https://developer.android.com/reference/packages]
As you can see, the SDK contains an implementation of org.xml.sax. I can import
the latest Apache Xerces, but org.xml.* classes will always resolve to the ones
from the SDK. The SDK classes don't support "secure-processing", and because of
that the Tika library crashes.
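To check this on a given device, a small probe along these lines could be used. It is only a sketch: the class and method names (SecureProcessingProbe, supportsSecureProcessing) are made up for illustration, and it only uses the standard JAXP calls. On a desktop JDK I would expect it to report true; on Android, going by the stack trace in this issue, it should report false.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper, not part of Tika or the Android SDK.
public class SecureProcessingProbe {

    /** Returns true if the factory accepts the JAXP secure-processing feature. */
    static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (ParserConfigurationException
                | SAXNotRecognizedException
                | SAXNotSupportedException e) {
            // This is the failure mode reported in the issue's stack trace.
            return false;
        }
    }

    public static void main(String[] args) {
        // newInstance() returns whatever implementation the platform provides.
        SAXParserFactory factory = SAXParserFactory.newInstance();
        System.out.println("factory class: " + factory.getClass().getName());
        System.out.println("secure-processing supported: "
                + supportsSecureProcessing(factory));
    }
}
```

Printing the factory class name also shows which implementation was actually picked up, which is useful when a bundled Xerces is expected but the platform one is used instead.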
I can think of 3 solutions to this problem:
- Guys from Google could update or remove their org.xml.* classes from the SDK.
This surely won't happen.
- I can stop using Tika and switch to another MIME type detector, such as
the Linux file command, [like
this|https://stackoverflow.com/a/2227201/11536597]. I could compile the
[source code|http://www.darwinsys.com/file/] to target Android, then bundle the
native library.
- Tika could stop using the secure-processing XML feature. Why is it even
needed? Is it important? Can the library work without it? It basically crashes
at MimeTypesReader.java:429, in the newSAXParser method:
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
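For the third option, one possible shape is to treat secure-processing as best effort instead of mandatory. The sketch below is not Tika's code; LenientSaxParser, trySetFeature, newLenientParser and countMimeTypes are hypothetical names used only to illustrate the idea. Note that secure-processing exists to limit things like entity expansion, so skipping it is a trade-off, probably acceptable only for trusted input such as the bundled mime types file.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical sketch of a lenient parser setup; not the actual Tika implementation.
public class LenientSaxParser {

    /** Tries to set a feature; returns false instead of throwing if it is rejected. */
    static boolean trySetFeature(SAXParserFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
            return true;
        } catch (Exception e) {
            return false;
        }
    }

    /** Creates a parser, enabling secure-processing only where the platform allows it. */
    static SAXParser newLenientParser() throws Exception {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        if (!trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true)) {
            System.err.println("secure-processing not supported; continuing without it");
        }
        return factory.newSAXParser();
    }

    /** Counts mime-type elements, as a stand-in for reading a mime types file. */
    static int countMimeTypes(String xml) throws Exception {
        final int[] count = {0};
        newLenientParser().parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes attrs) {
                        if ("mime-type".equals(qName)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<mime-info>"
                + "<mime-type type=\"text/plain\"/>"
                + "<mime-type type=\"image/png\"/>"
                + "</mime-info>";
        System.out.println("mime types found: " + countMimeTypes(xml));
    }
}
```

With this approach the static initializer would no longer fail on a platform that rejects the feature, and parsers on platforms that do support it would keep the extra protection.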
> Apache Tika V1.26 doesn't work on Android anymore. Issue with org.xml
> dependencies.
> ----------------------------------------------------------------------------------
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.26
> Environment: Android 11
> Reporter: Andrei Dobrescu
> Priority: Major
> Labels: android
> Attachments: image-2021-05-11-17-53-58-291.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
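> import org.apache.tika.metadata.Metadata
> import org.apache.tika.parser.AutoDetectParser
> // "file" is a java.io.File; the buffered stream is closed by use { }
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }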
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:119)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
> org.apache.tika.exception.TikaException: problem creating SAX parser factory
> at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
> at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
> at org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
> at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
> at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55)
> {noformat}
>
>