[
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342626#comment-17342626
]
Andrei Dobrescu edited comment on TIKA-3392 at 5/11/21, 3:07 PM:
-----------------------------------------------------------------
I did a bit of research before posting this issue. Thing is:
- All Android apps do bundle all their dependent libraries. So if in app A1
you import library L1 with version V1 and in app A2 you import library L1 with
version V2, it will be ok, because the APK file format is containerized.
- The exception is the set of classes from the Android SDK. The SDK is the only
system-level library, common to all apps. The SDK is tightly bound to the
Android OS version (so you'll have a version of the SDK for each OS version).
It contains Java SE classes and Android-specific classes, such as the UI
toolkit. Problem is, when they developed Android, some Genius from Google
thought it was a good idea to put the JSON.org, Apache HTTP client,
org.xml.* and org.xmlpull.* libraries into the SDK. [You can find the documentation of the SDK
here|https://developer.android.com/reference/packages]
As you can see, the SDK contains an implementation of org.xml.sax. I can import
the latest Apache Xerces, but org.xml.* classes will always resolve to the ones from
the SDK. The SAX parser factory from the SDK
(org.apache.harmony.xml.parsers.SAXParserFactoryImpl in the stack trace) doesn't
recognize the "secure-processing" feature, and because of that the Tika library crashes.
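To check which factory actually gets resolved at runtime, a small probe along these lines could help. This is only a sketch: the class name SecureProcessingProbe and the method supportsSecureProcessing are hypothetical names made up for illustration, and only standard JAXP calls are used.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.SAXParserFactory;

// Hypothetical helper, not part of Tika or the Android SDK.
public class SecureProcessingProbe {

    /** Returns true if the given factory accepts the secure-processing feature. */
    public static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (Exception e) {
            // setFeature may throw ParserConfigurationException,
            // SAXNotRecognizedException or SAXNotSupportedException.
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Shows which implementation the platform resolved.
        System.out.println("Factory class: " + factory.getClass().getName());
        System.out.println("Supports secure-processing: " + supportsSecureProcessing(factory));
    }
}
```

Judging by the stack trace, on Android 11 the factory class should be org.apache.harmony.xml.parsers.SAXParserFactoryImpl and the probe should report that the feature is not supported, but I have only inferred that from the exception.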
I can think of 3 solutions to this problem:
- Guys from Google could update or remove their org.xml.* classes from the
SDK. This surely won't happen.
- I can stop using Tika and switch to another MIME type detector, such as
the Linux file command: [like
this|https://stackoverflow.com/a/2227201/11536597]. I could compile the [source
code|http://www.darwinsys.com/file/] to target Android, then bundle the
compiled binary into my app.
- Tika could stop using the secure-processing XML feature. Why is it even needed?
Is it important? Can the library work without it? It basically crashes at
MimeTypesReader.java:429 / newSAXParser method /
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
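A softer variant of the third option would be to keep requesting the feature but tolerate a factory that doesn't know it. The sketch below shows the idea with plain JAXP. The class LenientSaxParsers and its method are hypothetical names, this is not Tika's actual code, and I don't know whether the maintainers would accept falling back silently.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not Tika's MimeTypesReader.
public class LenientSaxParsers {

    /** Creates a SAX parser, enabling secure processing only where the factory supports it. */
    public static SAXParser newSaxParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (ParserConfigurationException | SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException are both
            // SAXException subclasses. Here the failure is only reported and
            // parsing continues without the feature.
            System.err.println("secure-processing not available: " + e);
        }
        return factory.newSAXParser();
    }
}
```

As far as I understand the JAXP documentation, the feature tells the implementation to process XML securely, for example by limiting XML constructs that could lead to denial of service. So skipping it is a trade-off, probably acceptable only when the parsed XML is trusted, such as the mime type definitions bundled with the library.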
> Apache Tika V1.26 doesn't work on Android anymore. Issue with org.xml
> dependencies.
> ----------------------------------------------------------------------------------
>
> Key: TIKA-3392
> URL: https://issues.apache.org/jira/browse/TIKA-3392
> Project: Tika
> Issue Type: Bug
> Components: core
> Affects Versions: 1.26
> Environment: Android 11
> Reporter: Andrei Dobrescu
> Priority: Major
> Labels: android
> Attachments: image-2021-05-11-17-53-58-291.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:119)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
> org.apache.tika.exception.TikaException: problem creating SAX parser factory
> at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
> at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
> at org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
> at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
> at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
> at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
> at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
> at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
> at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
> at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
> at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
> at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55)
> {noformat}
>
>