[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342626#comment-17342626
 ] 

Andrei Dobrescu edited comment on TIKA-3392 at 5/11/21, 3:07 PM:
-----------------------------------------------------------------

I did a bit of research before posting this issue. The situation is:
 - Every Android app bundles all of its dependent libraries. So if app A1 
imports library L1 at version V1 and app A2 imports library L1 at version V2, 
there is no conflict, because each APK is self-contained.
 - The exception is the set of classes in the Android SDK. The SDK is the only 
system-level library, common to all apps, and it is tightly bound to the 
Android OS version (there is one version of the SDK for each OS version). 
It contains Java SE classes and Android-specific classes, such as the UI 
toolkit. The problem is that, when Android was developed, some genius at Google 
thought it was a good idea to put the JSON.org, Apache HTTP client, org.xml.* 
and org.xmlpull.* libraries into the SDK. [You can find the documentation of 
the SDK here|https://developer.android.com/reference/packages]

As you can see, the SDK contains an implementation of org.xml.sax. I can import 
the latest Apache Xerces, but the org.xml.* classes will always resolve to the 
ones from the SDK. The SDK's implementation doesn't support the 
"secure-processing" feature (its SAXParserFactoryImpl throws 
SAXNotRecognizedException), and because of that Tika crashes.
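
A quick way to confirm which factory implementation gets resolved, and whether 
it accepts the feature, is a small probe like the one below. This is only a 
sketch: the class name SecureProcessingProbe and the method 
supportsSecureProcessing are made up for illustration and are not part of Tika 
or the Android SDK. Only standard javax.xml.parsers and org.xml.sax calls are 
used.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXNotRecognizedException;
import org.xml.sax.SAXNotSupportedException;

// Hypothetical helper, not part of Tika: reports whether the platform's
// SAXParserFactory accepts the secure-processing feature.
class SecureProcessingProbe {

    /** Returns true if the factory accepts FEATURE_SECURE_PROCESSING. */
    static boolean supportsSecureProcessing(SAXParserFactory factory) {
        try {
            factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
            return true;
        } catch (SAXNotRecognizedException
                 | SAXNotSupportedException
                 | ParserConfigurationException e) {
            // The factory rejected the feature, which is the case in the
            // stack trace attached to this issue.
            return false;
        }
    }

    public static void main(String[] args) {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        // Shows which implementation class was actually picked up.
        System.out.println("Factory implementation: " + factory.getClass().getName());
        System.out.println("secure-processing supported: "
                + supportsSecureProcessing(factory));
    }
}
```

On Android the same two calls can be run from any activity or test and logged, 
which should show the platform's factory class rather than the Xerces one.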

I can think of 3 solutions to this problem:
 - Google could update or remove the org.xml.* classes in the SDK. This surely 
won't happen.
 - I could stop using Tika and switch to another MIME type detector, such as 
the Linux file command: [like 
this|https://stackoverflow.com/a/2227201/11536597]. I could compile the [source 
code|http://www.darwinsys.com/file/] to target Android, then bundle the 
compiled binary into my app.
 - Tika could stop requiring the secure-processing XML feature. Why is it even 
needed? Is it important? Can the library work without it? The crash happens at 
MimeTypesReader.java:429, in the newSAXParser method, at 
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
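
For the third option, one possible shape is to treat secure-processing as 
best-effort: try to enable it, and carry on without it when the platform's 
factory rejects it. The sketch below is not Tika's actual code. 
LenientSaxParsers and trySetFeature are hypothetical names, and whether this 
trade-off is acceptable is for the Tika maintainers to decide. The feature asks 
the parser to process XML securely, for example by limiting constructs that 
could be abused for denial of service, so skipping it is only reasonable when 
the XML being parsed is trusted.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical helper, not Tika's implementation: builds a SAX parser and
// enables secure processing only when the platform factory accepts it.
class LenientSaxParsers {

    static SAXParser newSaxParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        factory.setNamespaceAware(true);
        // Best-effort: a rejected feature no longer aborts initialization.
        trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        return factory.newSAXParser();
    }

    /** Returns true if the feature was set, false if the factory rejected it. */
    static boolean trySetFeature(SAXParserFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
            return true;
        } catch (ParserConfigurationException | SAXException e) {
            // SAXNotRecognizedException and SAXNotSupportedException are
            // both subclasses of SAXException, so they are handled here.
            System.err.println("SAX feature not supported, continuing without it: " + feature);
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        SAXParser parser = newSaxParser();
        System.out.println("Created parser: " + parser.getClass().getName());
    }
}
```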



> Apache Tika V1.26 doesn't work on Android anymore. Issue with org.xml 
> dependencies.
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-3392
>                 URL: https://issues.apache.org/jira/browse/TIKA-3392
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.26
>         Environment: Android 11
>            Reporter: Andrei Dobrescu
>            Priority: Major
>              Labels: android
>         Attachments: image-2021-05-11-17-53-58-291.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>     at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
>     at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:119)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>     at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
> org.apache.tika.exception.TikaException: problem creating SAX parser factory
>     at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>     at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>     at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>     at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
>     at org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>     at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>     at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>     at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>     at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55)
> {noformat}
>  
>  



