[ 
https://issues.apache.org/jira/browse/TIKA-3392?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342626#comment-17342626
 ] 

Andrei Dobrescu commented on TIKA-3392:
---------------------------------------

I did a bit of research before posting this issue. Thing is:
- All Android apps bundle their own dependent libraries. So if app A1 imports 
library L1 at version V1 and app A2 imports library L1 at version V2, that is 
fine, because the APK file format is containerized.
- The exception is the classes from the Android SDK. The SDK is the only 
system-level library, common to all apps. It contains Java SE classes and 
Android-specific classes such as the UI toolkit. Problem is, when they 
developed Android, some Genius from Google thought it was a good idea to put 
the JSON.org, Apache HTTP client, org.xml.* and org.xmlpull.* libraries in the 
SDK. [You can find the documentation of the SDK 
here|https://developer.android.com/reference/packages]


As you can see, the SDK contains an implementation of org.xml.sax. I can import 
the latest Apache Xerces, but org.xml.* classes will always resolve to the ones 
from the SDK. The classes from the SDK don't support "secure-processing", and 
because of that the Tika library crashes.
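
One way to confirm which implementation is actually picked up is to ask the 
JAXP factory for its concrete class. This is only a minimal diagnostic sketch; 
the class name SaxProviderCheck is made up for illustration. Going by the stack 
trace in the issue description, on the affected device it would be expected to 
report org.apache.harmony.xml.parsers.SAXParserFactoryImpl.

```java
import javax.xml.parsers.SAXParserFactory;

// Hypothetical diagnostic helper, not part of Tika or the Android SDK.
final class SaxProviderCheck {

    // Returns the fully qualified name of the SAXParserFactory implementation
    // that the platform resolves at runtime.
    static String providerClassName() {
        return SAXParserFactory.newInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println(providerClassName());
    }
}
```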

I can think of 3 solutions to this problem:
- Guys from Google could update or remove their org.xml.* classes from the SDK. 
This surely won't happen.
- I can stop using Tika and start using another MIME type detector, such as 
the Linux file command, [like 
this|https://stackoverflow.com/a/2227201/11536597]. I could compile the [source 
code|http://www.darwinsys.com/file/] to target Android, then bundle the native 
library.
- Tika could stop using the secure-processing XML feature. Why is it even 
needed? Is it important? Can the library work without it? It basically crashes 
at MimeTypesReader.java:429 / newSAXParser method / 
factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
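
To illustrate the third option, here is a minimal sketch of a factory setup 
that tolerates a parser which does not recognize the feature. It is not Tika's 
actual code: the class LenientSaxParserFactory and the helper trySetFeature are 
hypothetical names, and only standard JAXP calls are used.

```java
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.SAXException;

// Hypothetical sketch, not the code in MimeTypesReader.
final class LenientSaxParserFactory {

    // Tries to set a feature and reports whether the factory accepted it.
    // SAXNotRecognizedException and SAXNotSupportedException both extend
    // SAXException, so a single multi-catch covers every checked failure.
    static boolean trySetFeature(SAXParserFactory factory, String feature, boolean value) {
        try {
            factory.setFeature(feature, value);
            return true;
        } catch (ParserConfigurationException | SAXException e) {
            return false;
        }
    }

    // Builds a parser, enabling secure processing only where it is supported.
    static SAXParser newSaxParser() throws ParserConfigurationException, SAXException {
        SAXParserFactory factory = SAXParserFactory.newInstance();
        boolean secure = trySetFeature(factory, XMLConstants.FEATURE_SECURE_PROCESSING, true);
        if (!secure) {
            // A real library would probably want to log this rather than stay silent.
            System.err.println("secure-processing not supported by " + factory.getClass().getName());
        }
        return factory.newSAXParser();
    }
}
```

Whether skipping the feature is acceptable is a security decision: 
secure-processing asks the parser to apply limits when handling XML, so 
dropping it silently may matter for untrusted input.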

> Apache Tika V1.26 doesn't work on Android anymore. Issue with org.xml 
> dependencies.
> ----------------------------------------------------------------------------------
>
>                 Key: TIKA-3392
>                 URL: https://issues.apache.org/jira/browse/TIKA-3392
>             Project: Tika
>          Issue Type: Bug
>          Components: core
>    Affects Versions: 1.26
>         Environment: Android 11
>            Reporter: Andrei Dobrescu
>            Priority: Major
>              Labels: android
>         Attachments: image-2021-05-11-17-53-58-291.png
>
>
> I use Apache Tika on Android to detect the MIME type of various files.
> Apache Tika V1.10 works fine on Android:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.10'
> {code}
> {code:java}
> val mimeType = file.inputStream().buffered().use { inputStream ->
>     AutoDetectParser().detector.detect(inputStream, Metadata()).toString()
> }
> {code}
> However, Tika V1.26 will crash when trying to detect the mime type:
> {code:java}
> implementation 'org.apache.tika:tika-core:1.26'
> {code}
> {noformat}
> java.lang.ExceptionInInitializerError
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>     at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>     at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>     at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>     at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>     at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>     at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE:
> java.lang.RuntimeException: problem initializing SAXParser pool
>         at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:119)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>         at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>         at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>         at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>         at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>         at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>         at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE:
>  org.apache.tika.exception.TikaException: problem creating SAX parser factory
>      at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:433)
>      at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>      at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>      at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>      at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>      at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>      at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>      at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55){noformat}
> {noformat}
> CAUSE OF CAUSE OF CAUSE:
> org.xml.sax.SAXNotRecognizedException: http://javax.xml.XMLConstants/feature/secure-processing
>      at org.apache.harmony.xml.parsers.SAXParserFactoryImpl.setFeature(SAXParserFactoryImpl.java:93)
>      at org.apache.tika.mime.MimeTypesReader.newSAXParser(MimeTypesReader.java:429)
>      at org.apache.tika.mime.MimeTypesReader.setPoolSize(MimeTypesReader.java:417)
>      at org.apache.tika.mime.MimeTypesReader.<clinit>(MimeTypesReader.java:117)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:69)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:100)
>      at org.apache.tika.mime.MimeTypesFactory.create(MimeTypesFactory.java:189)
>      at org.apache.tika.mime.MimeTypes.getDefaultMimeTypes(MimeTypes.java:604)
>      at org.apache.tika.config.TikaConfig.getDefaultMimeTypes(TikaConfig.java:83)
>      at org.apache.tika.config.TikaConfig.<init>(TikaConfig.java:257)
>      at org.apache.tika.config.TikaConfig.getDefaultConfig(TikaConfig.java:422)
>      at org.apache.tika.parser.AutoDetectParser.<init>(AutoDetectParser.java:55)
> {noformat}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
