[
https://issues.apache.org/jira/browse/TIKA-1706?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Yaniv Kunda updated TIKA-1706:
------------------------------
Attachment: TIKA-1706-2.patch, TIKA-1706-1.patch
Proposed patches per [~grossws]'s suggestion on the dev mailing list.
The first patch contains the following:
- creation of the secondary jar using maven-shade-plugin:
-- used the *uber* classifier, set via <shadedClassifierName>;
alternatives include shaded, nodep, all, etc.
Which one is best?
-- commons-io is shaded under
{{shaded.commons-io.$\{commons.io.version\}.org.apache.commons.io}} to avoid
potential conflicts with other dependencies that also shade commons-io, e.g.
org.ops4j.pax.url:pax-url-aether:2.3.0
-- automatic removal of unused classes using <minimizeJar>
- deprecated all classes that were copied from commons-io and modified them to
extend their new counterparts
- deprecated all constructors
- removed all identical or functionally identical methods
- modified all remaining methods to call existing jdk/commons-io alternatives,
deprecated them, and referred to the alternatives used
_*Note:* this was done only in IOUtils, where many methods that have the same
signatures as the ones in commons-io were modified along the way to use UTF-8
instead of the platform default._
- everything should remain backward-compatible, with one exception:
org.apache.tika.io.TaggedIOException(IOException, Object) will now throw a
ClassCastException if the Object is not Serializable
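To illustrate the one incompatibility noted above, here is a minimal, self-contained sketch. It is not taken from the patch: {{NewTaggedIOException}} is only a stand-in for the commons-io class (whose tag is a Serializable), and {{LegacyTaggedIOException}} and {{TaggedDemo}} are hypothetical names. It assumes the deprecated class simply casts its Object tag when delegating to the new superclass.

```java
import java.io.IOException;
import java.io.Serializable;

// Stand-in for the commons-io counterpart: its tag must be Serializable.
class NewTaggedIOException extends IOException {
    private final Serializable tag;

    NewTaggedIOException(IOException original, Serializable tag) {
        super(original.getMessage(), original);
        this.tag = tag;
    }

    Serializable getTag() {
        return tag;
    }
}

// Hypothetical stand-in for the deprecated tika-core class, which keeps
// its old (IOException, Object) constructor signature.
@Deprecated
class LegacyTaggedIOException extends NewTaggedIOException {
    LegacyTaggedIOException(IOException original, Object tag) {
        // Assumption: the cast is what surfaces the ClassCastException
        // when the tag is not Serializable.
        super(original, (Serializable) tag);
    }
}

class TaggedDemo {
    // Returns true if the legacy constructor accepts the given tag.
    static boolean acceptsTag(Object tag) {
        try {
            new LegacyTaggedIOException(new IOException("boom"), tag);
            return true;
        } catch (ClassCastException e) {
            return false;
        }
    }
}
```

Under these assumptions, callers that pass a String or another Serializable tag are unaffected, while a plain {{new Object()}} tag fails at construction time.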
The second patch contains trivial import changes in tika-core from
org.apache.tika.io to org.apache.commons.io
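The deprecate-and-delegate approach described for the first patch could look roughly like the sketch below. It is not the patch itself: {{UpstreamIOUtils}} is a self-contained stand-in for the commons-io class and {{LegacyIOUtils}} is a hypothetical name for the deprecated tika-core class. The assumption is that a legacy method keeps its UTF-8 behaviour by delegating to a charset-aware alternative.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Stand-in for the upstream utility class with a charset-aware method.
class UpstreamIOUtils {
    static String toString(InputStream in, Charset charset) throws IOException {
        StringBuilder sb = new StringBuilder();
        Reader reader = new InputStreamReader(in, charset);
        char[] buf = new char[4096];
        int n;
        while ((n = reader.read(buf)) != -1) {
            sb.append(buf, 0, n);
        }
        return sb.toString();
    }
}

// Hypothetical deprecated class: extends its new counterpart and keeps
// only the methods whose behaviour differs from upstream.
@Deprecated
class LegacyIOUtils extends UpstreamIOUtils {
    /**
     * @deprecated call UpstreamIOUtils.toString(InputStream, Charset)
     *             with an explicit charset instead
     */
    @Deprecated
    static String toString(InputStream in) throws IOException {
        // Preserve the legacy behaviour: UTF-8, not the platform default.
        return UpstreamIOUtils.toString(in, StandardCharsets.UTF_8);
    }
}
```

Keeping the UTF-8 default in the deprecated method means existing callers see the same output, while the deprecation note points them at the explicit-charset alternative.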
> Bring back commons-io to tika-core
> ----------------------------------
>
> Key: TIKA-1706
> URL: https://issues.apache.org/jira/browse/TIKA-1706
> Project: Tika
> Issue Type: Improvement
> Components: core
> Reporter: Yaniv Kunda
> Priority: Minor
> Fix For: 1.11
>
> Attachments: TIKA-1706-1.patch, TIKA-1706-2.patch
>
>
> TIKA-249 inlined select commons-io classes in order to simplify the
> dependency tree and save some space.
> I believe these arguments are weaker nowadays due to the following concerns:
> - Most of the non-core modules already use commons-io, and since tika-core is
> usually not used by itself, commons-io is already included with it
> - Since some modules use both tika-core and commons-io, it's not clear which
> code should be used
> - Having the inlined classes causes more maintenance and/or technical debt
> (which in turn causes more maintenance)
> - Newer commons-io code utilizes newer platform code, e.g. using Charset
> objects instead of encoding names, being able to use StringBuilder instead of
> StringBuffer, and so on.
> I'll be happy to provide a patch to replace usages of the inlined classes
> with commons-io classes if this is accepted.