[https://issues.apache.org/jira/browse/TIKA-3864?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17612827#comment-17612827]
Hudson commented on TIKA-3864:
------------------------------
UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #832 (See
[https://ci-builds.apache.org/job/Tika/job/tika-main-jdk8/832/])
TIKA-3864 - url decode fetchkey when sent in via a header. (tallison:
[https://github.com/apache/tika/commit/89c6b7229d481eba9d4afcad40456796d27b304f])
* (edit) CHANGES.txt
* (edit) tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/FetcherStreamFactory.java
TIKA-3864 - url decode fetchkey when sent in via a header -- backdoor for
legacy behavior. (tallison:
[https://github.com/apache/tika/commit/3dfd7f72388ae8ca950e6b37bc1e09146c663717])
* (edit) tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/FetcherStreamFactory.java
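The commits above URL-decode the fetchKey when it is sent in a header, so a percent-encoded key like the reporter's second attempt can resolve to the intended file. Below is a minimal Java sketch of that decoding step. It assumes the key was percent-encoded from UTF-8 bytes. `FetchKeyDecoding` and `decodeFetchKey` are hypothetical names used only for illustration, not the code in `FetcherStreamFactory`, and the legacy-behavior backdoor is not modeled.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

public class FetchKeyDecoding {

    // Hypothetical helper: percent-decodes a fetchKey taken from an HTTP header,
    // interpreting the decoded bytes as UTF-8 (assumption about the client's encoding).
    static String decodeFetchKey(String headerValue) {
        try {
            // decode(String, String) is available on Java 8, matching the jdk8 build above.
            return URLDecoder.decode(headerValue, StandardCharsets.UTF_8.name());
        } catch (UnsupportedEncodingException e) {
            // Every JVM is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String decoded = decodeFetchKey("%E4%B8%AD%E6%96%87.txt");
        System.out.println(decoded.equals("\u4e2d\u6587.txt")); // true
        System.out.println(decodeFetchKey("a+b.txt"));          // a b.txt
    }
}
```

`URLDecoder` implements HTML form decoding, so `+` becomes a space, and a file name containing a literal `+` would have to be sent as `%2B`. A malformed escape such as a trailing `%` makes `decode` throw `IllegalArgumentException`.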
> Non-ascii UTF-8 characters in fetchKey not working with FileSystemFetcher
> -------------------------------------------------------------------------
>
> Key: TIKA-3864
> URL: https://issues.apache.org/jira/browse/TIKA-3864
> Project: Tika
> Issue Type: Bug
> Components: tika-pipes, tika-server
> Affects Versions: 2.4.1
> Environment: debian:bullseye docker container running
> tika-server-standard-2.4.1jar
> Reporter: Tong Wang
> Priority: Major
> Fix For: 2.5.1
>
>
> When using FileSystemFetcher, if there are non-ASCII characters in the fetchKey,
> Tika Server throws an exception because the resulting file name is incorrect.
> Here is an example:
> {code:bash}
> curl -v -X PUT http://tika:9998/rmeta/text --header "fetcherName: restricted" \
>   --header "fetchKey: 中文.txt"
> {code}
> I get a java.nio.file.NoSuchFileException:
> {code:java}
> Caused by: java.nio.file.NoSuchFileException: /restricted/䏿.txt
>     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>     at java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:860)
>     at org.apache.tika.pipes.fetcher.fs.FileSystemFetcher.fetch(FileSystemFetcher.java:64)
>     at org.apache.tika.server.core.FetcherStreamFactory.getInputStream(FetcherStreamFactory.java:90)
>     at org.apache.tika.server.core.resource.TikaResource.getInputStream(TikaResource.java:159)
> {code}
>
> When I percent-encode (URL-encode) the characters:
> {code:bash}
> curl -v -X PUT http://tika:9998/rmeta/text --header "fetcherName: restricted" \
>   --header "fetchKey: %E4%B8%AD%E6%96%87.txt"
> {code}
> I still get a java.nio.file.NoSuchFileException:
> {code:java}
> Caused by: java.nio.file.NoSuchFileException: /restricted/%E4%B8%AD%E6%96%87.txt
>     at java.base/sun.nio.fs.UnixException.translateToIOException(UnixException.java:92)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:111)
>     at java.base/sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:116)
>     at java.base/sun.nio.fs.UnixPath.toRealPath(UnixPath.java:860)
>     at org.apache.tika.pipes.fetcher.fs.FileSystemFetcher.fetch(FileSystemFetcher.java:64)
>     at org.apache.tika.server.core.FetcherStreamFactory.getInputStream(FetcherStreamFactory.java:90)
>     at org.apache.tika.server.core.resource.TikaResource.getInputStream(TikaResource.java:159)
> {code}
> BTW, the locale in the Tika Server container is set to C.UTF-8:
> {code:java}
> # locale
> LANG=C.UTF-8
> LANGUAGE=
> LC_CTYPE="C.UTF-8"
> LC_NUMERIC="C.UTF-8"
> LC_TIME="C.UTF-8"
> LC_COLLATE="C.UTF-8"
> LC_MONETARY="C.UTF-8"
> LC_MESSAGES="C.UTF-8"
> LC_PAPER="C.UTF-8"
> LC_NAME="C.UTF-8"
> LC_ADDRESS="C.UTF-8"
> LC_TELEPHONE="C.UTF-8"
> LC_MEASUREMENT="C.UTF-8"
> LC_IDENTIFICATION="C.UTF-8"
> LC_ALL=
> {code}
>
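One plausible explanation for the garbled path in the first stack trace: curl sends the header value as raw UTF-8 bytes (assuming a UTF-8 shell), and the server turns each header byte into one character as ISO-8859-1 does, which many Java HTTP stacks do by default. If that holds, the server-side locale would not be involved, which would fit the observation that C.UTF-8 does not help. The Java sketch below reproduces that round trip with hypothetical names (`HeaderMojibake`, `decodeAsLatin1`, `repairLatin1Mojibake`). It does not call Tika or the server's actual header-parsing code.

```java
import java.nio.charset.StandardCharsets;

public class HeaderMojibake {

    // Assumption: the server maps each raw header byte to one char (ISO-8859-1).
    static String decodeAsLatin1(byte[] headerBytes) {
        return new String(headerBytes, StandardCharsets.ISO_8859_1);
    }

    // Undo the mis-decoding: recover the original bytes, then read them as UTF-8.
    static String repairLatin1Mojibake(String garbled) {
        byte[] originalBytes = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String fetchKey = "\u4e2d\u6587.txt"; // the two CJK characters from the report + ".txt"
        // Assumption: curl, run from a UTF-8 shell, sends the header value as raw UTF-8 bytes.
        byte[] sentByCurl = fetchKey.getBytes(StandardCharsets.UTF_8);
        String seenByServer = decodeAsLatin1(sentByCurl);

        System.out.println(sentByCurl.length);             // 10 (2 chars x 3 bytes + 4 for ".txt")
        System.out.println(seenByServer.length());         // 10 (one char per byte)
        System.out.println(seenByServer.equals(fetchKey)); // false: the fetcher looks up the wrong path
        System.out.println(repairLatin1Mojibake(seenByServer).equals(fetchKey)); // true
    }
}
```

Percent-encoding sidesteps the problem because the encoded key is plain ASCII, which ISO-8859-1 and UTF-8 decode identically. The remaining step is decoding the `%XX` escapes on the server, which is what the second stack trace shows was missing before the fix.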
--
This message was sent by Atlassian Jira
(v8.20.10#820010)