[jira] [Commented] (SOLR-7632) Change the ExtractingRequestHandler to use Tika-Server

Alexandre Rafalovitch (Jira) Fri, 04 Sep 2020 07:29:05 -0700


    [ https://issues.apache.org/jira/browse/SOLR-7632?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17190751#comment-17190751 ]


Alexandre Rafalovitch commented on SOLR-7632:
---------------------------------------------

I agree on the critical path. I was just wondering whether, given the number of 
internal changes and the explanations required at release time, it makes sense 
to also move to a more flexible architecture on the Solr side.

Making it a URP (UpdateRequestProcessor) would, I think, allow composing it 
with other pipeline elements in a different order (e.g. preprocess the file 
name, feed to Tika, apply DateParser), or possibly even distributing the load 
by running it on each node instead of as the first step. But that's just an 
idea. If others do not see the benefits, it is not worth chasing.
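
To make the composition idea concrete, here is a minimal sketch in Python. It
is not Solr's actual UpdateRequestProcessor API: every function name and
document field below is hypothetical, and the Tika stage is a stand-in that
makes no network call. It only shows that, once extraction is an ordinary
pipeline stage, it can be placed before or after other stages.

```python
from typing import Callable, Dict, List

# A document is modelled as a flat field map; a stage maps one doc to another.
Doc = Dict[str, str]
Stage = Callable[[Doc], Doc]


def preprocess_file_name(doc: Doc) -> Doc:
    # Hypothetical stage: derive a human-readable title from the file name.
    stem = doc.get("file_name", "").rsplit(".", 1)[0]
    return {**doc, "title": stem.replace("_", " ")}


def extract_with_tika(doc: Doc) -> Doc:
    # Stand-in for the extraction step. A real stage would send the raw
    # bytes to a Tika server and copy the response into the document.
    return {**doc, "content": "extracted:" + doc.get("raw", "")}


def parse_dates(doc: Doc) -> Doc:
    # Hypothetical stage: normalise a DD/MM/YYYY "created" field to ISO 8601.
    created = doc.get("created")
    if created is None:
        return doc
    day, month, year = created.split("/")
    return {**doc, "created": f"{year}-{month}-{day}"}


def run_chain(stages: List[Stage], doc: Doc) -> Doc:
    # Apply the stages in the configured order, like a processor chain.
    for stage in stages:
        doc = stage(doc)
    return doc
```

With extraction as just another stage, a chain such as
run_chain([preprocess_file_name, extract_with_tika, parse_dates], doc) can be
reordered or extended by editing the list, without touching the stages.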

> Change the ExtractingRequestHandler to use Tika-Server
> ------------------------------------------------------
>
>                 Key: SOLR-7632
>                 URL: https://issues.apache.org/jira/browse/SOLR-7632
>             Project: Solr
>          Issue Type: Improvement
>          Components: contrib - Solr Cell (Tika extraction)
>            Reporter: Chris A. Mattmann
>            Priority: Major
>              Labels: gsoc2017, memex
>
> It's a pain to upgrade Tika's jars every time we release, and if Tika 
> fails it messes up the ExtractingRequestHandler (e.g., when the document 
> type causes Tika to fail). A more reliable, better separated, and 
> easier-to-deploy version of the ExtractingRequestHandler would make a network 
> call to the Tika JAXRS server, and then call Tika on the Solr server side, get 
> the results and then index the information that way. I have a patch in the 
> works from the DARPA Memex project and I hope to post it soon.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
