[
https://issues.apache.org/jira/browse/CONNECTORS-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064411#comment-18064411
]
Piergiorgio Lucidi commented on CONNECTORS-1778:
------------------------------------------------
I think we could proceed as follows:
# Add Configuration Flag: Introduce a new specification parameter, Ignore
Exception (ignoreException). It could default to either false or true, though
true provides the requested resilience.
# Update UI: Add a checkbox in the "Field Mapping" tab of the Tika Service
transformation to allow users to toggle this behavior.
# Modify Error Handling: Currently, an IOException (communication failure) or a
503 response (Tika restarting) triggers a ServiceInterruption. With the flag
enabled, these errors would instead be caught and logged, the document would be
rejected (DOCUMENTSTATUS_REJECTED), and the job would proceed to the next document.
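A minimal sketch of step 3, to make the proposed behavior concrete. It is not the actual connector code: ServiceInterruption and the DOCUMENTSTATUS_* constants are local stand-ins with placeholder values so that the example compiles on its own, and TikaCall, process and the class name are hypothetical names used only for illustration. How statuses other than 200 and 503 should be handled is out of scope here, so the sketch simply rejects them.

```java
import java.io.IOException;

// Sketch only: local stand-ins replace the real ManifoldCF types so this compiles alone.
final class IgnoreExceptionSketch {

    // Placeholder values; the real constants live in the ManifoldCF API.
    static final int DOCUMENTSTATUS_ACCEPTED = 0;
    static final int DOCUMENTSTATUS_REJECTED = 1;

    // Stand-in for ManifoldCF's ServiceInterruption.
    static class ServiceInterruption extends Exception {
        ServiceInterruption(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Hypothetical abstraction of the HTTP request sent to Tika Server.
    interface TikaCall {
        int execute() throws IOException; // returns the HTTP status code
    }

    // Processes one document. With ignoreException enabled, a communication
    // failure or a 503 rejects the document instead of interrupting the job.
    static int process(String documentURI, TikaCall call, boolean ignoreException)
            throws ServiceInterruption {
        int status;
        try {
            status = call.execute();
        } catch (IOException e) {
            if (ignoreException) {
                System.err.println("Rejecting " + documentURI + ": " + e.getMessage());
                return DOCUMENTSTATUS_REJECTED;
            }
            throw new ServiceInterruption("Tika communication failure: " + e.getMessage(), e);
        }
        if (status == 503) {
            if (ignoreException) {
                System.err.println("Rejecting " + documentURI + ": Tika returned 503");
                return DOCUMENTSTATUS_REJECTED;
            }
            throw new ServiceInterruption("Tika Server unavailable (503)", null);
        }
        // Simplification: anything other than 200 is rejected in this sketch.
        return status == 200 ? DOCUMENTSTATUS_ACCEPTED : DOCUMENTSTATUS_REJECTED;
    }

    public static void main(String[] args) throws Exception {
        TikaCall failing = () -> {
            throw new IOException("The target server failed to respond");
        };
        // Flag enabled: the failing document is rejected and the next one is processed.
        System.out.println(process("file:///share/bad.xlsx", failing, true));
        System.out.println(process("file:///share/good.docx", () -> 200, true));
        // Flag disabled: the current behavior, a ServiceInterruption, is kept.
        try {
            process("file:///share/bad.xlsx", failing, false);
        } catch (ServiceInterruption e) {
            System.out.println("ServiceInterruption: " + e.getMessage());
        }
    }
}
```

Keeping the flag check inside both failure branches leaves the existing retry behavior unchanged when ignoreException is false, so the default could stay conservative if we prefer.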
What do you think, [~mbiso]?
Please let us know.
> Error: Repeated service interruptions - failure processing document: The
> target server failed to respond
> --------------------------------------------------------------------------------------------------------
>
> Key: CONNECTORS-1778
> URL: https://issues.apache.org/jira/browse/CONNECTORS-1778
> Project: ManifoldCF
> Issue Type: Bug
> Components: Tika extractor
> Affects Versions: ManifoldCF 2.28
> Reporter: mbiso
> Assignee: Piergiorgio Lucidi
> Priority: Major
> Attachments: ErrorManifoldCF.jpg
>
>
> Hi.
> I have a job ingesting a Windows network share.
> It uses Tika Server (standalone).
> Tika reports many errors because some files cause errors like:
>
> {code:java}
> ERROR [qtp131037934-61] 10:44:03,903
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST
> index '1356 '
> java.lang.NumberFormatException: For input string: "1356 " {code}
> The errors cause a restart of a Tika child process, and this is reported as
> an interruption in the ManifoldCF job.
> It ends with the message: "Error: Repeated service interruptions - failure
> processing document: The target server failed to respond"
>
> How can I get past this issue?
> I have also opened an issue on Tika ([TIKA-4494]), but this may be the correct
> behaviour for Tika: many errors cause the child process to restart, which is a
> problem for me.
>
> Any suggestion?
> Thanks a lot.
> Mario Bisonti
--
This message was sent by Atlassian Jira
(v8.20.10#820010)