[ 
https://issues.apache.org/jira/browse/CONNECTORS-1778?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=18064411#comment-18064411
 ] 

Piergiorgio Lucidi commented on CONNECTORS-1778:
------------------------------------------------

I think we could proceed as follows:
 # Add Configuration Flag: Introduce a new specification parameter, Ignore 
Exception (ignoreException). The default could be either false or true, 
though true provides the resilience requested here.
 # Update UI: Add a checkbox in the "Field Mapping" tab of the Tika Service 
transformation to allow users to toggle this behavior.
 # Modify Error Handling: Currently, an IOException (communication failure) or 
a 503 response (Tika restarting) triggers a ServiceInterruption. With the flag 
enabled, these errors would instead be caught and logged, the document would 
be rejected (DOCUMENTSTATUS_REJECTED), and the job would proceed to the next 
document.
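
To make the third point concrete, here is a rough, self-contained sketch of the intended control flow. It is not the actual connector code: the class IgnoreExceptionSketch, the TikaCall interface, the ServiceInterruptionStandIn exception and the constant values are all hypothetical stand-ins so that the example compiles on its own, and only the ignoreException flag and the IOException/503 handling come from the proposal above.

```java
import java.io.IOException;

/** Hypothetical sketch; none of these types are the real ManifoldCF classes. */
class IgnoreExceptionSketch {

  /** Stand-in for ManifoldCF's ServiceInterruption (the job retries later). */
  static class ServiceInterruptionStandIn extends Exception {
    ServiceInterruptionStandIn(String message, Throwable cause) {
      super(message, cause);
    }
  }

  /** Stand-in for the request to Tika Server; returns the HTTP status code. */
  interface TikaCall {
    int send() throws IOException;
  }

  // Stand-in values for illustration; the real constants are defined by ManifoldCF.
  static final int DOCUMENTSTATUS_ACCEPTED = 0;
  static final int DOCUMENTSTATUS_REJECTED = 1;

  /** Processes one document; the flag decides between retrying and rejecting. */
  static int process(TikaCall call, boolean ignoreException)
      throws ServiceInterruptionStandIn {
    try {
      int status = call.send();
      if (status == 503) {
        return handleFailure("Tika restarting (503)", null, ignoreException);
      }
      // Any other status is treated as success to keep the sketch short.
      return DOCUMENTSTATUS_ACCEPTED;
    } catch (IOException e) {
      return handleFailure("Communication failure: " + e.getMessage(), e, ignoreException);
    }
  }

  private static int handleFailure(String reason, Throwable cause, boolean ignoreException)
      throws ServiceInterruptionStandIn {
    if (!ignoreException) {
      // Current behavior: signal a service interruption.
      throw new ServiceInterruptionStandIn(reason, cause);
    }
    // Proposed behavior: log, reject this document, let the job continue.
    System.err.println("Rejecting document: " + reason);
    return DOCUMENTSTATUS_REJECTED;
  }

  public static void main(String[] args) throws Exception {
    // With the flag enabled, a 503 rejects the document instead of interrupting the job.
    System.out.println(process(() -> 503, true));
  }
}
```

With the flag disabled, the sketch keeps today's behavior (the failure is propagated as an interruption), so existing jobs would not change unless the checkbox is ticked.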

What do you think, [~mbiso]?

Please let us know.

> Error: Repeated service interruptions - failure processing document: The 
> target server failed to respond
> --------------------------------------------------------------------------------------------------------
>
>                 Key: CONNECTORS-1778
>                 URL: https://issues.apache.org/jira/browse/CONNECTORS-1778
>             Project: ManifoldCF
>          Issue Type: Bug
>          Components: Tika extractor
>    Affects Versions: ManifoldCF 2.28
>            Reporter: mbiso
>            Assignee: Piergiorgio Lucidi
>            Priority: Major
>         Attachments: ErrorManifoldCF.jpg
>
>
> Hi.
> I have a job ingesting a Windows network share.
> It uses a Tika server (standalone).
> There are many errors in Tika because some files cause errors like:
>  
> {code:java}
> ERROR [qtp131037934-61] 10:44:03,903 
> org.apache.poi.xssf.eventusermodel.XSSFSheetXMLHandler Failed to parse SST 
> index '1356 '
> java.lang.NumberFormatException: For input string: "1356 " {code}
> These errors cause a Tika child process to restart, and this is reported as 
> an interruption in the ManifoldCF job.
> The job ends with the message: "Error: Repeated service interruptions - failure 
> processing document: The target server failed to respond"
>  
> How can I get around this issue?
> I have opened an issue [TIKA-4494] on Tika as well, but this may be the 
> intended behaviour in Tika: many errors cause the child process to restart, 
> so it remains a problem for me.
>  
> Any suggestion?
> Thanks a lot.
> Mario Bisonti



--
This message was sent by Atlassian Jira
(v8.20.10#820010)
