I would say that you could determine a row that gives a bad URL, and then
run it in the DIH admin interface (or from the command line) with "debug" enabled.
The url parameter going into Tika should be present in its transformed form
before the next entity gets going. This works in a similar scenario for
me.
Caveat: I have not tried this with a "standalone" server or with any SOLR
type project.
Cheers!
Steve
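For anyone who wants to try the debug run from the command line, here is a rough sketch of how the request could be built. The host, the core name "collection1", the "/dataimport" handler path and the row offset are all assumptions for illustration; substitute whatever your solrconfig.xml actually registers. The parameter names (command, debug, verbose, clean, commit, start, rows) are the standard DIH request parameters.

```python
from urllib.parse import urlencode


def dih_debug_url(base, core, start, rows=1):
    """Build a DIH request that imports a small window of rows in debug mode.

    base and core are hypothetical placeholders; the handler is assumed
    to be registered at /dataimport.
    """
    params = {
        "command": "full-import",
        "debug": "true",    # return the processed documents in the response
        "verbose": "true",  # include per-entity, per-transformer detail
        "clean": "false",   # do not wipe the existing index
        "commit": "false",  # do not commit anything from the test run
        "start": start,     # offset of the suspect row
        "rows": rows,       # how many rows to process
    }
    return f"{base}/{core}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical example: inspect only the row at offset 41.
    print(dih_debug_url("http://localhost:8983/solr", "collection1", start=41))
```

Fetching the printed URL with a browser or curl should show the field values after the transformers have run, which is where the URL handed to Tika can be compared with the value stored in the DB.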
> From: teag...@insystechinc.com
> To: solr-user@lucene.apache.org
> Subject: RE: Tika HTTP 400 Errors with DIH
> Date: Fri, 5 Dec 2014 12:03:23 -0500
>
> Alex,
>
t of the URL correct? Is there some other setting I've
missed?
I appreciate the suggestions!
-Teague
-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, December 04, 2014 12:22 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> -Teague
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Tuesday, December 02, 2014 1:45 PM
>>> To: solr-user
>>>
>> -Teague
>> -----Original Message-----
>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>> Sent: Tuesday, December 02, 2014 1:45 PM
>> To: solr-user
>> Subject: Re: Tika HTTP 400 Errors with DIH
>>
>> On 2 December 2014 at 13:19, Teague James wrote:
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
> Sent: Tuesday, December 02, 2014 1:45 PM
> To: solr-user
> Subject: Re: Tika HTTP 400 Errors with DIH
>
> On 2 December 2014 at 13:19, Teague James wrote:
>> clob="true"
>
> What does ClobTransformer
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Tuesday, December 02, 2014 1:45 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH
On 2 December 2014 at 13:19, Teague James wrote:
> clob="true"
What is ClobTransformer doing on the DownloadURL field? Is it
possible it is corrupting the value somehow?
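One way to check for that kind of corruption is to look at the exact characters in the value that reaches Tika. The sketch below is only an illustration and not a confirmed diagnosis of this case: it flags any character that RFC 3986 does not allow to appear raw in a URL (spaces, newlines, non-ASCII and so on). Browsers often repair such characters silently, while a stricter HTTP client may send them as-is. The function name and the sample URL are made up for the example.

```python
def suspect_chars(url):
    """Return (index, repr) pairs for characters that should not appear raw in a URL.

    The allowed set is the RFC 3986 unreserved and reserved characters plus '%'.
    """
    allowed = set(
        "abcdefghijklmnopqrstuvwxyz"
        "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
        "0123456789"
        "-._~:/?#[]@!$&'()*+,;=%"
    )
    return [(i, repr(ch)) for i, ch in enumerate(url) if ch not in allowed]


if __name__ == "__main__":
    # Hypothetical value with an unencoded space and a trailing newline.
    sample = "http://example.com/docs/my file.pdf\n"
    for index, char in suspect_chars(sample):
        print(f"position {index}: {char}")
```

Running this against the DownloadURL value as it appears in the DIH debug output, and again against the raw DB value, would show whether a transformer changed anything.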
Regards,
Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-s
Hi all,
I am using Solr 4.9.0 to index a DB with DIH. In the DB there is a URL
field. In the DIH, Tika uses that field to fetch and parse the documents. The
URL from the field is valid and downloads the document in a browser
just fine, but Tika is getting HTTP response code 400. Any ideas why