Alex,

Your suggestion might be a solution, but the issue isn't that the resource 
isn't found. As Walter said, 400 is a "bad request", which makes me wonder: 
what is DIH/Tika doing when it tries to access the documents? What is the 
"request" that is bad? Is there any other way to suss this out? Placing a 
network monitor in this case would be on the extreme end of difficult.
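
One possible alternative to a network monitor, offered only as a sketch: run a 
tiny logging HTTP server on a machine the Solr host can reach, temporarily 
point a test row's URL at it, and read off the request that DIH/Tika actually 
sends. Everything below is an assumption for illustration (the names 
RequestLogger and start_logger are hypothetical, and nothing here is part of 
Solr or Tika); it uses only the Python standard library.

```python
import http.server
import threading


class RequestLogger(http.server.BaseHTTPRequestHandler):
    """Hypothetical helper: records what each client actually sends."""

    seen = []  # (request line, headers dict) for every GET that parsed cleanly
    log = []   # every log line, including parser errors such as 400s

    def do_GET(self):
        # Keep the raw request line and headers so they can be compared
        # with what the browser sends for the same URL.
        RequestLogger.seen.append((self.requestline, dict(self.headers)))
        body = b"ok\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        # Also invoked when the request line is rejected before do_GET runs
        # (for example, a raw space in the path), so bad requests show up too.
        line = fmt % args
        RequestLogger.log.append(line)
        print(line)


def start_logger(port=0):
    """Start the logger on localhost in a background thread (port 0 = any free port)."""
    server = http.server.HTTPServer(("127.0.0.1", port), RequestLogger)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server
```

To reach it from another machine, the bind address would need to change from 
127.0.0.1 to an address the Solr host can see. Comparing the logged request 
line with the URL stored in the database should show whether the two differ.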

I know that the stored URL is good and that the resource exists, because 
copying it out of a Solr query and pasting it into the browser works. That 
eliminates 404 and 500 errors. Is the format of the URL correct? Is there some 
other setting I've missed?
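
On the URL format question, one thing that can be checked without any network 
access is whether the stored string carries characters that a browser quietly 
tidies up when pasted (surrounding whitespace, line breaks, unencoded spaces) 
but that another HTTP client might send as-is. The following is a minimal 
sketch under that assumption; url_problems is a hypothetical helper, not 
something DIH or Tika provides, and it is not a claim about the actual cause.

```python
from urllib.parse import urlsplit


def url_problems(url):
    """Return a list of oddities found in a stored URL string (empty if none)."""
    problems = []
    if url != url.strip():
        problems.append("leading or trailing whitespace")
    inner = url.strip()
    # Look at each distinct character once, in a stable order.
    for ch in sorted(set(inner)):
        if ch.isspace():
            problems.append(f"embedded whitespace {ch!r}")
        elif ord(ch) < 0x20 or ord(ch) == 0x7F:
            problems.append(f"control character {ch!r}")
        elif ord(ch) > 0x7E:
            problems.append(f"non-ASCII character {ch!r}")
    try:
        parts = urlsplit(inner)
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append("missing or unexpected scheme/host")
    except ValueError:
        problems.append("URL could not be parsed")
    return problems


if __name__ == "__main__":
    # The example address is the one quoted later in this thread.
    value = "http://www.someaddress.com/documents/document1.docx"
    print(repr(value), url_problems(value))
```

Printing repr(value) alongside the result makes otherwise invisible characters 
visible. The value is best taken from a raw Solr response or the database 
rather than copied out of a browser window, which may already have cleaned it.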

I appreciate the suggestions!

-Teague


-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] 
Sent: Thursday, December 04, 2014 12:22 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH

Right. Resource not found (on server).

The end result is the same. If it works in the browser but not from the 
application, then either a different URL is being requested or, somehow, a 
different server is being contacted.

The solution (watching network traffic) is still the same, right?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853


On 4 December 2014 at 11:51, Walter Underwood <wun...@wunderwood.org> wrote:
> No, 400 should mean that the request was bad. When the server fails, that is 
> a 500.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
>
> On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
>> 400 error means something wrong on the server (resource not found).
>> So, it would be useful to see what URL is actually being requested.
>>
>> Can you run some sort of network tracer to see the actual network 
>> request (dtrace, Wireshark, etc.)? That will cut the problem in 
>> half for you.
>>
>> Regards,
>>   Alex.
>> Personal: http://www.outerthoughts.com/ and @arafalov
>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>
>>
>> On 4 December 2014 at 09:42, Teague James <teag...@insystechinc.com> wrote:
>>> The database stores the URL as a CLOB. Querying Solr shows that the field 
>>> value is "http://www.someaddress.com/documents/document1.docx"
>>> The URL works if I copy and paste it to the browser, but Tika gets a 400 
>>> error.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> -Teague
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Tuesday, December 02, 2014 1:45 PM
>>> To: solr-user
>>> Subject: Re: Tika HTTP 400 Errors with DIH
>>>
>>> On 2 December 2014 at 13:19, Teague James <teag...@insystechinc.com> wrote:
>>>> clob="true"
>>>
>>> What is ClobTransformer doing on the DownloadURL field? Is it possible 
>>> it is corrupting the value somehow?
>>>
>>> Regards,
>>>   Alex.
>>>
>>> Personal: http://www.outerthoughts.com/ and @arafalov
>>> Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
>>> Solr popularizers community: https://www.linkedin.com/groups?gid=6713853
>>>
>
