Alex, your suggestion might be a solution, but the issue isn't that the resource isn't found. As Walter said, 400 means "bad request", which makes me wonder: what is DIH/Tika doing when it tries to access the documents? What is the "request" that is bad? Is there any other way to suss this out? Placing a network monitor in this case would be extremely difficult.
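One lower-effort alternative to a full network monitor, sketched here under assumptions: temporarily point a single test row's URL at a throwaway local HTTP listener and print the exact request line and headers that DIH/Tika sends, then compare them with what the browser sends. `RecordingHandler`, `start_listener`, and port 8765 are hypothetical names for illustration, not part of Solr, DIH, or Tika, and the sketch assumes Solr runs on the same machine (it binds to 127.0.0.1 only).

```python
import http.server
import threading

# Hypothetical diagnostic helper (not part of Solr, DIH, or Tika).
# Every GET it receives is appended here for later inspection.
captured = []


class RecordingHandler(http.server.BaseHTTPRequestHandler):
    def do_GET(self):
        # self.path is the request target exactly as the client sent it, so
        # stray characters or encoding differences are visible here.
        captured.append({"path": self.path, "headers": dict(self.headers.items())})
        print(repr(self.requestline))
        for name, value in self.headers.items():
            print(f"  {name}: {value}")
        body = b"ok"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    # The default log_message is deliberately left in place: request lines the
    # server itself rejects as malformed (before do_GET ever runs) are still
    # logged to stderr, including the offending request line.


def start_listener(port=8765):
    """Serve in a background daemon thread; port=0 picks a free port."""
    server = http.server.HTTPServer(("127.0.0.1", port), RecordingHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return server
```

In an interactive Python session, `srv = start_listener()`, run a DIH import against a test row whose URL is something like `http://127.0.0.1:8765/documents/document1.docx`, inspect the printed output or `captured`, then call `srv.shutdown()`.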
I know that the stored URL is good and that the resource exists, because I can copy it out of a Solr query result and paste it into the browser, so that rules out 404 and 500 errors. Is the format of the URL correct? Is there some other setting I've missed?

I appreciate the suggestions!
-Teague

-----Original Message-----
From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
Sent: Thursday, December 04, 2014 12:22 PM
To: solr-user
Subject: Re: Tika HTTP 400 Errors with DIH

Right. Resource not found (on server). The end result is the same. If it works in the browser but not from the application, then either the application isn't requesting the same URL or, somehow, isn't even hitting the same server. The solution (watching network traffic) is still the same, right?

Regards,
   Alex.
Personal: http://www.outerthoughts.com/ and @arafalov
Solr resources and newsletter: http://www.solr-start.com/ and @solrstart
Solr popularizers community: https://www.linkedin.com/groups?gid=6713853

On 4 December 2014 at 11:51, Walter Underwood <wun...@wunderwood.org> wrote:
> No, 400 should mean that the request was bad. When the server fails, that is
> a 500.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/
>
> On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
>> A 400 error means something is wrong on the server (resource not found).
>> So it would be useful to see what URL is actually being requested.
>>
>> Can you run some sort of network tracer (dtrace, Wireshark, etc.) to see
>> the actual network request? That will cut the problem in half for you.
>>
>> Regards,
>> Alex.
>>
>> On 4 December 2014 at 09:42, Teague James <teag...@insystechinc.com> wrote:
>>> The database stores the URL as a CLOB. Querying Solr shows that the field
>>> value is "http://www.someaddress.com/documents/document1.docx".
>>> The URL works if I copy and paste it into the browser, but Tika gets a 400
>>> error.
>>>
>>> Any ideas?
>>>
>>> Thanks!
>>> -Teague
>>>
>>> -----Original Message-----
>>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com]
>>> Sent: Tuesday, December 02, 2014 1:45 PM
>>> To: solr-user
>>> Subject: Re: Tika HTTP 400 Errors with DIH
>>>
>>> On 2 December 2014 at 13:19, Teague James <teag...@insystechinc.com> wrote:
>>>> clob="true"
>>>
>>> What is ClobTransformer doing to the DownloadURL field? Is it possible
>>> it is corrupting the value somehow?
>>>
>>> Regards,
>>> Alex.
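Because the browser and DIH disagree about what looks like the same stored value, and Alex asked whether ClobTransformer might be altering it, one cheap check is to examine the raw string rather than its rendered form. This is an illustrative sketch, not a DIH or Tika API: `inspect_url` and `candidate_url` are hypothetical helpers that flag characters a browser typically trims or encodes on paste (trailing newlines, spaces, invisible characters) but that an HTTP client may send unchanged. That is one plausible, unconfirmed source of a 400.

```python
import unicodedata
from urllib.parse import quote, urlsplit


def inspect_url(raw):
    """Hypothetical helper: list characters in a stored URL that a browser
    would usually tolerate when the value is pasted, but that an HTTP client
    may send verbatim."""
    problems = []
    stripped = raw.strip()
    if raw != stripped:
        # e.g. a trailing "\r\n" or space carried over from the CLOB column
        problems.append("leading/trailing whitespace")
    for i, ch in enumerate(stripped):
        if ch == " ":
            problems.append(f"unencoded space at index {i}")
        elif unicodedata.category(ch) in ("Cc", "Cf"):
            # control characters (tab, newline, NUL) and invisible format
            # characters such as a zero-width space or byte-order mark
            problems.append(f"control/format character {ch!r} at index {i}")
        elif ord(ch) > 127:
            problems.append(f"non-ASCII character {ch!r} at index {i}")
    parts = urlsplit(stripped)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        problems.append("not an absolute http(s) URL")
    return problems


def candidate_url(raw):
    """Trim the value and percent-encode anything outside RFC 3986's
    reserved/unreserved characters; '%' is kept so existing escapes survive."""
    return quote(raw.strip(), safe=":/?#[]@!$&'()*+,;=%")
```

The value to test is the one exactly as stored (for example, the `repr` of the CLOB read straight from the database), not the copy displayed in a browser. A clean result would suggest the stored value is fine and that the difference lies in how the request is made.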