Re: Tika HTTP 400 Errors with DIH

2014-12-08 Thread Dan Davis
I would say that you could determine a row that gives a bad URL, and then run it in DIH admin interface (or the command-line) with "debug" enabled. The url parameter going into Tika should be present in its transformed form before the next entity gets going. This works in a similar scenario for me
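For the command-line route, a minimal sketch of building a DIH debug request for a single suspect row might look like the following. The host, the core name `collection1`, the `/dataimport` handler path, and the row offset are all assumptions for illustration and are not taken from this thread; adjust them to your own setup.

```python
from urllib.parse import urlencode


def build_dih_debug_url(base_url, core, start, rows=1):
    """Build a DIH full-import URL that runs in debug mode for a small row window.

    base_url, core and the /dataimport handler path are assumptions; use
    whatever your solrconfig.xml registers for the DataImportHandler.
    """
    params = {
        "command": "full-import",
        "debug": "true",     # run the import in debug mode
        "verbose": "true",   # include per-entity, per-transformer detail
        "clean": "false",    # do not wipe the existing index
        "commit": "false",   # do not commit anything from the debug run
        "start": start,      # offset of the suspect row
        "rows": rows,        # how many rows to process
        "wt": "json",
    }
    return f"{base_url}/{core}/dataimport?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical values: row offset 41 on a local core named collection1.
    print(build_dih_debug_url("http://localhost:8983/solr", "collection1", start=41))
```

Fetching the printed URL (with a browser or any HTTP client) should return the debug response, where the transformed url value can be compared against the one that downloads fine in the browser.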

RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread steve
all caveat: I have not tried this with "standalone" server or with any SOLR type project. Cheers! Steve > From: teag...@insystechinc.com > To: solr-user@lucene.apache.org > Subject: RE: Tika HTTP 400 Errors with DIH > Date: Fri, 5 Dec 2014 12:03:23 -0500 > > Alex, >

RE: Tika HTTP 400 Errors with DIH

2014-12-05 Thread Teague James
t of the URL correct? Is there some other setting I've missed? I appreciate the suggestions! -Teague -Original Message- From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Thursday, December 04, 2014 12:22 PM To: solr-user Subject: Re: Tika HTTP 400 Errors with DIH R

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Alexandre Rafalovitch
or. >>> >>> Any ideas? >>> >>> Thanks! >>> -Teague >>> -Original Message- >>> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] >>> Sent: Tuesday, December 02, 2014 1:45 PM >>> To: solr-user >>>

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Walter Underwood
gue >> -Original Message- >> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] >> Sent: Tuesday, December 02, 2014 1:45 PM >> To: solr-user >> Subject: Re: Tika HTTP 400 Errors with DIH >> >> On 2 December 2014 at 13:19, Teague James wrote: >

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Alexandre Rafalovitch
> From: Alexandre Rafalovitch [mailto:arafa...@gmail.com] > Sent: Tuesday, December 02, 2014 1:45 PM > To: solr-user > Subject: Re: Tika HTTP 400 Errors with DIH > > On 2 December 2014 at 13:19, Teague James wrote: >> clob="true" > > What does ClobTransformer

RE: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Teague James
Alexandre Rafalovitch [mailto:arafa...@gmail.com] Sent: Tuesday, December 02, 2014 1:45 PM To: solr-user Subject: Re: Tika HTTP 400 Errors with DIH On 2 December 2014 at 13:19, Teague James wrote: > clob="true" What is ClobTransformer doing on the DownloadURL field? Is it possibl

Re: Tika HTTP 400 Errors with DIH

2014-12-02 Thread Alexandre Rafalovitch
On 2 December 2014 at 13:19, Teague James wrote: > clob="true" What is ClobTransformer doing on the DownloadURL field? Is it possible it is corrupting the value somehow? Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-s

Tika HTTP 400 Errors with DIH

2014-12-02 Thread Teague James
Hi all, I am using Solr 4.9.0 to index a DB with DIH. In the DB there is a URL field. In the DIH Tika uses that field to fetch and parse the documents. The URL from the field is valid and will download the document in the browser just fine. But Tika is getting HTTP response code 400. Any ideas why