Hi all, I am using Solr 4.9.0 to index a DB with DIH. The DB has a URL field, and in the DIH config a TikaEntityProcessor entity uses that field to fetch and parse the documents. The URL from the field is valid and downloads the document in a browser just fine, but the fetch from DIH gets HTTP response code 400. Any ideas why?
ERROR BinURLDataSource java.io.IOException: Server returned HTTP response code: 400 for URL:
EntityProcessorWrapper Exception in entity : tika_content:org.apache.solr.handler.dataimport.DataImportHandlerException: Exception in invoking url

DIH

<dataConfig>
  <dataSource type="JdbcDataSource"
              name="ds-1"
              driver="net.sourceforge.jtds.jdbc.Driver"
              url="jdbc:jtds:sqlserver://1.2.3.4/database;instance=INSTANCE;user=USER;password=PASSWORD" />
  <dataSource type="BinURLDataSource" name="ds-2" />
  <document>
    <entity name="db_content"
            dataSource="ds-1"
            transformer="ClobTransformer, RegexTransformer"
            query="SELECT ContentID, DownloadURL FROM DATABASE.VIEW">
      <field column="ContentID" name="id" />
      <field column="DownloadURL" clob="true" name="DownloadURL" />
      <entity name="tika_content"
              processor="TikaEntityProcessor"
              url="${db_content.DownloadURL}"
              onError="continue"
              dataSource="ds-2">
        <field column="TikaParsedContent" />
      </entity>
    </entity>
  </document>
</dataConfig>

SCHEMA - Fields

<field name="DownloadURL" type="string" indexed="true" stored="true" />
<field name="TikaParsedContent" type="text_general" indexed="true" stored="true" multiValued="true"/>
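To narrow this down outside Solr, a small standalone check could request the same URL over a plain java.net connection, which is roughly what I would expect BinURLDataSource to do (that is an assumption on my part, not something I have confirmed in the source). One possible cause, and only a guess, is that the stored URL contains spaces or other characters that a browser percent-encodes automatically but a raw Java connection sends as-is. The class name `UrlCheck` and the helper `encode` below are hypothetical names for this sketch; pass one of the DownloadURL values from the DB as the first argument.

```java
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;

public class UrlCheck {

    // Percent-encodes characters that are illegal in a URL (for example spaces)
    // by rebuilding it with the multi-argument URI constructor.
    // Caveat: this constructor always quotes '%', so a URL that is already
    // encoded would be double-encoded. Use it on raw values only.
    static String encode(String raw) {
        try {
            URL u = new URL(raw);
            URI uri = new URI(u.getProtocol(), u.getUserInfo(), u.getHost(),
                    u.getPort(), u.getPath(), u.getQuery(), u.getRef());
            return uri.toASCIIString();
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot encode URL: " + raw, e);
        }
    }

    public static void main(String[] args) throws Exception {
        String raw = args[0];
        // Request the URL as stored, then the percent-encoded variant,
        // and print the HTTP status code returned for each.
        for (String candidate : new String[] { raw, encode(raw) }) {
            HttpURLConnection conn =
                    (HttpURLConnection) new URL(candidate).openConnection();
            System.out.println(conn.getResponseCode() + " " + candidate);
            conn.disconnect();
        }
    }
}
```

If the raw value returns 400 and the encoded one returns 200, the URL would need to be encoded before it reaches the tika_content entity. If both return 400, the cause is likely something else, such as headers or cookies the server expects from a browser.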